---
language:
- en
license: apache-2.0
tags:
- glove
- lora
- distillation
- hard-negatives
- qwen3-embedding
base_model: jsanzolac/bpe_glove_300_lora_r300_qwen3
datasets:
- jsanzolac/qwen3_emb_300_packed_cl100k
- jsanzolac/qwen3_emb_512_hard_negatives
---

# bpe_glove_300_lora_r300_qwen3_hardnegs

Continuation of `jsanzolac/bpe_glove_300_lora_r300_qwen3`: the same `DriftingGloVeStudent` (rank=300) over a frozen
300-d cl100k BPE-GloVe, trained for an additional 150,000 steps with **mined
hard negatives** from `jsanzolac/qwen3_emb_512_hard_negatives`.
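The card writes the student's output only as `E + B(A(·))`. The pure-Python sketch below computes one drifted row under an assumption the card does not state: `A` (r × d) and `B` (d × r) are bias-free linear maps applied to the frozen row of `E` itself. With rank=300 over 300-d vectors, `A` would then be 300 × 300. The helper names are hypothetical, and whether the real modules are `nn.Linear` or `nn.Embedding` is not specified here.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: Matrix[out][in], like nn.Linear.weight


def matvec(W: Matrix, x: Vector) -> Vector:
    """Bias-free linear map: returns W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def drifted_row(e: Vector, A: Matrix, B: Matrix) -> Vector:
    """Frozen GloVe row plus a learned residual: e + B(A(e)).

    ASSUMPTION (hypothetical): A (r x d) and B (d x r) are applied to the
    frozen row e itself; the card only writes E + B(A(.)).
    """
    delta = matvec(B, matvec(A, e))  # down-project to r dims, then back to d
    return [ei + di for ei, di in zip(e, delta)]
```

For example, with d = 2 and r = 1, `drifted_row([1.0, 2.0], [[1.0, 0.0]], [[0.5], [0.0]])` adds `0.5 * e[0]` to the first coordinate and returns `[1.5, 2.0]`. If `B` is all zeros, the row is returned unchanged.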

**Loss:** `cross_entropy(v @ [v_T ‖ v_hards]^T / τ) + λ_MSE · MSE(v, v_T)`, where `‖` denotes concatenation, with
`τ = 0.05`, `λ_MSE = 1.0`, and `H = 64` mined hard negatives per anchor.
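A single-anchor reading of this loss, as a pure-Python sketch. It assumes the anchor's own teacher vector `v_T` is the positive class (index 0), that the logits are raw dot products as the formula is written, and that MSE averages over dimensions. The card does not confirm these details, and the training code may instead batch this as a matrix product. Under this reading, `H = 64` gives 65 logits per anchor.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def hardneg_distill_loss(
    v: Sequence[float],
    v_t: Sequence[float],
    v_hards: List[Sequence[float]],
    tau: float = 0.05,
    lambda_mse: float = 1.0,
) -> float:
    """CE over [v_T || v_hards] with target index 0, plus lambda * MSE(v, v_T).

    ASSUMPTIONS: per-anchor formulation, teacher vector is class 0,
    raw dot-product logits, MSE averaged over dimensions.
    """
    logits = [dot(v, v_t) / tau] + [dot(v, h) / tau for h in v_hards]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    ce = log_sum_exp - logits[0]  # -log softmax(logits)[0]
    mse = sum((a - b) ** 2 for a, b in zip(v, v_t)) / len(v)
    return ce + lambda_mse * mse
```

When the student matches the teacher and the only hard negative is orthogonal, the loss is close to zero. When a hard negative coincides with the teacher vector, the cross-entropy term is `log 2`, because the two logits tie.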

**Warm-start:** `A.weight` + `B.weight` from `jsanzolac/bpe_glove_300_lora_r300_qwen3/rank_300/checkpoint_final.pt`. Optimizer
state was not in the source checkpoint, so this run uses a fresh LR schedule
(5e-4 → 1e-5 cosine over 150,000 steps).
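A minimal sketch of the stated schedule, assuming a plain half-cosine decay with no warmup and no restarts. The card only says "5e-4 → 1e-5 cosine over 150,000 steps", so the exact shape is an assumption.

```python
import math


def cosine_lr(step: int, total_steps: int = 150_000,
              lr_max: float = 5e-4, lr_min: float = 1e-5) -> float:
    """Half-cosine decay from lr_max at step 0 to lr_min at total_steps.

    ASSUMPTION: no warmup, no restarts; steps past the end stay at lr_min.
    """
    progress = min(max(step, 0), total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

Under this assumption, the midpoint (step 75,000) sits at `(5e-4 + 1e-5) / 2 = 2.55e-4`. PyTorch's `CosineAnnealingLR` with `T_max=150_000` and `eta_min=1e-5` follows the same closed form, but the card does not say which scheduler was used.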

**Frozen:** `E` (300-d GloVe from `jsanzolac/drifting-glove-distilled-r300`) and the teacher. The teacher is used only to produce the
cached `v_T` targets in `jsanzolac/qwen3_emb_300_packed_cl100k` and is not loaded here.

Files under `rank_300/`:
- `checkpoint_final.pt`: `A.weight` + `B.weight` (E excluded; reinject it from `jsanzolac/drifting-glove-distilled-r300`).
- `config.json`
- `vectors_drifted.txt` / `.parquet`: `E + B(A(·))` per vocab row.
- `train_log.jsonl`
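The card does not document the layout of `vectors_drifted.txt`. The sketch below assumes the common GloVe text format: one row per vocab entry, with the token first and then the space-separated values. The parser splits from the right, so cl100k tokens that contain spaces (for example `" the"`) stay intact. The function name is hypothetical.

```python
import io
from typing import Dict, List, TextIO


def load_text_vectors(f: TextIO, dim: int = 300) -> Dict[str, List[float]]:
    """Parse GloVe-style lines: '<token> <f1> ... <f_dim>'.

    ASSUMPTION: token first, then exactly `dim` space-separated floats.
    rsplit keeps any spaces inside the token itself.
    """
    vectors: Dict[str, List[float]] = {}
    for line in f:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        parts = line.rsplit(" ", dim)
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values, got {len(parts) - 1}")
        token, values = parts[0], parts[1:]
        vectors[token] = [float(x) for x in values]
    return vectors


if __name__ == "__main__":
    demo = io.StringIO("hello 0.5 -1.0\n the 1.0 2.0\n")
    print(load_text_vectors(demo, dim=2))
```

For the real file, pass `open("rank_300/vectors_drifted.txt", encoding="utf-8")`. The `.parquet` variant can be read with `pandas.read_parquet`, but its column layout is not described here.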