+---
+language:
+- en
+license: apache-2.0
+tags:
+- glove
+- lora
+- distillation
+- hard-negatives
+- qwen3-embedding
+base_model: jsanzolac/bpe_glove_300_lora_r300_qwen3
+datasets:
+- jsanzolac/qwen3_emb_300_packed_cl100k
+- jsanzolac/qwen3_emb_512_hard_negatives
+---
+# bpe_glove_300_lora_r300_qwen3_hardnegs_nce_only
+Continuation of `jsanzolac/bpe_glove_300_lora_r300_qwen3`: the same `DriftingGloVeStudent` (rank 300) over a frozen
+300-d cl100k BPE-GloVe embedding table, trained for an additional 150,000 steps with **mined
+hard negatives** from `jsanzolac/qwen3_emb_512_hard_negatives`.
+**Loss:** `cross_entropy(v @ [v_T ‖ v_hards]^T / τ)` with `τ = 0.05` and
+`H = 64` mined hard negatives per anchor. *The MSE term is disabled in this variant.*
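
A minimal pure-Python sketch of the loss above for a single anchor, included to make the shape of the logits concrete. It assumes the positive `v_T` sits at index 0 of the concatenated candidates and uses raw dot products; whether the vectors are L2-normalized beforehand is not stated in this card. The name `nce_loss` and the toy vectors are illustrative and are not taken from the training code.

```python
import math

def nce_loss(v, v_t, v_hards, tau=0.05):
    """Cross-entropy over [v_T ‖ v_hards] for one anchor.

    Assumption: the teacher vector v_t is the positive class (index 0).
    """
    candidates = [v_t] + list(v_hards)
    # Logits: dot product of the student vector with each candidate, scaled by 1/tau.
    logits = [sum(a * b for a, b in zip(v, c)) / tau for c in candidates]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability of the positive.
    return log_z - logits[0]

# Toy 2-d example: a hard negative close to the anchor raises the loss.
v = [1.0, 0.0]
loss_easy = nce_loss(v, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
loss_hard = nce_loss(v, [1.0, 0.0], [[0.99, 0.14], [-1.0, 0.0]])
```

With a small `τ`, a negative that is nearly as similar to the anchor as the positive dominates the denominator, which is why mined hard negatives contribute far more to the loss than easy ones.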
+**Warm-start:** `A.weight` + `B.weight` from `jsanzolac/bpe_glove_300_lora_r300_qwen3/rank_300/checkpoint_final.pt`. Optimizer
+state was not in the source checkpoint, so this run uses a fresh LR schedule
+(5e-4 → 1e-5 cosine over 150,000 steps).
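
The schedule above can be sketched as a standard half-cosine decay. This is a sketch under assumptions: no warmup phase and the usual cosine annealing formula; the card states only the endpoints and the step count. The function name `cosine_lr` is hypothetical.

```python
import math

def cosine_lr(step, total_steps=150_000, lr_max=5e-4, lr_min=1e-5):
    """Half-cosine decay from lr_max at step 0 to lr_min at total_steps (no warmup assumed)."""
    # Clamp so that steps outside [0, total_steps] stay at the endpoints.
    t = min(max(step, 0), total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

lr_start = cosine_lr(0)
lr_mid = cosine_lr(75_000)
lr_end = cosine_lr(150_000)
```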
+**Frozen:** `E` (300-d GloVe from `jsanzolac/drifting-glove-distilled-r300`) and the teacher. The teacher is used only
+to produce the cached `v_T` targets in `jsanzolac/qwen3_emb_300_packed_cl100k` and is not loaded here.
+Files under `rank_300/`:
+- `checkpoint_final.pt` — `A.weight` + `B.weight` (E excluded; reinject from `jsanzolac/drifting-glove-distilled-r300`).
+- `config.json`
+- `vectors_drifted.txt` / `.parquet` — `E + B(A(·))` per vocab row.
+- `train_log.jsonl`
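
To illustrate what a row of `vectors_drifted` contains, here is a dependency-free sketch of `E + B(A(·))` for one vocabulary row. It assumes `A.weight` has shape `(r, d)` and `B.weight` has shape `(d, r)`, as in bias-free linear layers, with no extra scaling factor or nonlinearity; none of this is confirmed by the card. The helper names `matvec` and `drift_row` and the 2-d toy weights are illustrative only.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def drift_row(e, A_weight, B_weight):
    """Return e + B(A(e)) for one embedding row e (assumed form, no bias or scaling)."""
    low = matvec(A_weight, e)        # A(e): project into the rank-r space
    delta = matvec(B_weight, low)    # B(A(e)): project back to d dimensions
    return [ei + di for ei, di in zip(e, delta)]

# Toy example with d = r = 2 (the real model uses d = 300, r = 300).
e = [1.0, 2.0]
A_weight = [[1.0, 0.0], [0.0, 1.0]]
B_weight = [[0.0, 1.0], [1.0, 0.0]]
drifted = drift_row(e, A_weight, B_weight)
```

Because `checkpoint_final.pt` excludes `E`, the frozen rows would need to be loaded from `jsanzolac/drifting-glove-distilled-r300` before applying a computation of this kind.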