---
language:
- en
license: apache-2.0
tags:
- glove
- lora
- distillation
- hard-negatives
- projection-head
base_model: jsanzolac/bpe_glove_300_lora_r300_qwen3
datasets:
- jsanzolac/qwen3_emb_300_packed_cl100k
- jsanzolac/qwen3_emb_512_hard_negatives
---

# bpe_glove_300_lora_r300_qwen3_proj_hardnegs

r300 backbone (`A`, `B` warm-started from `jsanzolac/bpe_glove_300_lora_r300_qwen3`) + a **sentence-level projection head** (`Linear(300, 1200) → GELU → Dropout(0.1) → Linear(1200, 300) → L2-norm`). Both backbone (`A`, `B`) and the projection head are trainable.

**Loss:** `InfoNCE(v_proj vs [v_T ‖ v_hards], τ=0.01)` with `H = 64` mined hard negatives per anchor at batch=256. **No MSE term.**

**Schedule:** lr 0.0005 → 1e-05 cosine over 150,000 steps, warmup 1000. Optimizer: AdamW, weight decay 0.01.

Files under `rank_300/`:

- `checkpoint_final.pt`: `A.weight`, `B.weight`, plus the projection head's `proj.l1.{weight,bias}` and `proj.l2.{weight,bias}`. `E` is excluded (non-persistent buffer); re-inject from `jsanzolac/drifting-glove-distilled-r300` at load time.
- `config.json`
- `vectors_drifted_pre_proj.txt`: `E + B(A(·))` per vocab row (PRE-projection; the head is contextual).
- `train_log.jsonl`
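
The learning-rate schedule stated above can be sketched in a few lines of plain Python. This is an illustrative reconstruction, not the training code: the card gives only the peak rate, the final rate, the step count, and the warmup length. The linear shape of the warmup, the choice to run the cosine decay over the steps that remain after warmup, and the names `lr_at`, `PEAK_LR`, `FINAL_LR`, `TOTAL_STEPS`, and `WARMUP_STEPS` are all assumptions.

```python
import math

# Values taken from the schedule line of this card.
PEAK_LR = 5e-4
FINAL_LR = 1e-5
TOTAL_STEPS = 150_000
WARMUP_STEPS = 1_000


def lr_at(step: int) -> float:
    """Learning rate at a 0-indexed optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Assumption: linear ramp that reaches PEAK_LR on the last warmup step.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Assumption: the cosine decay covers the steps remaining after warmup.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)  # hold FINAL_LR if called past the end
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine


if __name__ == "__main__":
    # Print the rate at a few points along the run.
    for s in (0, WARMUP_STEPS - 1, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(f"step {s:>7}: lr = {lr_at(s):.3e}")
```

If the original run instead decayed over all 150,000 steps including warmup, only the `progress` expression would change; the endpoints stay the same.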