---
language:
- en
license: apache-2.0
tags:
- glove
- lora
- distillation
- hard-negatives
- projection-head
base_model: jsanzolac/bpe_glove_300_lora_r300_qwen3
datasets:
- jsanzolac/qwen3_emb_300_packed_cl100k
- jsanzolac/qwen3_emb_512_hard_negatives
---
bpe_glove_300_lora_r300_qwen3_proj_hardnegs
An r300 backbone (A, B warm-started from jsanzolac/bpe_glove_300_lora_r300_qwen3) plus a sentence-level
projection head (Linear(300, 1200) → GELU → Dropout(0.1) → Linear(1200, 300) → L2-norm).
Both the backbone (A, B) and the projection head are trainable.
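
The head described above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the training code: the class name `ProjectionHead` is hypothetical, and the submodule names `l1` and `l2` are chosen only to line up with the `proj.l1` / `proj.l2` keys listed for the checkpoint.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProjectionHead(nn.Module):
    """Linear(300, 1200) -> GELU -> Dropout(0.1) -> Linear(1200, 300) -> L2-norm."""

    def __init__(self, dim=300, hidden=1200, p_drop=0.1):
        super().__init__()
        self.l1 = nn.Linear(dim, hidden)
        self.l2 = nn.Linear(hidden, dim)
        self.act = nn.GELU()
        self.drop = nn.Dropout(p_drop)

    def forward(self, x):
        # x: (batch, dim) sentence-level vectors from the backbone
        h = self.drop(self.act(self.l1(x)))
        # Unit-normalize so dot products between outputs are cosine similarities
        return F.normalize(self.l2(h), p=2, dim=-1)


head = ProjectionHead().eval()   # eval() disables dropout for inference
v = torch.randn(4, 300)          # placeholder sentence vectors
v_proj = head(v)                 # shape (4, 300), each row has unit L2 norm
```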
Loss: InfoNCE(v_proj vs [v_T ‖ v_hards], τ=0.01) with H = 64 mined hard negatives
per anchor at batch=256. No MSE term.
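
One plausible reading of this loss, sketched in PyTorch: each anchor is scored against every teacher vector in the batch plus its own H hard negatives, and cross-entropy picks out the matching teacher vector. The function name `info_nce_with_hards` and the choice to score hard negatives per anchor only (rather than sharing them across the batch) are assumptions, not details confirmed by the training code.

```python
import torch
import torch.nn.functional as F


def info_nce_with_hards(v_proj, v_t, v_hards, tau=0.01):
    """v_proj: (B, d) student, v_t: (B, d) teacher, v_hards: (B, H, d) hard negatives."""
    v_proj = F.normalize(v_proj, dim=-1)
    v_t = F.normalize(v_t, dim=-1)
    v_hards = F.normalize(v_hards, dim=-1)
    in_batch = v_proj @ v_t.T                            # (B, B), positives on the diagonal
    hard = torch.einsum("bd,bhd->bh", v_proj, v_hards)   # (B, H), per-anchor hard negatives
    logits = torch.cat([in_batch, hard], dim=1) / tau    # (B, B + H)
    targets = torch.arange(v_proj.size(0), device=v_proj.device)
    return F.cross_entropy(logits, targets)


# Toy shapes for illustration; the card's setting is B = 256, H = 64, d = 300.
torch.manual_seed(0)
B, H, d = 8, 4, 16
v_t = torch.randn(B, d)
v_hards = torch.randn(B, H, d)
loss_random = info_nce_with_hards(torch.randn(B, d), v_t, v_hards)
loss_matched = info_nce_with_hards(v_t.clone(), v_t, v_hards)
```

With τ = 0.01 the logits are cosine similarities scaled by 100, so the loss is sharply peaked: a student that matches the teacher exactly drives it toward zero, while an unrelated student is penalized heavily.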
Schedule: lr 0.0005 → 1e-05 cosine over 150,000 steps, warmup 1000. Optimizer: AdamW, weight decay 0.01.
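
As a rough sketch, the schedule can be written as a plain function of the step. The helper name `lr_at`, the linear shape of the warmup, and counting the 1000 warmup steps inside the 150,000 total are all assumptions; the card only gives the endpoints and the cosine shape.

```python
import math


def lr_at(step, peak=5e-4, floor=1e-5, total=150_000, warmup=1_000):
    """Learning rate at a given step: linear warmup, then cosine decay to `floor`."""
    if step < warmup:
        # Assumed linear ramp that reaches `peak` on the last warmup step
        return peak * (step + 1) / warmup
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))


for s in (0, 999, 75_500, 150_000):
    print(s, lr_at(s))
```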
Files under rank_300/:
- `checkpoint_final.pt`: `A.weight`, `B.weight`, plus the projection head's `proj.l1.{weight,bias}` and `proj.l2.{weight,bias}`. `E` is excluded (non-persistent buffer); re-inject from `jsanzolac/drifting-glove-distilled-r300` at load time.
- `config.json`
- `vectors_drifted_pre_proj.txt`: `E + B(A(·))` per vocab row (PRE-projection; the head is contextual).
- `train_log.jsonl`
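
The sketch below illustrates why `E` is missing from the checkpoint and how it might be re-injected. The `Model` and `Proj` classes are hypothetical stand-ins that only mirror the listed key names. The code assumes the checkpoint is a flat state dict and that `·` in `E + B(A(·))` refers to the `E` row itself. A random table stands in for the embeddings from `jsanzolac/drifting-glove-distilled-r300`, whose file layout is not described here.

```python
import torch
import torch.nn as nn


class Proj(nn.Module):
    def __init__(self, dim=300, hidden=1200):
        super().__init__()
        self.l1 = nn.Linear(dim, hidden)
        self.l2 = nn.Linear(hidden, dim)


class Model(nn.Module):
    def __init__(self, vocab_size, dim=300, rank=300):
        super().__init__()
        # persistent=False keeps E out of state_dict(), so it is never saved
        self.register_buffer("E", torch.zeros(vocab_size, dim), persistent=False)
        self.A = nn.Linear(dim, rank, bias=False)
        self.B = nn.Linear(rank, dim, bias=False)
        self.proj = Proj(dim)

    @torch.no_grad()
    def drifted(self):
        # Pre-projection vectors, one per vocab row (assumed reading of E + B(A(.)))
        return self.E + self.B(self.A(self.E))


vocab_size = 10                                  # toy size for illustration
trained = Model(vocab_size)
state = trained.state_dict()                     # stands in for the loaded checkpoint
keys = sorted(state)                             # A, B and proj weights only; no "E"

fresh = Model(vocab_size)
fresh.load_state_dict(state)                     # strict load succeeds without E
E_table = torch.randn(vocab_size, 300)           # placeholder for the real E table
fresh.E.copy_(E_table)                           # re-inject E after loading
vectors = fresh.drifted()                        # shape (vocab_size, 300)
```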