English
glove
lora
distillation
hard-negatives
projection-head

bpe_glove_300_lora_r300_qwen3_proj_hardnegs

r300 backbone (A, B warm-started from jsanzolac/bpe_glove_300_lora_r300_qwen3) + a sentence-level projection head (Linear(300, 1200) → GELU → Dropout(0.1) → Linear(1200, 300) → L2-norm). Both the backbone (A, B) and the projection head are trainable.
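A minimal PyTorch sketch of the projection head described above. The layer sizes, activation, dropout rate, and final L2 normalization come from the description; the class name `ProjectionHead`, the attribute names other than `l1`/`l2`, and the assumption that the head receives one pooled 300-d vector per sentence are illustrative, not confirmed by this card.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProjectionHead(nn.Module):
    """Linear(300, 1200) -> GELU -> Dropout(0.1) -> Linear(1200, 300) -> L2-norm."""

    def __init__(self, dim: int = 300, hidden: int = 1200, p_drop: float = 0.1):
        super().__init__()
        # l1 / l2 mirror the checkpoint keys proj.l1.* and proj.l2.*
        self.l1 = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.drop = nn.Dropout(p_drop)
        self.l2 = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.l2(self.drop(self.act(self.l1(x))))
        return F.normalize(h, dim=-1)  # unit L2 norm per row


head = ProjectionHead().eval()        # eval() disables dropout
sentence_vecs = torch.randn(4, 300)   # stand-in for pooled backbone outputs (assumed)
with torch.no_grad():
    out = head(sentence_vecs)
```

Under these assumptions the module's parameters are exactly `l1.{weight,bias}` and `l2.{weight,bias}`, which lines up with the key names listed for the checkpoint once the module is registered under the name `proj`.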

Loss: InfoNCE(v_proj vs [v_T ‖ v_hards], τ=0.01) with H = 64 mined hard negatives per anchor at batch=256. No MSE term.
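One possible reading of this loss, sketched in PyTorch. It assumes that each anchor is scored against all in-batch teacher vectors v_T (its own teacher vector being the positive) concatenated with its own H hard negatives, and that all vectors share one dimensionality. The function name, the tensor shapes, and the in-batch-negatives interpretation are assumptions; the card states only the candidate set [v_T ‖ v_hards] and τ = 0.01.

```python
import torch
import torch.nn.functional as F


def info_nce_with_hard_negatives(v_proj, v_t, v_hards, tau=0.01):
    """v_proj: [B, D] student, v_t: [B, D] teacher, v_hards: [B, H, D] negatives."""
    v_proj = F.normalize(v_proj, dim=-1)
    v_t = F.normalize(v_t, dim=-1)
    v_hards = F.normalize(v_hards, dim=-1)

    in_batch = v_proj @ v_t.T                            # [B, B], diagonal = positives
    hard = torch.einsum("bd,bhd->bh", v_proj, v_hards)   # [B, H], per-anchor negatives
    logits = torch.cat([in_batch, hard], dim=1) / tau    # [B, B + H]
    targets = torch.arange(v_proj.size(0), device=v_proj.device)
    return F.cross_entropy(logits, targets)


torch.manual_seed(0)
B, H, D = 8, 4, 300  # small stand-ins for batch=256, H=64
v_t = torch.randn(B, D)
loss_aligned = info_nce_with_hard_negatives(v_t.clone(), v_t, torch.randn(B, H, D))
loss_random = info_nce_with_hard_negatives(torch.randn(B, D), v_t, torch.randn(B, H, D))
```

With τ = 0.01 the cosine similarities are scaled by 100 before the softmax, so a student that matches its teacher drives the loss close to zero while a random student is penalized heavily.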

Schedule: lr 0.0005 → 1e-05 cosine over 150,000 steps, warmup 1000. Optimizer: AdamW, weight decay 0.01.
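The schedule can be written as a small pure-Python function. This sketch assumes a linear warmup from near zero to the peak rate, and assumes the 150,000 steps include the 1,000 warmup steps; the card specifies neither detail, and `lr_at` is a hypothetical helper name.

```python
import math

PEAK_LR, FINAL_LR = 5e-4, 1e-5
TOTAL_STEPS, WARMUP_STEPS = 150_000, 1_000


def lr_at(step: int) -> float:
    """Learning rate at a 0-indexed optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warmup (assumed shape): reaches PEAK_LR on the last warmup step
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # 1 -> 0
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine


samples = {s: lr_at(s) for s in (0, 999, 1_000, 75_500, 150_000)}
```

To use it with `torch.optim.AdamW(params, lr=5e-4, weight_decay=0.01)`, one option is `torch.optim.lr_scheduler.LambdaLR` with the multiplier `lambda step: lr_at(step) / PEAK_LR`.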

Files under rank_300/:

  • checkpoint_final.pt: A.weight, B.weight, plus the projection head's proj.l1.{weight,bias} and proj.l2.{weight,bias}. E is excluded (non-persistent buffer); re-inject from jsanzolac/drifting-glove-distilled-r300 at load time.
  • config.json
  • vectors_drifted_pre_proj.txt: E + B(A(·)) per vocab row (pre-projection; the head is contextual, so it is not applied to these static vectors).
  • train_log.jsonl
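
The relationship between E, A, and B in the files above can be sketched as follows. This assumes A and B are bias-free linear maps (only `A.weight` and `B.weight` are listed in the checkpoint) and that E is registered as a non-persistent buffer, which is what keeps it out of the saved state. The class name `DriftedEmbedding` and the random stand-in table are hypothetical; the real E would come from jsanzolac/drifting-glove-distilled-r300, whose file layout is not described here.

```python
import torch
import torch.nn as nn


class DriftedEmbedding(nn.Module):
    """Computes E + B(A(E)) for the requested vocab rows."""

    def __init__(self, E: torch.Tensor, rank: int = 300):
        super().__init__()
        dim = E.size(1)
        # persistent=False keeps E out of state_dict(), so it must be re-injected
        self.register_buffer("E", E, persistent=False)
        self.A = nn.Linear(dim, rank, bias=False)
        self.B = nn.Linear(rank, dim, bias=False)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        e = self.E[token_ids]
        return e + self.B(self.A(e))  # pre-projection vector


torch.manual_seed(0)
E = torch.randn(10, 300)                 # stand-in for the real embedding table
model = DriftedEmbedding(E)
keys = sorted(model.state_dict().keys())

restored = DriftedEmbedding(E)           # E re-injected at construction time
restored.load_state_dict(model.state_dict())
ids = torch.tensor([0, 3, 7])
with torch.no_grad():
    same = torch.allclose(model(ids), restored(ids))
    vecs = model(ids)
```

Because the buffer is non-persistent, the saved state contains only `A.weight` and `B.weight`; a fresh module reproduces the same vectors only once the same E has been supplied again.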