jsanzolac/qwen3_emb_300_packed_cl100k
r300 backbone (A, B warm-started from jsanzolac/bpe_glove_300_lora_r300_qwen3) + a sentence-level
projection head (Linear(300, 1200) → GELU → Dropout(0.1) → Linear(1200, 300) → L2-norm).
Both the backbone (A, B) and the projection head are trainable.
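The following is a minimal NumPy sketch of the projection head's inference-time forward pass. It assumes the checkpoint tensors have already been converted to NumPy arrays under the same key names as in checkpoint_final.pt (proj.l1.*, proj.l2.*). It also assumes the head uses PyTorch's default exact (erf-based) GELU. Dropout(0.1) is the identity at inference, so it is left out. The helper names (gelu, linear, project) are hypothetical. How the 300-d sentence vector is pooled from the backbone is not covered.

```python
import math

import numpy as np

# Exact (erf-based) GELU, assumed to match PyTorch's default nn.GELU().
_erf = np.vectorize(math.erf)


def gelu(x: np.ndarray) -> np.ndarray:
    return 0.5 * x * (1.0 + _erf(x / math.sqrt(2.0)))


def linear(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    # PyTorch nn.Linear stores weight as (out_features, in_features): y = x W^T + b.
    return x @ weight.T + bias


def project(v: np.ndarray, params: dict, eps: float = 1e-12) -> np.ndarray:
    """Eval-mode forward pass of the sentence-level projection head (hypothetical helper).

    v: (batch, 300) sentence vectors from the backbone.
    params: arrays keyed like the checkpoint ("proj.l1.weight", ...).
    Dropout is the identity at inference, so it is omitted.
    """
    h = gelu(linear(v, params["proj.l1.weight"], params["proj.l1.bias"]))  # (batch, 1200)
    z = linear(h, params["proj.l2.weight"], params["proj.l2.bias"])  # (batch, 300)
    # Final L2 normalization, guarding against a zero vector.
    return z / np.maximum(np.linalg.norm(z, axis=-1, keepdims=True), eps)


# Usage with random weights of the documented shapes (not the trained weights).
rng = np.random.default_rng(0)
params = {
    "proj.l1.weight": rng.normal(scale=0.05, size=(1200, 300)),
    "proj.l1.bias": np.zeros(1200),
    "proj.l2.weight": rng.normal(scale=0.05, size=(300, 1200)),
    "proj.l2.bias": np.zeros(300),
}
out = project(rng.normal(size=(4, 300)), params)
```

Because of the final L2-norm, every output row has unit length, so downstream similarity is a plain dot product.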
Loss: InfoNCE(v_proj vs [v_T, v_hards], τ = 0.01) with H = 64 mined hard negatives
per anchor at batch=256. No MSE term.
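The loss can be sketched as follows under several assumptions. First, v_T is the positive for each anchor. Second, each anchor is scored only against v_T plus its own H = 64 hard negatives; the card does not say whether in-batch negatives are also used. Third, all vectors are L2-normalized. The function name info_nce and the synthetic data are illustrative and are not taken from the training code.

```python
import numpy as np


def info_nce(v_proj, v_t, v_hards, tau=0.01):
    """InfoNCE with the target in column 0 and H hard negatives per anchor (sketch).

    v_proj: (B, D) projected anchors; v_t: (B, D) targets; v_hards: (B, H, D).
    Inputs are assumed L2-normalized, so dot products are cosine similarities.
    """
    cands = np.concatenate([v_t[:, None, :], v_hards], axis=1)  # (B, 1 + H, D)
    logits = np.einsum("bd,bkd->bk", v_proj, cands) / tau  # (B, 1 + H)
    # Numerically stable log-sum-exp; the positive sits in column 0.
    m = logits.max(axis=1, keepdims=True)
    log_z = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    return float(np.mean(log_z - logits[:, 0]))


def l2n(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


rng = np.random.default_rng(0)
B, H, D = 256, 64, 300
v_t = l2n(rng.normal(size=(B, D)))
v_hards = l2n(rng.normal(size=(B, H, D)))
# Anchors that are slightly perturbed copies of their targets vs. unrelated anchors.
loss_close = info_nce(l2n(v_t + 0.02 * rng.normal(size=(B, D))), v_t, v_hards)
loss_random = info_nce(l2n(rng.normal(size=(B, D))), v_t, v_hards)
```

With τ = 0.01 the logits are cosines scaled by 100, so the softmax is very sharp. Anchors close to their targets drive the loss toward zero, while unrelated anchors land far above it.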
Schedule: lr 0.0005 → 1e-05 cosine over 150,000 steps, warmup 1000. Optimizer: AdamW, weight decay 0.01.
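One reading of the schedule line can be written as a plain function. It applies a linear warmup over the first 1000 steps, then a cosine decay from 5e-4 to 1e-5 that ends at step 150,000. Two details are assumptions: whether warmup starts from zero, and whether the cosine spans all 150,000 steps or only the post-warmup steps. The name lr_at is hypothetical.

```python
import math


def lr_at(step, peak=5e-4, floor=1e-5, warmup=1000, total=150_000):
    """Linear warmup to `peak`, then cosine decay to `floor` at step `total` (sketch)."""
    if step < warmup:
        # Assumption: warmup ramps linearly from 0.
        return peak * step / warmup
    # Assumption: the cosine covers the post-warmup steps; clamp past `total`.
    progress = min(1.0, (step - warmup) / (total - warmup))
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))
```

In PyTorch this could be attached to an AdamW optimizer created with lr=5e-4 through torch.optim.lr_scheduler.LambdaLR, passing lr_lambda=lambda s: lr_at(s) / 5e-4. LambdaLR multiplies the base learning rate by the returned factor.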
Files under rank_300/:
checkpoint_final.pt: A.weight, B.weight, plus the projection head's proj.l1.{weight,bias} and proj.l2.{weight,bias}. E is excluded (non-persistent buffer); re-inject from jsanzolac/drifting-glove-distilled-r300 at load time.
config.json
vectors_drifted_pre_proj.txt: E + B(A(·)) per vocab row (PRE-projection; the head is contextual).
train_log.jsonl
Base model
jsanzolac/drifting-glove-distilled-r300
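The following sketch reads the vectors_drifted_pre_proj.txt file listed under rank_300/. It assumes a GloVe-style text layout: one token per line, followed by 300 space-separated floats. The card does not document the format. The delimiter, the possible header line, the UTF-8 encoding, and the function name read_vectors are therefore all assumptions. These are per-token, pre-projection vectors, so the sentence-level head is not applied to them.

```python
import io


def read_vectors(f, dim=300):
    """Parse assumed GloVe-style lines: '<token> <v1> ... <v_dim>' (hypothetical helper).

    rsplit from the right keeps tokens that themselves contain spaces intact.
    Lines that do not end in exactly `dim` numeric fields (e.g. a
    word2vec-style "vocab_size dim" header, or blank lines) are skipped.
    """
    vectors = {}
    for line in f:
        parts = line.rstrip().rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue
        try:
            values = [float(x) for x in parts[1:]]
        except ValueError:
            continue
        vectors[parts[0]] = values
    return vectors


# Toy input with dim=2; for the real file (format assumed):
# with open("rank_300/vectors_drifted_pre_proj.txt", encoding="utf-8") as f:
#     vectors = read_vectors(f)
vocab = read_vectors(io.StringIO("2 2\nhello 0.5 -1.0\na b 1 2\n\n"), dim=2)
```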