bpe_glove_300_lora_r300_qwen3_proj_hardnegs

Rank-300 backbone (A and B, warm-started from jsanzolac/bpe_glove_300_lora_r300_qwen3) plus a sentence-level projection head: Linear(300, 1200) → GELU → Dropout(0.1) → Linear(1200, 300) → L2-norm. Both the backbone (A, B) and the projection head are trainable.
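
The projection head described above can be sketched in PyTorch as follows. This is a minimal illustration, not the training code: the class name `ProjectionHead` and the random input are hypothetical, and the submodule names `l1` and `l2` are assumed from the checkpoint keys `proj.l1.*` and `proj.l2.*`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProjectionHead(nn.Module):
    """Linear(300, 1200) -> GELU -> Dropout(0.1) -> Linear(1200, 300) -> L2-norm."""

    def __init__(self, dim: int = 300, hidden: int = 1200, p_drop: float = 0.1):
        super().__init__()
        # l1 / l2 are assumed names, chosen to match proj.l1.* and proj.l2.*
        self.l1 = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.drop = nn.Dropout(p_drop)
        self.l2 = nn.Linear(hidden, dim)

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        h = self.drop(self.act(self.l1(v)))
        # Unit-length output, so dot products are cosine similarities
        return F.normalize(self.l2(h), p=2, dim=-1)


proj = ProjectionHead().eval()  # eval() disables dropout
with torch.no_grad():
    v_proj = proj(torch.randn(4, 300))  # 4 hypothetical 300-d sentence vectors
```

Because the output is L2-normalized, every row of `v_proj` has norm 1 regardless of the input scale.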

Loss: InfoNCE(v_proj vs [v_T ‖ v_hards], τ=0.01), with H = 64 mined hard negatives per anchor at batch size 256. No MSE term.
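
One way to read the loss above is sketched below. It assumes that the candidate set for each anchor is all in-batch targets v_T (positive on the diagonal) concatenated with that anchor's own H hard negatives; whether hard negatives are shared across anchors is not stated here, so treat that choice, the function name, and the tensor shapes as assumptions.

```python
import torch
import torch.nn.functional as F


def info_nce_with_hard_negs(v_proj, v_t, v_hards, tau=0.01):
    """v_proj: [B, D] anchors, v_t: [B, D] targets, v_hards: [B, H, D] hard negatives."""
    v_proj = F.normalize(v_proj, dim=-1)
    v_t = F.normalize(v_t, dim=-1)
    v_hards = F.normalize(v_hards, dim=-1)
    in_batch = v_proj @ v_t.T                            # [B, B], positives on the diagonal
    hard = torch.einsum("bd,bhd->bh", v_proj, v_hards)   # [B, H], per-anchor hard negatives
    logits = torch.cat([in_batch, hard], dim=1) / tau    # [B, B + H]
    labels = torch.arange(v_proj.size(0), device=v_proj.device)
    return F.cross_entropy(logits, labels)


# Random tensors with the sizes given above: batch 256, H = 64, dimension 300
loss = info_nce_with_hard_negs(
    torch.randn(256, 300), torch.randn(256, 300), torch.randn(256, 64, 300)
)
```

With τ = 0.01 the logits are cosine similarities scaled by 100, so the softmax is sharp and the hardest candidates dominate the gradient.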

Schedule: learning rate decays from 5e-4 to 1e-5 on a cosine schedule over 150,000 steps, with 1,000 warmup steps. Optimizer: AdamW, weight decay 0.01.
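
The schedule can be written as a small function of the step index. This sketch assumes a linear warmup and a cosine decay that starts after warmup and ends at step 150,000; neither detail is stated above, and the function name `lr_at` is hypothetical.

```python
import math


def lr_at(step, peak=5e-4, floor=1e-5, warmup=1000, total=150_000):
    """Learning rate at a given step: warmup to `peak`, then cosine decay to `floor`."""
    if step < warmup:
        # Assumed linear warmup; the warmup shape is not specified
        return peak * (step + 1) / warmup
    progress = min(1.0, (step - warmup) / (total - warmup))
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))


lr_start = lr_at(0)
lr_peak = lr_at(1000)
lr_end = lr_at(150_000)
```

To use it with `torch.optim.AdamW(params, lr=5e-4, weight_decay=0.01)`, one option is `torch.optim.lr_scheduler.LambdaLR` with `lambda s: lr_at(s) / 5e-4`, since `LambdaLR` multiplies the optimizer's initial learning rate by the returned factor.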

Files under rank_300/:

checkpoint_final.pt — A.weight, B.weight, plus the projection head's proj.l1.{weight,bias} and proj.l2.{weight,bias}. E is excluded (non-persistent buffer); re-inject from jsanzolac/drifting-glove-distilled-r300 at load time.
config.json
vectors_drifted_pre_proj.txt — E + B(A(·)) per vocab row (PRE-projection; the head is contextual).
train_log.jsonl
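
The sketch below illustrates why E is absent from the checkpoint and how the pre-projection vectors relate to A and B. It assumes that checkpoint_final.pt is a flat state dict with the keys listed above, that A and B are bias-free linear maps, and that the argument of A is the corresponding row of E. The class name `Backbone` and the synthetic tensors are hypothetical stand-ins; in practice the dict would come from `torch.load("rank_300/checkpoint_final.pt", map_location="cpu")` and E from jsanzolac/drifting-glove-distilled-r300.

```python
import torch
import torch.nn as nn


class Backbone(nn.Module):
    def __init__(self, E: torch.Tensor, rank: int = 300):
        super().__init__()
        dim = E.size(1)
        # Non-persistent buffer: excluded from state_dict(), so it must be re-injected
        self.register_buffer("E", E, persistent=False)
        self.A = nn.Linear(dim, rank, bias=False)
        self.B = nn.Linear(rank, dim, bias=False)

    def drifted(self) -> torch.Tensor:
        # E + B(A(E)) per vocab row, before the projection head
        return self.E + self.B(self.A(self.E))


E = torch.randn(10, 300)  # stand-in for the real embedding table
ckpt = {                  # synthetic stand-in for the checkpoint contents
    "A.weight": torch.randn(300, 300) * 0.01,
    "B.weight": torch.zeros(300, 300),
    "proj.l1.weight": torch.zeros(1200, 300),
    "proj.l1.bias": torch.zeros(1200),
    "proj.l2.weight": torch.zeros(300, 1200),
    "proj.l2.bias": torch.zeros(300),
}

model = Backbone(E)
# Keep only the backbone keys; the proj.* keys belong to the projection head
model.load_state_dict({k: v for k, v in ckpt.items() if not k.startswith("proj.")})
with torch.no_grad():
    vectors = model.drifted()
```

In this toy example B is all zeros, so `vectors` equals E; with trained weights the rows would correspond to those stored in vectors_drifted_pre_proj.txt.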


Model tree for jsanzolac/bpe_glove_300_lora_r300_qwen3_proj_hardnegs:

Base model: jsanzolac/drifting-glove-distilled-r300
Adapter: jsanzolac/bpe_glove_300_lora_r300_qwen3
Adapter (6): this model
