bpe_glove_300_lora_r300_qwen3_hardnegs_nce_only

Continuation of jsanzolac/bpe_glove_300_lora_r300_qwen3: the same DriftingGloVeStudent (rank=300) over a frozen 300-d cl100k BPE-GloVe, trained for an additional 150,000 steps with mined hard negatives from jsanzolac/qwen3_emb_512_hard_negatives.

Loss: cross_entropy(v @ [v_T ‖ v_hards]^T / τ), with τ = 0.05 and H = 64 mined hard negatives per anchor. The MSE term is disabled in this variant.
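A minimal NumPy sketch of one reading of this loss, for illustration only. It assumes each anchor is scored against its own teacher target (placed at index 0) and its own H hard negatives, with no in-batch negatives and no normalization inside the loss; the function name `nce_loss` and the array shapes are hypothetical and are not taken from the training code.

```python
import numpy as np

def nce_loss(v, v_t, v_hards, tau=0.05):
    """Cross-entropy over [positive, hard negatives] per anchor.

    v:       (N, D) student embeddings
    v_t:     (N, D) cached teacher targets (the positives)
    v_hards: (N, H, D) mined hard negatives for each anchor
    """
    # Candidate set per anchor: positive first, then the H hard negatives.
    cands = np.concatenate([v_t[:, None, :], v_hards], axis=1)  # (N, 1+H, D)
    # Temperature-scaled dot products between each anchor and its candidates.
    logits = np.einsum("nd,nkd->nk", v, cands) / tau            # (N, 1+H)
    # Cross-entropy with target class 0, using a stable log-sum-exp.
    m = logits.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    return float(np.mean(lse - logits[:, 0]))
```

With τ = 0.05 the logits are scaled by a factor of 20, so small differences in similarity between the positive and a hard negative translate into large differences in the loss.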

Warm-start: A.weight + B.weight from jsanzolac/bpe_glove_300_lora_r300_qwen3/rank_300/checkpoint_final.pt. Optimizer state was not in the source checkpoint, so this run uses a fresh LR schedule (5e-4 → 1e-5 cosine over 150,000 steps).
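The schedule described above can be sketched as a plain cosine decay. This assumes no warmup phase and a single half-cosine from the peak to the floor; the function name `cosine_lr` is hypothetical, and the actual training script may differ in these details.

```python
import math

def cosine_lr(step, total_steps=150_000, lr_max=5e-4, lr_min=1e-5):
    """Learning rate at `step` for a half-cosine decay from lr_max to lr_min."""
    # Clamp so that steps outside [0, total_steps] stay at the endpoints.
    t = min(max(step, 0), total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))
```

Under these assumptions the rate starts at 5e-4, passes through the midpoint of the two endpoints (2.55e-4) at step 75,000, and ends at 1e-5.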

Frozen: E (300-d GloVe from jsanzolac/drifting-glove-distilled-r300), teacher (only used to produce the cached v_T targets in jsanzolac/qwen3_emb_300_packed_cl100k; not loaded here).

Files under rank_300/:

  • checkpoint_final.pt: A.weight + B.weight (E excluded; reinject from jsanzolac/drifting-glove-distilled-r300).
  • config.json
  • vectors_drifted.txt / .parquet: E + B(A(·)) per vocab row.
  • train_log.jsonl
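
As a rough illustration of how the drifted vectors relate to the checkpoint, the sketch below recomputes E + B(A(E)) from the two weight matrices once E has been reinjected. It assumes A and B are bias-free linear maps with no nonlinearity between them, stored in the usual (out_features, in_features) layout; the function name `drift` is hypothetical, and loading the checkpoint and E is left out.

```python
import numpy as np

def drift(E, A_w, B_w):
    """Return E + B(A(E)) for every vocab row.

    E:   (V, 300) frozen GloVe table
    A_w: (r, 300) weight of A, with r = 300 in this model
    B_w: (300, r) weight of B
    """
    # Linear layers apply x @ W.T when W is stored as (out, in).
    return E + (E @ A_w.T) @ B_w.T

# Toy check with small shapes: identity A and B = 0.5 * I scales rows by 1.5.
E = np.arange(6.0).reshape(2, 3)
out = drift(E, np.eye(3), 0.5 * np.eye(3))
```

If B.weight is all zeros the output equals E exactly, so the size of the learned drift can be inspected by comparing `drift(E, A_w, B_w)` against E row by row.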