language:
  - en
license: apache-2.0
tags:
  - glove
  - lora
  - distillation
  - hard-negatives
  - projection-head
base_model: jsanzolac/bpe_glove_300_lora_r300_qwen3
datasets:
  - jsanzolac/qwen3_emb_300_packed_cl100k
  - jsanzolac/qwen3_emb_512_hard_negatives

bpe_glove_300_lora_r300_qwen3_proj_hardnegs

This model combines the r300 backbone (A and B, warm-started from jsanzolac/bpe_glove_300_lora_r300_qwen3) with a sentence-level projection head: Linear(300, 1200) → GELU → Dropout(0.1) → Linear(1200, 300) → L2-norm. Both the backbone (A, B) and the projection head are trainable.
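As a rough illustration, the head described above could be written in PyTorch as follows. This is a minimal sketch, not the training code: the class name `ProjectionHead` and the attribute names `act` and `drop` are hypothetical, while `l1` and `l2` are chosen to line up with the `proj.l1.*` and `proj.l2.*` checkpoint keys. The input is assumed to be an already-pooled 300-dimensional sentence vector; how that pooling is done is not shown here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProjectionHead(nn.Module):
    """Linear(300, 1200) -> GELU -> Dropout(0.1) -> Linear(1200, 300) -> L2-norm."""

    def __init__(self, dim: int = 300, hidden: int = 1200, p_drop: float = 0.1):
        super().__init__()
        # l1 / l2 mirror the checkpoint keys proj.l1.* and proj.l2.* (assumed mapping).
        self.l1 = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.drop = nn.Dropout(p_drop)
        self.l2 = nn.Linear(hidden, dim)

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        h = self.l2(self.drop(self.act(self.l1(v))))
        return F.normalize(h, dim=-1)  # unit L2 norm per row


head = ProjectionHead().eval()  # eval() disables dropout for inference
with torch.no_grad():
    out = head(torch.randn(4, 300))  # 4 placeholder sentence vectors
```

Because the last step is an L2 normalization, dot products between outputs are cosine similarities, which is what a temperature-scaled contrastive loss typically expects.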

Loss: InfoNCE between v_proj and the candidates [v_T ‖ v_hards] at temperature τ = 0.01, with H = 64 mined hard negatives per anchor and a batch size of 256. There is no MSE term.
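One plausible reading of this loss is sketched below in PyTorch. It assumes that v_T holds the teacher target for each anchor, that the other teacher targets in the batch act as in-batch negatives, and that each anchor's H hard negatives are appended as extra columns; the exact layout of the candidate set is not stated in this card. The function name `info_nce_with_hards` and the tensor shapes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def info_nce_with_hards(v_proj, v_T, v_hards, tau=0.01):
    """InfoNCE over [v_T || v_hards].

    v_proj:  (B, D) projected student vectors (anchors)
    v_T:     (B, D) teacher targets; row i is the positive for anchor i
    v_hards: (B, H, D) mined hard negatives for each anchor
    """
    v_proj = F.normalize(v_proj, dim=-1)
    v_T = F.normalize(v_T, dim=-1)
    v_hards = F.normalize(v_hards, dim=-1)
    in_batch = v_proj @ v_T.T                           # (B, B), positives on the diagonal
    hard = torch.einsum("bd,bhd->bh", v_proj, v_hards)  # (B, H), per-anchor hard negatives
    logits = torch.cat([in_batch, hard], dim=1) / tau   # (B, B + H)
    labels = torch.arange(v_proj.size(0), device=v_proj.device)
    return F.cross_entropy(logits, labels)


# Random placeholders at the sizes quoted in this card: batch 256, H = 64, dim 300.
torch.manual_seed(0)
loss = info_nce_with_hards(
    torch.randn(256, 300), torch.randn(256, 300), torch.randn(256, 64, 300)
)
```

With τ = 0.01 the cosine similarities are scaled by 100 before the softmax, so the loss is dominated by the few candidates closest to the anchor, which is where mined hard negatives have the most effect.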

Schedule: the learning rate decays from 5e-4 to 1e-5 on a cosine schedule over 150,000 steps, with 1,000 warmup steps. Optimizer: AdamW, weight decay 0.01.
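The schedule can be sketched as a plain function of the step index. Two details are assumptions, since the card does not state them: the warmup is taken to be linear, and the 150,000 steps are taken to include the warmup. The name `lr_at` is hypothetical.

```python
import math

PEAK_LR, FINAL_LR = 5e-4, 1e-5
TOTAL_STEPS, WARMUP_STEPS = 150_000, 1_000


def lr_at(step: int) -> float:
    """Learning rate at a 0-indexed optimizer step."""
    if step < WARMUP_STEPS:
        # Assumed linear warmup that reaches PEAK_LR on the last warmup step.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Cosine decay from PEAK_LR down to FINAL_LR over the remaining steps.
    progress = min((step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS), 1.0)
    return FINAL_LR + 0.5 * (PEAK_LR - FINAL_LR) * (1.0 + math.cos(math.pi * progress))
```

A function like this can be applied by setting each optimizer parameter group's `lr` to `lr_at(step)` before every update.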

Files under rank_300/:

checkpoint_final.pt — A.weight, B.weight, plus the projection head's proj.l1.{weight,bias} and proj.l2.{weight,bias}. E is not saved because it is a non-persistent buffer; re-inject it from jsanzolac/drifting-glove-distilled-r300 at load time.
config.json
vectors_drifted_pre_proj.txt — E + B(A(·)) per vocab row. These are pre-projection vectors, since the head is contextual (it operates at the sentence level).
train_log.jsonl
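A minimal sketch of how these pieces might fit together at load time is shown below. It makes several assumptions: the checkpoint is treated as a flat dict of tensors keyed exactly as listed above, A and B are bias-free linear maps (only their weights are stored), "·" in E + B(A(·)) is the E row itself, and E has already been obtained as a (vocab, dim) tensor from the other repository, whose file format is not described here. The helper names `split_checkpoint` and `drifted_vectors` are hypothetical, and a small synthetic dict stands in for the real file so the example runs on its own.

```python
import torch


def split_checkpoint(state):
    """Separate backbone weights from projection-head tensors by key."""
    backbone = {k: v for k, v in state.items() if k in ("A.weight", "B.weight")}
    head = {k[len("proj."):]: v for k, v in state.items() if k.startswith("proj.")}
    return backbone, head


def drifted_vectors(E, A_weight, B_weight):
    """Return E + B(A(E)) per vocab row, assuming bias-free linear A and B."""
    # nn.Linear stores weights as (out_features, in_features), hence the transposes.
    return E + (E @ A_weight.T) @ B_weight.T


# Synthetic stand-in for a loaded checkpoint_final.pt (tiny shapes for illustration).
torch.manual_seed(0)
dim, rank, hidden, vocab = 6, 4, 12, 10
state = {
    "A.weight": torch.randn(rank, dim),
    "B.weight": torch.randn(dim, rank),
    "proj.l1.weight": torch.randn(hidden, dim),
    "proj.l1.bias": torch.zeros(hidden),
    "proj.l2.weight": torch.randn(dim, hidden),
    "proj.l2.bias": torch.zeros(dim),
}
E = torch.randn(vocab, dim)  # placeholder for the re-injected embedding table

backbone_sd, head_sd = split_checkpoint(state)
drifted = drifted_vectors(E, backbone_sd["A.weight"], backbone_sd["B.weight"])
```

Under these assumptions, `drifted` corresponds to the rows written to vectors_drifted_pre_proj.txt, and `head_sd` holds the tensors for the projection head's two linear layers.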