---
language:
- en
license: apache-2.0
tags:
- glove
- lora
- distillation
- hard-negatives
- projection-head
base_model: jsanzolac/bpe_glove_300_lora_r300_qwen3
datasets:
- jsanzolac/qwen3_emb_300_packed_cl100k
- jsanzolac/qwen3_emb_512_hard_negatives
---
# bpe_glove_300_lora_r300_qwen3_proj_hardnegs
This model combines the r300 backbone (`A` and `B`, warm-started from `jsanzolac/bpe_glove_300_lora_r300_qwen3`) with a **sentence-level
projection head** (`Linear(300, 1200) → GELU → Dropout(0.1) → Linear(1200, 300) → L2-norm`).
Both the backbone (`A`, `B`) and the projection head are trainable.
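
The head described above can be written down in a few lines of PyTorch. This is a minimal sketch, not the training code: the class name `ProjectionHead` is hypothetical, the attribute names `l1` and `l2` are chosen only to line up with the `proj.l1.*` and `proj.l2.*` checkpoint keys, and the input is assumed to be an already pooled 300-dimensional sentence vector (the card does not say how pooling is done).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProjectionHead(nn.Module):
    """Linear(300, 1200) -> GELU -> Dropout(0.1) -> Linear(1200, 300) -> L2-norm."""

    def __init__(self, dim: int = 300, hidden: int = 1200, p_drop: float = 0.1):
        super().__init__()
        # Attribute names l1 / l2 are an assumption made to match proj.l1.* / proj.l2.*
        self.l1 = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.drop = nn.Dropout(p_drop)
        self.l2 = nn.Linear(hidden, dim)

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        h = self.drop(self.act(self.l1(v)))
        # Unit-normalise the output so dot products are cosine similarities.
        return F.normalize(self.l2(h), p=2, dim=-1)


head = ProjectionHead().eval()      # eval() turns dropout off for inference
with torch.no_grad():
    out = head(torch.randn(4, 300))  # 4 placeholder sentence vectors
```

Because the last step is an L2 normalisation, every output row has unit norm, which is what lets the loss treat dot products as cosine similarities.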
**Loss:** `InfoNCE(v_proj vs [v_T ‖ v_hards], τ=0.01)` with `H = 64` mined hard negatives
per anchor at batch size 256. **No MSE term.**
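
To make the loss concrete, here is a dependency-free sketch for a single anchor. It assumes that `v_T` is the positive target for the anchor, that all vectors are L2-normalised, and that the candidate set is exactly `[v_T ‖ v_hards]`; whether in-batch negatives are also included is not stated in this card. The function name `info_nce` is hypothetical.

```python
import math


def info_nce(v_proj, v_t, v_hards, tau=0.01):
    """InfoNCE for one anchor: v_t is candidate 0, followed by the hard negatives."""
    candidates = [v_t] + list(v_hards)
    # Similarity of the anchor to every candidate, scaled by the temperature.
    logits = [sum(a * b for a, b in zip(v_proj, c)) / tau for c in candidates]
    # Stable log-sum-exp: at tau = 0.01 the logits of unit vectors span [-100, 100].
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    # Cross-entropy with the positive at index 0.
    return log_z - logits[0]


# Toy 2-D unit vectors; the real model uses 300-D vectors and H = 64 hard negatives.
anchor = [1.0, 0.0]
loss_easy = info_nce(anchor, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
loss_hard = info_nce(anchor, [0.0, 1.0], [[1.0, 0.0]])
```

With `τ = 0.01` the loss is very sharp: it is close to zero when the anchor matches its positive (`loss_easy`) and close to 100 when the anchor matches a hard negative instead (`loss_hard`).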
**Schedule:** cosine decay of the learning rate from 0.0005 to 1e-05 over 150,000 steps, with 1,000 warmup steps.

**Optimizer:** AdamW, weight decay 0.01.
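
The schedule can be reproduced approximately with the sketch below. Two details are assumptions, since the card does not specify them: warmup is taken to be linear from near zero up to the peak, and the cosine decay is taken to run over the steps that remain after warmup. The function name `lr_at` is hypothetical.

```python
import math

PEAK_LR, MIN_LR = 5e-4, 1e-5
TOTAL_STEPS, WARMUP_STEPS = 150_000, 1_000


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < WARMUP_STEPS:
        # Assumed linear warmup, reaching PEAK_LR on the last warmup step.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Assumed cosine decay over the post-warmup steps, clamped at MIN_LR.
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


for step in (0, 999, 1_000, 75_500, 150_000):
    print(step, lr_at(step))
```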
Files under `rank_300/`:
- `checkpoint_final.pt` — `A.weight`, `B.weight`, plus the projection head's `proj.l1.{weight,bias}` and `proj.l2.{weight,bias}`.
`E` is excluded (non-persistent buffer); re-inject from `jsanzolac/drifting-glove-distilled-r300` at load time.
- `config.json`
- `vectors_drifted_pre_proj.txt` — `E + B(A(·))` per vocab row (PRE-projection; the head is contextual).
- `train_log.jsonl`
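
The note about `E` being a non-persistent buffer can be illustrated with a toy module. This is only a sketch under stated assumptions: the class `Backbone` is a hypothetical stand-in, `A` is assumed to be a per-token lookup (`nn.Embedding`) and `B` a bias-free linear map (the card only gives the key names `A.weight` and `B.weight`, not their shapes), and a random tensor stands in for the base vectors that would really come from `jsanzolac/drifting-glove-distilled-r300`. The projection head is left out for brevity.

```python
import torch
import torch.nn as nn


class Backbone(nn.Module):
    """Hypothetical stand-in: fixed table E plus a trainable low-rank drift B(A(.))."""

    def __init__(self, vocab: int, dim: int = 300, rank: int = 300):
        super().__init__()
        self.A = nn.Embedding(vocab, rank)          # assumption: per-token lookup
        self.B = nn.Linear(rank, dim, bias=False)   # assumption: no bias
        # persistent=False keeps E out of state_dict(), and so out of a checkpoint.
        self.register_buffer("E", torch.zeros(vocab, dim), persistent=False)

    def drifted(self) -> torch.Tensor:
        # E + B(A(.)) for every vocab row, i.e. the pre-projection vectors.
        ids = torch.arange(self.E.shape[0], device=self.E.device)
        return self.E + self.B(self.A(ids))


trained = Backbone(vocab=10)
state = trained.state_dict()            # what a saved checkpoint would contain

fresh = Backbone(vocab=10)
fresh.load_state_dict(state)            # strict load succeeds even though E is absent
base_vectors = torch.randn(10, 300)     # placeholder for the real base embeddings
fresh.E.copy_(base_vectors)             # re-inject E after loading
with torch.no_grad():
    vectors = fresh.drifted()
```

Forgetting the re-injection step would silently leave `E` at its initial value, so only the drift term would remain in the output; checking a few rows against `vectors_drifted_pre_proj.txt` is a cheap sanity check.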