| --- |
| language: |
| - en |
| license: apache-2.0 |
| tags: |
| - glove |
| - lora |
| - distillation |
| - hard-negatives |
| - qwen3-embedding |
| base_model: jsanzolac/bpe_glove_300_lora_r300_qwen3 |
| datasets: |
| - jsanzolac/qwen3_emb_300_packed_cl100k |
| - jsanzolac/qwen3_emb_512_hard_negatives |
| --- |
| |
| # bpe_glove_300_lora_r300_qwen3_hardnegs |
|
|
| Continuation of `jsanzolac/bpe_glove_300_lora_r300_qwen3`: the same `DriftingGloVeStudent` (rank 300) over a frozen |
| 300-d cl100k BPE-GloVe embedding, trained for an additional 150,000 steps with **mined |
| hard negatives** from `jsanzolac/qwen3_emb_512_hard_negatives`. |
|
|
| **Loss:** `cross_entropy(v @ [v_T ‖ v_hards]^T / τ) + λ_MSE · MSE(v, v_T)` with |
| `τ = 0.05`, `λ_MSE = 1.0`, `H = 64` mined hard negatives per anchor. |
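
The loss above can be sketched in NumPy as follows. This is a minimal illustration, not the training code: it assumes that each anchor is scored only against its own teacher target and its own `H` mined negatives (no in-batch negatives), that the teacher target sits at index 0 of the candidate list, and that the MSE is averaged over all elements. The function name `hard_negative_loss` and the tensor shapes are hypothetical.

```python
import numpy as np

def hard_negative_loss(v, v_t, v_hards, tau=0.05, lambda_mse=1.0):
    """Sketch of CE over [v_T || v_hards] plus an MSE term.

    v:       (B, D) student vectors (anchors)
    v_t:     (B, D) cached teacher targets
    v_hards: (B, H, D) mined hard negatives per anchor
    """
    # Candidate set per anchor: teacher target first, then the H hard negatives.
    cands = np.concatenate([v_t[:, None, :], v_hards], axis=1)   # (B, 1+H, D)
    logits = np.einsum("bd,bkd->bk", v, cands) / tau             # (B, 1+H)

    # Numerically stable log-softmax; the correct class is index 0 (assumption).
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ce = -log_probs[:, 0].mean()

    # Regression term pulling the student toward the teacher target.
    mse = np.mean((v - v_t) ** 2)
    return ce + lambda_mse * mse
```

With `τ = 0.05`, a cosine gap of 1.0 between the target and a negative becomes a logit gap of 20, so the cross-entropy term is sharply peaked and the MSE term mainly keeps the student close to the teacher in absolute terms.
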
|
|
| **Warm-start:** `A.weight` + `B.weight` from `jsanzolac/bpe_glove_300_lora_r300_qwen3/rank_300/checkpoint_final.pt`. Optimizer |
| state was not in the source checkpoint, so this run uses a fresh LR schedule |
| (5e-4 → 1e-5 cosine over 150,000 steps). |
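
For reference, the schedule endpoints can be reproduced with the standard cosine-annealing closed form. This sketch assumes no warmup phase and a per-step update, neither of which is stated above; `cosine_lr` is a hypothetical helper name.

```python
import math

def cosine_lr(step, total_steps=150_000, lr_max=5e-4, lr_min=1e-5):
    """Cosine decay from lr_max at step 0 to lr_min at total_steps (no warmup assumed)."""
    # Clamp so that steps past the end of the schedule stay at lr_min.
    progress = min(max(step, 0), total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the learning rate starts at 5e-4, passes through the midpoint of the two endpoints halfway through the run, and ends at 1e-5.
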
|
|
| **Frozen:** `E` (300-d GloVe from `jsanzolac/drifting-glove-distilled-r300`) and the teacher. The teacher was only used to produce the |
| cached `v_T` targets in `jsanzolac/qwen3_emb_300_packed_cl100k` and is not loaded here. |
|
|
| Files under `rank_300/`: |
| - `checkpoint_final.pt` — `A.weight` + `B.weight` (E excluded; reinject from `jsanzolac/drifting-glove-distilled-r300`). |
| - `config.json` |
| - `vectors_drifted.txt` / `.parquet` — `E + B(A(·))` per vocab row. |
| - `train_log.jsonl` |
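
Because the checkpoint excludes `E`, the drifted vectors have to be recomputed from `E`, `A.weight` and `B.weight`. The sketch below shows the arithmetic behind `E + B(A(·))`, assuming `A` and `B` are bias-free linear maps stored in the PyTorch `(out_features, in_features)` layout and that the arrays have already been loaded into NumPy. The function name `drift_vectors` is hypothetical, and the exact key names inside the checkpoint should be checked against the file itself.

```python
import numpy as np

def drift_vectors(E, A_weight, B_weight):
    """Return E + B(A(E)) for every vocab row.

    E:        (V, d) frozen GloVe matrix
    A_weight: (r, d) down-projection weight (assumed bias-free)
    B_weight: (d, r) up-projection weight (assumed bias-free)
    """
    low_rank = E @ A_weight.T       # (V, r)
    delta = low_rank @ B_weight.T   # (V, d)
    return E + delta
```

If the assumptions hold, the result should match the rows of `vectors_drifted.txt` up to floating-point precision, which makes for a quick sanity check after reinjecting `E`.
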
|
|