---
language:
- en
license: apache-2.0
tags:
- glove
- lora
- distillation
- hard-negatives
- qkv-split
base_model: jsanzolac/bpe_glove_300_lora_r300_qwen3
datasets:
- jsanzolac/qwen3_emb_300_packed_cl100k
- jsanzolac/qwen3_emb_512_hard_negatives
---

# bpe_glove_300_qkv_v_only_hardnegs

A QKV-split LoRA student built on top of the 300-d cl100k BPE-GloVe embeddings (`jsanzolac/drifting-glove-distilled-r300`). Only the V branch is trained:

- **Q** = E (frozen).
- **K** = E + A_K·B_K, with A_K and B_K frozen and loaded from `jsanzolac/bpe_glove_300_lora_r300_qwen3/rank_300/checkpoint_final.pt`.
- **V** = E + A_V·B_V, with A_V and B_V trainable (rank 300, i.e. full rank for the 300-d embeddings).
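
The three branches above can be sketched in NumPy as follows. This is a minimal illustration, not the training code: it assumes the factors follow the `torch.nn.Linear` weight convention (suggested by the `A_V.weight` / `B_V.weight` names), applies the drift as `E + B(A(E))` per row as described for `vectors_drifted_V.txt`, and uses random stand-in arrays instead of the real checkpoints. The helper name `lora_drift` and the zero initialisation of `B_V` are hypothetical choices made for the example.

```python
import numpy as np

def lora_drift(E, A_weight, B_weight):
    """Return E + B(A(E)) for every row of E.

    A_weight has shape (rank, dim) and B_weight has shape (dim, rank),
    i.e. the torch.nn.Linear convention y = x @ W.T (assumed here).
    """
    return E + (E @ A_weight.T) @ B_weight.T

rng = np.random.default_rng(0)
vocab, dim, rank = 8, 300, 300  # toy vocab size; dim and rank as in this card

# Random stand-ins for the real weights (the actual values live in the checkpoints).
E = rng.normal(size=(vocab, dim))               # frozen base embeddings
A_K = rng.normal(scale=0.01, size=(rank, dim))  # frozen K-side factors
B_K = rng.normal(scale=0.01, size=(dim, rank))
A_V = rng.normal(scale=0.01, size=(rank, dim))  # trainable V-side factors
B_V = np.zeros((dim, rank))                     # assumed zero init: V starts equal to E

Q = E                          # Q is the frozen embedding itself
K = lora_drift(E, A_K, B_K)    # frozen drift
V = lora_drift(E, A_V, B_V)    # trainable drift
```

With a zero `B_V`, `V` coincides with `E` before training, while `K` already carries the frozen drift.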

**Loss:** `InfoNCE(v_S vs [v_T ‖ v_hards], τ=0.05) + 0.1·MSE(v_S, v_T)`,
with `H = 64` mined hard negatives per anchor and a batch size of 256.
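
One way to read this loss is sketched below in NumPy. Several details are assumptions rather than facts from this card: `v_S` and `v_T` are taken to be student and teacher vectors, similarities are cosine similarities, the other teacher vectors in the batch act as additional negatives, and the MSE term is computed on the unnormalised vectors. The function name `distill_loss` is hypothetical, and the toy shapes are far smaller than the batch of 256 with `H = 64`.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)

def distill_loss(v_s, v_t, v_hards, tau=0.05, mse_weight=0.1):
    """InfoNCE over [teacher batch ‖ hard negatives] plus a weighted MSE term.

    v_s, v_t: (B, d) student and teacher vectors; v_hards: (B, H, d).
    """
    B = v_s.shape[0]
    s, t, h = l2_normalize(v_s), l2_normalize(v_t), l2_normalize(v_hards)
    in_batch = s @ t.T                      # (B, B); the diagonal holds the positives
    hard = np.einsum("bd,bhd->bh", s, h)    # (B, H); each anchor vs its own hard negatives
    logits = np.concatenate([in_batch, hard], axis=1) / tau  # (B, B + H)
    # Cross-entropy with target column i for row i, using a stable log-sum-exp.
    row_max = logits.max(axis=1, keepdims=True)
    log_z = row_max[:, 0] + np.log(np.exp(logits - row_max).sum(axis=1))
    positives = logits[np.arange(B), np.arange(B)]
    info_nce = np.mean(log_z - positives)
    mse = np.mean((v_s - v_t) ** 2)
    return float(info_nce + mse_weight * mse)

rng = np.random.default_rng(0)
B, H, d = 4, 3, 8                            # toy sizes for illustration only
v_t = rng.normal(size=(B, d))
v_hards = rng.normal(size=(B, H, d))
loss_matched = distill_loss(v_t.copy(), v_t, v_hards)  # student equals teacher
loss_flipped = distill_loss(-v_t, v_t, v_hards)        # student opposes teacher
```

A student that reproduces the teacher exactly gets a lower loss than one pointing in the opposite direction, which is the behaviour both terms are meant to encourage.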

**Schedule:** cosine learning-rate decay from 5e-4 to 1e-5 over 150,000 steps, with 1,000 warmup steps.
Optimizer: AdamW, weight decay 0.01.
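
The schedule can be written as a small function of the step index. This sketch assumes a linear warmup and counts the warmup steps as part of the 150,000 total; neither detail is stated in this card, and the name `lr_at` is hypothetical.

```python
import math

def lr_at(step, peak=5e-4, floor=1e-5, total=150_000, warmup=1_000):
    """Learning rate at a given step: linear warmup, then cosine decay to `floor`."""
    if step < warmup:
        # Assumed linear ramp that reaches `peak` on the last warmup step.
        return peak * (step + 1) / warmup
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))

# Sample the schedule at a few points along training.
samples = {step: lr_at(step) for step in (0, 1_000, 75_500, 150_000)}
```

The returned value would be set as the AdamW learning rate at each step, for example through a per-step scheduler callback.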

Files under `rank_300/`:
- `checkpoint_final.pt`: `A_V.weight` and `B_V.weight` only (the frozen E, A_K, and B_K are NOT included).
- `config.json`
- `vectors_drifted_V.txt`: `E + B_V(A_V(·))` for each vocab row (V-side static drift only).
- `train_log.jsonl`

**To reconstruct the full model at inference:** load E and (A_K, B_K) from `jsanzolac/bpe_glove_300_lora_r300_qwen3`, load (A_V, B_V) from this repo, then run the QKV forward pass.