---
language:
- en
license: apache-2.0
tags:
- glove
- lora
- distillation
- hard-negatives
- qkv-split
base_model: jsanzolac/bpe_glove_300_lora_r300_qwen3
datasets:
- jsanzolac/qwen3_emb_300_packed_cl100k
- jsanzolac/qwen3_emb_512_hard_negatives
---

# bpe_glove_300_qkv_v_only_hardnegs

QKV-split LoRA student on top of the 300-d cl100k BPE-GloVe embeddings (`jsanzolac/drifting-glove-distilled-r300`).

- **Q** = frozen E, the base 300-d BPE-GloVe embedding table.
- **K** = E + frozen A_K·B_K, loaded from `jsanzolac/bpe_glove_300_lora_r300_qwen3/rank_300/checkpoint_final.pt`.
- **V** = E + trainable A_V·B_V (rank 300, i.e. full rank for 300-d embeddings).
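The sketch below shows how the three tables relate. It is plain Python with toy sizes. It assumes A and B act as bias-free, `nn.Linear`-style maps d → r → d applied to each row e of E, so each row becomes e + B(A(e)). This matches the `E + B_V(A_V(·))` form used for the drifted V file. The helper names are hypothetical, and the real model uses d = r = 300.

```python
def linear(W, x):
    """Bias-free linear map; W is stored as rows, shape (out_dim, in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def lora_row(e, A, B):
    """Return e + B(A(e)): the base row plus its low-rank drift."""
    delta = linear(B, linear(A, e))
    return [ei + di for ei, di in zip(e, delta)]

def qkv_tables(E, A_K, B_K, A_V, B_V):
    """Build Q, K and V tables from the frozen base table E (a list of rows)."""
    Q = [list(e) for e in E]                # Q: frozen E, unchanged
    K = [lora_row(e, A_K, B_K) for e in E]  # K: E + frozen K adapter
    V = [lora_row(e, A_V, B_V) for e in E]  # V: E + trainable V adapter
    return Q, K, V

# Toy example: d = 2, r = 1 (the model itself uses d = r = 300).
E = [[1.0, 0.0], [0.0, 1.0]]
A_K, B_K = [[1.0, 1.0]], [[0.5], [0.0]]  # shapes (r, d) and (d, r)
A_V, B_V = [[0.0, 0.0]], [[0.0], [0.0]]  # zero V adapter, so V equals E
Q, K, V = qkv_tables(E, A_K, B_K, A_V, B_V)
```

Only A_V and B_V receive gradients during training. Q and K stay fixed, which is what the "v_only" in the model name refers to.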

**Loss:** `InfoNCE(v_S vs [v_T ‖ v_hards], τ=0.05) + 0.1·MSE(v_S, v_T)`, where v_S is the student vector, v_T the teacher target, and v_hards the `H = 64` mined hard negatives per anchor (batch size 256).
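The objective can be sketched for a single anchor as follows. The sketch makes three assumptions:

- The InfoNCE logits use cosine similarity.
- The teacher vector is the only positive.
- The anchor's own hard negatives are the only other candidates. Any in-batch negatives the actual training may use are not modeled.

`distill_loss` is a hypothetical name.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def distill_loss(v_s, v_t, v_hards, tau=0.05, mse_weight=0.1):
    """InfoNCE over [v_T, hard negatives] plus a weighted MSE to v_T.

    Index 0 of the candidate list is the positive (the teacher vector).
    """
    candidates = [v_t] + list(v_hards)
    logits = [cosine(v_s, c) / tau for c in candidates]
    # Numerically stable log-sum-exp over the candidate logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    info_nce = log_z - logits[0]  # -log softmax probability of the positive
    mse = sum((s - t) ** 2 for s, t in zip(v_s, v_t)) / len(v_s)
    return info_nce + mse_weight * mse
```

With τ = 0.05 the logits are scaled by 20. A hard negative only slightly less similar than the teacher target therefore still contributes a noticeable share of the softmax mass. The MSE term keeps v_S close to v_T in absolute terms, not just in ranking.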

**Schedule:** cosine learning-rate decay from 0.0005 to 1e-05 over 150,000 steps, with 1,000 warmup steps.
Optimizer: AdamW, weight decay 0.01.
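The sketch below reproduces the learning-rate schedule under two assumptions:

- Warmup is linear, rising from near zero to the peak over the first 1,000 steps.
- Cosine decay then runs to the floor over the remaining steps, up to step 150,000.

Whether the original counts the warmup inside the 150,000 steps is not stated. `lr_at` is a hypothetical helper.

```python
import math

def lr_at(step, peak=5e-4, floor=1e-5, total_steps=150_000, warmup_steps=1_000):
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < warmup_steps:
        # Linear warmup: reaches the peak at the last warmup step.
        return peak * (step + 1) / warmup_steps
    # Cosine decay from peak to floor over the remaining steps, then hold at floor.
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))
```

In PyTorch, the same curve could be attached with `torch.optim.lr_scheduler.LambdaLR`, passing `lambda step: lr_at(step) / 5e-4`. Pair it with `torch.optim.AdamW(params, lr=5e-4, weight_decay=0.01)`.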

Files under `rank_300/`:
- `checkpoint_final.pt`: A_V.weight and B_V.weight only (the frozen E, A_K and B_K are NOT included).
- `config.json`
- `vectors_drifted_V.txt`: `E + B_V(A_V(·))` for each vocab row (static V-side drift only).
- `train_log.jsonl`
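This sketch assumes `vectors_drifted_V.txt` follows the usual GloVe text layout: one token per line, followed by its 300 space-separated values. If tokens are stored raw, many cl100k BPE pieces begin with a space. For that reason the parser splits from the right and treats only the last `dim` fields as numbers. Both the file format and `parse_vectors` are assumptions, not a documented loader.

```python
def parse_vectors(lines, dim):
    """Parse GloVe-style text lines into {token: [floats]}.

    rsplit keeps tokens that contain spaces intact, because only the last
    `dim` fields of each line are parsed as numbers.
    """
    vocab = {}
    for line in lines:
        line = line.rstrip("\r\n ")  # values never end in these characters
        if not line:
            continue
        parts = line.rsplit(" ", dim)
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values in line: {line[:40]!r}")
        vocab[parts[0]] = [float(v) for v in parts[1:]]
    return vocab

# Usage with an open file (path assumed): parse_vectors(open(path, encoding="utf-8"), 300)
```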

**To reconstruct the full model at inference:** load E and (A_K, B_K) from `jsanzolac/bpe_glove_300_lora_r300_qwen3`, load (A_V, B_V) from this repo, then run the QKV forward pass.
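One cheap sanity check after loading the pieces, offered as a suggestion rather than part of the original recipe, is to rebuild the V rows from E and (A_V, B_V). You can then compare them with `vectors_drifted_V.txt`.

The sketch treats both weights as bias-free, `nn.Linear`-style matrices of shape (out, in). This is consistent with the checkpoint storing only `A_V.weight` and `B_V.weight`. The function names are hypothetical, and loading the `.pt` files themselves is left out.

```python
def linear(W, x):
    """Bias-free linear map; W is stored as rows, shape (out_dim, in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rebuild_v(E, A_V, B_V):
    """V row per token: e + B_V(A_V(e)), with E given as {token: vector}."""
    out = {}
    for tok, e in E.items():
        delta = linear(B_V, linear(A_V, e))
        out[tok] = [a + b for a, b in zip(e, delta)]
    return out

def max_abs_diff(rebuilt, reference):
    """Largest elementwise |a - b| over tokens present in both tables."""
    shared = rebuilt.keys() & reference.keys()
    if not shared:
        raise ValueError("no tokens in common")
    return max(abs(a - b) for tok in shared for a, b in zip(rebuilt[tok], reference[tok]))
```

A maximum deviation near float precision suggests that E and the V adapter were paired correctly. A large value points to a mismatched E or transposed weights.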