jsanzolac committed · Commit 62d9999 · verified · 1 Parent(s): dbefd45

Upload README.md with huggingface_hub

  1. README.md +37 -0
README.md ADDED
@@ -0,0 +1,37 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - glove
+ - lora
+ - distillation
+ - hard-negatives
+ - qkv-split
+ base_model: jsanzolac/bpe_glove_300_lora_r300_qwen3
+ datasets:
+ - jsanzolac/qwen3_emb_300_packed_cl100k
+ - jsanzolac/qwen3_emb_512_hard_negatives
+ ---
+
+ # bpe_glove_300_qkv_v_only_hardnegs
+
+ QKV-split LoRA student built on top of the 300-d cl100k BPE-GloVe embeddings, denoted E below (`jsanzolac/drifting-glove-distilled-r300`).
+
+ - **Q** = frozen E.
+ - **K** = E + frozen A_K·B_K, loaded from `jsanzolac/bpe_glove_300_lora_r300_qwen3/rank_300/checkpoint_final.pt`.
+ - **V** = E + trainable A_V·B_V (rank = 300, i.e. full rank for the 300-d embeddings).
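The three tables above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's code: it assumes the LoRA factors are bias-free linear maps stored `nn.Linear`-style (so `A(x) = x @ A.T`) and follows the `E + B(A(·))` composition order used in the file list of this README. The function name `build_qkv_tables`, the toy sizes, and the zero initialisation of `B_V` are all illustrative assumptions.

```python
import numpy as np

def build_qkv_tables(E, A_K, B_K, A_V, B_V):
    """Return static Q, K, V tables with one row per vocabulary entry.

    Assumption: A_* has shape (rank, dim) and B_* has shape (dim, rank),
    i.e. the weight layout of bias-free torch.nn.Linear modules.
    """
    Q = E                            # Q is the frozen base embedding table
    K = E + (E @ A_K.T) @ B_K.T      # K adds the frozen K-side drift
    V = E + (E @ A_V.T) @ B_V.T      # V adds the trainable V-side drift
    return Q, K, V

rng = np.random.default_rng(0)
vocab, dim, rank = 8, 4, 4           # toy sizes; the model card uses dim = rank = 300
E = rng.normal(size=(vocab, dim))
A_K = rng.normal(size=(rank, dim))
B_K = rng.normal(size=(dim, rank))
A_V = rng.normal(size=(rank, dim))
B_V = np.zeros((dim, rank))          # assumed LoRA-style init: V starts equal to E

Q, K, V = build_qkv_tables(E, A_K, B_K, A_V, B_V)
```

With `B_V` at zero the V table coincides with E, so only the K side differs from the base embeddings until training moves `A_V` and `B_V`.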
+
+ **Loss:** `InfoNCE(v_S vs [v_T ‖ v_hards], τ=0.05) + 0.1·MSE(v_S, v_T)` (`v_S`: student, `v_T`: teacher)
+ with `H = 64` mined hard negatives per anchor at batch size 256.
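One way to read the loss line above is sketched below in NumPy. Several details are assumptions rather than facts from this README: similarities are taken as cosine similarities, the candidate set for each anchor is the in-batch teacher vectors concatenated with that anchor's own hard negatives, and the MSE term is computed on the unnormalised vectors. The function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def distill_loss(v_s, v_t, v_hard, tau=0.05, mse_weight=0.1):
    """InfoNCE over [in-batch teachers ‖ hard negatives] plus weighted MSE.

    v_s, v_t: (B, d) student and teacher vectors.
    v_hard:   (B, H, d) hard negatives, H per anchor.
    """
    s, t, h = l2_normalize(v_s), l2_normalize(v_t), l2_normalize(v_hard)
    in_batch = s @ t.T                                # (B, B), diagonal holds positives
    hard = np.einsum("bd,bhd->bh", s, h)              # (B, H), per-anchor negatives
    logits = np.concatenate([in_batch, hard], axis=1) / tau
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(s))
    info_nce = -log_prob[idx, idx].mean()             # positive for row i is column i
    mse = np.mean((v_s - v_t) ** 2)
    return info_nce + mse_weight * mse

rng = np.random.default_rng(0)
B, H, d = 4, 3, 5                                     # toy sizes; the card uses B=256, H=64
loss = distill_loss(rng.normal(size=(B, d)),
                    rng.normal(size=(B, d)),
                    rng.normal(size=(B, H, d)))
```

At the stated settings each anchor is scored against 256 in-batch teacher vectors plus 64 hard negatives under this reading, so each softmax runs over 320 candidates.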
+
+ **Schedule:** learning rate 5e-4 → 1e-5 with cosine decay over 150,000 steps, 1,000 warmup steps.
+ Optimizer: AdamW, weight decay 0.01.
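The schedule can be written as a small function of the step index. The README does not say how the warmup is shaped or whether the 150,000 steps include it, so this sketch assumes a linear warmup and a cosine decay that spans the remaining steps; `lr_at` is a hypothetical helper name.

```python
import math

def lr_at(step, peak=5e-4, floor=1e-5, total=150_000, warmup=1_000):
    """Learning rate at a 0-indexed training step.

    Assumptions: linear warmup to `peak` over the first `warmup` steps,
    then cosine decay from `peak` to `floor` until step `total`.
    """
    if step < warmup:
        return peak * (step + 1) / warmup
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))

# Sample the curve at a few points of interest.
samples = {s: lr_at(s) for s in (0, 999, 1_000, 75_500, 150_000)}
```

Under these assumptions the rate peaks at the end of warmup, passes the midpoint between peak and floor halfway through the decay, and ends at the floor.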
+
+ Files under `rank_300/`:
+ - `checkpoint_final.pt`: `A_V.weight` and `B_V.weight` only (the frozen E, A_K, and B_K are not included).
+ - `config.json`
+ - `vectors_drifted_V.txt`: `E + B_V(A_V(·))` per vocab row (V-side static drift only).
+ - `train_log.jsonl`
+
+ **To reconstruct the full model at inference:** load E + (A_K, B_K) from `jsanzolac/bpe_glove_300_lora_r300_qwen3`, load (A_V, B_V) from this repo, then run the QKV forward pass.
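This README does not define the QKV forward pass itself. One plausible reading, offered only as an assumption, is single-head scaled dot-product self-attention over the token rows of the three tables, followed by mean pooling into one vector. The sketch below implements exactly that reading; `qkv_forward`, the scaling by the square root of the dimension, and the pooling step are illustrative choices, not confirmed details of the model.

```python
import numpy as np

def qkv_forward(token_ids, Q, K, V):
    """Pool a token sequence into one vector via single-head self-attention.

    Assumed reading of "QKV forward pass": softmax(q k^T / sqrt(d)) v,
    then a mean over token positions. Q, K, V are (vocab, d) tables.
    """
    ids = np.asarray(token_ids)
    q, k, v = Q[ids], K[ids], V[ids]                  # each (T, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])           # (T, T)
    scores = scores - scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)    # each row sums to 1
    return (attn @ v).mean(axis=0)                    # (d,)

rng = np.random.default_rng(0)
vocab, dim = 10, 6                                    # toy sizes, not the real 300-d tables
Q_tab = rng.normal(size=(vocab, dim))
K_tab = rng.normal(size=(vocab, dim))
V_tab = rng.normal(size=(vocab, dim))
embedding = qkv_forward([1, 4, 7], Q_tab, K_tab, V_tab)
```

In practice the tables would be built from the loaded E, (A_K, B_K), and (A_V, B_V) rather than sampled at random, and the token ids would come from the cl100k tokenizer.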