---
language:
- en
license: apache-2.0
tags:
- glove
- lora
- distillation
- hard-negatives
- qkv-split
base_model: jsanzolac/bpe_glove_300_lora_r300_qwen3
datasets:
- jsanzolac/qwen3_emb_300_packed_cl100k
- jsanzolac/qwen3_emb_512_hard_negatives
---

# bpe_glove_300_qkv_v_only_hardnegs

QKV-split LoRA student on top of the 300-d cl100k BPE-GloVe (`jsanzolac/drifting-glove-distilled-r300`).

- **Q** = frozen E.
- **K** = E + frozen A_K·B_K, loaded from `jsanzolac/bpe_glove_300_lora_r300_qwen3/rank_300/checkpoint_final.pt`.
- **V** = E + trainable A_V·B_V (rank 300, i.e. full rank for the 300-d embedding).
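
The three projections above can be sketched in NumPy as follows. This is a minimal illustration, not the training code: it assumes the adapters are bias-free linear maps stored as `(out, in)` weight matrices and composed as `B(A(·))`, following the `E + B_V(A_V(·))` description of `vectors_drifted_V.txt`. The helper names `lora_drift` and `qkv_tables`, the toy vocabulary size, and the random weights are all hypothetical.

```python
import numpy as np

def lora_drift(E, A, B):
    """Return E + B(A(E)) for bias-free linear maps with (out, in) weights."""
    return E + (E @ A.T) @ B.T

def qkv_tables(E, A_K, B_K, A_V, B_V):
    """Build per-token Q, K, V embedding tables from the base table E."""
    Q = E                        # Q: the frozen base embeddings, unchanged
    K = lora_drift(E, A_K, B_K)  # K: frozen adapter from the base LoRA repo
    V = lora_drift(E, A_V, B_V)  # V: trainable adapter stored in this repo
    return Q, K, V

rng = np.random.default_rng(0)
vocab, dim, rank = 10, 300, 300  # toy vocab size; dim and rank follow the card
E = rng.normal(size=(vocab, dim))
A_K = rng.normal(size=(rank, dim)) * 0.01
B_K = rng.normal(size=(dim, rank)) * 0.01
A_V = rng.normal(size=(rank, dim)) * 0.01
B_V = np.zeros((dim, rank))      # a zero B leaves V equal to E

Q, K, V = qkv_tables(E, A_K, B_K, A_V, B_V)
print(Q.shape, K.shape, V.shape)
```

With `B_V` set to zero the V table coincides with E, so any difference between V and E in a trained checkpoint comes entirely from the learned adapter.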

**Loss:** `InfoNCE(v_S vs [v_T ‖ v_hards], τ=0.05) + 0.1·MSE(v_S, v_T)`
with `H = 64` mined hard negatives per anchor and a batch size of 256.
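
One plausible reading of this loss, written out in NumPy for illustration. The card does not spell out the details, so the following are assumptions: vectors are L2-normalized before the similarity, the other teacher vectors in the batch act as additional negatives, each anchor has its own `H` hard negatives, and the MSE term uses the unnormalized vectors with mean reduction. The function name `distill_loss` is hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def distill_loss(v_s, v_t, v_hards, tau=0.05, mse_weight=0.1):
    """v_s, v_t: (B, D) student/teacher vectors; v_hards: (B, H, D)."""
    B = v_s.shape[0]
    s, t, h = l2_normalize(v_s), l2_normalize(v_t), l2_normalize(v_hards)
    in_batch = s @ t.T                           # (B, B), positives on the diagonal
    hard = np.einsum("bd,bhd->bh", s, h)         # (B, H), per-anchor hard negatives
    logits = np.concatenate([in_batch, hard], axis=1) / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nce = -np.mean(np.diag(log_prob[:, :B]))     # cross-entropy on the positives
    mse = np.mean((v_s - v_t) ** 2)
    return nce + mse_weight * mse

rng = np.random.default_rng(0)
v_t = rng.normal(size=(4, 8))                    # toy sizes, not B=256, H=64
v_hards = rng.normal(size=(4, 3, 8))
loss_aligned = distill_loss(v_t, v_t, v_hards)         # student matches teacher
loss_shuffled = distill_loss(v_t[::-1], v_t, v_hards)  # student rows mismatched
print(loss_aligned, loss_shuffled)
```

A student that reproduces the teacher exactly gets a much lower loss than one whose rows are paired with the wrong teacher vectors, which is the behavior the contrastive term is meant to enforce.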

**Schedule:** learning rate 5e-4 → 1e-5 with cosine decay over 150,000 steps and 1,000 warmup steps.
Optimizer: AdamW, weight decay 0.01.
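
For reference, a schedule of this shape can be sketched as a plain function of the step index. Two details are assumptions rather than facts from this card: the warmup is linear from near zero, and the 1,000 warmup steps count toward the 150,000 total. The function name `lr_at` is hypothetical.

```python
import math

def lr_at(step, peak=5e-4, floor=1e-5, total=150_000, warmup=1_000):
    """Linear warmup to `peak`, then cosine decay to `floor` at `total`."""
    if step < warmup:
        return peak * (step + 1) / warmup
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))

for step in (0, 1_000, 75_500, 150_000):
    print(step, lr_at(step))
```

The returned value can be passed to an optimizer each step, or divided by `peak` to serve as a multiplicative factor for a lambda-style scheduler.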

Files under `rank_300/`:
- `checkpoint_final.pt`: `A_V.weight` and `B_V.weight` only (the frozen E, A_K, and B_K are not included).
- `config.json`
- `vectors_drifted_V.txt`: `E + B_V(A_V(·))` per vocab row (V-side static drift only).
- `train_log.jsonl`

**To reconstruct the full model at inference:** load E + (A_K, B_K) from `jsanzolac/bpe_glove_300_lora_r300_qwen3`, load (A_V, B_V) from this repo, then run the QKV forward pass.