Commit 1c3074d by jsanzolac (verified) · 1 Parent(s): 82c6b4d

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +37 -0
README.md ADDED
@@ -0,0 +1,37 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - glove
+ - lora
+ - distillation
+ - hard-negatives
+ - qwen3-embedding
+ base_model: jsanzolac/bpe_glove_300_lora_r300_qwen3
+ datasets:
+ - jsanzolac/qwen3_emb_300_packed_cl100k
+ - jsanzolac/qwen3_emb_512_hard_negatives
+ ---
+
+ # bpe_glove_300_lora_r300_qwen3_hardnegs_nce_only
+
+ Continuation of `jsanzolac/bpe_glove_300_lora_r300_qwen3`: the same `DriftingGloVeStudent` (rank 300) over a frozen
+ 300-d cl100k BPE-GloVe, trained for an additional 150,000 steps with **mined
+ hard negatives** from `jsanzolac/qwen3_emb_512_hard_negatives`.
+
+ **Loss:** `cross_entropy(v @ [v_T ‖ v_hards]^T / τ)` with `τ = 0.05` and
+ `H = 64` mined hard negatives per anchor. *The MSE term is disabled in this variant.*
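
Review note: the loss formula above can be read as an InfoNCE-style cross-entropy in which each anchor is scored against its own teacher target (the positive) and its own mined hard negatives. The sketch below is one possible reading, not the training code from this repository. It assumes the positive sits at index 0, that no in-batch negatives are used, and that vectors are taken as given (the README does not say whether they are normalized). The function name `nce_loss` and its argument shapes are hypothetical.

```python
import numpy as np


def nce_loss(v, v_t, v_hards, tau=0.05):
    """Cross-entropy over [v_T ‖ v_hards] with the positive at index 0.

    v       : (B, D) student vectors (anchors)
    v_t     : (B, D) cached teacher targets (positives)
    v_hards : (B, H, D) mined hard negatives per anchor
    """
    # Similarity of each anchor to its own positive: shape (B, 1).
    pos = np.sum(v * v_t, axis=-1, keepdims=True)
    # Similarity of each anchor to each of its H hard negatives: shape (B, H).
    neg = np.einsum("bd,bhd->bh", v, v_hards)
    # Temperature-scaled logits, positive first: shape (B, 1 + H).
    logits = np.concatenate([pos, neg], axis=1) / tau
    # Subtract the row max before exponentiating for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob_pos = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return float(-log_prob_pos.mean())
```

With `tau = 0.05` the logits are scaled by 20, so even small similarity gaps between the positive and a hard negative dominate the loss.
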
+
+ **Warm-start:** `A.weight` + `B.weight` from `jsanzolac/bpe_glove_300_lora_r300_qwen3/rank_300/checkpoint_final.pt`. The optimizer
+ state was not in the source checkpoint, so this run uses a fresh LR schedule
+ (cosine decay from 5e-4 to 1e-5 over 150,000 steps).
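
Review note: for reference, a cosine schedule with these endpoints can be written as below. This is a sketch under assumptions: it uses the standard cosine-annealing formula with no warmup phase, and the README does not state whether warmup was used. The function name `cosine_lr` is hypothetical.

```python
import math


def cosine_lr(step, total_steps=150_000, lr_max=5e-4, lr_min=1e-5):
    """Learning rate at `step` for cosine decay from lr_max to lr_min."""
    # Clamp so that steps outside [0, total_steps] stay at the endpoints.
    t = min(max(step, 0), total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))
```

Under this formula the rate starts at 5e-4, passes through the midpoint of the two endpoints halfway through the run, and ends at 1e-5.
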
+
+ **Frozen:** `E` (300-d GloVe from `jsanzolac/drifting-glove-distilled-r300`) and the teacher, which is only used to produce the
+ cached `v_T` targets in `jsanzolac/qwen3_emb_300_packed_cl100k` and is not loaded here.
32
+
33
+ Files under `rank_300/`:
34
+ - `checkpoint_final.pt` — `A.weight` + `B.weight` (E excluded; reinject from `jsanzolac/drifting-glove-distilled-r300`).
35
+ - `config.json`
36
+ - `vectors_drifted.txt` / `.parquet` — `E + B(A(·))` per vocab row.
37
+ - `train_log.jsonl`
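
Review note: the `E + B(A(·))` expression in the file list can be illustrated with a few lines of NumPy. This sketch assumes that `A` and `B` are bias-free linear maps stored in the `torch.nn.Linear` layout (`weight` has shape `(out_features, in_features)` and is applied as `x @ W.T`). That assumption is inferred only from the checkpoint key names `A.weight` and `B.weight`. The function name `drifted_vectors` is hypothetical, and loading the arrays from the checkpoint is left out.

```python
import numpy as np


def drifted_vectors(E, A_weight, B_weight):
    """Return E + B(A(E)) for every vocab row.

    E        : (V, D) frozen base embeddings
    A_weight : (r, D) down-projection weight, assumed nn.Linear layout
    B_weight : (D, r) up-projection weight, assumed nn.Linear layout
    """
    # Low-rank residual: project each row through A, then through B.
    delta = (E @ A_weight.T) @ B_weight.T
    return E + delta
```

If `B_weight` is all zeros the residual vanishes and the output equals `E`, which is the usual starting point for LoRA-style adapters.
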