jsanzolac committed on
Commit c99671d · verified · 1 Parent(s): 30bf331

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +31 -0
README.md ADDED
@@ -0,0 +1,31 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - glove
+ - lora
+ - distillation
+ - bpe
+ - cl100k_base
+ - ffn
+ base_model: jsanzolac/bpe_glove_512
+ datasets:
+ - jsanzolac/qwen3_emb_512
+ - jsanzolac/qwen3_emb_512_packed
+ ---
+
+ # bpe_glove_512_lora_v1_ffn
+
+ Warm-start from `jsanzolac/bpe_glove_512_lora_v1/rank_512` plus a per-token FFN inserted
+ between the GloVe-attention output and the alpha-pool collapse.
+
+ **Trainable:** `A`, `B`, FFN. **Frozen:** `E`, teacher.
+ **Loss:** `λ_c·InfoNCE + λ_D·‖ρ_T − ρ_S‖²_F` with `λ_c=1.0`, `λ_D=0.1`.
+ Density is computed on the **post-FFN** per-token states; InfoNCE is on the alpha-pooled sentence vector.
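
The loss line above can be sketched in plain Python. This is a minimal illustration, not the repository's training code: the in-batch pairing of student and teacher sentence vectors, the `temperature` value, and all function names are assumptions. The density matrices `rho_t` and `rho_s` are taken as given, because the README does not say how they are built from the per-token states.

```python
import math


def info_nce(student, teacher, temperature=0.05):
    """In-batch InfoNCE (assumed pairing): student[i] should match teacher[i]
    against every teacher[j] in the batch, using cosine similarity."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
        return [x / norm for x in v]

    s = [unit(v) for v in student]
    t = [unit(v) for v in teacher]
    total = 0.0
    for i, s_i in enumerate(s):
        logits = [sum(a * b for a, b in zip(s_i, t_j)) / temperature for t_j in t]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_z - logits[i]  # cross-entropy with the positive at index i
    return total / len(s)


def frobenius_sq(rho_t, rho_s):
    """Squared Frobenius norm of (rho_t - rho_s) for equally shaped matrices."""
    return sum(
        (a - b) ** 2
        for row_t, row_s in zip(rho_t, rho_s)
        for a, b in zip(row_t, row_s)
    )


def combined_loss(student, teacher, rho_t, rho_s,
                  lambda_c=1.0, lambda_d=0.1, temperature=0.05):
    """lambda_c * InfoNCE + lambda_d * ||rho_T - rho_S||_F^2, with the weights
    defaulting to the values stated in the README."""
    contrastive = info_nce(student, teacher, temperature)
    density = frobenius_sq(rho_t, rho_s)
    return lambda_c * contrastive + lambda_d * density
```

With `λ_D=0.1`, the density term contributes one tenth of its raw value, so the contrastive term dominates unless the two density matrices differ substantially.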
+
+ Files:
+ - `rank_512/checkpoint_final.pt` — A + B + FFN state dict (E is non-persistent; re-inject from `jsanzolac/bpe_glove_512/vectors.txt`).
+ - `rank_512/config.json` — full hyperparameters.
+ - `rank_512/vectors_drifted.txt` — `E + B(A(·))` per vocab row, GloVe text format. Note: this captures only the static drifted embedding lookup, **not** the FFN's effect (which is contextual). To use the model end-to-end, instantiate `DriftingGloVeStudentFFN` and run forward.
+ - `rank_512/train_log.jsonl` — per-step metrics.
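
For readers who only need the static drifted lookup from `vectors_drifted.txt`, a GloVe-style text file can be parsed with the standard library alone. The sketch below rests on assumptions: each row is a token followed by `dim` space-separated floats, the token itself may contain spaces (hence the split from the right), and the function name `load_glove_text` is hypothetical. How BPE tokens are actually serialized in that file is not stated in the README, so check a few rows before relying on this. As the file list notes, vectors read this way do not include the FFN's contextual effect.

```python
def load_glove_text(lines, dim):
    """Parse GloVe-style rows into {token: [float, ...]}.

    Assumes each row is '<token> <v1> ... <v_dim>'. Splitting from the right
    keeps any spaces inside the token intact.
    """
    vectors = {}
    for line in lines:
        line = line.rstrip("\r\n")  # keep leading spaces: they may belong to the token
        if not line:
            continue
        parts = line.rsplit(" ", dim)
        if len(parts) != dim + 1:
            raise ValueError(f"expected a token plus {dim} values, got: {line!r}")
        token, values = parts[0], parts[1:]
        vectors[token] = [float(v) for v in values]
    return vectors


# Usage with in-memory rows; for a real file, pass an open text handle instead,
# e.g. `with open(path, encoding="utf-8") as fh: load_glove_text(fh, dim)`.
example_rows = ["hello 0.1 0.2 0.3\n", "new york 1.0 2.0 3.0\n"]
example_vectors = load_glove_text(example_rows, dim=3)
```

Passing `dim` explicitly, rather than inferring it from the first row, makes malformed rows fail loudly instead of silently producing vectors of the wrong length.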