---
language:
- en
license: apache-2.0
tags:
- glove
- lora
- distillation
- bpe
- cl100k_base
- ffn
base_model: jsanzolac/bpe_glove_512
datasets:
- jsanzolac/qwen3_emb_512
- jsanzolac/qwen3_emb_512_packed
---

# bpe_glove_512_lora_v1_ffn

Warm-started from `jsanzolac/bpe_glove_512_lora_v1/rank_512`, with a per-token FFN inserted between the GloVe-attention output and the alpha-pool step that collapses the token states into a sentence vector.
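
The card does not specify the FFN's internals. The following NumPy sketch assumes a standard position-wise block (two linear layers with a GELU in between, a residual connection, and a hypothetical hidden width) and shows the property that "per-token" implies: each token's state is transformed independently, so reordering the tokens only reorders the outputs. `per_token_ffn` and all sizes are illustrative and are not taken from `DriftingGloVeStudentFFN`.

```python
import numpy as np

def gelu(x: np.ndarray) -> np.ndarray:
    # tanh approximation of GELU (activation choice is an assumption)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def per_token_ffn(h, W1, b1, W2, b2, residual=True):
    """Apply the same two-layer MLP to every token independently (hypothetical helper).

    h: (batch, seq, d) GloVe-attention output.
    Returns (batch, seq, d) post-FFN token states.
    """
    out = gelu(h @ W1 + b1) @ W2 + b2  # matmuls act on the last axis only: no mixing across tokens
    return h + out if residual else out  # residual connection is an assumption

rng = np.random.default_rng(0)
d, hidden = 8, 32  # toy sizes; the card does not state the FFN width
W1 = rng.normal(scale=0.1, size=(d, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(hidden, d))
b2 = np.zeros(d)

h = rng.normal(size=(2, 5, d))          # stand-in for the GloVe-attention output
post = per_token_ffn(h, W1, b1, W2, b2)  # same shape as h
```

Because the FFN never looks across positions, any context in the post-FFN states comes from the attention step that precedes it.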

**Trainable:** `A` and `B` (the LoRA drift on `E`) and the FFN. **Frozen:** `E` (the GloVe embedding table) and the teacher.
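
As a rough sketch of how the trainable and frozen pieces combine, the lookup below adds a drift `B(A(e))` to each frozen GloVe row `e`, matching the `E + B(A(·))` form the card gives for `vectors_drifted.txt`. It assumes `A` and `B` are plain linear maps of shapes `d×r` and `r×d`, uses toy sizes, and starts `B` at zero as in standard LoRA so the drift is initially zero; this run instead warm-starts `A` and `B` from `lora_v1`. `drifted_lookup` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, r = 10, 8, 4  # toy vocab size, width, and rank (the checkpoint directory is named rank_512)

E = rng.normal(size=(V, d))             # frozen GloVe table: never receives gradients
A = rng.normal(scale=0.1, size=(d, r))  # trainable down-projection (assumed shape)
B = np.zeros((r, d))                    # trainable up-projection, zero-init as in standard LoRA (assumption)

def drifted_lookup(token_ids, E, A, B):
    e = E[token_ids]          # (..., d) static GloVe rows
    return e + (e @ A) @ B    # E + B(A(e)): drift of rank <= r added to the frozen row

ids = np.array([[1, 4, 7]])
x = drifted_lookup(ids, E, A, B)  # equals E[ids] while B is still zero
```

Only `A` and `B` would be handed to the optimizer; `E` stays fixed, which is also why it can be left out of the checkpoint and re-injected later.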

**Loss:** `λ_c·InfoNCE + λ_D·‖ρ_T − ρ_S‖²_F` with `λ_c=1.0`, `λ_D=0.1`, where `ρ_T` and `ρ_S` are the teacher and student density matrices.
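
A minimal NumPy sketch of this objective, under stated assumptions: InfoNCE uses in-batch negatives on cosine similarity with a hypothetical temperature `tau=0.05`, and the density `ρ` is taken to be the trace-normalized second-moment matrix of L2-normalized per-token states. That density definition is an assumption, not the card's; it yields a `d×d` matrix that stays comparable even when teacher and student split a sentence into different numbers of tokens, but it requires both to have the same width `d`. In training, the density term would presumably be computed per sentence and averaged over the batch.

```python
import numpy as np

def info_nce(s, t, tau=0.05):
    """In-batch InfoNCE: row i of s should match row i of t (tau is an assumed temperature)."""
    s = s / np.linalg.norm(s, axis=1, keepdims=True)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    logits = s @ t.T / tau                               # (batch, batch) cosine similarity / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                  # positives sit on the diagonal

def density(h):
    """Trace-normalized d x d second moment of L2-normalized token states (assumed definition)."""
    h = h / np.linalg.norm(h, axis=1, keepdims=True)
    rho = h.T @ h
    return rho / np.trace(rho)

def total_loss(s_vec, t_vec, s_tokens, t_tokens, lam_c=1.0, lam_d=0.1):
    d_term = np.sum((density(t_tokens) - density(s_tokens)) ** 2)  # squared Frobenius norm
    return lam_c * info_nce(s_vec, t_vec) + lam_d * d_term

rng = np.random.default_rng(0)
s_vec, t_vec = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))  # pooled sentence vectors
s_tok, t_tok = rng.normal(size=(7, 16)), rng.normal(size=(9, 16))  # one sentence's token states
loss = total_loss(s_vec, t_vec, s_tok, t_tok)
```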

The student density `ρ_S` is computed on the **post-FFN** per-token states; InfoNCE is computed on the alpha-pooled sentence vector.
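
The card does not define alpha-pooling. The sketch below treats it as hypothetical softmax-attention pooling with a learned score vector `alpha` and a padding mask, purely to show the two branches: the unpooled post-FFN token states feed the density term, and the pooled vector feeds InfoNCE. `alpha_pool` assumes every sentence has at least one unmasked token.

```python
import numpy as np

def alpha_pool(h, mask, alpha):
    """Collapse (batch, seq, d) token states into (batch, d) sentence vectors.

    Hypothetical pooling: softmax over per-token scores h @ alpha, with padding masked out.
    """
    scores = h @ alpha                                   # (batch, seq)
    scores = np.where(mask, scores, -np.inf)             # padding tokens get zero weight
    scores = scores - scores.max(axis=1, keepdims=True)  # stability; needs >= 1 real token per row
    w = np.exp(scores)
    w = w / w.sum(axis=1, keepdims=True)                 # (batch, seq), each row sums to 1
    return np.einsum("bs,bsd->bd", w, h), w

rng = np.random.default_rng(0)
post_ffn = rng.normal(size=(2, 5, 8))                   # post-FFN token states
mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]], dtype=bool)
alpha = rng.normal(size=8)                              # hypothetical learned pooling vector

sent, w = alpha_pool(post_ffn, mask, alpha)             # InfoNCE branch: one vector per sentence
tokens_for_density = post_ffn[0][mask[0]]               # density branch: unpooled real tokens
```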

Files:

- `rank_512/checkpoint_final.pt`: state dict for `A`, `B`, and the FFN. `E` is non-persistent and is not stored in the checkpoint; re-inject it from `jsanzolac/bpe_glove_512/vectors.txt`.
- `rank_512/config.json`: full hyperparameters.
- `rank_512/vectors_drifted.txt`: the drifted embedding `E + B(A(·))` for each vocab row, in GloVe text format. This captures only the static drifted embedding lookup, **not** the FFN's effect, which is contextual. To use the model end-to-end, instantiate `DriftingGloVeStudentFFN` and run its forward pass.
- `rank_512/train_log.jsonl`: per-step training metrics.
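
Since `E` has to be re-injected from `vectors.txt` and `vectors_drifted.txt` uses the same layout, a small reader and writer for GloVe text format is handy. This sketch assumes each line is `<token> <v1> ... <v_d>` with no header line; because cl100k_base tokens often contain spaces, it splits from the right using the known dimension (if the real files escape or encode such tokens differently, the parsing would need adjusting). `export_drifted` is a hypothetical helper that bakes in only the static `E + B(A(·))` drift, again assuming `A` and `B` are linear maps of shapes `d×r` and `r×d`.

```python
import io
import numpy as np

def read_glove_text(f, dim):
    """Parse GloVe text format, one `<token> <v1> ... <v_dim>` per line.

    Splitting from the right keeps tokens that contain spaces intact.
    """
    vocab, rows = [], []
    for line in f:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.rsplit(" ", dim)
        vocab.append(parts[0])
        rows.append([float(x) for x in parts[1:]])
    return vocab, np.asarray(rows, dtype=np.float64)

def write_glove_text(f, vocab, M):
    for tok, row in zip(vocab, M):
        f.write(tok + " " + " ".join(f"{v:.6f}" for v in row) + "\n")

def export_drifted(f, vocab, E, A, B):
    # Static drift only: row-wise E + B(A(e)). The contextual per-token FFN is NOT baked in.
    write_glove_text(f, vocab, E + (E @ A) @ B)

# Toy round trip with an in-memory file instead of the real vectors_drifted.txt.
rng = np.random.default_rng(0)
vocab = ["the", " cat", "!"]  # " cat" carries a leading space, as BPE tokens often do
E = rng.normal(size=(3, 4))
A, B = rng.normal(size=(4, 2)), rng.normal(size=(2, 4))

buf = io.StringIO()
export_drifted(buf, vocab, E, A, B)
buf.seek(0)
vocab_back, drifted = read_glove_text(buf, dim=4)
```

Row lookups in such a file reproduce only the drifted embedding layer; sentence embeddings that match training still require the full forward pass.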