	---
	language:
	- en
	license: apache-2.0
	tags:
	- glove
	- lora
	- distillation
	- bpe
	- cl100k_base
	- ffn
	base_model: jsanzolac/bpe_glove_512
	datasets:
	- jsanzolac/qwen3_emb_512
	- jsanzolac/qwen3_emb_512_packed
	---

	# bpe_glove_512_lora_v1_ffn

	Warm-started from `jsanzolac/bpe_glove_512_lora_v1/rank_512`, with a per-token FFN inserted
	between the GloVe-attention output and the alpha-pool collapse.
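To illustrate what "per-token" means here, the following is a minimal NumPy sketch, not the model's actual `DriftingGloVeStudentFFN` code. The ReLU activation, the residual connection, the hidden width, and the `mean_pool` stand-in for the alpha-pool collapse are all assumptions made for the example; only the ordering (FFN on each token state, then pooling) comes from the description above.

```python
import numpy as np

def per_token_ffn(h, w1, b1, w2, b2):
    """Apply the same two-layer MLP to every row of h, shape (n_tokens, d)."""
    hidden = np.maximum(h @ w1 + b1, 0.0)  # ReLU is an assumption
    return h + hidden @ w2 + b2            # residual connection is an assumption

def mean_pool(h):
    """Hypothetical stand-in for the alpha-pool collapse (weighting not shown here)."""
    return h.mean(axis=0)

rng = np.random.default_rng(0)
d, d_hidden = 8, 16                        # toy sizes, not the model's
w1 = rng.normal(size=(d, d_hidden)) * 0.1
b1 = np.zeros(d_hidden)
w2 = rng.normal(size=(d_hidden, d)) * 0.1
b2 = np.zeros(d)

tokens = rng.normal(size=(5, d))           # stand-in for the attention output
states = per_token_ffn(tokens, w1, b1, w2, b2)
sentence = mean_pool(states)               # one vector per sentence
```

Because the FFN acts on each row independently, running it on a prefix of the tokens gives the same states for those tokens; any mixing across positions would have to come from the attention step before it.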

	Trainable: `A`, `B`, FFN. Frozen: `E`, teacher.
	Loss: `λ_c·InfoNCE + λ_D·‖ρ_T − ρ_S‖²_F` with `λ_c=1.0`, `λ_D=0.1`, where `ρ_T` and `ρ_S` are the teacher and student densities.
	The density term is computed on the post-FFN per-token states; InfoNCE is computed on the alpha-pooled sentence vector.
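The loss above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the training code: the in-batch InfoNCE with cosine similarity and `temperature=0.05`, the trace-normalized Gram matrix used as the "density", and the availability of teacher per-token states are all guesses made for the example. Only the weighted sum and the weights `1.0` and `0.1` come from this card; see `rank_512/config.json` for the real settings.

```python
import numpy as np

def info_nce(student, teacher, temperature=0.05):
    """In-batch InfoNCE: row i of student should match row i of teacher."""
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = (s @ t.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def density(states):
    """Assumed definition: trace-normalized Gram matrix of per-token states."""
    gram = states.T @ states
    return gram / np.trace(gram)

def distillation_loss(student_sent, teacher_sent, student_tok, teacher_tok,
                      lambda_c=1.0, lambda_d=0.1):
    contrastive = info_nce(student_sent, teacher_sent)
    diff = density(teacher_tok) - density(student_tok)
    density_term = float(np.sum(diff ** 2))  # squared Frobenius norm
    return lambda_c * contrastive + lambda_d * density_term

rng = np.random.default_rng(0)
teacher_sent = rng.normal(size=(4, 16))  # toy batch of sentence vectors
student_sent = teacher_sent + 0.1 * rng.normal(size=(4, 16))
teacher_tok = rng.normal(size=(10, 16))  # toy per-token states
student_tok = teacher_tok + 0.1 * rng.normal(size=(10, 16))

loss = distillation_loss(student_sent, teacher_sent, student_tok, teacher_tok)
```

When the student reproduces the teacher's per-token states exactly, the density term vanishes and only the contrastive term remains.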

	Files:
	- `rank_512/checkpoint_final.pt` — A + B + FFN state dict (E is non-persistent; re-inject from `jsanzolac/bpe_glove_512/vectors.txt`).
	- `rank_512/config.json` — full hyperparameters.
	- `rank_512/vectors_drifted.txt` — `E + B(A(·))` per vocab row, GloVe text format. Note: this captures only the static drifted embedding lookup, not the FFN's effect (which is contextual). To use the model end-to-end, instantiate `DriftingGloVeStudentFFN` and run forward.
	- `rank_512/train_log.jsonl` — per-step metrics.
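For readers who only need the static vectors, here is a minimal sketch of reading a GloVe text file (one token per line followed by space-separated floats) and of how a drifted row of the form `E + B(A(·))` could be computed. The helper names `read_glove_text` and `drifted_rows`, the shapes of `A` and `B`, and the choice to feed each row of `E` through `A` are assumptions for illustration; the sample data is made up and the real files use 512 dimensions.

```python
import io
import numpy as np

def read_glove_text(handle, dim):
    """Parse GloVe text format: '<token> v1 ... v_dim' per line."""
    tokens, rows = [], []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split from the right so tokens containing spaces stay intact.
        parts = line.rsplit(" ", dim)
        tokens.append(parts[0])
        rows.append([float(x) for x in parts[1:]])
    return tokens, np.asarray(rows, dtype=np.float32)

def drifted_rows(e, a, b):
    """Assumed shapes: e (vocab, d), a (rank, d), b (d, rank)."""
    return e + (e @ a.T) @ b.T

# Toy 3-dimensional file held in memory instead of vectors.txt.
sample = io.StringIO("hello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
tokens, e = read_glove_text(sample, dim=3)

# With zero-initialized adapters the drifted rows equal the base rows.
a = np.zeros((2, 3), dtype=np.float32)
b = np.zeros((3, 2), dtype=np.float32)
drifted = drifted_rows(e, a, b)
```

As noted in the file list, vectors obtained this way exclude the FFN, so they are not a substitute for a full forward pass.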