---
language:
- en
license: apache-2.0
tags:
- glove
- lora
- distillation
- hard-negatives
- qkv-split
base_model: jsanzolac/bpe_glove_300_lora_r300_qwen3
datasets:
- jsanzolac/qwen3_emb_300_packed_cl100k
- jsanzolac/qwen3_emb_512_hard_negatives
---

# bpe_glove_300_qkv_v_only_hardnegs

QKV-split LoRA student built on top of the 300-d cl100k BPE-GloVe embeddings E (`jsanzolac/drifting-glove-distilled-r300`). Only the V-side adapter is trained:

- Q = frozen E.
- K = E + frozen A_K·B_K, loaded from `jsanzolac/bpe_glove_300_lora_r300_qwen3/rank_300/checkpoint_final.pt`.
- V = E + trainable A_V·B_V (rank 300, i.e. full rank for the 300-d embeddings).
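A minimal NumPy sketch of how the three tables relate. It assumes the adapters follow the `nn.Linear`-style naming of the checkpoint keys (`A_V.weight` with shape (r, d), `B_V.weight` with shape (d, r), no bias) and are applied row-wise to E, as in the `E + B_V(A_V(·))` description of `vectors_drifted_V.txt`. The helpers `lora_drift` and `qkv_tables` and the random toy arrays are hypothetical.

```python
import numpy as np

def lora_drift(E, A_w, B_w):
    """Row-wise E + B(A(E)) with nn.Linear-style weights (assumed, no bias).

    E:   (vocab, d) base embeddings
    A_w: (r, d)     down-projection weight
    B_w: (d, r)     up-projection weight
    """
    return E + (E @ A_w.T) @ B_w.T

def qkv_tables(E, A_K, B_K, A_V, B_V):
    Q = E                          # frozen base embeddings
    K = lora_drift(E, A_K, B_K)    # frozen K-side adapter
    V = lora_drift(E, A_V, B_V)    # trainable V-side adapter
    return Q, K, V

# Toy sizes (hypothetical vocab of 10); d = r = 300 as in the card.
rng = np.random.default_rng(0)
vocab, d, r = 10, 300, 300
E = rng.normal(size=(vocab, d))
A_K, B_K = rng.normal(size=(r, d)) * 0.01, rng.normal(size=(d, r)) * 0.01
# A zero up-projection makes V equal E (a common LoRA init, assumed here for illustration).
A_V, B_V = rng.normal(size=(r, d)) * 0.01, np.zeros((d, r))
Q, K, V = qkv_tables(E, A_K, B_K, A_V, B_V)
```

With d = r = 300 the factorization is full rank, so it does not restrict which d×d update the adapter can represent; it only changes the parameterization.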

Loss: `InfoNCE(v_S vs [v_T ‖ v_hards], τ=0.05) + 0.1·MSE(v_S, v_T)`, where v_S is the student vector and v_T the teacher vector for the same anchor,
with `H = 64` mined hard negatives per anchor and a batch size of 256.
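A NumPy sketch of the stated objective, under assumptions the card does not spell out: similarities are cosine (L2-normalized vectors), each anchor's own teacher vector is the positive, the candidate set is the B in-batch teacher vectors concatenated with that anchor's H hard negatives, and the MSE term is taken on the unnormalized vectors, averaged over all elements. `distill_loss` is a hypothetical name.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Normalize along the last axis so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def distill_loss(v_s, v_t, v_hards, tau=0.05, mse_weight=0.1):
    """InfoNCE(v_S vs [v_T ‖ v_hards]) + mse_weight * MSE(v_S, v_T).

    v_s, v_t: (B, d)    student / teacher vectors for the same anchors
    v_hards:  (B, H, d) mined hard negatives per anchor
    """
    s, t, h = l2_normalize(v_s), l2_normalize(v_t), l2_normalize(v_hards)
    in_batch = s @ t.T                                       # (B, B); diagonal holds the positives
    hard = np.einsum("bd,bhd->bh", s, h)                     # (B, H); each anchor vs its own negatives
    logits = np.concatenate([in_batch, hard], axis=1) / tau  # (B, B + H)
    logits -= logits.max(axis=1, keepdims=True)              # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    rows = np.arange(len(v_s))
    info_nce = -log_probs[rows, rows].mean()  # cross-entropy, target = anchor's own teacher
    mse = np.mean((v_s - v_t) ** 2)           # on unnormalized vectors (assumed)
    return float(info_nce + mse_weight * mse)
```

At the card's batch size of 256 with H = 64, this assumed layout gives 256 + 64 = 320 candidates per anchor. If v_T instead means only the anchor's own teacher vector, there would be 1 + 64 candidates and `in_batch` would reduce to its diagonal.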

Schedule: learning rate 5e-4 → 1e-5 with cosine decay over 150,000 steps, with 1,000 warmup steps.
Optimizer: AdamW, weight decay 0.01.
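A plain-Python sketch of the schedule. It assumes linear warmup from near zero to the peak over the first 1,000 steps and that the 150,000-step cosine horizon includes the warmup; the card does not say which. `lr_at` is a hypothetical helper.

```python
import math

def lr_at(step, peak=5e-4, floor=1e-5, total=150_000, warmup=1_000):
    """Learning rate at a 0-based optimizer step (hypothetical helper)."""
    if step < warmup:
        # Linear warmup (assumed shape): reaches `peak` at step warmup - 1.
        return peak * (step + 1) / warmup
    # Cosine decay from `peak` to `floor` over the remaining steps, then held at `floor`.
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))
```

In PyTorch, such a function could drive `torch.optim.AdamW(params, lr=5e-4, weight_decay=0.01)` through `torch.optim.lr_scheduler.LambdaLR`, whose `lr_lambda` returns a multiplier of the base rate, i.e. `lambda step: lr_at(step) / 5e-4`.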

Files under `rank_300/`:
- `checkpoint_final.pt`: `A_V.weight` and `B_V.weight` only (the frozen E, A_K and B_K are not included).
- `config.json`
- `vectors_drifted_V.txt`: `E + B_V(A_V(·))` for every vocab row (static V-side drift only; the K-side drift is not included).
- `train_log.jsonl`
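The card does not document the layout of `vectors_drifted_V.txt`. A minimal reader, assuming the usual GloVe text layout (one row per line: the token, then 300 space-separated floats, no header line). Because BPE tokens can contain spaces, it splits from the right so that only the last `dim` fields are parsed as numbers. `load_vectors` is hypothetical.

```python
import io

def load_vectors(lines, dim=300):
    """Parse GloVe-style 'token v1 ... v_dim' lines into {token: [float, ...]}."""
    vectors = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Split exactly `dim` numeric fields off the right; the rest (spaces included) is the token.
        token, *values = line.rsplit(" ", dim)
        if len(values) != dim:
            raise ValueError(f"expected {dim} values, got {len(values)}: {line[:40]!r}")
        vectors[token] = [float(v) for v in values]
    return vectors

# Toy input with dim=3; a real run might pass
# open("rank_300/vectors_drifted_V.txt", encoding="utf-8") instead (encoding assumed).
sample = io.StringIO(" the 0.1 0.2 0.3\nhello 1 2 3\n")
vecs = load_vectors(sample, dim=3)
```

A dict silently keeps only the last row for a repeated token; if row order matters (e.g. rows ordered by token id), collecting `(token, values)` pairs in a list preserves it.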

To reconstruct the full model at inference, load E and (A_K, B_K) from `jsanzolac/bpe_glove_300_lora_r300_qwen3`, load (A_V, B_V) from this repo, and then run the QKV forward pass.