
---
language:
- en
license: apache-2.0
tags:
- glove
- lora
- distillation
- hard-negatives
- qwen3-embedding
base_model: jsanzolac/bpe_glove_300_lora_r300_qwen3
datasets:
- jsanzolac/qwen3_emb_300_packed_cl100k
- jsanzolac/qwen3_emb_512_hard_negatives
---

# bpe_glove_300_lora_r300_qwen3_hardnegs

Continuation of `jsanzolac/bpe_glove_300_lora_r300_qwen3`: the same rank-300 `DriftingGloVeStudent` over a frozen
300-d cl100k BPE-GloVe table, trained for an additional 150,000 steps with **mined
hard negatives** from `jsanzolac/qwen3_emb_512_hard_negatives`.

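Loss: `cross_entropy(v @ [v_T ‖ v_hards]^T / τ) + λ_MSE · MSE(v, v_T)`, where `v` is the student embedding,
`v_T` the cached teacher embedding, `v_hards` the mined hard negatives, and `‖` denotes concatenation.
Hyperparameters: `τ = 0.05`, `λ_MSE = 1.0`, and `H = 64` mined hard negatives per anchor.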
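A minimal NumPy sketch of the loss above, for illustration only. It assumes that embeddings are L2-normalized before the dot product (the card does not say), that anchor *i*'s positive is column *i* of `[v_T ‖ v_hards]` (its own teacher vector, so the other in-batch teacher vectors act as negatives), and that all anchors' hard negatives are flattened into one shared candidate pool. The actual training code may handle any of these differently; `l2_normalize` and `distill_loss` are hypothetical names.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    # Assumption: similarities are cosine, so rows are scaled to unit length.
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def distill_loss(v: np.ndarray, v_t: np.ndarray, v_hards: np.ndarray,
                 tau: float = 0.05, lambda_mse: float = 1.0) -> float:
    """Contrastive + MSE distillation loss (illustrative sketch).

    v, v_t:  (B, d) student embeddings and cached teacher embeddings.
    v_hards: (B, H, d) mined hard negatives per anchor (H = 64 on this card).
    """
    batch, dim = v.shape
    # Candidate matrix [v_T ‖ v_hards]: B teacher rows followed by B*H hard negatives
    # (assumption: every anchor is scored against all hard negatives in the batch).
    candidates = np.concatenate(
        [l2_normalize(v_t), l2_normalize(v_hards.reshape(-1, dim))], axis=0
    )
    logits = l2_normalize(v) @ candidates.T / tau  # shape (B, B + B*H)
    # Numerically stable log-softmax over the candidates.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Anchor i's positive is column i, i.e. its own teacher vector.
    ce = -log_probs[np.arange(batch), np.arange(batch)].mean()
    # MSE averaged over all elements, on the raw (unnormalized) vectors.
    mse = np.mean((v - v_t) ** 2)
    return float(ce + lambda_mse * mse)


rng = np.random.default_rng(0)
v = rng.normal(size=(4, 16))
v_hards = rng.normal(size=(4, 3, 16))
print(distill_loss(v, v + 0.01 * rng.normal(size=v.shape), v_hards))
```

With `τ = 0.05`, a perfectly aligned pair scores a logit of 1 / 0.05 = 20, so the small temperature sharply separates the positive from near-miss hard negatives.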

Warm start: `A.weight` and `B.weight` are loaded from `jsanzolac/bpe_glove_300_lora_r300_qwen3/rank_300/checkpoint_final.pt`.
The source checkpoint contains no optimizer state, so this run uses a fresh learning-rate schedule:
cosine decay from 5e-4 to 1e-5 over 150,000 steps.
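A sketch of that schedule, assuming plain cosine decay with no warmup and no restarts (the card only gives the endpoints and the length). `cosine_lr` is a hypothetical helper.

```python
import math


def cosine_lr(step: int, total_steps: int = 150_000,
              lr_max: float = 5e-4, lr_min: float = 1e-5) -> float:
    """Cosine decay from lr_max at step 0 to lr_min at total_steps.

    Assumption: no warmup and no restarts.
    """
    progress = min(max(step, 0), total_steps) / total_steps  # clamp to [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


for s in (0, 37_500, 75_000, 112_500, 150_000):
    print(f"step {s:>7,}: lr = {cosine_lr(s):.3e}")
```

Halfway through (step 75,000) the rate is the midpoint, (5e-4 + 1e-5) / 2 = 2.55e-4. In PyTorch, `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=150_000, eta_min=1e-5)`, stepped once per training step from an initial LR of 5e-4, should trace the same curve.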

Frozen: `E` (the 300-d GloVe table from `jsanzolac/drifting-glove-distilled-r300`) and the teacher. The teacher was used only
to produce the cached `v_T` targets in `jsanzolac/qwen3_emb_300_packed_cl100k` and is not loaded in this run.

Files under `rank_300/`:
- `checkpoint_final.pt`: `A.weight` and `B.weight` only (`E` is excluded; reinject it from `jsanzolac/drifting-glove-distilled-r300`).
- `config.json`
- `vectors_drifted.txt` / `vectors_drifted.parquet`: `E + B(A(·))` for each vocabulary row.
- `train_log.jsonl`
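The drifted vectors can be recomputed from the checkpoint once `E` is reinjected. The sketch below assumes `A` and `B` behave like bias-free `torch.nn.Linear` layers (so `A.weight` has shape `(r, d)` and `B.weight` has shape `(d, r)`) with no extra LoRA scaling factor; neither detail is stated on this card. `drift_vectors` is a hypothetical helper. In practice the arrays would come from the GloVe table and from something like `torch.load("rank_300/checkpoint_final.pt", map_location="cpu")`, whose exact key layout is also an assumption.

```python
import numpy as np


def drift_vectors(E: np.ndarray, a_weight: np.ndarray, b_weight: np.ndarray) -> np.ndarray:
    """Return E + B(A(E)), applied independently to every vocabulary row.

    E:        (V, d) frozen GloVe table (d = 300 on this card).
    a_weight: (r, d) `A.weight`, assumed to be a bias-free nn.Linear(d, r) weight.
    b_weight: (d, r) `B.weight`, assumed to be a bias-free nn.Linear(r, d) weight.
    """
    _, d = E.shape
    r = a_weight.shape[0]
    assert a_weight.shape == (r, d) and b_weight.shape == (d, r), "unexpected LoRA shapes"
    # nn.Linear computes x @ W.T, so A(x) = x @ a_weight.T and B(h) = h @ b_weight.T.
    # Assumption: no LoRA scaling factor (e.g. alpha / r) is applied on top.
    return E + (E @ a_weight.T) @ b_weight.T


# Toy shapes standing in for the real (vocab, 300) table and rank-300 factors.
rng = np.random.default_rng(0)
E = rng.normal(size=(8, 6))
a_weight = rng.normal(size=(4, 6))
b_weight = rng.normal(size=(6, 4))
drifted = drift_vectors(E, a_weight, b_weight)
print(drifted.shape)
```

A quick sanity check: if `B.weight` were all zeros (the usual LoRA initialization), the drift would vanish and every row would equal its GloVe vector.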