bpe_glove_300_qkv_v_only_hardnegs

QKV-split LoRA student on top of the 300-d cl100k BPE-GloVe (jsanzolac/drifting-glove-distilled-r300).

Q = frozen E.
K = E + frozen A_K·B_K, loaded from jsanzolac/bpe_glove_300_lora_r300_qwen3/rank_300/checkpoint_final.pt.
V = E + trainable A_V·B_V (rank=300, full).
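
The three tables above can be sketched in plain Python as follows. This is a minimal illustration, not the training code: it assumes each low-rank delta is applied to the embedding row itself (so K = E + E·A_K·B_K, matching the "E + B_V(A_V(·)) per vocab row" description of the drifted vectors), and that A has shape d×r and B has shape r×d. The helper names `matmul`, `add`, and `build_qkv_tables` are hypothetical.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X, Y):
    """Element-wise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def build_qkv_tables(E, A_K, B_K, A_V, B_V):
    """Return (Q, K, V) tables from a base table E and two low-rank pairs.

    Assumption: the delta for each vocab row e is (e @ A) @ B, i.e. A is
    applied first and B second.
    """
    Q = [row[:] for row in E]                 # Q is the frozen base table
    K = add(E, matmul(matmul(E, A_K), B_K))   # frozen K-side drift
    V = add(E, matmul(matmul(E, A_V), B_V))   # trainable V-side drift
    return Q, K, V


# Toy example: 2 vocab rows, d = 2, r = 1 (the real model uses d = r = 300).
E = [[1.0, 0.0], [0.0, 2.0]]
A_K, B_K = [[1.0], [0.0]], [[0.0, 1.0]]
A_V, B_V = [[0.0], [1.0]], [[1.0, 0.0]]
Q, K, V = build_qkv_tables(E, A_K, B_K, A_V, B_V)
```

In the toy example only V would receive gradients during training; Q and K are fixed inputs.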

Loss: InfoNCE(v_S vs [v_T ‖ v_hards], τ = 0.05) + 0.1·MSE(v_S, v_T), with H = 64 mined hard negatives per anchor and a batch size of 256.
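
A per-anchor version of this objective might look like the sketch below. Several details are assumptions, since the card does not state them: similarities are taken as cosine similarities, the candidate set is exactly the teacher vector plus the hard negatives (no in-batch negatives), and the MSE is averaged over dimensions. The function names are hypothetical.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(v_s, v_t, v_hards, tau=0.05):
    """Cross-entropy of picking the teacher v_t among [v_t, *v_hards]."""
    s = _normalize(v_s)
    candidates = [_normalize(v_t)] + [_normalize(h) for h in v_hards]
    logits = [_dot(s, c) / tau for c in candidates]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]  # index 0 is the positive (teacher)


def mse(a, b):
    """Mean squared error over dimensions."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def anchor_loss(v_s, v_t, v_hards, tau=0.05, mse_weight=0.1):
    """InfoNCE plus a weighted MSE pull toward the teacher vector."""
    return info_nce(v_s, v_t, v_hards, tau) + mse_weight * mse(v_s, v_t)


# Toy usage: one anchor, one hard negative (the card uses H = 64).
loss = anchor_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
```

A batch loss would average `anchor_loss` over the 256 anchors. With τ = 0.05 the logits are scaled by 20, so even moderately similar hard negatives contribute noticeably to the denominator.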

Schedule: learning rate decays from 5e-4 to 1e-5 on a cosine schedule over 150,000 steps, with 1,000 warmup steps. Optimizer: AdamW, weight decay 0.01.
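
For reference, the learning-rate curve could be computed as in the sketch below. It assumes a linear warmup to the peak rate and that the 1,000 warmup steps count toward the 150,000 total; neither detail is stated in the card, and `lr_at` is a hypothetical helper.

```python
import math


def lr_at(step, peak=5e-4, floor=1e-5, total=150_000, warmup=1_000):
    """Learning rate at a given 0-based step.

    Assumptions: linear warmup to `peak`, then cosine decay to `floor`
    over the remaining `total - warmup` steps.
    """
    if step < warmup:
        return peak * (step + 1) / warmup
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))


# Sample a few points along the schedule.
samples = {s: lr_at(s) for s in (0, 1_000, 75_500, 150_000)}
```

The midpoint of the decay (step 75,500 under these assumptions) sits halfway between the peak and the floor, at about 2.55e-4.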

Files under rank_300/:

checkpoint_final.pt — A_V.weight + B_V.weight (frozen E, A_K, B_K NOT included).
config.json
vectors_drifted_V.txt — E + B_V(A_V(·)) per vocab row (V-side static drift only).
train_log.jsonl

To reconstruct the full model at inference: load E + (A_K, B_K) from jsanzolac/bpe_glove_300_lora_r300_qwen3, load (A_V, B_V) from this repo, then run the QKV forward pass.
