bpe_glove_300_lora_r300_qwen3

DriftingGloVeStudent with LoRA rank 300 over a frozen 300-d cl100k BPE-GloVe embedding table, distilled from Qwen/Qwen3-Embedding-8B. Teacher embeddings are MRL-truncated to the first 300 dimensions and then re-L2-normalized.
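
The teacher preprocessing described above can be sketched as follows. This is a minimal NumPy illustration, not the training notebook's code: the function name `truncate_and_renormalize` and the 4096-column placeholder input are assumptions made for the example.

```python
import numpy as np

def truncate_and_renormalize(teacher: np.ndarray, dim: int = 300) -> np.ndarray:
    """Keep the first `dim` columns of each row, then rescale rows to unit L2 norm."""
    truncated = teacher[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    # Guard against division by zero for an all-zero row.
    return truncated / np.clip(norms, 1e-12, None)

# Placeholder teacher output: 4 rows of random values (width chosen for illustration).
rng = np.random.default_rng(0)
full = rng.normal(size=(4, 4096))
targets = truncate_and_renormalize(full, dim=300)
```

Re-normalizing after truncation matters because the first 300 coordinates of a unit vector generally have norm below 1.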

Same architecture and hyperparameters as the previous best 300-d run (r300/one_more_try_train_consolidated (3).ipynb); the only change is the teacher source.

Loss: cross_entropy(v @ v_T^T / τ) + λ_MSE · MSE(v, v_T) with τ = 0.05, λ_MSE = 1.0.
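
A rough NumPy sketch of this objective is given below. It assumes the cross-entropy is an in-batch contrastive term whose target for student row i is teacher row i (the diagonal of the similarity matrix), and that the MSE is averaged over all elements; neither detail is stated in this card. The function name `distillation_loss` is hypothetical.

```python
import numpy as np

def distillation_loss(v, v_t, tau=0.05, lambda_mse=1.0):
    """cross_entropy(v @ v_t.T / tau) + lambda_mse * MSE(v, v_t).

    v, v_t: (batch, dim) student and teacher embeddings.
    Assumption: the correct class for row i is column i.
    """
    logits = (v @ v_t.T) / tau                            # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ce = -np.mean(np.diag(log_probs))                     # diagonal = matching pairs
    mse = np.mean((v - v_t) ** 2)                         # assumed element-wise mean
    return ce + lambda_mse * mse

# Toy check: a student that matches the teacher vs. one whose rows are shifted.
teacher = np.eye(3)
loss_matched = distillation_loss(teacher.copy(), teacher)
loss_shifted = distillation_loss(np.roll(teacher, 1, axis=0), teacher)
```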
Training: 150,000 steps, batch size 256, learning rate 0.0005 with cosine decay to 1e-05.
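
For reference, a cosine schedule with these endpoints could look like the sketch below. It assumes a single half-cosine from the peak to the floor with no warmup, which this card does not confirm; `cosine_lr` is a hypothetical helper.

```python
import math

def cosine_lr(step, total_steps=150_000, lr_max=5e-4, lr_min=1e-5):
    """Half-cosine decay from lr_max at step 0 to lr_min at total_steps (no warmup assumed)."""
    progress = min(max(step, 0), total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

lr_start = cosine_lr(0)
lr_mid = cosine_lr(75_000)
lr_end = cosine_lr(150_000)
```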

Files under rank_300/:

checkpoint_final.pt — LoRA A.weight + B.weight (E excluded; re-inject from jsanzolac/drifting-glove-distilled-r300/vectors.txt).
config.json
vectors_drifted.txt / .parquet — E + B(A(·)) per vocab row.
train_log.jsonl
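
The relationship between the checkpoint and the drifted vectors can be sketched as below. This is an assumption-laden illustration: it takes A.weight to have shape (rank, dim) and B.weight to have shape (dim, rank), as in linear-layer weight layout, and applies no extra scaling factor. The function name `drift_vectors` is hypothetical, and loading E from vectors.txt is left out because the file layout is not described here.

```python
import numpy as np

def drift_vectors(E, A, B):
    """Return E + B(A(E)) for every vocab row.

    E: (vocab, dim) frozen embeddings.
    A: (rank, dim) down-projection weight (assumed layout).
    B: (dim, rank) up-projection weight (assumed layout).
    """
    return E + (E @ A.T) @ B.T

# Tiny worked example with dim=2, rank=1.
E = np.array([[1.0, 2.0]])
A = np.array([[1.0, 0.0]])      # picks out the first coordinate
B = np.array([[0.0], [1.0]])    # adds it to the second coordinate
drifted = drift_vectors(E, A, B)

# With B set to zero, the drifted table equals the frozen one.
unchanged = drift_vectors(E, A, np.zeros_like(B))
```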

Base model: jsanzolac/drifting-glove-distilled-r300