bpe_glove_300_lora_r300_qwen3

DriftingGloVeStudent rank=300 over a frozen 300-d cl100k BPE-GloVe, distilled from Qwen/Qwen3-Embedding-8B (MRL-truncated to the first 300 dims, then re-L2-normalized).
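The teacher preprocessing described above (keep the first 300 dimensions, then re-L2-normalize) can be sketched as follows. This is a minimal illustration, not the training code: `truncate_and_normalize` is a hypothetical helper name, and the random matrix of width 4096 is only a stand-in for real Qwen3-Embedding-8B outputs.

```python
import numpy as np

def truncate_and_normalize(teacher: np.ndarray, dim: int = 300, eps: float = 1e-12) -> np.ndarray:
    """Keep the first `dim` coordinates of each row, then rescale rows to unit L2 norm."""
    truncated = teacher[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    # eps guards against division by zero for an all-zero row.
    return truncated / np.maximum(norms, eps)

# Stand-in for teacher embeddings (assumption: full width 4096, random values).
rng = np.random.default_rng(0)
full = rng.normal(size=(4, 4096))
v_T = truncate_and_normalize(full)
print(v_T.shape)  # (4, 300)
```

Re-normalizing after truncation matters because the first 300 coordinates of a unit vector generally have norm below 1.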

Same architecture and hyperparameters as the previous best 300-d run (r300/one_more_try_train_consolidated (3).ipynb); the only change is the teacher source.

Loss: cross_entropy(v @ v_T^T / τ) + λ_MSE · MSE(v, v_T) with τ = 0.05, λ_MSE = 1.0.
Training: 150,000 steps × batch 256 × lr 0.0005 (cosine → 1e-05).
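
The loss and learning-rate schedule above might look like the sketch below, written in NumPy for readability. It rests on assumptions not stated in this card: the cross-entropy target for student row i is teacher row i (in-batch negatives), the MSE is averaged over all elements, and the cosine decay has no warmup. `distill_loss` and `cosine_lr` are hypothetical names.

```python
import math
import numpy as np

def distill_loss(v, v_T, tau=0.05, lambda_mse=1.0):
    """Cross-entropy over v @ v_T^T / tau plus lambda_mse * MSE(v, v_T).

    Assumption: the correct class for row i is column i (its own teacher vector).
    """
    logits = (v @ v_T.T) / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ce = -np.mean(np.diag(log_probs))
    mse = np.mean((v - v_T) ** 2)  # assumption: mean over all elements
    return ce + lambda_mse * mse

def cosine_lr(step, total_steps=150_000, lr_max=5e-4, lr_min=1e-5):
    """Cosine decay from lr_max at step 0 to lr_min at total_steps (no warmup assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

# Toy batch of 8 unit vectors standing in for teacher targets.
rng = np.random.default_rng(0)
v_T = rng.normal(size=(8, 300))
v_T /= np.linalg.norm(v_T, axis=1, keepdims=True)

aligned = distill_loss(v_T, v_T)           # student equals teacher
misaligned = distill_loss(v_T[::-1], v_T)  # every row paired with the wrong teacher
print(aligned < misaligned)  # True
```

With τ = 0.05 the logits of unit vectors span roughly [-20, 20], so the cross-entropy term sharply penalizes a student vector that sits closer to another row's teacher than to its own.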

Files under rank_300/:

  • checkpoint_final.pt: LoRA A.weight + B.weight (E excluded; re-inject from jsanzolac/drifting-glove-distilled-r300/vectors.txt).
  • config.json
  • vectors_drifted.txt / .parquet: E + B(A(·)) per vocab row.
  • train_log.jsonl
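
The relation E + B(A(·)) used for the drifted vectors can be illustrated as below. This is a sketch under assumptions: A and B are treated as bias-free linear maps whose weights follow the (out_features, in_features) layout, and loading `checkpoint_final.pt` and the base vectors is left out. `drift_vectors` is a hypothetical name, and the toy sizes stand in for dim = 300 and rank = 300.

```python
import numpy as np

def drift_vectors(E, A_weight, B_weight):
    """Return E + B(A(E)) row by row, for bias-free linear maps A and B.

    Assumed shapes: E (vocab, dim), A_weight (rank, dim), B_weight (dim, rank).
    """
    return E + (E @ A_weight.T) @ B_weight.T

# Toy sizes for illustration only.
rng = np.random.default_rng(0)
vocab, dim, rank = 10, 6, 4
E = rng.normal(size=(vocab, dim))
A_weight = rng.normal(size=(rank, dim))
B_weight = rng.normal(size=(dim, rank))

drifted = drift_vectors(E, A_weight, B_weight)
print(drifted.shape)  # (10, 6)

# A zero B leaves the frozen base vectors unchanged.
unchanged = drift_vectors(E, A_weight, np.zeros((dim, rank)))
print(np.allclose(unchanged, E))  # True
```

Because E is excluded from the checkpoint, the base vectors have to be loaded separately before this sum can be evaluated.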