Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,37 @@
---
language:
- en
license: apache-2.0
tags:
- glove
- lora
- distillation
- hard-negatives
- qwen3-embedding
base_model: jsanzolac/bpe_glove_300_lora_r300_qwen3
datasets:
- jsanzolac/qwen3_emb_300_packed_cl100k
- jsanzolac/qwen3_emb_512_hard_negatives
---

# bpe_glove_300_lora_r300_qwen3_hardnegs_nce_only

Continuation of `jsanzolac/bpe_glove_300_lora_r300_qwen3`: the same `DriftingGloVeStudent` (rank=300) over a frozen
300-d cl100k BPE-GloVe, trained for an additional 150,000 steps with **mined
hard negatives** from `jsanzolac/qwen3_emb_512_hard_negatives`.

**Loss:** `cross_entropy(v @ [v_T ‖ v_hards]^T / τ)` with `τ = 0.05` and
`H = 64` mined hard negatives per anchor. *The MSE term is disabled in this variant.*
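
The loss above can be sketched for a single anchor in plain Python. This is only an illustration under stated assumptions: the function name `nce_loss` is hypothetical, the positive is placed at index 0, raw dot products are used as similarities, and only the anchor's own hard negatives appear as candidates. Whether the actual training code normalizes vectors or also uses in-batch negatives is not stated in this card.

```python
import math

def nce_loss(v, v_t, v_hards, tau=0.05):
    """Cross-entropy over [positive, hard negatives] for one anchor.

    v       : student vector for the anchor (list of floats)
    v_t     : cached teacher target for the same anchor (the positive)
    v_hards : list of mined hard-negative vectors (H = 64 in this run)
    """
    candidates = [v_t] + list(v_hards)  # assumption: positive sits at index 0
    logits = [sum(a * b for a, b in zip(v, c)) / tau for c in candidates]
    # log-sum-exp with the max subtracted for numerical stability
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]  # equals -log softmax(logits)[0]
```

With a small `τ` such as 0.05, a hard negative that scores even slightly higher than the positive produces a large loss, which is why mined hard negatives dominate the gradient in this setup.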

**Warm-start:** `A.weight` and `B.weight` are loaded from `jsanzolac/bpe_glove_300_lora_r300_qwen3/rank_300/checkpoint_final.pt`.
The source checkpoint did not include optimizer state, so this run uses a fresh LR schedule
(5e-4 → 1e-5 cosine over 150,000 steps).
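
For reference, a cosine decay between those two learning rates can be written as below. This is a minimal sketch, not the training script: the function name `cosine_lr` is hypothetical, and it assumes no warmup phase, since the card does not mention one.

```python
import math

def cosine_lr(step, total_steps=150_000, lr_max=5e-4, lr_min=1e-5):
    """Cosine decay from lr_max at step 0 to lr_min at total_steps.

    Assumption: no warmup; steps outside [0, total_steps] are clamped.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate at the halfway point (step 75,000) is the mean of the two endpoints, 2.55e-4.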

**Frozen:** `E` (300-d GloVe from `jsanzolac/drifting-glove-distilled-r300`) and the teacher. The teacher is used only to produce the
cached `v_T` targets in `jsanzolac/qwen3_emb_300_packed_cl100k` and is not loaded here.

Files under `rank_300/`:
- `checkpoint_final.pt`: `A.weight` + `B.weight` (`E` excluded; reinject from `jsanzolac/drifting-glove-distilled-r300`).
- `config.json`
- `vectors_drifted.txt` / `.parquet`: `E + B(A(·))` per vocab row.
- `train_log.jsonl`
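
The `E + B(A(·))` expression for the drifted vectors can be illustrated per vocab row as follows. This sketch rests on assumptions not confirmed by the card: that `·` is the row of `E` itself, that `A` and `B` are bias-free linear maps stored in `(out_features, in_features)` layout, and the helper names `matvec` and `drifted_row` are hypothetical.

```python
def matvec(W, x):
    """Multiply a matrix W (given as a list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def drifted_row(e, A, B):
    """Return e + B(A(e)) for one embedding row e.

    Assumed shapes: A is (rank, dim), B is (dim, rank).
    """
    delta = matvec(B, matvec(A, e))  # low-rank correction to the frozen row
    return [ei + di for ei, di in zip(e, delta)]
```

If `B` is all zeros the drifted row equals the frozen GloVe row, so the stored vectors differ from `E` only by the learned low-rank term. For a full vocabulary, the same computation would normally be done as one batched matrix product rather than row by row.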