---
language:
- en
license: apache-2.0
tags:
- glove
- lora
- distillation
- hard-negatives
- projection-head
base_model: jsanzolac/bpe_glove_300_lora_r300_qwen3
datasets:
- jsanzolac/qwen3_emb_300_packed_cl100k
- jsanzolac/qwen3_emb_512_hard_negatives
---

# bpe_glove_300_lora_r300_qwen3_proj_hardnegs

An r300 backbone (`A`, `B`, warm-started from `jsanzolac/bpe_glove_300_lora_r300_qwen3`) followed by a
**sentence-level projection head** (`Linear(300, 1200) → GELU → Dropout(0.1) → Linear(1200, 300) → L2-norm`).
Both the backbone (`A`, `B`) and the projection head are trainable.
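
The head described above can be sketched in PyTorch as follows. This is a minimal illustration, not the training code: the class name `ProjectionHead` and the `drop` attribute are assumptions, while `l1` and `l2` are chosen to match the `proj.l1.*` / `proj.l2.*` checkpoint keys. How the backbone output is pooled into a sentence vector is not stated in this card, so the input is assumed to be an already-pooled 300-d vector.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProjectionHead(nn.Module):
    """Linear(300, 1200) -> GELU -> Dropout(0.1) -> Linear(1200, 300) -> L2-norm."""

    def __init__(self, dim: int = 300, hidden: int = 1200, p_drop: float = 0.1):
        super().__init__()
        # l1 / l2 mirror the checkpoint keys proj.l1.* and proj.l2.*
        self.l1 = nn.Linear(dim, hidden)
        self.l2 = nn.Linear(hidden, dim)
        self.drop = nn.Dropout(p_drop)  # attribute name is an assumption

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        h = self.drop(F.gelu(self.l1(v)))
        # Unit-normalise so dot products between outputs are cosine similarities.
        return F.normalize(self.l2(h), p=2, dim=-1)


head = ProjectionHead().eval()        # eval() disables dropout
sentence_vecs = torch.randn(4, 300)   # placeholder for pooled sentence vectors
with torch.no_grad():
    v_proj = head(sentence_vecs)
```

Because the output is L2-normalised, every row of `v_proj` has unit norm regardless of the input scale.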

**Loss:** `InfoNCE(v_proj vs [v_T ‖ v_hards], τ=0.01)` with `H = 64` mined hard negatives
per anchor and a batch size of 256. **No MSE term.**
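
One plausible reading of this loss is sketched below. It assumes that `v_T` contributes the in-batch teacher vectors (the matching row is the positive, the other rows act as negatives) and that `v_hards` adds `H` extra negatives per anchor. The function name and tensor layout are illustrative assumptions, and the actual training code may build the candidate set differently.

```python
import torch
import torch.nn.functional as F


def info_nce_with_hard_negatives(v_proj, v_t, v_hards, tau=0.01):
    """v_proj: (B, D) projected student vectors.
    v_t: (B, D) teacher targets, row i is the positive for anchor i.
    v_hards: (B, H, D) mined hard negatives for each anchor."""
    v_proj = F.normalize(v_proj, dim=-1)
    v_t = F.normalize(v_t, dim=-1)
    v_hards = F.normalize(v_hards, dim=-1)
    in_batch = v_proj @ v_t.T                            # (B, B), diagonal = positives
    hard = torch.einsum("bd,bhd->bh", v_proj, v_hards)   # (B, H)
    logits = torch.cat([in_batch, hard], dim=1) / tau    # (B, B + H)
    labels = torch.arange(v_proj.size(0), device=v_proj.device)
    return F.cross_entropy(logits, labels)


torch.manual_seed(0)
B, H, D = 8, 4, 300  # toy sizes; the card reports batch=256 and H=64
v_t = torch.randn(B, D)
v_hards = torch.randn(B, H, D)
loss_random = info_nce_with_hard_negatives(torch.randn(B, D), v_t, v_hards)
loss_aligned = info_nce_with_hard_negatives(v_t.clone(), v_t, v_hards)
```

With `τ = 0.01` the cosine similarities are scaled by 100 before the softmax, so an anchor that matches its target closely drives the loss toward zero, while even small similarity to a hard negative is penalised sharply.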

**Schedule:** learning rate 5e-4 → 1e-5 (cosine decay) over 150,000 steps, with 1,000 warmup steps.
Optimizer: AdamW, weight decay 0.01.
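
The schedule can be written as a plain function of the step index. The sketch below makes two assumptions that the card does not state: warmup is linear from near zero, and the cosine decay runs over the steps remaining after warmup. The constant and function names are illustrative.

```python
import math

PEAK_LR = 5e-4
FINAL_LR = 1e-5
WARMUP_STEPS = 1_000
TOTAL_STEPS = 150_000


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < WARMUP_STEPS:
        # Assumption: linear warmup up to the peak learning rate.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Assumption: cosine decay spans the post-warmup steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine


for s in (0, 999, 1_000, 75_500, 150_000):
    print(s, lr_at(s))
```

In a PyTorch training loop, the same function could drive `torch.optim.lr_scheduler.LambdaLR` by returning `lr_at(step) / PEAK_LR` as the multiplier on an `AdamW` optimizer created with `lr=5e-4, weight_decay=0.01`.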

Files under `rank_300/`:
- `checkpoint_final.pt`: contains `A.weight` and `B.weight`, plus the projection head's `proj.l1.{weight,bias}` and `proj.l2.{weight,bias}`.
  `E` is excluded because it is a non-persistent buffer; re-inject it from `jsanzolac/drifting-glove-distilled-r300` at load time.
- `config.json`
- `vectors_drifted_pre_proj.txt`: `E + B(A(·))` for each vocabulary row, taken before the projection head (the head is contextual).
- `train_log.jsonl`
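
The note about `E` follows from how PyTorch treats non-persistent buffers: they are left out of `state_dict()`, so they must be filled in separately after loading. The sketch below shows that mechanism on a hypothetical stand-in module. `DriftBackbone`, the vocabulary size, the bias-free linear layers, and the reading of `E + B(A(·))` as `E + B(A(E))` are all assumptions; the real model also carries the `proj.*` weights, and the storage format of `E` in the base repository is not covered here.

```python
import torch
import torch.nn as nn


class DriftBackbone(nn.Module):
    """Hypothetical stand-in: embedding table E plus a learned drift B(A(E))."""

    def __init__(self, vocab: int, dim: int = 300, rank: int = 300):
        super().__init__()
        # persistent=False keeps E out of state_dict(), so checkpoints never store it.
        self.register_buffer("E", torch.zeros(vocab, dim), persistent=False)
        self.A = nn.Linear(dim, rank, bias=False)
        self.B = nn.Linear(rank, dim, bias=False)

    def drifted(self) -> torch.Tensor:
        # Assumption: the drift is computed from the rows of E themselves.
        return self.E + self.B(self.A(self.E))


trained = DriftBackbone(vocab=10)
ckpt = trained.state_dict()            # holds A.weight and B.weight only

restored = DriftBackbone(vocab=10)
restored.load_state_dict(ckpt)         # strict load works: E is not an expected key
base_E = torch.randn(10, 300)          # placeholder for the base embedding table
with torch.no_grad():
    restored.E.copy_(base_E)           # re-inject E after loading the weights
    vectors = restored.drifted()       # one pre-projection row per vocab entry
```

Skipping the re-injection step would leave `E` at its placeholder initialisation, so the drifted vectors would be wrong even though `load_state_dict` reports no missing keys.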