English
glove
lora
distillation
hard-negatives
projection-head
jsanzolac committed
Commit db4ef85 · verified · 1 Parent(s): 50cb8a6

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -20,7 +20,7 @@ r300 backbone (`A`, `B` warm-started from `jsanzolac/bpe_glove_300_lora_r300_qwe
  projection head** (`Linear(300, 1200) → GELU → Dropout(0.1) → Linear(1200, 300) → L2-norm`).
  Both backbone (`A`, `B`) and the projection head are trainable.
 
- **Loss:** `InfoNCE(v_proj vs [v_T ‖ v_hards], τ=0.05)` with `H = 64` mined hard negatives
+ **Loss:** `InfoNCE(v_proj vs [v_T ‖ v_hards], τ=0.01)` with `H = 64` mined hard negatives
  per anchor at batch=256. **No MSE term.**
 
  **Schedule:** lr 0.0005 → 1e-05 cosine over 150,000 steps, warmup 1000.
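
The one-line change above lowers the InfoNCE temperature τ from 0.05 to 0.01. The sketch below illustrates what that does to the loss for a single anchor. It is not the training code behind this README: `info_nce`, the helper functions, and the toy 3-d vectors are hypothetical, and it assumes that `v_T` is the single positive target, that `v_hards` are the mined hard negatives, and that the score is cosine similarity (plausible given the L2-normalized projection head, but not stated in the diff). Any in-batch negatives are not modeled.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchor, positive, negatives, tau):
    """InfoNCE for one anchor: -log softmax of the positive's logit.

    Candidates are [positive] + negatives; logits are cosine / tau.
    """
    a = l2_normalize(anchor)
    candidates = [l2_normalize(positive)] + [l2_normalize(n) for n in negatives]
    logits = [dot(a, c) / tau for c in candidates]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]


# Toy data (hypothetical): the positive is slightly closer to the anchor
# than the hardest negative.
anchor = [1.0, 0.0, 0.0]
positive = [0.9, 0.1, 0.0]
hards = [[0.8, 0.2, 0.0], [0.0, 1.0, 0.0]]

# Positive ranked first: a lower tau shrinks the loss.
loss_tau_005 = info_nce(anchor, positive, hards, tau=0.05)
loss_tau_001 = info_nce(anchor, positive, hards, tau=0.01)

# Positive ranked below a hard negative: a lower tau inflates the loss.
bad_tau_005 = info_nce(anchor, hards[0], [positive], tau=0.05)
bad_tau_001 = info_nce(anchor, hards[0], [positive], tau=0.01)

print(loss_tau_005, loss_tau_001, bad_tau_005, bad_tau_001)
```

In this toy setup a smaller τ sharpens the softmax: small cosine margins are amplified, so correctly ranked anchors contribute less loss while anchors whose hard negative outranks the positive are penalized more heavily. Whether that trade-off helps in the actual run depends on the data and the mined negatives, which this sketch does not capture.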