Upload README.md with huggingface_hub
README.md
CHANGED
@@ -20,7 +20,7 @@ r300 backbone (`A`, `B` warm-started from `jsanzolac/bpe_glove_300_lora_r300_qwe
 projection head** (`Linear(300, 1200) → GELU → Dropout(0.1) → Linear(1200, 300) → L2-norm`).
 Both backbone (`A`, `B`) and the projection head are trainable.
 
-**Loss:** `InfoNCE(v_proj vs [v_T ∪ v_hards], τ=0.
+**Loss:** `InfoNCE(v_proj vs [v_T ∪ v_hards], τ=0.01)` with `H = 64` mined hard negatives
 per anchor at batch=256. **No MSE term.**
 
 **Schedule:** lr 0.0005 → 1e-05 cosine over 150,000 steps, warmup 1000.
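The loss and schedule described in the changed README lines can be illustrated with a small standard-library sketch. It is not the repository's training code. The names `lr_at`, `info_nce`, and `l2_normalize` are hypothetical, and several details are assumptions: warmup is taken to be linear and counted inside the 150,000 steps, the InfoNCE candidate set is read as the positive target followed by the negatives for that anchor, and similarities are dot products of L2-normalized vectors divided by the temperature 0.01.

```python
import math

# Values quoted from the README; how they are combined below is an assumption.
PEAK_LR = 5e-4
FINAL_LR = 1e-5
TOTAL_STEPS = 150_000
WARMUP_STEPS = 1_000


def lr_at(step):
    """Learning rate at a 0-indexed step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        # Assumed linear warmup that reaches PEAK_LR on the last warmup step.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)  # hold FINAL_LR after the schedule ends
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine


def l2_normalize(vec):
    """Scale a vector to unit length, mirroring the head's final L2-norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(anchor, positive, negatives, tau=0.01):
    """InfoNCE for one anchor: cross-entropy with the positive at index 0.

    `negatives` stands in for the other targets plus the mined hard
    negatives (an assumed reading of `[v_T ∪ v_hards]`).
    """
    candidates = [positive] + list(negatives)
    logits = [sum(a * c for a, c in zip(anchor, cand)) / tau
              for cand in candidates]
    # Log-sum-exp with max subtraction for numerical stability at small tau.
    top = max(logits)
    log_z = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_z - logits[0]


if __name__ == "__main__":
    for step in (0, 999, 1_000, 75_500, 150_000):
        print(step, lr_at(step))
    anchor = l2_normalize([1.0, 0.0])
    print(info_nce(anchor, l2_normalize([1.0, 0.0]), [l2_normalize([0.0, 1.0])]))
```

With a temperature this small, a cosine gap of 0.01 between two candidates already shifts their logits by 1, so the loss is dominated by the few negatives closest to the anchor. That is consistent with the README's choice to mine hard negatives, though the exact mining procedure is not shown in this diff.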