# Theta-freeze KDA diag-rot MQAR checkpoint
This repository stores the checkpoint from hypo-test for the MQAR theta-freeze diagnostic run.
## Checkpoint
- File: `theta_freeze3500_t255_kda_diag_rot_lr5e-4_16k_b32_seed42.pt`
- SHA256: `f1cd8d048d50fefa562f2d06d04bf43801e039c25a5273911b24ba02b1835035`
- Params: 3519748
- Train steps: 16000
- Seed: 42
- N: 95
- Vocab size: 4096
- Num queries: 32
- Baseline: `kda_diag_rot`
- Theta freeze steps: 3500
- Final eval accuracy: 0.621704 (5093/8192)
- Final loss: 1.640130
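The SHA256 in the checkpoint list can be used to confirm that a downloaded copy is intact before loading it. A minimal sketch using only Python's standard-library `hashlib`; the helper names `sha256_of_file` and `verify_checkpoint` are hypothetical, and the `__main__` block assumes the file sits in the current working directory under the listed file name.

```python
import hashlib
from pathlib import Path

# Values copied from the Checkpoint list of this model card.
CHECKPOINT_NAME = "theta_freeze3500_t255_kda_diag_rot_lr5e-4_16k_b32_seed42.pt"
EXPECTED_SHA256 = "f1cd8d048d50fefa562f2d06d04bf43801e039c25a5273911b24ba02b1835035"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream the file so the whole checkpoint never has to sit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected=EXPECTED_SHA256):
    """True if the file at `path` matches the expected (case-insensitive) digest."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumption: the checkpoint was downloaded into the current directory.
    path = Path(CHECKPOINT_NAME)
    if path.exists():
        print("SHA256 OK" if verify_checkpoint(path) else "SHA256 MISMATCH")
    else:
        print(f"{path} not found; download it first")
```

Reading in fixed-size chunks keeps memory use flat regardless of checkpoint size; the 1 MiB default is an arbitrary choice.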
## Theta-scale ablation
| theta scale | eval accuracy | correct/total |
|---|---|---|
| 1.0 | 0.621704 | 5093/8192 |
| 0.0 | 0.625122 | 5121/8192 |
| 0.25 | 0.623901 | 5111/8192 |
| 0.5 | 0.623169 | 5105/8192 |
| 2.0 | 0.618896 | 5070/8192 |
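Each eval accuracy in the table is simply correct/total over 8192 examples. A minimal Python sketch that recomputes the column from the counts and reports each theta scale's change relative to the 1.0 row (the row matching the reported final eval accuracy). The binomial standard error is an added rough yardstick, not something the run reports: it treats the 8192 examples as independent draws and ignores that every scale is presumably scored on the same eval set, for which a paired comparison would be more appropriate. The names `ABLATION` and `summarize` are hypothetical.

```python
import math

# theta scale -> (correct, total), copied from the ablation table.
ABLATION = {
    1.0: (5093, 8192),
    0.0: (5121, 8192),
    0.25: (5111, 8192),
    0.5: (5105, 8192),
    2.0: (5070, 8192),
}
# The 1.0 row matches the checkpoint's reported final eval accuracy.
BASELINE_SCALE = 1.0


def accuracy(correct, total):
    return correct / total


def binomial_se(p, n):
    """Standard error of an accuracy p measured on n independent examples."""
    return math.sqrt(p * (1.0 - p) / n)


def summarize(results=ABLATION, baseline=BASELINE_SCALE):
    """Return rows of (scale, rounded acc, delta correct, delta acc) and the baseline SE."""
    base_correct, base_total = results[baseline]
    base_acc = accuracy(base_correct, base_total)
    rows = []
    for scale, (correct, total) in sorted(results.items()):
        acc = accuracy(correct, total)
        rows.append((scale, round(acc, 6), correct - base_correct, acc - base_acc))
    return rows, binomial_se(base_acc, base_total)


if __name__ == "__main__":
    rows, se = summarize()
    for scale, acc, d_correct, d_acc in rows:
        print(f"scale={scale:<5} acc={acc:.6f} d_correct={d_correct:+d} d_acc={d_acc:+.4f}")
    print(f"approx. binomial SE at baseline: {se:.4f}")
```

For reference, the largest gap from the 1.0 row is 28 examples (scale 0.0, about 0.34 percentage points), against a baseline binomial standard error of roughly 0.54 points.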
The checkpoint was uploaded separately from the GitHub repo to avoid storing binary model weights in git.