Theta-freeze KDA diag-rot MQAR checkpoint

This repository stores the checkpoint from hypo-test for the MQAR theta-freeze diagnostic run.

Checkpoint

  • File: theta_freeze3500_t255_kda_diag_rot_lr5e-4_16k_b32_seed42.pt
  • SHA256: f1cd8d048d50fefa562f2d06d04bf43801e039c25a5273911b24ba02b1835035
  • Params: 3519748
  • Train steps: 16000
  • Seed: 42
  • N: 95
  • Vocab size: 4096
  • Num queries: 32
  • Baseline: kda_diag_rot
  • Theta freeze steps: 3500
  • Final eval accuracy: 0.621704 (5093/8192)
  • Final loss: 1.640130
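
Because the weights live outside git, it is worth checking a downloaded copy against the SHA256 listed above before using it. The following is a minimal sketch that uses only the Python standard library. The helper names `sha256_of_file` and `verify_checkpoint` are illustrative and not part of this repository, and the local path in the usage note is an assumption.

```python
import hashlib

# Digest copied from the "SHA256" entry in the checkpoint list above.
EXPECTED_SHA256 = "f1cd8d048d50fefa562f2d06d04bf43801e039c25a5273911b24ba02b1835035"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected=EXPECTED_SHA256):
    """Return True if the file at `path` hashes to `expected` (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

Assuming the file was saved into the current directory under its original name, `verify_checkpoint("theta_freeze3500_t255_kda_diag_rot_lr5e-4_16k_b32_seed42.pt")` should return `True` for an intact download.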

Theta-scale ablation

theta scale   eval accuracy   correct/total
1.0           0.621704        5093/8192
0.0           0.625122        5121/8192
0.25          0.623901        5111/8192
0.5           0.623169        5105/8192
2.0           0.618896        5070/8192
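
The accuracies in the ablation can be recomputed from the correct/total counts, which also makes the gaps relative to the unscaled run (theta scale 1.0) explicit. The sketch below hard-codes the counts from the table; the names `RESULTS`, `accuracy`, and `delta_vs_baseline` are illustrative and not taken from the training code.

```python
# (correct, total) per theta scale, copied from the ablation table above.
RESULTS = {
    1.0: (5093, 8192),
    0.0: (5121, 8192),
    0.25: (5111, 8192),
    0.5: (5105, 8192),
    2.0: (5070, 8192),
}


def accuracy(correct, total):
    """Fraction of evaluation examples answered correctly."""
    return correct / total


def delta_vs_baseline(results, baseline_scale=1.0):
    """Accuracy difference of each theta scale relative to the baseline scale."""
    base = accuracy(*results[baseline_scale])
    return {
        scale: accuracy(correct, total) - base
        for scale, (correct, total) in results.items()
    }


if __name__ == "__main__":
    deltas = delta_vs_baseline(RESULTS)
    for scale, (correct, total) in RESULTS.items():
        acc = accuracy(correct, total)
        print(f"{scale:>5}  {acc:.6f}  {deltas[scale]:+.6f}")
```

On these counts, every setting is within 28 of 8192 examples of the theta scale 1.0 result, which is about 0.34 percentage points.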

The checkpoint was uploaded separately from the GitHub repo to avoid storing binary model weights in git.
