Pin determinism flags in load() (eligibility insurance)

The leaderboard ranks ONLY deterministic submissions (leaderboard/store.py).
Pin the backend flags that govern run-to-run reproducibility instead of
relying on host defaults: cudnn.benchmark=False (disables cuDNN autotuning,
so the conv algorithm choice cannot vary between runs), and the TF32 flags
set to the exact validated mode (matmul TF32 off, cuDNN TF32 on). TF32 is
deterministic, so pinning it only makes the validated numerics
host-independent; it does not change the determinism gate or accuracy.
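
For illustration only, a run-to-run determinism gate could look roughly like
the sketch below. The real check in leaderboard/store.py is not shown in this
change, so `fingerprint`, `is_deterministic`, `reference_predict`, and the
`(a, b, m)` case layout are all hypothetical names and assumptions. The idea
is to run the same prediction callable several times on identical inputs and
compare digests of the serialized outputs.

```python
import hashlib
import json
from typing import Callable, Sequence

# Assumed case layout for a modular-multiplication task: (a, b, modulus).
Case = tuple[int, int, int]


def fingerprint(outputs: Sequence[int]) -> str:
    """Return a SHA-256 hex digest of the outputs in a canonical JSON form."""
    payload = json.dumps(list(outputs), separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_deterministic(
    predict: Callable[[Sequence[Case]], Sequence[int]],
    cases: Sequence[Case],
    runs: int = 3,
) -> bool:
    """Call `predict` `runs` times on the same cases and compare digests."""
    if runs < 2:
        raise ValueError("need at least two runs to compare")
    digests = {fingerprint(predict(cases)) for _ in range(runs)}
    return len(digests) == 1


def reference_predict(cases: Sequence[Case]) -> list[int]:
    """Hypothetical stand-in for the model: exact modular multiplication."""
    return [(a * b) % m for a, b, m in cases]
```

In a real harness `predict` would wrap the model's inference call after
load() has pinned the flags; a pure-Python stand-in is used here so the
sketch runs without a GPU.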

Validated: full public 1100 re-eval is byte-identical to shipped --
overall 0.995, highest_tier_above_90=10, deterministic=True, tiers
1/1/1/1/1.00/.98/1.00/1.00/.99/.98, inference within 300s. A TF32-vs-FP32
probe confirmed 0/100 cases differ on tiers 9 and 10 (the output is
float-mode-invariant), so the near-max ~0.974 is intrinsic cell margin,
not a precision artifact -- keeping TF32 (faster, identical) is correct.
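
The comparison step of a probe like the TF32-vs-FP32 one above can be
sketched as follows. This assumes the predictions for the same cases have
already been collected once under each float mode (how they were collected is
not part of this change); `count_mismatches` and `summarize` are hypothetical
helper names, not functions from the repository.

```python
from typing import Sequence


def count_mismatches(baseline: Sequence[int], candidate: Sequence[int]) -> int:
    """Count positions where two aligned prediction lists disagree."""
    if len(baseline) != len(candidate):
        raise ValueError("prediction lists must have the same length")
    return sum(1 for x, y in zip(baseline, candidate) if x != y)


def summarize(baseline: Sequence[int], candidate: Sequence[int]) -> str:
    """Format the mismatch count in the 'k/n cases differ' style."""
    n_diff = count_mismatches(baseline, candidate)
    return f"{n_diff}/{len(baseline)} cases differ"
```

A count of zero over the probed cases is what supports treating the output
as float-mode-invariant on those tiers.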

Files changed (1)

model.py +12 -0


@@ -211,6 +211,18 @@ class HornerRNN(ModularMultiplicationModel):
         self.device: torch.device | None = None
     def load(self, model_dir: str) -> None:
+        # The leaderboard ranks ONLY deterministic submissions, so pin the backend flags
+        # that govern run-to-run reproducibility instead of relying on host defaults.
+        # cudnn.benchmark=False fixes the conv algorithm (benchmark mode is the main source
+        # of run-to-run variation); the TF32 flags are pinned to the exact mode the shipped
+        # accuracy was validated under (matmul TF32 off, cuDNN TF32 on). TF32 is itself
+        # deterministic, so this only makes the validated numerics host-independent; it does
+        # not affect the determinism check. Inference is no_grad, so no backward-only
+        # nondeterministic kernels are involved.
+        torch.backends.cudnn.benchmark = False
+        torch.backends.cuda.matmul.allow_tf32 = False
+        torch.backends.cudnn.allow_tf32 = True
         if torch.cuda.is_available():
             self.device = torch.device("cuda")
         elif torch.backends.mps.is_available():