etwk committed on
Commit 759f10f · 1 Parent(s): fff63d1

Pin determinism flags in load() (eligibility insurance)

The leaderboard ranks ONLY deterministic submissions (leaderboard/store.py).
Pin the backend flags that govern run-to-run reproducibility instead of
relying on host defaults: cudnn.benchmark=False (disables the cuDNN
autotuner, so conv algorithm selection no longer varies between runs), and
the TF32 flags set to the exact validated mode (matmul TF32 off, cuDNN TF32
on). TF32 is deterministic, so this only makes the validated numerics
host-independent; it does not change the determinism gate or accuracy.
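
The pinning described above can be sketched as a small helper that also reports which host defaults it had to overwrite. This is an illustration only, not the code in this commit: `pin_flags`, `PINNED_FLAGS`, and the `SimpleNamespace` stand-in are hypothetical names. With PyTorch installed, the intended call would be `pin_flags(torch.backends)`.

```python
from types import SimpleNamespace

# Validated mode from the commit message (dotted paths relative to torch.backends).
PINNED_FLAGS = {
    "cudnn.benchmark": False,
    "cuda.matmul.allow_tf32": False,
    "cudnn.allow_tf32": True,
}


def pin_flags(backends, flags=PINNED_FLAGS):
    """Set every flag on `backends`; return {path: old_value} for flags that changed."""
    overwritten = {}
    for dotted, wanted in flags.items():
        *parents, name = dotted.split(".")
        target = backends
        for part in parents:
            target = getattr(target, part)
        current = getattr(target, name)
        if current != wanted:
            overwritten[dotted] = current
        setattr(target, name, wanted)
    return overwritten


# Hypothetical stand-in for torch.backends so the sketch runs without PyTorch.
fake_backends = SimpleNamespace(
    cudnn=SimpleNamespace(benchmark=True, allow_tf32=True),
    cuda=SimpleNamespace(matmul=SimpleNamespace(allow_tf32=False)),
)
changed = pin_flags(fake_backends)
print(changed)  # {'cudnn.benchmark': True}
```

An empty return value would mean the host defaults already matched the validated mode, so the pin changed nothing on that machine.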

Validated: full public 1100 re-eval is byte-identical to shipped --
overall 0.995, highest_tier_above_90=10, deterministic=True, tiers
1/1/1/1/1.00/.98/1.00/1.00/.99/.98, inference within 300s. A TF32-vs-FP32
probe confirmed 0/100 cases differ on tiers 9 and 10 (the output is
float-mode-invariant), so the near-max ~0.974 is intrinsic cell margin,
not a precision artifact -- keeping TF32 (faster, identical) is correct.
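
The two checks described above (repeated runs giving identical output, and a per-case comparison between two numeric modes) can be sketched in plain Python. This is a hedged illustration, not the leaderboard's actual gate or the probe used for this commit: `output_digest`, `is_deterministic`, `count_differing`, and the toy `exact` and `flaky` predictors are all hypothetical, and the three cases are made up.

```python
import hashlib
import itertools


def output_digest(predict, cases):
    """Hash the predictions for all cases into one hex digest."""
    h = hashlib.sha256()
    for case in cases:
        h.update(repr(predict(case)).encode("utf-8") + b"\n")
    return h.hexdigest()


def is_deterministic(predict, cases, runs=3):
    """True if `runs` repeated evaluations produce identical output."""
    return len({output_digest(predict, cases) for _ in range(runs)}) == 1


def count_differing(predict_a, predict_b, cases):
    """Number of cases on which the two predictors disagree."""
    return sum(predict_a(c) != predict_b(c) for c in cases)


# Made-up (a, b, modulus) cases standing in for the real evaluation set.
cases = [(3, 5, 7), (10, 10, 13), (123, 456, 789)]


def exact(case):
    a, b, m = case
    return (a * b) % m


_tick = itertools.count()


def flaky(case):
    # Deliberately unstable: adds 0 or 1 depending on how often it was called.
    return exact(case) + (next(_tick) % 2)


print(is_deterministic(exact, cases))       # True
print(is_deterministic(flaky, cases))       # False
print(count_differing(exact, exact, cases)) # 0
```

In a real probe, `predict_a` and `predict_b` would be the same model evaluated under the two float modes, and a count of 0 corresponds to the "0/100 cases differ" result reported above.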

Files changed (1)
  1. model.py +12 -0
--- a/model.py
+++ b/model.py
@@ -211,6 +211,18 @@ class HornerRNN(ModularMultiplicationModel):
         self.device: torch.device | None = None
 
     def load(self, model_dir: str) -> None:
+        # The leaderboard ranks ONLY deterministic submissions, so pin the backend flags
+        # that govern run-to-run reproducibility instead of relying on host defaults.
+        # cudnn.benchmark=False fixes the conv algorithm (benchmark mode is the main source
+        # of run-to-run variation); the TF32 flags are pinned to the exact mode the shipped
+        # accuracy was validated under (matmul TF32 off, cuDNN TF32 on). TF32 is itself
+        # deterministic, so this only makes the validated numerics host-independent; it does
+        # not affect the determinism check. Inference is no_grad, so no backward-only
+        # nondeterministic kernels are involved.
+        torch.backends.cudnn.benchmark = False
+        torch.backends.cuda.matmul.allow_tf32 = False
+        torch.backends.cudnn.allow_tf32 = True
+
         if torch.cuda.is_available():
             self.device = torch.device("cuda")
         elif torch.backends.mps.is_available():