smoothquant: fix istupakov int8 size note comparing output to itself

The post-export download-size note stat'd model_dir/encoder-model.int8.onnx
as the istupakov baseline, but that is the same path the script writes its
output to when --out-name is the canonical encoder-model.int8.onnx, so it read
its own freshly-written output and printed the tautology
"841.6 MB (istupakov int8 is 841.6 MB)".

The real upstream encoder-model.int8.onnx on HF
(istupakov/parakeet-tdt-0.6b-v3-onnx) is 652,183,999 B (622 MiB), not 841.6 MB.
Hardcode that size as ISTUPAKOV_INT8_ENCODER_BYTES (sourced from HF, dated) and
only stat an on-disk file when its path differs from the output path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Files changed (1)

scripts/quantize-int8-smoothquant.py +20 -1

@@ -198,6 +198,16 @@ DEFAULT_CALIB_DIR = "calibration_audio"
 SAMPLE_RATE = 16000
+# Upstream istupakov int8 encoder size, for the post-export download-size note.
+# Measured from HF on 2026-06-09:
+#   istupakov/parakeet-tdt-0.6b-v3-onnx / encoder-model.int8.onnx = 652,183,999 B.
+# Hardcoded because this script's own output usually overwrites that filename in
+# the model dir (when --out-name is the canonical encoder-model.int8.onnx), so the
+# on-disk copy can't be stat'd as a baseline without reading our own output back.
+# NOTE: istupakov also quantizes the convs (--op-types MatMul,Conv), which is why
+# their encoder is smaller than this script's MatMul-only default.
+ISTUPAKOV_INT8_ENCODER_BYTES = 652_183_999
 def expand_audio(inputs):
     """Resolve --audio entries (files and/or folders) to a flat list of audio files.
@@ -543,8 +553,17 @@ def main():
         logger.info(f"[sq] pruned {pruned} orphaned initializer(s) (folded smooth scales)")
     out_size = os.path.getsize(out_encoder)
+    # Download-size comparison vs the upstream istupakov int8 encoder. Only stat an
+    # on-disk istupakov file when it is a DIFFERENT path than our output: when
+    # --out-name is the canonical encoder-model.int8.onnx, out_encoder overwrites
+    # that file, so stat'ing it would read our own output back and print the
+    # tautology "X (istupakov int8 is X)". Otherwise fall back to the HF size.
     baseline = model_dir / "encoder-model.int8.onnx"
-    base_note = f" (istupakov int8 is {human(os.path.getsize(baseline))})" if baseline.exists() else ""
+    if baseline.exists() and baseline.resolve() != out_encoder.resolve():
+        base_bytes = os.path.getsize(baseline)
+    else:
+        base_bytes = ISTUPAKOV_INT8_ENCODER_BYTES
+    base_note = f" (istupakov int8 is {human(base_bytes)})"
     logger.info(f"[sq] done in {dt:.0f}s -> {out_encoder.name} {human(out_size)}{base_note}")
     # Fidelity smoke test (NOT just shape): run one calibration window through both
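
The baseline selection in this change can be exercised in isolation. The following is a minimal sketch, not the script's actual code: `baseline_bytes` and `demo` are hypothetical names introduced for illustration, and the temporary directory stands in for the real model dir. It assumes only the constant and the path comparison shown in the diff.

```python
import os
import tempfile
from pathlib import Path

# Upstream size quoted in the commit message (bytes).
ISTUPAKOV_INT8_ENCODER_BYTES = 652_183_999


def baseline_bytes(model_dir: Path, out_encoder: Path) -> int:
    """Hypothetical helper: pick the size to report as the istupakov baseline."""
    baseline = model_dir / "encoder-model.int8.onnx"
    # Only trust the on-disk file when it is not the file we just wrote.
    if baseline.exists() and baseline.resolve() != out_encoder.resolve():
        return os.path.getsize(baseline)
    return ISTUPAKOV_INT8_ENCODER_BYTES


def demo() -> tuple:
    """Return (size when output is the canonical name, size when it is not)."""
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp)
        canonical = model_dir / "encoder-model.int8.onnx"
        canonical.write_bytes(b"x" * 100)  # stand-in for an encoder file
        same_path = baseline_bytes(model_dir, canonical)
        other_path = baseline_bytes(model_dir, model_dir / "encoder-model.sq.onnx")
        return same_path, other_path
```

When the output path is the canonical filename, the helper falls back to the hardcoded HF size instead of reading the output back; with any other output name, it stats the file already on disk.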