Lossless Mechanistic Compression and Surgical Correction of Medical Imaging Models
Artifacts for the paper by Yeonseong Cynn (River Lab, May 2026).
Summary
A compressed CheXNet (DenseNet121) at 51.43% parameter reduction (6,966,034 → 3,383,248) with mean AUROC preserved within sampling noise on n=1045 NIH ChestX-ray14 test images (Δ +0.0004, per-pathology max |Δ| = 0.0033). Output identity to numerical precision (max |Δ logit| < 5×10⁻⁶).
The compressed model exposes classifier channels at a granularity that makes mechanistic interventions practical:
- Surgical correction: 5-channel classifier weight zeroing softly reduces a target false-positive probability with bounded side effects.
- Mutual exclusivity insight: 89 of 100 polarized classifier channels are not architectural conflicts but bipolar discriminative axes exploiting label mutual exclusivity (Jaccard < 0.1).
- Cost-aware operations: threshold calibration and minimal retraining routed by a decision system per pathology.
- Clinical report auto-generation: combining channel-level evidence, Grad-CAM region mapping, and mutual-exclusivity exclusion.
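The Jaccard < 0.1 criterion used for mutual exclusivity can be illustrated with a minimal sketch. The label matrix below is toy data invented for illustration (not NIH labels), and the helper name `jaccard` is hypothetical; the actual matrix is in metrics/analyze_binary_axis.json.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two binary label columns (lists of 0/1)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

# Toy labels (assumption): rows are images, columns are pathologies.
names = ["Cardiomegaly", "Mass", "Effusion"]
labels = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
]

# One column of 0/1 labels per pathology.
columns = {n: [row[i] for row in labels] for i, n in enumerate(names)}

# Pairwise Jaccard, then the pairs below the 0.1 cutoff quoted above.
pairs = {(p, q): jaccard(columns[p], columns[q])
         for p, q in combinations(names, 2)}
exclusive = [pair for pair, j in pairs.items() if j < 0.1]

for pair, j in pairs.items():
    print(pair, round(j, 3))
print("near-exclusive pairs:", exclusive)
```

Pairs that rarely or never co-occur land below the cutoff, which is the condition under which a single channel can serve as a two-sided axis between two labels.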
Files
Weights
| File | Size | Description |
|---|---|---|
| `compressed_model.pt` | 14.2 MB | Compressed CheXNet backbone + classifier (3.38M params) |
| `classifier_finetuned.pt` | 75 KB | Optional fine-tuned classifier head (18K params) |
Code
| File | Description |
|---|---|
| `inference.py` | Minimal CLI inference (load + forward) |
| `requirements.txt` | Pip dependencies |
Metrics (JSON)
- `metrics/baseline_vs_compressed.json`: Per-pathology AUROC (baseline vs compressed, n=1045)
- `metrics/eval_nih_weights.json`: All 5 torchxrayvision DenseNet121 checkpoints on NIH test
- `metrics/analyze_binary_axis.json`: Pathology independence + Jaccard matrix
- `metrics/q_conflict_legitimacy.json`: Polarized channel legitimacy classification
- `metrics/surgery_channel_ablation.json`: Surgical correction K-sweep
- `metrics/apply_threshold_calibration.json`: Per-class Youden threshold + F1/Recall
- `metrics/minimal_retrain.json`, `minimal_retrain_v2.json`: Classifier-head fine-tune costs
Figures (paper)
- `figures/fig1_compression.{pdf,png}`: Headline numbers
- `figures/fig2_sparse_layers.{pdf,png}`: Per-block sparsity
- `figures/fig3_mutual_exclusivity.{pdf,png}`: Jaccard vs. channel-usage mirror
- `figures/fig4_legitimacy.{pdf,png}`: Polarized-channel legitimacy
- `figures/fig5_surgery.{pdf,png}`: Surgical-correction K-sweep
- `figures/fig6_treatment.{pdf,png}`: Per-pathology treatment recommendations
- `figures/fig7_minimal_retrain.{pdf,png}`: Fine-tuning + threshold calibration
- `figures/fig8_clinical_report.{pdf,png}`: Clinical report sample
Setup
```bash
pip install -r requirements.txt
```
Usage
Loading and inference
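```bash
# Compressed model with its bundled classifier
python inference.py path/to/xray.png

# Same, with the optional fine-tuned classifier head
python inference.py path/to/xray.png --classifier classifier_finetuned.pt
```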
Loading the model in your own code
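```python
import torch
import torch.nn as nn
import torchxrayvision as xrv

# Start from the baseline architecture, then shrink it to the compressed shapes.
model = xrv.models.DenseNet(weights="densenet121-res224-all").eval()

# The checkpoint holds plain Python metadata next to the tensors,
# so it cannot be loaded with weights_only=True.
ckpt = torch.load("compressed_model.pt", map_location="cpu", weights_only=False)

# Rebuild each dense layer with the number of surviving bottleneck channels.
for block_idx in [1, 2, 3, 4]:
    block = getattr(model.features, f"denseblock{block_idx}")
    block_alive = ckpt["alive_per_block"][block_idx]
    for dl_key, n_alive in block_alive.items():
        i = int(dl_key[2:])  # strip the two-character key prefix
        layer = getattr(block, f"denselayer{i}")
        in_ch = layer.conv1.in_channels
        layer.conv1 = nn.Conv2d(in_ch, n_alive, 1, bias=True).eval()
        # Temporary BatchNorm so the checkpoint's norm2 keys load;
        # it is replaced by Identity below.
        layer.norm2 = nn.BatchNorm2d(n_alive, eps=layer.norm2.eps).eval()
        layer.conv2 = nn.Conv2d(n_alive, 32, 3, padding=1, bias=False).eval()

model.load_state_dict(ckpt["state_dict"])

# After loading, norm2 becomes a no-op in every dense layer.
layers_per_block = {1: 6, 2: 12, 3: 24, 4: 16}  # DenseNet121 layout
for block_idx in [1, 2, 3, 4]:
    block = getattr(model.features, f"denseblock{block_idx}")
    for i in range(1, layers_per_block[block_idx] + 1):
        getattr(block, f"denselayer{i}").norm2 = nn.Identity()

# Optional fine-tuned classifier
cls_ft = nn.Linear(1024, 18)
cls_ft.load_state_dict(torch.load("classifier_finetuned.pt", weights_only=True))
model.classifier = cls_ft

model.eval()
```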
Verification
NIH ChestX-ray14 official test split (1045 images)
| Configuration | Parameters | Mean AUROC | Latency (ms/image) |
|---|---|---|---|
| Baseline (`densenet121-res224-all`) | 6,966,034 | 0.7781 | 15.17 |
| Compressed | 3,383,248 (-51.43%) | 0.7785 (+0.0004) | 14.73 (-2.9%) |
Per-pathology max |Δ AUROC| = 0.0033 (Emphysema, +0.0033); all within sampling noise.
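As an illustration of how a per-pathology Δ AUROC and its maximum can be computed, here is a minimal stdlib-only sketch. The scores, labels, and pathology names "A" and "B" are toy values invented for the example, and `auroc` is a hypothetical helper using the pairwise-ranking definition; it is not necessarily how the released metrics were produced.

```python
def auroc(labels, scores):
    """AUROC as P(score_pos > score_neg), with ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (assumption): per pathology, labels plus scores from both models.
data = {
    "A": {
        "labels":     [1, 1, 0, 0, 1, 0],
        "baseline":   [0.9, 0.6, 0.4, 0.7, 0.8, 0.2],
        "compressed": [0.9, 0.6, 0.4, 0.5, 0.8, 0.2],
    },
    "B": {
        "labels":     [0, 1, 0, 1, 0, 0],
        "baseline":   [0.1, 0.8, 0.3, 0.4, 0.2, 0.5],
        "compressed": [0.1, 0.8, 0.3, 0.4, 0.2, 0.5],
    },
}

# Delta = compressed AUROC minus baseline AUROC, per pathology.
deltas = {
    name: auroc(d["labels"], d["compressed"]) - auroc(d["labels"], d["baseline"])
    for name, d in data.items()
}
worst = max(deltas, key=lambda name: abs(deltas[name]))

for name, delta in deltas.items():
    print(f"{name}: delta AUROC = {delta:+.4f}")
print("largest |delta|:", worst)
```

The pairwise form is quadratic in the number of images, which is acceptable at n=1045; a rank-based implementation would be preferable for much larger test sets.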
Choice of baseline checkpoint
We compared all 5 torchxrayvision DenseNet121 checkpoints on the same
NIH test subset. The multi-source densenet121-res224-all checkpoint is the strongest:
| Checkpoint | Mean AUROC |
|---|---|
| `densenet121-res224-all` | 0.7781 |
| `densenet121-res224-nih` | 0.7524 |
| `densenet121-res224-chex` | 0.7425 |
| `densenet121-res224-mimic_ch` | 0.7178 |
| `densenet121-res224-mimic_nb` | 0.7049 |
Higher published NIH-only DenseNet121 numbers (e.g., 0.84) come from
corpus-specific hyperparameter and augmentation tuning not part of the
open torchxrayvision release.
Threshold calibration (Youden-J)
The default decision threshold 0.5 is overly conservative for this multi-label model. Per-class Youden-J on a held-out validation set shifts the cohort-average operating point:
| Setting | Mean F1 | Mean Recall |
|---|---|---|
| Default threshold 0.5 | 0.127 | 0.111 |
| Youden-J calibrated | 0.20 | 0.78 |
Caveat: this trades precision for recall sharply. Best-performing classes (Cardiomegaly: precision 1.0 → 0.11, F1 0.57 → 0.20; Mass: F1 0.25 → 0.07) are degraded. F1 average is dominated by previously zero-recall classes (Infiltration, Atelectasis). For deployment, F1-optimal thresholds or explicit clinical precision floors are preferable.
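The per-class Youden-J selection described above can be sketched as follows. This is a minimal illustration on invented validation scores, not the released calibration script; `youden_threshold` and `precision_recall` are hypothetical helper names, and J is taken as TPR minus FPR over the observed score values.

```python
def youden_threshold(labels, scores):
    """Return (threshold, J) maximizing J = TPR - FPR over observed scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        j = tp / n_pos - fp / n_neg
        if j > best_j:  # strict: keep the lowest threshold among ties
            best_t, best_j = t, j
    return best_t, best_j

def precision_recall(labels, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    n_pos = sum(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return precision, tp / n_pos

# Toy validation set for one class (assumption): every score is below 0.5,
# so the default threshold predicts no positives at all.
val_labels = [1, 1, 1, 0, 0, 0, 0, 0]
val_scores = [0.45, 0.30, 0.20, 0.25, 0.10, 0.05, 0.15, 0.08]

threshold, j = youden_threshold(val_labels, val_scores)
print("default 0.5 :", precision_recall(val_labels, val_scores, 0.5))
print(f"Youden {threshold}:", precision_recall(val_labels, val_scores, threshold))
```

In this toy case the calibrated threshold recovers every positive at the cost of one false positive, the same direction of trade-off as the caveat above. A precision floor could be enforced by skipping candidate thresholds whose precision falls below a chosen minimum.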
Surgical correction (representative Cardiomegaly false positive)
| K (channels zeroed) | Target prob | TP loss | Other 13 pathology AUROC Δ |
|---|---|---|---|
| 0 (baseline) | 0.89 | - | - |
| 5 | 0.76 | 0 | exactly 0 |
| 10 | 0.67 | 4 | exactly 0 |
| 20 | 0.54 | 7 | exactly 0 |
At K=5 the decision (threshold 0.5) is not flipped; the correction is a soft probability reduction, not a hard decision change. K=20 crosses the boundary but loses 7 true positives. Treat surgical correction as a confidence-shaping tool, not a binary error eraser.
The exact-zero AUROC isolation guarantee on the other 13 pathologies holds by construction (only one classifier row is modified).
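The isolation argument can be checked on a toy linear head. The sketch below is an assumption-laden illustration: the weights and features are invented, and ranking channels by their positive contribution `w * f` is a hypothetical selection rule, since the channel-selection procedure used for the reported K-sweep is not described in this README.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def class_logits(weights, bias, features):
    """Linear head: one logit per class row."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def zero_top_k(weights, features, target, k):
    """Copy weights, zeroing up to k of the largest positive contributions
    w * f in row `target`. Other rows are copied unchanged."""
    contrib = [w * f for w, f in zip(weights[target], features)]
    top = sorted(range(len(contrib)), key=lambda c: contrib[c], reverse=True)[:k]
    edited = [row[:] for row in weights]
    for c in top:
        if contrib[c] > 0:  # never zero a channel that lowers the logit
            edited[target][c] = 0.0
    return edited

# Toy head (assumption): 2 classes, 4 pooled feature channels.
features = [1.0, 2.0, 0.5, 1.0]
weights = [[0.5, 0.4, -0.4, 0.3],   # target class row
           [0.1, -0.2, 0.5, 0.4]]   # another class, must stay untouched
bias = [0.0, -0.5]

before = class_logits(weights, bias, features)
edited = zero_top_k(weights, features, target=0, k=1)
after = class_logits(edited, bias, features)

print(f"target prob: {sigmoid(before[0]):.2f} -> {sigmoid(after[0]):.2f}")
print("other class logit unchanged:", before[1] == after[1])
```

Because only the target row is edited, every other class logit is bit-identical before and after, so its ranking of images, and therefore its AUROC, cannot change.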
Method Disclosure
Compression method specifics are proprietary; the foundational procedure is covered by Korean patent applications. The released artifacts (weights, inference code, downstream analysis scripts) are sufficient for reproduction of the reported results.
Base Model
torchxrayvision densenet121-res224-all (DenseNet121 trained on NIH ChestX-ray14, CheXpert, MIMIC-CXR, PadChest).
Citation
Zenodo preprint: 10.5281/zenodo.20131680
```bibtex
@misc{cynn2026chexnet,
  title={Lossless Mechanistic Compression and Surgical Correction of Medical Imaging Models},
  author={Cynn, Yeonseong},
  year={2026},
  publisher={Zenodo},
  doi={10.5281/zenodo.20131680},
  url={https://doi.org/10.5281/zenodo.20131680}
}
```
License
MIT for the released code; the underlying compression method is proprietary (see Method Disclosure above).
Contact
For questions or commercial inquiries: whitepep@gmail.com