retinasense-vit / SESSION_CHANGES_2026_03_10.md
tanishq74's picture
Add session changes log with datasets, fixes, and GPU plan
bb0df05 verified
|
raw
history blame contribute delete
12.1 kB

RetinaSense-ViT β€” Complete Changes Log (2026-03-10)

This document captures every change made during the session, new datasets integrated, bugs found and fixed, new training code written, and the GPU execution plan.


1. Summary of All Changes

App Improvements (app.py β€” working NOW on CPU)

Change Before After Impact
Model ViT-Base/16 only ViT + EfficientNet-B3 Ensemble 49.1% β†’ 74.7% accuracy
Ensemble Weights N/A 35% ViT + 65% EfficientNet Optimized on calibration set
Preprocessing Broken (missing 3 steps) Matches training pipeline exactly Root cause of wrong predictions fixed
TTA None 4 augmented versions averaged ~1-3% accuracy boost
Triage System None AUTO-SCREEN / REVIEW / URGENT / RESCAN Clinical decision support
Model Disagreement None Shows when ViT and EfficientNet disagree Uncertainty indicator
OOD Handling Crashed when npz missing Graceful fallback No more crashes

Preprocessing Fix (the critical bug)

Root Cause: app.py preprocessing didn't match what models were trained on.

Step Training Old app.py Fixed app.py
Border Crop Yes (dark pixels < 7) MISSING Added _crop_black_borders()
Resize 224Γ—224, INTER_AREA 224Γ—224, INTER_LINEAR Fixed to INTER_AREA
CLAHE ODIR=CLAHE, APTOS=Ben Graham CLAHE for all CLAHE for all (matches majority)
Circular Mask Yes (r=0.48Γ—size) MISSING Added _apply_circular_mask()
Normalize mean=[0.4298,0.2784,0.1559] Same Same (no issue)

Impact of each missing step (from diagnostic agent):

  • Missing circular mask: Model sees corner pixels it never saw during training
  • Missing border crop: Retina shrunk within 224Γ—224 frame, wrong spatial distribution
  • CLAHE alone shifts Glaucoma probability by +43.2 percentage points

New Training Code (5 files, 4,213 lines)

File Lines Purpose GPU Time
unified_preprocessing.py 214 Rebuild image cache with single CLAHE pipeline for ALL sources ~15 min
train_dann.py 1,257 Domain-Adversarial Neural Network training ~1.5 hrs (H200)
retfound_backbone.py 459 RETFound foundation model backbone (1.6M retinal images) Drop-in replacement
enhanced_augmentation.py 677 CutMix, elastic deform, 5Γ— minority oversampling Used by training scripts
prepare_datasets.py 1,606 Download/prep 5 additional public datasets ~1-2 hrs download

2. New Datasets Available

Current Dataset

Source Images Classes Preprocessing
APTOS 2019 3,662 DR only (severity 0-4) Ben Graham enhancement
ODIR 4,878 All 5 classes CLAHE
Total 8,540 5 classes Mixed (causes domain shift)

New Datasets (via prepare_datasets.py)

Dataset Images Classes Source How to Get
EyePACS ~35,000 DR + Normal Kaggle kaggle competitions download -c diabetic-retinopathy-detection
MESSIDOR-2 1,748 DR grades ADCIS https://www.adcis.net/en/third-party/messidor2/
REFUGE ~1,200 Glaucoma + Normal Grand Challenge https://refuge.grand-challenge.org/
ADAM (iChallenge-AMD) ~1,200 AMD + Normal Grand Challenge https://amd.grand-challenge.org/
ORIGA ~650 Glaucoma + Normal Academic request https://nus.edu.sg/origa

After Expansion

Class Current Samples After Expansion Improvement
Normal ~2,500 ~12,000+ 5Γ— more
Diabetes/DR ~4,500 ~25,000+ 5Γ— more
Glaucoma ~250 ~2,100+ 8Γ— more
Cataract ~200 ~200 Same (no new source)
AMD ~90 ~1,290+ 14Γ— more
Total 8,540 ~40,000+ 5Γ— total

How to Use

# See available datasets
python prepare_datasets.py --list

# Download instructions for each
python prepare_datasets.py --instructions

# Prepare a specific dataset
python prepare_datasets.py --dataset eyepacs --raw-dir ./data/eyepacs
python prepare_datasets.py --dataset refuge --raw-dir ./data/refuge
python prepare_datasets.py --dataset adam --raw-dir ./data/adam

# Include existing APTOS+ODIR data
python prepare_datasets.py --include-existing

# Merge all into unified train/calib/test splits
python prepare_datasets.py --merge

# Preprocess all with unified CLAHE pipeline
python prepare_datasets.py --preprocess

All datasets are preprocessed with the unified CLAHE pipeline (crop borders β†’ resize 224 β†’ CLAHE β†’ circular mask), eliminating the domain shift caused by mixed Ben Graham/CLAHE preprocessing.


3. Investigation Findings

How Wrong Predictions Were Diagnosed

4 parallel investigation agents were spawned:

Agent Task Key Finding
Architecture Agent Compare training vs inference model definitions No mismatches β€” architectures match perfectly
Preprocessing Agent Compare training vs inference preprocessing 3 critical bugs found (circular mask, border crop, interpolation)
Diagnostic Agent Test raw model outputs on real/garbage inputs Models give 86% confidence on random noise, CLAHE shifts predictions by +43%
Normalization Agent Check if ViT and EfficientNet use different normalization No mismatch β€” both use same fundus stats

Key Diagnostic Results

Test Input ViT Prediction EfficientNet Prediction Ensemble
Synthetic fundus Glaucoma (58.7%) Cataract (91.3%) Glaucoma (69.3%)
Random noise Diabetes/DR (86.2%) Cataract (91.3%) Cataract (61.5%)
Blank black Cataract (71.4%) Cataract (89.8%) Cataract (90.6%)
Blank white Cataract (84.8%) Cataract (83.8%) Cataract (84.1%)

Findings:

  • Models are overconfident on garbage input (inherent, needs retraining to fix)
  • Both models in correct eval() mode, deterministic, no NaN weights
  • BatchNorm running stats populated (4488 batches tracked)
  • Backbone features DO differentiate inputs (cosine similarity 0.13 between fundus and noise)
  • Problem is in classification heads producing high-magnitude logits on anything

4. Clinical Triage System

How It Works

Input Image
    ↓
[Ensemble Prediction] β†’ confidence, class probabilities
[MC Dropout Γ—15]      β†’ epistemic uncertainty, aleatoric uncertainty
[Model Agreement]     β†’ ViT prediction vs EfficientNet prediction
[OOD Detection]       β†’ Mahalanobis distance (if available)
    ↓
Triage Decision

Triage Levels

Level Criteria Clinical Action
AUTO-SCREEN Confidence > 70%, low uncertainty, models agree Routine re-screening in 12 months
PRIORITY REVIEW Confidence 40-70%, OR elevated uncertainty, OR models disagree Schedule specialist review within 2 weeks
URGENT SPECIALIST Confidence < 40%, OR high uncertainty, OR models disagree on disease Refer to specialist within 48 hours
RESCAN NEEDED OOD detected Image quality issue β€” rescan required

5. Research Novelty

Novelty 1: Domain-Adversarial Retinal Screening

Problem: APTOS (Ben Graham) vs ODIR (CLAHE) preprocessing creates domain shift β†’ DR recall only 25.3% Solution: train_dann.py β€” Gradient Reversal Layer forces backbone to learn domain-invariant features Paper angle: "Domain-Invariant Retinal Disease Classification Across Heterogeneous Fundus Image Sources"

Novelty 2: Uncertainty-Guided Clinical Triage

Problem: When should AI auto-screen vs defer to human? Solution: Combine confidence + MC Dropout uncertainty + ensemble disagreement β†’ triage levels Paper angle: "Uncertainty-Aware Retinal Screening: When to Trust the AI and When to Defer"

Novelty 3: RETFound Foundation Model Transfer

Problem: ImageNet features suboptimal for retinal pathology Solution: retfound_backbone.py β€” ViT-Base pretrained on 1.6M retinal images (MAE) Paper angle: "Parameter-Efficient Transfer from Retinal Foundation Models for Small-Dataset Classification"

Novelty 4: Preprocessing-Induced Domain Shift Analysis

Problem: Nobody documented that preprocessing choices create domain shift Finding: CLAHE alone shifts Glaucoma probability by +43.2 percentage points Solution: unified_preprocessing.py β€” single pipeline for all sources Paper angle: Novel contribution β€” measurable evidence of preprocessing-induced domain shift


6. GPU Execution Plan (H200, 4 hours)

Recommended Order

Step Command Time What It Does
1 git pull 1 min Get latest code
2 python unified_preprocessing.py 15 min Rebuild cache with single CLAHE pipeline
3 python train_dann.py 90 min Domain-adversarial training (main accuracy fix)
4 python kfold_cv.py 90 min 5-fold CV for paper (confidence intervals)
5 python knowledge_distillation.py 30 min ViT→ViT-Tiny compression (if time remains)
Total ~3.25 hrs Fits in 4hr H200 window

After GPU Session

  1. Download new model files:
    • outputs_v3/dann/best_model.pth β€” DANN-trained checkpoint
    • outputs_v3/kfold/kfold_results.json β€” CV results for paper
  2. Upload new weights to HuggingFace
  3. Update app.py to load DANN model instead of original
  4. Recalibrate temperature and thresholds on calibration set
  5. Update paper with new numbers

Expected Results After DANN Training

Metric Current (Ensemble) Expected (DANN)
Overall Accuracy 74.7% 80-85%
DR Recall 25.3% 60-75%
Macro F1 0.712 0.75-0.85
Macro AUC 0.951 0.96-0.98
Domain Gap (APTOS vs ODIR) 41.2% difference <10%

7. File Inventory

Modified Files

File Changes
app.py +234 lines: ensemble, TTA, triage, fixed preprocessing, OOD fix
.gitignore Added .gradio/, test artifacts
RUN.md +305 lines: new sections for all changes

New Files

File Lines Category
train_dann.py 1,257 Training β€” Domain-Adversarial Network
prepare_datasets.py 1,606 Training β€” Dataset expansion (5 datasets)
enhanced_augmentation.py 677 Training β€” CutMix, elastic deform, oversampling
retfound_backbone.py 459 Training β€” RETFound foundation model
unified_preprocessing.py 214 Training β€” Unified CLAHE cache rebuild
ARCHITECTURE_DOCUMENT.md β€” Documentation
FINAL_COMPREHENSIVE_REPORT.md β€” Documentation
FUNCTIONAL_DOCUMENT.md β€” Documentation
FUNCTIONAL_TEST_CASE_DOCUMENT.md β€” Documentation
IEEE_RESEARCH_PAPER.md β€” Documentation
SPRINT_RETROSPECTIVE.md β€” Documentation

Published To


8. How to Resume Next Session

cd ~/Desktop/retinal\ eye\ diesease/retina-sense

# Check current state
git log --oneline -5
python -c "import torch; print('CUDA:', torch.cuda.is_available())"

# If on GPU server:
git pull                               # get latest code
python unified_preprocessing.py        # rebuild cache (step 1)
python train_dann.py                   # DANN training (step 2)
python kfold_cv.py                     # K-fold CV (step 3)

# If on local machine:
python app.py                          # launch Gradio demo
# App runs at http://localhost:7860 with ensemble + TTA + triage