Add FUNCTIONAL_DOCUMENT.md
FUNCTIONAL_DOCUMENT.md (ADDED, +272 -0)
@@ -0,0 +1,272 @@
# RetinaSense-ViT: Functional Document - Module Description

**Version:** 1.0
**Date:** March 10, 2026
**Author:** Tanishq
**Status:** Production Ready

---

## 1. System Overview

RetinaSense-ViT is a multi-class retinal disease classification system that analyzes fundus images and assigns each one to one of five classes: **Normal, Diabetic Retinopathy, Glaucoma, Cataract, and AMD**. The system is organized into the following functional modules:

---

## 2. Module Map

```
┌─────────────────────────────────────────────────────────────────┐
│                     RetinaSense-ViT System                      │
│                                                                 │
│  M1: Data          M2: Preprocessing     M3: Model              │
│  Ingestion         Pipeline              Architecture           │
│                                                                 │
│  M4: Training      M5: Threshold         M6: Inference          │
│  Engine            Optimization          Pipeline               │
│                                                                 │
│  M7: Ensemble      M8: Evaluation        M9: Data               │
│  System            & Visualization       Analysis               │
└─────────────────────────────────────────────────────────────────┘
```

---

## 3. Module Descriptions

### M1: Data Ingestion Module

**Purpose:** Load, validate, and unify data from the ODIR-5K and APTOS-2019 datasets.

| Attribute | Detail |
|-----------|--------|
| **Input** | ODIR images (512×512), APTOS images (~1949×1500), metadata CSVs |
| **Output** | `combined_dataset.csv` with 8,540 entries (path, disease_label, severity_label) |
| **Key Files** | `final_unified_metadata.csv`, `data/combined_dataset.csv` |
| **Functions** | Path cleaning (remove `./` prefixes), single-disease filtering, stratified train/val split (80/20) |

**Key Logic:**
- APTOS images are exclusively DR class with 5-level severity grading
- ODIR images span all 5 disease classes; multi-disease samples are filtered out
- Paths are normalized for cross-platform compatibility
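
The ingestion functions listed above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the helper names (`clean_path`, `is_single_disease`, `stratified_split`), the record layout, and the seed are assumptions made for the example.

```python
import random
from collections import defaultdict


def clean_path(path):
    """Drop leading './' prefixes and use forward slashes on every platform."""
    path = path.replace("\\", "/")
    while path.startswith("./"):
        path = path[2:]
    return path


def is_single_disease(labels):
    """Keep only samples annotated with exactly one disease class."""
    return len(labels) == 1


def stratified_split(rows, val_fraction=0.2, seed=42):
    """Split rows into train/val so each class keeps roughly the same ratio."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row["disease_label"]].append(row)

    rng = random.Random(seed)  # hypothetical seed, fixed for reproducibility
    train, val = [], []
    for label in sorted(by_class):
        group = by_class[label]
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


# Toy records standing in for the merged ODIR/APTOS metadata.
raw = (
    [{"path": f"./images/{i}.jpg", "labels": ["DR"]} for i in range(10)]
    + [{"path": f"./images/{i}.jpg", "labels": ["Normal"]} for i in range(10, 15)]
    + [{"path": "./images/99.jpg", "labels": ["DR", "Cataract"]}]  # multi-disease
)

rows = [
    {"path": clean_path(r["path"]), "disease_label": r["labels"][0]}
    for r in raw
    if is_single_disease(r["labels"])
]
train, val = stratified_split(rows)
print(len(train), len(val))
```

Splitting per class rather than over the whole table is what keeps rare classes represented in the validation set, which matters given the class imbalance described later in this document.
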

---

### M2: Preprocessing Pipeline Module

**Purpose:** Apply Ben Graham contrast enhancement and caching to prepare images for model input.

| Attribute | Detail |
|-----------|--------|
| **Input** | Raw fundus image (any resolution) |
| **Output** | Normalized tensor (3×224×224) |
| **Key Files** | All training scripts (`ben_graham_preprocess` function) |
| **Dependencies** | OpenCV, NumPy, torchvision transforms |

**Functional Steps:**
1. **Resize** to target resolution (224×224 for ViT, 300×300 for EfficientNet)
2. **Ben Graham Enhancement:** `4×img - 4×GaussianBlur(σ=10) + 128`
3. **Circular Mask** application (radius = 0.48 × image_size)
4. **Caching:** Pre-compute and store as `.npy` files (one-time; ~60s for 8,540 images)
5. **Augmentation** (training only): flip, rotate, affine, color jitter, random erasing
6. **ImageNet Normalization:** μ=[0.485,0.456,0.406], σ=[0.229,0.224,0.225]
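
Steps 2 and 3 come down to simple per-pixel arithmetic. The sketch below works on a single-channel nested list so that it runs without OpenCV; the blurred image is passed in as a stand-in for the Gaussian blur, and the function names and the mask-center convention are assumptions for illustration, not the contents of `ben_graham_preprocess`.

```python
def ben_graham_blend(pixel, blurred_pixel):
    """Apply 4*img - 4*blur + 128 to one channel value and clip to [0, 255]."""
    value = 4 * pixel - 4 * blurred_pixel + 128
    return max(0, min(255, value))


def circular_mask(size, radius_fraction=0.48):
    """Return a size x size grid of 0/1 flags; 1 marks pixels inside the circle."""
    center = (size - 1) / 2  # assumed center convention
    radius = radius_fraction * size
    return [
        [1 if (x - center) ** 2 + (y - center) ** 2 <= radius ** 2 else 0
         for x in range(size)]
        for y in range(size)
    ]


def enhance(image, blurred):
    """Blend image with its blurred copy, then zero out pixels outside the mask."""
    size = len(image)
    mask = circular_mask(size)
    return [
        [ben_graham_blend(image[y][x], blurred[y][x]) * mask[y][x]
         for x in range(size)]
        for y in range(size)
    ]


# 8x8 toy image: one slightly brighter pixel on a flat background.
size = 8
image = [[100] * size for _ in range(size)]
image[4][4] = 120
blurred = [[100] * size for _ in range(size)]  # stand-in for a Gaussian blur

result = enhance(image, blurred)
print(result[4][4], result[3][3], result[0][0])
```

Subtracting the blurred copy removes slow illumination gradients, the factor of 4 amplifies the remaining local detail, and the offset of 128 re-centers flat regions at mid-gray. Corner pixels fall outside the 0.48 radius and are set to zero.
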

---

### M3: Model Architecture Module

**Purpose:** Define the neural network architectures for disease classification.

#### M3.1: ViT-Base-Patch16-224 (Production Model)

| Attribute | Detail |
|-----------|--------|
| **Backbone** | ViT-Base-Patch16-224 (timm, pre-trained on ImageNet) |
| **Parameters** | ~86M |
| **Feature Dim** | 768 |
| **Disease Head** | 768→512→256→5 (BatchNorm, ReLU, Dropout) |
| **Severity Head** | 768→256→5 (BatchNorm, ReLU, Dropout) |
| **Model Size** | 331 MB |
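
To put the two heads in proportion to the backbone, their parameter counts can be worked out from the layer widths in the table. The calculation below assumes one BatchNorm layer after each hidden linear layer and none after the output layer; that ordering is a guess based on the table, and the helper names are illustrative.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """Learnable scale and shift of a BatchNorm layer."""
    return 2 * n_features


def head_params(dims):
    """Parameters of an MLP head with BatchNorm after every hidden layer."""
    total = 0
    for i, (n_in, n_out) in enumerate(zip(dims, dims[1:])):
        total += linear_params(n_in, n_out)
        is_hidden = i < len(dims) - 2  # the last layer produces the logits
        if is_hidden:
            total += batchnorm_params(n_out)
    return total


disease_head = head_params([768, 512, 256, 5])
severity_head = head_params([768, 256, 5])
print(disease_head, severity_head, disease_head + severity_head)
```

Under these assumptions the two heads together hold well under 1% of the ~86M parameters, so almost all of the model's capacity and size sits in the pre-trained backbone.
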

**Why ViT Excels:**
- Global self-attention captures vessel patterns across the entire fundus
- Position encoding preserves spatial relationships (optic disc, macula location)
- Handles APTOS/ODIR domain shift better than CNNs (less texture-dependent)
- Superior on minority classes: Glaucoma +144%, AMD +199% over CNN baseline

#### M3.2: EfficientNet-B3 (Backup Model)

| Attribute | Detail |
|-----------|--------|
| **Backbone** | EfficientNet-B3 (timm, pre-trained on ImageNet) |
| **Parameters** | ~12M |
| **Feature Dim** | 1,536 |
| **Model Size** | 47 MB |

---

### M4: Training Engine Module

**Purpose:** Train the model with class-imbalance-aware strategies and GPU optimization.

| Attribute | Detail |
|-----------|--------|
| **Key Files** | `retinasense_vit.py`, `retinasense_v2_extended.py`, `retinasense_v2.py` |
| **Loss Function** | Focal Loss (γ=1.0, α=class_weights) + 0.2×CE (severity) |
| **Optimizer** | AdamW (lr=3×10⁻⁴) |
| **Scheduler** | Cosine Annealing (T_max=30, η_min=1×10⁻⁷) |
| **Mixed Precision** | AMP with GradScaler |
| **Gradient Accumulation** | 2 steps (effective batch=64) |
| **Early Stopping** | Patience=10 on macro F1 |
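
The loss and scheduler rows can be written out as plain formulas. The sketch below computes the per-sample loss and the cosine learning-rate curve with the standard library only; the class weights in the example are made up, and the function names are illustrative rather than taken from the training scripts.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, class_weights, gamma=1.0):
    """Focal loss for one sample: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p_t = softmax(logits)[target]
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(logits, target):
    """Standard cross-entropy for one sample."""
    return -math.log(softmax(logits)[target])


def total_loss(disease_logits, disease_target, severity_logits, severity_target,
               class_weights, gamma=1.0, severity_weight=0.2):
    """Disease focal loss plus down-weighted severity cross-entropy."""
    return (focal_loss(disease_logits, disease_target, class_weights, gamma)
            + severity_weight * cross_entropy(severity_logits, severity_target))


def cosine_lr(epoch, base_lr=3e-4, eta_min=1e-7, t_max=30):
    """Cosine annealing: decay from base_lr at epoch 0 to eta_min at t_max."""
    cosine = math.cos(math.pi * epoch / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + cosine)


weights = [1.0, 0.5, 3.0, 3.0, 4.0]  # hypothetical per-class alpha values
easy = focal_loss([5.0, 0.0, 0.0, 0.0, 0.0], 0, weights)
hard = focal_loss([0.5, 0.4, 0.0, 0.0, 0.0], 0, weights)
print(easy < hard, cosine_lr(0), cosine_lr(15), cosine_lr(30))
```

The `(1 - p_t)^gamma` factor shrinks the loss on samples the model already classifies confidently, so gradient updates are dominated by hard and minority-class examples. With γ=0 and unit weights the expression reduces to ordinary cross-entropy.
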

**GPU Optimization Features:**
- Pre-cached preprocessing (100× faster data loading)
- Batch size scaling (32→128 for raw speed, 64 recommended for stability)
- 8 DataLoader workers with persistent_workers and prefetch_factor=2
- Non-blocking GPU transfers

**Training Duration:**
- ViT: ~6 minutes (30 epochs on H200)
- EfficientNet-B3 Extended: ~15 minutes (50 epochs)

---

### M5: Threshold Optimization Module

**Purpose:** Post-training optimization of per-class decision thresholds to maximize F1 score.

| Attribute | Detail |
|-----------|--------|
| **Key Files** | `threshold_optimization_vit.py`, `threshold_optimization_simple.py` |
| **Method** | Grid search (0.05–0.95, step 0.05) per class, one-vs-rest binary F1 |
| **Input** | Model softmax probabilities on validation set |
| **Output** | JSON file with optimal thresholds per class |
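
The method row describes a small, exhaustive search. A minimal sketch follows, assuming validation probabilities are a list of per-class probability lists and labels are integer class indices; the function names, the tie-breaking rule, and the toy data are illustrative and not taken from `threshold_optimization_vit.py`.

```python
import json


def binary_f1(y_true, y_pred):
    """F1 score for boolean ground truth and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(probs, labels, class_index):
    """Grid-search 0.05..0.95 (step 0.05) for the best one-vs-rest F1."""
    y_true = [label == class_index for label in labels]
    candidates = [round(0.05 * k, 2) for k in range(1, 20)]
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        y_pred = [p[class_index] >= t for p in probs]
        score = binary_f1(y_true, y_pred)
        if score > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, score
    return best_t, best_f1


def optimize_thresholds(probs, labels, class_names):
    """Return {class name: best threshold} for every class."""
    return {
        name: best_threshold(probs, labels, idx)[0]
        for idx, name in enumerate(class_names)
    }


# Two-class toy validation set.
probs = [[0.9, 0.1], [0.7, 0.3], [0.6, 0.4], [0.35, 0.65], [0.2, 0.8]]
labels = [0, 0, 1, 1, 1]
thresholds = optimize_thresholds(probs, labels, ["Normal", "DR"])
print(json.dumps(thresholds))
```

Each class is tuned independently against "this class versus everything else", which is why the resulting thresholds do not need to sum to one and can differ widely between common and rare classes.
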

**Optimal Thresholds (ViT, Accuracy-focused):**

| Class | Threshold | Clinical Rationale |
|-------|-----------|-------------------|
| Normal | 0.540 | Balanced |
| Diabetes/DR | 0.240 | Lenient: high sensitivity (catch all DR) |
| Glaucoma | 0.810 | Strict: high specificity (require confidence) |
| Cataract | 0.930 | Very strict: minimize false positives |
| AMD | 0.850 | Strict: rare disease, need confidence |

**Impact:** +2.22 percentage points of accuracy for ViT (82.26→84.48%); +9.84 percentage points for the v2 baseline (63.52→73.36%).

---

### M6: Inference Pipeline Module

**Purpose:** Classify new fundus images using the trained model and optimized thresholds.

| Attribute | Detail |
|-----------|--------|
| **Key Files** | `RetinaSense_Production.ipynb` |
| **Latency** | ~15ms per image |
| **Throughput** | ~66 images/sec |
| **GPU Memory** | ~2 GB |

**Inference Flow:**
1. Load and preprocess image (Ben Graham)
2. Forward pass through ViT → disease logits + severity logits
3. Apply softmax → class probabilities
4. Apply per-class thresholds → final prediction
5. If confidence < threshold for all classes → flag for expert review
6. Return: class label, confidence score, all probabilities
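
Steps 3 to 6 of the flow above can be sketched as a single decision function. The thresholds are the ViT values from the M5 table; how ties between several passing classes are resolved is not specified in this document, so the rule used here (highest probability among the classes that clear their threshold) is an assumption, as are the function and field names.

```python
import math

CLASS_NAMES = ["Normal", "Diabetes/DR", "Glaucoma", "Cataract", "AMD"]
THRESHOLDS = [0.540, 0.240, 0.810, 0.930, 0.850]  # ViT values from the M5 table


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(disease_logits, thresholds=THRESHOLDS):
    """Turn disease logits into a label, or flag the image for expert review."""
    probs = softmax(disease_logits)
    passing = [i for i, p in enumerate(probs) if p >= thresholds[i]]
    if not passing:
        # No class clears its threshold: defer to a human reader.
        return {"label": None, "confidence": max(probs),
                "probabilities": probs, "needs_review": True}
    best = max(passing, key=lambda i: probs[i])  # assumed tie-breaking rule
    return {"label": CLASS_NAMES[best], "confidence": probs[best],
            "probabilities": probs, "needs_review": False}


confident = decide([0.0, 3.0, 0.0, 0.0, 0.0])
uncertain = decide([1.0, 0.0, 1.0, 1.0, 1.0])
print(confident["label"], uncertain["needs_review"])
```

Because the DR threshold is low and the Cataract threshold is high, the same probability can be enough to report DR but not enough to report Cataract, which is the asymmetry the clinical rationale column describes.
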

---

### M7: Ensemble System Module

**Purpose:** Combine predictions from multiple models for improved minority class detection.

| Attribute | Detail |
|-----------|--------|
| **Key Files** | `ensemble_inference.py` |
| **Models** | ViT (85%), EfficientNet-Extended (10%), EfficientNet-v2 (5%) |
| **Strategy** | Weighted probability averaging |
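
Weighted probability averaging with the weights from the table amounts to a few lines. In this sketch the dictionary keys and the function name are placeholders, and the per-model probabilities are invented to show the arithmetic; it is not the code in `ensemble_inference.py`.

```python
# Model weights from the table above; the key names are placeholders.
WEIGHTS = {"vit": 0.85, "effnet_extended": 0.10, "effnet_v2": 0.05}


def ensemble_probabilities(model_probs, weights=WEIGHTS):
    """Weighted average of per-model class probabilities."""
    n_classes = len(next(iter(model_probs.values())))
    total_weight = sum(weights[name] for name in model_probs)
    return [
        sum(weights[name] * probs[c] for name, probs in model_probs.items())
        / total_weight
        for c in range(n_classes)
    ]


# Invented probabilities for one image across the five classes.
model_probs = {
    "vit":             [0.6, 0.4, 0.0, 0.0, 0.0],
    "effnet_extended": [0.2, 0.8, 0.0, 0.0, 0.0],
    "effnet_v2":       [0.0, 1.0, 0.0, 0.0, 0.0],
}
combined = ensemble_probabilities(model_probs)
print([round(p, 3) for p in combined])
```

Dividing by the total weight keeps the output a valid probability distribution even if one model is left out of a given call.
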

**Performance Trade-off:**
- Ensemble: 80.44% accuracy, 0.858 macro F1, Cataract F1=0.952, AMD F1=0.920
- ViT Solo: 84.48% accuracy, 0.840 macro F1 (simpler, faster, recommended)

---

### M8: Evaluation & Visualization Module

**Purpose:** Comprehensive model evaluation with per-class metrics and visual dashboards.

| Attribute | Detail |
|-----------|--------|
| **Key Files** | Training scripts (eval sections), `RetinaSense_Production.ipynb` |
| **Primary Metrics** | Macro F1, accuracy, per-class F1/precision/recall |
| **Secondary Metrics** | Weighted F1, Macro AUC-ROC, confusion matrix |
| **Outputs** | `dashboard.png`, `threshold_comparison.png`, `training_curves.png` |

**Why Macro F1 (not accuracy):** Accuracy is misleading with 21:1 class imbalance (65% accuracy by always predicting DR). Macro F1 treats all classes equally.
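
A small worked example makes the gap concrete. The class counts below are hypothetical and chosen only so that DR makes up 65% of 100 samples; the classifier is the degenerate one that always predicts DR.

```python
def per_class_f1(y_true, y_pred, label):
    """One-vs-rest F1 for a single class label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_f1(y_true, y_pred, c) for c in labels) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical class counts; only the 65% DR share comes from the text above.
counts = {"DR": 65, "Normal": 23, "Glaucoma": 5, "Cataract": 4, "AMD": 3}
y_true = [label for label, n in counts.items() for _ in range(n)]
y_pred = ["DR"] * len(y_true)  # always predict the majority class

acc = accuracy(y_true, y_pred)
macro = macro_f1(y_true, y_pred, list(counts))
print(acc, round(macro, 3))
```

The always-DR classifier scores 65% accuracy but gets an F1 of zero on four of the five classes, so its macro F1 collapses. That is the failure mode macro F1 is meant to expose.
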

---

### M9: Data Analysis Module

**Purpose:** Comprehensive dataset exploration to inform training strategy.

| Attribute | Detail |
|-----------|--------|
| **Key Files** | `data_analysis.py` |
| **Outputs** | `outputs_analysis/` (11 files: plots, reports, CSVs) |

**Analyses Performed:**
1. **Class distribution**: confirmed 21.1× imbalance
2. **Image quality metrics**: brightness, contrast, sharpness per class
3. **APTOS domain shift discovery**: 10.7× sharpness difference vs ODIR
4. **Error analysis**: most-confused class pairs (DR↔Normal, Normal↔AMD)
5. **Augmentation effectiveness**: light augmentation best during warmup
6. **Preprocessing impact**: Ben Graham boosts Glaucoma brightness most (+34.2)

---

## 4. Module Interaction Matrix

| From \ To | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 |
|-----------|----|----|----|----|----|----|----|----|-----|
| **M1** Data | - | ✓ | | | | | | | ✓ |
| **M2** Preprocess | | - | | ✓ | | ✓ | | | |
| **M3** Model | | | - | ✓ | | ✓ | ✓ | | |
| **M4** Training | | | ✓ | - | ✓ | | | ✓ | |
| **M5** Threshold | | | | | - | ✓ | ✓ | ✓ | |
| **M6** Inference | | ✓ | ✓ | | ✓ | - | | | |
| **M7** Ensemble | | | ✓ | | ✓ | | - | ✓ | |
| **M8** Evaluation | | | | | | | | - | |
| **M9** Analysis | ✓ | | | | | | | | - |

---

## 5. Test-Time Augmentation (TTA) Sub-Module

**Purpose:** Improve predictions by averaging over augmented versions of the input.

**8 Augmentations:** Original, H-flip, V-flip, Both flips, Rot 90°, Rot 180°, Rot 270°, Brightness
**Impact:** +0.29% accuracy (modest; optional for production)
**Trade-off:** 8× slower inference
**Recommendation:** Use selectively for uncertain cases (confidence < threshold)
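
The averaging step can be illustrated on a tiny nested-list image with a stand-in model. Everything in this sketch is an assumption for illustration: the helper names, the brightness factor of 1.1, and the toy model; a real pipeline would apply the same idea to tensors and the trained ViT.

```python
def hflip(img):
    """Mirror left to right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror top to bottom."""
    return img[::-1]


def rot90(img):
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def brighten(img, factor=1.1):
    """Scale pixel values and clip at 255 (factor is a made-up example value)."""
    return [[min(255, round(v * factor)) for v in row] for row in img]


def tta_views(img):
    """The eight views listed above, in the same order."""
    return [
        img, hflip(img), vflip(img), hflip(vflip(img)),
        rot90(img), rot90(rot90(img)), rot90(rot90(rot90(img))),
        brighten(img),
    ]


def predict_with_tta(model, img):
    """Average the model's class probabilities over all views."""
    outputs = [model(view) for view in tta_views(img)]
    n_classes = len(outputs[0])
    return [sum(o[c] for o in outputs) / len(outputs) for c in range(n_classes)]


def toy_model(view):
    """Stand-in for the network: depends only on the top-left pixel."""
    p = view[0][0] / 10
    return [p, 1 - p]


image = [[1, 2], [3, 4]]
print(predict_with_tta(toy_model, image))
```

One detail worth noting: flipping both axes produces the same image as a 180° rotation, so a list like the one above contains seven distinct views, with that view counted twice in the average.
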

---

## 6. Configuration Parameters

| Parameter | Default | Range | Notes |
|-----------|---------|-------|-------|
| `IMG_SIZE` | 224 | 224–300 | 224 for ViT, 300 for EfficientNet |
| `BATCH_SIZE` | 32 | 16–128 | 64 recommended for stability |
| `NUM_WORKERS` | 8 | 0–16 | Match to CPU cores |
| `USE_CACHE` | True | True/False | 4× speedup when True |
| `EPOCHS` | 30 | 10–100 | ViT converges by 30 |
| `ACCUM_STEPS` | 2 | 1–8 | Gradient accumulation factor |
| `PATIENCE` | 10 | 5–15 | Early stopping on macro F1 |
| `FOCAL_GAMMA` | 1.0 | 0.5–3.0 | Focusing parameter for class imbalance |
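
One way to keep these defaults and ranges in a single place is a small validated config object. The class below is a hypothetical sketch (the training scripts may simply use module-level constants); only the default values and ranges are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the parameters in the table above."""
    img_size: int = 224
    batch_size: int = 32
    num_workers: int = 8
    use_cache: bool = True
    epochs: int = 30
    accum_steps: int = 2
    patience: int = 10
    focal_gamma: float = 1.0

    def __post_init__(self):
        # Allowed ranges copied from the table; checked once at construction.
        ranges = {
            "img_size": (224, 300),
            "batch_size": (16, 128),
            "num_workers": (0, 16),
            "epochs": (10, 100),
            "accum_steps": (1, 8),
            "patience": (5, 15),
            "focal_gamma": (0.5, 3.0),
        }
        for name, (low, high) in ranges.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")

    @property
    def effective_batch_size(self):
        """Batch size seen by the optimizer after gradient accumulation."""
        return self.batch_size * self.accum_steps


config = TrainConfig()
print(config.effective_batch_size)
```

With the defaults, a batch size of 32 and 2 accumulation steps give the effective batch of 64 quoted in the M4 table.
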

---

*Document Version: 1.0 | Last Updated: March 10, 2026*