tanishq74 committed · Commit a23112d · verified · 1 Parent(s): b87a70f

Add FUNCTIONAL_DOCUMENT.md

Files changed (1)
  1. FUNCTIONAL_DOCUMENT.md +272 -0
FUNCTIONAL_DOCUMENT.md ADDED
@@ -0,0 +1,272 @@
# RetinaSense-ViT: Functional Document (Module Description)

**Version:** 1.0
**Date:** March 10, 2026
**Author:** Tanishq
**Status:** Production Ready

---

## 1. System Overview

RetinaSense-ViT is a multi-class retinal disease classification system that analyzes fundus images and assigns each to one of five classes: **Normal, Diabetic Retinopathy (DR), Glaucoma, Cataract, and AMD**. The system is organized into nine functional modules (M1–M9), mapped in Section 2 and described in Section 3.

---

## 2. Module Map

```
┌─────────────────────────────────────────────────────────────┐
│                   RetinaSense-ViT System                    │
│                                                             │
│  M1: Data            M2: Preprocessing    M3: Model         │
│      Ingestion           Pipeline             Architecture  │
│                                                             │
│  M4: Training        M5: Threshold        M6: Inference     │
│      Engine              Optimization         Pipeline      │
│                                                             │
│  M7: Ensemble        M8: Evaluation       M9: Data          │
│      System              & Visualization      Analysis      │
└─────────────────────────────────────────────────────────────┘
```

---

## 3. Module Descriptions

### M1: Data Ingestion Module

**Purpose:** Load, validate, and unify data from the ODIR-5K and APTOS-2019 datasets.

| Attribute | Detail |
|-----------|--------|
| **Input** | ODIR images (512×512), APTOS images (~1949×1500), metadata CSVs |
| **Output** | `combined_dataset.csv` with 8,540 entries (path, disease_label, severity_label) |
| **Key Files** | `final_unified_metadata.csv`, `data/combined_dataset.csv` |
| **Functions** | Path cleaning (remove `./` prefixes), single-disease filtering, stratified train/val split (80/20) |

**Key Logic:**
- APTOS images are exclusively DR class, each with a 5-level severity grade
- ODIR images span all 5 disease classes; multi-disease samples are filtered out
- Paths are normalized for cross-platform compatibility

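
The path cleaning and stratified 80/20 split described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's actual ingestion code: the helper names `clean_path` and `stratified_split`, the seed, and the toy rows are all hypothetical, and the single-disease filter is omitted because the metadata columns it relies on are not shown here.

```python
import random
from collections import defaultdict

def clean_path(path: str) -> str:
    """Use forward slashes everywhere and strip leading './' prefixes."""
    path = path.replace("\\", "/")
    while path.startswith("./"):
        path = path[2:]
    return path

def stratified_split(rows, label_key="disease_label", val_frac=0.2, seed=42):
    """Split rows so each class keeps roughly the same share in train and val."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)
        n_val = max(1, round(len(items) * val_frac))  # at least one val sample per class
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val

# Toy metadata (hypothetical): 10 DR rows and 5 Normal rows.
rows = [{"path": f"./aptos/{i}.png", "disease_label": "DR"} for i in range(10)]
rows += [{"path": f".\\odir\\{i}.jpg", "disease_label": "Normal"} for i in range(5)]
for row in rows:
    row["path"] = clean_path(row["path"])

train_rows, val_rows = stratified_split(rows)
print(len(train_rows), len(val_rows))
```

Splitting per class, rather than shuffling the whole table once, is what keeps rare classes represented in the validation set.
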

---

### M2: Preprocessing Pipeline Module

**Purpose:** Apply Ben Graham contrast enhancement and cache the results to prepare images for model input.

| Attribute | Detail |
|-----------|--------|
| **Input** | Raw fundus image (any resolution) |
| **Output** | Normalized tensor (3×224×224 for ViT) |
| **Key Files** | All training scripts (`ben_graham_preprocess` function) |
| **Dependencies** | OpenCV, NumPy, torchvision transforms |

**Functional Steps:**
1. **Resize** to target resolution (224×224 for ViT, 300×300 for EfficientNet)
2. **Ben Graham Enhancement:** `4×img − 4×GaussianBlur(σ=10) + 128`
3. **Circular Mask** application (radius = 0.48 × image_size)
4. **Caching:** Pre-compute and store as `.npy` files (one-time; ~60s for 8,540 images)
5. **Augmentation** (training only): flip, rotate, affine, color jitter, random erasing
6. **ImageNet Normalization:** μ=[0.485, 0.456, 0.406], σ=[0.229, 0.224, 0.225]

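
Steps 2, 3 and 6 above reduce to per-pixel arithmetic, which the following pure-Python sketch makes explicit on a tiny grayscale grid. It is not the project's `ben_graham_preprocess` function: the blurred image is passed in as an argument (the real pipeline presumably obtains it from OpenCV's Gaussian blur with σ=10), the helper names are hypothetical, and clipping to the 0–255 range is an assumption.

```python
def ben_graham_pixel(value: float, blurred: float) -> int:
    """4*img - 4*blur + 128, clipped to the 8-bit range (clipping is assumed)."""
    return int(min(255, max(0, round(4 * value - 4 * blurred + 128))))

def circular_mask(size: int, radius_frac: float = 0.48):
    """size x size grid of booleans: True inside a centred circle."""
    centre = (size - 1) / 2
    radius_sq = (radius_frac * size) ** 2
    return [[(x - centre) ** 2 + (y - centre) ** 2 <= radius_sq
             for x in range(size)] for y in range(size)]

def enhance(image, blurred, radius_frac: float = 0.48):
    """Enhance every pixel, then zero everything outside the circular mask."""
    size = len(image)
    mask = circular_mask(size, radius_frac)
    return [[ben_graham_pixel(image[y][x], blurred[y][x]) if mask[y][x] else 0
             for x in range(size)] for y in range(size)]

def imagenet_normalize(rgb, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Scale 8-bit RGB to [0, 1], then standardise with the ImageNet statistics."""
    return tuple((c / 255 - m) / s for c, m, s in zip(rgb, mean, std))

# A flat 8x8 image equals its own blur, so every unmasked pixel maps to 128.
flat = [[100] * 8 for _ in range(8)]
out = enhance(flat, flat)
print(out[4][4], out[0][0])
```

Because the blur is subtracted, slowly varying illumination cancels and only local detail (vessels, lesions) moves pixels away from the 128 mid-grey.
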

---

### M3: Model Architecture Module

**Purpose:** Define the neural network architectures for disease classification.

#### M3.1: ViT-Base-Patch16-224 (Production Model)

| Attribute | Detail |
|-----------|--------|
| **Backbone** | ViT-Base-Patch16-224 (timm, pre-trained on ImageNet) |
| **Parameters** | ~86M |
| **Feature Dim** | 768 |
| **Disease Head** | 768→512→256→5 (BatchNorm, ReLU, Dropout) |
| **Severity Head** | 768→256→5 (BatchNorm, ReLU, Dropout) |
| **Model Size** | 331 MB |

**Why ViT Excels:**
- Global self-attention captures vessel patterns across the entire fundus
- Position encoding preserves spatial relationships (optic disc, macula location)
- Handles APTOS/ODIR domain shift better than CNNs (less texture-dependent)
- Superior on minority classes: Glaucoma +144%, AMD +199% over CNN baseline

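
The figures in the M3.1 table can be sanity-checked with a little arithmetic. The sketch below counts only the fully connected layers of the two heads (BatchNorm parameters are left out, since their exact placement is not specified here) and assumes 32-bit weights for the size estimate; `linear_params` is a hypothetical helper.

```python
def linear_params(dims):
    """Weights plus biases for a chain of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(dims, dims[1:]))

patches_per_side = 224 // 16                  # 16x16 patches on a 224x224 input
num_tokens = patches_per_side ** 2 + 1        # 196 patch tokens + 1 class token

disease_head = linear_params([768, 512, 256, 5])
severity_head = linear_params([768, 256, 5])

fp32_size_mb = 86e6 * 4 / 1e6                 # ~86M parameters at 4 bytes each

print(num_tokens, disease_head, severity_head, round(fp32_size_mb))
```

Both heads together add well under one million parameters, so the backbone dominates the model size. The fp32 estimate comes to roughly 344 MB in decimal units (about 328 MiB), in the same ballpark as the 331 MB listed above.
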

#### M3.2: EfficientNet-B3 (Backup Model)

| Attribute | Detail |
|-----------|--------|
| **Backbone** | EfficientNet-B3 (timm, pre-trained on ImageNet) |
| **Parameters** | ~12M |
| **Feature Dim** | 1,536 |
| **Model Size** | 47 MB |

---

### M4: Training Engine Module

**Purpose:** Train the model with class-imbalance-aware strategies and GPU optimization.

| Attribute | Detail |
|-----------|--------|
| **Key Files** | `retinasense_vit.py`, `retinasense_v2_extended.py`, `retinasense_v2.py` |
| **Loss Function** | Focal Loss (γ=1.0, α=class_weights) + 0.2×CE (severity) |
| **Optimizer** | AdamW (lr=3×10⁻⁴) |
| **Scheduler** | Cosine Annealing (T_max=30, η_min=1×10⁻⁷) |
| **Mixed Precision** | AMP with GradScaler |
| **Gradient Accumulation** | 2 steps (effective batch=64) |
| **Early Stopping** | Patience=10 on macro F1 |

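
The loss row in the table above combines a class-weighted focal loss on the disease head with a down-weighted cross-entropy on the severity head. A minimal single-sample sketch of that combination follows. It uses the standard focal loss form, −α_t (1 − p_t)^γ log p_t; the training scripts may differ in reduction or detail, and the class weights shown are hypothetical placeholders.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Standard cross-entropy for one sample."""
    return -math.log(softmax(logits)[target])

def focal_loss(logits, target, alpha, gamma=1.0):
    """-alpha_t * (1 - p_t)^gamma * log(p_t): easy samples are down-weighted."""
    p_t = softmax(logits)[target]
    return -alpha[target] * (1.0 - p_t) ** gamma * math.log(p_t)

def total_loss(disease_logits, disease_target, severity_logits, severity_target,
               alpha, gamma=1.0, severity_weight=0.2):
    """Focal loss on the disease head plus 0.2 x CE on the severity head."""
    return (focal_loss(disease_logits, disease_target, alpha, gamma)
            + severity_weight * cross_entropy(severity_logits, severity_target))

alpha = [1.0, 0.5, 2.0, 2.0, 3.0]   # hypothetical per-class weights
uniform = [0.0] * 5                 # an untrained model: p = 0.2 for every class
loss = total_loss(uniform, 2, uniform, 0, alpha)
print(round(loss, 4))
```

With γ=0 and unit class weights the focal term reduces to plain cross-entropy, which is a convenient check when changing `FOCAL_GAMMA`.
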

**GPU Optimization Features:**
- Pre-cached preprocessing (100× faster data loading)
- Batch size scaling (32→128 for raw speed, 64 recommended for stability)
- 8 DataLoader workers with `persistent_workers` and `prefetch_factor=2`
- Non-blocking GPU transfers

**Training Duration:**
- ViT: ~6 minutes (30 epochs on H200)
- EfficientNet-B3 Extended: ~15 minutes (50 epochs)

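
For the scheduler and gradient-accumulation settings listed in the M4 table, the learning rate at each epoch and the effective batch size can be worked out directly. The sketch below uses the usual closed form for cosine annealing, η_t = η_min + ½(η_max − η_min)(1 + cos(π·t / T_max)); the function name `cosine_lr` is hypothetical and per-epoch stepping is assumed.

```python
import math

def cosine_lr(epoch, base_lr=3e-4, eta_min=1e-7, t_max=30):
    """Closed-form cosine annealing from base_lr down to eta_min over t_max epochs."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

schedule = [cosine_lr(epoch) for epoch in range(31)]

# Two accumulation steps at batch size 32 behave like one step at batch size 64.
effective_batch = 32 * 2

for epoch in (0, 15, 30):
    print(epoch, f"{schedule[epoch]:.3e}")
print(effective_batch)
```

The rate starts at 3×10⁻⁴, passes the midpoint at epoch 15, and reaches η_min exactly at epoch 30, which matches the 30-epoch ViT run.
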

---

### M5: Threshold Optimization Module

**Purpose:** Post-training optimization of per-class decision thresholds to maximize F1 score.

| Attribute | Detail |
|-----------|--------|
| **Key Files** | `threshold_optimization_vit.py`, `threshold_optimization_simple.py` |
| **Method** | Grid search (0.05–0.95, step 0.05) per class, one-vs-rest binary F1 |
| **Input** | Model softmax probabilities on validation set |
| **Output** | JSON file with optimal thresholds per class |

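
The grid search in the table above treats each class as its own binary problem (that class versus the rest) and keeps the threshold with the best F1. A minimal sketch under stated assumptions: the function names and toy data are hypothetical, the step is a parameter rather than a fixed value, and ties are resolved in favour of the lowest threshold.

```python
def binary_f1(y_true, y_pred):
    """F1 for boolean labels and predictions; 0.0 when nothing is positive."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_threshold(probs, labels, cls, start=0.05, stop=0.95, step=0.05):
    """Grid-search one class's decision threshold, one-vs-rest."""
    y_true = [label == cls for label in labels]
    n_steps = round((stop - start) / step)
    best_t, best_f1 = start, -1.0
    for k in range(n_steps + 1):
        t = round(start + k * step, 4)            # avoid float drift in the grid
        y_pred = [row[cls] >= t for row in probs]
        f1 = binary_f1(y_true, y_pred)
        if f1 > best_f1:                          # strict '>' keeps the lowest tied threshold
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Toy validation set (hypothetical): two classes, class 1 is the one being tuned.
probs = [[0.70, 0.30], [0.60, 0.40], [0.10, 0.90],
         [0.90, 0.10], [0.80, 0.20], [0.75, 0.25]]
labels = [1, 1, 1, 0, 0, 0]
threshold, f1 = best_threshold(probs, labels, cls=1)
print(threshold, f1)
```

Running this once per class yields one threshold per class, which is the kind of mapping the module writes to its JSON output.
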

**Optimal Thresholds (ViT, Accuracy-focused):**

| Class | Threshold | Clinical Rationale |
|-------|-----------|-------------------|
| Normal | 0.540 | Balanced |
| Diabetes/DR | 0.240 | Lenient → high sensitivity (miss as few DR cases as possible) |
| Glaucoma | 0.810 | Strict → high specificity (require confidence) |
| Cataract | 0.930 | Very strict → minimize false positives |
| AMD | 0.850 | Strict → rare disease, need confidence |

**Impact:** +2.22 percentage points of accuracy for ViT (82.26% → 84.48%); +9.84 percentage points for the v2 baseline (63.52% → 73.36%).

---

### M6: Inference Pipeline Module

**Purpose:** Classify new fundus images using the trained model and optimized thresholds.

| Attribute | Detail |
|-----------|--------|
| **Key Files** | `RetinaSense_Production.ipynb` |
| **Latency** | ~15ms per image |
| **Throughput** | ~66 images/sec |
| **GPU Memory** | ~2 GB |

**Inference Flow:**
1. Load and preprocess image (Ben Graham)
2. Forward pass through ViT → disease logits + severity logits
3. Apply softmax → class probabilities
4. Apply per-class thresholds → final prediction
5. If no class probability reaches its threshold → flag for expert review
6. Return: class label, confidence score, all probabilities

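
Steps 3 to 6 of the flow above can be sketched as a small decision function using the ViT thresholds from M5. Two points are assumptions rather than documented behaviour: when several classes clear their thresholds, this sketch picks the one with the highest probability, and the returned dictionary layout is hypothetical.

```python
import math

CLASSES = ["Normal", "Diabetes/DR", "Glaucoma", "Cataract", "AMD"]
THRESHOLDS = [0.540, 0.240, 0.810, 0.930, 0.850]   # ViT thresholds from M5

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decide(disease_logits, thresholds=THRESHOLDS):
    """Apply per-class thresholds; flag the image when no class clears its own."""
    probs = softmax(disease_logits)
    passing = [i for i, p in enumerate(probs) if p >= thresholds[i]]
    if not passing:
        return {"label": None, "needs_review": True,
                "confidence": max(probs), "probabilities": probs}
    best = max(passing, key=lambda i: probs[i])     # assumed tie-break rule
    return {"label": CLASSES[best], "needs_review": False,
            "confidence": probs[best], "probabilities": probs}

confident = decide([0.0, 3.0, 0.0, 0.0, 0.0])    # DR probability is about 0.83
uncertain = decide([1.0, 0.0, 1.2, 0.0, 0.0])    # no class clears its threshold
print(confident["label"], uncertain["needs_review"])
```

Note that a plain argmax would label the second example Glaucoma at about 0.37 probability; the thresholds turn that into a review flag instead.
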

---

### M7: Ensemble System Module

**Purpose:** Combine predictions from multiple models for improved minority class detection.

| Attribute | Detail |
|-----------|--------|
| **Key Files** | `ensemble_inference.py` |
| **Models** | ViT (85%), EfficientNet-Extended (10%), EfficientNet-v2 (5%) |
| **Strategy** | Weighted probability averaging |

**Performance Trade-off:**
- Ensemble: 80.44% accuracy, 0.858 macro F1, Cataract F1=0.952, AMD F1=0.920
- ViT Solo: 84.48% accuracy, 0.840 macro F1 (simpler, faster, recommended)

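
Weighted probability averaging with the 85/10/5 split listed above amounts to a few lines. The sketch below is illustrative only: `weighted_average` is a hypothetical helper, and the three probability vectors are made-up outputs, not results from the actual models.

```python
def weighted_average(prob_sets, weights):
    """Per-class weighted mean of several models' probability vectors."""
    total = sum(weights)
    n_classes = len(prob_sets[0])
    return [sum(w * probs[i] for w, probs in zip(weights, prob_sets)) / total
            for i in range(n_classes)]

WEIGHTS = [0.85, 0.10, 0.05]   # ViT, EfficientNet-Extended, EfficientNet-v2

# Made-up softmax outputs for one image (Normal, DR, Glaucoma, Cataract, AMD).
vit = [0.10, 0.60, 0.10, 0.10, 0.10]
effnet_extended = [0.05, 0.30, 0.05, 0.55, 0.05]
effnet_v2 = [0.20, 0.20, 0.20, 0.20, 0.20]

ensemble = weighted_average([vit, effnet_extended, effnet_v2], WEIGHTS)
predicted = max(range(len(ensemble)), key=lambda i: ensemble[i])
print([round(p, 3) for p in ensemble], predicted)
```

Dividing by the weight total keeps the result a valid probability vector even if the weights are later changed and no longer sum to one.
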

---

### M8: Evaluation & Visualization Module

**Purpose:** Comprehensive model evaluation with per-class metrics and visual dashboards.

| Attribute | Detail |
|-----------|--------|
| **Key Files** | Training scripts (eval sections), `RetinaSense_Production.ipynb` |
| **Primary Metrics** | Macro F1, accuracy, per-class F1/precision/recall |
| **Secondary Metrics** | Weighted F1, Macro AUC-ROC, confusion matrix |
| **Outputs** | `dashboard.png`, `threshold_comparison.png`, `training_curves.png` |

**Why Macro F1 (not accuracy):** Accuracy is misleading under the 21:1 class imbalance: a classifier that always predicts DR already reaches 65% accuracy. Macro F1 averages the per-class F1 scores, so every class counts equally regardless of size.

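
The argument above can be checked numerically. In this sketch a classifier that always predicts DR is scored on 100 samples of which 65 are DR; the way the remaining 35 samples are divided among the other four classes is hypothetical, and the metric helpers are written out by hand rather than taken from the evaluation scripts.

```python
def class_f1(y_true, y_pred, cls):
    """One-vs-rest F1 for a single class; 0.0 when the class is never hit."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores."""
    return sum(class_f1(y_true, y_pred, c) for c in classes) / len(classes)

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

classes = ["Normal", "DR", "Glaucoma", "Cataract", "AMD"]
y_true = ["DR"] * 65 + ["Normal"] * 20 + ["Glaucoma"] * 5 + ["Cataract"] * 5 + ["AMD"] * 5
y_pred = ["DR"] * 100            # degenerate model: always predicts the majority class

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred, classes)
print(round(acc, 3), round(mf1, 3))
```

Accuracy comes out at 0.65 while macro F1 is below 0.16, because four of the five classes score an F1 of zero.
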

---

### M9: Data Analysis Module

**Purpose:** Comprehensive dataset exploration to inform training strategy.

| Attribute | Detail |
|-----------|--------|
| **Key Files** | `data_analysis.py` |
| **Outputs** | `outputs_analysis/` (11 files: plots, reports, CSVs) |

**Analyses Performed:**
1. **Class distribution:** confirmed 21.1× imbalance
2. **Image quality metrics:** brightness, contrast, sharpness per class
3. **APTOS domain shift discovery:** 10.7× sharpness difference vs ODIR
4. **Error analysis:** most-confused class pairs (DR↔Normal, Normal↔AMD)
5. **Augmentation effectiveness:** light augmentation best during warmup
6. **Preprocessing impact:** Ben Graham boosts Glaucoma brightness most (+34.2)

---

## 4. Module Interaction Matrix

| From \ To | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 |
|-----------|----|----|----|----|----|----|----|----|----|
| **M1** Data | – | ✓ | | | | | | | ✓ |
| **M2** Preprocess | | – | | ✓ | | ✓ | | | |
| **M3** Model | | | – | ✓ | | ✓ | ✓ | | |
| **M4** Training | | | ✓ | – | ✓ | | | ✓ | |
| **M5** Threshold | | | | | – | ✓ | ✓ | ✓ | |
| **M6** Inference | | ✓ | ✓ | | ✓ | – | | | |
| **M7** Ensemble | | | ✓ | | ✓ | | – | ✓ | |
| **M8** Evaluation | | | | | | | | – | |
| **M9** Analysis | ✓ | | | | | | | | – |

---

## 5. Test-Time Augmentation (TTA) Sub-Module

**Purpose:** Improve predictions by averaging over augmented versions of the input.

- **8 Augmentations:** Original, H-flip, V-flip, Both flips, Rot 90°, Rot 180°, Rot 270°, Brightness
- **Impact:** +0.29% accuracy (modest; optional for production)
- **Trade-off:** 8× slower inference
- **Recommendation:** Use selectively for uncertain cases (confidence < threshold)

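
The eight views and the averaging step described above can be sketched on a small grayscale grid. The stub model, the brightness factor of 1.1, and all function names are assumptions made for illustration; the real sub-module would run the trained network on image tensors.

```python
def hflip(img):
    return [row[::-1] for row in img]

def vflip(img):
    return [list(row) for row in img[::-1]]

def rot90(img):
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def brighten(img, factor=1.1):
    """Scale pixel values, clipping at 255 (factor is an assumed value)."""
    return [[min(255, round(p * factor)) for p in row] for row in img]

def tta_views(img):
    """Original, H-flip, V-flip, both flips, three rotations, brightness."""
    return [img, hflip(img), vflip(img), hflip(vflip(img)),
            rot90(img), rot90(rot90(img)), rot90(rot90(rot90(img))),
            brighten(img)]

def tta_predict(model, img):
    """Average the model's probability vectors over all eight views."""
    outputs = [model(view) for view in tta_views(img)]
    n_classes = len(outputs[0])
    return [sum(out[i] for out in outputs) / len(outputs) for i in range(n_classes)]

def stub_model(img):
    """Stand-in for the network: a two-class 'probability' from one pixel."""
    p = img[0][0] / 255
    return [p, 1 - p]

image = [[0, 64], [128, 255]]
views = tta_views(image)
averaged = tta_predict(stub_model, image)
print(len(views), [round(p, 3) for p in averaged])
```

In this sketch the "both flips" view and the 180° rotation produce the same image, since flipping both axes is geometrically a half-turn, so that orientation is effectively counted twice in the average.
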

---

## 6. Configuration Parameters

| Parameter | Default | Range | Notes |
|-----------|---------|-------|-------|
| `IMG_SIZE` | 224 | 224–300 | 224 for ViT, 300 for EfficientNet |
| `BATCH_SIZE` | 32 | 16–128 | 64 recommended for stability |
| `NUM_WORKERS` | 8 | 0–16 | Match to CPU cores |
| `USE_CACHE` | True | True/False | 4× speedup when True |
| `EPOCHS` | 30 | 10–100 | ViT converges by 30 |
| `ACCUM_STEPS` | 2 | 1–8 | Gradient accumulation factor |
| `PATIENCE` | 10 | 5–15 | Early stopping on macro F1 |
| `FOCAL_GAMMA` | 1.0 | 0.5–3.0 | Focusing parameter for class imbalance |

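
One way to keep the defaults and ranges in the table above enforceable is a small validated configuration object. The sketch below is a suggestion, not how the training scripts are known to store their settings: the `TrainConfig` class and its `effective_batch` property are hypothetical, while the parameter names, defaults, and ranges are taken from the table.

```python
from dataclasses import dataclass

# Allowed (low, high) ranges, copied from the configuration table.
_RANGES = {
    "IMG_SIZE": (224, 300),
    "BATCH_SIZE": (16, 128),
    "NUM_WORKERS": (0, 16),
    "EPOCHS": (10, 100),
    "ACCUM_STEPS": (1, 8),
    "PATIENCE": (5, 15),
    "FOCAL_GAMMA": (0.5, 3.0),
}

@dataclass
class TrainConfig:
    IMG_SIZE: int = 224
    BATCH_SIZE: int = 32
    NUM_WORKERS: int = 8
    USE_CACHE: bool = True
    EPOCHS: int = 30
    ACCUM_STEPS: int = 2
    PATIENCE: int = 10
    FOCAL_GAMMA: float = 1.0

    def __post_init__(self):
        # Reject any value outside the documented range.
        for name, (low, high) in _RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")

    @property
    def effective_batch(self) -> int:
        """Batch size seen by the optimizer after gradient accumulation."""
        return self.BATCH_SIZE * self.ACCUM_STEPS

default_cfg = TrainConfig()
efficientnet_cfg = TrainConfig(IMG_SIZE=300, EPOCHS=50)
print(default_cfg.effective_batch, efficientnet_cfg.IMG_SIZE)
```

Validating at construction time means an out-of-range value such as `BATCH_SIZE=256` fails immediately instead of partway through a training run.
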

---

*Document Version: 1.0 | Last Updated: March 10, 2026*