tanishq74 committed · Commit f140eb7 · verified · 1 Parent(s): a1ae690

Add README.md

Files changed (1)
  1. README.md +295 -0
README.md ADDED
@@ -0,0 +1,295 @@
+ # 🔬 RetinaSense-ViT: Deep Learning for Retinal Disease Classification
+
+ ![Python](https://img.shields.io/badge/Python-3.8+-blue.svg)
+ ![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-red.svg)
+ ![License](https://img.shields.io/badge/License-MIT-green.svg)
+
+ A production-ready deep learning system for multi-disease retinal classification achieving **84.48% accuracy** using Vision Transformers.
+
+ ## 🎯 Project Overview
+
+ RetinaSense-ViT is an AI-powered system that classifies fundus images into five classes, a healthy baseline plus four disease classes:
+ - **Normal** (healthy retina)
+ - **Diabetic Retinopathy (DR)**
+ - **Glaucoma**
+ - **Cataract**
+ - **Age-related Macular Degeneration (AMD)**
+
+ ### Key Achievements
+ - ✅ **84.48% accuracy** on validation set
+ - ✅ **0.840 macro F1** across all classes
+ - ✅ **~33% relative improvement** over the baseline (63.52% → 84.48%, +20.96 percentage points)
+ - ✅ **Production-ready** with optimized inference pipeline
+
+ ## 📊 Performance Metrics
+
+ | Model | Accuracy | Macro F1 | Best Use Case |
+ |-------|----------|----------|---------------|
+ | **ViT + Thresholds** (Recommended) | **84.48%** | **0.840** | General screening |
+ | ViT Raw | 82.26% | 0.821 | Research baseline |
+ | Ensemble | 80.44% | 0.858 | Maximum rare disease detection |
+
+ ### Per-Class Performance
+
+ | Disease | F1 Score | Precision | Recall |
+ |---------|----------|-----------|--------|
+ | Normal | 0.746 | 0.707 | 0.789 |
+ | Diabetes/DR | 0.891 | 0.918 | 0.865 |
+ | Glaucoma | 0.871 | 0.900 | 0.844 |
+ | Cataract | 0.874 | 0.906 | 0.844 |
+ | AMD | 0.819 | 0.891 | 0.759 |
+
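The F1 and macro F1 figures in the tables above can be cross-checked from the published precision and recall values. The sketch below is a worked calculation only, not the project's evaluation code. It starts from three-decimal values, so an individual F1 can differ from the table in the last digit.

```python
# Precision and recall copied from the per-class table above.
PER_CLASS = {
    "Normal": (0.707, 0.789),
    "Diabetes/DR": (0.918, 0.865),
    "Glaucoma": (0.900, 0.844),
    "Cataract": (0.906, 0.844),
    "AMD": (0.891, 0.759),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1_by_class = {name: f1_score(p, r) for name, (p, r) in PER_CLASS.items()}

# Macro F1 is the unweighted mean over classes, so rare classes count
# as much as the dominant Diabetes/DR class.
macro_f1 = sum(f1_by_class.values()) / len(f1_by_class)

for name, f1 in f1_by_class.items():
    print(f"{name:12s} F1 = {f1:.3f}")
print(f"Macro F1 = {macro_f1:.3f}")
```

Because macro F1 ignores class frequency, it is a stricter summary than accuracy for a dataset this imbalanced.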
+ ## 🏗️ Architecture
+
+ **Vision Transformer (ViT-Base-Patch16-224)** with multi-task learning:
+ - Pre-trained on ImageNet
+ - 86M parameters
+ - 768-dimensional feature vectors
+ - Separate heads for disease classification and severity grading
+
+ ### Key Technical Features
+ - **Ben Graham Preprocessing**: Specialized fundus image preprocessing
+ - **Focal Loss**: Handles severe class imbalance (21:1 ratio)
+ - **Threshold Optimization**: Per-class decision thresholds
+ - **Mixed Precision Training**: Faster training with AMP
+ - **Gradient Accumulation**: Effective batch size of 64
+
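To illustrate why focal loss helps with class imbalance, here is a minimal pure-Python sketch of the per-sample loss. The `gamma=2.0` default and the optional `alpha` class weights are assumptions for illustration; the README does not state the values used in training, and the project's implementation presumably works on PyTorch tensors instead.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Focal loss for a single sample.

    probs  : class probabilities (e.g. a softmax output)
    target : index of the true class
    gamma  : focusing parameter; gamma=0 reduces to cross-entropy
    alpha  : optional per-class weights (hypothetical, not from the README)
    """
    p_t = max(probs[target], 1e-12)  # clamp to avoid log(0)
    weight = 1.0 if alpha is None else alpha[target]
    # (1 - p_t)^gamma shrinks the loss of confidently correct samples.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss([0.05, 0.90, 0.05], target=1)  # confident and correct
hard = focal_loss([0.60, 0.10, 0.30], target=1)  # true class gets 0.10
ce_hard = focal_loss([0.60, 0.10, 0.30], target=1, gamma=0.0)

print(f"easy sample: {easy:.4f}")
print(f"hard sample: {hard:.4f}")
print(f"hard sample, plain cross-entropy: {ce_hard:.4f}")
```

The easy sample contributes almost nothing, so gradients are dominated by hard samples, which in an imbalanced dataset tend to come from the minority classes.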
+ ## 📁 Project Structure
+
+ ```
+ retinasense/
+ ├── 📓 Notebooks
+ │   ├── RetinaSense_Production.ipynb      # Production inference (⭐ START HERE)
+ │   ├── RetinaSense_ViT_Training.ipynb    # Complete training process
+ │   └── RetinaSense_Optimized.ipynb       # Optimization experiments
+ │
+ ├── 🐍 Training Scripts
+ │   ├── retinasense_vit.py                # ViT training (84.48% accuracy)
+ │   ├── retinasense_v2_extended.py        # Extended CNN training
+ │   └── retinasense_fixed.py              # Original fixed version
+ │
+ ├── 🔧 Optimization Scripts
+ │   ├── threshold_optimization_vit.py     # Per-class thresholds (+2% boost)
+ │   ├── ensemble_inference.py             # Model ensemble evaluation
+ │   ├── tta_evaluation.py                 # Test-time augmentation
+ │   └── data_analysis.py                  # Dataset analysis
+ │
+ ├── 📊 Research Documentation
+ │   ├── PRODUCTION_MODEL_DECISION.md      # Final model selection
+ │   ├── COMPLETE_RESEARCH_REPORT.md       # Full research journey
+ │   ├── TRAINING_NOTEBOOK_GUIDE.md        # Training guide
+ │   └── FINAL_RESULTS_COMPARISON.md       # Performance comparison
+ │
+ └── 📚 Additional Docs
+     ├── README.md                         # This file
+     └── .gitignore                        # Git ignore rules
+ ```
+
+ ## 🚀 Quick Start
+
+ ### 1. Installation
+
+ ```bash
+ # Clone repository
+ git clone https://github.com/Tanishq74/retina-sense.git
+ cd retina-sense
+
+ # Install dependencies
+ pip install torch torchvision timm pandas opencv-python scikit-learn matplotlib seaborn tqdm
+ ```
+
+ ### 2. Download Pre-trained Model
+
+ **Note**: Model files are not included in the repository due to size (331MB). Train your own model or contact the maintainer for pre-trained weights.
+
+ ### 3. Run Production Inference
+
+ ```python
+ # Option A: open the notebook from a shell (not from Python):
+ #   jupyter notebook RetinaSense_Production.ipynb
+
+ # Option B: use the Python helper (assumes an `inference` module exposing `predict_image`)
+ from inference import predict_image
+
+ prediction = predict_image('path/to/fundus_image.jpg')
+ print(f"Disease: {prediction['class']}")
+ print(f"Confidence: {prediction['confidence']:.2%}")
+ ```
+
+ ## 🎓 Training Your Own Model
+
+ ### Requirements
+ - GPU with 8GB+ VRAM
+ - ~8,500 labeled fundus images
+ - 2-3 hours training time
+
+ ### Steps
+
+ 1. **Prepare Data**: Organize images and create metadata CSV
+    ```csv
+    image_path,disease_label,severity_label
+    images/001.jpg,1,-1
+    images/002.jpg,0,-1
+    ```
+
+ 2. **Train Model**:
+    ```bash
+    python retinasense_vit.py
+    ```
+
+ 3. **Optimize Thresholds** (+2% accuracy boost):
+    ```bash
+    python threshold_optimization_vit.py
+    ```
+
+ 4. **Evaluate**:
+    ```python
+    # See RetinaSense_Production.ipynb for evaluation code
+    ```
+
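For step 1, a metadata file in the format shown can be sanity-checked before training. This is a minimal sketch using only the standard library. The label order in `CLASS_NAMES` and the reading of `-1` as "no severity grade" are assumptions, since the README does not spell out either; `load_metadata` is a hypothetical helper, not part of the repository.

```python
import csv
import io

# Hypothetical label order; the README does not state which integer
# corresponds to which class.
CLASS_NAMES = ["Normal", "Diabetes/DR", "Glaucoma", "Cataract", "AMD"]

# The two sample rows from the CSV example in step 1.
SAMPLE_CSV = """image_path,disease_label,severity_label
images/001.jpg,1,-1
images/002.jpg,0,-1
"""


def load_metadata(handle):
    """Parse metadata rows and reject disease labels outside the class list."""
    records = []
    for row in csv.DictReader(handle):
        disease = int(row["disease_label"])
        severity = int(row["severity_label"])
        if not 0 <= disease < len(CLASS_NAMES):
            raise ValueError(
                f"unknown disease_label {disease} for {row['image_path']}"
            )
        records.append({
            "image_path": row["image_path"],
            "disease_label": disease,
            # Assumption: -1 marks an image without a severity grade.
            "severity_label": None if severity == -1 else severity,
        })
    return records


records = load_metadata(io.StringIO(SAMPLE_CSV))
for rec in records:
    print(rec["image_path"], CLASS_NAMES[rec["disease_label"]], rec["severity_label"])
```

For a file on disk, pass an open handle instead of the `StringIO` object, for example `open("metadata.csv", newline="")`, where the file name is a placeholder.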
+ ## 📈 Research Journey
+
+ Our research achieved a **~33% relative improvement** (63.52% → 84.48%, +20.96 percentage points) through systematic optimization. Gains in parentheses are cumulative, in percentage points over the Phase 0 baseline:
+
+ ```
+ Phase 0: Original Baseline
+ ├─ 63.52% accuracy
+ └─ Poor minority class performance
+
+ Phase 1: Threshold Optimization (+10 min)
+ ├─ 73.36% accuracy (+9.84 pts)
+ └─ Insight: Model poorly calibrated
+
+ Phase 2: Extended Training (+15 min)
+ ├─ 74.18% accuracy (+10.66 pts)
+ └─ Insight: Needed more epochs
+
+ Phase 3: ViT Architecture (+6 min) ⭐
+ ├─ 82.26% accuracy (+18.74 pts)
+ └─ Insight: Architecture matters most
+
+ Phase 4: ViT + Threshold Opt (+2 min)
+ ├─ 84.48% accuracy (+20.96 pts) 🏆
+ └─ PRODUCTION READY
+
+ Total Time: ~45 min active research + 2-3 hours training
+ ```
+
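The phase figures above mix two kinds of gain: the cumulative gain over the baseline, and the gain each phase adds on top of the previous one. This short worked calculation separates the two and derives the relative improvement, which comes out at roughly 33%. The accuracies are copied from the phase list; nothing else is assumed.

```python
# Accuracy (%) after each phase, copied from the research journey above.
PHASES = [
    ("Original baseline", 63.52),
    ("Threshold optimization", 73.36),
    ("Extended training", 74.18),
    ("ViT architecture", 82.26),
    ("ViT + threshold optimization", 84.48),
]

baseline = PHASES[0][1]
final = PHASES[-1][1]

previous = baseline
for name, acc in PHASES:
    vs_baseline = acc - baseline  # cumulative gain in percentage points
    vs_previous = acc - previous  # gain added by this phase alone
    print(f"{name:30s} {acc:6.2f}%  {vs_baseline:+6.2f} pts total  {vs_previous:+6.2f} pts step")
    previous = acc

# Relative improvement: the absolute gain as a fraction of the baseline.
absolute_gain = final - baseline
relative_gain = absolute_gain / baseline * 100
print(f"Absolute gain: {absolute_gain:.2f} points")
print(f"Relative gain: {relative_gain:.1f}%")
```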
+ ## 🔬 Key Research Insights
+
+ 1. **Architecture > Everything**: Switching to ViT provided the biggest gain (82.26% accuracy, +18.74 points over the original baseline)
+ 2. **Threshold Optimization Works**: Simple per-class thresholds add +2.22 points (82.26% → 84.48%)
+ 3. **Focal Loss Essential**: Critical for handling 21:1 class imbalance
+ 4. **Domain Shift Matters**: APTOS images are 10x lower quality than ODIR images
+ 5. **Ensemble Trade-offs**: The ensemble sacrifices ~4 points of accuracy for +10% minority-class F1
+
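Per-class threshold optimization (insight 2) can be sketched as follows. The README does not describe the exact rule used by `threshold_optimization_vit.py`, so both the decision rule (pick the class whose probability is largest relative to its threshold) and the greedy search are assumptions, and the validation probabilities are made-up toy values.

```python
def predict(probs, thresholds):
    """Pick the class whose probability is largest relative to its threshold."""
    scores = [p / t for p, t in zip(probs, thresholds)]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(all_probs, labels, thresholds):
    """Fraction of samples classified correctly under the given thresholds."""
    hits = sum(predict(p, thresholds) == y for p, y in zip(all_probs, labels))
    return hits / len(labels)


def tune_thresholds(all_probs, labels, candidates=(0.2, 0.3, 0.4, 0.5, 0.6, 0.7)):
    """Greedy coordinate search: tune one class threshold at a time."""
    n_classes = len(all_probs[0])
    thresholds = [0.5] * n_classes  # equal thresholds reproduce plain argmax
    best = accuracy(all_probs, labels, thresholds)
    for c in range(n_classes):
        for t in candidates:
            trial = thresholds[:c] + [t] + thresholds[c + 1:]
            score = accuracy(all_probs, labels, trial)
            if score > best:  # keep a candidate only if it strictly improves
                best, thresholds = score, trial
    return thresholds, best


# Toy validation set: the model is biased toward class 0.
val_probs = [
    [0.70, 0.20, 0.10],
    [0.55, 0.35, 0.10],
    [0.50, 0.10, 0.40],
    [0.20, 0.70, 0.10],
    [0.60, 0.30, 0.10],
    [0.45, 0.15, 0.40],
]
val_labels = [0, 1, 2, 1, 0, 2]

base_acc = accuracy(val_probs, val_labels, [0.5, 0.5, 0.5])
tuned, tuned_acc = tune_thresholds(val_probs, val_labels)
print(f"argmax accuracy: {base_acc:.3f}")
print(f"tuned thresholds: {tuned}, accuracy: {tuned_acc:.3f}")
```

Raising the threshold of an over-predicted class makes it harder for that class to win, which is how a poorly calibrated model can gain accuracy without retraining. Thresholds tuned this way are best chosen on data kept separate from the split used for the final reported numbers.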
+ ## 📊 Dataset
+
+ - **Sources**: ODIR-5K + APTOS-2019
+ - **Total Images**: 8,540 fundus images
+ - **Resolution**: 224×224 (preprocessed)
+ - **Class Distribution**:
+   - Normal: 2,071 (24%)
+   - Diabetes/DR: 5,581 (65%)
+   - Glaucoma: 308 (4%)
+   - Cataract: 315 (4%)
+   - AMD: 265 (3%)
+
+ **Challenge**: Severe class imbalance (21:1 ratio)
+
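The 21:1 ratio follows directly from the class counts above (largest class divided by smallest). The sketch below reproduces it and also derives "balanced" inverse-frequency weights, `total / (n_classes * count)`, the same heuristic scikit-learn uses for `class_weight="balanced"`. Whether the project feeds such weights into its loss is not stated, so treat that part as an illustration.

```python
# Image counts per class, copied from the dataset section above.
CLASS_COUNTS = {
    "Normal": 2071,
    "Diabetes/DR": 5581,
    "Glaucoma": 308,
    "Cataract": 315,
    "AMD": 265,
}

total = sum(CLASS_COUNTS.values())
n_classes = len(CLASS_COUNTS)

# Ratio between the largest and the smallest class.
imbalance_ratio = max(CLASS_COUNTS.values()) / min(CLASS_COUNTS.values())

# Balanced inverse-frequency weights: rare classes get larger weights.
class_weights = {
    name: total / (n_classes * count) for name, count in CLASS_COUNTS.items()
}

print(f"Total images: {total}")
print(f"Imbalance ratio: {imbalance_ratio:.1f}:1")
for name, weight in class_weights.items():
    share = CLASS_COUNTS[name] / total
    print(f"{name:12s} share = {share:5.1%}  weight = {weight:.2f}")
```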
+ ## 🛠️ Technical Stack
+
+ | Component | Technology |
+ |-----------|-----------|
+ | **Framework** | PyTorch 2.0+ |
+ | **Architecture** | ViT-Base-Patch16-224 (timm) |
+ | **Preprocessing** | OpenCV, Ben Graham method |
+ | **Training** | Mixed Precision (AMP), Focal Loss |
+ | **Optimization** | AdamW, Cosine Annealing LR |
+ | **Evaluation** | scikit-learn |
+ | **Visualization** | matplotlib, seaborn |
+
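The cosine annealing schedule listed in the table can be written in closed form. The sketch below computes the learning rate for a single cosine decay without restarts, the same shape PyTorch's `CosineAnnealingLR` follows over `T_max` steps. The base learning rate, minimum learning rate, and epoch count are placeholder assumptions; the README does not give the training hyperparameters.

```python
import math


def cosine_annealing_lr(step, total_steps, base_lr, min_lr=0.0):
    """Learning rate at `step` for one cosine decay from base_lr to min_lr."""
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Placeholder hyperparameters (assumptions, not taken from the project).
BASE_LR = 3e-4
MIN_LR = 1e-6
EPOCHS = 30

schedule = [cosine_annealing_lr(e, EPOCHS, BASE_LR, MIN_LR) for e in range(EPOCHS + 1)]
for epoch in (0, EPOCHS // 2, EPOCHS):
    print(f"epoch {epoch:2d}: lr = {schedule[epoch]:.6f}")
```

The rate falls slowly at first, fastest around the midpoint, and flattens out near the end, which gives the optimizer small, stable steps in the final epochs.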
+ ## 📖 Documentation
+
+ Comprehensive documentation is available:
+
+ - **[PRODUCTION_MODEL_DECISION.md](PRODUCTION_MODEL_DECISION.md)**: Final model selection rationale
+ - **[TRAINING_NOTEBOOK_GUIDE.md](TRAINING_NOTEBOOK_GUIDE.md)**: Complete training guide
+ - **[COMPLETE_RESEARCH_REPORT.md](COMPLETE_RESEARCH_REPORT.md)**: Full research journey (35+ pages)
+ - **[FINAL_RESULTS_COMPARISON.md](FINAL_RESULTS_COMPARISON.md)**: Model comparison
+
+ ## 🎯 Use Cases
+
+ ### Primary Use Case: General Screening
+ - **Model**: ViT + Threshold Optimization
+ - **Accuracy**: 84.48%
+ - **Speed**: ~15 ms per image (~66 images/sec)
+ - **Best for**: High-volume clinics, community health programs
+
+ ### Alternative Use Case: Rare Disease Detection
+ - **Model**: Ensemble + ViT Thresholds
+ - **Accuracy**: 80.44%
+ - **Macro F1**: 0.858 (best on minority classes)
+ - **Best for**: Academic medical centers, research studies
+
+ ## 🔒 Clinical Considerations
+
+ ⚠️ **Important**: This system is intended for research and educational purposes. It is not FDA-approved for clinical use. Always consult qualified ophthalmologists for diagnosis.
+
+ ### Strengths
+ - ✅ Strong diabetic retinopathy detection (0.891 F1, 0.865 recall)
+ - ✅ Excellent glaucoma detection (0.871 F1)
+ - ✅ Fast inference (~15 ms per image)
+ - ✅ Handles class imbalance well
+
+ ### Limitations
+ - ⚠️ Lower recall on rare diseases (AMD: 0.759 recall, 0.819 F1)
+ - ⚠️ Trained primarily on Asian populations (ODIR dataset)
+ - ⚠️ May not generalize to different imaging equipment
+ - ⚠️ Requires high-quality fundus images
+
+ ## 🤝 Contributing
+
+ Contributions welcome! Areas for improvement:
+ - External validation on new datasets
+ - Support for additional diseases
+ - Deployment optimization (TensorRT, ONNX)
+ - Mobile/edge deployment
+ - Explainability (Grad-CAM, attention maps)
+
+ ## 📄 License
+
+ This project is licensed under the MIT License - see LICENSE file for details.
+
+ ## 🙏 Acknowledgments
+
+ - **Datasets**: ODIR-5K, APTOS-2019
+ - **Architecture**: Vision Transformer (ViT) by Google Research
+ - **Preprocessing**: Ben Graham method from Kaggle competitions
+ - **Framework**: PyTorch, timm library
+
+ ## 📧 Contact
+
+ **Project Maintainer**: Tanishq
+ - GitHub: [@Tanishq74](https://github.com/Tanishq74)
+ - Repository: [retina-sense](https://github.com/Tanishq74/retina-sense)
+
+ ## 📊 Citation
+
+ If you use this work in your research, please cite:
+
+ ```bibtex
+ @software{retinasense2026,
+   title={RetinaSense-ViT: Deep Learning for Retinal Disease Classification},
+   author={Tanishq},
+   year={2026},
+   url={https://github.com/Tanishq74/retina-sense}
+ }
+ ```
+
+ ---
+
+ **Last Updated**: February 2026
+ **Status**: ✅ Production Ready
+ **Performance**: 84.48% accuracy, 0.840 macro F1
+ **License**: MIT