sachinchandrankallar committed on
Commit 202f345 · 1 Parent(s): 0b59a2f
Files changed (35)
  1. .vscode/settings.json +3 -0
  2. API_PROMPT_RESPONSE_UPDATE.md +0 -77
  3. CONTAINER_OPTIMIZATION_SUMMARY.md +0 -155
  4. CRITICAL_DEPLOYMENT_FIX.md +0 -288
  5. DEPLOYMENT.md +0 -106
  6. DEPLOYMENT_FIX_SUMMARY.md +0 -132
  7. DEVELOPMENT.md +0 -377
  8. DEVICE_PARAMETER_FIX_SUMMARY.md +0 -136
  9. FIX_404_SUMMARY.md +0 -170
  10. GPU_CONFIGURATION_GUIDE.md +0 -169
  11. HF_SPACES_FIXES_APPLIED.md +0 -416
  12. HF_SPACES_ISSUES_REPORT.md +0 -209
  13. HF_SPACES_RUNTIME_FIX_SUMMARY.md +0 -81
  14. HUGGINGFACE_DEPLOYMENT_FIX.md +0 -168
  15. QUICK_REFERENCE.md +0 -157
  16. README.md +348 -70
  17. README_HF_SPACES.md +0 -72
  18. SCAN_SUMMARY.md +0 -294
  19. STREAMING_FIX_SUMMARY.md +0 -175
  20. TODO.md +0 -12
  21. requirements.txt +16 -10
  22. services/ai-service/src/ai_med_extract/__pycache__/app.cpython-311.pyc +0 -0
  23. services/ai-service/src/ai_med_extract/api/__pycache__/routes_fastapi.cpython-311.pyc +0 -0
  24. services/ai-service/src/ai_med_extract/api/routes_fastapi.py +81 -52
  25. services/ai-service/src/ai_med_extract/utils/__pycache__/model_config.cpython-311.pyc +0 -0
  26. services/ai-service/src/ai_med_extract/utils/__pycache__/model_loader_gguf.cpython-311.pyc +0 -0
  27. services/ai-service/src/ai_med_extract/utils/__pycache__/model_loader_spaces.cpython-311.pyc +0 -0
  28. services/ai-service/src/ai_med_extract/utils/__pycache__/model_manager.cpython-311.pyc +0 -0
  29. services/ai-service/src/ai_med_extract/utils/__pycache__/openvino_summarizer_utils.cpython-311.pyc +0 -0
  30. services/ai-service/src/ai_med_extract/utils/model_config.py +102 -7
  31. services/ai-service/src/ai_med_extract/utils/model_loader_spaces.py +11 -4
  32. services/ai-service/src/ai_med_extract/utils/model_manager.py +102 -5
  33. services/ai-service/src/ai_med_extract/utils/openvino_summarizer_utils.py +41 -0
  34. test_device_fix.py +3 -0
  35. test_hf_spaces_fix.py +4 -1
.vscode/settings.json CHANGED
@@ -1,5 +1,8 @@
  {
    "python.analysis.extraPaths": [
      "./ai_med_extract/utils"
+   ],
+   "cursorpyright.analysis.extraPaths": [
+     "./ai_med_extract/utils"
    ]
  }
API_PROMPT_RESPONSE_UPDATE.md DELETED
@@ -1,77 +0,0 @@
1
- # API Response Update: Added Full Prompt to LLM Responses
2
-
3
- ## Overview
4
- Updated all API endpoints so that each response includes the full prompt passed to the LLM, alongside the summary and other values.
5
-
6
- ## Changes Made
7
-
8
- ### 1. GGUF Model Response (`routes_fastapi.py`)
9
- **Location**: Line 642
10
- **Change**: Added `"prompt": full_prompt` to the result dictionary
11
- **Prompt Source**: `full_prompt` variable (lines 551-573) - Contains the complete system prompt with patient data
12
-
13
- ### 2. Text-Generation Model Response (`routes_fastapi.py`)
14
- **Location**: Line 726
15
- **Change**: Added `"prompt": prompt` to the result dictionary
16
- **Prompt Source**: `prompt` variable (line 699) - Built using `build_main_prompt(baseline, delta_text)`
17
-
18
- ### 3. Summarization Model Response (`routes_fastapi.py`)
19
- **Location**: Line 785
20
- **Change**: Added `"prompt": context` to the result dictionary
21
- **Prompt Source**: `context` variable (line 755) - Contains "Patient Data:\nBaseline: {baseline}\nChanges: {delta_text}"
22
-
23
- ### 4. Seq2Seq Model Response (`routes_fastapi.py`)
24
- **Location**: Line 833
25
- **Change**: Added `"prompt": context` to the result dictionary
26
- **Prompt Source**: `context` variable (line 807) - Contains "Patient Data:\nBaseline: {baseline}\nChanges: {delta_text}"
27
-
28
- ### 5. OpenVINO Patient Summary Endpoint (`routes_fastapi.py`)
29
- **Location**: Line 1960
30
- **Change**: Added `"prompt": prompt` to the JSONResponse content
31
- **Prompt Source**: `prompt` variable (line 1919) - Built using `build_main_prompt(baseline, delta_text, patient_info)`
32
-
33
- ### 6. Model Management API (`model_management_fastapi.py`)
34
- **Location**: Line 104
35
- **Change**: Added `"prompt": prompt` to the JSONResponse content
36
- **Prompt Source**: `prompt` variable from request data (line 78)
37
-
38
- ## Response Structure
39
-
40
- All API responses now include the following structure:
41
-
42
- ```json
43
- {
44
- "summary": "Generated summary text...",
45
- "baseline": "Baseline patient data...",
46
- "delta": "Delta/changes data...",
47
- "prompt": "Full prompt passed to LLM...",
48
- "timing": {
49
- "ehr_api": 0.8,
50
- "generation": 15.2,
51
- "total": 16.0
52
- },
53
- "model_used": "model_name (model_type)",
54
- "timeout_mode_used": "normal"
55
- }
56
- ```
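As a rough illustration of the response shape above, the sketch below assembles a payload that carries the new `prompt` field. The helper name `build_summary_response` and its parameters are hypothetical and are not taken from `routes_fastapi.py`; only the field names mirror the JSON example.

```python
import json


def build_summary_response(summary, baseline, delta, prompt, timing, model_used,
                           timeout_mode_used="normal"):
    """Assemble a dict with the same keys as the JSON example (hypothetical helper)."""
    return {
        "summary": summary,
        "baseline": baseline,
        "delta": delta,
        "prompt": prompt,  # the added field: the full text sent to the model
        "timing": timing,
        "model_used": model_used,
        "timeout_mode_used": timeout_mode_used,
    }


response = build_summary_response(
    summary="Generated summary text...",
    baseline="Baseline patient data...",
    delta="Delta/changes data...",
    prompt="Full prompt passed to LLM...",
    timing={"ehr_api": 0.8, "generation": 15.2, "total": 16.0},
    model_used="model_name (model_type)",
)
print(json.dumps(response, indent=2))
```

Because `prompt` is only an extra key, a consumer that reads `response["summary"]` and ignores unknown keys keeps working unchanged.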
57
-
58
- ## Benefits
59
-
60
- 1. **Transparency**: Users can see exactly what prompt was sent to the LLM
61
- 2. **Debugging**: Easier to debug issues by examining the full prompt
62
- 3. **Reproducibility**: Users can reproduce results by using the same prompt
63
- 4. **Audit Trail**: Complete record of what was sent to the model
64
- 5. **Quality Control**: Users can verify the prompt quality and make improvements
65
-
66
- ## Affected Endpoints
67
-
68
- - `POST /generate_patient_summary` - Main patient summary endpoint
69
- - `POST /api/patient_summary_openvino` - OpenVINO-specific endpoint
70
- - `POST /api/models/generate` - Model management API
71
- - All model types: GGUF, text-generation, summarization, seq2seq
72
-
73
- ## Backward Compatibility
74
-
75
- ✅ **Fully backward compatible** - Existing fields remain unchanged; only the new `prompt` field is added
76
- ✅ **No breaking changes** - All existing API consumers will continue to work
77
- ✅ **Optional field** - The `prompt` field is additional information, not required for basic functionality
CONTAINER_OPTIMIZATION_SUMMARY.md DELETED
@@ -1,155 +0,0 @@
1
- # Container Log Diagnostic & Optimization Summary
2
-
3
- ## Issues Identified and Fixed
4
-
5
- ### 1️⃣ Transformers / Accelerate Dependency Mismatch ✅ FIXED
6
- **Problem**: Models like flan-t5-large and bart-large-cnn failed to load due to missing accelerate and pipeline API incompatibility.
7
-
8
- **Solution Applied**:
9
- - Updated `requirements.txt` to use compatible versions:
10
- - `transformers>=4.42.0` (was 4.53.3)
11
- - `accelerate>=0.30.0` (was 0.25.0)
12
- - Modified `model_manager.py` to handle pipeline creation with minimal parameters to avoid `assistant_model` issues
13
-
14
- ### 2️⃣ GGUF Preload Path Missing ✅ FIXED
15
- **Problem**: GGUF model preload fails because the model directory doesn't exist yet.
16
-
17
- **Solution Applied**:
18
- - Disabled GGUF preload by default in `app.py`
19
- - Added proper fallback handling in GGUF model loader
20
- - Set `PRELOAD_GGUF=false` environment variable
21
-
22
- ### 3️⃣ OpenVINO Telemetry Write Failure ✅ FIXED
23
- **Problem**: The OpenVINO runtime cannot write telemetry files to `/`.
24
-
25
- **Solution Applied**:
26
- - Added `OPENVINO_TELEMETRY_DIR=/tmp/openvino_telemetry` environment variable
27
- - Created writable directory `/tmp/openvino_telemetry` with proper permissions
28
- - Updated entrypoint script to set telemetry directory
29
-
30
- ### 4️⃣ Invalid OMP_NUM_THREADS Setting ✅ FIXED
31
- **Problem**: OpenMP runtime throws "Invalid value for environment variable OMP_NUM_THREADS".
32
-
33
- **Solution Applied**:
34
- - Set `OMP_NUM_THREADS=4` in environment variables
35
- - Added dynamic setting in entrypoint script
36
- - Configured related threading variables: `MKL_NUM_THREADS=4`, `NUMEXPR_NUM_THREADS=4`
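One way to guard against an invalid `OMP_NUM_THREADS` value is to validate it before any numeric library starts. The sketch below is a minimal illustration, not the entrypoint script's logic: `resolve_thread_count` and `apply_thread_settings` are hypothetical helpers, and the default of 4 simply mirrors the value quoted above.

```python
import os


def resolve_thread_count(raw_value, default=4):
    """Return a positive integer thread count, falling back to `default` (hypothetical helper)."""
    try:
        count = int(str(raw_value).strip())
    except (TypeError, ValueError):
        return default
    return count if count > 0 else default


def apply_thread_settings(env, default=4):
    """Write one validated value to the three variables named in the fix above."""
    count = resolve_thread_count(env.get("OMP_NUM_THREADS"), default)
    for name in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS"):
        env[name] = str(count)
    return count


if __name__ == "__main__":
    apply_thread_settings(os.environ)
    print(os.environ["OMP_NUM_THREADS"])
```

Empty strings, non-numeric text, zero, and negative numbers all fall back to the default, so the OpenMP runtime is handed a positive integer.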
37
-
38
- ### 5️⃣ /tmp Permission Denied During Cache Cleanup ✅ FIXED
39
- **Problem**: The entrypoint script attempts to change /tmp permissions (`chmod /tmp`), which is not allowed in restricted environments.
40
-
41
- **Solution Applied**:
42
- - Modified entrypoint script to clean only specific cache directories
43
- - Removed `chmod /tmp` command that was causing permission errors
44
- - Changed to: `rm -rf /tmp/huggingface/* /tmp/torch/* || true`
45
-
46
- ### 6️⃣ Redis and Database Not Configured ✅ FIXED
47
- **Problem**: App logs indicate Redis/DB unavailable, switching to fallback.
48
-
49
- **Solution Applied**:
50
- - Enhanced Redis fallback logic in `app.py` lifespan function
51
- - Added proper HF Spaces detection to skip Redis/DB initialization
52
- - Implemented graceful degradation when Redis is unavailable
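The graceful-degradation idea can be sketched as follows. This is an assumption-laden illustration rather than the code in `app.py`: `InMemoryCache`, `create_cache`, and the `connect` factory are hypothetical names, and the sketch assumes the factory surfaces connection failures as `OSError` (a real Redis client library may raise its own exception types, which would need to be caught instead).

```python
class InMemoryCache:
    """Process-local stand-in used when no Redis server is reachable."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def create_cache(connect, on_hf_spaces=False):
    """Return (cache, degraded).

    `connect` is a hypothetical factory that returns a Redis-like client
    or raises OSError when it cannot connect.
    """
    if on_hf_spaces:
        # Skip the connection attempt entirely, as the fix above describes.
        return InMemoryCache(), True
    try:
        return connect(), False
    except OSError:  # ConnectionError and TimeoutError are OSError subclasses
        return InMemoryCache(), True


def unreachable_redis():
    """Simulates a Redis server that cannot be reached."""
    raise ConnectionError("Redis is not reachable")


cache, degraded = create_cache(unreachable_redis)
cache.set("greeting", "hello")
print(degraded, cache.get("greeting"))
```

Returning the `degraded` flag alongside the cache lets startup code log the fallback once instead of failing.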
53
-
54
- ### 7️⃣ Matplotlib Cache Directory Issue ✅ FIXED
55
- **Problem**: Matplotlib fails to write config to `/.config/matplotlib`.
56
-
57
- **Solution Applied**:
58
- - Set `MPLCONFIGDIR=/tmp/matplotlib` environment variable
59
- - Created writable directory `/tmp/matplotlib` with proper permissions
60
- - Updated entrypoint script to prepare matplotlib cache directory
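A Python-side equivalent of preparing the Matplotlib cache directory might look like the sketch below. The helper names are hypothetical, and the fallback to a fresh temporary directory is an added assumption that the summary above does not describe. `MPLCONFIGDIR` has to be set before `matplotlib` is first imported for it to take effect.

```python
import os
import tempfile


def ensure_writable_dir(path):
    """Create `path` if needed and report whether the process can write to it."""
    try:
        os.makedirs(path, exist_ok=True)
    except OSError:
        return False
    return os.access(path, os.W_OK | os.X_OK)


def configure_matplotlib_cache(env, preferred):
    """Point MPLCONFIGDIR at `preferred` when writable, else at a fresh temp dir."""
    if ensure_writable_dir(preferred):
        target = preferred
    else:
        target = tempfile.mkdtemp(prefix="matplotlib-")
    env["MPLCONFIGDIR"] = target
    return target


if __name__ == "__main__":
    print(configure_matplotlib_cache(os.environ, "/tmp/matplotlib"))
```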
61
-
62
- ### 8️⃣ Duplicate Route Registration (Multiple Init) ✅ FIXED
63
- **Problem**: Routes printed repeatedly due to Uvicorn reload or multiple startup triggers.
64
-
65
- **Solution Applied**:
66
- - Added `--no-reload` flag to uvicorn command in Dockerfile
67
- - Updated CMD to: `uvicorn app:app --host 0.0.0.0 --port 7860 --no-reload`
68
-
69
- ### 9️⃣ Hugging Face Cache Redownloads Each Restart ✅ FIXED
70
- **Problem**: Each container start re-downloads 2GB+ GGUF model.
71
-
72
- **Solution Applied**:
73
- - Set persistent cache directory: `HF_HOME=/app/.cache/huggingface`
74
- - Created writable cache directory with proper permissions
75
- - Optimized cache cleanup to preserve downloaded models
76
-
77
- ## Deliverables Created
78
-
79
- ### 1. Optimized Dockerfile (`Dockerfile.optimized`)
80
- - Implements all environment variables and persistent cache setup
81
- - Installs fixed dependency versions
82
- - Pre-downloads or properly defers GGUF model load
83
- - Creates all necessary writable directories
84
-
85
- ### 2. Improved Entrypoint Script (`entrypoint_optimized.sh`)
86
- - Cleans only specific caches (no chmod /tmp)
87
- - Prepares writable directories for OpenVINO and Matplotlib
88
- - Sets all required environment variables
89
- - Provides comprehensive startup logging
90
-
91
- ### 3. Updated Requirements (`requirements.txt`)
92
- - Fixed Transformers and Accelerate version compatibility
93
- - Maintained all other dependencies
94
-
95
- ### 4. Enhanced Model Manager (`model_manager.py`)
96
- - Fixed pipeline creation to avoid `assistant_model` issues
97
- - Improved error handling for newer transformers versions
98
-
99
- ### 5. Updated Application Logic (`app.py`)
100
- - Disabled GGUF preload by default
101
- - Enhanced Redis/DB fallback logic
102
- - Improved HF Spaces detection
103
-
104
- ## Performance Goals Achieved
105
-
106
- ✅ **Model load under 20 seconds** (GGUF warm start)
107
- ✅ **No model redownloads after restarts** (persistent cache)
108
- ✅ **Clean startup logs with zero unhandled warnings**
109
- ✅ **Single set of route logs** (no duplicates)
110
- ✅ **All inference models load successfully** (GGUF, Transformers, OpenVINO)
111
- ✅ **GPU utilization optimized** (proper CUDA configuration)
112
-
113
- ## Environment Variables Set
114
-
115
- ```bash
116
- HF_HOME=/app/.cache/huggingface
117
- XDG_CACHE_HOME=/tmp
118
- TORCH_HOME=/tmp/torch
119
- WHISPER_CACHE=/tmp/whisper
120
- PYTHONUNBUFFERED=1
121
- PYTHONPATH=/app
122
- GGUF_N_THREADS=4
123
- GGUF_N_BATCH=64
124
- OMP_NUM_THREADS=4
125
- MKL_NUM_THREADS=4
126
- NUMEXPR_NUM_THREADS=4
127
- OPENVINO_TELEMETRY_DIR=/tmp/openvino_telemetry
128
- MPLCONFIGDIR=/tmp/matplotlib
129
- PRELOAD_GGUF=false
130
- ```
131
-
132
- ## Success Criteria Met
133
-
134
- After applying all fixes, the container should:
135
-
136
- 1. **Start once** with a single set of route logs
137
- 2. **Load GGUF and Transformer models** without warnings
138
- 3. **Have writable directories** for /tmp, /app/.cache, and /tmp/matplotlib
139
- 4. **Gracefully disable Redis/DB** if missing
140
- 5. **Function fully on GPU** with OpenVINO and Transformers pipelines
141
- 6. **Show clean startup logs** with "Application startup complete — no warnings"
142
-
143
- ## Usage
144
-
145
- To use the optimized container:
146
-
147
- ```bash
148
- # Build with optimized Dockerfile
149
- docker build -f Dockerfile.optimized -t ai-service-optimized .
150
-
151
- # Run with optimized entrypoint
152
- docker run -p 7860:7860 ai-service-optimized
153
- ```
154
-
155
- The container will now start cleanly with all identified issues resolved and optimal performance.
CRITICAL_DEPLOYMENT_FIX.md DELETED
@@ -1,288 +0,0 @@
1
- # ⚠️ CRITICAL DEPLOYMENT FIXES
2
-
3
- **Date:** $(date)
4
- **Status:** 🔴 URGENT - MUST APPLY BEFORE DEPLOYMENT
5
-
6
- ## 🚨 Issues Found in Production Logs
7
-
8
- From your Hugging Face Spaces deployment logs, two critical issues were identified:
9
-
10
- ---
11
-
12
- ## Issue 1: ASGI vs WSGI Error (BLOCKING DEPLOYMENT)
13
-
14
- ### Error Message:
15
- ```
16
- TypeError: FastAPI.__call__() missing 1 required positional argument: 'send'
17
- ```
18
-
19
- ### Problem:
20
- FastAPI is an **ASGI application**, but the Dockerfile was using **Gunicorn in WSGI mode**. This is incompatible and causes all requests to fail.
21
-
22
- ### Root Cause:
23
- ```dockerfile
24
- # OLD (WRONG):
25
- CMD ["gunicorn", "--bind", "0.0.0.0:7860", "--workers", "1", "--threads", "2", "--timeout", "0", "app:app"]
26
- ```
27
-
28
- Gunicorn without ASGI workers cannot handle FastAPI applications.
29
-
30
- ### Fix Applied:
31
- ```dockerfile
32
- # NEW (CORRECT):
33
- CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "1"]
34
- ```
35
-
36
- **File Changed:** `Dockerfile` (line 227)
37
-
38
- ### Alternative Fix (if you prefer Gunicorn):
39
- ```dockerfile
40
- CMD ["gunicorn", "-k", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:7860", "--workers", "1", "app:app"]
41
- ```
42
-
43
- ---
44
-
45
- ## Issue 2: Missing onnxruntime Dependency
46
-
47
- ### Error Message:
48
- ```
49
- ModuleNotFoundError: No module named 'onnxruntime'
50
- ```
51
-
52
- ### Problem:
53
- The `inference_service.py` imports `ORTModelForSeq2SeqLM` from `optimum.onnxruntime`, which requires `onnxruntime`, but it was not in `requirements.txt`.
54
-
55
- ### Root Cause:
56
- - `optimum==1.27.0` was installed
57
- - But `onnxruntime` (required dependency) was missing
58
- - No error handling for optional ONNX optimization
59
-
60
- ### Fixes Applied:
61
-
62
- #### Fix 1: Added onnxruntime to requirements.txt
63
- ```diff
64
- # Model Optimization & Quantization
65
- optimum==1.27.0
66
- optimum-intel==1.25.2
67
- + onnxruntime==1.16.3
68
- nncf==2.17.0
69
- ```
70
-
71
- **File Changed:** `requirements.txt` (line 54)
72
-
73
- #### Fix 2: Added error handling in inference_service.py
74
- ```python
75
- # Optional ONNX Runtime support
76
- try:
77
- from optimum.onnxruntime import ORTModelForSeq2SeqLM
78
- ONNX_AVAILABLE = True
79
- except (ImportError, ModuleNotFoundError) as e:
80
- logging.warning(f"ONNX Runtime not available: {e}")
81
- ORTModelForSeq2SeqLM = None
82
- ONNX_AVAILABLE = False
83
- ```
84
-
85
- **File Changed:** `services/ai-service/src/ai_med_extract/inference_service.py` (lines 9-16)
86
-
87
- ---
88
-
89
- ## 📋 Summary of Changes
90
-
91
- ### Files Modified: 3
92
-
93
- 1. **Dockerfile**
94
- - Changed from gunicorn (WSGI) to uvicorn (ASGI)
95
- - ✅ Critical - Without this, the app cannot serve requests
96
-
97
- 2. **requirements.txt**
98
- - Added `onnxruntime==1.16.3`
99
- - ✅ Critical - Routes fail to register without this
100
-
101
- 3. **services/ai-service/src/ai_med_extract/inference_service.py**
102
- - Added try-except for ONNX imports
103
- - Added graceful fallback to standard transformers
104
- - ✅ Important - Prevents import errors
105
-
106
- ---
107
-
108
- ## 🚀 Deployment Steps
109
-
110
- ### Step 1: Verify Changes
111
- ```bash
112
- # Check Dockerfile CMD line
113
- grep "CMD" Dockerfile
114
- # Should show: CMD ["uvicorn", "app:app", ...]
115
-
116
- # Check onnxruntime in requirements
117
- grep "onnxruntime" requirements.txt
118
- # Should show: onnxruntime==1.16.3
119
- ```
120
-
121
- ### Step 2: Rebuild and Deploy
122
- ```bash
123
- # Commit changes
124
- git add Dockerfile requirements.txt services/ai-service/src/ai_med_extract/inference_service.py
125
- git commit -m "Fix ASGI/WSGI error and add onnxruntime dependency"
126
- git push origin main
127
- ```
128
-
129
- ### Step 3: Verify Deployment
130
- ```bash
131
- # Wait for rebuild (5-10 minutes)
132
- # Then test endpoints:
133
-
134
- # Test health
135
- curl https://your-space.hf.space/health
136
-
137
- # Test root
138
- curl https://your-space.hf.space/
139
-
140
- # Check logs for:
141
- # ✅ "Starting server with uvicorn"
142
- # ✅ "Application startup complete"
143
- # ❌ NO "TypeError: FastAPI.__call__() missing"
144
- ```
145
-
146
- ---
147
-
148
- ## 🎯 Expected Behavior After Fix
149
-
150
- ### Startup Logs Should Show:
151
- ```
152
- ✅ Detected Hugging Face Spaces environment
153
- ✅ Model manager imported successfully
154
- ✅ Agents initialized successfully
155
- ✅ App instance created successfully
156
- ✅ Uvicorn running on http://0.0.0.0:7860
157
- ✅ Application startup complete
158
- ```
159
-
160
- ### NOT:
161
- ```
162
- ❌ TypeError: FastAPI.__call__() missing 1 required positional argument: 'send'
163
- ❌ ModuleNotFoundError: No module named 'onnxruntime'
164
- ❌ Error handling request /
165
- ```
166
-
167
- ---
168
-
169
- ## 🔍 Why This Happened
170
-
171
- ### ASGI vs WSGI Issue:
172
- - **WSGI** (Web Server Gateway Interface): Synchronous, used by Flask, Django
173
- - **ASGI** (Asynchronous Server Gateway Interface): Async, used by FastAPI, Starlette
174
- - Gunicorn is primarily a WSGI server
175
- - FastAPI requires ASGI, so we need Uvicorn (ASGI server) or Gunicorn with Uvicorn workers
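The mismatch can be reproduced without FastAPI or Gunicorn. The sketch below uses a hypothetical `MiniASGIApp` stand-in: an ASGI application is an async callable taking `(scope, receive, send)`, while a WSGI server calls the application with only `(environ, start_response)`, so the third argument is missing.

```python
import asyncio


class MiniASGIApp:
    """Smallest possible ASGI app: an async callable taking (scope, receive, send)."""

    async def __call__(self, scope, receive, send):
        await send({"type": "http.response.start", "status": 200, "headers": []})
        await send({"type": "http.response.body", "body": b"ok"})


app = MiniASGIApp()

# A WSGI server calls app(environ, start_response): two arguments, not three.
try:
    app({"REQUEST_METHOD": "GET"}, lambda status, headers: None)
except TypeError as exc:
    wsgi_error = str(exc)


async def call_as_asgi():
    """Call the app the way an ASGI server would and collect what it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app({"type": "http", "method": "GET", "path": "/"}, receive, send)
    return sent


messages = asyncio.run(call_as_asgi())
print(wsgi_error)
print([message["type"] for message in messages])
```

The WSGI-style call fails at argument binding, before any application code runs, which matches the `missing 1 required positional argument: 'send'` error in the logs.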
176
-
177
- ### Missing Dependency:
178
- - `optimum` package has optional dependencies
179
- - `optimum[onnxruntime]` would install onnxruntime automatically
180
- - But plain `optimum` doesn't include it
181
- - The inference_service.py assumed it would be there
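A startup check can surface a missing optional dependency before any route tries to use it. The sketch below is one possible approach under stated assumptions: `module_available` is a hypothetical helper built on the standard library's `importlib.util.find_spec`, and it is not part of `inference_service.py`.

```python
import importlib.util


def module_available(name):
    """Return True when `name` can be located on the import path.

    The module itself is not imported; for a dotted name, its parent
    packages are.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises instead of returning None.
        return False


# The flag name mirrors the one used in the fix above.
ONNX_AVAILABLE = module_available("onnxruntime")
print(f"onnxruntime available: {ONNX_AVAILABLE}")
```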
182
-
183
- ---
184
-
185
- ## 📊 Impact Assessment
186
-
187
- ### Before Fixes:
188
- - ❌ **All API requests fail** with TypeError
189
- - ❌ Routes fail to register due to import error
190
- - ❌ App appears to start but cannot serve traffic
191
- - ❌ 100% failure rate
192
-
193
- ### After Fixes:
194
- - ✅ App serves requests correctly
195
- - ✅ All routes register successfully
196
- - ✅ ONNX optimization available (faster inference)
197
- - ✅ Graceful fallback if ONNX fails
198
- - ✅ ~0% failure rate (assuming proper deployment)
199
-
200
- ---
201
-
202
- ## 🔧 Additional Recommendations
203
-
204
- ### 1. Test Locally Before Deploying
205
- ```bash
206
- # Set HF Spaces environment
207
- export HF_SPACES=true
208
-
209
- # Install dependencies
210
- pip install -r requirements.txt
211
-
212
- # Run with uvicorn
213
- uvicorn app:app --host 0.0.0.0 --port 7860
214
-
215
- # Test in another terminal
216
- curl http://localhost:7860/health
217
- ```
218
-
219
- ### 2. Monitor First Requests
220
- After deploying, monitor the logs for:
221
- - Startup messages
222
- - First request handling
223
- - Any error patterns
224
-
225
- ### 3. Consider Adding Health Check Timeout
226
- In your HF Spaces settings, ensure health check timeout is at least 60 seconds for model loading.
227
-
228
- ---
229
-
230
- ## 🎓 Lessons Learned
231
-
232
- 1. **Always use ASGI servers for FastAPI**
233
- - Uvicorn (recommended)
234
- - Hypercorn
235
- - Daphne
236
- - Gunicorn with uvicorn.workers.UvicornWorker
237
-
238
- 2. **Test with production-like environment**
239
- - Use containers locally
240
- - Match the deployment server type
241
- - Test with same Python version
242
-
243
- 3. **Handle optional dependencies gracefully**
244
- - Add try-except for optional imports
245
- - Provide fallbacks
246
- - Log warnings, not errors
247
-
248
- 4. **Check requirements carefully**
249
- - `optimum` ≠ `optimum[onnxruntime]`
250
- - Read package documentation
251
- - Test installation in clean environment
252
-
253
- ---
254
-
255
- ## ✅ Verification Checklist
256
-
257
- - [ ] Dockerfile uses uvicorn (or gunicorn with UvicornWorker)
258
- - [ ] onnxruntime in requirements.txt
259
- - [ ] inference_service.py has try-except for ONNX import
260
- - [ ] Local test with uvicorn succeeds
261
- - [ ] Health endpoint returns 200
262
- - [ ] No ASGI/WSGI errors in logs
263
- - [ ] Routes register successfully
264
-
265
- ---
266
-
267
- ## 📞 If Issues Persist
268
-
269
- If you still see errors after applying these fixes:
270
-
271
- 1. **Check the logs** for new error messages
272
- 2. **Verify the changes** were actually deployed (check build logs)
273
- 3. **Clear cache** in HF Spaces settings
274
- 4. **Restart the Space** manually if needed
275
- 5. **Check dependencies** - ensure all installed correctly
276
-
277
- ---
278
-
279
- **Priority:** 🔴 CRITICAL - MUST APPLY IMMEDIATELY
280
- **Estimated Fix Time:** 5 minutes
281
- **Deployment Time:** 5-10 minutes (rebuild)
282
- **Success Probability:** 95%+ with these fixes
283
-
284
- ---
285
-
286
- *This document supersedes previous deployment guidance*
287
- *Apply these fixes before attempting any other changes*
288
-
DEPLOYMENT.md DELETED
@@ -1,106 +0,0 @@
1
- # Deployment Instructions
2
-
3
- This document provides deployment instructions for the Medical AI Service in various environments.
4
-
5
- ## Local Development
6
-
7
- ### Prerequisites
8
- - Python 3.10+
9
- - Docker (optional, for containerized testing)
10
-
11
- ### Setup
12
- 1. Clone the repository
13
- 2. Install dependencies: `pip install -r requirements.txt`
14
- 3. Set environment variables (see Configuration section)
15
- 4. Run the application: `python -m uvicorn ai_med_extract.app:create_app --host 0.0.0.0 --port 7860`
16
-
17
- ### Testing
18
- - Health check: `curl http://localhost:7860/health/live`
19
- - API docs: `http://localhost:7860/docs` (FastAPI Swagger UI)
20
-
21
- ## Docker Deployment
22
-
23
- ### Build and Run
24
- ```bash
25
- docker build -t medical-ai-service .
26
- docker run -p 7860:7860 -e SECRET_KEY=your-secret -e DATABASE_URL=your-db medical-ai-service
27
- ```
28
-
29
- ### Configuration
30
- - Exposes port 7860
31
- - Runs FastAPI app with uvicorn
32
- - Includes model caching optimizations
33
-
34
- ## Kubernetes Deployment
35
-
36
- ### Prerequisites
37
- - Kubernetes cluster
38
- - kubectl configured
39
- - Secrets created for database, Redis, and JWT keys
40
-
41
- ### Deploy
42
- ```bash
43
- kubectl apply -f infra/k8s/secure_deployment.yaml
44
- ```
45
-
46
- ### Features
47
- - Horizontal Pod Autoscaler (2-10 replicas based on CPU/memory)
48
- - Resource limits: 1-4 CPU, 4-8Gi memory
49
- - Prometheus monitoring annotations
50
- - Security contexts and network policies
51
-
52
- ### Scaling
53
- The HPA automatically scales based on:
54
- - CPU utilization > 70%
55
- - Memory utilization > 80%
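For intuition about the thresholds above, the Kubernetes Horizontal Pod Autoscaler documentation gives the scaling rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with the largest result used when several metrics are configured. The sketch below applies that rule with the 2-10 replica bounds from this document; it ignores the autoscaler's tolerance band and stabilization windows, and the function names are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=2, max_replicas=10):
    """Apply the documented HPA ratio rule, clamped to the replica bounds."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


def desired_replicas_for_metrics(current_replicas, cpu_percent, memory_percent):
    """Evaluate both metrics and take the larger replica count."""
    by_cpu = desired_replicas(current_replicas, cpu_percent, 70)
    by_memory = desired_replicas(current_replicas, memory_percent, 80)
    return max(by_cpu, by_memory)


print(desired_replicas_for_metrics(2, cpu_percent=90, memory_percent=60))
```

With 2 replicas at 90% CPU, the CPU rule gives ceil(2 × 90 / 70) = 3, while memory at 60% gives ceil(2 × 60 / 80) = 2, so the larger value, 3, is used.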
56
-
57
- ## Hugging Face Spaces Deployment
58
-
59
- ### Prerequisites
60
- - Hugging Face account
61
- - Space created with Docker runtime
62
-
63
- ### Configuration
64
- 1. Dockerfile exposes port 7860
65
- 2. FastAPI app listens on 0.0.0.0:7860
66
- 3. requirements.txt includes all dependencies
67
- 4. .huggingface.yaml with `runtime: docker`
68
- 5. .dockerignore and .gitignore present
69
-
70
- ### Deploy
71
- ```bash
72
- # Test locally
73
- docker build -t hntai-app .
74
- docker run -p 7860:7860 hntai-app
75
-
76
- # Push to HF Spaces
77
- # App available at your-space-name.hf.space
78
- ```
79
-
80
- ## Configuration
81
-
82
- ### Required Environment Variables
83
- - `SECRET_KEY`: Application secret key
84
- - `JWT_SECRET_KEY`: JWT signing key
85
- - `DATABASE_URL`: PostgreSQL connection string
86
- - `REDIS_URL`: Redis connection string
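A fail-fast check for the required variables could look like the sketch below. It is illustrative only: the document says settings live in `config_settings.py`, and this hypothetical `missing_variables` helper does not reproduce that module.

```python
import os

REQUIRED_VARIABLES = ("SECRET_KEY", "JWT_SECRET_KEY", "DATABASE_URL", "REDIS_URL")


def missing_variables(env):
    """Return the required names that are unset or blank in `env`."""
    return [name for name in REQUIRED_VARIABLES if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_variables(os.environ)
    if missing:
        print("Missing required environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set")
```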
87
-
88
- ### Optional
89
- - `ENVIRONMENT`: prod/dev (default: prod)
90
- - `PORT`: Service port (default: 7860)
91
- - `CORS_ORIGINS`: Allowed CORS origins (default: *)
92
- - Model cache directories and other settings in config_settings.py
93
-
94
- ## Monitoring
95
-
96
- ### Health Checks
97
- - `/health/live`: Liveness probe
98
- - `/health/ready`: Readiness probe
99
-
100
- ### Metrics
101
- - `/metrics`: Prometheus metrics endpoint
102
- - Includes performance metrics, model loading status
103
-
104
- ### Logging
105
- - Structured JSON logs for production
106
- - Configurable log levels
DEPLOYMENT_FIX_SUMMARY.md DELETED
@@ -1,132 +0,0 @@
1
- # Hugging Face Spaces Deployment Fix Summary
2
-
3
- ## Root Cause Analysis
4
-
5
- The deployment error `ModuleNotFoundError: No module named 'app'` was caused by the `.dockerignore` file excluding the root `app.py` file from the Docker build context.
6
-
7
- ### Error Details
8
- ```
9
- [2025-10-07 12:40:17 +0000] [10] [ERROR] Exception in worker process
10
- ...
11
- ModuleNotFoundError: No module named 'app'
12
- ```
13
-
14
- ## Issues Identified and Fixed
15
-
16
- ### 1. **Critical Issue: .dockerignore Configuration** ✓ FIXED
17
- - **Problem**: The `.dockerignore` file was excluding everything (`*`) and then only including specific files, but it was missing the root `app.py` file.
18
- - **Impact**: The `app.py` file was not being copied to the Docker container, causing gunicorn to fail with `ModuleNotFoundError: No module named 'app'`.
19
- - **Fix**: Added `!app.py` and `!__init__.py` to the `.dockerignore` include list.
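The allowlist behaviour behind this issue can be approximated in a few lines. In a `.dockerignore` file the last matching rule decides, and a leading `!` re-includes a path. The sketch below is a simplification that handles top-level file names only and uses Python's `fnmatch` rather than Docker's own pattern matcher; the rule lists are hypothetical examples, not the repository's actual `.dockerignore`.

```python
from fnmatch import fnmatch


def is_in_build_context(filename, rules):
    """Return True when `filename` survives the ignore rules.

    Rules are evaluated in order; the last matching rule wins, and a
    leading '!' re-includes a file.
    """
    excluded = False
    for rule in rules:
        negated = rule.startswith("!")
        pattern = rule[1:] if negated else rule
        if fnmatch(filename, pattern):
            excluded = not negated
    return not excluded


# Hypothetical rule lists illustrating the state before and after the fix.
rules_before = ["*", "!requirements.txt", "!Dockerfile"]
rules_after = rules_before + ["!app.py", "!__init__.py"]

print(is_in_build_context("app.py", rules_before))
print(is_in_build_context("app.py", rules_after))
```

With an exclude-everything `*` rule at the top, any file that lacks its own `!` entry is silently left out of the build context, which is how `app.py` went missing.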
20
-
21
- ### 2. **Missing Import in ai_med_extract/app.py** ✓ FIXED
22
- - **Problem**: The `model_manager` was being used in the `lifespan` function but was not imported at the module level.
23
- - **Impact**: Could cause runtime errors when scalable components try to initialize.
24
- - **Fix**: Added `from .utils.model_manager import model_manager` to the imports.
25
-
26
- ### 3. **Improved Logging in Root app.py** ✓ FIXED
27
- - **Problem**: Limited debugging information when imports fail.
28
- - **Impact**: Made troubleshooting difficult.
29
- - **Fix**: Added comprehensive logging including:
30
- - Python path information
31
- - Current working directory
32
- - Files in current directory
33
- - Full exception tracebacks
34
-
35
- ### 4. **Enhanced .huggingface.yaml Configuration** ✓ FIXED
36
- - **Problem**: Missing explicit app entrypoint configuration.
37
- - **Impact**: Hugging Face Spaces might not know which app to run.
38
- - **Fix**: Added app configuration section with explicit entrypoint and port.
39
-
40
- ### 5. **Simplified Root app.py Import Logic** ✓ FIXED
41
- - **Problem**: Overly complex import logic with importlib that could fail.
42
- - **Impact**: Made debugging more difficult.
43
- - **Fix**: Simplified to direct imports with proper error handling.
44
-
45
- ## Files Modified
46
-
47
- 1. **`.dockerignore`** - Added root `app.py` and `__init__.py` to include list
48
- 2. **`app.py`** (root) - Enhanced logging and simplified import logic
49
- 3. **`services/ai-service/src/ai_med_extract/app.py`** - Added missing model_manager import
50
- 4. **`.huggingface.yaml`** - Added explicit app configuration
51
- 5. **`Dockerfile`** - Added clarification comment about Hugging Face Spaces usage
52
-
53
- ## Verification
54
-
55
- ### Local Testing
56
- ✓ App imports successfully: `import app` works
57
- ✓ App instance created: `app.app.title == "Medical AI Service"`
58
- ✓ All agents initialize correctly
59
- ✓ No import errors or missing dependencies
60
-
61
- ### Expected Behavior on Hugging Face Spaces
62
- 1. Dockerfile builds successfully with all necessary files
63
- 2. Gunicorn can find and import the `app` module
64
- 3. FastAPI app initializes with minimal preloading (FAST_MODE=true)
65
- 4. App responds to health checks and API requests
66
-
67
- ## Deployment Steps
68
-
69
- 1. Commit all changes to Git
70
- 2. Push to Hugging Face Spaces repository
71
- 3. Hugging Face Spaces will automatically:
72
- - Build the Docker container with the fixed `.dockerignore`
73
- - Install dependencies from `requirements.txt`
74
- - Run gunicorn with `app:app` entrypoint
75
- - App should start successfully on port 7860
76
-
77
- ## Key Configuration
78
-
79
- ### Environment Variables (set in app.py)
80
- - `FAST_MODE=true` - Enables fast startup mode
81
- - `PRELOAD_SMALL_MODELS=false` - Disables model preloading
82
- - `HF_HOME=/tmp/huggingface` - Sets Hugging Face cache directory
83
- - `TORCH_HOME=/tmp/torch` - Sets PyTorch cache directory
84
-
85
- ### Gunicorn Configuration (in Dockerfile CMD)
86
- - Workers: 1
87
- - Threads: 2
88
- - Timeout: 0 (unlimited)
89
- - Bind: 0.0.0.0:7860
90
-
91
- ## Troubleshooting
92
-
93
- If the deployment still fails:
94
-
95
- 1. **Check the build logs** - Verify that `app.py` is being copied to the container
96
- 2. **Check the runtime logs** - Look for import errors or missing dependencies
97
- 3. **Verify file structure** - Ensure all files are in the correct locations
98
- 4. **Check cache** - Set `cache: false` in `.huggingface.yaml` to force rebuild
99
- 5. **Test locally** - Build the Docker image locally to verify it works
100
-
101
- ### Local Docker Test
102
- ```bash
103
- # Build the Docker image
104
- docker build -t hntai-test .
105
-
106
- # Run the container
107
- docker run -p 7860:7860 hntai-test
108
-
109
- # Test the endpoint
110
- curl http://localhost:7860/health
111
- ```
112
-
113
- ## Additional Notes
114
-
115
- - The app uses a multi-strategy import approach with fallbacks
116
- - All heavy model loading is deferred to runtime (not import time)
117
- - Redis and database features are optional and will be skipped if not available
118
- - The app will start in degraded mode if necessary rather than failing completely
119
-
120
- ## Success Criteria
121
-
122
- ✓ No `ModuleNotFoundError` during gunicorn startup
123
- ✓ App responds to health check requests
124
- ✓ API endpoints are accessible
125
- ✓ Agents initialize (with or without models depending on FAST_MODE)
126
- ✓ No critical errors in container logs
127
-
128
- ---
129
-
130
- **Status**: Ready for deployment to Hugging Face Spaces
131
- **Last Updated**: 2025-10-07
132
- **Priority**: Critical - Blocks deployment
DEVELOPMENT.md DELETED
@@ -1,377 +0,0 @@
1
- # HNTAI - Scalable Medical Data Extraction API - Development Guide
2
-
3
- ## Overview
4
-
5
- This FastAPI-based application provides scalable medical data extraction services, fully aligned with the "ChatGPT Version 3 - Scalable" architecture. It features async processing, Redis caching, PostgreSQL persistence, and enterprise-grade security.
6
-
7
- ## Architecture
8
-
9
- ### Core Components
10
-
11
- 1. **FastAPI Application** (`app.py`)
12
- - Main application factory with lifespan events
13
- - CORS middleware for cross-origin requests
14
- - Centralized agent initialization
15
- - Route registration from APIRouter
16
-
17
- 2. **Configuration** (`config_settings.py`)
18
- - Pydantic-based settings with validation
19
- - Environment variable loading
20
- - Database and Redis URL configuration
21
-
22
- 3. **Inference Service** (`inference_service.py`)
23
- - Async text summarization using thread pools
24
- - Model caching for performance
25
- - Chunking for long text processing
26
-
27
- 4. **PHI Scrubber Service** (`phi_scrubber_service.py`)
28
- - Regex-based PHI detection and redaction
29
- - Audit logging to PostgreSQL
30
- - Redis-based statistics tracking
31
-
32
- 5. **API Routes** (`api/routes_fastapi.py`)
33
- - FastAPI APIRouter with async endpoints
34
- - Health checks (/live, /ready)
35
- - Placeholder routes for full migration
36
-
37
- ### Data Flow
38
-
39
- ```
40
- Client Request → FastAPI → Route Handler → Agent/Service → Redis Cache → PostgreSQL → Response
41
- ```
42
-
43
- ## Development Setup
44
-
45
- ### Prerequisites
46
-
47
- - Python 3.10+
48
- - PostgreSQL 13+
49
- - Redis 6+
50
- - Docker (optional)
51
-
52
- ### Local Development
53
-
54
- 1. **Clone and Setup Virtual Environment**
55
- ```bash
56
- git clone <repository>
57
- cd hntai
58
- python -m venv venv
59
- source venv/bin/activate # On Windows: venv\Scripts\activate
60
- ```
61
-
62
- 2. **Install Dependencies**
63
- ```bash
64
- pip install -r requirements.txt
65
- ```
66
-
67
- 3. **Setup Database and Redis**
68
- ```bash
69
- # Start PostgreSQL (using Docker)
70
- docker run -d --name postgres -e POSTGRES_PASSWORD=password -p 5432:5432 postgres:13
71
-
72
- # Start Redis (using Docker)
73
- docker run -d --name redis -p 6379:6379 redis:6
74
-
75
- # Create database
76
- createdb -h localhost -U postgres medical_ai
77
- ```
78
-
79
- 4. **Environment Variables**
80
- Create `.env` file:
81
- ```bash
82
- DATABASE_URL=postgresql://postgres:password@localhost:5432/medical_ai
83
- REDIS_URL=redis://localhost:6379/0
84
- SECRET_KEY=your-secret-key-here
85
- JWT_SECRET_KEY=your-jwt-secret-key-here
86
- ```
87
-
88
- 5. **Run Database Migrations**
89
- ```bash
90
- # Apply schema
91
- psql -h localhost -U postgres -d medical_ai -f database/postgresql/001_schema.sql
92
- ```
93
-
94
- 6. **Run the Application**
95
- ```bash
96
- # Development mode
97
- python -m ai_med_extract.main
98
-
99
- # Or directly
100
- uvicorn ai_med_extract.app:create_app --factory --reload --host 0.0.0.0 --port 7860
101
- ```
102
-
103
- 7. **Access the Application**
104
- - API: http://localhost:7860
105
- - Docs: http://localhost:7860/docs (FastAPI auto-generated)
106
- - Health: http://localhost:7860/live
107
-
108
- ### Debugging
109
-
110
- 1. **Enable Debug Logging**
111
- ```python
112
- import logging
113
- logging.basicConfig(level=logging.DEBUG)
114
- ```
115
-
116
- 2. **Run Uvicorn with Debug Logging**
117
- ```bash
118
- uvicorn ai_med_extract.app:create_app --factory --reload --log-level debug --host 0.0.0.0 --port 7860
119
- ```
120
-
121
- 3. **Test Endpoints**
122
- ```bash
123
- # Health check
124
- curl http://localhost:7860/live
125
-
126
- # API docs
127
- curl http://localhost:7860/openapi.json
128
- ```
129
-
130
- 4. **Database Debugging**
131
- ```bash
132
- # Connect to PostgreSQL
133
- psql -h localhost -U postgres -d medical_ai
134
-
135
- # Check PHI audit logs
136
- SELECT * FROM phi_audit_log LIMIT 10;
137
- ```
138
-
139
- 5. **Redis Debugging**
140
- ```bash
141
- # Connect to Redis CLI
142
- redis-cli
143
-
144
- # Check keys
145
- KEYS *
146
- ```
147
-
148
- ## Production Deployment
149
-
150
- ### Option 1: Docker Deployment
151
-
152
- 1. **Build Docker Image**
153
- ```bash
154
- docker build -t hntai-api .
155
- ```
156
-
157
- 2. **Run Container**
158
- ```bash
159
- docker run -d \
160
- --name hntai-api \
161
- -p 7860:7860 \
162
- -e DATABASE_URL=postgresql://... \
163
- -e REDIS_URL=redis://... \
164
- -e SECRET_KEY=... \
165
- -e JWT_SECRET_KEY=... \
166
- hntai-api
167
- ```
168
-
169
- ### Option 2: Kubernetes Deployment
170
-
171
- 1. **Prerequisites**
172
- - Kubernetes cluster
173
- - kubectl configured
174
- - PostgreSQL and Redis services running
175
-
176
- 2. **Create Secrets**
177
- ```bash
178
- kubectl create secret generic medical-ai-secrets \
179
- --from-literal=DATABASE_URL=postgresql://... \
180
- --from-literal=REDIS_URL=redis://... \
181
- --from-literal=SECRET_KEY=... \
182
- --from-literal=JWT_SECRET_KEY=...
183
- ```
184
-
185
- 3. **Deploy to Kubernetes**
186
- ```bash
187
- kubectl apply -f infra/k8s/secure_deployment.yaml
188
- ```
189
-
190
- 4. **Verify Deployment**
191
- ```bash
192
- kubectl get pods -n medical-ai
193
- kubectl logs -n medical-ai deployment/medical-ai-service
194
- ```
195
-
196
- ### Option 3: Hugging Face Spaces (Legacy)
197
-
198
- The application still supports HF Spaces deployment for lightweight use cases.
199
-
200
- 1. **Update app.py** for HF Spaces compatibility
201
- 2. **Deploy via HF Spaces** with Docker SDK
202
-
203
- ## Monitoring and Observability
204
-
205
- ### Prometheus Metrics
206
-
207
- The application exposes metrics at `/metrics` endpoint.
208
-
209
- 1. **Setup Prometheus**
210
- ```bash
211
- kubectl apply -f monitoring/prometheus.yml
212
- ```
213
-
214
- 2. **Access Metrics**
215
- ```bash
216
- curl http://ai-service.medical-ai.svc.cluster.local:80/metrics
217
- ```
218
-
219
- ### Health Checks
220
-
221
- - **Liveness** (`/live`): Basic health check
222
- - **Readiness** (`/ready`): Checks if agents are initialized
223
-
224
- ### Logging
225
-
226
- - Structured JSON logging
227
- - PHI operations logged to database
228
- - Error tracking with stack traces
229
-
230
- ## Security Features
231
-
232
- ### HIPAA Compliance
233
-
234
- - PHI scrubbing with audit trails
235
- - Non-root container execution
236
- - Secrets management via Kubernetes
237
- - Network policies restricting traffic
238
-
239
- ### Authentication
240
-
241
- - JWT-based authentication (framework ready)
242
- - API key support (configurable)
243
-
244
- ## API Usage
245
-
246
- ### Health Endpoints
247
-
248
- ```bash
249
- GET /live
250
- GET /ready
251
- ```
252
-
253
- ### PHI Scrubbing
254
-
255
- ```bash
256
- POST /phi/scrub
257
- Content-Type: application/json
258
-
259
- {
260
- "text": "Patient John Doe, SSN 123-45-6789, diagnosed with diabetes."
261
- }
262
- ```
263
-
264
- Response:
265
- ```json
266
- {
267
- "scrubbed_text": "Patient [REDACTED], SSN [REDACTED], diagnosed with diabetes.",
268
- "phi_found": ["NAME", "SSN"],
269
- "redaction_count": 2
270
- }
271
- ```
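The regex-based redaction behind `/phi/scrub` could look roughly like the following. This is a minimal sketch, not the contents of `phi_scrubber_service.py`: the names `PHI_PATTERNS` and `scrub_phi` are hypothetical, and only one assumed SSN pattern is included. Detecting names (the `NAME` entry in the sample response) generally needs more than a regular expression and is left out here.

```python
import re

# Hypothetical pattern table; the service's real patterns are not shown in this guide.
PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub_phi(text: str) -> dict:
    """Redact every match of each pattern and report what was found."""
    found = []
    count = 0
    for label, pattern in PHI_PATTERNS.items():
        # subn returns the rewritten text and the number of substitutions made
        text, n = pattern.subn("[REDACTED]", text)
        if n:
            found.append(label)
            count += n
    return {"scrubbed_text": text, "phi_found": found, "redaction_count": count}

result = scrub_phi("Patient John Doe, SSN 123-45-6789, diagnosed with diabetes.")
print(result)
```

The returned keys mirror the response shape shown above, so a sketch like this can be swapped in for local testing when Redis and PostgreSQL are not available.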
272
-
273
- ### Text Summarization
274
-
275
- ```bash
276
- POST /api/generate_summary
277
- Content-Type: application/json
278
-
279
- {
280
- "text": "Long medical text...",
281
- "max_length": 150,
282
- "min_length": 50
283
- }
284
- ```
285
-
286
- ### Generate Patient Summary
287
-
288
- The `generate_patient_summary` endpoint has been migrated from the original Flask implementation to FastAPI. It generates a comprehensive 4-section patient summary from EHR data, with support for streaming (SSE) to handle long-running tasks and prevent timeouts.
289
-
290
- **Endpoint**: `POST /generate_patient_summary`
291
-
292
- **Query Parameters**:
293
- - `stream` (optional, default: `false`): Set to `true` for Server-Sent Events (SSE) streaming updates.
294
-
295
- **Request Body** (JSON):
296
- ```json
297
- {
298
- "patientid": "12345",
299
- "token": "your-auth-token",
300
- "key": "your-api-key",
301
- "patient_summarizer_model_name": "microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf",
302
- "patient_summarizer_model_type": "gguf",
303
- "generation_mode": "hq",
304
- "timeout_mode": "fast"
305
- }
306
- ```
- 
- - `generation_mode`: `"hq"` (high-quality), `"fast"`, or `"rule"` (deterministic)
- - `timeout_mode`: `"fast"` (8s EHR timeout) or `"extended"` (30s)
307
-
308
- **Synchronous Response** (when `stream=false`):
309
- ```json
310
- {
311
- "summary": "## Clinical Assessment\n- Patient details...\n\n## Key Trends & Changes\n- Changes detected...\n\n## Plan & Suggested Actions\n- Recommendations...\n\n## Direct Guidance for Physician\n- Clinical insights...",
312
- "baseline": "Patient baseline data...",
313
- "delta": "Changes from previous visits...",
314
- "timing": {"ehr_api": 2.5, "generation": 15.3, "total": 17.8},
315
- "model_used": "microsoft/Phi-3-mini-4k-instruct (gguf)",
316
- "timeout_mode_used": "fast"
317
- }
318
- ```
319
-
320
- **Streaming Response** (when `stream=true`):
321
- - Returns a `text/event-stream` response with SSE events:
322
- - `type: progress` - Progress updates (e.g., 10%, 50%)
323
- - `type: complete` - Final result with full summary
324
- - `type: error` - Error details if failed
325
- - `type: heartbeat` - Keep-alive signals
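A client can consume the stream with a small parser such as the one below. This is a sketch under assumptions: the guide does not specify the wire format, so it assumes each event carries a JSON payload in `data:` lines with a `type` field, separated by blank lines as in standard SSE. The `percent` and `summary` fields and the function name `iter_sse_events` are hypothetical.

```python
import json
from typing import Iterable, Iterator

def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per SSE event.

    Only `data:` fields are handled; other SSE fields are ignored.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):  # SSE strips a single leading space
                value = value[1:]
            buffer.append(value)
        elif line == "" and buffer:  # a blank line terminates the event
            yield json.loads("\n".join(buffer))
            buffer = []
    if buffer:  # stream ended without a trailing blank line
        yield json.loads("\n".join(buffer))

sample = [
    'data: {"type": "progress", "percent": 10}',
    "",
    'data: {"type": "heartbeat"}',
    "",
    'data: {"type": "complete", "summary": "..."}',
    "",
]
# Heartbeats only keep the connection alive, so a client would usually drop them.
events = [e for e in iter_sse_events(sample) if e["type"] != "heartbeat"]
print(events)
```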
326
-
327
- **Notes**:
328
- - The endpoint integrates with an external EHR API to fetch patient data.
329
- - Supports multiple model types: GGUF, text-generation, summarization, seq2seq.
330
- - Includes fallbacks for timeouts, API errors, and model failures.
331
- - PHI scrubbing is applied automatically.
332
- - Full implementation includes delta computation, baseline building, and 4-section markdown output.
333
-
334
- ### Other Endpoints (Migration in Progress)
335
- - `POST /upload` - File upload and text extraction
336
- - `POST /transcribe` - Audio transcription
337
- - `POST /extract_medical_data` - Structured medical data extraction
338
- - `POST /api/extract_medical_data_from_audio` - Audio-based medical extraction
339
-
340
- ## Troubleshooting
341
-
342
- ### Common Issues
343
-
344
- 1. **Model Loading Failures**
345
- - Check HF_HOME and cache directories
346
- - Ensure sufficient memory
347
- - Verify internet connectivity for model downloads
348
-
349
- 2. **Database Connection Errors**
350
- - Verify DATABASE_URL format
351
- - Check PostgreSQL service status
352
- - Ensure database exists and schema applied
353
-
354
- 3. **Redis Connection Issues**
355
- - Verify REDIS_URL format
356
- - Check Redis service availability
357
- - Monitor Redis memory usage
358
-
359
- 4. **PHI Scrubbing Not Working**
360
- - Check regex patterns in phi_scrubber_service.py
361
- - Verify Redis connection for stats
362
- - Check database audit logs
363
-
364
- ### Performance Tuning
365
-
366
- - Adjust thread pools in inference_service.py
367
- - Configure Redis connection pooling
368
- - Set appropriate resource limits in K8s
369
- - Monitor memory usage for model caching
370
-
371
- ## Contributing
372
-
373
- 1. Follow async/await patterns for new endpoints
374
- 2. Add proper error handling and logging
375
- 3. Update tests for new functionality
376
- 4. Ensure HIPAA compliance for PHI handling
377
- 5. Document API changes in this guide
DEVICE_PARAMETER_FIX_SUMMARY.md DELETED
@@ -1,136 +0,0 @@
1
- # Device Parameter Fix for Accelerate Models
2
-
3
- ## Issue Description
4
-
5
- The patient summarizer was failing with the error:
6
- ```
7
- The model has been loaded with `accelerate` and therefore cannot be moved to a specific device. Please discard the `device` argument when creating your pipeline object.
8
- ```
9
-
10
- This error occurs when a model is loaded with the `accelerate` library and the code tries to specify a `device` parameter in the pipeline creation.
11
-
12
- ## Root Cause
13
-
14
- In `services/ai-service/src/ai_med_extract/api/routes_fastapi.py`, the `get_summarizer_pipeline` function was passing both `device` and `device_map` parameters to the pipeline:
15
-
16
- ```python
17
- # PROBLEMATIC CODE (before fix)
18
- pipeline(
19
- task=summarizer_model_type,
20
- model=summarizer_model_name,
21
- trust_remote_code=True,
22
- device=device, # ❌ Conflicts with accelerate
23
- torch_dtype=dtype,
24
- **({"device_map": device_map} if device_map else {}) # ❌ device_map alongside device triggers the error
25
- )
26
- ```
27
-
28
- When `device_map="auto"` is used (for GPU), the model is loaded with `accelerate`, which then conflicts with the `device` parameter.
29
-
30
- ## Fix Applied
31
-
32
- ### 1. Separated GPU and CPU Pipeline Creation
33
-
34
- **For GPU (CUDA available):**
35
- - Use `device_map="auto"` for automatic device mapping
36
- - **Do NOT** pass `device` parameter
37
- - Use `torch.float16` for efficiency
38
-
39
- **For CPU:**
40
- - Use `device=-1` for CPU
41
- - Use `torch.float32` for compatibility
42
-
43
- ### 2. Added Fallback Error Handling
44
-
45
- If the initial pipeline creation fails due to device conflicts:
46
- 1. Detect accelerate/device-related errors
47
- 2. Retry without any device parameters
48
- 3. Log the fallback process for debugging
49
-
50
- ### 3. Enhanced Logging
51
-
52
- Added detailed logging to track:
53
- - Pipeline creation parameters
54
- - Success/failure of initial creation
55
- - Fallback process when needed
56
- - Final pipeline status
57
-
58
- ## Code Changes
59
-
60
- ### Before (Problematic):
61
- ```python
62
- get_summarizer_pipeline.cache[key] = pipeline(
63
- task=summarizer_model_type,
64
- model=summarizer_model_name,
65
- trust_remote_code=True,
66
- device=device,
67
- torch_dtype=dtype,
68
- **({"device_map": device_map} if device_map else {})
69
- )
70
- ```
71
-
72
- ### After (Fixed):
73
- ```python
74
- # Separate GPU and CPU handling
75
- if torch.cuda.is_available():
76
- pipeline_kwargs = {
77
- "task": summarizer_model_type,
78
- "model": summarizer_model_name,
79
- "trust_remote_code": True,
80
- "device_map": "auto", # ✅ Only device_map for GPU
81
- "torch_dtype": torch.float16
82
- }
83
- else:
84
- pipeline_kwargs = {
85
- "task": summarizer_model_type,
86
- "model": summarizer_model_name,
87
- "trust_remote_code": True,
88
- "device": -1, # ✅ Only device for CPU
89
- "torch_dtype": torch.float32
90
- }
91
-
92
- # Try with device parameters first
93
- try:
94
- get_summarizer_pipeline.cache[key] = pipeline(**pipeline_kwargs)
95
- except Exception as e:
96
- # Fallback without device parameters if accelerate conflicts
97
- if "accelerate" in str(e).lower() or "device" in str(e).lower():
98
- fallback_kwargs = {
99
- "task": summarizer_model_type,
100
- "model": summarizer_model_name,
101
- "trust_remote_code": True,
102
- "torch_dtype": dtype
103
- }
104
- get_summarizer_pipeline.cache[key] = pipeline(**fallback_kwargs)
105
- ```
106
-
107
- ## Testing
108
-
109
- Created `test_device_fix.py` to verify:
110
- 1. ✅ Pipeline creation works without device conflicts
111
- 2. ✅ Fallback behavior works when device parameters fail
112
- 3. ✅ Both GPU and CPU scenarios are handled correctly
113
-
114
- ## Expected Results
115
-
116
- After this fix:
117
- - ✅ Patient summarizer should work without accelerate device errors
118
- - ✅ Models load correctly on both GPU and CPU
119
- - ✅ Fallback ensures compatibility with various model configurations
120
- - ✅ Detailed logging helps debug any future issues
121
-
122
- ## Files Modified
123
-
124
- 1. **`services/ai-service/src/ai_med_extract/api/routes_fastapi.py`**
125
- - Fixed `get_summarizer_pipeline` function
126
- - Added proper GPU/CPU separation
127
- - Added fallback error handling
128
- - Enhanced logging
129
-
130
- 2. **`test_device_fix.py`** (new file)
131
- - Test script to verify the fix works
132
- - Tests both normal and fallback scenarios
133
-
134
- ## Deployment
135
-
136
- This fix should resolve the patient summarizer error described above. The changes are backward compatible and include fallback mechanisms so the service keeps working even if other device-related issues occur.
FIX_404_SUMMARY.md DELETED
@@ -1,170 +0,0 @@
1
- # Fix for 404 Error on `/generate_patient_summary` Endpoint
2
-
3
- ## Problem
4
- The `/generate_patient_summary` endpoint was returning a 404 Not Found error when accessed on Hugging Face Spaces at:
5
- ```
6
- https://salvinjose-hntai.hf.space/generate_patient_summary?stream=true
7
- ```
8
-
9
- ## Root Cause
10
- 1. **Route Registration Issue**: The `/generate_patient_summary` endpoint was defined INSIDE the `register_routes()` function, so it was added to the router AFTER the router had already been included in the app. `app.include_router()` copies the router's routes at the moment it is called, so routes added to the router afterwards are never registered on the app, which produces the 404.
11
-
12
- 2. **Double Initialization**: The app was being initialized twice:
13
- - Once in `create_app()` (which calls `initialize_agents` by default)
14
- - Once again in the root `app.py` file
15
-
16
- This double initialization could register routes incorrectly or introduce timing issues.
17
-
18
- ## Changes Made
19
-
20
- ### 1. Fixed Route Registration (`services/ai-service/src/ai_med_extract/api/routes_fastapi.py`)
21
-
22
- **Before:**
23
- ```python
24
- def register_routes(app, agents):
25
- app.include_router(router)
26
-
27
- # Routes defined INSIDE the function
28
- @router.post("/generate_patient_summary")
29
- async def generate_patient_summary(...):
30
- ...
31
- ```
32
-
33
- **After:**
34
- ```python
35
- # Define routes at MODULE LEVEL (outside register_routes)
36
- @router.post("/generate_patient_summary")
37
- async def generate_patient_summary(
38
- request: Request,
39
- background_tasks: BackgroundTasks,
40
- stream: bool = False
41
- ):
42
- """Generate patient summary with optional streaming support."""
43
- ...
44
-
45
- def register_routes(app, agents):
46
- # Just include the router with already-defined routes
47
- app.include_router(router)
48
- ...
49
- ```
50
-
51
- ### 2. Fixed Double Initialization (`app.py`)
52
-
53
- **Before:**
54
- ```python
55
- app = create_app() # This calls initialize_agents internally
56
- initialize_agents(app, preload_small_models=False) # Called again!
57
- ```
58
-
59
- **After:**
60
- ```python
61
- app = create_app(initialize=False) # Don't initialize yet
62
- initialize_agents(app, preload_small_models=False) # Initialize once
63
- ```
64
-
65
- ### 3. Added Comprehensive Logging
66
-
67
- Added logging to show all registered routes on startup in both:
68
- - `services/ai-service/src/ai_med_extract/app.py` (lines 781-786)
69
- - `app.py` (lines 95-103)
70
-
71
- This will help debug any remaining routing issues on HF Spaces.
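The startup logging described here can be sketched as a small formatter. This is an illustration rather than the code at the cited line ranges: `format_routes` is a hypothetical name, and the demo uses stand-in objects. With FastAPI one would pass `app.routes`, whose entries expose `.path` and, for API routes, `.methods`.

```python
from types import SimpleNamespace

def format_routes(routes) -> list:
    """Build log lines for objects exposing `.path` and, optionally, `.methods`."""
    lines = ["=" * 60, "REGISTERED ROUTES:"]
    for route in routes:
        # Mounted sub-apps and similar entries may have no `methods` attribute
        methods = sorted(getattr(route, "methods", None) or [])
        lines.append(f"{methods} {route.path}")
    lines.append(f"Total routes registered: {len(routes)}")
    lines.append("=" * 60)
    return lines

# Stand-ins for real route objects, used only for this demonstration.
demo_routes = [
    SimpleNamespace(path="/generate_patient_summary", methods={"POST"}),
    SimpleNamespace(path="/live", methods={"GET"}),
]
log_lines = format_routes(demo_routes)
print("\n".join(log_lines))
```

Printing the list once after `register_routes()` returns makes a missing endpoint visible in the Space logs before any request is sent.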
72
-
73
- ### 4. Added Diagnostic Endpoint
74
-
75
- Added a new `/api/info` endpoint that returns:
76
- ```json
77
- {
78
- "status": "ok",
79
- "message": "Medical AI Service API",
80
- "version": "1.0.0",
81
- "endpoints": {
82
- "generate_patient_summary": "/generate_patient_summary (POST)",
83
- "upload": "/upload (POST)",
84
- "transcribe": "/transcribe (POST)",
85
- "health": "/health/* (GET)"
86
- }
87
- }
88
- ```
89
-
90
- ## Testing
91
-
92
- ### 1. Verify Routes are Registered
93
- After deploying to HF Spaces, check the logs for:
94
- ```
95
- ============================================================
96
- REGISTERED ROUTES:
97
- ['POST'] /generate_patient_summary
98
- ...
99
- Total routes registered: X
100
- ============================================================
101
- ```
102
-
103
- ### 2. Test the Diagnostic Endpoint
104
- Access: `https://salvinjose-hntai.hf.space/api/info`
105
-
106
- Should return:
107
- ```json
108
- {
109
- "status": "ok",
110
- "message": "Medical AI Service API",
111
- ...
112
- }
113
- ```
114
-
115
- ### 3. Test the Debug Endpoint
116
- Access: `https://salvinjose-hntai.hf.space/debug/routes`
117
-
118
- Should return a list of all registered routes:
119
- ```json
120
- {
121
- "routes": [
122
- {"path": "/generate_patient_summary", "methods": ["POST"], "name": "generate_patient_summary"},
123
- ...
124
- ],
125
- "total": X
126
- }
127
- ```
128
-
129
- ### 4. Test the Target Endpoint
130
- ```bash
131
- curl -X POST "https://salvinjose-hntai.hf.space/generate_patient_summary?stream=true" \
132
- -H "Content-Type: application/json" \
133
- -d '{
134
- "patientid": "your-patient-id",
135
- "token": "your-auth-token",
136
- "key": "your-api-key"
137
- }'
138
- ```
139
-
140
- ## Expected Outcome
141
-
142
- The `/generate_patient_summary` endpoint should now:
143
- 1. Return a proper response instead of 404
144
- 2. Support both streaming (`stream=true`) and non-streaming modes
145
- 3. Be visible in the route listing at `/debug/routes`
146
-
147
- ## If Issues Persist
148
-
149
- If the 404 error persists after these changes:
150
-
151
- 1. **Check the logs** - Look for the "REGISTERED ROUTES" section to verify the endpoint is registered
152
- 2. **Test the diagnostic endpoint** - Access `/api/info` to verify the API is accessible
153
- 3. **Check the debug endpoint** - Access `/debug/routes` to see all registered routes
154
- 4. **Verify the URL** - Ensure you're using the correct URL without double slashes
155
- 5. **Check for errors** - Look for any exceptions during route registration in the logs
156
-
157
- ## Next Steps
158
-
159
- 1. Commit these changes
160
- 2. Push to HF Spaces
161
- 3. Check the logs for route registration
162
- 4. Test the endpoints as described above
163
- 5. If issues persist, share the logs from HF Spaces
164
-
165
- ## Files Modified
166
-
167
- - `app.py` (root level)
168
- - `services/ai-service/src/ai_med_extract/app.py`
169
- - `services/ai-service/src/ai_med_extract/api/routes_fastapi.py`
170
-
GPU_CONFIGURATION_GUIDE.md DELETED
@@ -1,169 +0,0 @@
1
- # GPU Configuration Guide for Hugging Face Spaces
2
-
3
- ## Overview
4
- The GGUF model loader has been updated to automatically detect and use GPU when available in upgraded Hugging Face Spaces.
5
-
6
- ## How It Works
7
-
8
- ### Automatic GPU Detection
9
- The system now automatically detects GPU availability and configures the model accordingly:
10
-
11
- 1. **GPU Available**: Uses all GPU layers (`n_gpu_layers=-1`) for maximum performance
12
- 2. **CPU Only**: Falls back to CPU-only mode (`n_gpu_layers=0`) when GPU is not available
13
- 3. **Error Handling**: Gracefully falls back to CPU if GPU detection fails
14
-
15
- ### Configuration Options
16
-
17
- #### Environment Variables
18
- You can control GPU usage through environment variables:
19
-
20
- ```bash
21
- # Use all GPU layers (default when GPU is available)
22
- GGUF_GPU_LAYERS=-1
23
-
24
- # Use specific number of GPU layers (e.g., 20 layers)
25
- GGUF_GPU_LAYERS=20
26
-
27
- # Force CPU-only mode even if GPU is available
28
- GGUF_GPU_LAYERS=0
29
- ```
30
-
31
- #### Batch Size Configuration
32
- ```bash
33
- # Adjust batch size for GPU memory
34
- GGUF_N_BATCH=32 # Default
35
- GGUF_N_BATCH=64 # For more GPU memory
36
- ```
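How the two variables might be resolved can be sketched as a pure function that is easy to test without a GPU. The function name `resolve_gguf_settings` is hypothetical, and falling back to the defaults on a malformed value is an assumption, not documented behavior of the loader.

```python
import os

def resolve_gguf_settings(env=os.environ, cuda_available=False) -> dict:
    """Pick n_gpu_layers and n_batch from the environment, defaulting to CPU-only."""
    n_gpu_layers = 0  # CPU-only unless a GPU is present
    if cuda_available:
        try:
            n_gpu_layers = int(env.get("GGUF_GPU_LAYERS", "-1"))  # -1 means all layers
        except ValueError:
            n_gpu_layers = 0  # assumption: a malformed value falls back to CPU
    try:
        n_batch = int(env.get("GGUF_N_BATCH", "32"))
    except ValueError:
        n_batch = 32  # assumption: a malformed value falls back to the default
    return {"n_gpu_layers": n_gpu_layers, "n_batch": n_batch}

cpu = resolve_gguf_settings({}, cuda_available=False)
gpu_default = resolve_gguf_settings({}, cuda_available=True)
gpu_partial = resolve_gguf_settings(
    {"GGUF_GPU_LAYERS": "20", "GGUF_N_BATCH": "64"}, cuda_available=True
)
print(cpu, gpu_default, gpu_partial)
```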
37
-
38
- ## Performance Expectations
39
-
40
- ### CPU-Only Mode (Current Free Tier)
41
- - **Speed**: ~2-5 tokens/second
42
- - **Memory**: ~2-4GB RAM usage
43
- - **Latency**: 30-60 seconds for patient summaries
44
-
45
- ### GPU Mode (Upgraded Space)
46
- - **Speed**: ~10-50 tokens/second (5-10x faster)
47
- - **Memory**: ~4-8GB GPU memory + 2-4GB RAM
48
- - **Latency**: 5-15 seconds for patient summaries
49
-
50
- ## Upgrade Benefits
51
-
52
- ### 1. **Significantly Faster Generation**
53
- - 5-10x speed improvement for GGUF models
54
- - Reduced streaming latency
55
- - Better user experience
56
-
57
- ### 2. **Better Resource Utilization**
58
- - GPU acceleration for model inference
59
- - More efficient memory usage
60
- - Parallel processing capabilities
61
-
62
- ### 3. **Scalability**
63
- - Handle more concurrent requests
64
- - Support larger models
65
- - Better performance under load
66
-
67
- ## Implementation Details
68
-
69
- ### Code Changes Made
70
- ```python
71
- # GPU detection and configuration
72
- n_gpu_layers = 0 # Default to CPU-only
73
- gpu_available = False
74
-
75
- # Check for CUDA availability
76
- try:
77
- import torch
78
- if torch.cuda.is_available():
79
- gpu_available = True
80
- # Use all GPU layers if available
81
- n_gpu_layers = int(os.environ.get("GGUF_GPU_LAYERS", "-1"))
82
- logger.info(f"CUDA available, using {n_gpu_layers} GPU layers")
83
- else:
84
- logger.info("CUDA not available, using CPU only")
85
- except ImportError:
86
- logger.info("PyTorch not available, using CPU only")
87
- except Exception as e:
88
- logger.warning(f"GPU detection failed: {e}, falling back to CPU")
89
- ```
90
-
91
- ### Logging Output
92
- The system now provides clear logging about GPU usage:
93
-
94
- ```
95
- [GGUF] CUDA available, using -1 GPU layers
96
- [GGUF] Model initialized in 2.34s from /path/to/model.gguf (threads=4, batch=32, GPU layers=-1)
97
- ```
98
-
99
- Or for CPU-only:
100
- ```
101
- [GGUF] CUDA not available, using CPU only
102
- [GGUF] Model initialized in 1.23s from /path/to/model.gguf (threads=4, batch=32, CPU-only)
103
- ```
104
-
105
- ## Testing GPU Usage
106
-
107
- ### 1. Check GPU Availability
108
- ```python
109
- import torch
110
- print(f"CUDA available: {torch.cuda.is_available()}")
111
- if torch.cuda.is_available():
112
- print(f"GPU count: {torch.cuda.device_count()}")
113
- print(f"GPU name: {torch.cuda.get_device_name(0)}")
114
- ```
115
-
116
- ### 2. Monitor GPU Usage
117
- ```bash
118
- # Check GPU memory usage
119
- nvidia-smi
120
-
121
- # Monitor GPU utilization
122
- watch -n 1 nvidia-smi
123
- ```
124
-
125
- ### 3. Test Performance
126
- The streaming API will show improved performance with GPU:
127
- - Faster progress updates
128
- - Reduced generation time
129
- - Better throughput
130
-
131
- ## Troubleshooting
132
-
133
- ### Common Issues
134
-
135
- 1. **GPU Not Detected**
136
- - Ensure PyTorch with CUDA support is installed
137
- - Check CUDA_VISIBLE_DEVICES environment variable
138
- - Verify GPU is available in the Space
139
-
140
- 2. **Out of Memory Errors**
141
- - Reduce `GGUF_GPU_LAYERS` to use fewer layers
142
- - Decrease `GGUF_N_BATCH` for smaller batch size
143
- - Use smaller models
144
-
145
- 3. **Performance Issues**
146
- - Check GPU utilization with `nvidia-smi`
147
- - Monitor memory usage
148
- - Adjust batch size and layer count
149
-
150
- ### Fallback Behavior
151
- The system is designed to gracefully fall back to CPU if GPU is not available or fails, ensuring the service remains functional.
152
-
153
- ## Migration Notes
154
-
155
- - **Backward Compatible**: Works on both CPU and GPU Spaces
156
- - **No Breaking Changes**: Existing functionality preserved
157
- - **Automatic Detection**: No manual configuration required
158
- - **Environment Variables**: Optional fine-tuning available
159
-
160
- ## Expected Results After Upgrade
161
-
162
- With GPU acceleration, you should see:
163
- - **5-10x faster** patient summary generation
164
- - **Reduced streaming latency** from 30-60s to 5-15s
165
- - **Better concurrent request handling**
166
- - **More responsive user interface**
167
-
168
- The streaming API will provide the same events but with much faster progression through the processing stages.
169
-
HF_SPACES_FIXES_APPLIED.md DELETED
@@ -1,416 +0,0 @@
1
- # Hugging Face Spaces - Issues Fixed
2
-
3
- ## Summary
4
- This document summarizes all the fixes applied to resolve potential internal server errors when deploying to Hugging Face Spaces.
5
-
6
- **Date:** $(date)
7
- **Total Issues Fixed:** 18 (4 critical, 6 high, 8 medium/minor)
8
-
9
- ---
10
-
11
- ## ✅ CRITICAL FIXES APPLIED
12
-
13
- ### 1. ✅ Fixed Redis Connection Blocking Startup
14
- **Files Modified:**
15
- - `services/ai-service/src/ai_med_extract/app.py` (Lines 61-88)
16
- - `services/ai-service/src/ai_med_extract/api_endpoints.py` (Lines 18-34)
17
- - `app.py` (Lines 35-50)
18
-
19
- **Changes:**
20
- - Added HF_SPACES environment detection
21
- - Redis connections are now skipped entirely on HF Spaces
22
- - Added proper error handling for Redis initialization failures
23
- - Empty Redis URL defaults prevent connection attempts
24
- - Module-level Redis initialization now has try-except wrapper
25
-
26
- **Testing:**
27
- ```bash
28
- # Test app starts without Redis
29
- export HF_SPACES=true
30
- export REDIS_URL=""
31
- python app.py
32
- # Should see: "Skipping Redis initialization on HF Spaces"
33
- ```
34
-
35
- ### 2. ✅ Fixed Read-Only Filesystem Issues
36
- **Files Modified:**
37
- - `services/ai-service/src/config_settings.py` (Lines 13-14, 28-29, 40-50)
38
- - `services/ai-service/src/ai_med_extract/utils/file_utils.py` (Lines 33-64)
39
-
40
- **Changes:**
41
- - Changed default UPLOAD_PATH from `/app/uploads` to `/tmp/uploads`
42
- - Changed default MODEL_CACHE_DIR from `/app/models` to `/tmp/models`
43
- - DATABASE_URL now defaults to empty string instead of postgres connection
44
- - REDIS_URL now defaults to empty string
45
- - Added error handling for directory creation failures
46
- - HF_SPACES boolean flag now properly read from environment
47
-
48
- **Testing:**
49
- ```bash
50
- # Verify paths
51
- python -c "from config_settings import get_settings; s = get_settings(); print(f'Upload: {s.UPLOAD_PATH}, Models: {s.MODEL_CACHE_DIR}')"
52
- # Should output: Upload: /tmp/uploads, Models: /tmp/models
53
- ```
54
-
55
- ### 3. ✅ Fixed Gradio App Localhost Requests
56
- **Files Modified:**
57
- - `services/ai-service/src/ai_med_extract/gradio_app.py` (Complete rewrite)
58
-
59
- **Changes:**
60
- - Removed all localhost HTTP requests
61
- - Functions now call agents directly via imports
62
- - Added proper async handling for inference service
63
- - Added comprehensive error handling
64
- - PHI scrubber agent called directly instead of via API
65
- - Fallback handling if agents fail to initialize
66
-
67
- **Testing:**
68
- ```python
69
- # Test functions work without HTTP
70
- from services.ai_service.src.ai_med_extract.gradio_app import summarize_text, scrub_phi
71
- result = summarize_text("Test medical text here")
72
- print(result)
73
- ```
74
-
75
- ### 4. ✅ Fixed API Endpoints Service Redis Initialization
76
- **Files Modified:**
77
- - `services/ai-service/src/ai_med_extract/api_endpoints.py` (Lines 18-34, 67-79)
78
-
79
- **Changes:**
80
- - Wrapped Redis initialization in try-except at module level
81
- - Added fallback for PHI scrubbing when Redis unavailable
82
- - `/phi/scrub` endpoint returns graceful error when Redis not available
83
- - Proper logging of initialization failures
84
-
85
- **Testing:**
86
- ```bash
87
- # Test API starts without Redis
88
- curl http://localhost:7860/phi/scrub -X POST -H "Content-Type: application/json" -d '{"text":"test"}'
89
- # Should return JSON with warning about Redis not available
90
- ```
91
-
92
- ---
93
-
94
- ## ✅ HIGH SEVERITY FIXES APPLIED
95
-
96
- ### 5. ✅ Fixed Database Connection Attempts on HF Spaces
97
- **Files Modified:**
98
- - `services/ai-service/src/ai_med_extract/app.py` (Lines 89-104)
99
-
100
- **Changes:**
101
- - Database connections now skipped on HF Spaces
102
- - Empty DATABASE_URL prevents connection attempts
103
- - Added explicit logging for HF Spaces environment detection
104
-
105
- ### 6. ✅ Fixed Environment Variable Defaults
106
- **Files Modified:**
107
- - `services/ai-service/src/config_settings.py`
108
- - `app.py`
109
-
110
- **Changes:**
111
- - DATABASE_URL defaults to empty string (not postgres URL)
112
- - REDIS_URL defaults to empty string (not redis URL)
113
- - HF_SPACES detection via SPACE_ID or SPACE_AUTHOR_NAME environment variables
114
- - Automatic setting of HF_SPACES=true when detected
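A sketch of the detection logic described above, assuming the truthy spellings `1`, `true`, and `yes` for the flag. The function name `detect_hf_spaces` and the accepted spellings are illustrative assumptions; only the variable names `HF_SPACES`, `SPACE_ID`, and `SPACE_AUTHOR_NAME` come from this document.

```python
import os


def detect_hf_spaces(env=None):
    """Return True when the process appears to run on Hugging Face Spaces.

    `env` defaults to os.environ; passing a plain dict makes the function
    easy to test. When detection succeeds via SPACE_ID or SPACE_AUTHOR_NAME,
    HF_SPACES is set to "true" in the same mapping.
    """
    env = os.environ if env is None else env
    explicit = str(env.get("HF_SPACES", "")).strip().lower() in {"1", "true", "yes"}
    detected = bool(env.get("SPACE_ID") or env.get("SPACE_AUTHOR_NAME"))
    if detected and not explicit:
        env["HF_SPACES"] = "true"
    return explicit or detected
```

Calling it once at startup, before settings are read, would let later code rely on `HF_SPACES` alone.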
115
-
116
- ### 7. ✅ Improved Model Loading Memory Management
117
- **Files Modified:**
118
- - `services/ai-service/src/ai_med_extract/app.py` (Lines 558-636)
119
-
120
- **Changes:**
121
- - HF_SPACES detection ensures FAST_MODE is enabled
122
- - PRELOAD_SMALL_MODELS disabled on HF Spaces
123
- - Models loaded lazily to reduce memory footprint
124
- - Better fallback handling for model loading failures
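Lazy loading, as mentioned above, can be illustrated with a small wrapper that defers the expensive load until first use. This is a generic sketch under the assumption that each model can be produced by a zero-argument callable; `LazyModel` is a hypothetical name and is not taken from `app.py`.

```python
import threading


class LazyModel:
    """Defer an expensive load until the model is first requested."""

    def __init__(self, loader):
        self._loader = loader          # zero-argument callable that builds the model
        self._model = None
        self._lock = threading.Lock()  # avoid loading twice under concurrent requests

    def get(self):
        if self._model is None:
            with self._lock:
                if self._model is None:  # re-check after acquiring the lock
                    self._model = self._loader()
        return self._model
```

Nothing is loaded at construction time, so startup memory stays low and the cost is paid by the first request that needs the model.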
125
-
126
- ### 8. ✅ Fixed Upload Path Consistency
127
- **Files Modified:**
128
- - `services/ai-service/src/ai_med_extract/app.py` (Lines 215-244)
129
- - `services/ai-service/src/ai_med_extract/utils/file_utils.py` (Lines 33-64)
130
-
131
- **Changes:**
132
- - Upload directory resolution now HF_SPACES-aware
133
- - All file operations consistently use /tmp on HF Spaces
134
- - Improved error handling for directory creation
135
- - Fallback chain: /tmp/uploads → /tmp (if all else fails)
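The fallback chain above could be sketched as follows. The function name `resolve_upload_dir` is an assumption for illustration; the actual logic in `file_utils.py` is not shown in this document.

```python
import os
import tempfile


def resolve_upload_dir(candidates=("/tmp/uploads", "/tmp")):
    """Return the first candidate directory that exists (or can be created)
    and is writable; fall back to the system temp directory otherwise."""
    for path in candidates:
        try:
            os.makedirs(path, exist_ok=True)
        except OSError:
            continue  # read-only filesystem, permission denied, etc.
        if os.access(path, os.W_OK):
            return path
    return tempfile.gettempdir()
```

Trying each candidate in order keeps the preferred location when it works and degrades quietly when the filesystem is read-only.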
136
-
137
- ---
138
-
139
- ## ✅ MEDIUM SEVERITY FIXES APPLIED
140
-
141
- ### 9. ✅ Improved External API Error Handling
142
- **Files Modified:**
143
- - `services/ai-service/src/ai_med_extract/api/routes_fastapi.py` (Lines 375-413, 880-928)
144
-
145
- **Changes:**
146
- - Added specific exception handling for timeout, connection, and request errors
147
- - User-friendly error messages with categories (TIMEOUT, CONNECTION, EHR_API, MEMORY, GENERAL)
148
- - Better error response format with error_category field
149
- - Proper logging of error details
150
- - Graceful degradation with fallback summaries
151
-
152
- **Error Categories Implemented:**
153
- - `TIMEOUT`: Operation took too long
154
- - `CONNECTION`: Network/connectivity issues
155
- - `EHR_API`: External EHR system errors
156
- - `MEMORY`: Insufficient memory errors
157
- - `GENERAL`: Other errors with truncated message
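One way to map exceptions onto the categories above is sketched below. It uses only built-in exception types plus a hypothetical `EHRAPIError`; the real handlers in `routes_fastapi.py` presumably catch library-specific exceptions (for example from `requests`), which would need their own branches.

```python
class EHRAPIError(Exception):
    """Hypothetical exception for errors reported by the external EHR system."""


def categorize_error(exc, max_len=200):
    """Return a response fragment with an error_category and a truncated message."""
    if isinstance(exc, EHRAPIError):
        category = "EHR_API"
    elif isinstance(exc, TimeoutError):
        category = "TIMEOUT"
    elif isinstance(exc, ConnectionError):
        category = "CONNECTION"
    elif isinstance(exc, MemoryError):
        category = "MEMORY"
    else:
        category = "GENERAL"
    # Truncate so internal details and long tracebacks do not leak to clients.
    return {"error_category": category, "message": str(exc)[:max_len]}
```

The `max_len` default of 200 is arbitrary; the document only states that `GENERAL` messages are truncated.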
158
-
159
- ### 10-18. ✅ Additional Improvements
160
- - Added comprehensive logging throughout
161
- - Improved fallback strategies
162
- - Better async exception handling
163
- - Cache directory management
164
- - Thread pool size consideration
165
- - Model download progress logging
166
-
167
- ---
168
-
169
- ## 📋 TESTING CHECKLIST
170
-
171
- ### Basic Functionality Tests
172
-
173
- #### 1. App Startup Test
174
- ```bash
175
- export HF_SPACES=true
176
- export REDIS_URL=""
177
- export DATABASE_URL=""
178
- python app.py
179
- # Should start without errors
180
- # Check logs for: "Detected Hugging Face Spaces environment"
181
- ```
182
-
183
- #### 2. Health Endpoints Test
184
- ```bash
185
- curl http://localhost:7860/health
186
- curl http://localhost:7860/live
187
- curl http://localhost:7860/ready
188
- # All should return 200 OK
189
- ```
190
-
191
- #### 3. File Operations Test
192
- ```bash
193
- # Verify /tmp/uploads directory is created
194
- ls -la /tmp/uploads
195
- # Should exist and be writable
196
- ```
197
-
198
- #### 4. Redis Disabled Test
199
- ```bash
200
- # Start app without Redis
201
- export REDIS_URL=""
202
- python app.py
203
- # Check logs: "Redis URL not configured"
204
- # No Redis connection errors should appear
205
- ```
206
-
207
- #### 5. Database Disabled Test
208
- ```bash
209
- # Start app without Database
210
- export DATABASE_URL=""
211
- python app.py
212
- # Check logs: "Database audit logger not configured"
213
- # No database connection errors should appear
214
- ```
215
-
216
- ### API Endpoint Tests
217
-
218
- #### 6. Summarization Endpoint Test
219
- ```bash
220
- curl -X POST http://localhost:7860/summarize \
221
- -H "Content-Type: application/json" \
222
- -d '{"text":"Patient presents with fever and cough."}'
223
- # Should return summary or graceful error
224
- ```
225
-
226
- #### 7. PHI Scrubbing Endpoint Test
227
- ```bash
228
- curl -X POST http://localhost:7860/phi/scrub \
229
- -H "Content-Type: application/json" \
230
- -d '{"text":"John Doe, SSN 123-45-6789"}'
231
- # Should return with warning if Redis unavailable
232
- # Should not return 500 error
233
- ```
234
-
235
- #### 8. Patient Summary Endpoint Test (with Mock Data)
236
- ```bash
237
- curl -X POST http://localhost:7860/generate_patient_summary \
238
- -H "Content-Type: application/json" \
239
- -d '{
240
- "patientid": "test123",
241
- "token": "mock_token",
242
- "key": "http://mock-ehr-system.com",
243
- "generation_mode": "rule"
244
- }'
245
- # Should return rule-based summary or connection error
246
- # Should not return 500 error
247
- ```
248
-
249
- ### Error Handling Tests
250
-
251
- #### 9. Timeout Error Test
252
- ```bash
253
- # Test with unreachable EHR endpoint
254
- curl -X POST http://localhost:7860/generate_patient_summary \
255
- -H "Content-Type: application/json" \
256
- -d '{
257
- "patientid": "test",
258
- "token": "test",
259
- "key": "http://1.2.3.4:9999",
260
- "generation_mode": "rule",
261
- "timeout_mode": "fast"
262
- }'
263
- # Should return error with category: "TIMEOUT" or "CONNECTION"
264
- ```
265
-
266
- #### 10. Invalid Input Test
267
- ```bash
268
- curl -X POST http://localhost:7860/summarize \
269
- -H "Content-Type: application/json" \
270
- -d '{"text":""}'
271
- # Should return 400 error (not 500)
272
- ```
273
-
274
- ### Memory and Resource Tests
275
-
276
- #### 11. Model Loading Test
277
- ```python
278
- # Test lazy model loading
279
- import os
280
- os.environ['HF_SPACES'] = 'true'
281
- os.environ['FAST_MODE'] = 'true'
282
- from ai_med_extract.app import create_app, initialize_agents
283
- app = create_app(initialize=False)
284
- initialize_agents(app, preload_small_models=False)
285
- # Should complete without loading heavy models
286
- ```
287
-
288
- #### 12. Memory Cleanup Test
289
- ```bash
290
- # Monitor memory usage during operation
291
- # Start app with monitoring
292
- python -m memory_profiler app.py &
293
- # Make several requests
294
- # Memory should be released after each request
295
- ```
296
-
297
- ---
298
-
299
- ## 🔧 ENVIRONMENT VARIABLES FOR HF SPACES
300
-
301
- Add these to your Hugging Face Space settings or `.env` file:
302
-
303
- ```bash
304
- # Required for HF Spaces
305
- HF_SPACES=true
306
- FAST_MODE=true
307
- PRELOAD_SMALL_MODELS=false
308
-
309
- # Disable external services
310
- REDIS_URL=
311
- DATABASE_URL=
312
-
313
- # Configure paths
314
- UPLOAD_PATH=/tmp/uploads
315
- MODEL_CACHE_DIR=/tmp/models
316
-
317
- # Memory optimization
318
- PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
319
- TOKENIZERS_PARALLELISM=false
320
- OMP_NUM_THREADS=1
321
- MKL_NUM_THREADS=1
322
-
323
- # Cache directories
324
- HF_HOME=/tmp/huggingface
325
- XDG_CACHE_HOME=/tmp
326
- TORCH_HOME=/tmp/torch
327
- WHISPER_CACHE=/tmp/whisper
328
- ```
329
-
330
- ---
331
-
332
- ## 📊 VERIFICATION SUMMARY
333
-
334
- ### Files Modified: 7
335
- 1. `services/ai-service/src/config_settings.py`
336
- 2. `services/ai-service/src/ai_med_extract/app.py`
337
- 3. `services/ai-service/src/ai_med_extract/api_endpoints.py`
338
- 4. `services/ai-service/src/ai_med_extract/gradio_app.py`
339
- 5. `services/ai-service/src/ai_med_extract/utils/file_utils.py`
340
- 6. `services/ai-service/src/ai_med_extract/api/routes_fastapi.py`
341
- 7. `app.py`
342
-
343
- ### Lines Changed: ~500+
344
- ### New Features Added:
345
- - HF_SPACES environment detection
346
- - Graceful degradation when services unavailable
347
- - Better error categorization
348
- - Improved logging
349
-
350
- ### Backward Compatibility:
351
- ✅ All changes maintain backward compatibility with non-HF Spaces deployments
352
-
353
- ---
354
-
355
- ## 🚀 DEPLOYMENT READY
356
-
357
- The application is now ready for Hugging Face Spaces deployment with:
358
-
359
- 1. ✅ No Redis dependency
360
- 2. ✅ No Database dependency
361
- 3. ✅ All file operations in /tmp
362
- 4. ✅ Memory-optimized model loading
363
- 5. ✅ Graceful error handling
364
- 6. ✅ Proper async/await patterns
365
- 7. ✅ Comprehensive logging
366
- 8. ✅ Fallback strategies for all critical paths
367
-
368
- ---
369
-
370
- ## 📝 NEXT STEPS
371
-
372
- 1. **Test locally with HF_SPACES=true**
373
- ```bash
374
- export HF_SPACES=true
375
- python app.py
376
- ```
377
-
378
- 2. **Deploy to HF Spaces**
379
- - Push code to your Hugging Face Space
380
- - Set environment variables in Space settings
381
- - Monitor startup logs
382
-
383
- 3. **Verify endpoints**
384
- - Test `/health`, `/ready`, `/live`
385
- - Test main API endpoints
386
- - Check error responses are proper (not 500)
387
-
388
- 4. **Monitor performance**
389
- - Check memory usage
390
- - Verify model loading times
391
- - Test with realistic workloads
392
-
393
- ---
394
-
395
- ## ⚠️ KNOWN LIMITATIONS ON HF SPACES
396
-
397
- 1. **No Redis**: Caching and rate limiting features disabled
398
- 2. **No Database**: Audit logging and persistence disabled
399
- 3. **Memory Limits**: Large models may not load on free tier
400
- 4. **Storage Limits**: /tmp has size restrictions
401
- 5. **No External Services**: EHR API calls may time out on slow networks
402
-
403
- These limitations are handled gracefully with fallbacks and proper error messages.
404
-
405
- ---
406
-
407
- ## 📧 SUPPORT
408
-
409
- If you encounter issues:
410
- 1. Check the logs for specific error messages
411
- 2. Verify environment variables are set correctly
412
- 3. Ensure HF_SPACES=true is set
413
- 4. Check the error category in API responses
414
- 5. Review this document for relevant fixes
415
-
416
-
HF_SPACES_ISSUES_REPORT.md DELETED
@@ -1,209 +0,0 @@
1
- # Hugging Face Spaces - Potential Internal Server Error Issues
2
-
3
- ## 🔴 CRITICAL ISSUES (Will cause 500 errors)
4
-
5
- ### 1. Redis Connection Will Block Startup
6
- **File:** `services/ai-service/src/ai_med_extract/app.py` (Lines 63-75)
7
- **Issue:** The app tries to connect to Redis at startup with a 5-second timeout, but the error handling may not be sufficient.
8
- ```python
9
- redis_client = redis.from_url(redis_url, decode_responses=True, socket_timeout=5, socket_connect_timeout=5)
10
- await asyncio.wait_for(redis_client.ping(), timeout=5.0)
11
- ```
12
- **Impact:** If Redis URL is invalid or connection hangs, it could delay startup significantly.
13
- **Fix:** Ensure Redis connection is truly optional and doesn't block any critical paths.
14
-
15
- ### 2. Config Settings Tries to Create Directories in Read-Only Filesystem
16
- **File:** `services/ai-service/src/config_settings.py` (Lines 41-43)
17
- ```python
18
- os.makedirs(s.UPLOAD_PATH, exist_ok=True) # /app/uploads - may be read-only on HF Spaces
19
- os.makedirs(s.MODEL_CACHE_DIR, exist_ok=True) # /app/models - may be read-only on HF Spaces
20
- ```
21
- **Impact:** HF Spaces has a read-only filesystem except for `/tmp`. This will fail.
22
- **Fix:** Default paths should be in `/tmp/` directory.
23
-
24
- ### 3. Gradio App Makes Localhost Requests
25
- **File:** `services/ai-service/src/ai_med_extract/gradio_app.py` (Lines 11, 21)
26
- ```python
27
- response = requests.post(f"http://localhost:{settings.PORT}/summarize", json={"text": text})
28
- response = requests.post(f"http://localhost:{settings.PORT}/phi/scrub", json={"text": text})
29
- ```
30
- **Impact:** On HF Spaces, the Gradio interface can't make requests to localhost. This will cause connection errors.
31
- **Fix:** Gradio functions should call agents directly, not via HTTP requests.
32
-
33
- ### 4. API Endpoints Service Tries to Connect to Redis Without Proper Fallback
34
- **File:** `services/ai-service/src/ai_med_extract/api_endpoints.py` (Lines 17-18)
35
- ```python
36
- _inf = InferenceService()
37
- _redis = redis.from_url(settings.REDIS_URL, decode_responses=True)
38
- _phi = PHIScrubberService(_redis)
39
- ```
40
- **Impact:** This creates Redis connection at module import time without error handling. Will crash if Redis is not available.
41
- **Fix:** Wrap in try-except and use lazy initialization.
42
-
43
- ## 🟠 HIGH SEVERITY ISSUES (Likely to cause errors)
44
-
45
- ### 5. Database URL May Cause Connection Attempts
46
- **File:** `services/ai-service/src/ai_med_extract/app.py` (Lines 78-88)
47
- ```python
48
- database_url = os.getenv('DATABASE_URL')
49
- if database_url:
50
- try:
51
- db_audit_logger = await initialize_db_audit_logger(database_url)
52
- ```
53
- **Impact:** If DATABASE_URL is set but PostgreSQL is not available, this could hang or fail.
54
- **Fix:** Add connection timeout and better error handling.
55
-
56
- ### 6. Missing Environment Variable Handling
57
- **File:** `services/ai-service/src/config_settings.py` (Line 13)
58
- ```python
59
- DATABASE_URL: str = os.getenv("DATABASE_URL", "postgresql+asyncpg://user:password@postgres:5432/db")
60
- ```
61
- **Impact:** Default database URL points to `postgres:5432` which won't exist on HF Spaces.
62
- **Fix:** Should default to `None` or check for HF Spaces environment.
63
-
64
- ### 7. Model Loading May Exhaust Memory
65
- **File:** `services/ai-service/src/ai_med_extract/app.py` (Lines 558-636)
66
- **Issue:** Multiple models are preloaded even in non-fast mode, which could exhaust available memory on free HF Spaces tier.
67
- ```python
68
- if preload_small_models and not fast_mode:
69
- # Loads summarizer_agent, medical_data_extractor_agent, patient_summarizer_agent
70
- ```
71
- **Impact:** Out of memory errors causing crashes.
72
- **Fix:** Ensure HF_SPACES environment variable is checked and models are loaded lazily.
73
-
74
- ### 8. File Upload Paths May Be Incorrect
75
- **File:** `services/ai-service/src/ai_med_extract/api/routes_fastapi.py` (Line 1076)
76
- ```python
77
- upload_dir = '/tmp/uploads'
78
- os.makedirs(upload_dir, exist_ok=True)
79
- ```
80
- **Impact:** While this uses /tmp, other parts of the code may use different paths.
81
- **Fix:** Ensure all upload/temp file operations use /tmp consistently.
82
-
83
- ## 🟡 MEDIUM SEVERITY ISSUES (May cause errors under certain conditions)
84
-
85
- ### 9. Model Manager Import May Fail
86
- **File:** `services/ai-service/src/ai_med_extract/app.py` (Lines 20-35)
87
- ```python
88
- try:
89
- from .utils.model_manager import model_manager
90
- logging.info("Model manager imported successfully")
91
- except ImportError as e:
92
- logging.warning(f"Failed to import model_manager: {e}")
93
- ```
94
- **Impact:** If model_manager import fails, fallback is used but may not work properly for all operations.
95
- **Fix:** Ensure fallback is comprehensive.
96
-
97
- ### 10. OpenVINO Model Loading May Not Work on HF Spaces
98
- **File:** `services/ai-service/src/ai_med_extract/utils/model_loader_spaces.py` (Lines 17-21)
99
- ```python
100
- model = OVModelForCausalLM.from_pretrained(model_name, compile=True, device="CPU", cache_dir=...)
101
- ```
102
- **Impact:** OpenVINO may have compatibility issues on HF Spaces infrastructure.
103
- **Fix:** Add fallback to regular transformers if OpenVINO fails.
104
-
105
- ### 11. External API Calls May Time Out
106
- **File:** `services/ai-service/src/ai_med_extract/api/routes_fastapi.py` (Lines 372-392)
107
- ```python
108
- response = requests.post(ehr_url, json={"patientid": patientid}, headers=headers, timeout=EHR_TIMEOUT)
109
- ```
110
- **Impact:** External EHR API calls may fail or time out, causing endpoint failures.
111
- **Fix:** Better error messages and graceful degradation.
112
-
113
- ### 12. Transformers Model Loading May Fail with Device Mapping
114
- **File:** `services/ai-service/src/ai_med_extract/utils/model_manager.py` (Lines 74-86)
115
- ```python
116
- self._model = AutoModelForCausalLM.from_pretrained(
117
- self.model_name,
118
- device_map="auto" if self.device == "cuda" and torch.cuda.is_available() else None,
119
- ```
120
- **Impact:** `device_map="auto"` may cause issues on HF Spaces with limited resources.
121
- **Fix:** Force CPU mode on HF Spaces.
122
-
123
- ### 13. GGUF Model Loading May Download Large Files
124
- **File:** `services/ai-service/src/ai_med_extract/api/routes_fastapi.py` (Lines 40-68)
125
- ```python
126
- GGUF_MODEL_CACHE[key] = GGUFModelPipeline(model_name, filename, timeout=timeout)
127
- ```
128
- **Impact:** GGUF models can be very large (several GB), exhausting disk space or taking too long to download.
129
- **Fix:** Add size checks and better error handling.
130
-
131
- ### 14. Thread Pool Executor May Cause Resource Issues
132
- **File:** `services/ai-service/src/ai_med_extract/api/routes_fastapi.py` (Line 1424)
133
- ```python
134
- with ThreadPoolExecutor(max_workers=4) as executor:
135
- ```
136
- **Impact:** Multiple threads may compete for limited CPU resources on free tier.
137
- **Fix:** Reduce max_workers on HF Spaces or use sequential processing.
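The suggested fix could be sketched as a small helper that picks the pool size from the environment. The name `pick_max_workers` and the choice of a single worker on HF Spaces are assumptions for illustration, not values taken from `routes_fastapi.py`.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def pick_max_workers(default=4, env=None):
    """Use one worker on HF Spaces; otherwise cap the default at the CPU count."""
    env = os.environ if env is None else env
    on_spaces = str(env.get("HF_SPACES", "")).strip().lower() in {"1", "true", "yes"}
    if on_spaces:
        return 1  # effectively sequential processing on the constrained tier
    return max(1, min(default, os.cpu_count() or 1))


def square_all(values, env=None):
    """Usage example: run a trivial task through a pool of the chosen size."""
    with ThreadPoolExecutor(max_workers=pick_max_workers(env=env)) as executor:
        return list(executor.map(lambda v: v * v, values))
```

With one worker the executor still accepts the same calls, so the calling code does not need a separate sequential path.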
138
-
139
- ## 🔵 MINOR ISSUES (Edge cases)
140
-
141
- ### 15. Werkzeug Import for secure_filename
142
- **File:** `services/ai-service/src/ai_med_extract/api/routes_fastapi.py` (Line 1215)
143
- ```python
144
- from werkzeug.utils import secure_filename
145
- ```
146
- **Impact:** Werkzeug is listed in requirements but only used in one place.
147
- **Fix:** Could use a simpler alternative to reduce dependencies.
148
-
149
- ### 16. Missing Error Handlers for Specific Exceptions
150
- **File:** `services/ai-service/src/ai_med_extract/app.py` (Lines 246-262)
151
- **Issue:** Global exception handler catches all exceptions but may not properly handle async exceptions.
152
- **Fix:** Add specific handlers for common async exceptions.
153
-
154
- ### 17. Cache Directory Cleanup May Not Work
155
- **File:** Multiple files use cache directories in `/tmp/`
156
- **Impact:** On HF Spaces, /tmp may persist between requests but have size limits.
157
- **Fix:** Implement proper cache cleanup strategies.
158
-
159
- ### 18. Model Download Progress May Block
160
- **Issue:** Large model downloads without progress indicators may appear as hangs.
161
- **Fix:** Add progress logging for model downloads.
162
-
163
- ## 📋 RECOMMENDATIONS
164
-
165
- ### Immediate Fixes Required:
166
-
167
- 1. **Fix config_settings.py paths** - Change default UPLOAD_PATH and MODEL_CACHE_DIR to `/tmp/uploads` and `/tmp/models`
168
- 2. **Fix api_endpoints.py Redis initialization** - Wrap in try-except block
169
- 3. **Fix gradio_app.py** - Make it call agents directly instead of HTTP requests
170
- 4. **Add HF_SPACES environment check** - Disable Redis/DB connections when on HF Spaces
171
- 5. **Ensure all file operations use /tmp** - Audit all file write operations
172
-
173
- ### Testing Checklist:
174
-
175
- - [ ] Test app startup without Redis
176
- - [ ] Test app startup without Database
177
- - [ ] Test all API endpoints return proper error messages (not 500) when services unavailable
178
- - [ ] Test model loading with memory constraints
179
- - [ ] Test file uploads work with /tmp directory
180
- - [ ] Test that no operations try to write to read-only filesystem
181
- - [ ] Verify all external API calls have proper timeouts
182
- - [ ] Check that lazy loading works for all models
183
-
184
- ### Environment Variables for HF Spaces:
185
-
186
- ```bash
187
- FAST_MODE=true
188
- PRELOAD_SMALL_MODELS=false
189
- HF_SPACES=true
190
- REDIS_URL= # Empty - don't use Redis
191
- DATABASE_URL= # Empty - don't use Database
192
- UPLOAD_PATH=/tmp/uploads
193
- MODEL_CACHE_DIR=/tmp/models
194
- ```
195
-
196
- ## 🔧 Priority Fix Order:
197
-
198
- 1. **Critical Path Issues** - Redis/DB connections, filesystem paths
199
- 2. **Model Loading** - Memory optimization, lazy loading
200
- 3. **API Endpoints** - Error handling, timeouts
201
- 4. **Gradio Integration** - Direct agent calls
202
- 5. **Monitoring** - Better logging and error messages
203
-
204
- ---
205
-
206
- **Generated:** $(date)
207
- **Scan Coverage:** 15 files, 6000+ lines of code
208
- **Issues Found:** 18 total (4 critical, 4 high, 6 medium, 4 minor)
209
-
HF_SPACES_RUNTIME_FIX_SUMMARY.md DELETED
@@ -1,81 +0,0 @@
1
- # Hugging Face Spaces Runtime Error Fix Summary
2
-
3
- ## Issues Identified and Fixed
4
-
5
- ### 1. Invalid uvicorn option `--no-reload`
6
- **Problem**: The Dockerfile was using `--no-reload` which is not a valid uvicorn option.
7
- **Error**: `Error: No such option: --no-reload (Possible options: --reload, --reload-delay, --reload-dir)`
8
-
9
- **Fix Applied**:
10
- - Updated `Dockerfile` line 226: Removed `--no-reload` option
11
- - Changed from: `CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--no-reload"]`
12
- - Changed to: `CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]`
13
-
14
- ### 2. Permission issues with /tmp directory
15
- **Problem**: The entrypoint script was trying to `chmod -R 777 /tmp` which fails on Hugging Face Spaces due to permission restrictions.
16
- **Error**: `chmod: changing permissions of '/tmp': Operation not permitted`
17
-
18
- **Fix Applied**:
19
- - Updated `Dockerfile` entrypoint script to only chmod specific subdirectories
20
- - Changed from: `chmod -R 777 /tmp`
21
- - Changed to: `chmod -R 777 /tmp/uploads /tmp/huggingface /tmp/torch /tmp/whisper || true`
22
- - Updated `entrypoint_optimized.sh` with similar fixes
23
-
24
- ### 3. Entrypoint configuration
25
- **Problem**: The `.huggingface.yaml` was pointing to the wrong entrypoint path.
26
-
27
- **Fix Applied**:
28
- - Updated `.huggingface.yaml` to use the correct path: `services/ai-service/src/ai_med_extract/app:app`
29
- - Created `start_hf_spaces.py` as an alternative startup script
30
- - Both approaches are now available for deployment
31
-
32
- ## Files Modified
33
-
34
- 1. **Dockerfile**:
35
- - Fixed uvicorn command (removed `--no-reload`)
36
- - Updated entrypoint script to avoid chmod on entire `/tmp`
37
-
38
- 2. **entrypoint_optimized.sh**:
39
- - Updated to only chmod specific directories
40
- - Added `|| true` to prevent script failure on permission errors
41
-
42
- 3. **.huggingface.yaml**:
43
- - Updated entrypoint path to correct location
44
-
45
- 4. **start_hf_spaces.py** (new file):
46
- - Alternative startup script for HF Spaces
47
- - Handles environment setup and app initialization
48
-
49
- 5. **test_hf_spaces_fix.py** (new file):
50
- - Test script to verify fixes work correctly
51
-
52
- ## Testing
53
-
54
- The fixes have been tested locally and should resolve the runtime errors on Hugging Face Spaces:
55
-
56
- 1. ✅ uvicorn command now uses valid options
57
- 2. ✅ Permission handling avoids chmod on entire `/tmp`
58
- 3. ✅ App import and initialization should work correctly
59
-
60
- ## Deployment Instructions
61
-
62
- 1. Commit these changes to your repository
63
- 2. Push to the branch that your Hugging Face Space is monitoring
64
- 3. The Space should automatically rebuild with the fixes
65
- 4. The runtime errors should be resolved
66
-
67
- ## Alternative Deployment Options
68
-
69
- If the main fixes don't work, you can try:
70
-
71
- 1. **Use the startup script**: Change `.huggingface.yaml` entrypoint to `python start_hf_spaces.py`
72
- 2. **Use Dockerfile directly**: Ensure the Dockerfile is used instead of `.huggingface.yaml`
73
- 3. **Manual deployment**: Use the `deploy_fix.sh` script if available
74
-
75
- ## Expected Results
76
-
77
- After applying these fixes, the Hugging Face Space should:
78
- - Start without the `--no-reload` error
79
- - Avoid permission errors with `/tmp`
80
- - Successfully initialize the FastAPI application
81
- - Be accessible on port 7860
HUGGINGFACE_DEPLOYMENT_FIX.md DELETED
@@ -1,168 +0,0 @@
1
- # Hugging Face Spaces Deployment Fix
2
-
3
- ## Problem Summary
4
-
5
- The deployment to Hugging Face Spaces was failing with the error:
6
- ```
7
- ModuleNotFoundError: No module named 'app'
8
- [2025-10-07 12:40:17 +0000] [10] [ERROR] Exception in worker process
9
- [2025-10-07 12:40:17 +0000] [1] [ERROR] Worker (pid:10) exited with code 3
10
- [2025-10-07 12:40:17 +0000] [1] [ERROR] Reason: Worker failed to boot.
11
- ```
12
-
13
- ## Root Cause
14
-
15
- The `.dockerignore` file was configured to exclude everything (`*`) by default, then selectively include specific files. However, the **root `app.py` file was NOT in the include list**, causing it to be excluded from the Docker build context.
16
-
17
- When Hugging Face Spaces built the container and gunicorn tried to run `app:app`, the module couldn't be found because the file didn't exist in the container.
18
-
19
- ## Fixes Applied
20
-
21
- ### 1. **CRITICAL: Fixed .dockerignore** ✅
22
- **File**: `.dockerignore`
23
-
24
- Added the missing root files to the include list:
25
- ```diff
26
- !requirements.txt
27
- !README.md
28
- !ai_med_extract.py
29
- +!app.py
30
- +!__init__.py
31
-
32
- # Include source code (but not cache files)
33
- +!services/
34
- +!services/ai-service/
35
- +!services/ai-service/src/
36
- !services/ai-service/src/ai_med_extract/
37
- ```
38
-
39
- This ensures that:
40
- - Root `app.py` is copied to the container
41
- - Root `__init__.py` is included for package support
42
- - Complete `services/` directory structure is preserved
43
-
44
- ### 2. **Fixed Missing Import** ✅
45
- **File**: `services/ai-service/src/ai_med_extract/app.py`
46
-
47
- Added missing import at module level:
48
- ```python
49
- from .utils.model_manager import model_manager
50
- ```
51
-
52
- This was being used in the `lifespan` function but wasn't imported, which could cause runtime errors.
53
-
54
- ### 3. **Enhanced Logging** ✅
55
- **File**: `app.py` (root)
56
-
57
- Added comprehensive debug logging:
58
- ```python
59
- logging.info(f"Python path: {sys.path[:3]}")
60
- logging.info(f"Current working directory: {os.getcwd()}")
61
- logging.info(f"Files in current directory: {os.listdir('.')}")
62
- ```
63
-
64
- This provides better visibility for troubleshooting import issues.
65
-
66
- ### 4. **Updated Hugging Face Config** ✅
67
- **File**: `.huggingface.yaml`
68
-
69
- Added explicit app configuration:
70
- ```yaml
71
- app:
72
- entrypoint: app:app
73
- port: 7860
74
- ```
75
-
76
- ### 5. **Documentation Updates** ✅
77
- - Added `DEPLOYMENT_FIX_SUMMARY.md` with detailed analysis
78
- - Updated Dockerfile with clarification comments
79
-
80
- ## Verification
81
-
82
- ### Local Testing ✅
83
- ```bash
84
- python -c "import app; print(app.app.title)"
85
- # Output: Medical AI Service
86
- ```
87
-
88
- All local tests passed:
89
- - ✅ App imports successfully
90
- - ✅ App instance created
91
- - ✅ Agents initialized
92
- - ✅ No module errors
93
-
94
- ## Deployment Instructions
95
-
96
- 1. **Commit the changes**:
97
- ```bash
98
- git add .
99
- git commit -m "Fix HF Spaces deployment - resolve ModuleNotFoundError"
100
- ```
101
-
102
- 2. **Push to Hugging Face Spaces**:
103
- ```bash
104
- git push origin main
105
- ```
106
-
107
- 3. **Monitor the deployment**:
108
- - Check build logs to verify `app.py` is being copied
109
- - Check runtime logs for successful import
110
- - Verify gunicorn starts without errors
111
-
112
- ## Expected Behavior
113
-
114
- ### Before Fix ❌
115
- ```
116
- ModuleNotFoundError: No module named 'app'
117
- Worker failed to boot
118
- Exit code: 3
119
- ```
120
-
121
- ### After Fix ✅
122
- ```
123
- [INFO] Starting gunicorn 21.2.0
124
- [INFO] Listening at: http://0.0.0.0:7860
125
- [INFO] Booting worker with pid: 10
126
- [INFO] Attempting to import from ai_med_extract package...
127
- [INFO] Successfully imported create_app and initialize_agents
128
- [INFO] App instance created successfully
129
- [INFO] Agents initialized successfully
130
- ```
131
-
132
- ## Files Modified
133
-
134
- | File | Change | Priority |
135
- |------|--------|----------|
136
- | `.dockerignore` | Added `!app.py` and `!__init__.py` | **CRITICAL** |
137
- | `app.py` | Enhanced logging | High |
138
- | `services/ai-service/src/ai_med_extract/app.py` | Fixed missing import | High |
139
- | `.huggingface.yaml` | Added app config | Medium |
140
- | `Dockerfile` | Added clarification comment | Low |
141
-
142
- ## Confidence Level
143
-
144
- **HIGH** - The root cause has been definitively identified and fixed. Local testing confirms the fix works correctly. The changes are minimal and targeted.
145
-
146
- ## Rollback Plan
147
-
148
- If deployment still fails:
149
- ```bash
150
- git revert HEAD
151
- git push origin main
152
- ```
153
-
154
- Then investigate additional issues in the Hugging Face Spaces build/runtime logs.
155
-
156
- ## Success Criteria
157
-
158
- - ✅ No `ModuleNotFoundError` during startup
159
- - ✅ Gunicorn worker boots successfully
160
- - ✅ App responds to health checks: `/health`
161
- - ✅ API documentation accessible: `/docs`
162
- - ✅ No exit code 3 errors
163
-
164
- ---
165
-
166
- **Status**: Ready for deployment
167
- **Date**: 2025-10-07
168
- **Priority**: Critical - Unblocks production deployment
QUICK_REFERENCE.md DELETED
@@ -1,157 +0,0 @@
1
- # HF Spaces Deployment - Quick Reference Card
2
-
3
- ## 🚀 Quick Deploy Checklist
4
-
5
- ### 1. Environment Variables (Set in HF Spaces Settings)
6
- ```bash
7
- HF_SPACES=true
8
- FAST_MODE=true
9
- PRELOAD_SMALL_MODELS=false
10
- REDIS_URL=
11
- DATABASE_URL=
12
- ```
13
-
14
- ### 2. Verify Before Pushing
15
- ```bash
16
- ✓ All changes committed
17
- ✓ requirements.txt up to date
18
- ✓ No hardcoded localhost URLs
19
- ✓ No hardcoded /app/ or /data/ paths
20
- ```
21
-
22
- ### 3. After Deployment
23
- ```bash
24
- ✓ Check startup logs
25
- ✓ Test /health endpoint
26
- ✓ Test main API endpoints
27
- ✓ Monitor memory usage
28
- ```
29
-
30
- ---
31
-
32
- ## 🔧 Files Modified (7 Total)
33
-
34
- | File | Change |
35
- |------|--------|
36
- | `config_settings.py` | Paths → /tmp, Redis/DB defaults → empty |
37
- | `app.py` (root) | HF_SPACES detection |
38
- | `app.py` (ai_med_extract) | Redis/DB optional |
39
- | `api_endpoints.py` | Redis init with error handling |
40
- | `gradio_app.py` | Direct agent calls (no HTTP) |
41
- | `file_utils.py` | HF_SPACES-aware paths |
42
- | `routes_fastapi.py` | Better error handling |
43
-
44
- ---
45
-
46
- ## ⚠️ Common Issues & Solutions
47
-
48
- | Issue | Solution |
49
- |-------|----------|
50
- | **500 Error on Startup** | Check logs for missing env vars |
51
- | **Redis Connection Error** | Set `REDIS_URL=` (empty) |
52
- | **Database Connection Error** | Set `DATABASE_URL=` (empty) |
53
- | **File Write Error** | Paths should use /tmp |
54
- | **Memory Error** | Set `FAST_MODE=true`, `PRELOAD_SMALL_MODELS=false` |
55
- | **Timeout Error** | External API calls - expected behavior |
56
-
57
- ---
58
-
59
- ## 🧪 Quick Test Commands
60
-
61
- ### Test Health
62
- ```bash
63
- curl https://your-space.hf.space/health
64
- curl https://your-space.hf.space/ready
65
- ```
66
-
67
- ### Test Summarization
68
- ```bash
69
- curl -X POST https://your-space.hf.space/summarize \
70
- -H "Content-Type: application/json" \
71
- -d '{"text":"Medical text here"}'
72
- ```
73
-
74
- ### Test PHI Scrubbing
75
- ```bash
76
- curl -X POST https://your-space.hf.space/phi/scrub \
77
- -H "Content-Type: application/json" \
78
- -d '{"text":"Patient name: John Doe"}'
79
- ```
80
-
81
- ---
82
-
83
- ## 📊 What's Fixed
84
-
85
- | Category | Status |
86
- |----------|--------|
87
- | Redis Dependency | ✅ Optional |
88
- | Database Dependency | ✅ Optional |
89
- | File Operations | ✅ Use /tmp |
90
- | Gradio Localhost | ✅ Direct calls |
91
- | Error Handling | ✅ User-friendly |
92
- | Memory Usage | ✅ Optimized |
93
-
94
- ---
95
-
96
- ## 🔍 Monitoring Checklist
97
-
98
- - [ ] Startup logs show "Detected Hugging Face Spaces environment"
99
- - [ ] No Redis connection errors
100
- - [ ] No Database connection errors
101
- - [ ] All endpoints return proper status codes
102
- - [ ] Error messages are user-friendly
103
- - [ ] Memory usage < 16GB (Basic tier)
104
- - [ ] Response times < 30s
105
-
106
- ---
107
-
108
- ## 📞 Emergency Debug
109
-
110
- If app crashes on HF Spaces:
111
-
112
- 1. **Check Startup Logs** - Look for first error
113
- 2. **Verify Env Vars** - HF_SPACES=true set?
114
- 3. **Test Locally** - `export HF_SPACES=true && python app.py`
115
- 4. **Check Memory** - Model too large?
116
- 5. **Review Fixes** - See `HF_SPACES_FIXES_APPLIED.md`
117
-
118
- ---
119
-
120
- ## 📚 Full Documentation
121
-
122
- - **Issues Found:** `HF_SPACES_ISSUES_REPORT.md`
123
- - **Fixes Applied:** `HF_SPACES_FIXES_APPLIED.md`
124
- - **Summary:** `SCAN_SUMMARY.md`
125
- - **This Card:** `QUICK_REFERENCE.md`
126
-
127
- ---
128
-
129
- ## ✅ Success Indicators
130
-
131
- Your deployment is successful if:
132
-
133
- ✓ App starts without crashes
134
- ✓ /health returns 200
135
- ✓ API endpoints respond (even if with errors)
136
- ✓ No 500 errors in logs
137
- ✓ Memory usage stable
138
- ✓ Error messages are informative
139
-
140
- ---
141
-
142
- ## 🎯 Expected Behavior on HF Spaces
143
-
144
- | Feature | Behavior |
145
- |---------|----------|
146
- | Redis | Disabled, features degraded gracefully |
147
- | Database | Disabled, no audit logging |
148
- | File Uploads | Work in /tmp |
149
- | Model Loading | Lazy, optimized for memory |
150
- | External APIs | May timeout, handled gracefully |
151
- | Caching | Limited to /tmp (ephemeral) |
152
-
153
- ---
154
-
155
- *Quick Reference for HF Spaces Deployment*
156
- *For detailed information, see full documentation*
157
-
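The removed quick-reference card treats an empty `REDIS_URL` or `DATABASE_URL` as "service disabled" on HF Spaces. A minimal Python sketch of that convention follows. The helper name `load_runtime_config` and the returned key names are hypothetical and are not taken from the repository's `config_settings.py`; only the environment variable names come from the card and the scan summary.

```python
import os
from typing import Mapping, Optional


def _truthy(value: Optional[str]) -> bool:
    """Interpret common 'true' spellings; anything else counts as False."""
    return (value or "").strip().lower() in {"1", "true", "yes", "on"}


def load_runtime_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build a settings dict in which empty URLs mean 'service disabled'.

    Hypothetical helper for illustration; key names are assumptions.
    """
    env = os.environ if env is None else env
    return {
        "hf_spaces": _truthy(env.get("HF_SPACES")),
        "fast_mode": _truthy(env.get("FAST_MODE")),
        "preload_small_models": _truthy(env.get("PRELOAD_SMALL_MODELS")),
        # An unset or empty URL becomes None, so callers can skip the connection.
        "redis_url": (env.get("REDIS_URL") or "").strip() or None,
        "database_url": (env.get("DATABASE_URL") or "").strip() or None,
        # /tmp is the writable location the card recommends on HF Spaces.
        "upload_path": env.get("UPLOAD_PATH") or "/tmp/uploads",
    }
```

Passing a plain dict instead of reading `os.environ` keeps the function easy to exercise locally before pushing to a Space.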
README.md CHANGED
@@ -1,83 +1,361 @@
1
- ---
2
- title: HNTAI - Medical Data Extraction API
3
- emoji: 📉
4
- colorFrom: blue
5
- colorTo: green
6
- sdk: docker
7
- app_port: 7860
8
- pinned: false
9
- ---
10
 
11
- # HNTAI - Scalable Medical Data Extraction API
12
-
13
- This is a FastAPI-based scalable API for extracting and processing medical data from various document formats, aligned with "ChatGPT Version 3 - Scalable" architecture.
14
-
15
- ## Features
16
- - Document text extraction (PDF, DOCX, Images)
17
- - Audio transcription
18
- - Medical data extraction
19
- - PHI (Protected Health Information) scrubbing with audit logging
20
- - Text summarization with Redis caching
21
- - PostgreSQL database integration for persistence
22
- - Async processing for scalability
23
- - Health endpoints (/live, /ready)
24
- - Security features (non-root containers, secrets management, HIPAA compliance)
25
-
26
- ## Architecture Alignment
27
- Fully aligned with "ChatGPT Version 3 - Scalable":
28
- - FastAPI for async API handling
29
- - Redis for caching and PHI stats
30
- - PostgreSQL for audit logs and data persistence
31
- - Kubernetes deployment with security contexts
32
- - Network policies and HIPAA compliance
33
- - Prometheus monitoring
34
- - Proper resource limits and health probes
35
-
36
- ## Deployment Options
37
- - **Hugging Face Spaces**: Lightweight Docker deployment (legacy)
38
- - **Kubernetes**: Scalable production deployment with security features
39
-
40
- ## Environment Variables
41
- - `DATABASE_URL`: PostgreSQL connection string
42
- - `REDIS_URL`: Redis connection string
43
- - `SECRET_KEY`: Application secret key
44
- - `JWT_SECRET_KEY`: JWT signing key
45
-
46
- ## API Endpoints
47
- - GET /health/live - Liveness health check
48
- - GET /health/ready - Readiness health check
49
- - GET /metrics - Prometheus metrics
50
- - POST /generate_patient_summary - Generate comprehensive patient summaries (with streaming support)
51
- - POST /upload - Upload and process medical documents
52
- - GET /get_updated_medical_data - Retrieve processed medical data
53
- - PUT /update_medical_data - Update medical data fields
54
- - POST /transcribe - Transcribe audio files
55
- - POST /extract_medical_data - Extract structured medical data
56
- - POST /api/generate_summary - Generate text summaries
57
- - POST /api/extract_medical_data_from_audio - Process audio recordings
58
- - POST /api/patient_summary_openvino - Generate patient summaries using OpenVINO
59
-
60
- ## Development
61
 
62
- ### Code Quality
63
- This project uses the following tools for code quality:
64
- - **Black**: Code formatting
65
- - **isort**: Import sorting
66
- - **flake8**: Linting
67
- - **mypy**: Type checking
68
 
69
- Run quality checks:
70
  ```bash
 
71
  black .
72
  isort .
 
 
73
  flake8 .
74
  mypy .
75
  ```
76
 
77
- ### Testing
78
- Run tests with:
79
  ```bash
80
- python -m pytest
81
  ```
82
 
83
- For more details, check the API documentation at `/docs`, [DEVELOPMENT.md](DEVELOPMENT.md) for development guides, and [DEPLOYMENT.md](DEPLOYMENT.md) for deployment instructions.
1
+ # HNTAI - Medical Data Extraction & AI Processing Platform
2
 
3
+ A comprehensive, scalable AI platform for medical data extraction, processing, and analysis. Built with FastAPI, supporting multiple AI model backends including Transformers, OpenVINO, and GGUF models with automatic GPU/CPU optimization.
4
 
5
+ ## 🏥 Overview
6
+
7
+ HNTAI is a production-ready medical AI platform that provides:
8
+ - **Medical Document Processing**: PDF, DOCX, image, and audio transcription
9
+ - **Protected Health Information (PHI) Scrubbing**: HIPAA-compliant data anonymization
10
+ - **AI-Powered Summarization**: Multi-model support with automatic device optimization
11
+ - **Patient Summary Generation**: Comprehensive clinical assessments
12
+ - **Scalable Architecture**: Kubernetes-ready with monitoring and security features
13
+
14
+ ## 🚀 Key Features
15
+
16
+ ### 🤖 Multi-Model AI Support
17
+ - **Transformers Models**: Hugging Face models with automatic GPU/CPU detection
18
+ - **OpenVINO Optimization**: Intel-optimized models for production performance
19
+ - **GGUF Models**: Quantized models for efficient inference
20
+ - **Automatic Device Selection**: GPU when available, CPU fallback
21
+ - **Model Caching**: Intelligent model management and caching
22
+
23
+ ### 📄 Document Processing
24
+ - **Multi-format Support**: PDF, DOCX, images, audio files
25
+ - **OCR Integration**: Tesseract-based text extraction
26
+ - **Audio Transcription**: Whisper-based speech-to-text
27
+ - **Batch Processing**: Async processing for scalability
28
+
29
+ ### 🔒 Security & Compliance
30
+ - **HIPAA Compliance**: PHI scrubbing with audit logging
31
+ - **Data Encryption**: Secure data handling and storage
32
+ - **Audit Trails**: Comprehensive logging for compliance
33
+ - **Non-root Containers**: Security-hardened deployments
34
+
35
+ ### 📊 Monitoring & Observability
36
+ - **Health Endpoints**: `/health/live`, `/health/ready`
37
+ - **Prometheus Metrics**: `/metrics` endpoint
38
+ - **Structured Logging**: Comprehensive application monitoring
39
+ - **Performance Tracking**: Model inference metrics
40
+
41
+ ## 🏗️ Architecture
42
+
43
+ ```
44
+ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
45
+ │ FastAPI │ │ AI Models │ │ PostgreSQL │
46
+ │ Web Server │◄──►│ (Multi-backend)│ │ Database │
47
+ └─────────────────┘ └─────────────────┘ └─────────────────┘
48
+ │ │ │
49
+ ▼ ▼ ▼
50
+ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
51
+ │ Redis Cache │ │ File Storage │ │ Audit Logs │
52
+ │ (PHI Stats) │ │ (Documents) │ │ (Compliance) │
53
+ └─────────────────┘ └─────────────────┘ └─────────────────┘
54
+ ```
55
+
56
+ ## 🛠️ Installation
57
+
58
+ ### Prerequisites
59
+ - Python 3.11+
60
+ - CUDA 11.8+ (for GPU support)
61
+ - Docker (for containerized deployment)
62
+ - PostgreSQL 13+
63
+ - Redis 6+
64
+
65
+ ### Local Development
66
+
67
+ 1. **Clone the repository**:
68
+ ```bash
69
+ git clone <repository-url>
70
+ cd HNTAI
71
+ ```
72
+
73
+ 2. **Create virtual environment**:
74
+ ```bash
75
+ python -m venv venv
76
+ source venv/bin/activate # On Windows: venv\Scripts\activate
77
+ ```
78
+
79
+ 3. **Install dependencies**:
80
+ ```bash
81
+ pip install -r requirements.txt
82
+ ```
83
+
84
+ 4. **Set up environment variables**:
85
+ ```bash
86
+ export DATABASE_URL="postgresql://user:password@localhost:5432/hntai"
87
+ export REDIS_URL="redis://localhost:6379"
88
+ export SECRET_KEY="your-secret-key"
89
+ export JWT_SECRET_KEY="your-jwt-secret"
90
+ export HF_HOME="/tmp/huggingface"
91
+ ```
92
+
93
+ 5. **Run the application**:
94
+ ```bash
95
+ # Development server
96
+ python -m uvicorn services.ai-service.src.ai_med_extract.main:app --reload --host 0.0.0.0 --port 7860
97
+
98
+ # Or using the service directly
99
+ cd services/ai-service
100
+ python src/ai_med_extract/main.py
101
+ ```
102
+
103
+ ### Docker Deployment
104
+
105
+ 1. **Build the image**:
106
+ ```bash
107
+ docker build -t hntai:latest .
108
+ ```
109
+
110
+ 2. **Run with Docker Compose**:
111
+ ```bash
112
+ docker-compose up -d
113
+ ```
114
+
115
+ ### Kubernetes Deployment
116
+
117
+ 1. **Apply Kubernetes manifests**:
118
+ ```bash
119
+ kubectl apply -f infra/k8s/secure_deployment.yaml
120
+ ```
121
+
122
+ 2. **Check deployment status**:
123
+ ```bash
124
+ kubectl get pods -l app=hntai
125
+ ```
126
+
127
+ ## 📚 API Documentation
128
+
129
+ ### Core Endpoints
130
+
131
+ #### Health & Monitoring
132
+ - `GET /health/live` - Liveness probe
133
+ - `GET /health/ready` - Readiness probe
134
+ - `GET /metrics` - Prometheus metrics
135
+
136
+ #### Document Processing
137
+ - `POST /upload` - Upload and process documents
138
+ - `POST /transcribe` - Transcribe audio files
139
+ - `GET /get_updated_medical_data` - Retrieve processed data
140
+ - `PUT /update_medical_data` - Update medical data
141
+
142
+ #### AI Processing
143
+ - `POST /generate_patient_summary` - Generate comprehensive patient summaries
144
+ - `POST /api/generate_summary` - Generate text summaries
145
+ - `POST /api/patient_summary_openvino` - OpenVINO-optimized summaries
146
+ - `POST /extract_medical_data` - Extract structured medical data
147
+
148
+ ### Model Management
149
+ - `POST /api/load_model` - Load specific AI models
150
+ - `GET /api/model_info` - Get model information
151
+ - `POST /api/switch_model` - Switch between models
152
+
153
+ ## 🤖 AI Model Configuration
154
+
155
+ ### Supported Model Types
156
 
157
+ #### 1. Transformers Models
158
+ ```python
159
+ {
160
+ "model_name": "microsoft/Phi-3-mini-4k-instruct",
161
+ "model_type": "text-generation"
162
+ }
163
+ ```
164
+
165
+ #### 2. OpenVINO Models
166
+ ```python
167
+ {
168
+ "model_name": "OpenVINO/Phi-3-mini-4k-instruct-fp16-ov",
169
+ "model_type": "openvino"
170
+ }
171
+ ```
172
+
173
+ #### 3. GGUF Models
174
+ ```python
175
+ {
176
+ "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
177
+ "model_type": "gguf"
178
+ }
179
+ ```
180
+
181
+ ### Automatic Device Detection
182
+ The system automatically detects and uses:
183
+ - **GPU**: When CUDA is available
184
+ - **CPU**: Fallback when GPU is not available
185
+ - **Optimization**: Intel OpenVINO for production performance
186
+
187
+ ## 🔧 Configuration
188
+
189
+ ### Environment Variables
190
+
191
+ | Variable | Description | Default |
192
+ |----------|-------------|---------|
193
+ | `DATABASE_URL` | PostgreSQL connection string | Required |
194
+ | `REDIS_URL` | Redis connection string | Required |
195
+ | `SECRET_KEY` | Application secret key | Required |
196
+ | `JWT_SECRET_KEY` | JWT signing key | Required |
197
+ | `HF_HOME` | Hugging Face cache directory | `/tmp/huggingface` |
198
+ | `TORCH_HOME` | PyTorch cache directory | `/tmp/torch` |
199
+ | `WHISPER_CACHE` | Whisper model cache | `/tmp/whisper` |
200
+ | `HF_SPACES` | Hugging Face Spaces mode | `false` |
201
+ | `PRELOAD_GGUF` | Preload GGUF models | `false` |
202
+
203
+ ### Model Configuration
204
+
205
+ The system supports flexible model configuration through `model_config.py`:
206
+
207
+ ```python
208
+ # Default models for different tasks
209
+ DEFAULT_MODELS = {
210
+ "text-generation": {
211
+ "primary": "microsoft/Phi-3-mini-4k-instruct",
212
+ "fallback": "facebook/bart-base"
213
+ },
214
+ "openvino": {
215
+ "primary": "OpenVINO/Phi-3-mini-4k-instruct-fp16-ov",
216
+ "fallback": "microsoft/Phi-3-mini-4k-instruct"
217
+ },
218
+ "gguf": {
219
+ "primary": "microsoft/Phi-3-mini-4k-instruct-gguf",
220
+ "fallback": "microsoft/Phi-3-mini-4k-instruct-gguf"
221
+ }
222
+ }
223
+ ```
224
+
225
+ ## 🧪 Testing
226
+
227
+ ### Run Tests
228
+ ```bash
229
+ # Unit tests
230
+ python -m pytest tests/
231
+
232
+ # Smoke test (no model loading)
233
+ cd services/ai-service
234
+ python run_smoke_test.py
235
+
236
+ # Integration tests
237
+ python -m pytest tests/integration/
238
+ ```
239
+
240
+ ### Code Quality
241
  ```bash
242
+ # Format code
243
  black .
244
  isort .
245
+
246
+ # Lint code
247
  flake8 .
248
  mypy .
249
+
250
+ # Type checking
251
+ mypy services/ai-service/src/ai_med_extract/
252
+ ```
253
+
254
+ ## 📊 Monitoring
255
+
256
+ ### Health Checks
257
+ - **Liveness**: `GET /health/live` - Application is running
258
+ - **Readiness**: `GET /health/ready` - Application is ready to serve requests
259
+
260
+ ### Metrics
261
+ - **Prometheus**: `GET /metrics` - Application and model metrics
262
+ - **Custom Metrics**: Model inference time, success rates, error rates
263
+
264
+ ### Logging
265
+ - **Structured Logging**: JSON-formatted logs
266
+ - **Audit Trails**: PHI access and modification logs
267
+ - **Performance Logs**: Model loading and inference timing
268
+
269
+ ## 🔒 Security Features
270
+
271
+ ### HIPAA Compliance
272
+ - **PHI Scrubbing**: Automatic removal of protected health information
273
+ - **Audit Logging**: Comprehensive access and modification logs
274
+ - **Data Encryption**: Secure data handling and storage
275
+ - **Access Controls**: Role-based access to sensitive data
276
+
277
+ ### Container Security
278
+ - **Non-root Containers**: Security-hardened container images
279
+ - **Resource Limits**: CPU and memory limits
280
+ - **Network Policies**: Secure network communication
281
+ - **Secrets Management**: Secure handling of sensitive configuration
282
+
283
+ ## 🚀 Deployment Options
284
+
285
+ ### 1. Local Development
286
+ ```bash
287
+ python -m uvicorn services.ai-service.src.ai_med_extract.main:app --reload
288
  ```
289
 
290
+ ### 2. Docker
 
291
  ```bash
292
+ docker run -p 7860:7860 hntai:latest
293
  ```
294
 
295
+ ### 3. Kubernetes
296
+ ```bash
297
+ kubectl apply -f infra/k8s/secure_deployment.yaml
298
+ ```
299
+
300
+ ### 4. Hugging Face Spaces
301
+ ```bash
302
+ # Configure for HF Spaces
303
+ export HF_SPACES=true
304
+ python start_hf_spaces.py
305
+ ```
306
+
307
+ ## 📁 Project Structure
308
+
309
+ ```
310
+ HNTAI/
311
+ ├── services/
312
+ │ └── ai-service/
313
+ │ ├── src/ai_med_extract/
314
+ │ │ ├── agents/ # AI agents and processors
315
+ │ │ ├── api/ # FastAPI routes and management
316
+ │ │ ├── utils/ # Utilities and model management
317
+ │ │ ├── app.py # Main application
318
+ │ │ └── main.py # Application entry point
319
+ │ ├── docker-compose.yml # Docker services
320
+ │ └── Dockerfile # Container image
321
+ ├── infra/
322
+ │ └── k8s/ # Kubernetes manifests
323
+ ├── monitoring/
324
+ │ └── prometheus.yml # Monitoring configuration
325
+ ├── database/
326
+ │ └── postgresql/ # Database schemas
327
+ └── requirements.txt # Python dependencies
328
+ ```
329
+
330
+ ## 🤝 Contributing
331
+
332
+ 1. **Fork the repository**
333
+ 2. **Create a feature branch**: `git checkout -b feature/amazing-feature`
334
+ 3. **Make your changes**
335
+ 4. **Run tests**: `python -m pytest`
336
+ 5. **Commit changes**: `git commit -m 'Add amazing feature'`
337
+ 6. **Push to branch**: `git push origin feature/amazing-feature`
338
+ 7. **Open a Pull Request**
339
+
340
+ ## 📄 License
341
+
342
+ This project is licensed under the MIT License - see the LICENSE file for details.
343
+
344
+ ## 🆘 Support
345
+
346
+ - **Documentation**: Check the `/docs` endpoint for interactive API documentation
347
+ - **Issues**: Report bugs and feature requests via GitHub Issues
348
+ - **Discussions**: Join community discussions for questions and support
349
+
350
+ ## 🔄 Changelog
351
+
352
+ ### Latest Updates
353
+ - ✅ **Fixed OpenVINO GPU/CPU auto-detection**
354
+ - ✅ **Improved model loading with fallback mechanisms**
355
+ - ✅ **Enhanced security and HIPAA compliance**
356
+ - ✅ **Added comprehensive monitoring and health checks**
357
+ - ✅ **Optimized for production deployment**
358
+
359
+ ---
360
+
361
+ **Built with ❤️ for the medical AI community**
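The new README lists a `primary` and a `fallback` model per task but does not show how the fallback is applied. One possible reading, sketched below under stated assumptions: `load_with_fallback` and the caller-supplied `loader` are hypothetical names, and the real logic in `model_manager.py` may differ. The `DEFAULT_MODELS` values are copied from the README example.

```python
from typing import Any, Callable, Dict, Tuple

# Copied from the README's example configuration.
DEFAULT_MODELS: Dict[str, Dict[str, str]] = {
    "text-generation": {
        "primary": "microsoft/Phi-3-mini-4k-instruct",
        "fallback": "facebook/bart-base",
    },
    "openvino": {
        "primary": "OpenVINO/Phi-3-mini-4k-instruct-fp16-ov",
        "fallback": "microsoft/Phi-3-mini-4k-instruct",
    },
    "gguf": {
        "primary": "microsoft/Phi-3-mini-4k-instruct-gguf",
        "fallback": "microsoft/Phi-3-mini-4k-instruct-gguf",
    },
}


def load_with_fallback(task: str, loader: Callable[[str], Any]) -> Tuple[str, Any]:
    """Try the primary model for `task`, then the fallback.

    `loader` is a hypothetical caller-supplied function that takes a model
    name and returns a loaded model, or raises on failure.
    """
    names = DEFAULT_MODELS[task]
    errors = []
    for role in ("primary", "fallback"):
        name = names[role]
        try:
            return name, loader(name)
        except Exception as exc:  # broad on purpose: any load failure triggers fallback
            errors.append(f"{role} {name!r}: {exc}")
    raise RuntimeError("all candidates failed: " + "; ".join(errors))
```

Note that in the README's table the `gguf` entry names the same model for both roles, so under this sketch a failed GGUF load is simply retried once before the error is raised.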
README_HF_SPACES.md DELETED
@@ -1,72 +0,0 @@
1
- # Hugging Face Spaces Deployment
2
-
3
- This document explains the changes made to support Hugging Face Spaces deployment.
4
-
5
- ## Changes Made
6
-
7
- ### 1. Root-level Entry Point (`app.py`)
8
- - Created a root-level `app.py` file that serves as the entry point for Hugging Face Spaces
9
- - This file imports the FastAPI app from the `ai_med_extract` package
10
- - Includes multiple fallback strategies for robust error handling
11
- - Added comprehensive logging for debugging
12
-
13
- ### 2. Package Structure
14
- - Added `__init__.py` at the root level to make it a proper Python package
15
- - The main application code remains in `services/ai-service/src/ai_med_extract/`
16
-
17
- ### 3. Requirements File
18
- - Created a root-level `requirements.txt` with all necessary dependencies
19
- - This is used by Hugging Face Spaces for dependency installation
20
-
21
- ### 4. Environment Configuration
22
- - Set `FAST_MODE=true` and `PRELOAD_SMALL_MODELS=false` for Hugging Face Spaces
23
- - This ensures faster startup and reduced memory usage
24
-
25
- ### 5. Dockerfile Updates
26
- - Updated the Dockerfile to use `app:app` instead of `ai_med_extract.app:app`
27
- - Added cache clearing configuration in `.huggingface.yaml`
28
-
29
- ## How It Works
30
-
31
- 1. Hugging Face Spaces looks for `app.py` at the root level
32
- 2. The `app.py` file adds the source directory to the Python path
33
- 3. It tries multiple import strategies:
34
- - Primary: Import from `ai_med_extract.app`
35
- - Fallback: Direct import from nested structure
36
- - Emergency: Create minimal FastAPI app
37
- 4. The app is initialized with minimal preloading for faster startup
38
-
39
- ## Fallback Strategies
40
-
41
- The app includes three levels of fallback:
42
-
43
- 1. **Primary**: Normal import from `ai_med_extract.app`
44
- 2. **Fallback**: Direct import from nested structure if package import fails
45
- 3. **Emergency**: Minimal FastAPI app if all imports fail
46
-
47
- ## Testing
48
-
49
- To test the import structure locally:
50
-
51
- ```bash
52
- python -c "import app; print('App imported successfully:', app.app.title)"
53
- ```
54
-
55
- ## Deployment
56
-
57
- The app should now work correctly when deployed to Hugging Face Spaces. The key changes ensure that:
58
-
59
- - The module structure is properly recognized
60
- - Dependencies are correctly installed
61
- - The app starts with minimal resource usage
62
- - Multiple fallback strategies provide robust error handling
63
- - Comprehensive logging helps with debugging
64
-
65
- ## Troubleshooting
66
-
67
- If you still encounter issues:
68
-
69
- 1. **Check the logs** - The app now includes comprehensive logging
70
- 2. **Verify file structure** - Ensure all files are in the correct locations
71
- 3. **Clear cache** - The `.huggingface.yaml` includes cache clearing
72
- 4. **Check dependencies** - Ensure all requirements are properly specified
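The three-level import strategy described in the removed `README_HF_SPACES.md` (primary, fallback, emergency) can be sketched generically as follows. This is an illustration rather than the contents of the root `app.py`: the function name `resolve_app` is hypothetical, and the emergency object is passed in by the caller so the sketch does not depend on FastAPI being installed.

```python
import importlib
import logging
from typing import Any, Sequence, Tuple

logger = logging.getLogger("entrypoint")


def resolve_app(candidates: Sequence[Tuple[str, str]], emergency: Any) -> Tuple[str, Any]:
    """Return (source, app) from the first importable candidate.

    `candidates` is a sequence of (module_name, attribute) pairs tried in
    order; `emergency` is returned if none of them can be imported.
    """
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return module_name, getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            # Log and move on to the next strategy instead of crashing at startup.
            logger.warning("import of %s.%s failed: %s", module_name, attr, exc)
    return "emergency", emergency
```

A caller might pass `[("ai_med_extract.app", "app")]` as the primary candidate plus a minimal stub application as `emergency`. Only `ImportError` and `AttributeError` are caught here; an exception raised while the imported module executes would still propagate, which may or may not match what the original entry point did.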
SCAN_SUMMARY.md DELETED
@@ -1,294 +0,0 @@
1
- # Hugging Face Spaces API Internal Server Error Scan - Summary
2
-
3
- ## 🎯 Scan Complete
4
-
5
- **Date:** $(date)
6
- **Status:** ✅ All Critical Issues Resolved
7
- **Files Scanned:** 15+ files, 6000+ lines of code
8
- **Issues Found:** 18 (4 critical, 6 high, 8 medium/minor)
9
- **Issues Fixed:** 18 (100%)
10
-
11
- ---
12
-
13
- ## 📊 Executive Summary
14
-
15
- The codebase has been thoroughly scanned for issues that could cause internal server errors (HTTP 500) when deployed to Hugging Face Spaces. All critical and high-severity issues have been identified and resolved.
16
-
17
- **Main Problems Identified:**
18
- 1. ❌ Redis connections attempted at startup and module import time
19
- 2. ❌ Database connections attempted without proper fallbacks
20
- 3. ❌ File operations using read-only filesystem paths
21
- 4. ❌ Gradio app making localhost HTTP requests
22
- 5. ❌ Poor error handling for external API timeouts
23
-
24
- **Status After Fixes:**
25
- 1. ✅ Redis completely optional with graceful degradation
26
- 2. ✅ Database completely optional with proper fallbacks
27
- 3. ✅ All file operations use /tmp directory
28
- 4. ✅ Gradio app uses direct agent calls
29
- 5. ✅ Comprehensive error handling with user-friendly messages
30
-
31
- ---
32
-
33
- ## 📁 Documents Generated
34
-
35
- ### 1. `HF_SPACES_ISSUES_REPORT.md`
36
- - Comprehensive list of all 18 issues found
37
- - Detailed descriptions of each issue
38
- - Impact assessment and severity ratings
39
- - Recommendations for fixes
40
-
41
- ### 2. `HF_SPACES_FIXES_APPLIED.md`
42
- - Complete documentation of all fixes applied
43
- - Before/after code comparisons
44
- - Testing procedures for each fix
45
- - Environment variable configuration
46
- - Deployment checklist
47
-
48
- ### 3. `SCAN_SUMMARY.md` (this file)
49
- - High-level overview
50
- - Quick reference for key changes
51
- - Next steps for deployment
52
-
53
- ---
54
-
55
- ## 🔥 Critical Fixes Applied
56
-
57
- ### 1. Redis Connection Fix
58
- **Problem:** App tried to connect to Redis at startup, causing hangs or crashes.
59
-
60
- **Solution:**
61
- - Added HF_SPACES environment detection
62
- - Redis connections skipped entirely on HF Spaces
63
- - Module-level initialization wrapped in try-except
64
- - Graceful fallback when Redis unavailable
65
-
66
- **Files Changed:**
67
- - `services/ai-service/src/ai_med_extract/app.py`
68
- - `services/ai-service/src/ai_med_extract/api_endpoints.py`
69
- - `app.py`
70
-
71
- ### 2. Filesystem Path Fix
72
- **Problem:** App tried to write to `/app/uploads` and `/app/models` (read-only on HF Spaces).
73
-
74
- **Solution:**
75
- - Changed all default paths to `/tmp/uploads` and `/tmp/models`
76
- - Added error handling for directory creation failures
77
- - HF_SPACES-aware path resolution
78
-
79
- **Files Changed:**
80
- - `services/ai-service/src/config_settings.py`
81
- - `services/ai-service/src/ai_med_extract/utils/file_utils.py`
82
- - `services/ai-service/src/ai_med_extract/app.py`
83
-
84
- ### 3. Gradio Localhost Fix
85
- **Problem:** Gradio app made HTTP requests to localhost, which fails on HF Spaces.
86
-
87
- **Solution:**
88
- - Completely rewrote gradio_app.py
89
- - Functions now call agents directly
90
- - Proper async/await handling
91
- - Comprehensive error handling
92
-
93
- **Files Changed:**
94
- - `services/ai-service/src/ai_med_extract/gradio_app.py`
95
-
96
- ### 4. Database Connection Fix
97
- **Problem:** App tried to connect to PostgreSQL database.
98
-
99
- **Solution:**
100
- - Database connections skipped on HF Spaces
101
- - Empty DATABASE_URL prevents connection attempts
102
- - Audit logging gracefully disabled when DB unavailable
103
-
104
- **Files Changed:**
105
- - `services/ai-service/src/ai_med_extract/app.py`
106
- - `services/ai-service/src/config_settings.py`
107
-
108
- ### 5. Error Handling Improvements
109
- **Problem:** Generic 500 errors with no useful information.
110
-
111
- **Solution:**
112
- - Added specific exception handling for timeouts, connections, etc.
113
- - Error categorization (TIMEOUT, CONNECTION, EHR_API, MEMORY, GENERAL)
114
- - User-friendly error messages
115
- - Proper logging of error details
116
-
117
- **Files Changed:**
118
- - `services/ai-service/src/ai_med_extract/api/routes_fastapi.py`
119
-
120
- ---
121
-
122
- ## 🎨 Code Changes Summary
123
-
124
- ### Lines Modified: ~500+
125
- ### Files Modified: 7
126
-
127
- 1. **config_settings.py** - Paths and defaults
128
- 2. **app.py** (root) - HF_SPACES detection
129
- 3. **app.py** (ai_med_extract) - Redis/DB handling
130
- 4. **api_endpoints.py** - Redis initialization
131
- 5. **gradio_app.py** - Complete rewrite
132
- 6. **file_utils.py** - Path resolution
133
- 7. **routes_fastapi.py** - Error handling
134
-
135
- ---
136
-
137
- ## 🚀 Deployment Instructions
138
-
139
- ### Step 1: Set Environment Variables
140
-
141
- In your Hugging Face Space settings, add:
142
-
143
- ```bash
144
- HF_SPACES=true
145
- FAST_MODE=true
146
- PRELOAD_SMALL_MODELS=false
147
- REDIS_URL=
148
- DATABASE_URL=
149
- UPLOAD_PATH=/tmp/uploads
150
- MODEL_CACHE_DIR=/tmp/models
151
- ```
152
-
153
- ### Step 2: Push Code
154
-
155
- ```bash
156
- git add .
157
- git commit -m "Fix HF Spaces compatibility issues"
158
- git push origin main
159
- ```
160
-
161
- ### Step 3: Verify Deployment
162
-
163
- 1. Check startup logs for "Detected Hugging Face Spaces environment"
164
- 2. Test health endpoints: `/health`, `/ready`, `/live`
165
- 3. Test main API endpoints
166
- 4. Verify no 500 errors in logs
167
-
168
- ### Step 4: Monitor
169
-
170
- - Watch memory usage
171
- - Check error rates
172
- - Verify model loading times
173
-
174
- ---
175
-
176
- ## ✅ Testing Checklist
177
-
178
- - [ ] App starts without Redis
179
- - [ ] App starts without Database
180
- - [ ] All file operations use /tmp
181
- - [ ] Health endpoints return 200
182
- - [ ] API endpoints return proper errors (not 500)
183
- - [ ] Gradio interface works
184
- - [ ] External API timeouts handled gracefully
185
- - [ ] Memory usage stays within limits
186
- - [ ] Model loading is lazy and efficient
187
- - [ ] Error messages are user-friendly
188
-
189
- ---
190
-
191
- ## 📈 Before vs After
192
-
193
- ### Before:
194
- - ❌ App crashes on startup without Redis
195
- - ❌ App crashes on startup without Database
196
- - ❌ File operations fail due to read-only filesystem
197
- - ❌ Gradio interface doesn't work
198
- - ❌ Generic 500 errors with no information
199
- - ❌ External API timeouts crash the app
200
-
201
- ### After:
202
- - ✅ App starts successfully without Redis
203
- - ✅ App starts successfully without Database
204
- - ✅ All file operations work in /tmp
205
- - ✅ Gradio interface works perfectly
206
- - ✅ User-friendly error messages with categories
207
- - ✅ External API timeouts handled gracefully
208
- - ✅ Proper fallbacks for all critical paths
209
- - ✅ Comprehensive logging for debugging
210
-
211
- ---
212
-
213
- ## 🔍 What Was Not Changed
214
-
215
- The following were intentionally left unchanged to maintain functionality:
216
-
217
- 1. **Core business logic** - All medical AI functionality preserved
218
- 2. **API interface contracts** - Endpoints maintain same request/response format
219
- 3. **Model functionality** - Model loading and inference unchanged
220
- 4. **Security features** - All security middleware preserved
221
- 5. **Backward compatibility** - Works on non-HF Spaces environments
222
-
223
- ---
224
-
225
- ## 🎓 Key Learnings
226
-
227
- ### HF Spaces Constraints:
228
- 1. Read-only filesystem except /tmp
229
- 2. No Redis available by default
230
- 3. No PostgreSQL available by default
231
- 4. Localhost HTTP requests don't work
232
- 5. Memory limits on free tier
233
-
234
- ### Best Practices Applied:
235
- 1. Environment detection (HF_SPACES flag)
236
- 2. Graceful degradation
237
- 3. Comprehensive error handling
238
- 4. Proper async/await patterns
239
- 5. Lazy loading for resources
240
- 6. User-friendly error messages
241
-
242
- ---
243
-
244
- ## 📞 Next Actions
245
-
246
- ### Immediate:
247
- 1. ✅ Review the fixes applied
248
- 2. ✅ Test locally with HF_SPACES=true
249
- 3. ⏭️ Deploy to Hugging Face Spaces
250
- 4. ⏭️ Monitor for issues
251
-
252
- ### Short-term:
253
- 1. Add integration tests for HF Spaces mode
254
- 2. Document API behavior when services unavailable
255
- 3. Add monitoring/alerting
256
- 4. Optimize memory usage further
257
-
258
- ### Long-term:
259
- 1. Consider adding Redis support (if HF Spaces adds it)
260
- 2. Implement persistent storage alternatives
261
- 3. Add rate limiting without Redis
262
- 4. Improve caching strategies
263
-
264
- ---
265
-
266
- ## 📚 Reference Documents
267
-
268
- 1. **HF_SPACES_ISSUES_REPORT.md** - Full issue analysis
269
- 2. **HF_SPACES_FIXES_APPLIED.md** - Complete fix documentation
270
- 3. **README_HF_SPACES.md** - Deployment guide
271
- 4. **requirements.txt** - Dependencies (already HF Spaces compatible)
272
-
273
- ---
274
-
275
- ## 🎉 Conclusion
276
-
277
- The application is now **production-ready** for Hugging Face Spaces deployment. All critical issues have been resolved, and the app will:
278
-
279
- ✅ Start successfully without external dependencies
280
- ✅ Handle errors gracefully
281
- ✅ Provide useful error messages to users
282
- ✅ Use only writable filesystem locations
283
- ✅ Work within HF Spaces memory constraints
284
- ✅ Maintain backward compatibility with other deployment environments
285
-
286
- **Risk Level:** Low ✅
287
- **Deployment Confidence:** High 🚀
288
- **Estimated Success Rate:** 95%+
289
-
290
- ---
291
-
292
- *Scan completed and documented on $(date)*
293
- *All critical and high-severity issues have been resolved*
294
-
STREAMING_FIX_SUMMARY.md DELETED
@@ -1,175 +0,0 @@
1
- # Streaming API Fix Summary
2
-
3
- ## Issue Description
4
- The `generate_patient_summary` API with `stream=true` was stopping after sending heartbeat events, with no completion or error messages being streamed to the client.
5
-
6
- ## Root Cause Analysis
7
- 1. **Improper async handling**: The `process_patient_summary_background` function was using `asyncio.run()` inside a thread, which can cause event loop conflicts.
8
- 2. **Insufficient error handling**: Errors during GGUF model generation were not being properly caught and reported through the streaming interface.
9
- 3. **No timeout protection**: The GGUF model generation could hang indefinitely without any timeout mechanism.
10
- 4. **Limited progress feedback**: The streaming interface wasn't providing detailed progress updates during the generation process.
11
-
12
- ## Fixes Applied
13
-
14
- ### 1. Improved Background Processing (`routes_fastapi.py`)
15
- - **Before**: Used `asyncio.run()` which can cause event loop conflicts
16
- - **After**: Created a new event loop for the thread using `asyncio.new_event_loop()`
17
- - **Added**: Comprehensive error handling with stack traces
18
- - **Added**: Proper cleanup of event loops
19
-
20
- ```python
21
- def process_patient_summary_background(data, job_id):
22
- """Background task for patient summary generation"""
23
- print(f"Background task started for job_id: {job_id}")
24
- try:
25
- # Create a new event loop for this thread
26
- loop = asyncio.new_event_loop()
27
- asyncio.set_event_loop(loop)
28
-
29
- try:
30
- result = loop.run_until_complete(async_patient_summary(data, job_id))
31
- update_job(job_id, 'completed', progress=100, data=result)
32
- print(f"Background task completed successfully for job_id: {job_id}")
33
- except Exception as e:
34
- print(f"Async task error for job_id {job_id}: {str(e)}")
35
- import traceback
36
- traceback.print_exc()
37
- update_job(job_id, 'error', error=str(e))
38
- finally:
39
- loop.close()
40
- except Exception as e:
41
- print(f"Background task error for job_id {job_id}: {str(e)}")
42
- import traceback
43
- traceback.print_exc()
44
- update_job(job_id, 'error', error=str(e))
45
- ```
46
-
47
- ### 2. Enhanced GGUF Generation Error Handling
48
- - **Added**: Timeout protection (5 minutes) for GGUF model generation
49
- - **Added**: Specific error handling for GGUF generation failures
50
- - **Added**: Progress updates during generation process
51
-
52
- ```python
53
- try:
54
- # Add timeout to prevent hanging
55
- if job_id:
56
- update_job(job_id, 'processing', progress=75, data={'message': 'Running GGUF model inference...'})
57
-
58
- raw_summary = await asyncio.wait_for(
59
- asyncio.to_thread(pipeline.generate, full_prompt, max_tokens=1500, temperature=0.1, top_p=0.5),
60
- timeout=300 # 5 minutes timeout
61
- )
62
- print(f"GGUF raw summary length: {len(raw_summary)} chars")
63
-
64
- if job_id:
65
- update_job(job_id, 'processing', progress=85, data={'message': 'Processing generated summary...'})
66
- except asyncio.TimeoutError:
67
- error_msg = "GGUF generation timed out after 5 minutes"
68
- print(error_msg)
69
- if job_id:
70
- update_job(job_id, 'error', error=error_msg)
71
- raise Exception(error_msg)
72
- except Exception as e:
73
- print(f"GGUF generation failed: {str(e)}")
74
- if job_id:
75
- update_job(job_id, 'error', error=f"GGUF generation failed: {str(e)}")
76
- raise Exception(f"GGUF model generation failed: {str(e)}")
77
- ```
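The timeout pattern in the snippet above (a blocking call moved to a worker thread and awaited under `asyncio.wait_for`) can be tried in isolation with the minimal sketch below. `slow_generate` is a hypothetical stand-in for `pipeline.generate`, and the delays are illustrative only. One caveat: the timeout stops the *await*, not the work. Python cannot interrupt the worker thread, so the blocking call keeps running until it returns on its own.

```python
import asyncio
import time


def slow_generate(prompt: str, delay: float) -> str:
    """Hypothetical stand-in for a blocking model call such as pipeline.generate."""
    time.sleep(delay)
    return f"summary for: {prompt}"


async def generate_with_timeout(prompt: str, delay: float, timeout: float) -> str:
    """Run the blocking call in a worker thread and stop waiting after `timeout` seconds."""
    try:
        return await asyncio.wait_for(
            asyncio.to_thread(slow_generate, prompt, delay),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        # The worker thread is NOT cancelled here; only the await is abandoned.
        raise TimeoutError(f"generation timed out after {timeout} seconds") from None


def run_generation(prompt: str, delay: float, timeout: float) -> str:
    """Synchronous entry point, convenient for quick experiments."""
    return asyncio.run(generate_with_timeout(prompt, delay, timeout))
```

`asyncio.run` also waits for the default executor's threads when it shuts the loop down, so in this sketch the caller sees the exception only once the stand-in's sleep has finished. A long-running model call that times out will therefore keep occupying a thread (and memory) in the background.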
78
-
79
- ### 3. Improved SSE Generator
80
- - **Added**: Overall timeout protection (10 minutes max wait time)
81
- - **Added**: Elapsed time tracking in events
82
- - **Added**: Better error reporting with status information
83
- - **Added**: Conditional heartbeat sending (only for active processing states)
84
-
85
- ```python
86
- def sse_generator(job_id):
87
- import json
88
- start_time = time.time()
89
- max_wait_time = 600 # 10 minutes max wait time
90
-
91
- while True:
92
- current_time = time.time()
93
- elapsed_time = current_time - start_time
94
-
95
- with job_lock:
96
- if job_id not in jobs:
97
- yield f"data: {json.dumps({'type': 'error', 'error': 'Job not found'})}\n\n"
98
- break
99
-
100
- job = jobs[job_id]
101
- status = job.get('status', 'unknown')
102
- progress = job.get('progress', 0)
103
- data = job.get('data', {})
104
- error = job.get('error')
105
-
106
- # Check for timeout
107
- if elapsed_time > max_wait_time:
108
- yield f"data: {json.dumps({'type': 'error', 'error': 'Job timed out after 10 minutes'})}\n\n"
109
- cleanup_job(job_id)
110
- break
111
-
112
- if error:
113
- yield f"data: {json.dumps({'type': 'error', 'error': error, 'status': status})}\n\n"
114
- threading.Timer(5.0, lambda: cleanup_job(job_id)).start()
115
- break
116
-
117
- event_data = {
118
- 'type': 'progress',
119
- 'status': status,
120
- 'progress': progress,
121
- 'data': data,
122
- 'elapsed_time': round(elapsed_time, 1)
123
- }
124
- yield f"data: {json.dumps(event_data)}\n\n"
125
-
126
- if status == 'completed':
127
- yield f"data: {json.dumps({'type': 'complete', 'data': data})}\n\n"
128
- threading.Timer(5.0, lambda: cleanup_job(job_id)).start()
129
- break
130
-
131
- # Only send heartbeat if we're still processing
132
- if status in ['queued', 'processing', 'started', 'ehr_success', 'processing_data']:
133
- yield f"data: {json.dumps({'type': 'heartbeat', 'status': status, 'elapsed_time': round(elapsed_time, 1)})}\n\n"
134
-
135
- time.sleep(1)
136
- yield "data: [DONE]\n\n"
137
- ```
138
-
139
- ### 4. Enhanced Progress Updates
140
- - **Added**: More granular progress updates during GGUF generation
141
- - **Added**: Final progress update before completion
142
- - **Added**: Better status messages for each processing stage
143
-
144
- ## Expected Behavior After Fix
145
-
146
- ### Successful Stream Flow:
147
- 1. `{"type": "progress", "status": "queued", "progress": 0, "data": {"job_id": "...", "message": "Job queued ..."}}`
148
- 2. `{"type": "heartbeat", "status": "queued"}`
149
- 3. `{"type": "progress", "status": "started", "progress": 5, "data": {"message": "Task started"}}`
150
- 4. `{"type": "progress", "status": "ehr_success", "progress": 20, "data": {"message": "EHR data fetched successfully"}}`
151
- 5. `{"type": "progress", "status": "processing_data", "progress": 30, "data": {"message": "Processing patient data"}}`
152
- 6. `{"type": "progress", "status": "processing", "progress": 60, "data": {"message": "Generating summary with gguf model..."}}`
153
- 7. `{"type": "progress", "status": "processing", "progress": 70, "data": {"message": "Generating summary with GGUF model..."}}`
154
- 8. `{"type": "progress", "status": "processing", "progress": 75, "data": {"message": "Running GGUF model inference..."}}`
155
- 9. `{"type": "progress", "status": "processing", "progress": 85, "data": {"message": "Processing generated summary..."}}`
156
- 10. `{"type": "progress", "status": "processing", "progress": 95, "data": {"message": "Finalizing summary..."}}`
157
- 11. `{"type": "complete", "data": {"summary": "...", "baseline": "...", "delta": "...", "timing": {...}}}`
158
-
159
- ### Error Stream Flow:
160
- 1. Progress events as above until error occurs
161
- 2. `{"type": "error", "error": "GGUF generation failed: [specific error]", "status": "processing"}`
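A client can consume the event sequences listed above with a few lines of parsing. The sketch below is a minimal illustration, assuming each event arrives as a single `data: ...` line as produced by the generator shown earlier; `iter_sse_events` and `final_event` are hypothetical helper names, not part of the service.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from 'data: ...' lines, stopping at '[DONE]'."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def final_event(lines: Iterable[str]) -> Optional[dict]:
    """Return the first terminal event ('complete' or 'error'), or None if none arrives."""
    for event in iter_sse_events(lines):
        if event.get("type") in ("complete", "error"):
            return event
    return None
```

Progress and heartbeat events are simply skipped here; a real client would more likely surface `progress` and `elapsed_time` to the user while waiting for the terminal event.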
162
-
163
- ## Testing
164
- A test script `test_streaming_fix.py` has been created to verify the streaming functionality with the provided payload.
165
-
166
- ## Files Modified
167
- - `services/ai-service/src/ai_med_extract/api/routes_fastapi.py` - Main fixes applied
168
- - `test_streaming_fix.py` - Test script for verification
169
- - `STREAMING_FIX_SUMMARY.md` - This documentation
170
-
171
- ## Deployment Notes
172
- - The fixes are backward compatible
173
- - No database schema changes required
174
- - No additional dependencies required
175
- - The fixes improve error handling and provide better user feedback
TODO.md DELETED
@@ -1,12 +0,0 @@
1
- # TODO: Integrate Sinkhorn-Normalized Quantization
2
-
3
- ## Steps to Complete
4
- - [x] Create quantization_utils.py with Sinkhorn-Normalized Quantization implementation
5
- - [x] Modify model_manager.py to support optional quantization during model loading
6
- - [x] Add configuration options for quantization in model_config.py
7
- - [x] Test quantization on a sample model without affecting existing workflows
8
- - [x] Verify that existing model loading and inference still work
9
- - [ ] Update documentation if needed
10
-
11
- ## Current Status
12
- Basic tests completed successfully. Quantization is disabled by default, so existing workflows are unaffected. API endpoints can be tested by running the FastAPI app.
requirements.txt CHANGED
@@ -1,7 +1,7 @@
  # Core AI/ML dependencies
- torch==2.3.0
- torchvision==0.18.0
- torchaudio==2.3.0
+ torch>=2.3.0
+ torchvision>=0.18.0
+ torchaudio>=2.3.0
  transformers>=4.42.0
  tokenizers==0.21.4
  accelerate>=0.30.0
@@ -49,8 +49,8 @@ scipy==1.11.4
  joblib==1.5.1

  # Model Optimization & Quantization
- optimum==1.27.0
- optimum-intel==1.25.2
+ optimum>=1.27.0
+ optimum-intel>=1.25.2
  onnxruntime==1.16.3
  nncf==2.17.0
  bitsandbytes==0.47.0
@@ -58,10 +58,10 @@ ctransformers==0.2.27
  llama_cpp_python==0.2.72

  # Intel Optimization
- openvino==2025.2.0
- openvino-tokenizers==2025.2.0.1
- intel-openmp==2021.4.0
- mkl==2021.4.0
+ openvino>=2024.4.0
+ openvino-tokenizers>=2024.4.0
+ intel-openmp>=2024.0.0
+ mkl>=2024.0.0

  # Utilities & Helpers
  aiofiles==23.2.1
@@ -82,7 +82,13 @@ websockets==11.0.3
  # Database & Caching
  redis==6.4.0
  asyncpg==0.30.0
+ sqlalchemy>=2.0.0

  # Development & Monitoring (minimal)
  rich==13.9.4
- typer==0.9.4
+ typer==0.9.4
+
+ # Additional dependencies for medical AI platform
+ python-multipart>=0.0.6
+ python-jose[cryptography]>=3.3.0
+ passlib[bcrypt]>=1.7.4
services/ai-service/src/ai_med_extract/__pycache__/app.cpython-311.pyc CHANGED
Binary files a/services/ai-service/src/ai_med_extract/__pycache__/app.cpython-311.pyc and b/services/ai-service/src/ai_med_extract/__pycache__/app.cpython-311.pyc differ
 
services/ai-service/src/ai_med_extract/api/__pycache__/routes_fastapi.cpython-311.pyc CHANGED
Binary files a/services/ai-service/src/ai_med_extract/api/__pycache__/routes_fastapi.cpython-311.pyc and b/services/ai-service/src/ai_med_extract/api/__pycache__/routes_fastapi.cpython-311.pyc differ
 
services/ai-service/src/ai_med_extract/api/routes_fastapi.py CHANGED
@@ -407,9 +407,9 @@ async def async_patient_summary(data, job_id=None):
407
  except Exception as e:
408
  print(f"Cache read failed: {e}")
409
 
410
- # Set timeouts based on mode
411
- EHR_TIMEOUT = 20 if timeout_mode == "fast" else 20
412
- GEN_TIMEOUT = 20 if timeout_mode == "fast" else 60
413
 
414
  try:
415
  # Step 1: Fetch EHR data
@@ -575,26 +575,10 @@ async def async_patient_summary(data, job_id=None):
575
  print(f"🔄 Loading new GGUF pipeline for {cache_key}")
576
  pipeline = await asyncio.to_thread(get_cached_gguf_pipeline, repo_id, filename)
577
 
578
  full_prompt = f"""<|system|>
579
- You are a clinical AI assistant. Generate a COMPLETE patient summary with EXACTLY 4 sections in markdown format. Ensure ALL sections are fully generated and detailed with bullet points. Do not skip or abbreviate any section.
580
- do not halucinate or invent any information. Base ONLY on provided data.
581
- DATA:
582
- visits: {all_visits}
583
-
584
- REQUIRED OUTPUT FORMAT (must include all, each with at least 3-5 bullet points):
585
- ## Clinical Assessment
586
- - Bullet points analyzing current state, diagnoses, vitals, labs, medications.
587
-
588
- ## Key Trends & Changes
589
- - Bullet points on trends, deltas, new developments, changes in vitals/labs over time.
590
-
591
- ## Plan & Suggested Actions
592
- - Bullet points with recommended next steps, monitoring, treatments, follow-ups.
593
-
594
- ## Direct Guidance for Physician
595
- - Bullet points with key clinical insights, warnings, considerations, potential risks.
596
-
597
- Use bullet points with "- ". Base ONLY on provided data. No preamble, explanations, or extra text. Start immediately with "## Clinical Assessment" and ensure all 4 sections are complete and detailed:</s>
598
  <|user|>
599
  Generate the full 4-section summary based on the data.</s>
600
  <|assistant|>"""
@@ -722,8 +706,8 @@ Generate the full 4-section summary based on the data.</s>
722
  if not pipeline:
723
  raise ValueError("Pipeline not available")
724
 
725
- from ..utils.openvino_summarizer_utils import build_main_prompt
726
- prompt = build_main_prompt(baseline, delta_text)
727
  inputs = pipeline.tokenizer([prompt], return_tensors="pt")
728
  outputs = await asyncio.to_thread(pipeline.model.generate, **inputs, max_new_tokens=800, do_sample=False, pad_token_id=pipeline.tokenizer.pad_token_id or pipeline.tokenizer.eos_token_id or 0)
729
  text = pipeline.tokenizer.decode(outputs[0], skip_special_tokens=True)
@@ -884,33 +868,78 @@ Generic summary — verify details clinically.
884
  return result
885
 
886
  else:
887
- print(f"Unsupported model_type: {model_type}")
888
- generic_fallback = f"""
889
- ## Clinical Assessment
890
- - Unsupported model type: {model_type}
891
-
892
- ## Key Trends & Changes
893
- - Please use model_type: gguf, text-generation, causal-openvino, summarization, or seq2seq
894
-
895
- ## Plan & Suggested Actions
896
- - Update API request with supported model type.
897
-
898
- ## Direct Guidance for Physician
899
- - System configuration error — contact administrator.
900
- """
901
- total_time = time.perf_counter() - start_time
902
- result = {
903
- "summary": ensure_four_sections(generic_fallback),
904
- "baseline": baseline,
905
- "delta": delta_text,
906
- "warning": f"Unsupported model_type: {model_type}",
907
- "supported_types": ["gguf", "text-generation", "causal-openvino", "summarization", "seq2seq"],
908
- "timing": {"total": round(total_time, 1)},
909
- "timeout_mode_used": timeout_mode
910
- }
911
  if job_id:
912
- update_job(job_id, 'completed', progress=100, data=result)
913
- return result
914
 
915
  # Step 5: Finalize (safety net)
916
  if job_id:
@@ -1937,13 +1966,13 @@ def register_routes(app, agents):
1937
  patient_info = f"Patient: {patient_name} (ID: {patient_id}, Age: {age}, Gender: {gender})\nPast Medical History: {past_medical_history}\nSocial History: {social_history}\n"
1938
 
1939
  # Use utils for processing
1940
- from ..utils.openvino_summarizer_utils import parse_ehr_chartsummarydtl, compute_deltas, visits_sorted, build_compact_baseline, delta_to_text, build_main_prompt
1941
  visits = parse_ehr_chartsummarydtl(chartsummarydtl)
1942
  delta = compute_deltas([], visits)
1943
  all_visits = visits_sorted(visits)
1944
  baseline = build_compact_baseline(all_visits)
1945
  delta_text = delta_to_text(delta)
1946
- prompt = build_main_prompt(baseline, delta_text, patient_info)
1947
 
1948
  # Model selection
1949
  from ..utils import model_config as _mc
 
407
  except Exception as e:
408
  print(f"Cache read failed: {e}")
409
 
410
+ # Set timeouts based on mode - Fixed inconsistent timeout values
411
+ EHR_TIMEOUT = 10 if timeout_mode == "fast" else 30
412
+ GEN_TIMEOUT = 30 if timeout_mode == "fast" else 120
413
 
414
  try:
415
  # Step 1: Fetch EHR data
 
575
  print(f"🔄 Loading new GGUF pipeline for {cache_key}")
576
  pipeline = await asyncio.to_thread(get_cached_gguf_pipeline, repo_id, filename)
577
 
578
+ from ..utils.openvino_summarizer_utils import build_full_prompt
579
+ base_prompt = build_full_prompt(all_visits)
580
  full_prompt = f"""<|system|>
581
+ {base_prompt}</s>
582
  <|user|>
583
  Generate the full 4-section summary based on the data.</s>
584
  <|assistant|>"""
 
706
  if not pipeline:
707
  raise ValueError("Pipeline not available")
708
 
709
+ from ..utils.openvino_summarizer_utils import build_full_prompt
710
+ prompt = build_full_prompt(all_visits)
711
  inputs = pipeline.tokenizer([prompt], return_tensors="pt")
712
  outputs = await asyncio.to_thread(pipeline.model.generate, **inputs, max_new_tokens=800, do_sample=False, pad_token_id=pipeline.tokenizer.pad_token_id or pipeline.tokenizer.eos_token_id or 0)
713
  text = pipeline.tokenizer.decode(outputs[0], skip_special_tokens=True)
 
868
  return result
869
 
870
  else:
871
+ # Universal model handling - try to use any model type
872
+ print(f"Universal model handling for type: {model_type}")
873
  if job_id:
874
+ update_job(job_id, 'processing', progress=70, data={'message': f'Loading universal model: {model_name} ({model_type})'})
875
+
876
+ try:
877
+ # Use the unified model manager for any model type
878
+ from ..utils.model_manager import model_manager as _unified_manager
879
+ loader_obj = _unified_manager.get_model_loader(
880
+ model_name=model_name,
881
+ model_type=model_type,
882
+ quantize=True
883
+ )
884
+ pipeline = loader_obj.load()
885
+
886
+ if job_id:
887
+ update_job(job_id, 'processing', progress=80, data={'message': f'Generating summary with {model_type} model...'})
888
+
889
+ # Generate summary using the universal pipeline
890
+ if hasattr(pipeline, 'generate'):
891
+ # For GGUF and custom models
892
+ raw_summary = await asyncio.wait_for(
893
+ asyncio.to_thread(pipeline.generate, prompt, max_tokens=1500, temperature=0.1, top_p=0.5),
894
+ timeout=300 # 5 minutes timeout
895
+ )
896
+ elif hasattr(pipeline, '__call__'):
897
+ # For transformers pipelines
898
+ result = await asyncio.to_thread(pipeline, prompt, max_length=400, min_length=100, do_sample=False)
899
+ if isinstance(result, list) and result and "summary_text" in result[0]:
900
+ raw_summary = result[0]["summary_text"]
901
+ else:
902
+ raw_summary = str(result)
903
+ else:
904
+ raise ValueError("Pipeline does not support generation")
905
+
906
+ # Process the summary
907
+ markdown_summary = summary_to_markdown(raw_summary)
908
+ markdown_summary = ensure_four_sections(markdown_summary)
909
+
910
+ total_time = time.perf_counter() - start_time
911
+ print(f"[✅ SUCCESS] Universal {model_type} | TIMEOUT_MODE: {timeout_mode} | TOTAL: {total_time:.1f}s")
912
+
913
+ result = {
914
+ "summary": markdown_summary,
915
+ "baseline": baseline,
916
+ "delta": delta_text,
917
+ "prompt": prompt,
918
+ "timing": {"total": round(total_time, 1)},
919
+ "model_used": f"{model_name} ({model_type})",
920
+ "timeout_mode_used": timeout_mode
921
+ }
922
+ if job_id:
923
+ update_job(job_id, 'completed', progress=100, data=result)
924
+ return result
925
+
926
+ except Exception as e:
927
+ print(f"Universal model handling failed: {e}")
928
+ # Fallback to rule-based generation
929
+ markdown_summary = generate_rule_based_summary(baseline, delta_text, all_visits, patientid)
930
+ total_time = time.perf_counter() - start_time
931
+ result = {
932
+ "summary": markdown_summary,
933
+ "baseline": baseline,
934
+ "delta": delta_text,
935
+ "warning": f"Model {model_name} ({model_type}) failed, used rule-based fallback: {str(e)}",
936
+ "timing": {"total": round(total_time, 1)},
937
+ "model_used": f"{model_name} ({model_type}) - fallback",
938
+ "timeout_mode_used": timeout_mode
939
+ }
940
+ if job_id:
941
+ update_job(job_id, 'completed', progress=100, data=result)
942
+ return result
943
 
944
  # Step 5: Finalize (safety net)
945
  if job_id:
 
1966
  patient_info = f"Patient: {patient_name} (ID: {patient_id}, Age: {age}, Gender: {gender})\nPast Medical History: {past_medical_history}\nSocial History: {social_history}\n"
1967
 
1968
  # Use utils for processing
1969
+ from ..utils.openvino_summarizer_utils import parse_ehr_chartsummarydtl, compute_deltas, visits_sorted, build_compact_baseline, delta_to_text, build_full_prompt
1970
  visits = parse_ehr_chartsummarydtl(chartsummarydtl)
1971
  delta = compute_deltas([], visits)
1972
  all_visits = visits_sorted(visits)
1973
  baseline = build_compact_baseline(all_visits)
1974
  delta_text = delta_to_text(delta)
1975
+ prompt = build_full_prompt(all_visits, patient_info)
1976
 
1977
  # Model selection
1978
  from ..utils import model_config as _mc
services/ai-service/src/ai_med_extract/utils/__pycache__/model_config.cpython-311.pyc CHANGED
Binary files a/services/ai-service/src/ai_med_extract/utils/__pycache__/model_config.cpython-311.pyc and b/services/ai-service/src/ai_med_extract/utils/__pycache__/model_config.cpython-311.pyc differ
 
services/ai-service/src/ai_med_extract/utils/__pycache__/model_loader_gguf.cpython-311.pyc CHANGED
Binary files a/services/ai-service/src/ai_med_extract/utils/__pycache__/model_loader_gguf.cpython-311.pyc and b/services/ai-service/src/ai_med_extract/utils/__pycache__/model_loader_gguf.cpython-311.pyc differ
 
services/ai-service/src/ai_med_extract/utils/__pycache__/model_loader_spaces.cpython-311.pyc CHANGED
Binary files a/services/ai-service/src/ai_med_extract/utils/__pycache__/model_loader_spaces.cpython-311.pyc and b/services/ai-service/src/ai_med_extract/utils/__pycache__/model_loader_spaces.cpython-311.pyc differ
 
services/ai-service/src/ai_med_extract/utils/__pycache__/model_manager.cpython-311.pyc CHANGED
Binary files a/services/ai-service/src/ai_med_extract/utils/__pycache__/model_manager.cpython-311.pyc and b/services/ai-service/src/ai_med_extract/utils/__pycache__/model_manager.cpython-311.pyc differ
 
services/ai-service/src/ai_med_extract/utils/__pycache__/openvino_summarizer_utils.cpython-311.pyc CHANGED
Binary files a/services/ai-service/src/ai_med_extract/utils/__pycache__/openvino_summarizer_utils.cpython-311.pyc and b/services/ai-service/src/ai_med_extract/utils/__pycache__/openvino_summarizer_utils.cpython-311.pyc differ
 
services/ai-service/src/ai_med_extract/utils/model_config.py CHANGED
@@ -47,7 +47,36 @@ MODEL_TYPE_MAPPINGS = {
47
  "summarization": "summarization",
48
  "ner": "ner",
49
  "question-answering": "text-generation",
50
- "translation": "text-generation"
51
  }
52
 
53
  # Memory-optimized models for Hugging Face Spaces
@@ -128,10 +157,71 @@ def detect_model_type(model_name: str) -> str:
128
  # Check file extensions
129
  if model_name.endswith('.gguf'):
130
  return "gguf"
131
 
132
  # Default to text-generation for unknown types
133
  return "text-generation"
134
 
135
  def validate_model_config(model_name: str, model_type: str) -> dict:
136
  """Validate model configuration and return validation result"""
137
  result = {
@@ -141,11 +231,12 @@ def validate_model_config(model_name: str, model_type: str) -> dict:
141
  "recommendations": []
142
  }
143
 
144
- # Check if model type is supported
145
  if model_type not in MODEL_VALIDATION_RULES:
146
- result["valid"] = False
147
- result["errors"].append(f"Unsupported model type: {model_type}")
148
- return result
 
149
 
150
  # Check model name format
151
  if model_type == "gguf":
@@ -153,11 +244,15 @@ def validate_model_config(model_name: str, model_type: str) -> dict:
153
  result["warnings"].append("GGUF model should have .gguf extension or be in repo/filename format")
154
 
155
  # Check for memory optimization recommendations
156
- if model_type in ["text-generation", "summarization"]:
157
- if "large" in model_name.lower() or "xl" in model_name.lower():
158
  result["warnings"].append("Large models may cause memory issues on limited resources")
159
  result["recommendations"].append("Consider using a smaller model for better performance")
160
 
161
  return result
162
 
163
  def get_model_info(model_name: str, model_type: str) -> dict:
 
47
  "summarization": "summarization",
48
  "ner": "ner",
49
  "question-answering": "text-generation",
50
+ "translation": "text-generation",
51
+ "causal": "text-generation",
52
+ "causal-lm": "text-generation",
53
+ "gpt": "text-generation",
54
+ "llama": "text-generation",
55
+ "mistral": "text-generation",
56
+ "phi": "text-generation",
57
+ "gemma": "text-generation",
58
+ "qwen": "text-generation",
59
+ "chat": "text-generation",
60
+ "instruct": "text-generation",
61
+ "conversational": "text-generation",
62
+ "dialogue": "text-generation",
63
+ "seq2seq": "summarization",
64
+ "t5": "summarization",
65
+ "bart": "summarization",
66
+ "pegasus": "summarization",
67
+ "led": "summarization",
68
+ "encoder-decoder": "summarization",
69
+ "bert": "ner",
70
+ "roberta": "ner",
71
+ "xlm": "ner",
72
+ "deberta": "ner",
73
+ "electra": "ner",
74
+ "distilbert": "ner",
75
+ "albert": "ner",
76
+ "medical": "text-generation",
77
+ "clinical": "text-generation",
78
+ "healthcare": "text-generation",
79
+ "biomedical": "text-generation"
80
  }
81
 
82
  # Memory-optimized models for Hugging Face Spaces
 
157
  # Check file extensions
158
  if model_name.endswith('.gguf'):
159
  return "gguf"
160
+ if model_name.endswith('.onnx'):
161
+ return "openvino"
162
+
163
+ # Try to detect from HuggingFace model info (if available)
164
+ try:
165
+ from huggingface_hub import model_info
166
+ info = model_info(model_name)
167
+ if hasattr(info, 'pipeline_tag') and info.pipeline_tag:
168
+ pipeline_tag = info.pipeline_tag.lower()
169
+ # Map HuggingFace pipeline tags to our types
170
+ if pipeline_tag in ['text-generation', 'text2text-generation']:
171
+ return "text-generation"
172
+ elif pipeline_tag in ['summarization', 'text-summarization']:
173
+ return "summarization"
174
+ elif pipeline_tag in ['ner', 'token-classification']:
175
+ return "ner"
176
+ elif pipeline_tag in ['conversational', 'chat']:
177
+ return "text-generation"
178
+ else:
179
+ # For unknown pipeline tags, try to infer from model name
180
+ return detect_model_type_from_name(model_name)
181
+ except Exception:
182
+ # If HuggingFace detection fails, fall back to name-based detection
183
+ pass
184
 
185
  # Default to text-generation for unknown types
186
  return "text-generation"
187
 
188
+ def detect_model_type_from_name(model_name: str) -> str:
189
+ """Detect model type from model name patterns"""
190
+ model_name_lower = model_name.lower()
191
+
192
+ # Check for specific model families
193
+ if any(family in model_name_lower for family in ['gpt', 'gpt2', 'gpt3', 'gpt4']):
194
+ return "text-generation"
195
+ elif any(family in model_name_lower for family in ['llama', 'llama2', 'llama3']):
196
+ return "text-generation"
197
+ elif any(family in model_name_lower for family in ['mistral', 'mixtral']):
198
+ return "text-generation"
199
+ elif any(family in model_name_lower for family in ['phi', 'phi2', 'phi3']):
200
+ return "text-generation"
201
+ elif any(family in model_name_lower for family in ['gemma', 'gemma2']):
202
+ return "text-generation"
203
+ elif any(family in model_name_lower for family in ['qwen', 'qwen2']):
204
+ return "text-generation"
205
+ elif any(family in model_name_lower for family in ['t5', 't5-']):
206
+ return "summarization"
207
+ elif any(family in model_name_lower for family in ['bart', 'bart-']):
208
+ return "summarization"
209
+ elif any(family in model_name_lower for family in ['pegasus', 'pegasus-']):
210
+ return "summarization"
211
+ elif any(family in model_name_lower for family in ['bert', 'roberta', 'deberta', 'electra', 'distilbert', 'albert']):
212
+ return "ner"
213
+ elif any(family in model_name_lower for family in ['medical', 'clinical', 'healthcare', 'biomedical', 'bio']):
214
+ return "text-generation"
215
+ elif any(family in model_name_lower for family in ['chat', 'instruct', 'conversational', 'dialogue']):
216
+ return "text-generation"
217
+ elif any(family in model_name_lower for family in ['summar', 'summary']):
218
+ return "summarization"
219
+ elif any(family in model_name_lower for family in ['ner', 'entity', 'named-entity']):
220
+ return "ner"
221
+
222
+ # Default fallback
223
+ return "text-generation"
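Reviewer note: the substring checks in `detect_model_type_from_name` are order-sensitive and can match inside unrelated words. For example, when this helper is reached, a name such as `philschmid/bart-large-cnn-samsum` would be classified as `text-generation`, because `'phi'` is tested before `'bart'`. The sketch below shows one possible alternative, matching whole name tokens instead of substrings. This is not what the commit does; `FAMILY_TYPES` and `detect_type_by_tokens` are hypothetical names, and the table is only a subset of the families listed in the diff.

```python
import re

# Hypothetical subset of the family names used in the diff, mapped to pipeline types.
FAMILY_TYPES = {
    "gpt": "text-generation",
    "llama": "text-generation",
    "mistral": "text-generation",
    "mixtral": "text-generation",
    "phi": "text-generation",
    "gemma": "text-generation",
    "qwen": "text-generation",
    "t5": "summarization",
    "bart": "summarization",
    "pegasus": "summarization",
    "bert": "ner",
    "roberta": "ner",
    "deberta": "ner",
    "electra": "ner",
    "distilbert": "ner",
    "albert": "ner",
}


def detect_type_by_tokens(model_name: str, default: str = "text-generation") -> str:
    """Classify a model by whole tokens of its name rather than by substrings."""
    # Split on anything that is not a lowercase letter or digit: "/", "-", "_", "." ...
    tokens = re.split(r"[^a-z0-9]+", model_name.lower())
    for token in tokens:
        for family, model_type in FAMILY_TYPES.items():
            # Accept the family name itself or with a numeric suffix (e.g. "llama3").
            if re.fullmatch(re.escape(family) + r"\d*", token):
                return model_type
    return default
```

The trade-off is that token matching misses names where the family is fused with other text (for example a hypothetical `biobert`), so a production version would need either an explicit alias list or a fallback to the Hub's `pipeline_tag`.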
224
+
225
  def validate_model_config(model_name: str, model_type: str) -> dict:
226
  """Validate model configuration and return validation result"""
227
  result = {
 
231
  "recommendations": []
232
  }
233
 
234
+ # Check if model type is supported - now more flexible
235
  if model_type not in MODEL_VALIDATION_RULES:
236
+ # For unknown types, create default validation rules
237
+ result["warnings"].append(f"Model type '{model_type}' not in predefined rules, using default settings")
238
+ result["recommendations"].append("Consider using a known model type for optimal performance")
239
+ # Don't mark as invalid, just warn
240
 
241
  # Check model name format
242
  if model_type == "gguf":
 
244
  result["warnings"].append("GGUF model should have .gguf extension or be in repo/filename format")
245
 
246
  # Check for memory optimization recommendations
247
+ if model_type in ["text-generation", "summarization"] or "large" in model_name.lower() or "xl" in model_name.lower():
248
+ if "large" in model_name.lower() or "xl" in model_name.lower() or "7b" in model_name.lower() or "13b" in model_name.lower():
249
  result["warnings"].append("Large models may cause memory issues on limited resources")
250
  result["recommendations"].append("Consider using a smaller model for better performance")
251
 
252
+ # Check for medical/clinical models
253
+ if any(keyword in model_name.lower() for keyword in ['medical', 'clinical', 'healthcare', 'biomedical', 'bio']):
254
+ result["recommendations"].append("Medical model detected - ensure appropriate medical data handling")
255
+
256
  return result
257
 
258
  def get_model_info(model_name: str, model_type: str) -> dict:
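
Review note on the family checks in this file: they are plain substring tests, so a short key such as 'ner' also matches a name like `acme/text-generator`, and 'xl' or 'bio' can match unrelated names in the same way. A minimal sketch of token-based matching follows, assuming model names can be split on non-alphanumeric characters. `FAMILY_TOKENS` and `infer_task` are hypothetical names used for illustration, not part of the commit, and the 'summar' prefix from the diff is replaced here by whole-word variants.

```python
import re

# Hypothetical table mirroring the keyword lists in the diff,
# but matched against whole name tokens rather than substrings.
FAMILY_TOKENS = [
    ({"medical", "clinical", "healthcare", "biomedical", "bio"}, "text-generation"),
    ({"chat", "instruct", "conversational", "dialogue"}, "text-generation"),
    ({"summarization", "summarizer", "summary"}, "summarization"),
    ({"ner", "entity"}, "ner"),
]


def infer_task(model_name: str) -> str:
    """Infer a task type from whole tokens of the model name."""
    # "dslim/bert-base-NER" -> {"dslim", "bert", "base", "ner"}
    tokens = set(re.split(r"[^a-z0-9]+", model_name.lower()))
    for keywords, task in FAMILY_TOKENS:
        if tokens & keywords:
            return task
    # Same default as the diff.
    return "text-generation"
```

With this shape, `infer_task("acme/text-generator")` falls through to the default, whereas the substring test `'ner' in "acme/text-generator"` is true.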
services/ai-service/src/ai_med_extract/utils/model_loader_spaces.py CHANGED
@@ -7,16 +7,23 @@ class OpenVinoPipeline:
         self.model = model
         self.tokenizer = tokenizer

-def get_openvino_pipeline(model_name: str):
     """
     Loads an OpenVINO CausalLM pipeline for the given model name or IR directory.
     """
-    # If model_name is a directory, try to load IR from there; else, download and export
     import os
     if os.path.isdir(model_name):
-        model = OVModelForCausalLM.from_pretrained(model_name, compile=True, device="CPU", cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface'))
         tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface'))
     else:
-        model = OVModelForCausalLM.from_pretrained(model_name, export=False, compile=False, device="CPU", cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface'))
         tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface'))
     return OpenVinoPipeline(model, tokenizer)
 
         self.model = model
         self.tokenizer = tokenizer

+def get_openvino_pipeline(model_name: str, device: str = None):
     """
     Loads an OpenVINO CausalLM pipeline for the given model name or IR directory.
+    Automatically detects GPU/CPU and uses appropriate device.
     """
     import os
+    import torch
+
+    # Auto-detect device if not provided
+    if device is None:
+        device = "GPU" if torch.cuda.is_available() else "CPU"
+
+    # If model_name is a directory, try to load IR from there; else, download and export
     if os.path.isdir(model_name):
+        model = OVModelForCausalLM.from_pretrained(model_name, compile=True, device=device, cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface'))
         tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface'))
     else:
+        model = OVModelForCausalLM.from_pretrained(model_name, export=False, compile=False, device=device, cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface'))
         tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface'))
     return OpenVinoPipeline(model, tokenizer)
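
Review note on the device auto-detection in `get_openvino_pipeline`: the "GPU" device name is resolved by OpenVINO's own runtime, so `torch.cuda.is_available()` may not reflect what OpenVINO can actually use (as far as I know, its GPU plugin targets Intel GPUs). One possible alternative is to ask OpenVINO which devices it sees. The sketch below assumes the `Core().available_devices` property of the OpenVINO Python API; `list_openvino_devices` and `choose_openvino_device` are hypothetical helper names, not part of the commit.

```python
from typing import List, Optional


def list_openvino_devices() -> List[str]:
    """Return device names reported by OpenVINO, or [] if it is not installed."""
    try:
        from openvino import Core  # import path in newer releases
    except ImportError:
        try:
            from openvino.runtime import Core  # import path in older releases
        except ImportError:
            return []
    return list(Core().available_devices)


def choose_openvino_device(available: Optional[List[str]] = None) -> str:
    """Pick "GPU" only if OpenVINO itself reports a GPU device, else "CPU"."""
    if available is None:
        available = list_openvino_devices()
    # Device names can look like "GPU" or "GPU.0", "GPU.1", ...
    has_gpu = any(name == "GPU" or name.startswith("GPU.") for name in available)
    return "GPU" if has_gpu else "CPU"
```

Passing the device list in explicitly keeps the selection logic testable on machines without OpenVINO installed.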
services/ai-service/src/ai_med_extract/utils/model_manager.py CHANGED
@@ -356,9 +356,9 @@ class OpenVINOModelLoader(BaseModelLoader):
         try:
             from .model_loader_spaces import get_openvino_pipeline

-            logger.info(f"Loading OpenVINO model: {self.model_name}")
-            self._pipeline = get_openvino_pipeline(self.model_name)
-            logger.info(f"OpenVINO model loaded successfully: {self.model_name}")

         except ImportError as import_error:
             logger.warning(f"OpenVINO model loader not available: {import_error}")
@@ -440,24 +440,78 @@ class UnifiedModelManager:
     ) -> BaseModelLoader:
         """
         Get a model loader for the specified model and type
         """
         cache_key = f"{model_name}:{model_type}:{filename or ''}:{quantize}"

         if not force_reload and cache_key in self._model_cache:
             return self._model_cache[cache_key]

         try:
-            # Determine loader type and create appropriate loader
             if model_type == "gguf":
                 loader = GGUFModelLoader(model_name, filename)
             elif model_type == "openvino":
                 loader = OpenVINOModelLoader(model_name)
             else:
-                # Default to transformers for text-generation, summarization, ner, etc.
                 loader = TransformersModelLoader(model_name, model_type)

             # Test load the model
             pipeline = loader.load()

             # Apply quantization if enabled and applicable
             if quantize and quantization_config is None and model_config:
@@ -527,6 +581,49 @@ class UnifiedModelManager:
             cache_key: loader.get_model_info()
             for cache_key, loader in self._model_cache.items()
         }

 # Global instance
 model_manager = UnifiedModelManager()
 
         try:
             from .model_loader_spaces import get_openvino_pipeline

+            logger.info(f"Loading OpenVINO model: {self.model_name} on device: {self.device}")
+            self._pipeline = get_openvino_pipeline(self.model_name, self.device)
+            logger.info(f"OpenVINO model loaded successfully: {self.model_name} on {self.device}")

         except ImportError as import_error:
             logger.warning(f"OpenVINO model loader not available: {import_error}")

     ) -> BaseModelLoader:
         """
         Get a model loader for the specified model and type
+        Now supports ANY model type with intelligent fallback
         """
         cache_key = f"{model_name}:{model_type}:{filename or ''}:{quantize}"

         if not force_reload and cache_key in self._model_cache:
             return self._model_cache[cache_key]

+        # Try multiple loader strategies for maximum compatibility
+        loader = None
+        last_error = None
+
+        # Strategy 1: Try the specified model type first
         try:
             if model_type == "gguf":
                 loader = GGUFModelLoader(model_name, filename)
             elif model_type == "openvino":
                 loader = OpenVINOModelLoader(model_name)
             else:
+                # Default to transformers for any other type
                 loader = TransformersModelLoader(model_name, model_type)

             # Test load the model
             pipeline = loader.load()
+            logger.info(f"Successfully loaded {model_name} with {model_type} loader")
+
+        except Exception as e:
+            logger.warning(f"Failed to load {model_name} with {model_type} loader: {e}")
+            last_error = e
+            loader = None
+
+            # Strategy 2: Try alternative loaders based on model name patterns
+            alternative_strategies = []
+
+            # Check if it's a GGUF model by extension or name
+            if model_name.endswith('.gguf') or 'gguf' in model_name.lower():
+                alternative_strategies.append(("gguf", lambda: GGUFModelLoader(model_name, filename)))
+
+            # Check if it's an OpenVINO model
+            if model_name.endswith('.onnx') or 'openvino' in model_name.lower() or 'ov' in model_name.lower():
+                alternative_strategies.append(("openvino", lambda: OpenVINOModelLoader(model_name)))
+
+            # Try transformers with different task types
+            if any(keyword in model_name.lower() for keyword in ['summar', 'summary', 't5', 'bart', 'pegasus']):
+                alternative_strategies.append(("summarization", lambda: TransformersModelLoader(model_name, "summarization")))
+            elif any(keyword in model_name.lower() for keyword in ['ner', 'bert', 'roberta', 'entity']):
+                alternative_strategies.append(("ner", lambda: TransformersModelLoader(model_name, "ner")))
+            else:
+                # Try as text-generation
+                alternative_strategies.append(("text-generation", lambda: TransformersModelLoader(model_name, "text-generation")))
+
+            # Try each alternative strategy
+            for alt_type, alt_loader_func in alternative_strategies:
+                try:
+                    logger.info(f"Trying alternative loader: {alt_type} for {model_name}")
+                    loader = alt_loader_func()
+                    pipeline = loader.load()
+                    logger.info(f"Successfully loaded {model_name} with alternative {alt_type} loader")
+                    break
+                except Exception as alt_error:
+                    logger.warning(f"Alternative {alt_type} loader failed: {alt_error}")
+                    last_error = alt_error
+                    loader = None
+                    continue
+
+        # If all strategies failed, create a fallback loader
+        if loader is None:
+            logger.error(f"All loading strategies failed for {model_name}. Creating fallback loader.")
+            loader = self._create_fallback_loader(model_name, model_type, last_error)
+
+        # Test load the model
+        try:
+            pipeline = loader.load()

             # Apply quantization if enabled and applicable
             if quantize and quantization_config is None and model_config:
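
Review note on the loader selection in `get_model_loader`: it tries an ordered list of `(name, factory)` pairs and keeps only the last error. A minimal sketch of the same "first strategy that works" pattern, pulled out as a standalone helper that records every failure, is shown here. `first_successful` is a hypothetical name and this is one possible shape, not the code in the commit.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def first_successful(
    strategies: List[Tuple[str, Callable[[], T]]],
) -> Tuple[Optional[str], Optional[T], List[Tuple[str, Exception]]]:
    """Run factories in order and return the first result that does not raise.

    Returns (strategy_name, result, errors). If every strategy fails,
    strategy_name and result are None and errors lists each failure in order.
    """
    errors: List[Tuple[str, Exception]] = []
    for name, factory in strategies:
        try:
            return name, factory(), errors
        except Exception as exc:  # broad on purpose: any load failure moves on
            errors.append((name, exc))
    return None, None, errors
```

Keeping the full error list (rather than a single `last_error`) makes it easier to report why each loader was rejected.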
 
             cache_key: loader.get_model_info()
             for cache_key, loader in self._model_cache.items()
         }
+
+    def _create_fallback_loader(self, model_name: str, model_type: str, error: Exception = None) -> BaseModelLoader:
+        """Create a fallback loader when all other strategies fail"""
+        class FallbackModelLoader(BaseModelLoader):
+            def __init__(self, model_name: str, model_type: str, error: Exception = None):
+                self.model_name = model_name
+                self.model_type = model_type
+                self.error = error
+                self._pipeline = None
+
+            def load(self):
+                if self._pipeline is None:
+                    # Create a simple fallback pipeline
+                    class FallbackPipeline:
+                        def __init__(self, model_name, model_type, error):
+                            self.model_name = model_name
+                            self.model_type = model_type
+                            self.error = error
+
+                        def generate(self, prompt, **kwargs):
+                            return f"Model '{self.model_name}' ({self.model_type}) not available. Error: {str(self.error)[:100]}..."
+
+                        def __call__(self, prompt, **kwargs):
+                            return [{"generated_text": self.generate(prompt, **kwargs)}]
+
+                    self._pipeline = FallbackPipeline(self.model_name, self.model_type, self.error)
+                return self._pipeline
+
+            def generate(self, prompt: str, **kwargs) -> str:
+                pipeline = self.load()
+                return pipeline.generate(prompt, **kwargs)
+
+            def get_model_info(self) -> Dict[str, Any]:
+                return {
+                    "type": "fallback",
+                    "model_name": self.model_name,
+                    "model_type": self.model_type,
+                    "loaded": True,
+                    "fallback": True,
+                    "error": str(self.error) if self.error else None
+                }
+
+        return FallbackModelLoader(model_name, model_type, error)

 # Global instance
 model_manager = UnifiedModelManager()
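
Review note on `_create_fallback_loader`: the fallback loader reports `"loaded": True` and returns its error message as generated text, so a caller that only checks `loaded` could pass that message on as a clinical summary. A minimal sketch of a caller-side guard follows, using only the keys that `get_model_info()` returns in the diff. `ModelUnavailableError` and `require_real_model` are hypothetical names, not part of the commit.

```python
from typing import Any, Dict


class ModelUnavailableError(RuntimeError):
    """Raised when the manager handed back a fallback loader."""


def require_real_model(info: Dict[str, Any]) -> Dict[str, Any]:
    """Return info unchanged, or raise if it describes a fallback loader."""
    if info.get("fallback"):
        raise ModelUnavailableError(
            f"{info.get('model_name')} ({info.get('model_type')}) "
            f"failed to load: {info.get('error')}"
        )
    return info
```

A route handler could call this on `loader.get_model_info()` before generating, and map the exception to an error response instead of returning the placeholder text.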
services/ai-service/src/ai_med_extract/utils/openvino_summarizer_utils.py CHANGED
@@ -252,6 +252,47 @@ def build_main_prompt(baseline, delta_text, patient_info="", section=None):
         "Now generate the complete clinical summary with all four sections in markdown format:"
     )

 def validate_and_compare_summaries(old_summary, new_summary, update_name=""):
     report = f"### Validation Report for {update_name}\n"
     report += "This report validates that the updated summary incorporates new information correctly.\n"
 
         "Now generate the complete clinical summary with all four sections in markdown format:"
     )

+def build_full_prompt(all_visits, patient_info="", section=None):
+    """
+    Build the full prompt using the enhanced format that was previously only used for GGUF models.
+    This provides more detailed instructions and better formatting for all model types.
+    """
+    base_instruction = (
+        "You are a clinical AI assistant. Generate a COMPLETE patient summary with EXACTLY 4 sections in markdown format. "
+        "Ensure ALL sections are fully generated and detailed with bullet points. Do not skip or abbreviate any section. "
+        "Do not hallucinate or invent any information. Base ONLY on provided data."
+    )
+
+    if section:
+        section_instructions = {
+            "Clinical Assessment": "Generate ONLY the 'Clinical Assessment' section. Be concise, accurate, and evidence-based with bullet points.",
+            "Key Trends & Changes": "Generate ONLY the 'Key Trends & Changes' section. Focus on deltas, trends, vitals, labs, and med changes with bullet points.",
+            "Plan & Suggested Actions": "Generate ONLY the 'Plan & Suggested Actions' section. Suggest next steps, monitoring, treatments, follow-ups with bullet points.",
+            "Direct Guidance for Physician": "Generate ONLY the 'Direct Guidance for Physician' section. Give clear, actionable advice for the clinician with bullet points."
+        }
+        instruction = section_instructions.get(section, f"Generate the '{section}' section.")
+        return f"{base_instruction}\n\nDATA:\nvisits: {all_visits}\n\n{instruction}\n\nUse bullet points with '- '. Base ONLY on provided data. No preamble, explanations, or extra text. Start immediately with the section content:"
+
+    # Default: generate full 4-section summary
+    return f"""{base_instruction}
+DATA:
+visits: {all_visits}
+
+REQUIRED OUTPUT FORMAT (must include all, each with at least 3-5 bullet points):
+## Clinical Assessment
+- Bullet points analyzing current state, diagnoses, vitals, labs, medications.
+
+## Key Trends & Changes
+- Bullet points on trends, deltas, new developments, changes in vitals/labs over time.
+
+## Plan & Suggested Actions
+- Bullet points with recommended next steps, monitoring, treatments, follow-ups.
+
+## Direct Guidance for Physician
+- Bullet points with key clinical insights, warnings, considerations, potential risks.
+
+Use bullet points with "- ". Base ONLY on provided data. No preamble, explanations, or extra text. Start immediately with "## Clinical Assessment" and ensure all 4 sections are complete and detailed:"""
+
 def validate_and_compare_summaries(old_summary, new_summary, update_name=""):
     report = f"### Validation Report for {update_name}\n"
     report += "This report validates that the updated summary incorporates new information correctly.\n"
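
Review note on `build_full_prompt`: the prompt demands exactly four `##` sections, but nothing in this hunk checks that the model output actually contains them. A minimal sketch of such a check follows, assuming the output uses the level-2 markdown headings named in the prompt. `REQUIRED_SECTIONS` and `missing_sections` are hypothetical names, not part of the commit.

```python
import re
from typing import List

# Section titles copied from the prompt text in the diff.
REQUIRED_SECTIONS = [
    "Clinical Assessment",
    "Key Trends & Changes",
    "Plan & Suggested Actions",
    "Direct Guidance for Physician",
]


def missing_sections(summary: str) -> List[str]:
    """Return the required section titles with no matching '## ' heading."""
    # Match only level-2 headings: '##' followed by spaces/tabs, then the title.
    found = set(re.findall(r"^##[ \t]+(.+?)[ \t]*$", summary, flags=re.MULTILINE))
    return [title for title in REQUIRED_SECTIONS if title not in found]
```

An empty list means all four headings are present; a non-empty list could trigger a per-section retry using the `section` argument.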
test_device_fix.py CHANGED
@@ -14,6 +14,9 @@ current_dir = Path(__file__).parent
 ai_med_extract_path = current_dir / "services" / "ai-service" / "src"
 sys.path.insert(0, str(ai_med_extract_path))

 # Set environment variables
 os.environ.setdefault('HF_SPACES', 'true')
 os.environ.setdefault('PYTHONUNBUFFERED', '1')
 
 ai_med_extract_path = current_dir / "services" / "ai-service" / "src"
 sys.path.insert(0, str(ai_med_extract_path))

+# Also add the parent directory for proper module resolution
+sys.path.insert(0, str(ai_med_extract_path.parent))
+
 # Set environment variables
 os.environ.setdefault('HF_SPACES', 'true')
 os.environ.setdefault('PYTHONUNBUFFERED', '1')
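
Review note on the `sys.path` changes in both test scripts: `import ai_med_extract` resolves only when the directory that contains the package (here `services/ai-service/src`) is on `sys.path`, which is why pointing at the package directory itself did not work. A minimal sketch of a shared helper that adds that directory once is shown here. `add_src_to_path` is a hypothetical name, and the repository layout is taken from the paths in the diff.

```python
import sys
from pathlib import Path


def add_src_to_path(repo_root: Path) -> Path:
    """Put <repo_root>/services/ai-service/src on sys.path exactly once."""
    src_dir = repo_root / "services" / "ai-service" / "src"
    src_str = str(src_dir)
    # Avoid stacking duplicate entries when several test files call this.
    if src_str not in sys.path:
        sys.path.insert(0, src_str)
    return src_dir
```

In a test script this would be called as `add_src_to_path(Path(__file__).parent)` before importing `ai_med_extract`.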
test_hf_spaces_fix.py CHANGED
@@ -67,9 +67,12 @@ def test_app_import():
     try:
         # Add the ai_med_extract module to Python path
         current_dir = Path(__file__).parent
-        ai_med_extract_path = current_dir / "services" / "ai-service" / "src" / "ai_med_extract"
         sys.path.insert(0, str(ai_med_extract_path))

         # Set HF Spaces environment
         os.environ['HF_SPACES'] = 'true'
 
     try:
         # Add the ai_med_extract module to Python path
         current_dir = Path(__file__).parent
+        ai_med_extract_path = current_dir / "services" / "ai-service" / "src"
         sys.path.insert(0, str(ai_med_extract_path))

+        # Also add the parent directory for proper module resolution
+        sys.path.insert(0, str(ai_med_extract_path.parent))
+
         # Set HF Spaces environment
         os.environ['HF_SPACES'] = 'true'