Space: salvinjose/HNTAI

sachinchandrankallar committed on Nov 25, 2025

Commit 733c0c5

Parent: cdea66b

1200 seconds from 600 secs timeout

Files changed (21)

Dockerfile.hf-spaces-minimal +1 -1
docs/archive/COMPREHENSIVE_STREAMING_FIX.md +2 -2
docs/archive/patient_summary_models_review.md +5 -5
docs/hf-spaces/FILES_CREATED.md +4 -4
docs/hf-spaces/INDEX.md +2 -2
services/ai-service/DEPLOYMENT_FIX.md +4 -4
services/ai-service/Dockerfile.prod +1 -1
services/ai-service/src/__main__.py +1 -1
services/ai-service/src/ai_med_extract/api/routes_fastapi.py +20 -20
services/ai-service/src/ai_med_extract/app.py +1 -1
services/ai-service/src/ai_med_extract/config/performance_config.py +2 -2
services/ai-service/src/ai_med_extract/enable_optimizations.py +2 -2
services/ai-service/src/ai_med_extract/inference_service.py +1 -1
services/ai-service/src/ai_med_extract/phi_scrubber_service.py +1 -1
services/ai-service/src/ai_med_extract/services/job_manager.py +1 -1
services/ai-service/src/ai_med_extract/services/request_queue.py +3 -3
services/ai-service/src/ai_med_extract/utils/constants.py +20 -20
services/ai-service/src/ai_med_extract/utils/hf_spaces_config.py +1 -1
services/ai-service/src/ai_med_extract/utils/openvino_summarizer_utils.py +1 -1
services/ai-service/src/ai_med_extract/utils/performance_monitor.py +1 -1
services/ai-service/src/ai_med_extract/utils/unified_model_manager.py +1 -1

Dockerfile.hf-spaces-minimal CHANGED

@@ -48,5 +48,5 @@ HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
     CMD curl -f http://localhost:7860/health || exit 1
 # Start application with single worker for minimal memory footprint
-CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "1", "--timeout-keep-alive", "600"]
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "1", "--timeout-keep-alive", "1200"]
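In uvicorn, `--timeout-keep-alive` closes a keep-alive connection that receives no new data within that many seconds (uvicorn's default is 5); it does not limit how long a single request may run. Because this commit edits the same flag in several Dockerfiles and docs, a small check that pulls the value out of each exec-form `CMD` line can catch copies that drift apart. This is a sketch, assuming every `CMD` uses Docker's JSON-array exec form as the lines above do; `cmd_flag_value` is a hypothetical helper, not part of the repository.

```python
import json
from typing import Optional


def cmd_flag_value(dockerfile_line: str, flag: str) -> Optional[str]:
    """Return the argument that follows `flag` in an exec-form CMD line.

    Returns None for non-CMD lines, shell-form CMDs, or when the flag is absent.
    """
    line = dockerfile_line.strip()
    if not line.startswith("CMD "):
        return None
    try:
        # Exec form is a JSON array of strings, so json.loads can parse it directly.
        args = json.loads(line[len("CMD "):])
    except json.JSONDecodeError:
        return None  # shell form (CMD uvicorn ...) or a malformed line
    if not isinstance(args, list):
        return None
    # Stop one short of the end: a trailing flag has no value after it.
    for i, arg in enumerate(args[:-1]):
        if arg == flag:
            return args[i + 1]
    return None


line = 'CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "1", "--timeout-keep-alive", "1200"]'
print(cmd_flag_value(line, "--timeout-keep-alive"))  # 1200
```

Running it over every Dockerfile and comparing the results is left to the caller.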

docs/archive/COMPREHENSIVE_STREAMING_FIX.md CHANGED

@@ -31,7 +31,7 @@ is_gguf_mode = (data.get('generation_mode') == 'gguf' or
 ### **3. Extended Timeout Configuration**
 ```python
 # Extended timeout for GGUF operations
-max_wait_time = 600  # 10 minutes for GGUF operations
+max_wait_time = 1200  # 10 minutes for GGUF operations
 heartbeat_interval = 5  # Every 5 seconds
 ```
@@ -54,7 +54,7 @@ heartbeat_interval = 5  # Every 5 seconds
 ### **5. Enhanced SSE Generator**
 ```python
 def sse_generator_extended(job_id):
-    max_wait_time = 600  # 10 minutes for GGUF operations
+    max_wait_time = 1200  # 10 minutes for GGUF operations
     heartbeat_interval = 5  # Every 5 seconds
     # Enhanced logging and progress updates
 ```
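The `max_wait_time` / `heartbeat_interval` pair describes a loop that keeps a Server-Sent Events stream alive while a long GGUF job runs. Note that 1200 seconds is 20 minutes, not the 10 minutes the unchanged comment states. Below is a minimal sketch of that loop, not the project's actual `sse_generator_extended`: the `get_status` callable, its status strings, and the event payloads are assumptions for illustration. The clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import json
import time
from typing import Callable, Iterator


def sse_wait_loop(
    get_status: Callable[[], str],                 # hypothetical: returns "running", "done" or "error"
    max_wait_time: float = 1200,                   # seconds (20 minutes)
    heartbeat_interval: float = 5,                 # seconds between keep-alive frames
    clock: Callable[[], float] = time.monotonic,   # injectable for tests
    sleep: Callable[[float], None] = time.sleep,   # injectable for tests
) -> Iterator[str]:
    """Yield SSE frames until the job finishes or max_wait_time elapses."""
    start = clock()
    while True:
        status = get_status()
        if status in ("done", "error"):
            payload = json.dumps({"status": status})
            yield f"event: {status}\ndata: {payload}\n\n"
            return
        elapsed = clock() - start
        if elapsed >= max_wait_time:
            payload = json.dumps({"status": "timeout", "elapsed_s": round(elapsed, 1)})
            yield f"event: timeout\ndata: {payload}\n\n"
            return
        # Lines starting with ':' are SSE comments: clients ignore them, but the
        # bytes on the wire can keep idle-connection timers from firing.
        yield ": heartbeat\n\n"
        sleep(heartbeat_interval)
```

In FastAPI, a generator like this could be handed to `StreamingResponse(..., media_type="text/event-stream")`.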

docs/archive/patient_summary_models_review.md CHANGED

@@ -160,7 +160,7 @@ elif model_type == "causal-openvino":
 #### Weaknesses
 - ⚠️ **Slight quality loss**: Q4 quantization may reduce quality slightly
-- ⚠️ **Longer timeouts**: Extended timeout needed (600s on HF Spaces)
+- ⚠️ **Longer timeouts**: Extended timeout needed (1200s on HF Spaces)
 - ⚠️ **File path parsing**: Requires special handling for filename extraction
 #### Implementation Details
@@ -428,7 +428,7 @@ Based on HF Spaces configuration (`hf_spaces_config.py`):
 - ✅ **RAM**: ~3-4GB during inference
 - ✅ **Speed**: Very good on T4 (GGUF optimized)
 - ✅ **HF Spaces Config**: Primary GGUF model (line 33)
-- ✅ **Extended Timeout**: 600s configured for HF Spaces (routes_fastapi.py line 1075)
+- ✅ **Extended Timeout**: 1200s configured for HF Spaces (routes_fastapi.py line 1075)
 - ✅ **Quantization**: Q4 reduces memory by ~75%
 #### Performance Estimates
@@ -449,7 +449,7 @@ Based on HF Spaces configuration (`hf_spaces_config.py`):
 #### Recommendations
 - **Best Choice** for cost-conscious deployment
 - Use when expecting high concurrent load
-- Extended timeout already configured (600s)
+- Extended timeout already configured (1200s)
 - Cache-friendly for repeated requests
 ---
@@ -551,7 +551,7 @@ GGUF (Phi-3-Q4):   ~2.0GB GPU  (16% of usable)
 Based on `routes_fastapi.py`:
 - **Standard models**: 120-180s timeout
-- **GGUF models**: 600s extended timeout (line 1075)
+- **GGUF models**: 1200s extended timeout (line 1075)
 - **HF Spaces detection**: Automatic (line 1073-1074)
 ### Optimization Strategies for T4
@@ -619,7 +619,7 @@ Fallback Model: microsoft/Phi-3-mini-4k-instruct-gguf
 Emergency Fallback: google/flan-t5-large
 Max Concurrent: 5-6 requests (BART), 8-10 (GGUF)
 Memory Limit: 80% (12.8GB GPU, 24GB RAM)
-Timeout: 180s (standard), 600s (GGUF)
+Timeout: 180s (standard), 1200s (GGUF)
 ```
 ### 📊 **Expected Performance**
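The review quotes 180 s for standard models and an extended 1200 s for GGUF models, with the extended value chosen when HF Spaces is detected. The following sketch shows that selection, assuming a plain mapping like the `timeout_config` used in `routes_fastapi.py`. The `gguf_timeout` / `gguf_extended_timeout` keys and the `HF_SPACES` switch mirror that file. The `standard_timeout` key, the fallback constants, and the function itself are illustrative assumptions.

```python
import os
from typing import Mapping, Optional

# Fallbacks taken from the figures quoted above (180 s standard, 1200 s GGUF).
STANDARD_TIMEOUT_S = 180
GGUF_TIMEOUT_S = 1200


def select_timeout(
    model_type: str,
    timeout_config: Mapping[str, float],
    env: Optional[Mapping[str, str]] = None,
) -> float:
    """Return the generation timeout in seconds for `model_type`."""
    env = os.environ if env is None else env
    if model_type != "gguf":
        # "standard_timeout" is a hypothetical key; the real config may name it differently.
        return timeout_config.get("standard_timeout", STANDARD_TIMEOUT_S)
    # Same switch as routes_fastapi.py: HF_SPACES=true selects the extended GGUF timeout.
    is_hf_spaces = env.get("HF_SPACES", "false").lower() == "true"
    key = "gguf_extended_timeout" if is_hf_spaces else "gguf_timeout"
    return timeout_config.get(key, GGUF_TIMEOUT_S)
```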

docs/hf-spaces/FILES_CREATED.md CHANGED

@@ -125,7 +125,7 @@ python verify_cache.py
 ### 7. `MODEL_CACHING_SUMMARY.md` ⭐ START HERE
 **Purpose**: Overview and answer to your question
-**Size**: ~600 lines
+**Size**: ~1200 lines
 **Contents**:
 - Direct answer to your question
 - Performance comparison
@@ -183,7 +183,7 @@ python verify_cache.py
 ### 11. `README_HF_SPACES.md`
 **Purpose**: Main README for HF Spaces deployment
-**Size**: ~600 lines
+**Size**: ~1200 lines
 **Contents**:
 - Quick start (3 steps)
 - File structure
@@ -231,11 +231,11 @@ python verify_cache.py
 | `entrypoint.sh` | Script | ⭐ YES | 40 lines | Startup verification |
 | `verify_cache.py` | Tool | Recommended | 200 lines | Verify cache |
 | `health_endpoints.py` | Code | Recommended | +120 lines | Health endpoints |
-| `MODEL_CACHING_SUMMARY.md` | Docs | ⭐ START HERE | 600 lines | Overview |
+| `MODEL_CACHING_SUMMARY.md` | Docs | ⭐ START HERE | 1200 lines | Overview |
 | `HF_SPACES_QUICKSTART.md` | Docs | Recommended | 400 lines | Quick start |
 | `HF_SPACES_DEPLOYMENT.md` | Docs | Reference | 800 lines | Full guide |
 | `DEPLOYMENT_CHECKLIST.md` | Docs | Helpful | 400 lines | Checklist |
-| `README_HF_SPACES.md` | Docs | Reference | 600 lines | Main README |
+| `README_HF_SPACES.md` | Docs | Reference | 1200 lines | Main README |
 | `COMPARISON_BEFORE_AFTER.md` | Docs | Helpful | 500 lines | Comparison |
 | `FILES_CREATED.md` | Docs | Reference | This file | Index |

docs/hf-spaces/INDEX.md CHANGED

@@ -122,8 +122,8 @@ All documentation for deploying to Hugging Face Spaces with pre-cached models.
 | DEPLOYMENT_CHECKLIST.md | ~400 | Use while deploying | ⭐⭐ |
 | MODEL_UPDATE_SUMMARY.md | ~500 | 10 min | ⭐⭐ |
 | HF_SPACES_DEPLOYMENT.md | ~800 | 30 min | ⭐ |
-| MODEL_CACHING_SUMMARY.md | ~600 | 15 min | ⭐ |
-| README_HF_SPACES.md | ~600 | Reference | ⭐ |
+| MODEL_CACHING_SUMMARY.md | ~1200 | 15 min | ⭐ |
+| README_HF_SPACES.md | ~1200 | Reference | ⭐ |
 | COMPARISON_BEFORE_AFTER.md | ~500 | Reference | Optional |
 | FILES_CREATED.md | ~500 | Reference | Optional |

services/ai-service/DEPLOYMENT_FIX.md CHANGED

@@ -17,13 +17,13 @@ The deployment was failing with a "Scheduling failure: unable to schedule" error
 **Before:**
 ```dockerfile
 RUN pip install --no-cache-dir -r /app/requirements.txt gunicorn
-CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:7860", "--timeout", "600", "wsgi:app"]
+CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:7860", "--timeout", "1200", "wsgi:app"]
 ```
 **After:**
 ```dockerfile
 RUN pip install --no-cache-dir -r /app/requirements.txt uvicorn[standard]
-CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--timeout-keep-alive", "600", "--workers", "4"]
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--timeout-keep-alive", "1200", "--workers", "4"]
 ```
 ### Why This Works
@@ -66,12 +66,12 @@ If you need more production-grade deployment with multiple workers:
 #### Option A: Gunicorn with Uvicorn Workers (Recommended for Production)
 ```dockerfile
 RUN pip install --no-cache-dir -r /app/requirements.txt gunicorn uvicorn[standard]
-CMD ["gunicorn", "app:app", "--workers", "4", "--worker-class", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:7860", "--timeout", "600"]
+CMD ["gunicorn", "app:app", "--workers", "4", "--worker-class", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:7860", "--timeout", "1200"]
 ```
 #### Option B: Pure Uvicorn (Current, Good for Medium Load)
 ```dockerfile
-CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--timeout-keep-alive", "600", "--workers", "4"]
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--timeout-keep-alive", "1200", "--workers", "4"]
 ```
 ### 3. Health Check Configuration
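Options A and B above put the same number behind flags with different meanings. Gunicorn's `--timeout` is how many seconds a silent worker may go before the arbiter kills and restarts it (gunicorn's default is 30). Uvicorn's `--timeout-keep-alive` governs only idle keep-alive connections. Generating both exec-form lines from named parameters keeps that distinction visible and the two options in step. The helpers below are hypothetical and reuse only the flags already shown in this file.

```python
import json


def uvicorn_cmd(port: int, workers: int, keep_alive_s: int) -> str:
    """Option B: pure uvicorn. keep_alive_s closes idle keep-alive connections."""
    args = ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", str(port),
            "--timeout-keep-alive", str(keep_alive_s), "--workers", str(workers)]
    # json.dumps produces the JSON-array syntax Docker expects for exec-form CMD.
    return "CMD " + json.dumps(args)


def gunicorn_cmd(port: int, workers: int, worker_timeout_s: int) -> str:
    """Option A: gunicorn with UvicornWorker. worker_timeout_s is the silent-worker kill threshold."""
    args = ["gunicorn", "app:app", "--workers", str(workers),
            "--worker-class", "uvicorn.workers.UvicornWorker",
            "--bind", f"0.0.0.0:{port}", "--timeout", str(worker_timeout_s)]
    return "CMD " + json.dumps(args)


print(uvicorn_cmd(7860, 4, 1200))
print(gunicorn_cmd(7860, 4, 1200))
```

With these arguments the two helpers reproduce the Option B and Option A lines shown above.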

services/ai-service/Dockerfile.prod CHANGED

@@ -22,4 +22,4 @@ EXPOSE 7860
 ENV PRELOAD_SMALL_MODELS=false
 # Use uvicorn directly for FastAPI (ASGI) instead of gunicorn (WSGI)
-CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--timeout-keep-alive", "600", "--workers", "4"]
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--timeout-keep-alive", "1200", "--workers", "4"]

services/ai-service/src/__main__.py CHANGED

@@ -12,4 +12,4 @@ initialize_agents(app)
 if __name__ == '__main__':
     import uvicorn
-    uvicorn.run(app, host="0.0.0.0", port=7860, timeout_keep_alive=600)
+    uvicorn.run(app, host="0.0.0.0", port=7860, timeout_keep_alive=1200)

services/ai-service/src/ai_med_extract/api/routes_fastapi.py CHANGED

@@ -635,7 +635,7 @@ def generate_rule_based_summary(baseline, delta_text, visits=None, patientid=Non
     # Clinical Overview: summarize baseline
     if baseline:
-        baseline_snip = baseline[:600].replace("\n", " ")
+        baseline_snip = baseline[:1200].replace("\n", " ")
         lines_assessment.append(f"- Baseline: {baseline_snip}")
     else:
         lines_assessment.append("- No baseline data available.")
@@ -1348,7 +1348,7 @@ async def async_patient_summary(data, job_id=None):
             try:
                 # Use extended timeout for GGUF operations on HF Spaces
                 is_hf_spaces = os.environ.get('HF_SPACES', 'false').lower() == 'true'
-                timeout_value = timeout_config.get("gguf_extended_timeout" if is_hf_spaces else "gguf_timeout", 600)
+                timeout_value = timeout_config.get("gguf_extended_timeout" if is_hf_spaces else "gguf_timeout", 1200)
                 if cache_key not in GGUF_PIPELINE_CACHE:
                     if job_id:
@@ -1584,10 +1584,10 @@ async def async_patient_summary(data, job_id=None):
             try:
                 raw_summary = await asyncio.wait_for(
                     generate_with_progress(),
-                    timeout=timeout_config.get("generation_timeout", 600)
+                    timeout=timeout_config.get("generation_timeout", 1200)
                 )
             except asyncio.TimeoutError:
-                error_msg = f"Text generation timed out after {timeout_config.get('generation_timeout', 600)} seconds"
+                error_msg = f"Text generation timed out after {timeout_config.get('generation_timeout', 1200)} seconds"
                 log_error_with_context(Exception(error_msg), "Text generation timeout", job_id)
                 update_job_with_error(job_id, error_msg, "generation_timeout")
                 raise Exception(error_msg)
@@ -1663,10 +1663,10 @@ async def async_patient_summary(data, job_id=None):
             try:
                 result_sum = await asyncio.wait_for(
                     asyncio.to_thread(model.generate, context, config),
-                    timeout=timeout_config.get("generation_timeout", 600)
+                    timeout=timeout_config.get("generation_timeout", 1200)
                 )
             except asyncio.TimeoutError:
-                error_msg = f"Summarization timed out after {timeout_config.get('generation_timeout', 600)} seconds"
+                error_msg = f"Summarization timed out after {timeout_config.get('generation_timeout', 1200)} seconds"
                 log_error_with_context(Exception(error_msg), "Summarization timeout", job_id)
                 update_job_with_error(job_id, error_msg, "generation_timeout")
                 raise Exception(error_msg)
@@ -1777,7 +1777,7 @@ async def async_patient_summary(data, job_id=None):
                             temperature=0.1,
                             top_p=0.5,
                         ),
-                        timeout=600
+                        timeout=1200
                     )
                 else:
                     config = create_generation_config(data, min_tokens=100, temperature=0.1, top_p=0.5)
@@ -1827,7 +1827,7 @@ async def async_patient_summary(data, job_id=None):
         if "timeout" in error_str.lower():
             error_category = "TIMEOUT"
             # Enhanced timeout message with recommendations
-            user_message = f"""Summary generation timed out after {timeout_config.get('generation_timeout', 600)} seconds.
+            user_message = f"""Summary generation timed out after {timeout_config.get('generation_timeout', 1200)} seconds.
 This may be due to:
 - Large patient dataset requiring more processing time
@@ -1952,7 +1952,7 @@ def process_patient_summary_background(data, job_id):
                             ehr_url,
                             json={"patientid": patientid},
                             headers=headers,
-                            timeout=600
+                            timeout=1200
                         )
                         if response.status_code == 200:
                             sample_data = response.json()
@@ -2417,7 +2417,7 @@ async def home():
                 border-radius: 20px;
                 padding: 40px;
                 box-shadow: 0 20px 60px rgba(0, 0, 0, 0.3);
-                max-width: 600px;
+                max-width: 1200px;
                 width: 100%;
                 animation: fadeIn 0.5s ease-in;
             }
@@ -2433,7 +2433,7 @@ async def home():
                 padding: 8px 16px;
                 border-radius: 20px;
                 font-size: 14px;
-                font-weight: 600;
+                font-weight: 1200;
                 margin-bottom: 20px;
             }
             .status-dot {
@@ -2466,7 +2466,7 @@ async def home():
             }
             .info-title {
                 color: #374151;
-                font-weight: 600;
+                font-weight: 1200;
                 margin-bottom: 15px;
                 font-size: 18px;
             }
@@ -2491,7 +2491,7 @@ async def home():
                 padding: 4px 8px;
                 border-radius: 4px;
                 font-size: 12px;
-                font-weight: 600;
+                font-weight: 1200;
                 margin-right: 10px;
                 min-width: 50px;
                 text-align: center;
@@ -2512,7 +2512,7 @@ async def home():
             .link {
                 color: #667eea;
                 text-decoration: none;
-                font-weight: 600;
+                font-weight: 1200;
             }
             .link:hover {
                 text-decoration: underline;
@@ -2704,7 +2704,7 @@ async def generate_patient_summary_large_data(
             """Wait for slot and then process."""
             try:
                 # Wait for processing slot
-                if queue_manager.wait_for_slot(request_id, timeout=600):
+                if queue_manager.wait_for_slot(request_id, timeout=1200):
                     # Update job status to show processing started
                     job_manager.update_job(job_id, JOB_STATUS["STARTED"], progress=5, data={'message': 'Processing slot acquired, starting generation...'})
                     # Start background task with optimized generation
@@ -2733,7 +2733,7 @@ async def generate_patient_summary_large_data(
                 'X-Content-Type-Options': 'nosniff',
                 'Access-Control-Allow-Origin': '*',
                 'Access-Control-Allow-Headers': 'Cache-Control, Connection',
-                'Keep-Alive': 'timeout=3600',
+                'Keep-Alive': 'timeout=31200',
                 # Force HTTP/1.1 to avoid HTTP/2 protocol errors
                 'X-Protocol': 'HTTP/1.1'
             }
@@ -2790,7 +2790,7 @@ async def generate_patient_summary_streaming(
             """Wait for slot and then process."""
             try:
                 # Wait for processing slot
-                if queue_manager.wait_for_slot(request_id, timeout=600):
+                if queue_manager.wait_for_slot(request_id, timeout=1200):
                     # Update job status to show processing started
                     job_manager.update_job(job_id, JOB_STATUS["STARTED"], progress=5, data={'message': 'Processing slot acquired, starting generation...'})
                     # Start background task with optimized generation
@@ -2819,7 +2819,7 @@ async def generate_patient_summary_streaming(
                 'X-Content-Type-Options': 'nosniff',
                 'Access-Control-Allow-Origin': '*',
                 'Access-Control-Allow-Headers': 'Cache-Control, Connection',
-                'Keep-Alive': 'timeout=3600',
+                'Keep-Alive': 'timeout=31200',
                 # Force HTTP/1.1 to avoid HTTP/2 protocol errors
                 'X-Protocol': 'HTTP/1.1'
             }
@@ -2898,7 +2898,7 @@ async def generate_patient_summary(
                 """Wait for slot and then process."""
                 try:
                     # Wait for processing slot
-                    if queue_manager.wait_for_slot(request_id, timeout=600):
+                    if queue_manager.wait_for_slot(request_id, timeout=1200):
                         # Update job status to show processing started
                         job_manager.update_job(job_id, JOB_STATUS["STARTED"], progress=5, data={'message': 'Processing slot acquired, starting generation...'})
                         # Start background task directly (not in separate thread to avoid nesting)
@@ -2928,7 +2928,7 @@ async def generate_patient_summary(
                     'X-Content-Type-Options': 'nosniff',
                     'Access-Control-Allow-Origin': '*',
                     'Access-Control-Allow-Headers': 'Cache-Control, Connection',
-                    'Keep-Alive': 'timeout=3600',
+                    'Keep-Alive': 'timeout=31200',
                     # Force HTTP/1.1 to avoid HTTP/2 protocol errors
                     'X-Protocol': 'HTTP/1.1'
                 }
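Several hunks above repeat one pattern. A coroutine is wrapped in `asyncio.wait_for`. The code falls back to a literal if `timeout_config` lacks the key. That literal is then formatted a second time into the error message, so each timeout change has to touch both places. The sketch below factors this into one helper so the value passed to `wait_for` and the value reported in the message cannot diverge. `GENERATION_TIMEOUT_S` and `run_with_timeout` are hypothetical names. The helper raises the built-in `TimeoutError` rather than a bare `Exception`, which is a design choice, not what the file does. When `wait_for` times out on `asyncio.to_thread(model.generate, ...)`, the awaiting task is cancelled but the worker thread keeps running until `model.generate` returns.

```python
import asyncio
from typing import Any, Awaitable, Mapping

GENERATION_TIMEOUT_S = 1200  # hypothetical single source of truth for the fallback


async def run_with_timeout(
    awaitable: Awaitable[Any],
    timeout_config: Mapping[str, float],
    label: str,
    key: str = "generation_timeout",
) -> Any:
    """Await `awaitable`; on timeout raise TimeoutError quoting the limit actually used."""
    timeout_s = timeout_config.get(key, GENERATION_TIMEOUT_S)
    try:
        return await asyncio.wait_for(awaitable, timeout=timeout_s)
    except asyncio.TimeoutError:
        # The message reuses the same value that was passed to wait_for.
        raise TimeoutError(f"{label} timed out after {timeout_s} seconds") from None
```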

services/ai-service/src/ai_med_extract/app.py CHANGED

@@ -764,7 +764,7 @@ def run_dev(host: str = "0.0.0.0", port: int = 7860, debug: bool = False):
     # Initialize agents in dev run (preload small models)
     initialize_agents(app, preload_small_models=True)
     print("Agents initialized, starting uvicorn")
-    uvicorn.run(app, host=host, port=port, reload=debug, timeout_keep_alive=600)
+    uvicorn.run(app, host=host, port=port, reload=debug, timeout_keep_alive=1200)
 if __name__ == "__main__":

services/ai-service/src/ai_med_extract/config/performance_config.py CHANGED

@@ -19,7 +19,7 @@ class PerformanceConfig:
     # Caching
     enable_caching: bool = True
-    cache_ttl_seconds: int = 3600
+    cache_ttl_seconds: int = 31200
     max_cache_size: int = 1000
     enable_multi_level_cache: bool = True
@@ -65,7 +65,7 @@ class PerformanceConfig:
             # Caching
             enable_caching=os.environ.get('ENABLE_CACHING', 'true').lower() == 'true',
-            cache_ttl_seconds=int(os.environ.get('CACHE_TTL_SECONDS', '3600')),
+            cache_ttl_seconds=int(os.environ.get('CACHE_TTL_SECONDS', '31200')),
             max_cache_size=int(os.environ.get('MAX_CACHE_SIZE', '1000')),
             enable_multi_level_cache=os.environ.get('ENABLE_MULTI_LEVEL_CACHE', 'true').lower() == 'true',
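The new cache TTL default here (`3600` to `31200`) follows the same pattern as other hunks in this commit: `7*24*3600` to `7*24*31200`, `16000` to `112000`, and CSS `font-weight: 600` to `1200`. All are consistent with replacing every occurrence of the substring `600` with `1200`. The sketch below reproduces that effect with `str.replace` and contrasts it with a narrower regex that only rewrites keyword assignments such as `timeout=600` or `queue_timeout=600`. The pattern is an assumption chosen to match lines in this diff and is not exhaustive. For example, it does not catch fallbacks written as `.get("generation_timeout", 600)`.

```python
import re

samples = [
    "cache_ttl_seconds: int = 3600",
    "font-weight: 600;",
    "timeout=600",
    "queue_timeout=600",
    "timeout_keep_alive=600)",
]

# Plain substring replacement: also rewrites the "600" inside "3600" and CSS weights.
naive = [s.replace("600", "1200") for s in samples]

# Narrower, assumed pattern: a keyword whose name contains "timeout", assigned exactly 600.
# \g<1> is used instead of \1 so the digits that follow are not parsed as part of the group number.
TIMEOUT_ASSIGN = re.compile(r"\b(\w*timeout\w*)=600\b")
targeted = [TIMEOUT_ASSIGN.sub(r"\g<1>=1200", s) for s in samples]

for before, a, b in zip(samples, naive, targeted):
    print(f"{before:32} | naive: {a:32} | targeted: {b}")
```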

services/ai-service/src/ai_med_extract/enable_optimizations.py CHANGED

@@ -24,7 +24,7 @@ def enable_all_optimizations():
         # Caching
         'ENABLE_CACHING': 'true',
-        'CACHE_TTL_SECONDS': '3600',
+        'CACHE_TTL_SECONDS': '31200',
         'MAX_CACHE_SIZE': '1000',
         'ENABLE_MULTI_LEVEL_CACHE': 'true',
@@ -85,7 +85,7 @@ def get_optimization_status() -> Dict[str, Any]:
         },
         "caching_optimizations": {
             "enabled": os.environ.get('ENABLE_CACHING', 'true'),
-            "ttl_seconds": os.environ.get('CACHE_TTL_SECONDS', '3600'),
+            "ttl_seconds": os.environ.get('CACHE_TTL_SECONDS', '31200'),
             "max_size": os.environ.get('MAX_CACHE_SIZE', '1000'),
         },
         "async_optimizations": {

services/ai-service/src/ai_med_extract/inference_service.py CHANGED

@@ -140,7 +140,7 @@ class InferenceService:
         loop = asyncio.get_event_loop()
         # Optimize chunk size based on text length
-        chunk_size = 8000 if len(text) > 16000 else 12000
+        chunk_size = 8000 if len(text) > 112000 else 12000
         if len(text) > chunk_size:
             chunks = self._split_chunks(text, chunk_size)
services/ai-service/src/ai_med_extract/phi_scrubber_service.py CHANGED

@@ -60,7 +60,7 @@ class PHIScrubberService:
             r = redis.from_url(settings.REDIS_URL, decode_responses=True)
             await r.hincrby(key, "events", 1)
             await r.hincrby(key, "found", len(m))
-            await r.expire(key, 7*24*3600)
+            await r.expire(key, 7*24*31200)
         except Exception:
             pass
         return {"original_length": len(text), "scrubbed_length": len(scrubbed), "total_phi_found": len(m), "phi_types": phi_types, "scrubbed_text": scrubbed}
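`7*24*31200` evaluates to 5,241,600 seconds (about 60.7 days). The previous `7*24*3600` is exactly one week (604,800 seconds). Deriving the TTL from `datetime.timedelta` makes the intended unit explicit. redis-py's `expire` also accepts a `timedelta` directly. This is a minimal sketch; `PHI_STATS_TTL` and `ttl_seconds` are hypothetical names.

```python
from datetime import timedelta

# Hypothetical constant: how long the per-key PHI scrubbing counters should live in Redis.
PHI_STATS_TTL = timedelta(days=7)


def ttl_seconds(ttl: timedelta) -> int:
    """Convert a timedelta to whole seconds, the unit Redis EXPIRE uses."""
    return int(ttl.total_seconds())


print(ttl_seconds(PHI_STATS_TTL))  # 604800
print(7 * 24 * 31200 / 86400)      # days implied by the committed expression
```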

services/ai-service/src/ai_med_extract/services/job_manager.py CHANGED

@@ -29,7 +29,7 @@ class JobManager:
         """Initialize the job manager with in-memory storage."""
         self._jobs: Dict[str, Dict[str, Any]] = {}
         self._lock = threading.RLock()  # Reentrant lock for nested calls
-        self._cleanup_interval = 3600  # 1 hour
+        self._cleanup_interval = 31200  # 1 hour
         self._max_job_age = 7200  # 2 hours
     def create_job(self, request_id: Optional[str] = None, initial_data: Optional[Dict] = None) -> str:

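`_cleanup_interval` sets how often stale jobs are swept, and `_max_job_age` decides which jobs count as stale. The diff does not show the sweep itself. Assuming it runs on a fixed period and removes jobs older than `_max_job_age`, a job can linger for up to `max_job_age + cleanup_interval` seconds. The sketch computes that bound for the old and new intervals. `worst_case_retention` is a hypothetical helper, and the new `31200` no longer matches its `# 1 hour` comment.

```python
def worst_case_retention(max_job_age: int, cleanup_interval: int) -> int:
    # A job that turns stale just after a sweep survives until the next sweep
    return max_job_age + cleanup_interval

for label, interval in (("before", 3600), ("after", 31200)):
    total = worst_case_retention(7200, interval)
    print(f"{label}: up to {total} s ({total / 3600:.2f} h)")
# before: up to 10800 s (3.00 h)
# after: up to 38400 s (10.67 h)
```

Because jobs live in process memory, a longer sweep interval also means more finished jobs held in RAM between sweeps.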
services/ai-service/src/ai_med_extract/services/request_queue.py CHANGED

@@ -229,7 +229,7 @@ class RequestQueueManager:
                 ]
             }
-    def cleanup_old_requests(self, max_age: int = 3600) -> int:
+    def cleanup_old_requests(self, max_age: int = 31200) -> int:
         """
         Clean up old requests from tracking.
@@ -289,7 +289,7 @@ def get_queue_manager() -> RequestQueueManager:
             _queue_manager = RequestQueueManager(
                 max_concurrent=6,
                 max_queue_size=6,
-                queue_timeout=600
+                queue_timeout=1200
             )
             logger.info("Initialized RequestQueueManager for Hugging Face Spaces (T4 medium)")
         else:
@@ -297,7 +297,7 @@ def get_queue_manager() -> RequestQueueManager:
             _queue_manager = RequestQueueManager(
                 max_concurrent=4,
                 max_queue_size=20,
-                queue_timeout=600
+                queue_timeout=1200
             )
             logger.info("Initialized RequestQueueManager for local/development")

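Only the constructor arguments of `RequestQueueManager` appear in this diff. The sketch below is a hypothetical, minimal model of what `max_concurrent`, `max_queue_size`, and `queue_timeout` commonly mean. A semaphore caps concurrent work, a counter caps callers still waiting, and `asyncio.wait_for` bounds how long a caller waits for a slot. `SketchQueue` is an invented name, and the real manager may apply the timeout differently, for example to execution as well as waiting.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

class SketchQueue:
    """Hypothetical bounded queue; not the service's RequestQueueManager."""

    def __init__(self, max_concurrent: int, max_queue_size: int, queue_timeout: float):
        self._slots = asyncio.Semaphore(max_concurrent)  # caps concurrent jobs
        self._max_queue_size = max_queue_size            # caps callers still waiting
        self._queue_timeout = queue_timeout              # max seconds to wait for a slot
        self._waiting = 0

    async def run(self, job: Callable[[], Awaitable[T]]) -> T:
        if self._waiting >= self._max_queue_size:
            raise RuntimeError("queue is full")
        self._waiting += 1
        try:
            # Raises asyncio.TimeoutError if no slot frees up in time
            await asyncio.wait_for(self._slots.acquire(), timeout=self._queue_timeout)
        finally:
            self._waiting -= 1
        try:
            return await job()
        finally:
            self._slots.release()  # free the slot even if the job fails

async def demo():
    queue = SketchQueue(max_concurrent=1, max_queue_size=6, queue_timeout=0.05)
    started = asyncio.Event()

    async def slow_job():
        started.set()
        await asyncio.sleep(0.2)
        return "slow done"

    async def fast_job():
        return "fast done"

    first = asyncio.create_task(queue.run(slow_job))
    await started.wait()  # the only slot is now taken
    try:
        await queue.run(fast_job)
        second = "ran"
    except asyncio.TimeoutError:
        second = "timed out"
    return await first, second

print(asyncio.run(demo()))  # ('slow done', 'timed out')
```

With `queue_timeout=1200`, a caller can now wait up to 20 minutes for a slot before giving up.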
services/ai-service/src/ai_med_extract/utils/constants.py CHANGED

@@ -24,39 +24,39 @@ CHUNK_SIZE_DAYS = 90                # Days per chunk for date-based chunking
 # ========== TIMEOUT CONFIGURATION ==========
 TIMEOUT_CONFIG = {
     "fast": {
-        "ehr_timeout": 600,
-        "generation_timeout": 600,
-        "gguf_timeout": 600,
-        "gguf_extended_timeout": 600,
+        "ehr_timeout": 1200,
+        "generation_timeout": 1200,
+        "gguf_timeout": 1200,
+        "gguf_extended_timeout": 1200,
         "retry_attempts": 2
     },
     "normal": {
-        "ehr_timeout": 600,
-        "generation_timeout": 600,
-        "gguf_timeout": 600,
-        "gguf_extended_timeout": 600,
+        "ehr_timeout": 1200,
+        "generation_timeout": 1200,
+        "gguf_timeout": 1200,
+        "gguf_extended_timeout": 1200,
         "retry_attempts": 3
     },
     "extended": {
-        "ehr_timeout": 600,
-        "generation_timeout": 600,
-        "gguf_timeout": 600,
-        "gguf_extended_timeout": 600,
+        "ehr_timeout": 1200,
+        "generation_timeout": 1200,
+        "gguf_timeout": 1200,
+        "gguf_extended_timeout": 1200,
         "retry_attempts": 3
     },
     "large_data": {
-        "ehr_timeout": 600,
-        "generation_timeout": 600,
-        "gguf_timeout": 600,
-        "gguf_extended_timeout": 600,
+        "ehr_timeout": 1200,
+        "generation_timeout": 1200,
+        "gguf_timeout": 1200,
+        "gguf_extended_timeout": 1200,
         "retry_attempts": 2
     }
 }
 # ========== SSE STREAMING CONFIGURATION ==========
 SSE_CONFIG = {
-    "max_wait_time": 3600,              # 60 minutes max wait time for normal operations
-    "extended_max_wait_time": 3600,     # 60 minutes extended wait for GGUF/long operations
+    "max_wait_time": 31200,              # 60 minutes max wait time for normal operations
+    "extended_max_wait_time": 31200,     # 60 minutes extended wait for GGUF/long operations
     "heartbeat_interval": 5,            # Send heartbeat every 5 seconds
     "normal_heartbeat_interval": 10,    # Normal heartbeat interval
     "poll_interval": 1,                 # Check job status every second
@@ -65,7 +65,7 @@ SSE_CONFIG = {
 # ========== CACHE CONFIGURATION ==========
 CACHE_CONFIG = {
-    "ttl_seconds": 3600,  # 1 hour
+    "ttl_seconds": 31200,  # 1 hour
     "cache_dir": "/tmp/summary_cache",
     "max_cache_size": 100
 }
@@ -89,7 +89,7 @@ MEMORY_CONFIG = {
     "enable_quantization": True,
     "cache_models": True,
     "cleanup_interval": 300,  # 5 minutes
-    "max_memory_mb": 6000,
+    "max_memory_mb": 12000,
     "memory_pressure_threshold": 0.8,
     "aggressive_cleanup_threshold": 0.9
 }

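`SSE_CONFIG` combines three knobs: how often job status is polled, how often a keep-alive is sent, and how long the stream may wait overall. The sketch below shows one way a streaming loop might use them. `sse_events` and `get_status` are hypothetical names, and the real streaming route may be structured differently. Lines starting with `:` are comments in the Server-Sent Events format, so clients ignore them, which makes them suitable heartbeats. For scale, the new `max_wait_time` of 31200 seconds is 520 minutes, not the 60 minutes the comment still states.

```python
import asyncio
import time
from typing import AsyncIterator, Callable, List, Optional

async def sse_events(
    get_status: Callable[[], Optional[str]],
    max_wait_time: float,
    heartbeat_interval: float,
    poll_interval: float,
) -> AsyncIterator[str]:
    """Poll a job until it finishes, emitting SSE heartbeats; give up after max_wait_time."""
    start = last_beat = time.monotonic()
    while True:
        now = time.monotonic()
        if now - start > max_wait_time:
            yield "event: error\ndata: timed out\n\n"
            return
        result = get_status()  # None means "still running"
        if result is not None:
            yield f"event: done\ndata: {result}\n\n"
            return
        if now - last_beat >= heartbeat_interval:
            yield ": heartbeat\n\n"  # SSE comment line, ignored by clients
            last_beat = now
        await asyncio.sleep(poll_interval)

async def collect(stream: AsyncIterator[str]) -> List[str]:
    return [event async for event in stream]

# A job that reports a result on the third poll
polls = iter([None, None, "summary ready"])
events = asyncio.run(collect(sse_events(lambda: next(polls), 10, 5, 0.01)))
print(events)  # ['event: done\ndata: summary ready\n\n']
```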
services/ai-service/src/ai_med_extract/utils/hf_spaces_config.py CHANGED

@@ -65,7 +65,7 @@ TIMEOUT_SETTINGS = {
     "model_loading_timeout": 300,  # 5 minutes for model loading
     "inference_timeout": 120,  # 2 minutes for inference
     "ehr_fetch_timeout": 30,  # 30 seconds for EHR fetch
-    "streaming_timeout": 600  # 10 minutes for streaming responses
+    "streaming_timeout": 1200  # 10 minutes for streaming responses
 }
 def get_optimized_model(model_type: str) -> str:

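`streaming_timeout` presumably budgets a whole streaming response, while `inference_timeout` covers a single model call. One way to enforce a total budget on an async stream is to recompute the remaining time before each item and pass it to `asyncio.wait_for`. The sketch assumes the stream is an async iterator, and `with_deadline` is a hypothetical helper. Note that 1200 seconds is 20 minutes, while the comment carried over from the old value still says 10 minutes.

```python
import asyncio
from typing import AsyncIterator, List, TypeVar

T = TypeVar("T")

async def with_deadline(stream: AsyncIterator[T], total_timeout: float) -> AsyncIterator[T]:
    """Re-yield items from `stream`, failing if the whole stream exceeds total_timeout."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + total_timeout
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            raise asyncio.TimeoutError("stream exceeded its total time budget")
        try:
            # Each wait only gets the time left in the overall budget
            item = await asyncio.wait_for(stream.__anext__(), timeout=remaining)
        except StopAsyncIteration:
            return
        yield item

async def ticks(count: int, delay: float) -> AsyncIterator[int]:
    for i in range(count):
        await asyncio.sleep(delay)
        yield i

async def collect(stream: AsyncIterator[int]) -> List[int]:
    return [item async for item in stream]

print(asyncio.run(collect(with_deadline(ticks(3, 0.0), total_timeout=1.0))))  # [0, 1, 2]
try:
    asyncio.run(collect(with_deadline(ticks(3, 0.2), total_timeout=0.1)))
except asyncio.TimeoutError:
    print("stream timed out")
```

A per-chunk timeout alone would let a slow but steady stream run forever; tracking one deadline bounds the total.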
services/ai-service/src/ai_med_extract/utils/openvino_summarizer_utils.py CHANGED

@@ -238,7 +238,7 @@ def delta_to_text(delta):
 from concurrent.futures import ThreadPoolExecutor, as_completed
 import threading
-def generate_section(pipeline, prompt, section_name, timeout=600):
+def generate_section(pipeline, prompt, section_name, timeout=1200):
     """Generate one section with timeout protection."""
     try:
         # If your pipeline supports timeout, pass it. Otherwise, wrap in future.

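The comment in `generate_section` suggests wrapping the pipeline call in a future when the pipeline has no timeout argument of its own. The sketch below shows that pattern with `concurrent.futures`. `generate_section_sketch` and its return dictionary are hypothetical. One caveat applies: `Future.result(timeout=...)` only stops waiting and does not interrupt the worker thread, so a timed-out generation keeps occupying an executor slot until it finishes.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable, Dict

_EXECUTOR = ThreadPoolExecutor(max_workers=2)  # shared pool for section generation

def generate_section_sketch(
    pipeline: Callable[[str], str], prompt: str, section_name: str, timeout: float = 1200
) -> Dict[str, Any]:
    """Run one pipeline call in a worker thread and stop waiting after `timeout` seconds."""
    future = _EXECUTOR.submit(pipeline, prompt)
    try:
        return {"section": section_name, "text": future.result(timeout=timeout), "error": None}
    except FutureTimeout:
        future.cancel()  # only helps if the call has not started; a running call keeps going
        return {"section": section_name, "text": None, "error": f"timed out after {timeout} s"}

def slow_pipeline(prompt: str) -> str:
    time.sleep(0.3)  # stands in for a long model call
    return prompt.upper()

print(generate_section_sketch(str.upper, "vitals", "Vitals"))
# {'section': 'Vitals', 'text': 'VITALS', 'error': None}
print(generate_section_sketch(slow_pipeline, "labs", "Labs", timeout=0.05))
# {'section': 'Labs', 'text': None, 'error': 'timed out after 0.05 s'}
```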
services/ai-service/src/ai_med_extract/utils/performance_monitor.py CHANGED

@@ -76,7 +76,7 @@ class PerformanceMonitor:
 class RobustParsingCache:
     """Intelligent caching system for robust JSON parsing operations."""
-    def __init__(self, cache_dir: str = "/tmp/medical_ai_cache", ttl: int = 3600):
+    def __init__(self, cache_dir: str = "/tmp/medical_ai_cache", ttl: int = 31200):
         self.cache_dir = cache_dir
         self.ttl = ttl  # Time to live in seconds
         os.makedirs(cache_dir, exist_ok=True)

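Only the constructor of `RobustParsingCache` appears in the diff. The hypothetical `TTLFileCache` below takes the same two parameters and shows one common way a file-based TTL is enforced, by comparing each file's modification time with the current time. The key hashing and JSON storage are assumptions. With the new default, entries stay valid for 31200 seconds (about 8.7 hours) instead of 3600 seconds (1 hour).

```python
import hashlib
import json
import os
import tempfile
import time
from typing import Any, Optional

class TTLFileCache:
    """Hypothetical file cache: one JSON file per key, stale after `ttl` seconds."""

    def __init__(self, cache_dir: str, ttl: int):
        self.cache_dir = cache_dir
        self.ttl = ttl  # time to live in seconds
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, key: str) -> str:
        # Hash the key so arbitrary strings map to safe file names
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest + ".json")

    def set(self, key: str, value: Any) -> None:
        with open(self._path(key), "w", encoding="utf-8") as fh:
            json.dump(value, fh)

    def get(self, key: str, now: Optional[float] = None) -> Optional[Any]:
        path = self._path(key)
        if not os.path.exists(path):
            return None
        now = time.time() if now is None else now
        if now - os.path.getmtime(path) > self.ttl:
            os.remove(path)  # expired: delete so the next lookup is a clean miss
            return None
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)

cache = TTLFileCache(tempfile.mkdtemp(), ttl=3600)
cache.set("note-1", {"parsed": True})
print(cache.get("note-1"))                          # {'parsed': True}
print(cache.get("note-1", now=time.time() + 3601))  # None (older than the TTL)
print(cache.get("note-1"))                          # None (file was removed)
```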
services/ai-service/src/ai_med_extract/utils/unified_model_manager.py CHANGED

@@ -499,7 +499,7 @@ class UnifiedModelManager:
         for key, model in self._models.items():
             # Remove models not used in last hour
-            if current_time - model._last_used > 3600:
+            if current_time - model._last_used > 31200:
                 to_remove.append(key)
         for key in to_remove:
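
The eviction check compares each model's `_last_used` timestamp against an idle limit. With the new value, a model must sit unused for 31200 seconds (about 8.7 hours) before it is dropped, although the comment still says "last hour". The sketch below shows the same idle-eviction pattern in isolation. `IdleModelCache` is hypothetical, and the injectable clock exists only to make the behaviour testable without waiting.

```python
import time
from typing import Any, Callable, Dict, List, Optional

class IdleModelCache:
    """Hypothetical cache that drops entries unused for longer than max_idle seconds."""

    def __init__(self, max_idle: float, clock: Callable[[], float] = time.monotonic):
        self.max_idle = max_idle
        self._clock = clock
        self._models: Dict[str, Any] = {}
        self._last_used: Dict[str, float] = {}

    def put(self, key: str, model: Any) -> None:
        self._models[key] = model
        self._last_used[key] = self._clock()

    def get(self, key: str) -> Optional[Any]:
        model = self._models.get(key)
        if model is not None:
            self._last_used[key] = self._clock()  # using a model resets its idle timer
        return model

    def evict_idle(self) -> List[str]:
        now = self._clock()
        # Collect first, then delete, so the dict is not mutated while iterating
        to_remove = [k for k, t in self._last_used.items() if now - t > self.max_idle]
        for key in to_remove:
            del self._models[key]
            del self._last_used[key]
        return to_remove

fake_now = [0.0]
models = IdleModelCache(max_idle=3600, clock=lambda: fake_now[0])
models.put("summarizer", object())
models.put("ner", object())
fake_now[0] = 3000.0
models.get("ner")            # "ner" is used again at t=3000
fake_now[0] = 3700.0
print(models.evict_idle())   # ['summarizer']
```

Collecting keys before deleting mirrors the diff's `to_remove` list, which avoids changing a dictionary while iterating over it.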