===== Application Startup at 2025-10-09 06:25:23 =====
[ENTRYPOINT] Clearing Hugging Face / Torch / tmp cache...
chmod: changing permissions of '/tmp': Operation not permitted
2025-10-09 06:27:43,466 - INFO - Added src_dir to Python path: /app/services/ai-service/src
2025-10-09 06:27:43,466 - INFO - src_dir exists: True
2025-10-09 06:27:43,466 - INFO - Contents of src_dir: ['__main__.py', 'agents', 'ai_med_extract', 'app.py', 'config_settings.py', 'gradio_app.py', 'routes', 'wsgi.py']
2025-10-09 06:27:43,466 - INFO - ai_med_extract_path exists: True
2025-10-09 06:27:43,466 - INFO - Contents of ai_med_extract: ['__init__.py', 'agents', 'api', 'api_endpoints.py', 'api_middleware.py', 'app.py', 'core_exceptions.py', 'core_logger.py', 'core_security.py', 'database_audit.py', 'gradio_app.py', 'health_endpoints.py', 'inference_service.py', 'main.py', 'metrics_adapter.py', 'monitoring_observability.py', 'phi_scrubber_service.py', 'scalable_service_mesh.py', 'utils']
2025-10-09 06:27:43,466 - INFO - Detected Hugging Face Spaces environment
2025-10-09 06:27:43,466 - INFO - Attempting to import from ai_med_extract package...
2025-10-09 06:27:43,466 - INFO - Python path: ['/app/services/ai-service/src', '', '/opt/venv/bin']
2025-10-09 06:27:43,466 - INFO - Current working directory: /app
2025-10-09 06:27:43,467 - INFO - Files in current directory: ['README.md', '__init__.py', 'app.py', 'requirements.txt', 'services']
libgomp: Invalid value for environment variable OMP_NUM_THREADS
libgomp: Invalid value for environment variable OMP_NUM_THREADS
2025-10-09 06:27:48,840 - INFO - Model manager imported successfully
2025-10-09 06:27:49,115 - INFO - Model manager imported successfully in initialize_agents
2025-10-09 06:27:49,139 - INFO - PatientSummarizerAgent created for Falconsai/medical_summarization (summarization) on cuda (loader deferred)
2025-10-09 06:27:49,239 - WARNING - ONNX Runtime not available: /opt/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-310-x86_64-linux-gnu.so: cannot enable executable stack as shared object requires: Invalid argument
[GGUF] Preloading GGUF models as requested by settings...
[GGUF] Preload failed (non-fatal): Model path does not exist: microsoft/Phi-3-mini-4k-instruct-gguf 2025-10-09 06:27:49,285 - INFO - Main router registered with app 2025-10-09 06:27:49,290 - INFO - Redis URL not configured, PHI scrubbing will use fallback mode 2025-10-09 06:27:49,298 - INFO - API router registered with app 2025-10-09 06:27:49,305 - INFO - Model management router registered with app 2025-10-09 06:27:49,305 - INFO - All routes registered successfully 2025-10-09 06:27:49,314 - INFO - Agents initialized and routes registered 2025-10-09 06:27:49,314 - INFO - ============================================================ 2025-10-09 06:27:49,314 - INFO - REGISTERED ROUTES: 2025-10-09 06:27:49,314 - INFO - ['HEAD', 'GET'] /openapi.json 2025-10-09 06:27:49,314 - INFO - ['HEAD', 'GET'] /docs 2025-10-09 06:27:49,314 - INFO - ['HEAD', 'GET'] /docs/oauth2-redirect 2025-10-09 06:27:49,314 - INFO - ['HEAD', 'GET'] /redoc 2025-10-09 06:27:49,314 - INFO - ['GET'] /live 2025-10-09 06:27:49,314 - INFO - ['GET'] /ready 2025-10-09 06:27:49,314 - INFO - ['GET'] /metrics 2025-10-09 06:27:49,314 - INFO - ['GET'] / 2025-10-09 06:27:49,314 - INFO - ['GET'] /api/info 2025-10-09 06:27:49,314 - INFO - ['GET'] /api/performance_metrics 2025-10-09 06:27:49,314 - INFO - ['POST'] /generate_patient_summary 2025-10-09 06:27:49,315 - INFO - ['POST'] /summarize 2025-10-09 06:27:49,315 - INFO - ['POST'] /upload 2025-10-09 06:27:49,315 - INFO - ['POST'] /phi/scrub 2025-10-09 06:27:49,315 - INFO - ['POST'] /api/models/load 2025-10-09 06:27:49,315 - INFO - ['POST'] /api/models/generate 2025-10-09 06:27:49,315 - INFO - ['GET'] /api/models/info 2025-10-09 06:27:49,315 - INFO - ['GET'] /api/models/defaults 2025-10-09 06:27:49,315 - INFO - ['POST'] /api/models/clear_cache 2025-10-09 06:27:49,315 - INFO - ['POST'] /api/models/switch 2025-10-09 06:27:49,315 - INFO - ['GET'] /api/models/health 2025-10-09 06:27:49,315 - INFO - ['GET'] /debug/routes 2025-10-09 06:27:49,315 - INFO - ['GET'] 
/health/live 2025-10-09 06:27:49,315 - INFO - ['GET'] /health/ready 2025-10-09 06:27:49,315 - INFO - ============================================================ 2025-10-09 06:27:49,315 - INFO - Agents initialized successfully 2025-10-09 06:27:49,315 - INFO - Model manager imported successfully in initialize_agents 2025-10-09 06:27:49,315 - INFO - PatientSummarizerAgent created for Falconsai/medical_summarization (summarization) on cuda (loader deferred) [GGUF] Preloading GGUF models as requested by settings... [GGUF] Preload failed (non-fatal): Model path does not exist: microsoft/Phi-3-mini-4k-instruct-gguf 2025-10-09 06:27:49,324 - INFO - Main router registered with app 2025-10-09 06:27:49,327 - INFO - API router registered with app 2025-10-09 06:27:49,329 - INFO - Model management router registered with app 2025-10-09 06:27:49,329 - INFO - All routes registered successfully 2025-10-09 06:27:49,338 - INFO - Agents initialized and routes registered 2025-10-09 06:27:49,338 - INFO - ============================================================ 2025-10-09 06:27:49,338 - INFO - REGISTERED ROUTES: 2025-10-09 06:27:49,338 - INFO - ['HEAD', 'GET'] /openapi.json 2025-10-09 06:27:49,338 - INFO - ['HEAD', 'GET'] /docs 2025-10-09 06:27:49,338 - INFO - ['HEAD', 'GET'] /docs/oauth2-redirect 2025-10-09 06:27:49,338 - INFO - ['HEAD', 'GET'] /redoc 2025-10-09 06:27:49,338 - INFO - ['GET'] /live 2025-10-09 06:27:49,338 - INFO - ['GET'] /ready 2025-10-09 06:27:49,338 - INFO - ['GET'] /metrics [GGUF] Preloading GGUF models as requested by settings... 
[GGUF] Preload failed (non-fatal): Model path does not exist: microsoft/Phi-3-mini-4k-instruct-gguf 2025-10-09 06:27:49,338 - INFO - ['GET'] / 2025-10-09 06:27:49,338 - INFO - ['GET'] /api/info 2025-10-09 06:27:49,338 - INFO - ['GET'] /api/performance_metrics 2025-10-09 06:27:49,338 - INFO - ['POST'] /generate_patient_summary 2025-10-09 06:27:49,338 - INFO - ['POST'] /upload 2025-10-09 06:27:49,338 - INFO - ['GET'] /get_updated_medical_data 2025-10-09 06:27:49,338 - INFO - ['PUT'] /update_medical_data 2025-10-09 06:27:49,338 - INFO - ['POST'] /transcribe 2025-10-09 06:27:49,338 - INFO - ['POST'] /extract_medical_data 2025-10-09 06:27:49,338 - INFO - ['POST'] /api/generate_summary 2025-10-09 06:27:49,338 - INFO - ['POST'] /api/extract_medical_data_from_audio 2025-10-09 06:27:49,338 - INFO - ['POST'] /api/patient_summary_openvino 2025-10-09 06:27:49,338 - INFO - ['POST'] /summarize 2025-10-09 06:27:49,338 - INFO - ['POST'] /upload 2025-10-09 06:27:49,338 - INFO - ['POST'] /phi/scrub 2025-10-09 06:27:49,338 - INFO - ['POST'] /api/models/load 2025-10-09 06:27:49,338 - INFO - ['POST'] /api/models/generate 2025-10-09 06:27:49,338 - INFO - ['GET'] /api/models/info 2025-10-09 06:27:49,338 - INFO - ['GET'] /api/models/defaults 2025-10-09 06:27:49,339 - INFO - ['POST'] /api/models/clear_cache 2025-10-09 06:27:49,339 - INFO - ['POST'] /api/models/switch 2025-10-09 06:27:49,339 - INFO - ['GET'] /api/models/health 2025-10-09 06:27:49,339 - INFO - ['GET'] /debug/routes 2025-10-09 06:27:49,339 - INFO - ['GET'] /health/live 2025-10-09 06:27:49,339 - INFO - ['GET'] /health/ready 2025-10-09 06:27:49,339 - INFO - ============================================================ 2025-10-09 06:27:49,339 - INFO - Agents initialized successfully 2025-10-09 06:27:49,339 - INFO - Successfully imported create_app and initialize_agents 2025-10-09 06:27:49,339 - INFO - App instance created successfully (without agents) 2025-10-09 06:27:49,339 - INFO - App title: Medical AI Service 2025-10-09 
06:27:49,339 - INFO - App version: 1.0.0 2025-10-09 06:27:49,339 - INFO - Initializing agents with preload_small_models=False... 2025-10-09 06:27:49,339 - INFO - Model manager imported successfully in initialize_agents 2025-10-09 06:27:49,339 - INFO - PatientSummarizerAgent created for Falconsai/medical_summarization (summarization) on cuda (loader deferred) 2025-10-09 06:27:49,354 - INFO - Main router registered with app 2025-10-09 06:27:49,358 - INFO - API router registered with app 2025-10-09 06:27:49,360 - INFO - Model management router registered with app 2025-10-09 06:27:49,360 - INFO - All routes registered successfully 2025-10-09 06:27:49,369 - INFO - Agents initialized and routes registered 2025-10-09 06:27:49,369 - INFO - ============================================================ 2025-10-09 06:27:49,369 - INFO - REGISTERED ROUTES: 2025-10-09 06:27:49,369 - INFO - ['HEAD', 'GET'] /openapi.json 2025-10-09 06:27:49,369 - INFO - ['HEAD', 'GET'] /docs 2025-10-09 06:27:49,369 - INFO - ['HEAD', 'GET'] /docs/oauth2-redirect 2025-10-09 06:27:49,369 - INFO - ['HEAD', 'GET'] /redoc 2025-10-09 06:27:49,369 - INFO - ['GET'] /live 2025-10-09 06:27:49,369 - INFO - ['GET'] /ready 2025-10-09 06:27:49,369 - INFO - ['GET'] /metrics 2025-10-09 06:27:49,369 - INFO - ['GET'] / 2025-10-09 06:27:49,369 - INFO - ['GET'] /api/info 2025-10-09 06:27:49,369 - INFO - ['GET'] /api/performance_metrics 2025-10-09 06:27:49,369 - INFO - ['POST'] /generate_patient_summary 2025-10-09 06:27:49,369 - INFO - ['POST'] /upload 2025-10-09 06:27:49,369 - INFO - ['GET'] /get_updated_medical_data 2025-10-09 06:27:49,369 - INFO - ['PUT'] /update_medical_data 2025-10-09 06:27:49,369 - INFO - ['POST'] /transcribe 2025-10-09 06:27:49,369 - INFO - ['POST'] /extract_medical_data 2025-10-09 06:27:49,369 - INFO - ['POST'] /api/generate_summary 2025-10-09 06:27:49,369 - INFO - ['POST'] /api/extract_medical_data_from_audio 2025-10-09 06:27:49,370 - INFO - ['POST'] /api/patient_summary_openvino 2025-10-09 
06:27:49,370 - INFO - ['POST'] /upload 2025-10-09 06:27:49,370 - INFO - ['GET'] /get_updated_medical_data 2025-10-09 06:27:49,370 - INFO - ['PUT'] /update_medical_data 2025-10-09 06:27:49,370 - INFO - ['POST'] /transcribe 2025-10-09 06:27:49,370 - INFO - ['POST'] /extract_medical_data 2025-10-09 06:27:49,370 - INFO - ['POST'] /api/generate_summary 2025-10-09 06:27:49,370 - INFO - ['POST'] /api/extract_medical_data_from_audio 2025-10-09 06:27:49,370 - INFO - ['POST'] /api/patient_summary_openvino 2025-10-09 06:27:49,370 - INFO - ['POST'] /summarize 2025-10-09 06:27:49,370 - INFO - ['POST'] /upload 2025-10-09 06:27:49,370 - INFO - ['POST'] /phi/scrub 2025-10-09 06:27:49,370 - INFO - ['POST'] /api/models/load 2025-10-09 06:27:49,370 - INFO - ['POST'] /api/models/generate 2025-10-09 06:27:49,370 - INFO - ['GET'] /api/models/info 2025-10-09 06:27:49,370 - INFO - ['GET'] /api/models/defaults 2025-10-09 06:27:49,370 - INFO - ['POST'] /api/models/clear_cache 2025-10-09 06:27:49,370 - INFO - ['POST'] /api/models/switch 2025-10-09 06:27:49,370 - INFO - ['GET'] /api/models/health 2025-10-09 06:27:49,370 - INFO - ['GET'] /debug/routes 2025-10-09 06:27:49,370 - INFO - ['GET'] /health/live 2025-10-09 06:27:49,370 - INFO - ['GET'] /health/ready 2025-10-09 06:27:49,370 - INFO - ============================================================ 2025-10-09 06:27:49,371 - INFO - Agents initialized successfully 2025-10-09 06:27:49,371 - INFO - ============================================================ 2025-10-09 06:27:49,371 - INFO - FINAL REGISTERED ROUTES ON HF SPACES: 2025-10-09 06:27:49,371 - INFO - ['HEAD', 'GET'] /openapi.json 2025-10-09 06:27:49,371 - INFO - ['HEAD', 'GET'] /docs 2025-10-09 06:27:49,371 - INFO - ['HEAD', 'GET'] /docs/oauth2-redirect 2025-10-09 06:27:49,371 - INFO - ['HEAD', 'GET'] /redoc 2025-10-09 06:27:49,371 - INFO - ['GET'] /live 2025-10-09 06:27:49,371 - INFO - ['GET'] /ready 2025-10-09 06:27:49,371 - INFO - ['GET'] /metrics 2025-10-09 06:27:49,371 - INFO - 
['GET'] / 2025-10-09 06:27:49,371 - INFO - ['GET'] /api/info 2025-10-09 06:27:49,371 - INFO - ['GET'] /api/performance_metrics 2025-10-09 06:27:49,371 - INFO - ['POST'] /generate_patient_summary 2025-10-09 06:27:49,371 - INFO - ['POST'] /upload 2025-10-09 06:27:49,371 - INFO - ['GET'] /get_updated_medical_data 2025-10-09 06:27:49,371 - INFO - ['PUT'] /update_medical_data 2025-10-09 06:27:49,371 - INFO - ['POST'] /transcribe 2025-10-09 06:27:49,371 - INFO - ['POST'] /extract_medical_data 2025-10-09 06:27:49,371 - INFO - ['POST'] /api/generate_summary 2025-10-09 06:27:49,372 - INFO - ['POST'] /api/extract_medical_data_from_audio 2025-10-09 06:27:49,372 - INFO - ['POST'] /api/patient_summary_openvino 2025-10-09 06:27:49,372 - INFO - ['POST'] /upload 2025-10-09 06:27:49,372 - INFO - ['GET'] /get_updated_medical_data 2025-10-09 06:27:49,372 - INFO - ['PUT'] /update_medical_data 2025-10-09 06:27:49,372 - INFO - ['POST'] /transcribe 2025-10-09 06:27:49,372 - INFO - ['POST'] /extract_medical_data 2025-10-09 06:27:49,372 - INFO - ['POST'] /api/generate_summary 2025-10-09 06:27:49,372 - INFO - ['POST'] /api/extract_medical_data_from_audio 2025-10-09 06:27:49,372 - INFO - ['POST'] /api/patient_summary_openvino 2025-10-09 06:27:49,372 - INFO - ['POST'] /summarize 2025-10-09 06:27:49,372 - INFO - ['POST'] /upload 2025-10-09 06:27:49,372 - INFO - ['POST'] /phi/scrub 2025-10-09 06:27:49,372 - INFO - ['POST'] /api/models/load 2025-10-09 06:27:49,372 - INFO - ['POST'] /api/models/generate 2025-10-09 06:27:49,372 - INFO - ['GET'] /api/models/info 2025-10-09 06:27:49,372 - INFO - ['GET'] /api/models/defaults 2025-10-09 06:27:49,372 - INFO - ['POST'] /api/models/clear_cache 2025-10-09 06:27:49,372 - INFO - ['POST'] /api/models/switch 2025-10-09 06:27:49,372 - INFO - ['GET'] /api/models/health 2025-10-09 06:27:49,372 - INFO - ['GET'] /debug/routes 2025-10-09 06:27:49,372 - INFO - ['GET'] /health/live 2025-10-09 06:27:49,372 - INFO - ['GET'] /health/ready 2025-10-09 06:27:49,373 - INFO 
- Total routes registered: 40
2025-10-09 06:27:49,373 - INFO - ============================================================
INFO: Started server process [1]
INFO: Waiting for application startup.
2025-10-09 06:27:49,373 - INFO - Detected Hugging Face Spaces environment - disabling Redis and Database connections
2025-10-09 06:27:49,373 - INFO - Skipping Redis initialization on HF Spaces
2025-10-09 06:27:49,373 - INFO - Skipping Database initialization on HF Spaces
2025-10-09 06:27:49,373 - INFO - Scalable service mesh not initialized (Redis not available)
2025-10-09 06:27:49,374 - INFO - Monitoring not initialized (Redis not available)
2025-10-09 06:27:49,374 - INFO - Application started without scalable features (Redis not available)
2025-10-09 06:27:49,374 - INFO - Starting FastAPI application with scalable architecture
2025-10-09 06:27:49,374 - INFO - Python version: 3.10.18 (main, Sep 30 2025, 00:42:07) [GCC 14.2.0]
2025-10-09 06:27:49,374 - INFO - PyTorch version: 2.3.0+cu121
2025-10-09 06:27:49,374 - INFO - CUDA available: True
2025-10-09 06:27:49,374 - INFO - CUDA version: 12.1
2025-10-09 06:27:49,374 - INFO - GPU count: 1
2025-10-09 06:27:49,374 - INFO - GPU 0: Tesla T4
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:7860 (Press CTRL+C to quit)
INFO: 10.16.38.247:4107 - "GET / HTTP/1.1" 200 OK
INFO: 10.16.23.21:26164 - "GET / HTTP/1.1" 200 OK
INFO: 10.16.14.117:47211 - "GET / HTTP/1.1" 200 OK
INFO: 10.16.23.21:14054 - "GET / HTTP/1.1" 200 OK
INFO: 10.16.38.247:57416 - "GET / HTTP/1.1" 200 OK
Background task started for job_id: 9b064426-5331-47a9-9407-224dc2aba308
INFO: 10.16.23.21:18245 - "POST /generate_patient_summary?stream=true HTTP/1.1" 200 OK
2025-10-09 06:28:56,222 - INFO - EHR API response status: 200
Step 1 - EHR fetch took 0.82s
Step 2 - Processing took 0.00s
Step 3 - Computing baseline took 0.00s
🧠 GGUF MODE: Single-prompt generation for microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf
📦 Using cache key: repo_id='microsoft/Phi-3-mini-4k-instruct-gguf', filename='Phi-3-mini-4k-instruct-q4.gguf'
🔄 Loading new GGUF pipeline for ('microsoft/Phi-3-mini-4k-instruct-gguf', 'Phi-3-mini-4k-instruct-q4.gguf')
2025-10-09 06:28:56,225 - INFO - Downloading model from microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf
/opt/venv/lib/python3.10/site-packages/huggingface_hub/file_download.py:945: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
2025-10-09 06:29:11,970 - INFO - Model downloaded successfully to /tmp/huggingface/models--microsoft--Phi-3-mini-4k-instruct-gguf/snapshots/999f761fe19e26cf1a339a5ec5f9f201301cbb83/Phi-3-mini-4k-instruct-q4.gguf
2025-10-09 06:29:11,970 - INFO - Model file size: 2282.36 MB
2025-10-09 06:29:12,821 - INFO - [GGUF] Model initialized in 0.85s from /tmp/huggingface/models--microsoft--Phi-3-mini-4k-instruct-gguf/snapshots/999f761fe19e26cf1a339a5ec5f9f201301cbb83/Phi-3-mini-4k-instruct-q4.gguf (threads=4, batch=64)
[GGUF] Model loaded successfully in 16.60s: microsoft/Phi-3-mini-4k-instruct-gguf
[ENTRYPOINT] Clearing Hugging Face / Torch / tmp cache...
chmod: changing permissions of '/tmp': Operation not permitted
2025-10-09 06:34:15,501 - INFO - Added src_dir to Python path: /app/services/ai-service/src
2025-10-09 06:34:15,501 - INFO - src_dir exists: True
2025-10-09 06:34:15,501 - INFO - Contents of src_dir: ['__main__.py', 'agents', 'ai_med_extract', 'app.py', 'config_settings.py', 'gradio_app.py', 'routes', 'wsgi.py']
2025-10-09 06:34:15,501 - INFO - ai_med_extract_path exists: True
2025-10-09 06:34:15,502 - INFO - Contents of ai_med_extract: ['__init__.py', 'agents', 'api', 'api_endpoints.py', 'api_middleware.py', 'app.py', 'core_exceptions.py', 'core_logger.py', 'core_security.py', 'database_audit.py', 'gradio_app.py', 'health_endpoints.py', 'inference_service.py', 'main.py', 'metrics_adapter.py', 'monitoring_observability.py', 'phi_scrubber_service.py', 'scalable_service_mesh.py', 'utils']
2025-10-09 06:34:15,502 - INFO - Detected Hugging Face Spaces environment
2025-10-09 06:34:15,502 - INFO - Attempting to import from ai_med_extract package...
2025-10-09 06:34:15,502 - INFO - Python path: ['/app/services/ai-service/src', '', '/opt/venv/bin']
2025-10-09 06:34:15,502 - INFO - Current working directory: /app
2025-10-09 06:34:15,502 - INFO - Files in current directory: ['README.md', '__init__.py', 'app.py', 'requirements.txt', 'services']
libgomp: Invalid value for environment variable OMP_NUM_THREADS
libgomp: Invalid value for environment variable OMP_NUM_THREADS
2025-10-09 06:34:20,990 - INFO - Model manager imported successfully
2025-10-09 06:34:21,303 - INFO - Model manager imported successfully in initialize_agents
2025-10-09 06:34:21,330 - INFO - PatientSummarizerAgent created for Falconsai/medical_summarization (summarization) on cuda (loader deferred)
2025-10-09 06:34:21,430 - WARNING - ONNX Runtime not available: /opt/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-310-x86_64-linux-gnu.so: cannot enable executable stack as shared object requires: Invalid argument
[GGUF] Preloading GGUF models as requested by settings...
[GGUF] Preload failed (non-fatal): Model path does not exist: microsoft/Phi-3-mini-4k-instruct-gguf 2025-10-09 06:34:21,476 - INFO - Main router registered with app 2025-10-09 06:34:21,483 - INFO - Redis URL not configured, PHI scrubbing will use fallback mode 2025-10-09 06:34:21,491 - INFO - API router registered with app 2025-10-09 06:34:21,498 - INFO - Model management router registered with app 2025-10-09 06:34:21,498 - INFO - All routes registered successfully 2025-10-09 06:34:21,507 - INFO - Agents initialized and routes registered 2025-10-09 06:34:21,507 - INFO - ============================================================ 2025-10-09 06:34:21,507 - INFO - REGISTERED ROUTES: 2025-10-09 06:34:21,507 - INFO - ['GET', 'HEAD'] /openapi.json 2025-10-09 06:34:21,507 - INFO - ['GET', 'HEAD'] /docs 2025-10-09 06:34:21,507 - INFO - ['GET', 'HEAD'] /docs/oauth2-redirect 2025-10-09 06:34:21,507 - INFO - ['GET', 'HEAD'] /redoc 2025-10-09 06:34:21,507 - INFO - ['GET'] /live 2025-10-09 06:34:21,507 - INFO - ['GET'] /ready 2025-10-09 06:34:21,507 - INFO - ['GET'] /metrics 2025-10-09 06:34:21,507 - INFO - ['GET'] / 2025-10-09 06:34:21,507 - INFO - ['GET'] /api/info 2025-10-09 06:34:21,507 - INFO - ['GET'] /api/performance_metrics 2025-10-09 06:34:21,507 - INFO - ['POST'] /generate_patient_summary 2025-10-09 06:34:21,507 - INFO - ['POST'] /summarize 2025-10-09 06:34:21,507 - INFO - ['POST'] /upload [GGUF] Preloading GGUF models as requested by settings... 
[GGUF] Preload failed (non-fatal): Model path does not exist: microsoft/Phi-3-mini-4k-instruct-gguf 2025-10-09 06:34:21,507 - INFO - ['POST'] /phi/scrub 2025-10-09 06:34:21,507 - INFO - ['POST'] /api/models/load 2025-10-09 06:34:21,508 - INFO - ['POST'] /api/models/generate 2025-10-09 06:34:21,508 - INFO - ['GET'] /api/models/info 2025-10-09 06:34:21,508 - INFO - ['GET'] /api/models/defaults 2025-10-09 06:34:21,508 - INFO - ['POST'] /api/models/clear_cache 2025-10-09 06:34:21,508 - INFO - ['POST'] /api/models/switch 2025-10-09 06:34:21,508 - INFO - ['GET'] /api/models/health 2025-10-09 06:34:21,508 - INFO - ['GET'] /debug/routes 2025-10-09 06:34:21,508 - INFO - ['GET'] /health/live 2025-10-09 06:34:21,508 - INFO - ['GET'] /health/ready 2025-10-09 06:34:21,508 - INFO - ============================================================ 2025-10-09 06:34:21,508 - INFO - Agents initialized successfully 2025-10-09 06:34:21,508 - INFO - Model manager imported successfully in initialize_agents 2025-10-09 06:34:21,508 - INFO - PatientSummarizerAgent created for Falconsai/medical_summarization (summarization) on cuda (loader deferred) 2025-10-09 06:34:21,516 - INFO - Main router registered with app 2025-10-09 06:34:21,519 - INFO - API router registered with app 2025-10-09 06:34:21,521 - INFO - Model management router registered with app 2025-10-09 06:34:21,521 - INFO - All routes registered successfully 2025-10-09 06:34:21,529 - INFO - Agents initialized and routes registered 2025-10-09 06:34:21,529 - INFO - ============================================================ 2025-10-09 06:34:21,529 - INFO - REGISTERED ROUTES: 2025-10-09 06:34:21,529 - INFO - ['GET', 'HEAD'] /openapi.json 2025-10-09 06:34:21,529 - INFO - ['GET', 'HEAD'] /docs 2025-10-09 06:34:21,530 - INFO - ['GET', 'HEAD'] /docs/oauth2-redirect 2025-10-09 06:34:21,530 - INFO - ['GET', 'HEAD'] /redoc 2025-10-09 06:34:21,530 - INFO - ['GET'] /live 2025-10-09 06:34:21,530 - INFO - ['GET'] /ready 2025-10-09 06:34:21,530 - 
INFO - ['GET'] /metrics 2025-10-09 06:34:21,530 - INFO - ['GET'] / 2025-10-09 06:34:21,530 - INFO - ['GET'] /api/info 2025-10-09 06:34:21,530 - INFO - ['GET'] /api/performance_metrics 2025-10-09 06:34:21,530 - INFO - ['POST'] /generate_patient_summary 2025-10-09 06:34:21,530 - INFO - ['POST'] /upload 2025-10-09 06:34:21,530 - INFO - ['GET'] /get_updated_medical_data 2025-10-09 06:34:21,530 - INFO - ['PUT'] /update_medical_data 2025-10-09 06:34:21,530 - INFO - ['POST'] /transcribe 2025-10-09 06:34:21,530 - INFO - ['POST'] /extract_medical_data 2025-10-09 06:34:21,530 - INFO - ['POST'] /api/generate_summary [GGUF] Preloading GGUF models as requested by settings... [GGUF] Preload failed (non-fatal): Model path does not exist: microsoft/Phi-3-mini-4k-instruct-gguf 2025-10-09 06:34:21,530 - INFO - ['POST'] /api/extract_medical_data_from_audio 2025-10-09 06:34:21,530 - INFO - ['POST'] /api/patient_summary_openvino 2025-10-09 06:34:21,530 - INFO - ['POST'] /summarize 2025-10-09 06:34:21,530 - INFO - ['POST'] /upload 2025-10-09 06:34:21,530 - INFO - ['POST'] /phi/scrub 2025-10-09 06:34:21,530 - INFO - ['POST'] /api/models/load 2025-10-09 06:34:21,530 - INFO - ['POST'] /api/models/generate 2025-10-09 06:34:21,530 - INFO - ['GET'] /api/models/info 2025-10-09 06:34:21,530 - INFO - ['GET'] /api/models/defaults 2025-10-09 06:34:21,531 - INFO - ['POST'] /api/models/clear_cache 2025-10-09 06:34:21,531 - INFO - ['POST'] /api/models/switch 2025-10-09 06:34:21,531 - INFO - ['GET'] /api/models/health 2025-10-09 06:34:21,531 - INFO - ['GET'] /debug/routes 2025-10-09 06:34:21,531 - INFO - ['GET'] /health/live 2025-10-09 06:34:21,531 - INFO - ['GET'] /health/ready 2025-10-09 06:34:21,531 - INFO - ============================================================ 2025-10-09 06:34:21,531 - INFO - Agents initialized successfully 2025-10-09 06:34:21,531 - INFO - Successfully imported create_app and initialize_agents 2025-10-09 06:34:21,531 - INFO - App instance created successfully (without 
agents) 2025-10-09 06:34:21,531 - INFO - App title: Medical AI Service 2025-10-09 06:34:21,531 - INFO - App version: 1.0.0 2025-10-09 06:34:21,531 - INFO - Initializing agents with preload_small_models=False... 2025-10-09 06:34:21,531 - INFO - Model manager imported successfully in initialize_agents 2025-10-09 06:34:21,531 - INFO - PatientSummarizerAgent created for Falconsai/medical_summarization (summarization) on cuda (loader deferred) 2025-10-09 06:34:21,546 - INFO - Main router registered with app 2025-10-09 06:34:21,550 - INFO - API router registered with app 2025-10-09 06:34:21,551 - INFO - Model management router registered with app 2025-10-09 06:34:21,551 - INFO - All routes registered successfully 2025-10-09 06:34:21,559 - INFO - Agents initialized and routes registered 2025-10-09 06:34:21,559 - INFO - ============================================================ 2025-10-09 06:34:21,559 - INFO - REGISTERED ROUTES: 2025-10-09 06:34:21,559 - INFO - ['GET', 'HEAD'] /openapi.json 2025-10-09 06:34:21,559 - INFO - ['GET', 'HEAD'] /docs 2025-10-09 06:34:21,560 - INFO - ['GET', 'HEAD'] /docs/oauth2-redirect 2025-10-09 06:34:21,560 - INFO - ['GET', 'HEAD'] /redoc 2025-10-09 06:34:21,560 - INFO - ['GET'] /live 2025-10-09 06:34:21,560 - INFO - ['GET'] /ready 2025-10-09 06:34:21,560 - INFO - ['GET'] /metrics 2025-10-09 06:34:21,560 - INFO - ['GET'] / 2025-10-09 06:34:21,560 - INFO - ['GET'] /api/info 2025-10-09 06:34:21,560 - INFO - ['GET'] /api/performance_metrics 2025-10-09 06:34:21,560 - INFO - ['POST'] /generate_patient_summary 2025-10-09 06:34:21,560 - INFO - ['POST'] /upload 2025-10-09 06:34:21,560 - INFO - ['GET'] /get_updated_medical_data 2025-10-09 06:34:21,560 - INFO - ['PUT'] /update_medical_data 2025-10-09 06:34:21,560 - INFO - ['POST'] /transcribe 2025-10-09 06:34:21,560 - INFO - ['POST'] /extract_medical_data 2025-10-09 06:34:21,560 - INFO - ['POST'] /api/generate_summary 2025-10-09 06:34:21,560 - INFO - ['POST'] /api/extract_medical_data_from_audio 
2025-10-09 06:34:21,560 - INFO - ['POST'] /api/patient_summary_openvino 2025-10-09 06:34:21,560 - INFO - ['POST'] /upload 2025-10-09 06:34:21,560 - INFO - ['GET'] /get_updated_medical_data 2025-10-09 06:34:21,560 - INFO - ['PUT'] /update_medical_data 2025-10-09 06:34:21,560 - INFO - ['POST'] /transcribe 2025-10-09 06:34:21,560 - INFO - ['POST'] /extract_medical_data 2025-10-09 06:34:21,560 - INFO - ['POST'] /api/generate_summary 2025-10-09 06:34:21,560 - INFO - ['POST'] /api/extract_medical_data_from_audio 2025-10-09 06:34:21,560 - INFO - ['POST'] /api/patient_summary_openvino 2025-10-09 06:34:21,560 - INFO - ['POST'] /summarize 2025-10-09 06:34:21,560 - INFO - ['POST'] /upload 2025-10-09 06:34:21,560 - INFO - ['POST'] /phi/scrub 2025-10-09 06:34:21,560 - INFO - ['POST'] /api/models/load 2025-10-09 06:34:21,560 - INFO - ['POST'] /api/models/generate 2025-10-09 06:34:21,560 - INFO - ['GET'] /api/models/info 2025-10-09 06:34:21,560 - INFO - ['GET'] /api/models/defaults 2025-10-09 06:34:21,560 - INFO - ['POST'] /api/models/clear_cache 2025-10-09 06:34:21,560 - INFO - ['POST'] /api/models/switch 2025-10-09 06:34:21,560 - INFO - ['GET'] /api/models/health 2025-10-09 06:34:21,560 - INFO - ['GET'] /debug/routes 2025-10-09 06:34:21,560 - INFO - ['GET'] /health/live 2025-10-09 06:34:21,560 - INFO - ['GET'] /health/ready 2025-10-09 06:34:21,561 - INFO - ============================================================ 2025-10-09 06:34:21,561 - INFO - Agents initialized successfully 2025-10-09 06:34:21,561 - INFO - ============================================================ 2025-10-09 06:34:21,561 - INFO - FINAL REGISTERED ROUTES ON HF SPACES: 2025-10-09 06:34:21,561 - INFO - ['GET', 'HEAD'] /openapi.json 2025-10-09 06:34:21,561 - INFO - ['GET', 'HEAD'] /docs 2025-10-09 06:34:21,561 - INFO - ['GET', 'HEAD'] /docs/oauth2-redirect 2025-10-09 06:34:21,561 - INFO - ['GET', 'HEAD'] /redoc 2025-10-09 06:34:21,561 - INFO - ['GET'] /live 2025-10-09 06:34:21,561 - INFO - ['GET'] /ready 
2025-10-09 06:34:21,561 - INFO - ['GET'] /metrics 2025-10-09 06:34:21,561 - INFO - ['GET'] / 2025-10-09 06:34:21,561 - INFO - ['GET'] /api/info 2025-10-09 06:34:21,561 - INFO - ['GET'] /api/performance_metrics 2025-10-09 06:34:21,561 - INFO - ['POST'] /generate_patient_summary 2025-10-09 06:34:21,561 - INFO - ['POST'] /upload 2025-10-09 06:34:21,561 - INFO - ['GET'] /get_updated_medical_data 2025-10-09 06:34:21,561 - INFO - ['PUT'] /update_medical_data 2025-10-09 06:34:21,561 - INFO - ['POST'] /transcribe 2025-10-09 06:34:21,561 - INFO - ['POST'] /extract_medical_data 2025-10-09 06:34:21,561 - INFO - ['POST'] /api/generate_summary 2025-10-09 06:34:21,561 - INFO - ['POST'] /api/extract_medical_data_from_audio 2025-10-09 06:34:21,561 - INFO - ['POST'] /api/patient_summary_openvino 2025-10-09 06:34:21,561 - INFO - ['POST'] /upload 2025-10-09 06:34:21,561 - INFO - ['GET'] /get_updated_medical_data 2025-10-09 06:34:21,561 - INFO - ['PUT'] /update_medical_data 2025-10-09 06:34:21,561 - INFO - ['POST'] /transcribe 2025-10-09 06:34:21,561 - INFO - ['POST'] /extract_medical_data 2025-10-09 06:34:21,561 - INFO - ['POST'] /api/generate_summary 2025-10-09 06:34:21,561 - INFO - ['POST'] /api/extract_medical_data_from_audio 2025-10-09 06:34:21,561 - INFO - ['POST'] /api/patient_summary_openvino 2025-10-09 06:34:21,561 - INFO - ['POST'] /summarize 2025-10-09 06:34:21,561 - INFO - ['POST'] /upload 2025-10-09 06:34:21,561 - INFO - ['POST'] /phi/scrub 2025-10-09 06:34:21,561 - INFO - ['POST'] /api/models/load 2025-10-09 06:34:21,561 - INFO - ['POST'] /api/models/generate 2025-10-09 06:34:21,561 - INFO - ['GET'] /api/models/info 2025-10-09 06:34:21,561 - INFO - ['GET'] /api/models/defaults 2025-10-09 06:34:21,561 - INFO - ['POST'] /api/models/clear_cache 2025-10-09 06:34:21,561 - INFO - ['POST'] /api/models/switch 2025-10-09 06:34:21,561 - INFO - ['GET'] /api/models/health 2025-10-09 06:34:21,561 - INFO - ['GET'] /debug/routes 2025-10-09 06:34:21,561 - INFO - ['GET'] /health/live 
2025-10-09 06:34:21,561 - INFO - ['GET'] /health/ready
2025-10-09 06:34:21,561 - INFO - Total routes registered: 40
2025-10-09 06:34:21,562 - INFO - ============================================================
INFO: Started server process [1]
INFO: Waiting for application startup.
2025-10-09 06:34:21,562 - INFO - Detected Hugging Face Spaces environment - disabling Redis and Database connections
2025-10-09 06:34:21,562 - INFO - Skipping Redis initialization on HF Spaces
2025-10-09 06:34:21,562 - INFO - Skipping Database initialization on HF Spaces
2025-10-09 06:34:21,562 - INFO - Scalable service mesh not initialized (Redis not available)
2025-10-09 06:34:21,562 - INFO - Monitoring not initialized (Redis not available)
2025-10-09 06:34:21,562 - INFO - Application started without scalable features (Redis not available)
2025-10-09 06:34:21,562 - INFO - Starting FastAPI application with scalable architecture
2025-10-09 06:34:21,562 - INFO - Python version: 3.10.18 (main, Sep 30 2025, 00:42:07) [GCC 14.2.0]
2025-10-09 06:34:21,562 - INFO - PyTorch version: 2.3.0+cu121
2025-10-09 06:34:21,562 - INFO - CUDA available: True
2025-10-09 06:34:21,562 - INFO - CUDA version: 12.1
2025-10-09 06:34:21,562 - INFO - GPU count: 1
2025-10-09 06:34:21,562 - INFO - GPU 0: Tesla T4
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:7860 (Press CTRL+C to quit) INFO: 10.16.23.21:47109 - "GET / HTTP/1.1" 200 OK INFO: 10.16.23.21:47109 - "GET / HTTP/1.1" 200 OK Background task started for job_id: 8bf78119-dd65-4e2f-b674-28c0d7976c61INFO: 10.16.14.52:34296 - "POST /generate_patient_summary?stream=true HTTP/1.1" 200 OK 2025-10-09 06:34:31,881 - INFO - EHR API response status: 200 Step 1 - EHR fetch took 0.82s Step 2 - Processing took 0.00s Step 3 - Computing baseline took 0.00s 📝 SUMMARIZATION MODE: google/flan-t5-large 2025-10-09 06:34:31,887 - INFO - Loading Transformers model: google/flan-t5-large (summarization) 2025-10-09 06:34:32,578 - WARNING - AutoModelForSeq2SeqLM failed for google/flan-t5-large: Using a `device_map`, `tp_plan`, `torch.device` context manager or setting `torch.set_default_device(device)` requires `accelerate`. You can install it with `pip install accelerate` 2025-10-09 06:34:59,574 - INFO - Loaded google/flan-t5-large using AutoModel fallback Device set to use cuda:0 2025-10-09 06:35:00,635 - WARNING - Pipeline creation failed with device 0, trying CPU: 'SummarizationPipeline' object has no attribute 'assistant_model' Device set to use cuda:0 2025-10-09 06:35:00,636 - ERROR - Failed to load Transformers model: 'SummarizationPipeline' object has no attribute 'assistant_model' 2025-10-09 06:35:00,636 - ERROR - Failed to create model loader for google/flan-t5-large (summarization): Transformers model loading failed: 'SummarizationPipeline' object has no attribute 'assistant_model' Unified manager load failed for summarization, falling back: Model loading failed for google/flan-t5-large (summarization): Transformers model loading failed: 'SummarizationPipeline' object has no attribute 'assistant_model' Both `device` and `device_map` are specified. `device` will override `device_map`. You will most likely encounter unexpected behavior. Please remove `device` and keep `device_map`. Memory cleanup completed. 
Current usage: 1119.7 MB
2025-10-09 06:35:00,871 - ERROR - Error details: ValueError: Could not load model Falconsai/medical_summarization with any of the following classes: (, ). See the original errors:
while loading with AutoModelForSeq2SeqLM, an error is thrown:
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 292, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4546, in from_pretrained
    raise ValueError(
ValueError: Using a `device_map`, `tp_plan`, `torch.device` context manager or setting `torch.set_default_device(device)` requires `accelerate`. You can install it with `pip install accelerate`
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 310, in infer_framework_load_model
    model = model_class.from_pretrained(model, **fp32_kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4546, in from_pretrained
    raise ValueError(
ValueError: Using a `device_map`, `tp_plan`, `torch.device` context manager or setting `torch.set_default_device(device)` requires `accelerate`. You can install it with `pip install accelerate`
while loading with T5ForConditionalGeneration, an error is thrown:
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 292, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4546, in from_pretrained
    raise ValueError(
ValueError: Using a `device_map`, `tp_plan`, `torch.device` context manager or setting `torch.set_default_device(device)` requires `accelerate`. You can install it with `pip install accelerate`
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 310, in infer_framework_load_model
    model = model_class.from_pretrained(model, **fp32_kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4546, in from_pretrained
    raise ValueError(
ValueError: Using a `device_map`, `tp_plan`, `torch.device` context manager or setting `torch.set_default_device(device)` requires `accelerate`.
You can install it with `pip install accelerate`
Background task completed successfully for job_id: 8bf78119-dd65-4e2f-b674-28c0d7976c61
Background task started for job_id: cb243477-e435-49df-938d-c47090dfa52e
INFO: 10.16.23.21:7623 - "POST /generate_patient_summary?stream=true HTTP/1.1" 200 OK
2025-10-09 06:35:38,103 - INFO - EHR API response status: 200
Step 1 - EHR fetch took 0.79s
Step 2 - Processing took 0.00s
Step 3 - Computing baseline took 0.00s
🔤 TEXT-GENERATION MODE: microsoft/Phi-3-mini-4k-instruct
2025-10-09 06:35:38,171 - WARNING - Failed to create openvino_telemetry file. Please allow write access to the following directory: /
2025-10-09 06:35:38,171 - WARNING - Failed to create openvino_telemetry file. Please allow write access to the following directory: /
2025-10-09 06:35:38,171 - WARNING - Could not create directory for storing client ID. No data will be sent.
2025-10-09 06:35:38,196 - WARNING - Failed to create openvino_telemetry file. Please allow write access to the following directory: /
2025-10-09 06:35:38,196 - WARNING - Failed to create openvino_telemetry file. Please allow write access to the following directory: /
2025-10-09 06:35:38,196 - WARNING - Could not create directory for storing client ID. No data will be sent.
2025-10-09 06:35:38,627 - WARNING - mkdir -p failed for path /.config/matplotlib: [Errno 13] Permission denied: '/.config'
2025-10-09 06:35:38,628 - WARNING - Matplotlib created a temporary cache directory at /tmp/matplotlib-x1q3_89e because there was an issue with the default path (/.config/matplotlib); it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
No OpenVINO files were found for microsoft/Phi-3-mini-4k-instruct, setting `export=True` to convert the model to the OpenVINO IR. Don't forget to save the resulting model with `.save_pretrained()`
Fetching 2 files: 0%| | 0/2 [00:00 0:
/opt/venv/lib/python3.10/site-packages/optimum/exporters/openvino/model_patcher.py:332: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  torch.tensor(0.0, device=mask.device, dtype=dtype),
/opt/venv/lib/python3.10/site-packages/optimum/exporters/openvino/model_patcher.py:333: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  torch.tensor(torch.finfo(torch.float16).min, device=mask.device, dtype=dtype),
/opt/venv/lib/python3.10/site-packages/transformers/cache_utils.py:551: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  elif (
/opt/venv/lib/python3.10/site-packages/transformers/integrations/sdpa_attention.py:59: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  is_causal = query.shape[2] > 1 and attention_mask is None and getattr(module, "is_causal", True)
2025-10-09 06:37:10,477 - WARNING - Failed to create openvino_telemetry file. Please allow write access to the following directory: /
2025-10-09 06:37:10,477 - WARNING - Failed to create openvino_telemetry file. Please allow write access to the following directory: /
2025-10-09 06:37:10,477 - WARNING - Could not create directory for storing client ID. No data will be sent.
INFO:nncf:Statistics of the bitwidth distribution:
┍━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┑
│ Weight compression mode   │ % all parameters (layers)   │ % ratio-defining parameters (layers)   │
┝━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┥
│ int8_asym                 │ 100% (130 / 130)            │ 100% (130 / 130)                       │
┕━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┙
Applying Weight Compression ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 0:00:25 • 0:00:00
[✅ SUCCESS] Text-generation | TIMEOUT_MODE: normal | TOTAL: 395.2s
Background task completed successfully for job_id: cb243477-e435-49df-938d-c47090dfa52e
Background task started for job_id: 9cfb2d70-505a-40d9-b8e9-c0b433b01da4
INFO: 10.16.23.21:27796 - "POST /generate_patient_summary?stream=true HTTP/1.1" 200 OK
2025-10-09 06:43:33,022 - INFO - EHR API response status: 200
Step 1 - EHR fetch took 0.83s
Step 2 - Processing took 0.00s
Step 3 - Computing baseline took 0.00s
🔤 TEXT-GENERATION MODE: microsoft/Phi-3-mini-4k-instruct
No OpenVINO files were found for microsoft/Phi-3-mini-4k-instruct, setting `export=True` to convert the model to the OpenVINO IR. Don't forget to save the resulting model with `.save_pretrained()`
Loading checkpoint shards: 0%| | 0/2 [00:00, ).
See the original errors:
while loading with AutoModelForSeq2SeqLM, an error is thrown:
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 292, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4546, in from_pretrained
    raise ValueError(
ValueError: Using a `device_map`, `tp_plan`, `torch.device` context manager or setting `torch.set_default_device(device)` requires `accelerate`. You can install it with `pip install accelerate`
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 310, in infer_framework_load_model
    model = model_class.from_pretrained(model, **fp32_kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4546, in from_pretrained
    raise ValueError(
ValueError: Using a `device_map`, `tp_plan`, `torch.device` context manager or setting `torch.set_default_device(device)` requires `accelerate`. You can install it with `pip install accelerate`
while loading with T5ForConditionalGeneration, an error is thrown:
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 292, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4546, in from_pretrained
    raise ValueError(
ValueError: Using a `device_map`, `tp_plan`, `torch.device` context manager or setting `torch.set_default_device(device)` requires `accelerate`. You can install it with `pip install accelerate`
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 310, in infer_framework_load_model
    model = model_class.from_pretrained(model, **fp32_kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4546, in from_pretrained
    raise ValueError(
ValueError: Using a `device_map`, `tp_plan`, `torch.device` context manager or setting `torch.set_default_device(device)` requires `accelerate`.
You can install it with `pip install accelerate`
Background task completed successfully for job_id: 698e3f44-fd3d-48b3-aee6-db67a64b8361
Background task started for job_id: fae0f26d-3e7a-4d00-92da-16a64fecb19e
INFO: 10.16.23.21:50073 - "POST /generate_patient_summary?stream=true HTTP/1.1" 200 OK
2025-10-09 06:46:37,228 - INFO - EHR API response status: 200
Step 1 - EHR fetch took 0.80s
Step 2 - Processing took 0.00s
Step 3 - Computing baseline took 0.00s
📝 SUMMARIZATION MODE: facebook/bart-large-cnn
2025-10-09 06:46:37,234 - INFO - Loading Transformers model: facebook/bart-large-cnn (summarization)
2025-10-09 06:46:37,558 - WARNING - AutoModelForSeq2SeqLM failed for facebook/bart-large-cnn: Using a `device_map`, `tp_plan`, `torch.device` context manager or setting `torch.set_default_device(device)` requires `accelerate`. You can install it with `pip install accelerate`
2025-10-09 06:46:38,088 - INFO - Loaded facebook/bart-large-cnn using AutoModel fallback
Device set to use cuda:0
2025-10-09 06:46:38,775 - WARNING - Pipeline creation failed with device 0, trying CPU: 'SummarizationPipeline' object has no attribute 'assistant_model'
Device set to use cuda:0
2025-10-09 06:46:38,775 - ERROR - Failed to load Transformers model: 'SummarizationPipeline' object has no attribute 'assistant_model'
2025-10-09 06:46:38,775 - ERROR - Failed to create model loader for facebook/bart-large-cnn (summarization): Transformers model loading failed: 'SummarizationPipeline' object has no attribute 'assistant_model'
Unified manager load failed for summarization, falling back: Model loading failed for facebook/bart-large-cnn (summarization): Transformers model loading failed: 'SummarizationPipeline' object has no attribute 'assistant_model'
Both `device` and `device_map` are specified. `device` will override `device_map`. You will most likely encounter unexpected behavior. Please remove `device` and keep `device_map`.
Memory cleanup completed.
Current usage: 9365.8 MB
2025-10-09 06:46:39,084 - ERROR - Error details: ValueError: Could not load model Falconsai/medical_summarization with any of the following classes: (, ). See the original errors:
while loading with AutoModelForSeq2SeqLM, an error is thrown:
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 292, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4546, in from_pretrained
    raise ValueError(
ValueError: Using a `device_map`, `tp_plan`, `torch.device` context manager or setting `torch.set_default_device(device)` requires `accelerate`. You can install it with `pip install accelerate`
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 310, in infer_framework_load_model
    model = model_class.from_pretrained(model, **fp32_kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4546, in from_pretrained
    raise ValueError(
ValueError: Using a `device_map`, `tp_plan`, `torch.device` context manager or setting `torch.set_default_device(device)` requires `accelerate`. You can install it with `pip install accelerate`
while loading with T5ForConditionalGeneration, an error is thrown:
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 292, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4546, in from_pretrained
    raise ValueError(
ValueError: Using a `device_map`, `tp_plan`, `torch.device` context manager or setting `torch.set_default_device(device)` requires `accelerate`. You can install it with `pip install accelerate`
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 310, in infer_framework_load_model
    model = model_class.from_pretrained(model, **fp32_kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4546, in from_pretrained
    raise ValueError(
ValueError: Using a `device_map`, `tp_plan`, `torch.device` context manager or setting `torch.set_default_device(device)` requires `accelerate`.
You can install it with `pip install accelerate`
Background task completed successfully for job_id: fae0f26d-3e7a-4d00-92da-16a64fecb19e
Background task started for job_id: 29410a9f-c9cc-42ea-8188-611bacc0ce86
INFO: 10.16.14.52:39092 - "POST /generate_patient_summary?stream=true HTTP/1.1" 200 OK
2025-10-09 06:47:41,657 - INFO - EHR API response status: 200
2025-10-09 06:47:41,661 - INFO - Loading Transformers model: google/flan-t5-large (summarization)
Step 1 - EHR fetch took 0.81s
Step 2 - Processing took 0.00s
Step 3 - Computing baseline took 0.00s
📝 SUMMARIZATION MODE: google/flan-t5-large
2025-10-09 06:47:41,896 - WARNING - AutoModelForSeq2SeqLM failed for google/flan-t5-large: Using a `device_map`, `tp_plan`, `torch.device` context manager or setting `torch.set_default_device(device)` requires `accelerate`. You can install it with `pip install accelerate`
2025-10-09 06:47:43,725 - INFO - Loaded google/flan-t5-large using AutoModel fallback
Device set to use cuda:0
Unified manager load failed for summarization, falling back: Model loading failed for google/flan-t5-large (summarization): Transformers model loading failed: 'SummarizationPipeline' object has no attribute 'assistant_model'
2025-10-09 06:47:50,211 - WARNING - Pipeline creation failed with device 0, trying CPU: 'SummarizationPipeline' object has no attribute 'assistant_model'
Device set to use cuda:0
2025-10-09 06:47:50,213 - ERROR - Failed to load Transformers model: 'SummarizationPipeline' object has no attribute 'assistant_model'
2025-10-09 06:47:50,213 - ERROR - Failed to create model loader for google/flan-t5-large (summarization): Transformers model loading failed: 'SummarizationPipeline' object has no attribute 'assistant_model'
Both `device` and `device_map` are specified. `device` will override `device_map`. You will most likely encounter unexpected behavior. Please remove `device` and keep `device_map`.
Memory cleanup completed.
Current usage: 7539.6 MB
2025-10-09 06:47:50,484 - ERROR - Error details: ValueError: Could not load model Falconsai/medical_summarization with any of the following classes: (, ). See the original errors:
while loading with AutoModelForSeq2SeqLM, an error is thrown:
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 292, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4546, in from_pretrained
    raise ValueError(
ValueError: Using a `device_map`, `tp_plan`, `torch.device` context manager or setting `torch.set_default_device(device)` requires `accelerate`. You can install it with `pip install accelerate`
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 310, in infer_framework_load_model
    model = model_class.from_pretrained(model, **fp32_kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4546, in from_pretrained
    raise ValueError(
ValueError: Using a `device_map`, `tp_plan`, `torch.device` context manager or setting `torch.set_default_device(device)` requires `accelerate`. You can install it with `pip install accelerate`
while loading with T5ForConditionalGeneration, an error is thrown:
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 292, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4546, in from_pretrained
    raise ValueError(
ValueError: Using a `device_map`, `tp_plan`, `torch.device` context manager or setting `torch.set_default_device(device)` requires `accelerate`. You can install it with `pip install accelerate`
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 310, in infer_framework_load_model
    model = model_class.from_pretrained(model, **fp32_kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4546, in from_pretrained
    raise ValueError(
ValueError: Using a `device_map`, `tp_plan`, `torch.device` context manager or setting `torch.set_default_device(device)` requires `accelerate`.
You can install it with `pip install accelerate`
Background task completed successfully for job_id: 29410a9f-c9cc-42ea-8188-611bacc0ce86
[✅ SUCCESS] Text-generation | TIMEOUT_MODE: normal | TOTAL: 312.7s
Background task completed successfully for job_id: 9cfb2d70-505a-40d9-b8e9-c0b433b01da4
Background task started for job_id: baa38ef8-4af1-4aa3-9590-5fa3640147c4
INFO: 10.16.14.52:20443 - "POST /generate_patient_summary?stream=true HTTP/1.1" 200 OK
2025-10-09 06:49:01,217 - INFO - EHR API response status: 200
Step 1 - EHR fetch took 0.85s
2025-10-09 06:49:01,221 - INFO - Loading Transformers model: patrickvonplaten/longformer2roberta-cnn_dailymail-fp16 (seq2seq)
Step 2 - Processing took 0.00s
Step 3 - Computing baseline took 0.00s
🔄 SEQ2SEQ MODE: patrickvonplaten/longformer2roberta-cnn_dailymail-fp16
2025-10-09 06:49:01,433 - ERROR - Failed to load Transformers model: The checkpoint you are trying to load has model type `encoder_decoder` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date. You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
2025-10-09 06:49:01,433 - ERROR - Failed to create model loader for patrickvonplaten/longformer2roberta-cnn_dailymail-fp16 (seq2seq): Transformers model loading failed: The checkpoint you are trying to load has model type `encoder_decoder` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date. You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
Seq2Seq model failed: Model loading failed for patrickvonplaten/longformer2roberta-cnn_dailymail-fp16 (seq2seq): Transformers model loading failed: The checkpoint you are trying to load has model type `encoder_decoder` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date. You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
Background task completed successfully for job_id: baa38ef8-4af1-4aa3-9590-5fa3640147c4
INFO: 10.16.23.21:18218 - "GET / HTTP/1.1" 200 OK
INFO: Shutting down
INFO: Waiting for application shutdown.
2025-10-09 06:50:52,880 - INFO - Shutting down FastAPI application
2025-10-09 06:50:52,880 - INFO - Scalable components shutdown complete
INFO: Application shutdown complete.
INFO: Finished server process [1]