Revert "feat: Add T4-optimized unified model manager for seamless loading and generation across various AI model types, updating model configurations."
feat: Establish AI medical extraction service with performance optimizations, unified model management, and detailed Hugging Face Spaces deployment guides.
Update maximum token limits to 8192 in `fallback_pipeline.py` and `unified_model_manager.py` so summary generation can handle longer inputs and produce longer outputs.
Update maximum token limits in `patient_summary_agent.py` and `summarizer.py` to 8192 so summary generation can handle longer inputs and produce fuller outputs.
Increase the `max_new_tokens` limit to 8192 in `unified_model_manager.py`, allowing longer, fuller summary outputs.
Update model configuration to raise maximum token limits for summary generation: `max_length`, `max_new_tokens`, and the context window now go from 2048 to 8192 in `model_config.py` and `unified_model_manager.py`, so longer inputs and outputs fit.
Enhance model loading and result handling in `async_patient_summary` by adding fallback tracking. Update `build_result_dict` to include fallback status and reason, improving error reporting and user feedback. Refactor model loading logic to propagate fallback information throughout the summary generation process.
Revert "Enhance patient summary generation by adding model information tracking in `build_result_dict` and related functions. Update `async_patient_summary` and `process_patient_summary_background` to accept and propagate model info, improving error handling and logging for model loading. Refactor model configuration in `async_patient_summary_optimized` to dynamically handle requested models and fallback scenarios, ensuring better user feedback and performance."
Enhance patient summary generation by adding model information tracking in `build_result_dict` and related functions. Update `async_patient_summary` and `process_patient_summary_background` to accept and propagate model info, improving error handling and logging for model loading. Refactor model configuration in `async_patient_summary_optimized` to dynamically handle requested models and fallback scenarios, ensuring better user feedback and performance.
Implement T4 Medium optimizations in model handling and logging. Enhance `PatientSummarizerAgent` with dynamic model configuration, so models can be loaded based on user input. Update environment variable checks for improved performance on Hugging Face Spaces. Refactor generation settings for T4 compatibility, including memory management and generation parameters. Improve error handling and logging throughout the application for better user support.
Enhance PatientSummarizerAgent and user_models_config with improved environment variable handling for Hugging Face Spaces. Introduce async support for clinical summary generation and refine model loading error handling. Update model type definitions for clarity and adjust model retrieval functions to ignore active status.
Enhance model loading in `PatientSummarizerAgent` with improved error handling and a fallback mechanism. Introduce an environment variable check for HF Spaces, make logging clearer, and refine the fallback summary to include extracted patient information and error details.
Refactor Docker configurations to use `uvicorn` as the entry point for FastAPI applications. Update `.huggingface.yaml` to remove legacy app configuration and clarify hardware requirements. Modify `Dockerfile.prod` to install `uvicorn` and adjust the command for production deployment.
Update binary cache files and enhance dynamic cache handling in model generation. Refactor `routes_fastapi.py` to simplify model type determination and improve clarity in the unified model manager. Add dynamic cache detection for OpenVINO models to optimize performance of single generation calls.
Enhance app initialization in `app.py` for compatibility with Hugging Face Spaces by exporting an app instance. Update import statements in `main.py` for consistency. Refactor model generation parameters in `unified_model_manager.py` to improve clarity and maintainability.
Update `.gitignore` to cover additional macOS, Linux, and application-specific files and directories. Modify `.huggingface.yaml` to improve Docker build settings and hardware requirements. Refactor `app.py` to remove legacy code and improve error handling. Remove deprecated files related to the comprehensive streaming fixes, deployment scripts, and optimized Docker configurations. Update `Dockerfile.prod` to extend the Gunicorn timeout so long-running requests are not terminated. Enhance health endpoints and model management with improved logging and error handling. Consolidate routes and simplify the architecture for better maintainability.
Revert "Remove legacy `app.py` file and streamline startup process for Hugging Face Spaces. Refactor `start_hf_spaces.py` to simplify environment setup and application initialization. Enhance `ai_med_extract.app` with improved logging and error handling during app creation and agent initialization. Update route registration in `routes_fastapi.py` for better organization and clarity."
Revert "Update Dockerfiles to use `asgi:app` as the entry point, resolving deployment issues caused by the removal of `app.py`. This change ensures compatibility with the new structure and improves initialization for production environments."
Update Dockerfiles to use `asgi:app` as the entry point, resolving deployment issues caused by the removal of `app.py`. This change ensures compatibility with the new structure and improves initialization for production environments.
Remove legacy `app.py` file and streamline startup process for Hugging Face Spaces. Refactor `start_hf_spaces.py` to simplify environment setup and application initialization. Enhance `ai_med_extract.app` with improved logging and error handling during app creation and agent initialization. Update route registration in `routes_fastapi.py` for better organization and clarity.
Revert "Refactor `routes_fastapi.py` to enhance performance and maintainability. Introduced `CacheManager`, `ErrorResponseBuilder`, and `PerformanceTracker` for optimized caching, consistent error handling, and improved performance metrics. Updated logging to use safe methods, eliminated redundant code, and maintained backward compatibility. Overall, these changes streamline the patient summary generation process and improve error visibility."
Revert "Refactor `build_result_dict` function by moving it to `routes_helpers.py` to eliminate duplication and improve code organization. Updated timing calculations for better precision and added prompt information handling. This change enhances maintainability and streamlines the result building process."
Refactor `build_result_dict` function by moving it to `routes_helpers.py` to eliminate duplication and improve code organization. Updated timing calculations for better precision and added prompt information handling. This change enhances maintainability and streamlines the result building process.
Refactor `routes_fastapi.py` to enhance performance and maintainability. Introduced `CacheManager`, `ErrorResponseBuilder`, and `PerformanceTracker` for optimized caching, consistent error handling, and improved performance metrics. Updated logging to use safe methods, eliminated redundant code, and maintained backward compatibility. Overall, these changes streamline the patient summary generation process and improve error visibility.
Refactor patient summary processing to improve job status updates. Removed redundant progress updates and ensured accurate visit count reporting after data parsing and computation steps. Enhanced error handling and streamlined the workflow for better maintainability.
Enhance patient summary generation with improved progress updates and error handling. Updated SSEGenerator to ensure frequent data transmission, preventing HTTP/2 protocol errors. Refined job status monitoring and heartbeat intervals for better connection stability during long-running tasks. Enhanced user feedback with detailed progress messages throughout the generation process.
Implement timeout protection and progress updates for patient summary generation. Enhanced error handling for both text generation and summarization processes, ensuring robust job management and improved user feedback during long-running tasks. Updated request queue management to handle job IDs more flexibly, allowing for better tracking and processing of requests.
Enhance patient summary processing with queue management and improved error handling. Introduced a queue manager to handle request slots, ensuring efficient processing and timeout management. Updated background task logic to include performance metrics and detailed error responses, enhancing overall reliability and maintainability of the patient summary generation workflow.
Enhance SSEGenerator job monitoring and error handling. Introduced a mechanism to wait for job creation before erroring out, improved timeout handling to send warnings instead of stopping processing, and adjusted max wait times for operations. Updated heartbeat and progress reporting to ensure more reliable streaming responses.
Refactor streaming response handling in patient summary generation to utilize a centralized SSE generator service. This change simplifies the code by removing custom streaming logic, enhances job status monitoring, and improves error handling. The job management process is also streamlined for better maintainability and performance.
Refactor patient summary generation to enhance performance and reliability. Key improvements include a centralized job management service, standardized error handling, and optimized SSE generation. Introduced new constants for data size thresholds and chunking configurations, ensuring better maintainability and scalability. All changes maintain backward compatibility and improve overall code quality.
Refactor PyTorch compatibility handling by centralizing the RMSNorm patch into a dedicated utility function. This ensures consistent application across modules and improves maintainability. Update logging to reflect the new approach.
Implement an RMSNorm patch for PyTorch in the `ai_med_extract` modules to ensure compatibility with models such as Phi-3, providing the tensor normalization they rely on and adding logging.
Remove obsolete `.pyc` files and add an RMSNorm compatibility patch for PyTorch in `model_loader_spaces.py`, improving error handling and fallback mechanisms for model loading.
Revert "Refactor async_patient_summary to unify model selection and enhance summary generation. Introduce robust fallback mechanisms for model types, including support for summarization, seq2seq, gguf, and causal-openvino. Improve logging and error handling for better diagnostics during summary generation."
Refactor async_patient_summary to unify model selection and enhance summary generation. Introduce robust fallback mechanisms for model types, including support for summarization, seq2seq, gguf, and causal-openvino. Improve logging and error handling for better diagnostics during summary generation.