Refactor text generation in routes_fastapi.py to return raw summaries instead of formatted markdown. Remove unnecessary markdown processing functions and streamline summary handling, enhancing performance and clarity in the output structure.
Enhance caching behavior in text generation processes across multiple files. Update patient_summary_agent.py and routes_fastapi.py to ensure proper dynamic cache handling, preventing stale cache issues during single generations. Modify model_loader_spaces.py and unified_model_manager.py to explicitly manage cache settings based on model capabilities, improving overall generation reliability. Update binary files in __pycache__ directories.
Refactor memory management and logging in routes_fastapi.py to enhance monitoring and prevent leaks. Introduce helper functions for safe logging and streamline text generation processes. Update cleanup_memory function to provide detailed memory usage metrics and warnings for high usage scenarios, improving overall performance and reliability.
Update requirements to pin transformers version and modify caching behavior for OpenVINO models. Adjust logic in routes_fastapi.py to disable cache for compatibility with newer transformers, ensuring stability in model generation processes.
Refactor text generation handling in OpenVinoPipeline to prioritize max_new_tokens over max_length, ensuring proper token management for causal models.
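The precedence rule in the commit above can be sketched as follows. This is a minimal illustration, not the actual OpenVinoPipeline code: the helper name `resolve_length_kwargs` and the fallback of 256 tokens are hypothetical, and the only behavior taken from the commit is that `max_new_tokens` wins when both limits are supplied.

```python
def resolve_length_kwargs(gen_kwargs, default_max_new_tokens=256):
    """Return a copy of gen_kwargs that carries exactly one length limit.

    Hypothetical helper: max_new_tokens counts only generated tokens, so it is
    preferred over max_length, which also counts the prompt for causal models.
    """
    # Copy every setting except the two competing length limits.
    resolved = {
        key: value
        for key, value in gen_kwargs.items()
        if key not in ("max_new_tokens", "max_length")
    }
    max_new = gen_kwargs.get("max_new_tokens")
    max_len = gen_kwargs.get("max_length")

    if max_new is not None:
        # Both may be present; max_new_tokens takes priority and max_length is dropped.
        resolved["max_new_tokens"] = max_new
    elif max_len is not None:
        # Only max_length was given, so keep it unchanged.
        resolved["max_length"] = max_len
    else:
        # Neither was given; fall back to an assumed default.
        resolved["max_new_tokens"] = default_max_new_tokens
    return resolved
```

Dropping `max_length` rather than passing both values avoids an ambiguous request reaching the generation call.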
Refactor caching behavior in model configuration and pipeline to prevent DynamicCache errors. Set use_cache to None so the model applies its default cache handling, and update the related settings in the TransformersModel generation parameters.
Revert "Enhance model configuration and unified model manager to improve performance. Update max_length and max_new_tokens for consistency, and explicitly disable cache to prevent DynamicCache errors. Add logger import in FastAPI routes for better logging capabilities."
Enhance model configuration and unified model manager to improve performance. Update max_length and max_new_tokens for consistency, and explicitly disable cache to prevent DynamicCache errors. Add logger import in FastAPI routes for better logging capabilities.
Implement Hugging Face Spaces configuration and memory management utilities. Enhance model loading and cleanup processes, enabling optimized deployment on HF Spaces. Update memory optimization settings and model configurations for improved performance and resource management.
Refactor patient summary generation to standardize custom prompt formatting. Update logic to ensure consistent structure across different modes, enhancing clarity and usability in generating comprehensive summaries. Adjust context handling to align with expected input formats for summarization models.
Enhance patient summary generation by introducing support for custom prompts. Modify the processing logic to append visit data when a custom prompt is provided, improving flexibility and user experience in generating patient summaries. Update related sections to ensure consistent handling of prompts across different modes.
Refactor patient summary generation to support a flexible structure, allowing for comprehensive summaries without enforcing fixed sections. Update related methods and prompts to enhance clarity and usability. Improve error handling and logging for summary generation processes, ensuring better performance and user experience.
Remove obsolete documentation and test files related to GGUF operations, streaming fixes, and device parameter handling. This cleanup enhances project maintainability by eliminating unused code and files that are no longer relevant to the current implementation.
Enhance patient summary generation with optimized parallel processing and intelligent chunking for large datasets. Introduce extended timeout configurations for complex cases, improving error handling and logging. Update API endpoints for large data processing and streaming, ensuring better performance and user experience. Refactor model loading to support OpenVINO and standard transformers with improved fallback strategies.
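The chunking and parallel processing described in the commit above might look roughly like this. It is a sketch under stated assumptions: `chunk_records`, `summarize_chunk`, and `summarize_in_parallel` are hypothetical names, and `summarize_chunk` is a stand-in for the real model call, which the commit does not show.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_records(records, chunk_size):
    """Split a list of visit records into consecutive chunks of chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def summarize_chunk(chunk):
    """Placeholder for the per-chunk model call (assumption, not the real summarizer)."""
    dates = ", ".join(visit["date"] for visit in chunk)
    return f"{len(chunk)} visit(s): {dates}"


def summarize_in_parallel(records, chunk_size=2, max_workers=4):
    """Summarize each chunk on a worker thread and return results in input order."""
    chunks = chunk_records(records, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the input chunks.
        return list(pool.map(summarize_chunk, chunks))
```

Threads suit this sketch because a real per-chunk call would mostly wait on inference or I/O; a CPU-bound summarizer would need a different executor.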
Refactor model management by replacing the legacy model manager with a unified model manager across the application. Update imports and method calls to ensure compatibility with the new structure. Enhance error handling and logging for model loading processes, improving overall performance and maintainability.
Refactor summarizer pipeline creation and enhance model loading for HF Spaces compatibility. Introduce a unified approach for model management, including new user models endpoint and improved error handling. Update model configurations and logging for better monitoring during model loading processes.
Implement global exception handling and memory-aware logging across the application. Introduce logging enhancements in the AI service to capture memory snapshots during errors and key operations. Update middleware for request/response logging and improve model loading with detailed progress updates. Refactor patient summary generation to include concise logging for each step, ensuring better monitoring and error handling.
Remove 'Connection: keep-alive' header from event-stream response in patient summary generation. Update binary cache files for model configurations and loaders.
Enhance GGUF model loading and generation process with improved progress updates and logging. Update job status messages to include visual indicators for different stages of model loading and text generation. Streamline the use of extended streaming for all requests to prevent timeout issues, ensuring a more responsive user experience.
Refactor GGUF model handling for HF Spaces compatibility. Adjust timeouts for GGUF operations, introduce an extended SSE generator for long-running tasks, and optimize model loading with environment checks. Enhance logging for job status and progress updates.
Enhance SSE generator with debug logging and improved responsiveness. Add debug statements for job status and completion, reduce sleep duration for more frequent updates, and update CORS headers for API responses.
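A polling SSE generator of the kind the commit above refers to can be sketched as below. The function name `sse_job_events`, the `state` field, and its values are assumptions for illustration; the project's real generator and job schema are not shown in this log.

```python
import asyncio
import json


async def sse_job_events(get_status, poll_interval=0.1, max_polls=100):
    """Yield Server-Sent Events frames until the job finishes or polls run out.

    get_status is a hypothetical callable returning a dict with a "state" key.
    """
    for _ in range(max_polls):
        status = get_status()
        # Each SSE frame is a "data:" line followed by a blank line.
        yield f"data: {json.dumps(status)}\n\n"
        if status.get("state") in ("completed", "failed"):
            return
        # A shorter sleep gives more frequent updates at the cost of more polling.
        await asyncio.sleep(poll_interval)
    # Polls exhausted without a terminal state: tell the client explicitly.
    timeout_payload = json.dumps({"state": "timeout"})
    yield f"data: {timeout_payload}\n\n"
```

In a FastAPI route, a generator like this would typically be wrapped in a `StreamingResponse` with the `text/event-stream` media type.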
Update application logs and model loading mechanisms, and enhance error handling for Transformers models. Adjust GGUF model path for improved loading and add new API routes for performance metrics. Clean up binary cache files and improve logging for model initialization and processing steps.
Enhance patient summary generation with robust data processing and flexible key matching. Introduce new API endpoints for performance metrics and cache management. Improve logging for better traceability during data handling and model generation.
Enhance EHR data processing with robust key matching and error handling. Update context window setting for model loading to 8192. Add a new function for improved patient record processing.
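One way to implement the key matching mentioned in the commit above is to normalize keys before comparing them. This is only a sketch: `normalize_key` and `get_flexible` are hypothetical names, and the matching rules the project actually uses are not described in this log.

```python
import re


def normalize_key(key):
    """Lowercase a key and strip everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", key.lower())


def get_flexible(record, *aliases, default=None):
    """Return the first value whose key matches any alias after normalization.

    Tolerates differences in case, underscores, spaces, and camelCase, and
    returns default instead of raising when no key matches.
    """
    wanted = {normalize_key(alias) for alias in aliases}
    for key, value in record.items():
        if normalize_key(str(key)) in wanted:
            return value
    return default
```

For example, `get_flexible(record, "patient name", "name")` would find a value stored under `Patient_Name`, `patientName`, or `NAME`.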
Add memory monitoring and cleanup features to model loading and generation pipelines. Update model manager to track memory usage and perform periodic cleanup. Add API endpoints for monitoring memory status and performance metrics.