	# HNTAI Medical Data Extraction - Refactored System

	## Overview

	This project has been refactored around a unified model management system. Model names and types are passed in at runtime rather than hardcoded, which covers Hugging Face Transformers, OpenVINO, and GGUF models (the latter used for patient summary generation). The system supports dynamic model loading, runtime model switching, and fallback mechanisms.

	## 🚀 Key Features

	### ✨ Universal Model Support
	- Any Model Name: Use any Hugging Face model, local model, or custom model
	- Any Model Type: Support for text-generation, summarization, NER, GGUF, OpenVINO, and more
	- Automatic Type Detection: The system automatically detects model types from names
	- Dynamic Loading: Load models at runtime without restarting the application
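
The name-based type detection mentioned above could work roughly as follows. This is a minimal sketch, not the project's actual logic: the function name `detect_model_type` and every matching rule in it are assumptions chosen to fit the example models listed in this README.

```python
def detect_model_type(model_name: str, default: str = "text-generation") -> str:
    """Guess a model type from its name (hypothetical heuristic)."""
    name = model_name.lower()
    # A "gguf" marker in the repo or file name points at llama.cpp.
    if "gguf" in name:
        return "gguf"
    if "openvino" in name:
        return "openvino"
    # Look for "ner" as a whole token in the last path component,
    # so that words like "generation" do not match by accident.
    tokens = name.split("/")[-1].replace("_", "-").split("-")
    if "ner" in tokens:
        return "ner"
    if "summar" in name or "bart-large-cnn" in name:
        return "summarization"
    return default


for example in (
    "microsoft/Phi-3-mini-4k-instruct-gguf",
    "dslim/bert-base-NER",
    "Falconsai/medical_summarization",
    "facebook/bart-base",
):
    print(example, "->", detect_model_type(example))
```

A heuristic like this cannot recognise every case. For instance, `microsoft/Phi-3-mini-4k-instruct` carries no hint that it should be loaded through OpenVINO, so passing `model_type` explicitly remains the reliable option.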

	### 🔄 GGUF Model Integration
	- Seamless GGUF Support: Full integration with llama.cpp for GGUF models
	- Patient Summary Generation: Optimized for medical text summarization
	- Memory Efficient: Ultra-conservative settings for Hugging Face Spaces
	- Fallback Mechanisms: Automatic fallback when GGUF models fail

	### 🧠 Unified Model Manager
	- Single Interface: One manager handles all model types
	- Smart Caching: Intelligent model caching with memory management
	- Fallback Chains: Multiple fallback options for robustness
	- Performance Monitoring: Built-in timing and memory tracking
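
To illustrate what caching with a memory bound can look like, here is a small least-recently-used cache for loaders. It is a sketch under assumptions: the class name `LoaderCache`, the `max_models` limit, and the factory-callback design are hypothetical and are not taken from `UnifiedModelManager`.

```python
from collections import OrderedDict
from typing import Any, Callable, Tuple


class LoaderCache:
    """Keep at most `max_models` loaders, evicting the least recently used."""

    def __init__(self, max_models: int = 2):
        self.max_models = max_models
        self._items: "OrderedDict[Tuple[str, str], Any]" = OrderedDict()

    def get(self, model_name: str, model_type: str, factory: Callable[[], Any]) -> Any:
        key = (model_name, model_type)
        if key in self._items:
            # Cache hit: mark as most recently used and reuse the loader.
            self._items.move_to_end(key)
            return self._items[key]
        # Cache miss: build the loader, then evict old entries if needed.
        loader = factory()
        self._items[key] = loader
        while len(self._items) > self.max_models:
            self._items.popitem(last=False)
        return loader

    def clear(self) -> None:
        self._items.clear()


cache = LoaderCache(max_models=2)
first = cache.get("facebook/bart-base", "text-generation", factory=object)
again = cache.get("facebook/bart-base", "text-generation", factory=object)
print(first is again)  # the second call reuses the cached loader
```

Bounding the number of resident models is one simple way to keep memory predictable on small instances; a real implementation might instead evict based on measured memory use.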

	## 🏗️ Architecture

	### Core Components

	1. `UnifiedModelManager` - Central model management system
	2. `BaseModelLoader` - Abstract interface for all model loaders
	3. `TransformersModelLoader` - Hugging Face Transformers models
	4. `GGUFModelLoader` - GGUF models via llama.cpp
	5. `OpenVINOModelLoader` - OpenVINO optimized models
	6. `PatientSummarizerAgent` - Enhanced patient summary generation
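
The relationship between the abstract loader and its concrete subclasses can be sketched with Python's `abc` module. The names `BaseModelLoaderSketch` and `EchoModelLoader` are hypothetical stand-ins; only the `load` and `generate(prompt, **kwargs)` method names follow the examples elsewhere in this README, and the real `BaseModelLoader` may differ.

```python
from abc import ABC, abstractmethod


class BaseModelLoaderSketch(ABC):
    """Hypothetical stand-in for the shared loader interface."""

    def __init__(self, model_name: str):
        self.model_name = model_name
        self.loaded = False

    @abstractmethod
    def load(self) -> None:
        """Load weights or open the model file."""

    @abstractmethod
    def generate(self, prompt: str, **kwargs) -> str:
        """Return generated text for the prompt."""


class EchoModelLoader(BaseModelLoaderSketch):
    """Toy loader that only exercises the interface; it runs no model."""

    def load(self) -> None:
        self.loaded = True

    def generate(self, prompt: str, **kwargs) -> str:
        if not self.loaded:
            self.load()  # lazy load on first use
        max_tokens = kwargs.get("max_tokens", 512)
        # Treat whitespace-separated words as "tokens" for this toy example.
        return " ".join(prompt.split()[:max_tokens])


loader = EchoModelLoader("demo/echo")
print(loader.generate("Generate a medical summary for the patient", max_tokens=4))
```

Because every loader exposes the same two methods, calling code does not need to know which backend sits behind a given loader.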

	### Model Type Support

	| Model Type | Description | Example Models |
	|------------|-------------|----------------|
	| `text-generation` | Generative language models (causal LMs such as DialoGPT; note that `facebook/bart-base` is an encoder-decoder model) | `facebook/bart-base`, `microsoft/DialoGPT-medium` |
	| `summarization` | Text summarization models | `Falconsai/medical_summarization`, `facebook/bart-large-cnn` |
	| `ner` | Named Entity Recognition | `dslim/bert-base-NER`, `Jean-Baptiste/roberta-large-ner-english` |
	| `gguf` | GGUF format models | `microsoft/Phi-3-mini-4k-instruct-gguf` |
	| `openvino` | OpenVINO optimized models | `microsoft/Phi-3-mini-4k-instruct` |

	## 🚀 Quick Start

	### 1. Basic Usage

	```python
	from ai_med_extract.utils.model_manager import model_manager

	# Load any model dynamically
	loader = model_manager.get_model_loader(
	    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
	    model_type="gguf",
	    filename="Phi-3-mini-4k-instruct-q4.gguf",
	)

	# Generate text
	result = loader.generate("Generate a medical summary for...")
	```

	### 2. Patient Summary Generation

	```python
	from ai_med_extract.agents.patient_summary_agent import PatientSummarizerAgent

	# Create agent with any model
	agent = PatientSummarizerAgent(
	    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
	    model_type="gguf",
	)

	# Generate clinical summary
	summary = agent.generate_clinical_summary(patient_data)
	```

	### 3. Runtime Model Switching

	```python
	# Switch models at runtime
	agent.update_model(
	    model_name="Falconsai/medical_summarization",
	    model_type="summarization",
	)
	```

	## 📡 API Endpoints

	### Model Management API

	#### Load Model
	```http
	POST /api/models/load
	Content-Type: application/json

	{
	  "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
	  "model_type": "gguf",
	  "filename": "Phi-3-mini-4k-instruct-q4.gguf",
	  "force_reload": false
	}
	```

	#### Generate Text
	```http
	POST /api/models/generate
	Content-Type: application/json

	{
	  "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
	  "model_type": "gguf",
	  "prompt": "Generate a medical summary for...",
	  "max_tokens": 512,
	  "temperature": 0.7
	}
	```

	#### Switch Agent Model
	```http
	POST /api/models/switch
	Content-Type: application/json

	{
	  "agent_name": "patient_summarizer",
	  "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
	  "model_type": "gguf"
	}
	```

	#### Get Model Information
	```http
	GET /api/models/info?model_name=microsoft/Phi-3-mini-4k-instruct-gguf
	```

	#### Health Check
	```http
	GET /api/models/health
	```
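
The endpoints above can be called from Python as well. The sketch below builds a request for `/api/models/load` with the standard library only. The base URL `http://localhost:7860` and the helper name `build_post` are assumptions for illustration; adjust them to your deployment. The request is constructed but not sent, so the snippet runs without a server.

```python
import json
import urllib.request

# Assumption: the app is reachable locally on this port.
BASE_URL = "http://localhost:7860"


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the model management endpoints."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


load_request = build_post(
    "/api/models/load",
    {
        "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
        "model_type": "gguf",
        "filename": "Phi-3-mini-4k-instruct-q4.gguf",
        "force_reload": False,
    },
)
print(load_request.get_method(), load_request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(load_request, timeout=60) as response:
#     print(json.load(response))
```

The same helper works for `/api/models/generate` and `/api/models/switch` by changing the path and payload.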

	### Patient Summary API

	#### Generate Patient Summary
	```http
	POST /generate_patient_summary
	Content-Type: application/json

	{
	  "patientid": "12345",
	  "token": "your_token",
	  "key": "your_api_key",
	  "patient_summarizer_model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
	  "patient_summarizer_model_type": "gguf"
	}
	```

	## 🔧 Configuration

	### Environment Variables

	```bash
	# Cache directories
	HF_HOME=/tmp/huggingface
	XDG_CACHE_HOME=/tmp
	TORCH_HOME=/tmp/torch
	WHISPER_CACHE=/tmp/whisper

	# GGUF optimization
	GGUF_N_THREADS=2
	GGUF_N_BATCH=64
	```
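
Environment variables arrive as strings, so a loader has to parse and sanity-check them before use. The following sketch shows one way to read the two GGUF variables. The helper `env_int` is hypothetical, and using 2 and 64 as defaults simply mirrors the example values above; the project's real defaults may differ.

```python
import os


def env_int(name: str, default: int, minimum: int = 1) -> int:
    """Read an integer environment variable, falling back on bad input."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        # Unparseable values fall back to the default instead of crashing.
        return default
    return max(minimum, value)


# Assumed defaults, copied from the example values in this section.
n_threads = env_int("GGUF_N_THREADS", default=2)
n_batch = env_int("GGUF_N_BATCH", default=64)
print(f"threads={n_threads}, batch={n_batch}")
```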

	### Model Configuration

	The system automatically uses optimized models for different environments:

	- Local Development: Full model capabilities
	- Hugging Face Spaces: Memory-optimized models
	- Production: Configurable based on resources

	## 🎯 Use Cases

	### 1. Medical Document Processing
	```python
	# Extract medical data with any model
	medical_data = model_manager.generate_text(
	    model_name="facebook/bart-base",
	    model_type="text-generation",
	    prompt="Extract medical entities from: " + document_text,
	)
	```

	### 2. Patient Summary Generation
	```python
	# Use GGUF model for patient summaries
	summary = model_manager.generate_text(
	    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
	    model_type="gguf",
	    prompt=patient_data_prompt,
	    max_tokens=512,
	)
	```

	### 3. Dynamic Model Switching
	```python
	# Switch between models based on task requirements
	if task == "summarization":
	    model_name = "Falconsai/medical_summarization"
	    model_type = "summarization"
	elif task == "extraction":
	    model_name = "facebook/bart-base"
	    model_type = "text-generation"
	else:
	    # Fail early instead of hitting a NameError below
	    raise ValueError(f"Unsupported task: {task}")

	loader = model_manager.get_model_loader(model_name, model_type)
	```

	## 🔒 Memory Management

	### Hugging Face Spaces Optimization

	The system automatically detects Hugging Face Spaces and applies ultra-conservative memory settings:

	- GGUF Models: 1 thread, 16 batch size, 512 context
	- Transformers: Float32 precision, minimal memory usage
	- Automatic Fallbacks: Graceful degradation when memory is limited
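
The environment-dependent settings above can be pictured as choosing between two profiles. In this sketch the Spaces profile uses the values listed in this section (1 thread, batch size 16, context 512). Everything else is an assumption: detecting Spaces through the `SPACE_ID` environment variable, the names `GGUFSettings` and `pick_gguf_settings`, and the local profile, whose context size of 2048 is purely illustrative.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GGUFSettings:
    n_threads: int
    n_batch: int
    n_ctx: int


# Values for Spaces are taken from the list above.
SPACES_SETTINGS = GGUFSettings(n_threads=1, n_batch=16, n_ctx=512)
# Assumption: a roomier local profile; n_ctx=2048 is illustrative only.
LOCAL_SETTINGS = GGUFSettings(n_threads=2, n_batch=64, n_ctx=2048)


def running_on_spaces(env: Optional[Mapping[str, str]] = None) -> bool:
    """Assume Spaces is identified by a non-empty SPACE_ID variable."""
    env = os.environ if env is None else env
    return bool(env.get("SPACE_ID"))


def pick_gguf_settings(env: Optional[Mapping[str, str]] = None) -> GGUFSettings:
    return SPACES_SETTINGS if running_on_spaces(env) else LOCAL_SETTINGS


print(pick_gguf_settings({"SPACE_ID": "user/space"}))
print(pick_gguf_settings({}))
```

Passing the environment in as a mapping keeps the selection logic testable without modifying `os.environ`.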

	### Memory Monitoring

	```python
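	import requests

	BASE_URL = "http://localhost:7860"  # placeholder: point this at your running instance
	# Check memory usage
	health = requests.get(f"{BASE_URL}/api/models/health", timeout=30).json()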
	print(f"GPU Memory: {health['gpu_info']['memory_allocated']}")
	print(f"Loaded Models: {health['loaded_models_count']}")
	```

	## 🧪 Testing

	### Test GGUF Models

	```bash
	# Test GGUF model loading
	python test_gguf.py

	# Test specific model
	python -c "
	from ai_med_extract.utils.model_manager import model_manager
	loader = model_manager.get_model_loader('microsoft/Phi-3-mini-4k-instruct-gguf', 'gguf')
	result = loader.generate('Test prompt')
	print(f'Success: {len(result)} characters generated')
	"
	```

	### Model Validation

	```python
	from ai_med_extract.utils.model_config import validate_model_config

	# Validate model configuration
	validation = validate_model_config(
	    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
	    model_type="gguf",
	)

	print(f"Valid: {validation['valid']}")
	print(f"Warnings: {validation['warnings']}")
	```

	## 🚨 Error Handling

	### Fallback Mechanisms

	1. Primary Model: Attempts to load the specified model
	2. Fallback Model: Uses predefined fallback for the model type
	3. Text Fallback: Generates structured text responses
	4. Graceful Degradation: Continues operation with reduced functionality
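
The chain above can be sketched as a loop over candidate generators that ends in a plain-text fallback. The function names `generate_with_fallback` and `text_fallback`, and the wording of the fallback message, are hypothetical; the project's own chain may be structured differently.

```python
from typing import Callable, List, Tuple

Generator = Callable[[str], str]


def text_fallback(prompt: str) -> str:
    """Last resort: a structured placeholder instead of model output."""
    return (
        "Summary unavailable: no model could be loaded.\n"
        "Input excerpt: " + prompt[:200]
    )


def generate_with_fallback(
    prompt: str, candidates: List[Tuple[str, Generator]]
) -> Tuple[str, str, List[str]]:
    """Try each (name, generate) pair in order.

    Returns the name of the option that answered, its output,
    and the errors collected from the options that failed.
    """
    errors: List[str] = []
    for name, generate in candidates:
        try:
            return name, generate(prompt), errors
        except Exception as exc:  # any failure moves on to the next option
            errors.append(f"{name}: {exc}")
    return "text-fallback", text_fallback(prompt), errors


def broken_primary(prompt: str) -> str:
    raise RuntimeError("out of memory")


def small_fallback(prompt: str) -> str:
    return "summary of: " + prompt


used, output, errors = generate_with_fallback(
    "patient notes", [("primary", broken_primary), ("fallback", small_fallback)]
)
print(used, errors)
```

Returning the collected errors alongside the result lets the caller report why the primary model was skipped.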

	### Common Issues

	#### GGUF Model Loading Fails
	```python
	import os
	from huggingface_hub import hf_hub_download

	# Check model file
	if not os.path.exists(model_path):
	    # Download from Hugging Face
	    model_path = hf_hub_download(repo_id=repo_id, filename=filename)
	```

	#### Memory Issues
	```python
	import torch

	# Clear cache and reload
	model_manager.clear_cache()
	torch.cuda.empty_cache()

	# Use smaller model
	loader = model_manager.get_model_loader(
	    model_name="facebook/bart-base",  # Smaller model
	    model_type="text-generation",
	)
	```

	## 📊 Performance

	### Benchmarking

	```python
	import time

	# Time model loading (perf_counter is a monotonic, high-resolution clock)
	start = time.perf_counter()
	loader = model_manager.get_model_loader(model_name, model_type)
	load_time = time.perf_counter() - start

	# Time generation
	start = time.perf_counter()
	result = loader.generate(prompt)
	gen_time = time.perf_counter() - start

	print(f"Load: {load_time:.2f}s, Generate: {gen_time:.2f}s")
	```

	### Optimization Tips

	1. Use Appropriate Model Size: Smaller models for limited resources
	2. Enable Caching: Models are cached after first load
	3. Batch Processing: Process multiple requests together
	4. Memory Monitoring: Regular health checks

	## 🔮 Future Enhancements

	### Planned Features

	- Model Quantization: Automatic model optimization
	- Distributed Loading: Load models across multiple devices
	- Model Versioning: Track and manage model versions
	- Performance Analytics: Detailed performance metrics
	- Auto-scaling: Automatic model scaling based on load

	### Extensibility

	The system is designed for easy extension:

	```python
	class CustomModelLoader(BaseModelLoader):
	    def __init__(self, model_name: str):
	        self.model_name = model_name

	    def load(self):
	        # Custom loading logic
	        pass

	    def generate(self, prompt: str, **kwargs):
	        # Custom generation logic
	        pass
	```

	## 📝 Migration Guide

	### From Old System

	1. Replace Hardcoded Models:
	```python
	# Old
	model = LazyModelLoader("facebook/bart-base", "text-generation")

	# New
	model = model_manager.get_model_loader("facebook/bart-base", "text-generation")
	```

	2. Update Patient Summarizer:
	```python
	# Old
	agent = PatientSummarizerAgent()

	# New
	agent = PatientSummarizerAgent(
	    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
	    model_type="gguf",
	)
	```

	3. Use Dynamic Model Selection:
	```python
	# Old: Fixed model types
	# New: Dynamic model selection
	model_type = request.form.get("model_type", "text-generation")
	model_name = request.form.get("model_name", "facebook/bart-base")
	```

	## 🤝 Contributing

	### Development Setup

	```bash
	# Clone repository
	git clone <repository-url>
	cd HNTAI

	# Install dependencies
	pip install -r requirements.txt

	# Run tests
	python -m pytest tests/

	# Start development server
	python -m ai_med_extract.app
	```

	### Adding New Model Types

	1. Create Loader Class:
	```python
	class CustomModelLoader(BaseModelLoader):
	    # Implement required methods
	    pass
	```

	2. Update Model Manager:
	```python
	if model_type == "custom":
	    loader = CustomModelLoader(model_name)
	```

	3. Add Configuration:
	```python
	DEFAULT_MODELS["custom"] = {
	    "primary": "default/custom-model",
	    "fallback": "fallback/custom-model",
	}
	```
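
As an alternative to extending an `if model_type == ...` chain for each new type, loaders can register themselves in a lookup table. This is a sketch of that pattern, not a description of how the model manager currently dispatches: `LOADER_REGISTRY`, `register_loader`, and `create_loader` are hypothetical names, and the `CustomModelLoader` here is a bare class so the example stays self-contained.

```python
from typing import Callable, Dict

# Maps a model_type string to a callable that builds a loader.
LOADER_REGISTRY: Dict[str, Callable[[str], object]] = {}


def register_loader(model_type: str):
    """Class decorator that records a loader under its model type."""
    def decorator(cls):
        LOADER_REGISTRY[model_type] = cls
        return cls
    return decorator


@register_loader("custom")
class CustomModelLoader:
    # In the real project this would subclass BaseModelLoader.
    def __init__(self, model_name: str):
        self.model_name = model_name


def create_loader(model_name: str, model_type: str):
    """Look up the loader class for model_type and instantiate it."""
    try:
        factory = LOADER_REGISTRY[model_type]
    except KeyError:
        raise ValueError(f"Unsupported model type: {model_type!r}") from None
    return factory(model_name)


loader = create_loader("default/custom-model", "custom")
print(type(loader).__name__, loader.model_name)
```

With a registry, adding a model type touches only the new loader's module, and unknown types fail with a clear error.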

	## 📄 License

	This project is licensed under the MIT License - see the LICENSE file for details.

	## 🆘 Support

	### Getting Help

	- Documentation: This README and inline code comments
	- Issues: GitHub Issues for bug reports
	- Discussions: GitHub Discussions for questions
	- Examples: See `test_gguf.py` and other test files

	### Common Questions

	Q: Can I use my own GGUF model?
	A: Yes! Just provide the path to your .gguf file or upload it to Hugging Face.

	Q: How do I optimize for memory?
	A: Use smaller models, enable caching, and monitor memory usage via `/api/models/health`.

	Q: Can I switch models without restarting?
	A: Yes! Use the `/api/models/switch` endpoint to change models at runtime.

	Q: What if a model fails to load?
	A: The system automatically falls back to alternative models and provides detailed error information.

	---

	🎉 Congratulations! You now have a powerful, flexible system that can work with any model name and type, including GGUF models for patient summary generation. The system is designed to be robust, efficient, and easy to use while maintaining backward compatibility.