| # HNTAI - Comprehensive Technical Architecture Documentation | |
| **Version:** 1.0 | |
| **Last Updated:** December 5, 2025 | |
| **Project:** Medical Data Extraction & AI Processing Platform | |
| --- | |
| ## Table of Contents | |
| 1. [Executive Summary](#executive-summary) | |
| 2. [System Overview](#system-overview) | |
| 3. [Architecture Design](#architecture-design) | |
| 4. [Technology Stack](#technology-stack) | |
| 5. [Core Components](#core-components) | |
| 6. [AI/ML Architecture](#aiml-architecture) | |
| 7. [API Architecture](#api-architecture) | |
| 8. [Data Flow & Processing](#data-flow--processing) | |
| 9. [Database Design](#database-design) | |
| 10. [Security Architecture](#security-architecture) | |
| 11. [Deployment Architecture](#deployment-architecture) | |
| 12. [Performance Optimization](#performance-optimization) | |
| 13. [Monitoring & Observability](#monitoring--observability) | |
| 14. [Development Workflow](#development-workflow) | |
| 15. [Integration Patterns](#integration-patterns) | |
| 16. [Scalability Considerations](#scalability-considerations) | |
| 17. [Future Roadmap](#future-roadmap) | |
| --- | |
| ## 1. Executive Summary | |
| HNTAI (Healthcare AI Text Analysis & Interpretation) is a production-ready medical AI platform for extracting, processing, and analyzing medical data. It provides HIPAA-compliant document processing, PHI scrubbing, and AI-powered patient summary generation, with support for multiple model backends (Transformers, OpenVINO, and GGUF). | |
| ### Key Capabilities | |
| - **Multi-format Document Processing**: PDF, DOCX, images, and audio transcription | |
| - **HIPAA Compliance**: Automated PHI scrubbing with comprehensive audit logging | |
| - **Multi-Model AI Support**: Transformers, OpenVINO, and GGUF models with automatic optimization | |
| - **Scalable Architecture**: Kubernetes-ready with horizontal scaling capabilities | |
| - **Production-Ready**: Health checks, metrics, structured logging, and error handling | |
| ### Target Deployment Environments | |
| - **Hugging Face Spaces** (T4 Medium GPU) | |
| - **Kubernetes Clusters** (On-premise or cloud) | |
| - **Docker Containers** (Standalone or orchestrated) | |
| - **Local Development** (CPU or GPU) | |
| --- | |
| ## 2. System Overview | |
| ### 2.1 Purpose & Scope | |
| HNTAI turns raw medical documents into structured data and clinical summaries. The system is designed to: | |
| 1. **Extract** structured medical data from unstructured documents | |
| 2. **Anonymize** protected health information (PHI) for compliance | |
| 3. **Summarize** patient records into comprehensive clinical assessments | |
| 4. **Process** multi-modal medical data (text, images, audio) | |
| ### 2.2 Design Principles | |
| - **Simplicity**: Clean, maintainable codebase with essential features | |
| - **Flexibility**: Support for multiple AI model types and backends | |
| - **Security**: HIPAA-compliant with comprehensive audit trails | |
| - **Performance**: Optimized for T4 GPU with intelligent caching | |
| - **Reliability**: Robust error handling and automatic fallback mechanisms | |
| ### 2.3 High-Level Architecture | |
| ```mermaid | |
| graph TB | |
| subgraph "Client Layer" | |
| A[Web Client] | |
| B[Mobile Client] | |
| C[API Client] | |
| end | |
| subgraph "API Gateway" | |
| D[FastAPI Application] | |
| E[Health Endpoints] | |
| F[Metrics Endpoint] | |
| end | |
| subgraph "Service Layer" | |
| G[Document Processing Service] | |
| H[PHI Scrubbing Service] | |
| I[Patient Summary Service] | |
| J[Model Management Service] | |
| end | |
| subgraph "AI/ML Layer" | |
| K[Unified Model Manager] | |
| L[Transformers Models] | |
| M[GGUF Models] | |
| N[OpenVINO Models] | |
| O[Whisper Audio Models] | |
| end | |
| subgraph "Data Layer" | |
| P[PostgreSQL - Audit Logs] | |
| Q[File Storage] | |
| R[Model Cache] | |
| end | |
| A --> D | |
| B --> D | |
| C --> D | |
| D --> E | |
| D --> F | |
| D --> G | |
| D --> H | |
| D --> I | |
| D --> J | |
| G --> K | |
| H --> K | |
| I --> K | |
| J --> K | |
| K --> L | |
| K --> M | |
| K --> N | |
| K --> O | |
| D --> P | |
| G --> Q | |
| K --> R | |
| ``` | |
| --- | |
| ## 3. Architecture Design | |
| ### 3.1 Architectural Style | |
| HNTAI follows a **Layered Monolithic Architecture** with clear separation of concerns: | |
| 1. **Presentation Layer**: FastAPI routes and endpoints | |
| 2. **Service Layer**: Business logic and orchestration | |
| 3. **Agent Layer**: Specialized AI agents for specific tasks | |
| 4. **Utility Layer**: Shared utilities and helpers | |
| 5. **Data Layer**: Database and file storage | |
| ### 3.2 Component Architecture | |
| ```mermaid | |
| graph LR | |
| subgraph "FastAPI Application" | |
| A[routes_fastapi.py] | |
| B[app.py] | |
| C[main.py] | |
| end | |
| subgraph "Agents" | |
| D[patient_summary_agent.py] | |
| E[phi_scrubber.py] | |
| F[text_extractor.py] | |
| G[medical_data_extractor.py] | |
| end | |
| subgraph "Services" | |
| H[job_manager.py] | |
| I[request_queue.py] | |
| J[error_handler.py] | |
| K[sse_generator.py] | |
| end | |
| subgraph "Utils" | |
| L[unified_model_manager.py] | |
| M[model_config.py] | |
| N[robust_json_parser.py] | |
| O[memory_manager.py] | |
| end | |
| A --> D | |
| A --> E | |
| A --> F | |
| A --> G | |
| A --> H | |
| A --> I | |
| D --> L | |
| E --> L | |
| F --> L | |
| G --> L | |
| L --> M | |
| L --> O | |
| ``` | |
| ### 3.3 Directory Structure | |
| ``` | |
| HNTAI/ | |
| ├── services/ | |
| │   └── ai-service/ | |
| │       └── src/ | |
| │           └── ai_med_extract/ | |
| │               ├── agents/  # AI agents for specific tasks | |
| │               │   ├── patient_summary_agent.py | |
| │               │   ├── phi_scrubber.py | |
| │               │   ├── text_extractor.py | |
| │               │   └── medical_data_extractor.py | |
| │               ├── api/  # FastAPI routes | |
| │               │   └── routes_fastapi.py | |
| │               ├── services/  # Business logic services | |
| │               │   ├── job_manager.py | |
| │               │   ├── request_queue.py | |
| │               │   ├── error_handler.py | |
| │               │   └── sse_generator.py | |
| │               ├── utils/  # Utilities and helpers | |
| │               │   ├── unified_model_manager.py | |
| │               │   ├── model_config.py | |
| │               │   ├── robust_json_parser.py | |
| │               │   ├── memory_manager.py | |
| │               │   ├── openvino_summarizer_utils.py | |
| │               │   └── patient_summary_utils.py | |
| │               ├── app.py  # FastAPI app factory | |
| │               ├── main.py  # Entry point | |
| │               ├── health_endpoints.py  # Health checks | |
| │               └── database_audit.py  # HIPAA audit logging | |
| ├── docs/  # Documentation | |
| ├── infra/  # Infrastructure configs | |
| │   └── k8s/  # Kubernetes manifests | |
| ├── app.py  # HF Spaces entry point | |
| ├── Dockerfile  # Multi-stage Docker build | |
| ├── Dockerfile.hf-spaces  # HF Spaces optimized | |
| ├── .huggingface.yaml  # HF Spaces config | |
| ├── models_config.json  # Model configuration | |
| ├── requirements.txt  # Python dependencies | |
| └── README.md  # Project documentation | |
| ``` | |
| --- | |
| ## 4. Technology Stack | |
| ### 4.1 Core Technologies | |
| | Category | Technology | Version | Purpose | | |
| |----------|-----------|---------|---------| | |
| | **Runtime** | Python | 3.10+ | Primary language | | |
| | **Web Framework** | FastAPI | Latest | REST API framework | | |
| | **ASGI Server** | Uvicorn | Latest | Production server | | |
| | **AI/ML Framework** | PyTorch | 2.x | Deep learning | | |
| | **Transformers** | Hugging Face Transformers | Latest | Model loading | | |
| | **GGUF Support** | llama-cpp-python | Latest | Quantized models | | |
| | **OpenVINO** | optimum-intel | Latest | Intel optimization | | |
| | **Audio Processing** | Whisper | Latest | Speech-to-text | | |
| ### 4.2 Supporting Technologies | |
| | Category | Technology | Purpose | | |
| |----------|-----------|---------| | |
| | **Database** | PostgreSQL 13+ | Audit logs (optional) | | |
| | **Caching** | In-memory LRU | Model caching | | |
| | **Document Processing** | PyPDF2, python-docx | PDF/DOCX parsing | | |
| | **OCR** | Tesseract | Image text extraction | | |
| | **Audio** | FFmpeg | Audio processing | | |
| | **Containerization** | Docker | Deployment | | |
| | **Orchestration** | Kubernetes | Scaling | | |
| | **Monitoring** | Prometheus | Metrics | | |
| ### 4.3 Development Tools | |
| - **Code Quality**: Black, isort, flake8, mypy | |
| - **Testing**: pytest | |
| - **Version Control**: Git | |
| - **CI/CD**: GitHub Actions (proposed) | |
| - **Documentation**: Markdown, Mermaid diagrams | |
| --- | |
| ## 5. Core Components | |
| ### 5.1 FastAPI Application (`app.py`) | |
| **Purpose**: Application factory and initialization | |
| **Key Responsibilities**: | |
| - Create and configure FastAPI application | |
| - Initialize agents and services | |
| - Register routes and middleware | |
| - Configure CORS and security | |
| **Key Functions**: | |
| ```python | |
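| from fastapi import FastAPI | |
| def create_app(initialize: bool = True) -> FastAPI: ...  # Build and configure the application | |
| def initialize_agents(app: FastAPI, preload_small_models: bool = False): ...  # Create agents and services | |
| def run_dev(): ...  # Development server | |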
| ``` | |
| ### 5.2 API Routes (`routes_fastapi.py`) | |
| **Purpose**: RESTful API endpoints | |
| **Endpoint Categories**: | |
| #### Health & Monitoring | |
| - `GET /health/live` - Liveness probe | |
| - `GET /health/ready` - Readiness probe | |
| - `GET /metrics` - Prometheus metrics | |
| #### Document Processing | |
| - `POST /upload` - Upload and process documents | |
| - `POST /transcribe` - Audio transcription | |
| - `GET /get_updated_medical_data` - Retrieve processed data | |
| - `PUT /update_medical_data` - Update medical records | |
| #### AI Processing | |
| - `POST /generate_patient_summary` - Generate patient summaries | |
| - `POST /api/generate_summary` - Text summarization | |
| - `POST /api/patient_summary_openvino` - OpenVINO summaries | |
| - `POST /extract_medical_data` - Extract structured data | |
| #### Model Management | |
| - `POST /api/load_model` - Load specific models | |
| - `GET /api/model_info` - Model information | |
| - `POST /api/switch_model` - Switch models | |
| ### 5.3 Agents | |
| #### 5.3.1 Patient Summary Agent (`patient_summary_agent.py`) | |
| **Purpose**: Generate comprehensive patient summaries | |
| **Key Features**: | |
| - Dynamic model configuration | |
| - Multi-section summary generation | |
| - Chronological narrative building | |
| - Clinical guideline evaluation | |
| - Fallback text-based summarization | |
| **Core Methods**: | |
| ```python | |
| from typing import Dict, List, Union | |
| def configure_model(model_name: str, model_type: str): ... | |
| def generate_clinical_summary(patient_data: Union[List[str], Dict]): ... | |
| def generate_patient_summary(patient_data: Union[List[str], Dict]): ... | |
| def build_chronological_narrative(patient_data: dict): ... | |
| def format_clinical_output(raw_summary: str, patient_data: dict): ... | |
| ``` | |
| #### 5.3.2 PHI Scrubber (`phi_scrubber.py`) | |
| **Purpose**: Remove protected health information | |
| **Scrubbing Capabilities**: | |
| - Patient names | |
| - Medical record numbers (MRN) | |
| - Dates of birth | |
| - Phone numbers | |
| - Email addresses | |
| - Social Security Numbers | |
| - Addresses | |
| **Compliance**: HIPAA-compliant with audit logging | |
| #### 5.3.3 Text Extractor (`text_extractor.py`) | |
| **Purpose**: Extract text from various document formats | |
| **Supported Formats**: | |
| - PDF documents | |
| - DOCX files | |
| - Images (via OCR) | |
| - Plain text | |
| #### 5.3.4 Medical Data Extractor (`medical_data_extractor.py`) | |
| **Purpose**: Extract structured medical data from text | |
| **Extraction Targets**: | |
| - Diagnoses | |
| - Medications | |
| - Procedures | |
| - Lab results | |
| - Vital signs | |
| - Allergies | |
| ### 5.4 Services | |
| #### 5.4.1 Job Manager (`job_manager.py`) | |
| **Purpose**: Manage long-running jobs | |
| **Features**: | |
| - Job lifecycle management | |
| - Progress tracking | |
| - Status updates | |
| - Result caching | |
| - Cleanup of completed jobs | |
| #### 5.4.2 Request Queue (`request_queue.py`) | |
| **Purpose**: Queue and prioritize requests | |
| **Features**: | |
| - Request queuing | |
| - Priority handling | |
| - Concurrency control | |
| - Timeout management | |
| #### 5.4.3 Error Handler (`error_handler.py`) | |
| **Purpose**: Centralized error handling | |
| **Features**: | |
| - Error categorization | |
| - Contextual logging | |
| - Job error updates | |
| - Graceful degradation | |
| #### 5.4.4 SSE Generator (`sse_generator.py`) | |
| **Purpose**: Server-Sent Events for real-time updates | |
| **Features**: | |
| - Progress streaming | |
| - Status updates | |
| - Error notifications | |
| - Completion events | |
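The event stream described above can be sketched as follows. This is a minimal illustration of the Server-Sent Events wire format, not the contents of `sse_generator.py`; the function names `format_sse` and `job_events`, the event names, and the payload fields are assumptions made for the example.

```python
import json
from typing import Iterator


def format_sse(event: str, data: dict) -> str:
    """Serialize one SSE frame: an event name, one data line, then a blank line."""
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"


def job_events(job_id: str, steps: list[str]) -> Iterator[str]:
    """Yield a progress frame per step, then a completion frame (hypothetical shape)."""
    total = len(steps)
    for index, step in enumerate(steps, start=1):
        percent = round(100 * index / total)
        yield format_sse("progress", {"job_id": job_id, "step": step, "percent": percent})
    yield format_sse("complete", {"job_id": job_id})
```

A generator like this could be handed to a streaming response with the `text/event-stream` media type; error notifications would follow the same frame format with a different event name.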
| --- | |
| ## 6. AI/ML Architecture | |
| ### 6.1 Unified Model Manager | |
| **File**: `unified_model_manager.py` | |
| **Purpose**: Single interface for all AI model types | |
| **Architecture**: | |
| ```mermaid | |
| classDiagram | |
| class BaseModel { | |
| <<abstract>> | |
| +name: str | |
| +model_type: str | |
| +status: ModelStatus | |
| +load() | |
| +generate(prompt, config)* | |
| +unload() | |
| } | |
| class TransformersModel { | |
| +_model: Pipeline | |
| +_load_implementation() | |
| +generate(prompt, config) | |
| } | |
| class GGUFModel { | |
| +_model: Llama | |
| +filename: str | |
| +_extract_filename() | |
| +_load_implementation() | |
| +generate(prompt, config) | |
| } | |
| class OpenVINOModel { | |
| +_model: OVModelForCausalLM | |
| +_tokenizer: AutoTokenizer | |
| +_load_implementation() | |
| +generate(prompt, config) | |
| } | |
| class FallbackModel { | |
| +_load_implementation() | |
| +generate(prompt, config) | |
| } | |
| class UnifiedModelManager { | |
| +max_models: int | |
| +max_memory_mb: int | |
| +get_model(name, type) | |
| +generate_text(name, prompt) | |
| +cleanup() | |
| } | |
| BaseModel <|-- TransformersModel | |
| BaseModel <|-- GGUFModel | |
| BaseModel <|-- OpenVINOModel | |
| BaseModel <|-- FallbackModel | |
| UnifiedModelManager --> BaseModel | |
| ``` | |
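The class hierarchy in the diagram could look roughly like the sketch below. Only the class and method names come from the diagram; the `ModelStatus` values, the constructor signature, and the truncating behaviour of `FallbackModel` are assumptions for illustration.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Optional


class ModelStatus(Enum):  # status values are assumed, not taken from the source
    UNLOADED = "unloaded"
    LOADED = "loaded"
    ERROR = "error"


class BaseModel(ABC):
    def __init__(self, name: str, model_type: str) -> None:
        self.name = name
        self.model_type = model_type
        self.status = ModelStatus.UNLOADED

    def load(self) -> None:
        """Template method: delegate to the backend, then record the outcome."""
        try:
            self._load_implementation()
        except Exception:
            self.status = ModelStatus.ERROR
            raise
        self.status = ModelStatus.LOADED

    def unload(self) -> None:
        self.status = ModelStatus.UNLOADED

    @abstractmethod
    def _load_implementation(self) -> None: ...

    @abstractmethod
    def generate(self, prompt: str, config: Optional[dict] = None) -> str: ...


class FallbackModel(BaseModel):
    """Needs no weights; this sketch simply truncates the prompt."""

    def _load_implementation(self) -> None:
        pass

    def generate(self, prompt: str, config: Optional[dict] = None) -> str:
        max_chars = (config or {}).get("max_chars", 200)
        return prompt[:max_chars]
```

Keeping `load()` in the base class means each backend only implements `_load_implementation()` and `generate()`, which is what the diagram suggests for the Transformers, GGUF, and OpenVINO subclasses.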
| ### 6.2 Model Types | |
| #### 6.2.1 Transformers Models | |
| **Backend**: Hugging Face Transformers | |
| **Device**: GPU (CUDA) or CPU | |
| **Use Cases**: General text generation, summarization | |
| **Supported Models**: | |
| - `microsoft/Phi-3-mini-4k-instruct` | |
| - `facebook/bart-large-cnn` (deprecated) | |
| - `google/flan-t5-large` | |
| **Configuration**: | |
| ```python | |
| { | |
| "model_name": "microsoft/Phi-3-mini-4k-instruct", | |
| "model_type": "text-generation", | |
| "device_map": "auto", | |
| "torch_dtype": "float16" | |
| } | |
| ``` | |
| #### 6.2.2 GGUF Models | |
| **Backend**: llama-cpp-python | |
| **Device**: CPU or GPU (via Metal/CUDA) | |
| **Use Cases**: Efficient inference with quantized models | |
| **Supported Models**: | |
| - `microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf` (PRIMARY) | |
| **Configuration**: | |
| ```python | |
| { | |
| "model_path": "path/to/model.gguf", | |
| "n_ctx": 8192,  # Context size in tokens; note that Phi-3-mini-4k is trained with a 4096-token window | |
| "n_threads": 4, | |
| "n_gpu_layers": 35 # GPU acceleration | |
| } | |
| ``` | |
| #### 6.2.3 OpenVINO Models | |
| **Backend**: Intel OpenVINO | |
| **Device**: CPU (Intel optimized) or GPU | |
| **Use Cases**: Production deployment on Intel hardware | |
| **Supported Models**: | |
| - `OpenVINO/Phi-3-mini-4k-instruct-fp16-ov` | |
| **Configuration**: | |
| ```python | |
| { | |
| "model_path": "OpenVINO/Phi-3-mini-4k-instruct-fp16-ov", | |
| "device": "GPU"  # Use "CPU" when no supported GPU is available | |
| } | |
| ``` | |
| ### 6.3 Model Selection Strategy | |
| ```mermaid | |
| flowchart TD | |
| A[Request with model_name] --> B{Model specified?} | |
| B -->|Yes| C{Model type?} | |
| B -->|No| D[Use default: Phi-3 GGUF] | |
| C -->|GGUF| E[Load GGUF Model] | |
| C -->|OpenVINO| F[Load OpenVINO Model] | |
| C -->|Transformers| G[Load Transformers Model] | |
| C -->|Unknown| H[Auto-detect type] | |
| E --> I{Load successful?} | |
| F --> I | |
| G --> I | |
| H --> I | |
| D --> I | |
| I -->|Yes| J[Generate with model] | |
| I -->|No| K[Try fallback model] | |
| K --> L{Fallback successful?} | |
| L -->|Yes| J | |
| L -->|No| M[Use text-based fallback] | |
| ``` | |
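The selection flow above can be sketched as a short fallback chain. This is an illustration under stated assumptions: the `loaders` mapping (model type to a callable that returns a generate function or raises), the auto-detection rules, and the choice of `google/flan-t5-large` as the fallback model are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

Generate = Callable[[str], str]
DEFAULT_MODEL: Tuple[str, str] = ("microsoft/Phi-3-mini-4k-instruct-gguf", "gguf")


def detect_model_type(model_name: str) -> str:
    """Guess the backend from the model name (assumed heuristic)."""
    lowered = model_name.lower()
    if "gguf" in lowered:
        return "gguf"
    if lowered.startswith("openvino/") or lowered.endswith("-ov"):
        return "openvino"
    return "transformers"


def text_fallback(prompt: str) -> str:
    """Last resort when no model loads: return a truncated copy of the input."""
    return prompt[:200]


def generate_with_fallback(
    prompt: str,
    loaders: Dict[str, Callable[[str], Generate]],
    model_name: Optional[str] = None,
    model_type: Optional[str] = None,
    fallback: Tuple[str, str] = ("google/flan-t5-large", "transformers"),
) -> str:
    if model_name is None:
        model_name, model_type = DEFAULT_MODEL
    elif model_type not in loaders:
        model_type = detect_model_type(model_name)
    for name, kind in ((model_name, model_type), fallback):
        try:
            generate = loaders[kind](name)
        except Exception:
            continue  # load failed or no loader registered: try the next candidate
        return generate(prompt)
    return text_fallback(prompt)
```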
| ### 6.4 Model Configuration | |
| **File**: `models_config.json` | |
| ```json | |
| { | |
| "patient_summary_models": [ | |
| { | |
| "name": "microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf", | |
| "type": "gguf", | |
| "is_active": true, | |
| "cached": true, | |
| "description": "Phi-3 Mini GGUF Q4 quantized - PRIMARY MODEL", | |
| "use_case": "Fast patient summary generation with CPU/GPU", | |
| "repo_id": "microsoft/Phi-3-mini-4k-instruct-gguf", | |
| "filename": "Phi-3-mini-4k-instruct-q4.gguf" | |
| } | |
| ], | |
| "runtime_behavior": { | |
| "allow_runtime_downloads": true, | |
| "cache_runtime_downloads": true, | |
| "fallback_to_cached": true | |
| } | |
| } | |
| ``` | |
| ### 6.5 Token Management | |
| **Token Limit Handling**: | |
| - Automatic token counting (heuristic: ~4 chars/token) | |
| - Pre-generation validation | |
| - Token limit error detection | |
| - Graceful degradation | |
| **Token Limits by Model**: | |
| - Phi-3 models: 4096 tokens (context window) | |
| - BART models: 1024 tokens | |
| - T5 models: 512 tokens | |
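The heuristic count and pre-generation check can be sketched as below. The limits are the ones listed above; the family labels used as dictionary keys and the function names are assumptions, and a real tokenizer would give different counts than the 4-characters-per-token estimate.

```python
import math

CHARS_PER_TOKEN = 4  # heuristic from the list above
CONTEXT_LIMITS = {"phi-3": 4096, "bart": 1024, "t5": 512}  # keys are hypothetical labels


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by 4, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_context(prompt: str, model_family: str, reserved_output_tokens: int = 0) -> bool:
    """Pre-generation validation: prompt plus reserved output must fit the window."""
    limit = CONTEXT_LIMITS[model_family]
    return estimate_tokens(prompt) + reserved_output_tokens <= limit
```

Reserving output tokens up front matters because the prompt and the generated text share the same context window.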
| ### 6.6 Generation Configuration | |
| ```python | |
| from dataclasses import dataclass | |
| @dataclass | |
| class GenerationConfig: | |
|     max_tokens: int = 8192  # Upper bound on output tokens; prompt plus output must still fit the model's context window | |
|     min_tokens: int = 50  # Minimum output tokens | |
|     temperature: float = 0.3  # Low temperature for more consistent medical output | |
|     top_p: float = 0.9  # Nucleus sampling | |
|     timeout: float = 180.0  # Generation timeout in seconds (tuned for T4) | |
|     stream: bool = False  # Streaming support | |
| ``` | |
| ### 6.7 T4 GPU Optimizations | |
| **Hardware Target**: NVIDIA T4 Medium (16GB GPU, 16GB RAM) | |
| **Optimizations**: | |
| 1. **Memory Management**: | |
| - Max 2 models in memory | |
| - Automatic model unloading | |
| - GPU memory clearing | |
| - Garbage collection | |
| 2. **Model Loading**: | |
| - Lazy loading (on-demand) | |
| - Intelligent caching | |
| - LRU eviction policy | |
| 3. **Inference**: | |
| - FP16 precision | |
| - Batch size: 1 | |
| - Context window (`n_ctx`): 8192 tokens configured; the Phi-3 models listed in 6.5 have a 4096-token window | |
| - GPU layer offloading (GGUF) | |
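The "max 2 models" rule with LRU eviction can be sketched with an `OrderedDict`. This is a minimal illustration, not the `UnifiedModelManager` implementation: the `ModelCache` class, its `loader` callable, and the `evicted` list are hypothetical names.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class ModelCache:
    """Keep at most `max_models` loaded; evict the least recently used first."""

    def __init__(self, loader: Callable[[str], Any], max_models: int = 2) -> None:
        self._loader = loader
        self._max_models = max_models
        self._models: "OrderedDict[str, Any]" = OrderedDict()
        self.evicted: List[str] = []  # record of evictions, kept for inspection

    def get(self, name: str) -> Any:
        if name in self._models:
            self._models.move_to_end(name)  # mark as most recently used
            return self._models[name]
        while len(self._models) >= self._max_models:
            old_name, _old_model = self._models.popitem(last=False)
            self.evicted.append(old_name)
            # A real manager would unload the model here, then run garbage
            # collection and clear GPU memory, as listed above.
        model = self._loader(name)  # lazy loading: only on first request
        self._models[name] = model
        return model

    def loaded(self) -> List[str]:
        return list(self._models)
```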
| --- | |
| ## 7. API Architecture | |
| ### 7.1 RESTful Design | |
| **Principles**: | |
| - Resource-oriented URLs | |
| - HTTP methods for CRUD operations | |
| - JSON request/response format | |
| - Stateless communication | |
| - Proper HTTP status codes | |
| ### 7.2 Request/Response Flow | |
| ```mermaid | |
| sequenceDiagram | |
| participant C as Client | |
| participant A as API Gateway | |
| participant S as Service Layer | |
| participant M as Model Manager | |
| participant D as Database | |
| C->>A: POST /generate_patient_summary | |
| A->>A: Validate request | |
| A->>S: Create job | |
| S->>D: Log job creation | |
| A-->>C: 202 Accepted (job_id) | |
| S->>M: Load model | |
| M->>M: Check cache | |
| M->>M: Load if needed | |
| M-->>S: Model ready | |
| S->>M: Generate summary | |
| M->>M: Process prompt | |
| M-->>S: Generated text | |
| S->>D: Log completion | |
| S->>A: Update job status | |
| A-->>C: SSE: Progress updates | |
| C->>A: GET /job/{job_id} | |
| A->>S: Get job status | |
| S->>D: Retrieve job | |
| S-->>A: Job result | |
| A-->>C: 200 OK (result) | |
| ``` | |
| ### 7.3 Authentication & Authorization | |
| **Current State**: Basic API key authentication (optional) | |
| **Planned Enhancements**: | |
| - JWT-based authentication | |
| - Role-based access control (RBAC) | |
| - OAuth2 integration | |
| - API rate limiting | |
| ### 7.4 Error Handling | |
| **Error Response Format**: | |
| ```json | |
| { | |
| "error": { | |
| "code": "MODEL_LOAD_FAILED", | |
| "message": "Failed to load model: microsoft/Phi-3-mini-4k-instruct", | |
| "details": { | |
| "model_name": "microsoft/Phi-3-mini-4k-instruct", | |
| "error_type": "initialization_error", | |
| "timestamp": "2025-12-05T17:23:52Z" | |
| } | |
| } | |
| } | |
| ``` | |
| **HTTP Status Codes**: | |
| - `200 OK` - Successful request | |
| - `202 Accepted` - Job created | |
| - `400 Bad Request` - Invalid input | |
| - `404 Not Found` - Resource not found | |
| - `500 Internal Server Error` - Server error | |
| - `503 Service Unavailable` - Service degraded | |
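A helper that produces the error envelope shown above might look like this sketch. The envelope fields follow the JSON example; the `error_response` function, the `STATUS_BY_CODE` mapping, and the error codes other than `MODEL_LOAD_FAILED` are assumptions.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Tuple

# Hypothetical mapping from application error codes to HTTP status codes.
STATUS_BY_CODE = {
    "INVALID_INPUT": 400,
    "NOT_FOUND": 404,
    "MODEL_LOAD_FAILED": 503,
}


def error_response(code: str, message: str, **details: Any) -> Tuple[int, Dict[str, Any]]:
    """Return (http_status, body) in the documented error format."""
    details.setdefault(
        "timestamp", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    )
    body = {"error": {"code": code, "message": message, "details": details}}
    return STATUS_BY_CODE.get(code, 500), body
```

Unknown codes fall back to `500`, so a new error type never produces a response without a status.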
| ### 7.5 Rate Limiting | |
| **Strategy**: Token bucket algorithm (rate limiting is still listed as a planned enhancement in 7.3) | |
| **Limits**: | |
| - 100 requests/minute per IP | |
| - 1000 requests/hour per API key | |
| - Burst allowance: 20 requests | |
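A token bucket matching these limits can be sketched as follows. The `TokenBucket` class and its parameter names are illustrative; one bucket per IP or API key is assumed, and the injectable `clock` exists only to make the behaviour easy to test.

```python
import time
from typing import Callable


class TokenBucket:
    """Refill at `rate_per_sec`; allow bursts of up to `capacity` requests."""

    def __init__(
        self,
        rate_per_sec: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate_per_sec = rate_per_sec
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate_per_sec)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

For the per-IP limit above, this would be constructed as `TokenBucket(rate_per_sec=100 / 60, capacity=20)`: the refill rate gives 100 requests per minute on average, and the capacity is the burst allowance.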
| --- | |
| ## 8. Data Flow & Processing | |
| ### 8.1 Document Processing Pipeline | |
| ```mermaid | |
| flowchart LR | |
| A[Upload Document] --> B{File Type?} | |
| B -->|PDF| C[PDF Parser] | |
| B -->|DOCX| D[DOCX Parser] | |
| B -->|Image| E[OCR Engine] | |
| B -->|Audio| F[Whisper Transcription] | |
| C --> G[Text Extraction] | |
| D --> G | |
| E --> G | |
| F --> G | |
| G --> H[PHI Scrubbing] | |
| H --> I[Medical Data Extraction] | |
| I --> J[Store Processed Data] | |
| J --> K[Return Results] | |
| ``` | |
| ### 8.2 Patient Summary Generation Flow | |
| ```mermaid | |
| flowchart TD | |
| A[Patient Data Input] --> B[Parse EHR Data] | |
| B --> C[Convert to Plain Text] | |
| C --> D{Data Size Check} | |
| D -->|Small| E[Single-pass Generation] | |
| D -->|Large| F[Chunking Strategy] | |
| F --> G[Chunk by Date/Size] | |
| G --> H[Process Chunks in Parallel] | |
| H --> I[Combine Chunk Summaries] | |
| E --> J[Generate with Model] | |
| I --> J | |
| J --> K[Format Clinical Output] | |
| K --> L[Evaluate Against Guidelines] | |
| L --> M[Return Summary] | |
| ``` | |
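The size check and chunking branch of this flow can be sketched as below. Everything here is an assumption for illustration: the `render` format, the character budget, and the `summarize_fn` callable that stands in for the model call. Whether parallel chunk processing helps on a single GPU depends on the backend.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def render(visit: Dict) -> str:
    """Convert one visit to plain text (hypothetical format)."""
    return f"{visit['date']}: {visit.get('chief_complaint', '')}"


def chunk_by_date(visits: List[Dict], max_chars: int) -> List[str]:
    """Sort visits chronologically and pack them into chunks under max_chars."""
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for line in (render(v) for v in sorted(visits, key=lambda v: v["date"])):
        if current and size + len(line) > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


def summarize_record(
    visits: List[Dict], summarize_fn: Callable[[str], str], max_chars: int = 4000
) -> str:
    chunks = chunk_by_date(visits, max_chars)
    if len(chunks) <= 1:  # small input: single-pass generation
        return summarize_fn(chunks[0] if chunks else "")
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(summarize_fn, chunks))  # map preserves chunk order
    return summarize_fn("\n".join(partials))  # combine, then final generation
```

Dates are assumed to be ISO 8601 strings, which sort chronologically as plain text.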
| ### 8.3 Data Transformation | |
| **Input Formats**: | |
| - Raw EHR JSON | |
| - HL7 FHIR resources | |
| - Plain text documents | |
| - Scanned images | |
| - Audio recordings | |
| **Output Formats**: | |
| - Structured JSON | |
| - Clinical summary (Markdown) | |
| - FHIR-compliant resources | |
| - Audit logs | |
| ### 8.4 Caching Strategy | |
| **Multi-Level Caching**: | |
| 1. **Model Cache**: Loaded models in memory | |
| 2. **Result Cache**: Generated summaries (LRU) | |
| 3. **File Cache**: Processed documents | |
| 4. **Hugging Face Cache**: Downloaded models | |
| **Cache Invalidation**: | |
| - Time-based expiration | |
| - Manual invalidation | |
| - Memory pressure-based eviction | |
| --- | |
| ## 9. Database Design | |
| ### 9.1 Database Schema | |
| **Primary Database**: PostgreSQL (optional, for audit logs) | |
| #### Audit Logs Table | |
| ```sql | |
| CREATE TABLE audit_logs ( | |
| id SERIAL PRIMARY KEY, | |
| timestamp TIMESTAMP NOT NULL DEFAULT NOW(), | |
| user_id VARCHAR(255), | |
| action VARCHAR(100) NOT NULL, | |
| resource_type VARCHAR(100), | |
| resource_id VARCHAR(255), | |
| phi_accessed BOOLEAN DEFAULT FALSE, | |
| ip_address INET, | |
| user_agent TEXT, | |
| request_data JSONB, | |
| response_status INTEGER, | |
| error_message TEXT, | |
| created_at TIMESTAMP DEFAULT NOW() | |
| ); | |
| CREATE INDEX idx_audit_timestamp ON audit_logs(timestamp); | |
| CREATE INDEX idx_audit_user ON audit_logs(user_id); | |
| CREATE INDEX idx_audit_action ON audit_logs(action); | |
| CREATE INDEX idx_audit_phi ON audit_logs(phi_accessed); | |
| ``` | |
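Writing to this table from Python could be sketched as follows. The helper builds a parameterized `INSERT` for a DB-API driver that uses `%s` placeholders (the style used by psycopg); the function name `build_audit_insert` is hypothetical, and this is not the code in `database_audit.py`.

```python
import json
from typing import Any, Dict, List, Tuple

# Columns a caller may set; id, timestamp, and created_at use their defaults.
AUDIT_COLUMNS = (
    "user_id", "action", "resource_type", "resource_id", "phi_accessed",
    "ip_address", "user_agent", "request_data", "response_status", "error_message",
)


def build_audit_insert(entry: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Return (sql, params) for one audit row; values are never interpolated into SQL."""
    if not entry.get("action"):
        raise ValueError("action is required (NOT NULL column)")
    unknown = set(entry) - set(AUDIT_COLUMNS)
    if unknown:
        raise ValueError(f"unknown audit columns: {sorted(unknown)}")
    columns = [c for c in AUDIT_COLUMNS if c in entry]
    params = [json.dumps(entry[c]) if c == "request_data" else entry[c] for c in columns]
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO audit_logs ({', '.join(columns)}) VALUES ({placeholders})"
    return sql, params
```

The result would be passed to `cursor.execute(sql, params)`. Column names come only from the fixed whitelist, so building the column list with string formatting does not open an injection path.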
| ### 9.2 Data Models | |
| **Patient Data Model** (In-memory): | |
| ```python | |
| { | |
| "patient_id": "string", | |
| "demographics": { | |
| "name": "string", | |
| "dob": "date", | |
| "gender": "string", | |
| "mrn": "string" | |
| }, | |
| "visits": [ | |
| { | |
| "visit_id": "string", | |
| "date": "datetime", | |
| "chief_complaint": "string", | |
| "diagnoses": ["string"], | |
| "medications": ["string"], | |
| "procedures": ["string"], | |
| "vitals": {}, | |
| "labs": [] | |
| } | |
| ] | |
| } | |
| ``` | |
| ### 9.3 File Storage | |
| **Storage Strategy**: Local filesystem or cloud storage | |
| **Directory Structure**: | |
| ``` | |
| /data/ | |
| ├── uploads/ # Uploaded documents | |
| ├── processed/ # Processed documents | |
| ├── cache/ # Temporary cache | |
| └── models/ # Model files | |
| ``` | |
| --- | |
| ## 10. Security Architecture | |
| ### 10.1 HIPAA Compliance | |
| **Requirements Met**: | |
| 1. **Access Controls**: Authentication and authorization | |
| 2. **Audit Logging**: Comprehensive activity logs | |
| 3. **Data Encryption**: In-transit and at-rest | |
| 4. **PHI Scrubbing**: Automated anonymization | |
| 5. **Secure Communication**: HTTPS/TLS | |
| ### 10.2 PHI Scrubbing | |
| **Scrubbing Patterns**: | |
| ```python | |
| PATTERNS = { | |
| "name": r'\b[A-Z][a-z]+ [A-Z][a-z]+\b', | |
| "mrn": r'\bMRN[:\s]*\d{6,10}\b', | |
| "dob": r'\b\d{1,2}/\d{1,2}/\d{2,4}\b', | |
| "phone": r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b', | |
| "email": r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', | |
| "ssn": r'\b\d{3}-\d{2}-\d{4}\b' | |
| } | |
| ``` | |
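Applying patterns like these could be sketched as below. The `scrub_phi` function and the `[REDACTED_<TYPE>]` placeholder format are assumptions, not the behaviour of `phi_scrubber.py`. The name pattern is left out of this sketch because two capitalized words also match ordinary phrases, and the email pattern uses `[A-Za-z]` for the top-level domain.

```python
import re
from typing import Dict, Tuple

PATTERNS = {
    "mrn": r"\bMRN[:\s]*\d{6,10}\b",
    "dob": r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    "phone": r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b",
    "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
}


def scrub_phi(text: str) -> Tuple[str, Dict[str, int]]:
    """Replace each match with a typed placeholder and count replacements per type."""
    counts: Dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, replaced = re.subn(pattern, f"[REDACTED_{label.upper()}]", text)
        if replaced:
            counts[label] = replaced
    return text, counts
```

The per-type counts can feed the audit log without storing the removed values themselves. Regex scrubbing of this kind misses free-text identifiers, so it is a baseline rather than a complete de-identification method.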
| ### 10.3 Container Security | |
| **Security Measures**: | |
| - Non-root user execution | |
| - Read-only root filesystem | |
| - Resource limits (CPU, memory) | |
| - Network policies | |
| - Secrets management | |
| - Minimal base images | |
| ### 10.4 API Security | |
| **Security Headers**: | |
| ```python | |
| { | |
| "X-Content-Type-Options": "nosniff", | |
| "X-Frame-Options": "DENY", | |
| "X-XSS-Protection": "1; mode=block", | |
| "Strict-Transport-Security": "max-age=31536000" | |
| } | |
| ``` | |
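One way to attach these headers to every response is a small ASGI middleware, sketched below with no framework imports. The class name `SecurityHeadersMiddleware` is hypothetical and the source does not say how the headers are applied; a pure ASGI class like this can be registered on a FastAPI app with `app.add_middleware(SecurityHeadersMiddleware)`.

```python
SECURITY_HEADERS = {
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "X-XSS-Protection": "1; mode=block",
    "Strict-Transport-Security": "max-age=31536000",
}


class SecurityHeadersMiddleware:
    """Pure ASGI middleware that appends security headers to HTTP responses."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        async def send_with_headers(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                present = {name.lower() for name, _ in headers}
                for name, value in SECURITY_HEADERS.items():
                    key = name.lower().encode("latin-1")  # ASGI header names are lowercase bytes
                    if key not in present:  # do not override headers set by the route
                        headers.append((key, value.encode("latin-1")))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_headers)
```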
| --- | |
| ## 11. Deployment Architecture | |
| ### 11.1 Deployment Options | |
| #### 11.1.1 Hugging Face Spaces | |
| **Configuration**: `.huggingface.yaml` | |
| ```yaml | |
| runtime: docker | |
| sdk: docker | |
| python_version: "3.10" | |
| build: | |
|   dockerfile: Dockerfile.hf-spaces | |
|   cache: true | |
| hardware: | |
|   gpu: t4-medium # 16GB GPU RAM, 16GB System RAM | |
| env: | |
|   - SPACE_ID=$SPACE_ID | |
|   - HF_HOME=/app/.cache/huggingface | |
|   - TORCH_HOME=/app/.cache/torch | |
|   - MODEL_CACHE_DIR=/app/models | |
|   - PRELOAD_GGUF=true | |
|   - HF_SPACES=true | |
| ``` | |
| **Optimizations**: | |
| - Pre-cached models in Docker image | |
| - Lazy model loading | |
| - Memory-efficient inference | |
| - Automatic GPU detection | |
| #### 11.1.2 Kubernetes | |
| **Deployment Manifest**: | |
| ```yaml | |
| apiVersion: apps/v1 | |
| kind: Deployment | |
| metadata: | |
|   name: hntai-deployment | |
| spec: | |
|   replicas: 3 | |
|   selector: | |
|     matchLabels: | |
|       app: hntai | |
|   template: | |
|     metadata: | |
|       labels: | |
|         app: hntai | |
|     spec: | |
|       containers: | |
|         - name: hntai | |
|           image: hntai:latest | |
|           ports: | |
|             - containerPort: 7860 | |
|           resources: | |
|             requests: | |
|               memory: "4Gi" | |
|               cpu: "2" | |
|             limits: | |
|               memory: "8Gi" | |
|               cpu: "4" | |
|           livenessProbe: | |
|             httpGet: | |
|               path: /health/live | |
|               port: 7860 | |
|             initialDelaySeconds: 30 | |
|             periodSeconds: 10 | |
|           readinessProbe: | |
|             httpGet: | |
|               path: /health/ready | |
|               port: 7860 | |
|             initialDelaySeconds: 10 | |
|             periodSeconds: 5 | |
| ``` | |
| #### 11.1.3 Docker | |
| **Multi-Stage Dockerfile**: | |
| ```dockerfile | |
| # Stage 1: Builder | |
| FROM python:3.10-slim AS builder | |
| RUN apt-get update && apt-get install -y build-essential | |
| COPY requirements.txt . | |
| RUN pip install --prefix=/install -r requirements.txt | |
| # Stage 2: Runtime | |
| FROM python:3.10-slim AS runtime | |
| COPY --from=builder /install /usr/local | |
| WORKDIR /app | |
| COPY . . | |
| ENV PYTHONUNBUFFERED=1 | |
| EXPOSE 7860 | |
| CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"] | |
| ``` | |
| ### 11.2 Scaling Strategy | |
| **Horizontal Scaling**: | |
| - Multiple replicas behind load balancer | |
| - Stateless design for easy scaling | |
| - Shared model cache (optional) | |
| **Vertical Scaling**: | |
| - Increase CPU/memory per instance | |
| - GPU acceleration for inference | |
| - Larger model support | |
| ### 11.3 High Availability | |
| **Components**: | |
| 1. **Load Balancer**: Distribute traffic | |
| 2. **Health Checks**: Automatic failover | |
| 3. **Auto-scaling**: Based on CPU/memory | |
| 4. **Graceful Shutdown**: Drain connections | |
| --- | |
| ## 12. Performance Optimization | |
| ### 12.1 Model Optimization | |
| **Techniques**: | |
| 1. **Quantization**: GGUF Q4 models (4-bit) | |
| 2. **Precision**: FP16 for GPU inference | |
| 3. **Batching**: Batch size optimization | |
| 4. **Caching**: Model and result caching | |
| 5. **Lazy Loading**: On-demand model loading | |
| ### 12.2 Memory Management | |
| **Strategies**: | |
| - Automatic garbage collection | |
| - GPU memory clearing | |
| - Model unloading (LRU) | |
| - Memory pressure monitoring | |
| **Memory Limits**: | |
| - T4 Medium: 16GB GPU, 16GB RAM | |
| - Max 2 models in memory | |
| - Automatic eviction at 80% usage | |
| ### 12.3 Inference Optimization | |
| **T4-Specific Optimizations**: | |
| ```python | |
| { | |
| "max_models": 2, | |
| "max_memory_mb": 14000, | |
| "n_ctx": 8192, | |
| "n_threads": 4, | |
| "n_gpu_layers": 35, | |
| "torch_dtype": "float16", | |
| "device_map": "auto" | |
| } | |
| ``` | |
| ### 12.4 Caching Strategy | |
| **Cache Hierarchy**: | |
| 1. **L1 - Model Cache**: In-memory loaded models | |
| 2. **L2 - Result Cache**: Generated summaries (LRU, 100 items) | |
| 3. **L3 - File Cache**: Processed documents (disk) | |
| 4. **L4 - HF Cache**: Downloaded models (disk) | |
| ### 12.5 Performance Metrics | |
| **Target Metrics**: | |
| - Model load time: < 10 seconds | |
| - Summary generation: < 60 seconds (small), < 180 seconds (large) | |
| - API response time: < 100ms (excluding generation) | |
| - Memory usage: < 80% of available | |
| - GPU utilization: > 70% during inference | |
| --- | |
| ## 13. Monitoring & Observability | |
| ### 13.1 Health Checks | |
| **Liveness Probe** (`/health/live`): | |
| ```python | |
| { | |
| "status": "alive", | |
| "timestamp": "2025-12-05T17:23:52Z" | |
| } | |
| ``` | |
| **Readiness Probe** (`/health/ready`): | |
| ```python | |
| { | |
| "status": "ready", | |
| "checks": { | |
| "database": "ok", | |
| "model_manager": "ok", | |
| "file_storage": "ok" | |
| }, | |
| "timestamp": "2025-12-05T17:23:52Z" | |
| } | |
| ``` | |
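The readiness payload above could be assembled from individual dependency checks, as in this sketch. The `readiness` function, the `"not_ready"` status value, and the use of `503` for a failed check are assumptions; the check names mirror the example response.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Tuple


def readiness(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Run each check; report 200 only if every dependency is healthy."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failed"
        except Exception as exc:  # a crashing check counts as not ready
            results[name] = f"error: {exc}"
    ready = all(value == "ok" for value in results.values())
    body = {
        "status": "ready" if ready else "not_ready",
        "checks": results,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return (200 if ready else 503), body
```

Returning a non-2xx status when a check fails is what lets a Kubernetes readiness probe take the pod out of rotation.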
| ### 13.2 Metrics | |
| **Prometheus Metrics** (`/metrics`): | |
| ``` | |
| # Model metrics | |
| model_load_time_seconds{model_name="phi-3-gguf"} 8.5 | |
| model_inference_time_seconds{model_name="phi-3-gguf"} 45.2 | |
| model_memory_usage_bytes{model_name="phi-3-gguf"} 4294967296 | |
| # API metrics | |
| http_requests_total{method="POST",endpoint="/generate_patient_summary"} 1234 | |
| http_request_duration_seconds{method="POST",endpoint="/generate_patient_summary"} 52.3 | |
| # System metrics | |
| memory_usage_percent 65.2 | |
| gpu_memory_usage_percent 72.1 | |
| cpu_usage_percent 45.8 | |
| ``` | |
| ### 13.3 Logging | |
| **Structured Logging**: | |
| ```python | |
| { | |
| "timestamp": "2025-12-05T17:23:52Z", | |
| "level": "INFO", | |
| "logger": "ai_med_extract.agents.patient_summary_agent", | |
| "message": "Generated patient summary", | |
| "context": { | |
| "job_id": "abc123", | |
| "model_name": "phi-3-gguf", | |
| "duration_seconds": 45.2, | |
| "token_count": 2048 | |
| } | |
| } | |
| ``` | |
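Log entries in this shape can be produced with the standard `logging` module and a custom formatter. This is a sketch: the `JsonFormatter` class is hypothetical, and passing the context through `extra={"context": ...}` is an assumed convention rather than the project's confirmed approach.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with the fields shown above."""

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "context": getattr(record, "context", {}),  # set via extra={"context": {...}}
        }
        return json.dumps(entry)


def configure_logging(level: int = logging.INFO) -> None:
    """Attach the JSON formatter to the root logger's stream handler."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logging.basicConfig(level=level, handlers=[handler], force=True)
```

A call such as `logger.info("Generated patient summary", extra={"context": {"job_id": "abc123"}})` then emits a single JSON line. Context values should never contain PHI, since application logs are usually retained outside the audit table.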
| **Log Levels**: | |
| - `DEBUG`: Detailed diagnostic information | |
| - `INFO`: General informational messages | |
| - `WARNING`: Warning messages | |
| - `ERROR`: Error messages | |
| - `CRITICAL`: Critical failures | |
| ### 13.4 Audit Logging | |
| **HIPAA Audit Trail**: | |
| ```python | |
| { | |
| "timestamp": "2025-12-05T17:23:52Z", | |
| "user_id": "user123", | |
| "action": "PHI_ACCESS", | |
| "resource_type": "patient_summary", | |
| "resource_id": "patient456", | |
| "phi_accessed": True, | |
| "ip_address": "192.168.1.100", | |
| "user_agent": "Mozilla/5.0...", | |
| "request_data": {...}, | |
| "response_status": 200 | |
| } | |
| ``` | |
| --- | |
| ## 14. Development Workflow | |
| ### 14.1 Local Development | |
| **Setup**: | |
| ```bash | |
| # Clone repository | |
| git clone <repository-url> | |
| cd HNTAI | |
| # Create virtual environment | |
| python -m venv venv | |
| source venv/bin/activate # Windows: venv\Scripts\activate | |
| # Install dependencies | |
| pip install -r requirements.txt | |
| # Set environment variables | |
| export DATABASE_URL="postgresql://user:pass@localhost:5432/hntai" | |
| export SECRET_KEY="your-secret-key" | |
| export HF_HOME="/tmp/huggingface" | |
| # Run development server | |
| cd services/ai-service/src | |
| python -m ai_med_extract.app run_dev | |
| ``` | |
| ### 14.2 Testing | |
| **Test Structure**: | |
| ``` | |
| tests/ | |
| ├── unit/ | |
| │ ├── test_agents.py | |
| │ ├── test_model_manager.py | |
| │ └── test_utils.py | |
| ├── integration/ | |
| │ ├── test_api.py | |
| │ └── test_workflows.py | |
| └── conftest.py | |
| ``` | |
| **Running Tests**: | |
| ```bash | |
| # Unit tests | |
| python -m pytest tests/unit/ | |
| # Integration tests | |
| python -m pytest tests/integration/ | |
| # Coverage report | |
| python -m pytest --cov=ai_med_extract tests/ | |
| ``` | |
| ### 14.3 Code Quality | |
| **Tools**: | |
| ```bash | |
| # Format code | |
| black . | |
| isort . | |
| # Lint code | |
| flake8 . | |
| # Type checking | |
| mypy services/ai-service/src/ai_med_extract/ | |
| ``` | |
| ### 14.4 Git Workflow | |
| **Branching Strategy**: | |
| - `main`: Production-ready code | |
| - `develop`: Integration branch | |
| - `feature/*`: Feature branches | |
| - `bugfix/*`: Bug fix branches | |
| - `hotfix/*`: Production hotfixes | |
| **Commit Convention**: | |
| ``` | |
| <type>(<scope>): <subject> | |
| <body> | |
| <footer> | |
| ``` | |
| Types: `feat`, `fix`, `docs`, `style`, `refactor`, `test`, `chore` | |
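The convention above can be checked mechanically, for example in a commit hook. The sketch below validates only the subject line and assumes that the scope is mandatory, as the template suggests; the `is_valid_subject` helper is hypothetical and not an existing project tool.

```python
import re

COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# <type>(<scope>): <subject>, with a non-empty scope and subject.
SUBJECT_RE = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"\((?P<scope>[^()\s]+)\): (?P<subject>\S.*)$"
)


def is_valid_subject(line: str) -> bool:
    """Return True if the first line of a commit message follows the convention."""
    return SUBJECT_RE.match(line) is not None


print(is_valid_subject("feat(api): add job polling endpoint"))
print(is_valid_subject("update stuff"))
```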
| --- | |
| ## 15. Integration Patterns | |
| ### 15.1 External System Integration | |
| **Integration Points**: | |
| 1. **EHR Systems**: HL7, FHIR APIs | |
| 2. **Document Management**: File uploads, cloud storage | |
| 3. **Authentication**: OAuth2, SAML | |
| 4. **Monitoring**: Prometheus, Grafana | |
| 5. **Logging**: ELK Stack, CloudWatch | |
| ### 15.2 API Integration | |
| **Client Libraries** (Planned): | |
| - Python SDK | |
| - JavaScript SDK | |
| - REST API documentation (OpenAPI/Swagger) | |
| **Example Integration**: | |
| ```python | |
| import time | |
| import requests | |
| BASE_URL = "https://api.hntai.com" | |
| HEADERS = {"Authorization": "Bearer <token>"} | |
| # Upload document (the context manager closes the file handle) | |
| with open("document.pdf", "rb") as f: | |
|     response = requests.post(f"{BASE_URL}/upload", files={"file": f}, headers=HEADERS) | |
| response.raise_for_status() | |
| # Generate patient summary | |
| response = requests.post( | |
|     f"{BASE_URL}/generate_patient_summary", | |
|     json={ | |
|         "patient_data": {...},  # placeholder: supply the patient record here | |
|         "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf", | |
|         "model_type": "gguf", | |
|     }, | |
|     headers=HEADERS, | |
| ) | |
| response.raise_for_status() | |
| job_id = response.json()["job_id"] | |
| # Poll for results; the 10-minute limit is an arbitrary guard against looping forever | |
| deadline = time.monotonic() + 600 | |
| while time.monotonic() < deadline: | |
|     response = requests.get(f"{BASE_URL}/job/{job_id}", headers=HEADERS) | |
|     response.raise_for_status() | |
|     if response.json()["status"] == "completed": | |
|         break | |
|     time.sleep(5) | |
| else: | |
|     raise TimeoutError(f"Job {job_id} did not complete in time") | |
| ``` | |
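The fixed-interval polling loop above can be generalized into a small helper with exponential backoff and an overall timeout. This is a sketch under stated assumptions: `poll_until` is a hypothetical helper, the delay values are arbitrary, and the `queued` and `running` statuses in the demo are invented for illustration (only `completed` appears in this document).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    timeout: float = 300.0,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call fetch() until is_done(result) is true, doubling the wait each time."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        result = fetch()
        if is_done(result):
            return result
        # Give up if the next wait would run past the deadline.
        if clock() + delay > deadline:
            raise TimeoutError(f"job did not finish within {timeout} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay)


# Demo with a fake status source instead of a real HTTP call.
statuses = iter(["queued", "running", "completed"])
final = poll_until(
    fetch=lambda: {"status": next(statuses)},
    is_done=lambda body: body["status"] == "completed",
    sleep=lambda _seconds: None,  # skip real waiting in the demo
)
print(final)
```

In a real client, `fetch` would wrap the `GET /job/{job_id}` request. Injecting `sleep` and `clock` keeps the helper testable without actual waiting.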
| ### 15.3 Webhook Support | |
| **Planned Feature**: Webhook notifications for job completion | |
| ```json | |
| { | |
| "event": "job.completed", | |
| "job_id": "abc123", | |
| "timestamp": "2025-12-05T17:23:52Z", | |
| "data": { | |
| "status": "completed", | |
| "result": {...} | |
| } | |
| } | |
| ``` | |
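Since webhooks are still a planned feature, the following is only a sketch of how a receiver might validate a payload of the shape shown above. The `JobEvent` dataclass and `parse_job_event` function are hypothetical, and the set of required fields is inferred from the sample payload rather than from a published contract.

```python
import json
from dataclasses import dataclass
from typing import Any

REQUIRED_FIELDS = frozenset({"event", "job_id", "timestamp", "data"})


@dataclass(frozen=True)
class JobEvent:
    event: str
    job_id: str
    timestamp: str
    status: str
    result: Any


def parse_job_event(raw: bytes) -> JobEvent:
    """Parse and validate a webhook body, raising ValueError on malformed input."""
    body = json.loads(raw)
    if not isinstance(body, dict):
        raise ValueError("payload must be a JSON object")
    missing = REQUIRED_FIELDS - set(body)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    data = body["data"]
    if not isinstance(data, dict) or "status" not in data:
        raise ValueError("data.status is required")
    return JobEvent(
        event=body["event"],
        job_id=body["job_id"],
        timestamp=body["timestamp"],
        status=data["status"],
        result=data.get("result"),
    )


sample = json.dumps({
    "event": "job.completed",
    "job_id": "abc123",
    "timestamp": "2025-12-05T17:23:52Z",
    "data": {"status": "completed", "result": {"summary": "example"}},
}).encode("utf-8")
print(parse_job_event(sample))
```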
| --- | |
| ## 16. Scalability Considerations | |
| ### 16.1 Horizontal Scaling | |
| **Strategies**: | |
| 1. **Stateless Design**: No session state in application | |
| 2. **Load Balancing**: Distribute requests across instances | |
| 3. **Shared Cache**: Redis for distributed caching | |
| 4. **Message Queue**: RabbitMQ/Kafka for async processing | |
| ### 16.2 Vertical Scaling | |
| **Resource Scaling**: | |
| - CPU: 2-8 cores per instance | |
| - Memory: 8-32 GB per instance | |
| - GPU: T4, V100, A100 for inference | |
| ### 16.3 Database Scaling | |
| **Strategies**: | |
| 1. **Read Replicas**: For audit log queries | |
| 2. **Partitioning**: Time-based partitioning for logs | |
| 3. **Indexing**: Optimize query performance | |
| 4. **Archiving**: Move old logs to cold storage | |
| ### 16.4 Model Serving | |
| **Scaling Options**: | |
| 1. **Model Replication**: Same model on multiple instances | |
| 2. **Model Sharding**: Different models on different instances | |
| 3. **Model Versioning**: A/B testing with multiple versions | |
| 4. **Dedicated Inference**: Separate inference service | |
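For the A/B testing option, one common approach is to route each request to a model version deterministically, based on a hash of a stable key. The sketch below illustrates that idea only; `pick_model_version`, the version names, and the traffic weights are assumptions and do not describe an existing HNTAI component.

```python
import hashlib
from typing import Dict


def pick_model_version(request_key: str, weights: Dict[str, float]) -> str:
    """Map a stable key to a model version in proportion to the given weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a point in [0, 1).
    point = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for version, weight in weights.items():
        cumulative += weight / total
        if point < cumulative:
            return version
    return version  # guards against floating-point rounding at the top edge


# Hypothetical split: 90% of keys go to "v1", 10% to "v2".
print(pick_model_version("job-abc123", {"v1": 0.9, "v2": 0.1}))
```

Because the choice depends only on the key, the same job or user always reaches the same version, which keeps comparisons between versions consistent across retries.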
| --- | |
| ## 17. Future Roadmap | |
| ### 17.1 Short-Term (3-6 months) | |
| 1. **Enhanced Model Support**: | |
| - Support for Llama 3, Mistral models | |
| - Fine-tuned medical models | |
| - Multi-modal models (text + images) | |
| 2. **Improved Performance**: | |
| - Model quantization (INT8, INT4) | |
| - Batch inference support | |
| - Streaming responses | |
| 3. **Additional Features**: | |
| - Real-time collaboration | |
| - Version control for summaries | |
| - Template-based summaries | |
| ### 17.2 Medium-Term (6-12 months) | |
| 1. **Advanced AI Capabilities**: | |
| - Multi-agent orchestration | |
| - Retrieval-Augmented Generation (RAG) | |
| - Knowledge graph integration | |
| 2. **Enterprise Features**: | |
| - Multi-tenancy support | |
| - Advanced RBAC | |
| - SSO integration | |
| - Compliance reporting | |
| 3. **Platform Enhancements**: | |
| - Web UI for management | |
| - Mobile app support | |
| - Plugin architecture | |
| ### 17.3 Long-Term (12+ months) | |
| 1. **AI/ML Advancements**: | |
| - Custom model training pipeline | |
| - Federated learning support | |
| - Explainable AI (XAI) | |
| 2. **Ecosystem Integration**: | |
| - FHIR server integration | |
| - HL7 v3 support | |
| - DICOM image analysis | |
| 3. **Global Expansion**: | |
| - Multi-language support | |
| - Regional compliance (GDPR, etc.) | |
| - Edge deployment | |
| --- | |
| ## Appendix A: Configuration Reference | |
| ### Environment Variables | |
| | Variable | Description | Default | Required | | |
| |----------|-------------|---------|----------| | |
| | `DATABASE_URL` | PostgreSQL connection string | - | No | | |
| | `SECRET_KEY` | Application secret key | - | Yes | | |
| | `JWT_SECRET_KEY` | JWT signing key | - | Yes | | |
| | `HF_HOME` | Hugging Face cache directory | `/tmp/huggingface` | No | | |
| | `TORCH_HOME` | PyTorch cache directory | `/tmp/torch` | No | | |
| | `WHISPER_CACHE` | Whisper model cache | `/tmp/whisper` | No | | |
| | `HF_SPACES` | Hugging Face Spaces mode | `false` | No | | |
| | `PRELOAD_GGUF` | Preload GGUF models | `false` | No | | |
| | `MAX_NEW_TOKENS` | Max output tokens | `8192` | No | | |
| | `MAX_INPUT_TOKENS` | Max input tokens | `2048` | No | | |
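A loader for the variables in the table above might look like the following sketch. The `Settings` dataclass and `load_settings` function are hypothetical, and the accepted spellings for boolean values are an assumption; defaults and required flags are taken from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    # Assumed truthy spellings; the document does not specify them.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    secret_key: str
    jwt_secret_key: str
    database_url: Optional[str]
    hf_home: str
    torch_home: str
    whisper_cache: str
    hf_spaces: bool
    preload_gguf: bool
    max_new_tokens: int
    max_input_tokens: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping, failing fast if a required key is missing."""
    missing = [key for key in ("SECRET_KEY", "JWT_SECRET_KEY") if not env.get(key)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return Settings(
        secret_key=env["SECRET_KEY"],
        jwt_secret_key=env["JWT_SECRET_KEY"],
        database_url=env.get("DATABASE_URL"),
        hf_home=env.get("HF_HOME", "/tmp/huggingface"),
        torch_home=env.get("TORCH_HOME", "/tmp/torch"),
        whisper_cache=env.get("WHISPER_CACHE", "/tmp/whisper"),
        hf_spaces=_as_bool(env.get("HF_SPACES", "false")),
        preload_gguf=_as_bool(env.get("PRELOAD_GGUF", "false")),
        max_new_tokens=int(env.get("MAX_NEW_TOKENS", "8192")),
        max_input_tokens=int(env.get("MAX_INPUT_TOKENS", "2048")),
    )


# Demo with an explicit mapping so the example does not depend on the real environment.
demo = load_settings({"SECRET_KEY": "dev-secret", "JWT_SECRET_KEY": "dev-jwt"})
print(demo.max_new_tokens, demo.hf_home)
```

Passing the environment as a parameter keeps the loader easy to test and makes a missing required key fail at startup rather than at first use.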
| --- | |
| ## Appendix B: API Reference | |
| ### Complete Endpoint List | |
| | Method | Endpoint | Description | | |
| |--------|----------|-------------| | |
| | `GET` | `/` | Root endpoint | | |
| | `GET` | `/health/live` | Liveness probe | | |
| | `GET` | `/health/ready` | Readiness probe | | |
| | `GET` | `/metrics` | Prometheus metrics | | |
| | `POST` | `/upload` | Upload document | | |
| | `POST` | `/transcribe` | Transcribe audio | | |
| | `POST` | `/generate_patient_summary` | Generate patient summary | | |
| | `POST` | `/api/generate_summary` | Generate text summary | | |
| | `POST` | `/api/patient_summary_openvino` | OpenVINO summary | | |
| | `POST` | `/extract_medical_data` | Extract medical data | | |
| | `GET` | `/get_updated_medical_data` | Get processed data | | |
| | `PUT` | `/update_medical_data` | Update medical data | | |
| | `POST` | `/api/load_model` | Load model | | |
| | `GET` | `/api/model_info` | Get model info | | |
| | `POST` | `/api/switch_model` | Switch model | | |
| --- | |
| ## Appendix C: Troubleshooting Guide | |
| ### Common Issues | |
| #### Model Loading Failures | |
| **Symptom**: Model fails to load | |
| **Causes**: | |
| - Insufficient memory | |
| - Missing dependencies | |
| - Network issues (download) | |
| **Solutions**: | |
| 1. Check memory availability | |
| 2. Verify that all dependencies are installed | |
| 3. Check network connectivity | |
| 4. Use fallback model | |
| #### Token Limit Errors | |
| **Symptom**: "Input exceeds token limit" | |
| **Causes**: | |
| - Input too long | |
| - Model context window exceeded | |
| **Solutions**: | |
| 1. Reduce input size | |
| 2. Use chunking strategy | |
| 3. Switch to a model with a larger context window | |
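The chunking strategy mentioned above can be sketched as a sliding window over tokens with optional overlap. This is a minimal illustration, not HNTAI's implementation: `chunk_tokens` is a hypothetical helper, and whitespace-separated words stand in for real tokens, whose count depends on the model's tokenizer.

```python
from typing import List


def chunk_tokens(tokens: List[str], max_tokens: int, overlap: int = 0) -> List[List[str]]:
    """Split tokens into windows of at most max_tokens, sharing `overlap` tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")
    step = max_tokens - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + max_tokens >= len(tokens):
            break
    return chunks


words = "one two three four five six seven".split()
for chunk in chunk_tokens(words, max_tokens=3, overlap=1):
    print(chunk)
```

A small overlap preserves some context across chunk boundaries, at the cost of processing a few tokens twice.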
| #### Performance Issues | |
| **Symptom**: Slow inference | |
| **Causes**: | |
| - CPU-only inference | |
| - Large model size | |
| - Memory pressure | |
| **Solutions**: | |
| 1. Enable GPU acceleration | |
| 2. Use quantized models (GGUF) | |
| 3. Reduce batch size | |
| 4. Clear model cache | |
| --- | |
| ## Appendix D: Glossary | |
| | Term | Definition | | |
| |------|------------| | |
| | **PHI** | Protected Health Information | | |
| | **HIPAA** | Health Insurance Portability and Accountability Act | | |
| | **EHR** | Electronic Health Record | | |
| | **FHIR** | Fast Healthcare Interoperability Resources | | |
| | **HL7** | Health Level Seven (healthcare data exchange standards) | | |
| | **GGUF** | GPT-Generated Unified Format (model file format used by llama.cpp, commonly for quantized models) | | |
| | **OpenVINO** | Open Visual Inference and Neural Network Optimization | | |
| | **T4** | NVIDIA Tesla T4 GPU | | |
| | **LRU** | Least Recently Used (cache eviction) | | |
| | **SSE** | Server-Sent Events | | |
| | **ASGI** | Asynchronous Server Gateway Interface | | |
| --- | |
| ## Document Revision History | |
| | Version | Date | Author | Changes | | |
| |---------|------|--------|---------| | |
| | 1.0 | 2025-12-05 | System | Initial comprehensive documentation | | |
| --- | |
| **End of Technical Architecture Documentation** | |