# AI-Service - Local Run Guide

This file explains the practical ways to run the AI service locally on a developer machine (Windows + PowerShell examples included): the dev server, WSGI/Gunicorn, Docker, Docker Compose, a mock EHR for testing, and quick API calls.

## Repository layout

- services/ai-service/src - application source
- services/ai-service/Dockerfile.prod
- services/ai-service/docker-compose.yml
- services/ai-service/README_DOCKER.md - higher-level Docker/deploy notes

## Ports

- Default app port: 7860

## 1) Quick dev server (fast iteration)

Requirements:

- Python 3.10+
- pip

Start the dev server (does not preload large models):

```powershell
cd .\services\ai-service\src
python -m ai_med_extract.app run_dev
```

This runs Flask's built-in server, which is suitable for development only. To disable model preloads, set the environment variable PRELOAD_SMALL_MODELS to false.

## 2) WSGI / Gunicorn (more production-like)

Gunicorn runs only on Unix-like systems, so on Windows run these commands under WSL or use the Docker options in the later sections.

Install gunicorn:

```powershell
pip install gunicorn
```

Run using the provided `wsgi.py`:

```powershell
cd .\services\ai-service\src
$env:PRELOAD_SMALL_MODELS="false"  # optional
gunicorn -w 4 -b 0.0.0.0:7860 wsgi:app
```

## 3) Docker (single container): build & run

Build image:

```powershell
cd .\services\ai-service
docker build -f Dockerfile.prod -t ai-service:local .
```

Run container:

```powershell
docker run --rm -p 7860:7860 `
  -e PRELOAD_SMALL_MODELS=false `
  -e HF_HOME=/tmp/huggingface `
  -e TORCH_HOME=/tmp/torch_cache `
  ai-service:local
```

Tips:

- Use PRELOAD_SMALL_MODELS=true only if you want the container to load small models during startup. Large GGUF/LLM models may be skipped or may need special images.

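
Once the service has been started by any of the methods in sections 1 to 3, one quick way to confirm that something is listening on the default port 7860 is a plain TCP connect. The following is a minimal sketch using only the Python standard library. The helper name `is_port_open` is hypothetical and not part of the repository, and a successful connect only shows that some process accepted the connection, not that the service itself is healthy.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 7860, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for local checks; 7860 is the default app port from this guide.
    """
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open()` with no arguments checks `127.0.0.1:7860`. Pass a different port if the container or dev server is mapped elsewhere.
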

## 4) Docker Compose (recommended for local integration)

```powershell
cd .\services\ai-service
docker-compose up --build
```

With Docker Compose V2, the equivalent command is `docker compose up --build`.

## 5) Mock EHR server (local testing)

Run this minimal mock EHR to test `generate_patient_summary` without a real EHR.

Create `mock_ehr.py` with the following content:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)


@app.route('/Transactionapi/api/PatientList/patientsummary', methods=['POST'])
def patient_summary():
    # silent=True returns None instead of raising on a missing or invalid JSON body.
    data = request.get_json(silent=True) or {}
    # Read for completeness; the mock returns the same payload for every patient.
    patientid = data.get('patientid')
    example = {
        "result": {
            "chartsummarydtl": [
                {
                    "visit_date": "2025-01-01",
                    "vitals": {"bp": "120/80"},
                    "notes": "Example visit notes"
                }
            ]
        }
    }
    return jsonify(example), 200


if __name__ == '__main__':
    app.run(port=9001)
```

Start the mock:

```powershell
python mock_ehr.py
```

## 6) Calling the API (SSE streaming)

Example: rule-based summary. Run this from a bash-compatible shell such as Git Bash or WSL. In this example, `key` carries the base URL of the mock EHR from section 5.

```bash
curl -N -H "Content-Type: application/json" \
  -X POST "http://localhost:7860/generate_patient_summary?stream=true" \
  -d '{"patientid":"12345","token":"TEST","key":"http://localhost:9001","generation_mode":"rule"}'
```

## 7) Troubleshooting & logs

- If you see model download activity, your dev run preloaded a model. Set PRELOAD_SMALL_MODELS=false to avoid heavy downloads during startup.
- If running on Windows with limited memory, prefer `generation_mode: rule` or `fast` (smaller summarizer models).

## 8) Next steps

- To fully test model-based flows, you will need network access to the HF model hub or local model files.
- For CI/CD, add a GitHub Actions workflow to run lint/tests and build/push the Docker image.
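
The curl call in section 6 can also be made from Python, which is convenient on Windows where bash quoting is not available. The sketch below uses only the standard library and mirrors the same request body. The function names `iter_sse_data` and `stream_summary` are hypothetical. The parser assumes the service emits standard `data:` lines separated by blank lines, and the actual event payload format of this service is not documented here, so treat the yielded strings as opaque.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of raw lines."""
    buffer = []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[5:]
            # The SSE format strips one optional space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and comment lines are ignored here.
    if buffer:
        # Lenient: flush a trailing event even without a final blank line
        # (the SSE specification would discard it).
        yield "\n".join(buffer)


def stream_summary(base_url: str = "http://localhost:7860",
                   ehr_url: str = "http://localhost:9001") -> Iterator[str]:
    """POST the same body as the curl example and yield each event's data."""
    payload = {
        "patientid": "12345",
        "token": "TEST",
        "key": ehr_url,  # assumed to be the mock EHR base URL, as in section 6
        "generation_mode": "rule",
    }
    req = urllib.request.Request(
        f"{base_url}/generate_patient_summary?stream=true",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # Iterating an HTTP response yields one bytes line at a time.
        yield from iter_sse_data(resp)
```

With the service and the mock EHR both running, `for chunk in stream_summary(): print(chunk)` prints each event as it arrives.
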