AI-Service — Local Run Guide

This guide covers the practical ways to run the AI service locally on a developer machine, with Windows and PowerShell examples: the dev server, WSGI/Gunicorn, Docker, Docker Compose, a mock EHR for testing, and quick API calls.

Repository layout
- services/ai-service/src  - application source
- services/ai-service/Dockerfile.prod
- services/ai-service/docker-compose.yml
- services/ai-service/README_DOCKER.md  - higher level Docker/deploy notes

Ports
- Default app port: 7860

1) Quick dev server (fast iteration)

Requirements
- Python 3.10+
- pip

Start dev server (does not preload large models):

```powershell
cd .\services\ai-service\src
python -m ai_med_extract.app run_dev
```

- This runs Flask's built-in server, which is suited to development only. To disable model preloading, set the environment variable `PRELOAD_SMALL_MODELS` to `false` before starting (in PowerShell: `$env:PRELOAD_SMALL_MODELS="false"`).
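
How the service reads `PRELOAD_SMALL_MODELS` is not shown in this guide. The sketch below illustrates one common way to interpret a boolean environment flag, and why the string `"false"` needs explicit handling: any non-empty string is truthy in Python. The helper name `env_flag` and the accepted spellings are assumptions for illustration, not the service's actual code.

```python
import os

# Hypothetical helper; not part of ai_med_extract.
# The set of accepted "true" spellings is an assumption.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Return True only if the variable is set to a recognised truthy spelling."""
    raw = os.environ.get(name)
    if raw is None:
        # Variable not set at all: fall back to the caller's default.
        return default
    return raw.strip().lower() in _TRUE_VALUES
```

With this approach, `PRELOAD_SMALL_MODELS=false`, `0`, or an empty value all disable preloading, whereas a plain `bool(os.environ.get(...))` would treat `"false"` as enabled.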

2) WSGI / Gunicorn (more production-like)

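Install gunicorn. Gunicorn runs only on Unix-like systems, so on Windows run this section inside WSL (adapting the PowerShell syntax to your shell) or use the Docker options in sections 3 and 4: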

```powershell
pip install gunicorn
```

Run using the provided `wsgi.py`:

```powershell
cd .\services\ai-service\src
$env:PRELOAD_SMALL_MODELS="false"   # optional
gunicorn -w 4 -b 0.0.0.0:7860 wsgi:app
```

3) Docker (single container) — build & run

Build image:

```powershell
cd .\services\ai-service
docker build -f Dockerfile.prod -t ai-service:local .
```

Run container:

```powershell
docker run --rm -p 7860:7860 `
  -e PRELOAD_SMALL_MODELS=false `
  -e HF_HOME=/tmp/huggingface `
  -e TORCH_HOME=/tmp/torch_cache `
  ai-service:local
```

Tips
- Use PRELOAD_SMALL_MODELS=true only if you want the container to load small models during startup. Large GGUF/LLM models might be skipped or need special images.
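
With model preloading enabled, the container may need some time after `docker run` before it answers requests. The sketch below polls the service over HTTP until any response comes back. The helper name `wait_for_http` and the choice of polling the root URL are assumptions: this guide does not document a health endpoint, so any HTTP status (including 404) is treated as a sign that the app is up.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Poll url until the server returns any HTTP response or the timeout expires."""
    # Empty ProxyHandler: talk to the local service directly, ignoring proxy env vars.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # A 4xx/5xx status still means the app answered the request.
            return True
        except OSError:
            # Connection refused, reset, or timed out: not ready yet, retry.
            time.sleep(interval)
    return False
```

For example, `wait_for_http("http://localhost:7860/")` returns `True` once the service responds and `False` if the timeout expires first. An HTTP-level check is used rather than a bare TCP connect because a published Docker port can accept connections before the app inside the container is listening.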

4) Docker Compose (recommended for local integration)

```powershell
cd .\services\ai-service
docker-compose up --build
```

5) Mock EHR server (local testing)

Use this minimal mock EHR to test `generate_patient_summary` without a real EHR. Create `mock_ehr.py` with the following content:

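```python
from flask import Flask, request, jsonify

app = Flask(__name__)


@app.route('/Transactionapi/api/PatientList/patientsummary', methods=['POST'])
def patient_summary():
    # silent=True returns None instead of raising on a missing or invalid JSON body
    data = request.get_json(silent=True) or {}
    patientid = data.get('patientid')
    print(f"mock EHR: patientsummary requested for patientid={patientid!r}")
    # Static example payload; the same record is returned for every patient
    example = {
        "result": {
            "chartsummarydtl": [
                {
                    "visit_date": "2025-01-01",
                    "vitals": {"bp": "120/80"},
                    "notes": "Example visit notes"
                }
            ]
        }
    }
    return jsonify(example), 200


if __name__ == '__main__':
    # Port 9001 matches the "key" URL passed in the API call example
    app.run(port=9001)
```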

Start mock:

```powershell
python mock_ehr.py
```
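
Before pointing the AI service at the mock, it can help to confirm that the mock answers on its own. The sketch below posts a patient id to the route defined in `mock_ehr.py` and returns the `chartsummarydtl` list. The function name `fetch_chart_summary` is hypothetical, and only the standard library is used so it runs without extra installs.

```python
import json
import urllib.request

# Route and port as defined in mock_ehr.py above.
MOCK_EHR_URL = "http://localhost:9001/Transactionapi/api/PatientList/patientsummary"


def fetch_chart_summary(url: str, patient_id: str) -> list:
    """POST a patient id to the (mock) EHR and return the chartsummarydtl list."""
    body = json.dumps({"patientid": patient_id}).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Empty ProxyHandler: connect to the local mock directly, ignoring proxy env vars.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req, timeout=10) as resp:
        payload = json.load(resp)
    # A KeyError here means the response did not have the expected shape.
    return payload["result"]["chartsummarydtl"]
```

With the mock running, `fetch_chart_summary(MOCK_EHR_URL, "12345")` should return the single example visit defined in `mock_ehr.py`.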

6) Calling the API (SSE streaming)

Example: rule-based summary

```bash
curl -N -H "Content-Type: application/json" \
  -X POST "http://localhost:7860/generate_patient_summary?stream=true" \
  -d '{"patientid":"12345","token":"TEST","key":"http://localhost:9001","generation_mode":"rule"}'
```
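
The same call can be made from Python. The sketch below is a minimal SSE reader built on the standard library: `iter_sse_data` groups `data:` lines into events as the SSE format defines (a blank line ends an event, lines starting with `:` are comments), and `stream_summary` mirrors the curl request above. Both function names are hypothetical, and the content of each event (plain text or JSON) is not specified in this guide, so the payload is yielded as a raw string.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event from text lines."""
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "data":
            data_parts.append(value)
    # An unterminated event at end of stream is discarded.


def stream_summary(base_url: str, payload: dict) -> Iterator[str]:
    """POST to generate_patient_summary with stream=true and yield event payloads."""
    req = urllib.request.Request(
        f"{base_url}/generate_patient_summary?stream=true",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Empty ProxyHandler: connect to the local service directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req) as resp:
        # The response object yields one bytes line at a time as data arrives.
        yield from iter_sse_data(line.decode("utf-8") for line in resp)
```

To reproduce the curl example, iterate over `stream_summary("http://localhost:7860", {"patientid": "12345", "token": "TEST", "key": "http://localhost:9001", "generation_mode": "rule"})` and print each item.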

7) Troubleshooting & logs
- If you see model download activity, the service preloaded a model at startup. Set `PRELOAD_SMALL_MODELS=false` to avoid heavy downloads during startup.
- If running on Windows with limited memory, prefer `generation_mode: rule` or `fast` (smaller summarizer models).

8) Next steps
- To fully test model-based flows you will need network access to the HF model hub or local model files.
- For CI/CD, add a GitHub Actions workflow to run lint/tests and build/push the Docker image.
