AI Service — Docker & Production Deploy

This file explains how to build, run, and deploy the AI service using Docker, Docker Compose, and Kubernetes. It assumes the canonical source lives in `services/ai-service/src`.

1) Build locally (Docker)

Prerequisites
- Docker and Docker Compose installed on your machine.
- Optional: NVIDIA GPU drivers and the NVIDIA Container Toolkit (formerly nvidia-docker) for running GPU-backed containers.

Build image (example tag):

```powershell
cd .\services\ai-service
# Build production image
docker build -f Dockerfile.prod -t ai-service:local .
```

Run container locally (default port 7860):

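```powershell
# Simple run. PowerShell continues lines with a backtick (`), not a backslash.
# The -v mount of your user profile is optional; drop that line if not needed.
docker run --rm -p 7860:7860 `
  -e PRELOAD_SMALL_MODELS=false `
  -e HF_HOME=/tmp/huggingface `
  -e TORCH_HOME=/tmp/torch_cache `
  -v "${env:USERPROFILE}:/host_user_profile:ro" `
  ai-service:local
```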

Notes
- Use `PRELOAD_SMALL_MODELS=false` to avoid heavy model downloads at container start. Set it to true only if you want the container to load small models at startup.
- Provide credentials to external services (EHR) as environment variables or via mounted secrets. Do NOT bake secrets into the image.
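The flag and credential handling described in these notes is easy to get subtly wrong in Python, because `bool("false")` evaluates to `True`. The sketch below shows one way the service might read them. It is not taken from the service source. `EHR_API_TOKEN` and the `NAME_FILE` convention for mounted secret files are hypothetical names chosen for illustration.

```python
import os
from pathlib import Path
from typing import Optional

# Values treated as "on"; anything else (including "false") is "off".
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Parse a boolean flag such as PRELOAD_SMALL_MODELS.

    bool("false") is True in Python, so the string is compared explicitly
    instead of being passed to bool().
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


def read_secret(name: str) -> Optional[str]:
    """Return a credential from NAME, else from the file path in NAME_FILE.

    The NAME_FILE convention is an assumption: it lets a mounted secret file
    (Docker or Kubernetes) stand in for a plain environment variable.
    """
    value = os.environ.get(name)
    if value:
        return value
    path = os.environ.get(f"{name}_FILE")
    if path and Path(path).is_file():
        return Path(path).read_text(encoding="utf-8").strip()
    return None


PRELOAD_SMALL_MODELS = env_flag("PRELOAD_SMALL_MODELS")
EHR_API_TOKEN = read_secret("EHR_API_TOKEN")  # hypothetical variable name
```

Reading the secret from a file path keeps the token out of `docker inspect` output and out of the image layers, which matches the "do not bake secrets into the image" rule.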

2) Docker Compose (local development)

A `docker-compose.yml` is included to make local testing easier. From repository root:

```powershell
cd .\services\ai-service
docker-compose up --build
```

This builds the image and starts the service. Check logs with `docker-compose logs -f`. With Compose v2, use `docker compose` (no hyphen) in place of `docker-compose`.

3) Push to container registry

Tag and push the image to your registry (DockerHub example):

```powershell
cd .\services\ai-service
docker tag ai-service:local mydockerhubuser/ai-service:latest
docker push mydockerhubuser/ai-service:latest
```

In CI, prefer to tag with the commit SHA (`:sha-<short>`) and store registry credentials in CI secrets.
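A commit-SHA tag is immutable, unlike `:latest`, so a deployment always points at a known build. The helper below is a hypothetical sketch of deriving the `sha-<short>` tag in a Python CI step. The 7-character default mirrors the common short-SHA length and is an assumption.

```python
import re

# Accept 7 to 40 lowercase hex characters (abbreviated or full git SHA-1).
_SHA_RE = re.compile(r"[0-9a-f]{7,40}")


def sha_tag(repository: str, commit_sha: str, length: int = 7) -> str:
    """Build an immutable image reference like 'user/ai-service:sha-1a2b3c4'."""
    sha = commit_sha.strip().lower()
    if not _SHA_RE.fullmatch(sha):
        raise ValueError(f"not a git commit SHA: {commit_sha!r}")
    return f"{repository}:sha-{sha[:length]}"
```

In GitHub Actions, for example, the full SHA is available as the `GITHUB_SHA` environment variable, so `sha_tag("mydockerhubuser/ai-service", os.environ["GITHUB_SHA"])` would yield the tag to pass to `docker tag` and `docker push`.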

4) Kubernetes deployment (example)

A simple Kubernetes manifest is included at `services/ai-service/k8s/deployment.yaml`. Important notes:
- Point the `readinessProbe` at `/ready` and the `livenessProbe` at `/live`.
- Mount secrets for tokens (Kubernetes Secret) and configure `env` variables for PRELOAD flags and cache directories.
- For heavier models or GPU usage, create a GPU-enabled node pool and use resource limits/requests and node selectors or tolerations.
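For `httpGet` probes, Kubernetes looks only at the HTTP status: any code from 200 to 399 counts as success. A failing readiness probe removes the pod from Service endpoints without restarting it. A failing liveness probe restarts the container. The sketch below shows that split as a plain WSGI app, with readiness gated on model loading. It is illustrative only: the real service is Flask-based, and `load_models()` and `start_preload()` are hypothetical stand-ins.

```python
import json
import threading

# Set once models are loaded; /ready reports 503 until then.
_models_ready = threading.Event()


def load_models() -> None:
    """Hypothetical loader; replace with the service's real model preload."""
    # ... load weights, warm up caches, etc. ...
    _models_ready.set()


def start_preload() -> threading.Thread:
    """Load models in the background so /live keeps answering meanwhile."""
    thread = threading.Thread(target=load_models, daemon=True)
    thread.start()
    return thread


def app(environ, start_response):
    """WSGI callable serving only the two probe endpoints."""
    path = environ.get("PATH_INFO", "")
    if path == "/live":
        # Liveness: the process is up and able to handle a request.
        status, body = "200 OK", {"status": "alive"}
    elif path == "/ready":
        # Readiness: only accept traffic once models are loaded.
        if _models_ready.is_set():
            status, body = "200 OK", {"status": "ready"}
        else:
            status, body = "503 Service Unavailable", {"status": "loading"}
    else:
        status, body = "404 Not Found", {"error": "not found"}
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]
```

Loading in a background thread keeps `/live` answering during a long preload. Without it, a slow model load could trip the liveness probe and put the pod into a restart loop.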

Quick kubectl apply (example):

```powershell
kubectl apply -f services/ai-service/k8s/deployment.yaml
kubectl apply -f services/ai-service/k8s/service.yaml  # if present
```

5) CI/CD recommendations (high level)

- Build & test in CI: run ruff/flake8, pytest, and basic import-sanity checks.
- Build image in CI with commit-SHA tag and push to registry.
- Deploy to a staging cluster automatically on merge-to-main. Use manual approval for production.
- Use GitHub Actions or your preferred CI system. Store `DOCKER_USERNAME`, `DOCKER_PASSWORD`, and `KUBECONFIG` (or use an action to configure GKE/EKS) as secrets.
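An "import-sanity check" can be as small as importing each top-level module and failing the job on any error. That catches missing dependencies and syntax errors before the image is built. The script below is a hypothetical sketch. `DEFAULT_MODULES` is a placeholder list, not the service's actual module layout.

```python
import importlib
import sys

# Hypothetical default list; replace with the service's real top-level modules.
DEFAULT_MODULES = ["wsgi"]


def check_imports(names):
    """Import each module; return (name, error) pairs for every failure."""
    failures = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # any import-time error counts, not only ImportError
            failures.append((name, f"{type(exc).__name__}: {exc}"))
    return failures


def main(argv):
    """Return a process exit code: 0 if every module imports, else 1."""
    failures = check_imports(argv or DEFAULT_MODULES)
    for name, error in failures:
        print(f"import failed: {name}: {error}", file=sys.stderr)
    return 1 if failures else 0
```

Call `sys.exit(main(sys.argv[1:]))` from a `__main__` guard, with `services/ai-service/src` on `PYTHONPATH`, so the nonzero exit code fails the CI step.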

6) Production runtime tips

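- Serve the app with a WSGI server such as Gunicorn, running multiple worker processes. See `wsgi.py` in `services/ai-service/src`.
- Combine the PRELOAD flags with readiness gating deliberately. With `PRELOAD_SMALL_MODELS=true`, the container initializes models before it reports ready. This avoids cold-start latency on the first requests but lengthens pod startup. Give the probes enough slack (for example a `startupProbe` or a larger `initialDelaySeconds`) so that Kubernetes does not restart the pod while models are still loading.
- Monitor the `/metrics_text` endpoint, or integrate `prometheus_client` to expose metrics in a format Prometheus can scrape directly.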
- Use liveness/readiness probes and resource requests/limits in your manifest.

7) Troubleshooting

- "docker: command not found": install Docker Desktop (Windows) and restart PowerShell.
- Model download failures: check network access and your Hugging Face credentials (for example the `HF_TOKEN` environment variable), and increase download timeouts if needed.
- Memory OOMs: reduce default batch sizes, adjust worker count, or use models that fit your hardware.
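When model downloads fail intermittently rather than consistently, wrapping the download in a retry with exponential backoff can help. The helper below is a generic sketch, not part of the service. A call might look like `with_retries(lambda: snapshot_download(repo_id="org/model"))` using `huggingface_hub`, where `org/model` is a placeholder. Which exception types signal a transient network error depends on the HTTP client your library version uses, so adjust `retry_on` accordingly.

```python
import random
import time


def with_retries(fn, attempts=4, base_delay=1.0, max_delay=30.0,
                 retry_on=(OSError,), sleep=time.sleep):
    """Call fn() and retry on the given exceptions with exponential backoff.

    The delay doubles after each failure (capped at max_delay), with random
    jitter so that many pods restarting together do not retry in lockstep.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
```

Exceptions outside `retry_on` propagate immediately, so credential errors still fail fast instead of being retried.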

8) Security

- Keep secrets out of images; supply them through your orchestrator's secret store (for example Kubernetes Secrets).
- Use TLS for all inbound traffic via ingress or a load balancer.
