# AI Service: Docker & Production Deploy

This file explains how to build, run, and deploy the AI service using Docker, Docker Compose, and Kubernetes. It assumes the canonical source lives in `services/ai-service/src`.

## 1) Build locally (Docker)

Prerequisites

- Docker and docker-compose installed on your machine.
- Optional: GPU drivers and nvidia-docker for GPU-backed containers.

Build image (example tag):

```powershell
cd .\services\ai-service
# Build production image
docker build -f Dockerfile.prod -t ai-service:local .
```

Run container locally (default port 7860):

```powershell
# Simple run. PowerShell continues a line with a trailing backtick, not a backslash.
# The -v mount is optional; drop that line if you do not need host files.
docker run --rm -p 7860:7860 `
  -e PRELOAD_SMALL_MODELS=false `
  -e HF_HOME=/tmp/huggingface `
  -e TORCH_HOME=/tmp/torch_cache `
  -v "${env:USERPROFILE}:/host_user_profile:ro" `
  ai-service:local
```

Notes

- Use `PRELOAD_SMALL_MODELS=false` to avoid heavy model downloads at container start. Set it to `true` only if you want the container to load small models at startup.
- Provide credentials for external services (EHR) as environment variables or via mounted secrets. Do NOT bake secrets into the image.

## 2) Docker Compose (local development)

A `docker-compose.yml` is included to make local testing easier. From the repository root:

```powershell
cd .\services\ai-service
docker-compose up --build
```

This builds the image and starts the service. Check logs with `docker-compose logs -f`.

## 3) Push to container registry

Tag and push the image to your registry (DockerHub example):

```powershell
cd .\services\ai-service
docker tag ai-service:local mydockerhubuser/ai-service:latest
docker push mydockerhubuser/ai-service:latest
```

In CI, prefer tagging with the commit SHA (`:sha-`) over `latest`, and store registry credentials in CI secrets.

## 4) Kubernetes deployment (example)

A simple Kubernetes `deployment.yaml` snippet is included in `services/ai-service/k8s/deployment.yaml`.

Important notes:

- Point `readinessProbe` at `/ready` and `livenessProbe` at `/live`.
- Mount tokens from a Kubernetes Secret, and set the PRELOAD flags and cache directories through `env` entries.
- For heavier models or GPU usage, create a GPU-enabled node pool and use resource requests/limits together with node selectors or tolerations.

Quick kubectl apply (example):

```powershell
kubectl apply -f services/ai-service/k8s/deployment.yaml
kubectl apply -f services/ai-service/k8s/service.yaml # if present
```

## 5) CI/CD recommendations (high level)

- Build and test in CI: run ruff/flake8, pytest, and basic import-sanity checks.
- Build the image in CI with a commit-SHA tag and push it to the registry.
- Deploy to a staging cluster automatically on merge to main. Require manual approval for production.
- Use GitHub Actions or your preferred CI system. Store `DOCKER_USERNAME`, `DOCKER_PASSWORD`, and `KUBECONFIG` as secrets (or use an action to configure GKE/EKS access).

## 6) Production runtime tips

- Use a process manager (Gunicorn) with multiple workers. See `wsgi.py` in `services/ai-service/src`.
- Use PRELOAD flags and readiness gating carefully: if you set `PRELOAD_SMALL_MODELS=true`, the container initializes models before it reports ready. This avoids cold-start latency on the first requests but increases pod startup time.
- Monitor `/metrics_text` or integrate `prometheus_client` for better metrics.
- Use liveness/readiness probes and resource requests/limits in your manifest.

## 7) Troubleshooting

- "docker: command not found": install Docker Desktop (Windows) and restart PowerShell.
- Model download failures: ensure network access and correct HF credentials, and increase timeouts.
- Out-of-memory (OOM) errors: reduce default batch sizes, lower the worker count, or use models that fit your hardware.

## 8) Security

- Keep secrets out of images. Inject them through your orchestrator's secret mechanism.
- Use TLS for all inbound traffic via an ingress or a load balancer.
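
The readiness gating described under the production runtime tips can be sketched in Python. This is a minimal illustration under stated assumptions, not the service's actual code: `env_flag`, `ReadinessGate`, and its methods are hypothetical names, and the sketch assumes `/ready` should answer 503 until preloading finishes while `/live` answers 200 as soon as the process is up.

```python
import os
import threading


def env_flag(name: str, default: bool = False) -> bool:
    """Parse a boolean environment variable such as PRELOAD_SMALL_MODELS."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


class ReadinessGate:
    """Hypothetical helper that decides the status code for /ready and /live."""

    def __init__(self, preload: bool) -> None:
        self._ready = threading.Event()
        # With preloading disabled there is nothing to wait for at startup,
        # so the service can report ready immediately.
        if not preload:
            self._ready.set()

    def mark_loaded(self) -> None:
        """Call once the startup model loading has finished."""
        self._ready.set()

    def ready_status(self) -> int:
        """HTTP status for the readiness probe: 503 until models are loaded."""
        return 200 if self._ready.is_set() else 503

    @staticmethod
    def live_status() -> int:
        """HTTP status for the liveness probe: the process is up, so 200."""
        return 200


# Example wiring (assumed): build the gate from the environment at startup.
gate = ReadinessGate(preload=env_flag("PRELOAD_SMALL_MODELS"))
```

In a setup like this, the `/ready` and `/live` route handlers would return `gate.ready_status()` and `gate.live_status()`, and the loading code would call `gate.mark_loaded()` when it completes. Keeping liveness independent of model loading means a slow preload delays traffic without failing the liveness probe.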