AI Service — Docker & Production Deploy
This file explains how to build, run, and deploy the AI service using Docker, Docker Compose, and Kubernetes. It assumes the canonical source lives in services/ai-service/src.
- Build locally (Docker)
Prerequisites
- Docker & docker-compose installed on your machine.
- Optional: GPU drivers and the NVIDIA Container Toolkit (the successor to nvidia-docker) for GPU-backed builds.
Build image (example tag):
cd .\services\ai-service
# Build production image
docker build -f Dockerfile.prod -t ai-service:local .
Run container locally (default port 7860):
# Simple run (PowerShell: a trailing backtick continues the line; the read-only profile mount is optional)
docker run --rm -p 7860:7860 `
  -e PRELOAD_SMALL_MODELS=false `
  -e HF_HOME=/tmp/huggingface `
  -e TORCH_HOME=/tmp/torch_cache `
  -v "${env:USERPROFILE}:/host_user_profile:ro" `
  ai-service:local
Notes
- Use PRELOAD_SMALL_MODELS=false to avoid heavy model downloads at container start. Set it to true only if you want the container to load small models at startup.
- Provide credentials to external services (EHR) as environment variables or via mounted secrets. Do NOT bake secrets into the image.
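As an illustration of how the service might read these variables at startup, here is a minimal Python sketch. The helper names `env_flag` and `load_settings` are hypothetical, and the accepted truthy spellings are an assumption; the actual parsing in services/ai-service/src may differ.

```python
import os


def env_flag(name, default=False, environ=None):
    """Interpret an environment variable as a boolean flag.

    Assumption: "1", "true", "yes" and "on" (any case) count as true.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(environ=None):
    """Collect the flags and cache directories used in the docker run example."""
    environ = os.environ if environ is None else environ
    return {
        # Off by default so the container starts without heavy downloads.
        "preload_small_models": env_flag("PRELOAD_SMALL_MODELS", False, environ),
        "hf_home": environ.get("HF_HOME", "/tmp/huggingface"),
        "torch_home": environ.get("TORCH_HOME", "/tmp/torch_cache"),
    }
```

Passing `environ` explicitly keeps the helper easy to unit test without touching the real process environment.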
- Docker Compose (local development)
A docker-compose.yml is included to make local testing easier. From the repository root:
cd .\services\ai-service
docker-compose up --build
This will build the image and start the service. Check logs with docker-compose logs -f.
- Push to container registry
Tag and push the image to your registry (DockerHub example):
cd .\services\ai-service
docker tag ai-service:local mydockerhubuser/ai-service:latest
docker push mydockerhubuser/ai-service:latest
In CI, prefer to tag with the commit SHA (:sha-<short>) and store registry credentials in CI secrets.
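A CI step could derive the `:sha-<short>` tag with a small helper like the one below. This is only a sketch: the function name `image_ref` and the 7-character short length are assumptions, not something the repository defines.

```python
import re


def image_ref(repository, commit_sha, short_len=7):
    """Build '<repository>:sha-<short>' from a full or abbreviated git SHA."""
    sha = commit_sha.strip().lower()
    # Reject anything that is not 7 to 40 hexadecimal characters.
    if not re.fullmatch(r"[0-9a-f]{7,40}", sha):
        raise ValueError(f"not a git commit SHA: {commit_sha!r}")
    return f"{repository}:sha-{sha[:short_len]}"
```

In GitHub Actions the full SHA is available in the `GITHUB_SHA` environment variable, so a workflow step could call `image_ref("mydockerhubuser/ai-service", os.environ["GITHUB_SHA"])` and pass the result to docker tag and docker push.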
- Kubernetes deployment (example)
A simple Kubernetes deployment.yaml snippet is included in services/ai-service/k8s/deployment.yaml. Important notes:
- Use readinessProbe -> /ready and livenessProbe -> /live.
- Mount secrets for tokens (Kubernetes Secret) and configure env variables for PRELOAD flags and cache directories.
- For heavier models or GPU usage, create a GPU-enabled node pool and use resource limits/requests and node selectors or tolerations.
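To make the probe contract concrete, the sketch below shows one way the /live and /ready endpoints could behave. It is written as a plain WSGI callable using only the standard library so it stays self-contained; the real service may implement these routes differently (for example with a web framework), and `mark_ready` is a hypothetical hook.

```python
import threading

# Set once startup work (for example, optional model preloading) has finished.
_ready = threading.Event()


def mark_ready():
    """Hypothetical hook: call this after initialization completes."""
    _ready.set()


def application(environ, start_response):
    """Minimal WSGI app exposing /live and /ready for Kubernetes probes."""
    path = environ.get("PATH_INFO", "")
    if path == "/live":
        # Liveness: the process is up and able to answer requests.
        status, body = "200 OK", b"alive"
    elif path == "/ready":
        # Readiness: report 503 until startup work is done, so the pod
        # receives no traffic while models are still loading.
        if _ready.is_set():
            status, body = "200 OK", b"ready"
        else:
            status, body = "503 Service Unavailable", b"starting"
    else:
        status, body = "404 Not Found", b"not found"
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]
```

Keeping liveness independent of model loading avoids restarts during a slow startup, while readiness holds traffic back until `mark_ready()` has been called.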
Quick kubectl apply (example):
kubectl apply -f services/ai-service/k8s/deployment.yaml
kubectl apply -f services/ai-service/k8s/service.yaml # if present
- CI/CD recommendations (high level)
- Build & test in CI: run ruff/flake8, pytest, and basic import-sanity checks.
- Build image in CI with commit-SHA tag and push to registry.
- Deploy to a staging cluster automatically on merge-to-main. Use manual approval for production.
- Use GitHub Actions or your preferred CI system. Store DOCKER_USERNAME, DOCKER_PASSWORD, and KUBECONFIG (or use an action to configure GKE/EKS) as secrets.
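The "basic import-sanity checks" step can be as simple as trying to import each top-level module and failing the job if any import raises. A possible sketch follows; `check_imports` is a hypothetical helper, and the module names you pass depend on what actually lives in services/ai-service/src.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; return {name: error} for every failure."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # any import-time error should fail the CI job
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

A CI script could call `check_imports([...])` with the service's module names, print the returned failures, and exit non-zero when the dictionary is not empty.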
- Production runtime tips
- Use a process manager (Gunicorn) with multiple workers. See wsgi.py in services/ai-service/src.
- Use PRELOAD flags and readiness gating carefully: if you set PRELOAD_SMALL_MODELS=true, the container will initialize models before readiness is reported. This can help avoid cold-start latency but increases pod startup time.
- Monitor /metrics_text or integrate prometheus_client for better metrics.
- Use liveness/readiness probes and resource requests/limits in your manifest.
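Gunicorn reads its settings from a Python file, so the multi-worker advice can be captured in a small gunicorn.conf.py. The sketch below is one possible starting point, not the repository's actual configuration: the `PORT` and `MAX_WORKERS` environment variable names, the cap of 4 workers, and the 120-second timeout are all assumptions to tune for your models and hardware.

```python
import multiprocessing
import os


def worker_count(cpu_count, cap=4):
    """Return a worker count based on Gunicorn's (2 x cores) + 1 rule of thumb.

    The cap is an assumption: each worker process may load its own copy of
    the models, so memory usually limits the count before CPU does.
    """
    return max(1, min(cap, 2 * cpu_count + 1))


# Gunicorn settings (module-level names are read as configuration).
bind = "0.0.0.0:" + os.environ.get("PORT", "7860")  # PORT is a hypothetical override
workers = worker_count(
    multiprocessing.cpu_count(),
    cap=int(os.environ.get("MAX_WORKERS", "4")),  # MAX_WORKERS is hypothetical
)
timeout = 120  # seconds; model inference can exceed Gunicorn's 30 s default
```

It would be started with something like `gunicorn -c gunicorn.conf.py wsgi:app`, where the callable name `app` is an assumption about what wsgi.py exports.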
- Troubleshooting
- "docker: command not found": install Docker Desktop (Windows) and restart PowerShell.
- Model download failures: ensure network access and correct HF credentials, increase timeouts.
- Memory OOMs: reduce default batch sizes, adjust worker count, or use models that fit your hardware.
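For transient model download failures, wrapping the download call in a retry with exponential backoff is a common mitigation. The helper below is a generic sketch: `retry` is a hypothetical name, and catching `OSError` assumes the download code surfaces network problems as `OSError` subclasses (as `ConnectionError` and `TimeoutError` are), which you should verify for the library you use.

```python
import time


def retry(fetch, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists so tests can substitute a fake and run instantly; in production the default `time.sleep` is used.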
- Security
- Keep secrets out of images. Use secrets in orchestration.
- Use TLS for all inbound traffic via ingress or a load balancer.