AI Service — Docker & Production Deploy

This file explains how to build, run, and deploy the AI service using Docker, Docker Compose, and Kubernetes. It assumes the canonical source lives in services/ai-service/src.

  1. Build locally (Docker)

Prerequisites

  • Docker & docker-compose installed on your machine.
  • Optional: GPU drivers and the NVIDIA Container Toolkit (formerly nvidia-docker) for GPU-backed containers.

Build image (example tag):

cd .\services\ai-service
# Build production image
docker build -f Dockerfile.prod -t ai-service:local .

Run container locally (default port 7860):

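# Simple run (PowerShell: a trailing backtick continues the line)
# The -v line is an optional read-only mount; remove it if you don't need it.
docker run --rm -p 7860:7860 `
  -e PRELOAD_SMALL_MODELS=false `
  -e HF_HOME=/tmp/huggingface `
  -e TORCH_HOME=/tmp/torch_cache `
  -v "${env:USERPROFILE}:/host_user_profile:ro" `
  ai-service:local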

Notes

  • Use PRELOAD_SMALL_MODELS=false to avoid heavy model downloads at container start. Set it to true only if you want the container to load small models at startup.
  • Provide credentials to external services (EHR) as environment variables or via mounted secrets. Do NOT bake secrets into the image.
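The two notes above can be sketched in Python as follows. This is a minimal illustration, not the service's actual startup code: the helper names env_flag and read_secret, the variable name EHR_API_TOKEN, and the /run/secrets mount directory are all assumptions made for the example.

```python
import os
from pathlib import Path
from typing import Optional


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable such as PRELOAD_SMALL_MODELS as a boolean."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> Optional[str]:
    """Return a secret from the environment, else from a mounted file.

    The file is looked up as <secrets_dir>/<name in lowercase>; both the
    directory and the naming scheme are assumptions for this sketch.
    """
    value = os.environ.get(name)
    if value:
        return value
    path = Path(secrets_dir) / name.lower()
    if path.is_file():
        return path.read_text(encoding="utf-8").strip()
    return None
```

At startup the service could call env_flag("PRELOAD_SMALL_MODELS") to decide whether to load models and read_secret("EHR_API_TOKEN") to obtain a credential, so that nothing sensitive needs to be written into the image.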

  2. Docker Compose (local development)

A docker-compose.yml is included to make local testing easier. From the repository root:

cd .\services\ai-service
docker-compose up --build

This will build the image and start the service. Check logs with docker-compose logs -f.

  3. Push to container registry

Tag and push the image to your registry (DockerHub example):

cd .\services\ai-service
docker tag ai-service:local mydockerhubuser/ai-service:latest
docker push mydockerhubuser/ai-service:latest

In CI, prefer to tag with the commit SHA (:sha-<short>) and store registry credentials in CI secrets.
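The commit-SHA tagging convention can be sketched as a small helper that a CI script might call. The function name image_ref and the 7-character short length are assumptions for illustration; use whatever length your team has standardized on.

```python
import re


def image_ref(repository: str, commit_sha: str, short_len: int = 7) -> str:
    """Build an image reference of the form <repository>:sha-<short>."""
    sha = commit_sha.strip().lower()
    # Accept abbreviated or full hexadecimal commit hashes only.
    if not re.fullmatch(r"[0-9a-f]{7,64}", sha):
        raise ValueError(f"not a commit hash: {commit_sha!r}")
    return f"{repository}:sha-{sha[:short_len]}"
```

For example, image_ref("mydockerhubuser/ai-service", "0123456789abcdef0123456789abcdef01234567") yields mydockerhubuser/ai-service:sha-0123456, which can then be passed to docker tag and docker push.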

  4. Kubernetes deployment (example)

An example Kubernetes Deployment manifest is included at services/ai-service/k8s/deployment.yaml. Important notes:

  • Point readinessProbe at /ready and livenessProbe at /live.
  • Mount secrets for tokens (Kubernetes Secret) and configure env variables for PRELOAD flags and cache directories.
  • For heavier models or GPU usage, create a GPU-enabled node pool and use resource limits/requests and node selectors or tolerations.

Quick kubectl apply (example):

kubectl apply -f services/ai-service/k8s/deployment.yaml
kubectl apply -f services/ai-service/k8s/service.yaml  # if present
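
After applying the manifests, a smoke check can poll the readiness endpoint until the service answers. The sketch below uses only the Python standard library; the names http_ok and wait_until_ready are hypothetical, and the URL you pass depends on how the service is exposed (for example through kubectl port-forward).

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True when a GET on url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and 4xx/5xx responses all count as "not ready".
        return False


def wait_until_ready(
    url: str,
    deadline_s: float = 120.0,
    interval_s: float = 2.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until probe succeeds or roughly deadline_s seconds of waiting pass."""
    waited = 0.0
    while True:
        if probe(url):
            return True
        if waited >= deadline_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

A call such as wait_until_ready("http://localhost:7860/ready") returns True once the endpoint responds successfully and False if the deadline passes first. The probe and sleep parameters exist so the loop can be tested without a network.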

  5. CI/CD recommendations (high level)
  • Build & test in CI: run ruff/flake8, pytest, and basic import-sanity checks.
  • Build image in CI with commit-SHA tag and push to registry.
  • Deploy to a staging cluster automatically on merge-to-main. Use manual approval for production.
  • Use GitHub Actions or your preferred CI system. Store DOCKER_USERNAME, DOCKER_PASSWORD, and KUBECONFIG (or use an action to configure GKE/EKS) as secrets.
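
The "basic import-sanity checks" mentioned above can be as small as the following sketch. The function name check_imports is an assumption, and the list of modules to check would come from your own package layout.

```python
import importlib
from typing import Dict, Iterable


def check_imports(module_names: Iterable[str]) -> Dict[str, str]:
    """Try to import each module and return {module_name: error} for failures."""
    failures: Dict[str, str] = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # import-time errors of any kind are reported
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

A CI step could call check_imports with the service's top-level modules and fail the job when the returned dictionary is non-empty. This catches missing dependencies and import-time crashes before the image is built.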

  6. Production runtime tips
  • Run the app under a production WSGI server such as Gunicorn with multiple workers. See wsgi.py in services/ai-service/src.
  • Use PRELOAD flags and readiness gating carefully: if you set PRELOAD_SMALL_MODELS=true, the container initializes models before it reports ready. This can help avoid cold-start latency but increases pod startup time.
  • Monitor /metrics_text or integrate prometheus_client for better metrics.
  • Use liveness/readiness probes and resource requests/limits in your manifest.
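
Readiness gating can be illustrated with a standard-library sketch: /live answers as soon as the process is up, while /ready returns 503 until model loading has finished. This is not the service's actual implementation, which is not shown in this file; the names models_ready, ProbeHandler, and start_probe_server are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set this event once model preloading has completed.
models_ready = threading.Event()


class ProbeHandler(BaseHTTPRequestHandler):
    """Serve /live (process is up) and /ready (models are loaded)."""

    def do_GET(self):
        if self.path == "/live":
            self._reply(200, b"alive")
        elif self.path == "/ready":
            if models_ready.is_set():
                self._reply(200, b"ready")
            else:
                self._reply(503, b"loading")
        else:
            self._reply(404, b"not found")

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Keep probe traffic out of the logs in this sketch.
        pass


def start_probe_server(host="127.0.0.1", port=0):
    """Start the probe server in a daemon thread and return it (port 0 picks a free port)."""
    server = ThreadingHTTPServer((host, port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With this shape, a pod started with preloading enabled stays out of the load balancer until models_ready.set() is called, while the liveness probe keeps passing so the pod is not restarted during a long model load.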

  7. Troubleshooting
  • "docker: command not found": install Docker Desktop (Windows) and restart PowerShell.
  • Model download failures: confirm network access and Hugging Face (HF) credentials, and increase download timeouts.
  • Out-of-memory (OOM) errors: reduce default batch sizes, lower the worker count, or use models that fit your hardware.
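
For transient model download failures, retrying with exponential backoff is a common complement to longer timeouts. The sketch below is generic: retry_with_backoff is a hypothetical helper, and the callable you pass in stands for whatever download call the service actually makes.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Call fn, retrying on the given exceptions with delays of base, 2*base, 4*base, ..."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The default retry_on of OSError covers connection and timeout errors raised by the standard library. If your download library raises its own exception types, pass them in explicitly.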

  8. Security
  • Keep secrets out of images. Use secrets in orchestration.
  • Use TLS for all inbound traffic via ingress or a load balancer.
