HNTAI / services /ai-service /README_DOCKER.md

Adhil Krishna G

Deployed to Live

5aafb3a 9 months ago

AI Service — Docker & Production Deploy

This file explains how to build, run, and deploy the AI service using Docker, Docker Compose, and Kubernetes. It assumes the canonical source lives in services/ai-service/src.

Build locally (Docker)

Prerequisites

Docker and Docker Compose installed on your machine.
Optional: GPU drivers and the NVIDIA Container Toolkit (the successor to nvidia-docker) for GPU-backed containers.

Build image (example tag):

cd .\services\ai-service
# Build production image
docker build -f Dockerfile.prod -t ai-service:local .

Run container locally (default port 7860):

# Simple run (PowerShell syntax; in bash, end each line with \ and use $HOME instead of ${env:USERPROFILE})
# The -v mount is optional.
docker run --rm -p 7860:7860 `
  -e PRELOAD_SMALL_MODELS=false `
  -e HF_HOME=/tmp/huggingface `
  -e TORCH_HOME=/tmp/torch_cache `
  -v "${env:USERPROFILE}:/host_user_profile:ro" `
  ai-service:local

Notes

Use PRELOAD_SMALL_MODELS=false to avoid heavy model downloads at container start. Set it to true only if you want the container to load small models at startup.
Provide credentials to external services (EHR) as environment variables or via mounted secrets. Do NOT bake secrets into the image.
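
The flags above are plain environment variables, so the service can read them once at startup. The following is a minimal sketch of such a loader, not the service's actual code: `Settings` and `load_settings` are hypothetical names, and the defaults simply mirror the `docker run` example. Only the variable names `PRELOAD_SMALL_MODELS`, `HF_HOME`, and `TORCH_HOME` come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Strings treated as "true" for boolean flags (an assumption of this sketch).
_TRUE_VALUES = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the runtime flags described above."""
    preload_small_models: bool
    hf_home: str
    torch_home: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the flags from `env` (defaults to the process environment)."""
    env = os.environ if env is None else env
    raw_flag = env.get("PRELOAD_SMALL_MODELS", "false")
    return Settings(
        preload_small_models=raw_flag.strip().lower() in _TRUE_VALUES,
        hf_home=env.get("HF_HOME", "/tmp/huggingface"),
        torch_home=env.get("TORCH_HOME", "/tmp/torch_cache"),
    )
```

Passing a mapping instead of relying on `os.environ` keeps the loader easy to unit test without touching the real environment.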

Docker Compose (local development)

A docker-compose.yml is included to make local testing easier. From the repository root:

cd .\services\ai-service
docker-compose up --build

This will build the image and start the service. Check logs with docker-compose logs -f.

Push to container registry

Tag and push the image to your registry (DockerHub example):

cd .\services\ai-service
docker tag ai-service:local mydockerhubuser/ai-service:latest
docker push mydockerhubuser/ai-service:latest

In CI, prefer to tag with the commit SHA (:sha-<short>) and store registry credentials in CI secrets.
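
As an illustration of the `:sha-<short>` convention, a CI step could derive the image reference from the commit SHA with a small helper like the one below. This is a sketch: `image_ref` is a hypothetical function, and the 7-character short length is an assumption rather than something this repository prescribes.

```python
import re


def image_ref(repository: str, commit_sha: str, short_len: int = 7) -> str:
    """Return '<repository>:sha-<short sha>' for a Git commit SHA."""
    sha = commit_sha.strip().lower()
    # Accept abbreviated or full hexadecimal SHAs only.
    if not re.fullmatch(r"[0-9a-f]{7,40}", sha):
        raise ValueError(f"not a valid commit SHA: {commit_sha!r}")
    return f"{repository}:sha-{sha[:short_len]}"
```

For example, `image_ref("mydockerhubuser/ai-service", "0123456789abcdef0123456789abcdef01234567")` yields `mydockerhubuser/ai-service:sha-0123456`, which can be passed to `docker tag` and `docker push`.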

Kubernetes deployment (example)

A simple Kubernetes Deployment manifest is included at services/ai-service/k8s/deployment.yaml. Important notes:

Use readinessProbe -> /ready and livenessProbe -> /live.
Mount secrets for tokens (Kubernetes Secret) and configure env variables for PRELOAD flags and cache directories.
For heavier models or GPU usage, create a GPU-enabled node pool and use resource limits/requests and node selectors or tolerations.
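
To make the notes above concrete, the sketch below builds the container section of a Deployment as a Python dictionary and serializes it to JSON, which `kubectl apply` accepts alongside YAML. It is illustrative only and the included deployment.yaml remains the reference. The secret name `ai-service-secrets`, its key `hf-token`, the `HF_TOKEN` variable, and all resource figures are assumptions. The `nvidia.com/gpu` limit additionally assumes the NVIDIA device plugin is installed on the cluster.

```python
import json


def container_spec(image: str, port: int = 7860, gpu: bool = False) -> dict:
    """Build a container entry with probes, env vars, and resources."""
    limits = {"cpu": "2", "memory": "4Gi"}  # placeholder figures
    if gpu:
        limits["nvidia.com/gpu"] = "1"  # needs a GPU node pool
    return {
        "name": "ai-service",
        "image": image,
        "ports": [{"containerPort": port}],
        "env": [
            {"name": "PRELOAD_SMALL_MODELS", "value": "false"},
            {"name": "HF_HOME", "value": "/tmp/huggingface"},
            {"name": "TORCH_HOME", "value": "/tmp/torch_cache"},
            {
                # Token comes from a Kubernetes Secret, never from the image.
                "name": "HF_TOKEN",
                "valueFrom": {
                    "secretKeyRef": {"name": "ai-service-secrets", "key": "hf-token"}
                },
            },
        ],
        "readinessProbe": {"httpGet": {"path": "/ready", "port": port}, "periodSeconds": 10},
        "livenessProbe": {"httpGet": {"path": "/live", "port": port}, "periodSeconds": 20},
        "resources": {"requests": {"cpu": "500m", "memory": "2Gi"}, "limits": limits},
    }


def to_json(spec: dict) -> str:
    """Serialize the fragment for inspection or templating."""
    return json.dumps(spec, indent=2)
```

Node selectors and tolerations belong one level up, in the pod spec, so they are left out of this container-level fragment.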

Quick kubectl apply (example):

kubectl apply -f services/ai-service/k8s/deployment.yaml
kubectl apply -f services/ai-service/k8s/service.yaml  # if present

CI/CD recommendations (high level)

Build & test in CI: run ruff/flake8, pytest, and basic import-sanity checks.
Build image in CI with commit-SHA tag and push to registry.
Deploy to a staging cluster automatically on merge-to-main. Use manual approval for production.
Use GitHub Actions or your preferred CI system. Store DOCKER_USERNAME, DOCKER_PASSWORD, and KUBECONFIG (or use an action to configure GKE/EKS) as secrets.
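
The "basic import-sanity checks" step can be as small as the sketch below, which tries to import each listed module and reports the ones that fail. `check_imports` and `main` are hypothetical helpers, and the module names to check are left to the caller because this README does not list the service's modules.

```python
import importlib
from typing import Dict, Iterable, List


def check_imports(module_names: Iterable[str]) -> Dict[str, str]:
    """Try to import each module; return {module name: error} for failures."""
    failures: Dict[str, str] = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # report any import-time error, not just ImportError
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def main(module_names: List[str]) -> int:
    """Print failures and return a process exit code (0 means all imports succeeded)."""
    failures = check_imports(module_names)
    for name, error in failures.items():
        print(f"FAILED {name}: {error}")
    return 1 if failures else 0
```

In a CI job this could be wired up as `sys.exit(main(sys.argv[1:]))` so that a broken import fails the build before the image is built.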

Production runtime tips

Use a process manager (Gunicorn) with multiple workers. See wsgi.py in services/ai-service/src.
Use PRELOAD flags and readiness gating carefully: if you set PRELOAD_SMALL_MODELS=true, the container initializes models before it reports readiness. This avoids cold-start latency on the first requests but increases pod startup time.
Monitor /metrics_text or integrate prometheus_client for better metrics.
Use liveness/readiness probes and resource requests/limits in your manifest.
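
The readiness gating described above can be sketched as follows: `/live` answers as soon as the process is up, while `/ready` returns 503 until model loading has finished. To stay self-contained the sketch uses only the standard library `http.server` rather than the service's own web framework, so it illustrates the pattern and is not the service's implementation. `models_ready`, `ProbeHandler`, and `start_probe_server` are hypothetical names.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set this event once model loading (for example, the PRELOAD step) has finished.
models_ready = threading.Event()


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/live":
            self._reply(200, b"alive")        # process is up
        elif self.path == "/ready":
            if models_ready.is_set():
                self._reply(200, b"ready")    # safe to receive traffic
            else:
                self._reply(503, b"loading")  # keep the pod out of rotation
        else:
            self._reply(404, b"not found")

    def _reply(self, status: int, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def start_probe_server(port: int = 0) -> ThreadingHTTPServer:
    """Serve the probes in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With this shape, Kubernetes restarts the container only when `/live` stops responding, and routes traffic to it only after `models_ready.set()` has been called.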

Troubleshooting

"docker: command not found": install Docker Desktop (Windows) and restart PowerShell.
Model download failures: ensure network access and valid Hugging Face (HF) credentials, and increase download timeouts.
Memory OOMs: reduce default batch sizes, adjust worker count, or use models that fit your hardware.
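
For the worker-count adjustment, one rough approach is to cap the number of workers by both CPU count and memory budget, since each worker process typically holds its own copy of the loaded models. The helper below is a sketch under that assumption. `worker_count` is a hypothetical function, the `(2 x cores) + 1` ceiling is the rule of thumb from the Gunicorn documentation, and the per-worker memory figure has to be measured for your models.

```python
import os
from typing import Optional


def worker_count(memory_limit_mb: int, per_worker_mb: int,
                 cpu_count: Optional[int] = None) -> int:
    """Pick a worker count that fits both the CPU and the memory budget."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    cpus = cpu_count or os.cpu_count() or 1
    by_cpu = 2 * cpus + 1                        # Gunicorn's rule of thumb
    by_memory = memory_limit_mb // per_worker_mb  # workers that fit in memory
    return max(1, min(by_cpu, by_memory))         # always run at least one worker
```

For example, with a 4096 MB container limit, about 1500 MB per worker, and 4 CPUs, the memory bound wins and the helper returns 2. The result can be passed to Gunicorn through its `--workers` option.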

Security

Keep secrets out of images. Use secrets in orchestration.
Use TLS for all inbound traffic via ingress or a load balancer.