# Changes Summary - HF Spaces Scheduling Error Fix

## What Was Wrong

Your app was failing to deploy on Hugging Face Spaces with:

- **Error:** "Scheduling failure: unable to schedule"
- **Cause:** Multiple issues:
  1. Conflicting entry point configuration
  2. Requesting `t4-medium` GPU (often unavailable)
  3. Heavy model preloading (~4.2GB)

## What I Fixed

### 1. Fixed `.huggingface.yaml`

**Changed:**

- ❌ Removed `app.entrypoint: services/ai-service/src/ai_med_extract/app:app`
- ✅ Docker CMD now takes precedence (cleaner configuration)
- ✅ Added comments about hardware alternatives

**Why:** The `entrypoint` field conflicted with the Dockerfile's CMD, which left it ambiguous how HF Spaces should start the app.

### 2. Fixed `Dockerfile.hf-spaces`

**Changed:**

```dockerfile
# Before:
CMD ["uvicorn", "ai_med_extract.app:app", ...]

# After:
CMD ["uvicorn", "app:app", ...]
```

**Why:** The root `app.py` is specifically designed for HF Spaces, with proper initialization and error handling.

### 3. Created `Dockerfile.hf-spaces-minimal`

**New file:** Lightweight alternative without model preloading

- Uses `/tmp` for caching (HF Spaces compatible)
- Single worker (minimal memory)
- Fast startup (no model preloading)
- Only ~2GB RAM needed, versus up to ~16GB for the full image

### 4. Created Documentation

- `HF_SPACES_SCHEDULING_FIX.md` - Complete troubleshooting guide
- `HF_SPACES_QUICK_FIX.md` - Quick reference card
- `CHANGES_SUMMARY.md` - This file

## What You Should Do Now

### ⚡ FASTEST FIX (Recommended)

1. **Edit `.huggingface.yaml`** - Use this configuration:

   ```yaml
   runtime: docker
   sdk: docker
   python_version: "3.10"

   build:
     dockerfile: Dockerfile.hf-spaces-minimal
     cache: true

   # Remove hardware section to use free CPU tier

   env:
     - HF_SPACES=true
     - FAST_MODE=true
     - PRELOAD_GGUF=false
     - PRELOAD_SMALL_MODELS=false
   ```

2. **Commit and push:**

   ```bash
   git add .
   git commit -m "Fix HF Spaces deployment - use minimal config"
   git push
   ```

3. **Wait 5-10 minutes** for the build to complete

4. **Test your space:**

   ```bash
   # The running app is served from the direct *.hf.space URL,
   # not from the huggingface.co/spaces/... page
   curl https://YOUR_USERNAME-YOUR_SPACE.hf.space/health
   ```

### 🎮 Alternative: Keep GPU But Use t4-small

If you need a GPU and have access to one:

```yaml
runtime: docker
sdk: docker

build:
  dockerfile: Dockerfile.hf-spaces-minimal
  cache: true

hardware:
  gpu: t4-small  # More available than t4-medium

env:
  - HF_SPACES=true
  - CUDA_VISIBLE_DEVICES=0
```

### 🚀 Advanced: Full Model Preloading (If You Have Pro/Enterprise)

Keep the current `Dockerfile.hf-spaces` with full model preloading, but set:

```yaml
hardware:
  gpu: t4-medium  # Requires Pro/Enterprise tier

env:
  - PRELOAD_GGUF=true  # Pre-cache models
```

Note: The first build takes ~20-30 minutes, but the models are pre-cached in the image, so later starts skip the model download.

## Files Modified

```
✅ .huggingface.yaml             - Fixed configuration
✅ Dockerfile.hf-spaces          - Fixed CMD entry point
🆕 Dockerfile.hf-spaces-minimal  - New lightweight option
📄 HF_SPACES_SCHEDULING_FIX.md   - Complete guide
📄 HF_SPACES_QUICK_FIX.md        - Quick reference
📄 CHANGES_SUMMARY.md            - This summary
```

## Comparison: Minimal vs Full

| Feature | Minimal | Full (Original) |
|---------|---------|-----------------|
| **Build Time** | 5 min | 20-30 min |
| **Startup Time** | 30 sec | 1-2 min |
| **Memory Usage** | 2GB | 8-16GB |
| **First Request** | 2-3 min (downloads model) | Instant |
| **Hardware Needed** | CPU or small GPU | t4-medium+ |
| **Cost** | Free tier OK | Pro/Enterprise |
| **Cold Start** | Models download | Pre-cached |

## Recommended Path

```mermaid
graph TD
    A[Start] --> B{Need GPU?}
    B -->|No| C[Use Minimal + CPU]
    B -->|Yes| D{Have Pro/Enterprise?}
    D -->|No| E[Use Minimal + t4-small]
    D -->|Yes| F{Need instant startup?}
    F -->|No| E
    F -->|Yes| G[Use Full + t4-medium]
    C --> H[✅ Deploy in 5 min]
    E --> I[✅ Deploy in 10 min]
    G --> J[✅ Deploy in 30 min]
```

**My recommendation:** Start with **Minimal + CPU** to verify everything works, then upgrade to GPU if needed.

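For readers who prefer code to diagrams, the decision path above can be sketched as a small Python helper. This is only an illustration of the flowchart: the names `DeploymentPlan` and `recommend_plan` are hypothetical and not part of the project, and the `"cpu"` value simply stands for "omit the `hardware` section".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentPlan:
    """Hypothetical container for one outcome of the flowchart."""
    dockerfile: str             # Dockerfile to reference in the build section
    hardware: str               # "cpu" means: leave out the hardware section
    approx_deploy_minutes: int  # rough deploy time quoted in the flowchart


def recommend_plan(
    need_gpu: bool,
    has_pro: bool = False,
    need_instant_startup: bool = False,
) -> DeploymentPlan:
    """Walk the same branches as the 'Recommended Path' diagram."""
    if not need_gpu:
        # Need GPU? -> No: Minimal + CPU
        return DeploymentPlan("Dockerfile.hf-spaces-minimal", "cpu", 5)
    if has_pro and need_instant_startup:
        # Pro/Enterprise and instant startup wanted: Full + t4-medium
        return DeploymentPlan("Dockerfile.hf-spaces", "t4-medium", 30)
    # Every other GPU case ends at Minimal + t4-small
    return DeploymentPlan("Dockerfile.hf-spaces-minimal", "t4-small", 10)


if __name__ == "__main__":
    plan = recommend_plan(need_gpu=False)
    print(f"Use {plan.dockerfile} on {plan.hardware}")
```

Calling `recommend_plan(need_gpu=True, has_pro=True, need_instant_startup=True)` is the only combination that selects the full image; everything else falls back to the minimal Dockerfile.
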
## Testing Checklist

After deployment, verify these endpoints:

```bash
# Replace YOUR_USERNAME and YOUR_SPACE with your actual values.
# The running app is served from the direct *.hf.space URL,
# not from the huggingface.co/spaces/... page.
SPACE_URL="https://YOUR_USERNAME-YOUR_SPACE.hf.space"

# 1. Health check
curl "$SPACE_URL/health"
# Expected: {"status": "ok"}

# 2. Readiness check
curl "$SPACE_URL/health/ready"
# Expected: {"status": "ready"}

# 3. Root endpoint
curl "$SPACE_URL/"
# Expected: {"message": "Medical AI Service", ...}

# 4. API docs (on Linux use xdg-open, or paste the URL into a browser)
open "$SPACE_URL/docs"
# Should show FastAPI Swagger UI
```

## Troubleshooting

### "Still getting scheduling error"

- Check your HF account tier (Settings → Billing)
- Try removing the `hardware:` section entirely (use free CPU)
- Check https://status.huggingface.co/ for platform issues

### "Build succeeds but app crashes"

- Check Space logs for Python errors
- Test the Docker image locally first:

  ```bash
  docker build -f Dockerfile.hf-spaces-minimal -t test .
  docker run -p 7860:7860 -e HF_SPACES=true test
  ```

### "App starts but requests fail"

- Models are downloading on first request (wait 2-3 min)
- Check memory usage in Space settings
- Consider setting `PRELOAD_GGUF=true` if using GPU

## Success Indicators

Your Space logs should show:

```
✅ Starting Medical AI Service on Hugging Face Spaces
✅ Detected Hugging Face Spaces environment
✅ Creating FastAPI application for HF Spaces...
✅ Application initialized successfully
✅ Uvicorn running on http://0.0.0.0:7860
```

## Need Help?

1. **Read the guides:**
   - `HF_SPACES_QUICK_FIX.md` - Quick solutions
   - `HF_SPACES_SCHEDULING_FIX.md` - Detailed troubleshooting

2. **Check logs:**
   - Go to your Space → Settings → Logs
   - Look for error messages

3. **Test locally:**
   - Build and run the Docker image on your machine
   - Verify it works before pushing to HF

4. **Community support:**
   - HF Discord: https://discord.gg/hugging-face
   - HF Forum: https://discuss.huggingface.co/

## Summary

**What to do RIGHT NOW:**

1. Update `.huggingface.yaml` to use `Dockerfile.hf-spaces-minimal`
2. Remove the `hardware` section (or use `gpu: t4-small`)
3. Commit and push
4. Wait 5-10 minutes
5. Test your endpoints

**Expected result:** Your Space should deploy successfully and be accessible within about 10 minutes! 🎉

---

Last updated: 2025-11-13