# Quick Fix Guide: Reduce Variable Response Times on HF Spaces

## Problem

Your HF Space on a T4 GPU has inconsistent response times:

- Sometimes: **1 minute** ✅
- Sometimes: **5+ minutes** ❌

## Root Causes

1. **Lazy model loading**: the model loads on the first request.
2. **Model unloading**: models are unloaded after a period of inactivity.
3. **Request queueing**: only 2 concurrent requests are allowed, so additional requests wait.
4. **Cold starts**: HF Spaces may go to sleep after inactivity.
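
To see why causes 1 and 2 split response times into a fast case and a slow case, here is a minimal simulation. It is not the project's code: `load_model` is a hypothetical stand-in that sleeps to mimic an expensive model load. A lazy wrapper pays the load cost on the first request and again after an unload, while an eager wrapper pays it once at startup.

```python
import threading
import time
from typing import Optional


def load_model(name: str) -> dict:
    """Hypothetical stand-in for an expensive load (weights to GPU, etc.)."""
    time.sleep(0.2)  # simulated load cost
    return {"name": name}


class LazyModel:
    """Loads on first use, so the first request after startup or unload is slow."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._model: Optional[dict] = None
        self._lock = threading.Lock()  # avoid double loads under concurrency

    def get(self) -> dict:
        with self._lock:
            if self._model is None:
                self._model = load_model(self.name)
            return self._model

    def unload(self) -> None:
        """Simulates an unload after inactivity (root cause 2)."""
        with self._lock:
            self._model = None


class EagerModel(LazyModel):
    """Pays the load cost at startup instead of on the first request."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.get()


def time_request(model: LazyModel) -> float:
    """Returns how long a request waits for the model to be ready, in seconds."""
    start = time.perf_counter()
    model.get()
    return time.perf_counter() - start
```

With `LazyModel`, the first `time_request` call takes roughly the load time and later calls are near zero. With `EagerModel` even the first call is fast, but an `unload()` puts it back on the slow path, which is why eager loading is paired with a keep-alive service in the configuration below.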

## Quick Fix (5 Minutes)

### Step 1: Update `app.py` (Root Level)

Add these lines to your root-level `app.py`. The comments show where each one goes:

```python
import logging  # skip if app.py already imports it

# At the top, after the existing imports:
from services.ai_service.src.ai_med_extract.utils.hf_spaces_optimizations import (
    configure_hf_spaces_env,
    apply_hf_spaces_optimizations,
)

# Before the app is created:
configure_hf_spaces_env()

# Existing app creation (around line 42 of app.py), unchanged:
app = create_app(initialize=False)
initialize_agents(app, preload_small_models=False)

# ADD THIS right after the app is created:
apply_hf_spaces_optimizations(app)
logging.info("Application initialized successfully")
```

### Step 2: Review the Applied Configuration

The optimizations automatically configure the following.

**Request Queue Settings:**

- **Max Concurrent Requests**: 6 (increased from 2)
- **Max Queue Size**: 10 requests
- **Queue Timeout**: 20 minutes
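
These three settings interact: up to 6 requests run at once, up to 10 more wait, and a waiting request gives up after 20 minutes. The sketch below shows one way to express that with `asyncio`. `BoundedRequestQueue` and `QueueFullError` are hypothetical names, not the API of the project's `request_queue.py`.

```python
import asyncio
from typing import Any, Awaitable, Callable


class QueueFullError(Exception):
    """Raised when both the running slots and the waiting queue are full."""


class BoundedRequestQueue:
    """Hypothetical sketch: limits concurrency, queue length, and wait time.

    Create it inside a running event loop (for example at app startup).
    """

    def __init__(self, max_concurrent: int = 6, max_queue_size: int = 10,
                 timeout_s: float = 20 * 60) -> None:
        self._slots = asyncio.Semaphore(max_concurrent)
        self._capacity = max_concurrent + max_queue_size  # running + waiting
        self._timeout_s = timeout_s
        self._in_flight = 0

    async def run(self, fn: Callable[..., Awaitable[Any]], *args: Any) -> Any:
        # Reject immediately instead of letting the backlog grow without bound.
        if self._in_flight >= self._capacity:
            raise QueueFullError("request queue is full")
        self._in_flight += 1
        try:
            # Wait for a free slot; raises asyncio.TimeoutError after timeout_s.
            await asyncio.wait_for(self._slots.acquire(), timeout=self._timeout_s)
            try:
                return await fn(*args)
            finally:
                self._slots.release()
        finally:
            self._in_flight -= 1
```

Raising the concurrency limit shortens waits only while the GPU has memory for the extra simultaneous generations, which is why the OOM fix under Troubleshooting is to lower it again.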

**Model Loading:**

- **Eager Loading**: Enabled (models preload at startup)
- **Keep-Alive Service**: Enabled (prevents model unloading)
- **Keep-Alive Interval**: 5 minutes
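
A keep-alive service of this kind can be as simple as a daemon thread that calls a cheap "ping" on a fixed interval. This is a hedged sketch, not the contents of `model_keepalive.py`: what `ping` should actually do (for example a one-token generation so the model stays resident) is an assumption you would fill in.

```python
import threading
from typing import Callable, Optional


class KeepAlive:
    """Hypothetical sketch: calls `ping` every `interval_s` seconds on a daemon thread."""

    def __init__(self, ping: Callable[[], None], interval_s: float = 5 * 60) -> None:
        self._ping = ping
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self.ping_count = 0
        self.last_error: Optional[BaseException] = None

    def start(self) -> None:
        self._thread.start()

    def stop(self, timeout: Optional[float] = None) -> None:
        self._stop.set()
        self._thread.join(timeout)

    def is_running(self) -> bool:
        return self._thread.is_alive()

    def _loop(self) -> None:
        # Event.wait() returns False on timeout (time to ping) and True once stop() is called.
        while not self._stop.wait(self._interval_s):
            try:
                self._ping()
                self.ping_count += 1
            except Exception as exc:  # a failed ping must not kill the thread
                self.last_error = exc
```

Fields like `ping_count`, `last_error`, and `is_running()` are the kind of statistics a `/keepalive-status` endpoint could report.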

**Logging:**

- **Detailed Logging**: Enabled for all operations
- **Model Operation Logs**: Track model loading and generation start/end
- **Generation Metrics**: Track tokens per second, duration, and similar statistics

These settings are applied automatically when you call `apply_hf_spaces_optimizations(app)`.
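
The generation metrics listed above come down to timing each call and dividing the number of new tokens by the elapsed time. A minimal sketch, assuming a hypothetical `generate(prompt)` callable that returns the generated text together with the count of new tokens (the real services may count tokens differently):

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, Tuple

logger = logging.getLogger("generation_metrics")


@dataclass
class GenerationMetrics:
    model: str
    new_tokens: int
    duration_s: float

    @property
    def tokens_per_second(self) -> float:
        return self.new_tokens / self.duration_s if self.duration_s > 0 else 0.0


def timed_generation(model: str, generate: Callable[[str], Tuple[str, int]],
                     prompt: str) -> Tuple[str, GenerationMetrics]:
    """Runs a (hypothetical) generate() call and logs start/end plus throughput."""
    logger.info("generation start model=%s prompt_chars=%d", model, len(prompt))
    start = time.perf_counter()
    text, new_tokens = generate(prompt)
    metrics = GenerationMetrics(model, new_tokens, time.perf_counter() - start)
    logger.info("generation end model=%s tokens=%d duration=%.2fs tokens/s=%.1f",
                model, metrics.new_tokens, metrics.duration_s, metrics.tokens_per_second)
    return text, metrics
```

Logging both start and end lines makes a stuck or queued request visible in the Space logs even before it finishes.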

### Step 3: Set Up External Monitoring (Optional but Recommended)

Use a free service such as **UptimeRobot** or **Cron-job.org** to ping your warmup endpoint every 5 minutes:

- **URL to ping**: `https://your-space-name.hf.space/warmup`
- **Interval**: Every 5 minutes

The regular pings keep the Space active and its models warm, so real requests are less likely to hit a cold start.
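
If you would rather not rely on a third-party service, a small script on any always-on machine can do the same job. This is a minimal standard-library sketch: `WARMUP_URL` is the placeholder from above, the 5-minute interval matches the recommendation, and `ping_once` reports the HTTP status or `-1` when the Space cannot be reached.

```python
import time
import urllib.error
import urllib.request
from typing import Optional

WARMUP_URL = "https://your-space-name.hf.space/warmup"  # placeholder: use your Space URL
INTERVAL_S = 5 * 60


def ping_once(url: str, timeout_s: float = 120.0) -> int:
    """Returns the HTTP status code, or -1 on a network error or timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:  # must precede OSError: HTTPError subclasses it
        return exc.code
    except OSError:  # URLError, connection refused, socket timeouts
        return -1


def run(url: str = WARMUP_URL, interval_s: float = INTERVAL_S,
        max_pings: Optional[int] = None) -> None:
    """Pings `url` every `interval_s` seconds; runs forever unless max_pings is set."""
    sent = 0
    while max_pings is None or sent < max_pings:
        status = ping_once(url)
        print(time.strftime("%Y-%m-%d %H:%M:%S"), url, status, flush=True)
        sent += 1
        if max_pings is None or sent < max_pings:
            time.sleep(interval_s)
```

Call `run()` from a long-running process, or call `ping_once(WARMUP_URL)` from a cron job scheduled every 5 minutes.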

### Step 4: Deploy to HF Spaces

```bash
git add .
git commit -m "Add HF Spaces performance optimizations"
git push
```

## Expected Results

| Metric | Before | After |
|--------|--------|-------|
| First request (cold) | 2-5 min | 30-60 sec |
| Subsequent requests | 30-60 sec | 30-60 sec |
| After 15 min idle | 2-5 min | 30-60 sec |
| Consistency | ❌ Variable | ✅ Consistent |

## Monitoring Endpoints

After deployment, you can check these endpoints:

1. **Model Status**: `https://your-space-name.hf.space/model-status`
   - Shows which models are loaded
2. **Queue Status**: `https://your-space-name.hf.space/queue-status`
   - Shows the request queue state
3. **Keep-Alive Status**: `https://your-space-name.hf.space/keepalive-status`
   - Shows keep-alive service statistics
4. **Warmup**: `https://your-space-name.hf.space/warmup`
   - Manually triggers a model warmup
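
To check all three status endpoints at once (for example from a notebook or a CI job), a small client like the following sketch works with the standard library. It assumes, without confirmation from this guide, that the endpoints return JSON, and it falls back to the raw text otherwise; `BASE_URL` is a placeholder.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict

BASE_URL = "https://your-space-name.hf.space"  # placeholder: use your Space URL
STATUS_PATHS = ("/model-status", "/queue-status", "/keepalive-status")


def fetch_status(base_url: str, path: str, timeout_s: float = 30.0) -> Dict[str, Any]:
    """GETs one endpoint and returns parsed JSON, {"raw": text}, or {"error": ...}."""
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:  # HTTPError subclasses OSError, so catch it first
        return {"error": f"HTTP {exc.code}"}
    except OSError as exc:
        return {"error": str(exc)}
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return {"raw": body}
    return data if isinstance(data, dict) else {"value": data}


def check_all(base_url: str = BASE_URL) -> Dict[str, Dict[str, Any]]:
    """Collects the status of every endpoint in STATUS_PATHS, keyed by path."""
    return {path: fetch_status(base_url, path) for path in STATUS_PATHS}
```

For a quick look, `print(json.dumps(check_all(), indent=2))` prints all three results together.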
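
## Troubleshooting

### Issue: "Module not found" error

**Solution**: Make sure you created the new files:

- `services/ai-service/src/ai_med_extract/utils/model_keepalive.py`
- `services/ai-service/src/ai_med_extract/utils/hf_spaces_optimizations.py`

If the error persists, compare the import path in Step 1 with the directory name. Python module names cannot contain hyphens, so `services.ai_service` does not resolve to a directory named `ai-service`. Use the same import style your `app.py` already uses for other `ai_med_extract` modules.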

### Issue: GPU OOM (Out of Memory) errors

**Solution**: Reduce `max_concurrent` back to 2 in `request_queue.py`.

### Issue: Keep-alive not working

**Solution**: Check the `/keepalive-status` endpoint to verify that the service is running.

## Advanced: Manual Testing

Test the optimizations locally:

```bash
# Start the app
python -m uvicorn services.ai-service.src.ai_med_extract.main:app --reload --port 7860

# In another terminal, trigger a warmup
curl http://localhost:7860/warmup

# Check model status
curl http://localhost:7860/model-status

# Check queue status
curl http://localhost:7860/queue-status
```
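
Because the original problem is variance rather than average speed, it helps to time several identical requests and look at the spread. A minimal sketch: which URL to time is your choice (the local `/warmup` endpoint serves as an example), and the generous default timeout reflects the multi-minute worst case described at the top of this guide.

```python
import statistics
import time
import urllib.request
from typing import Dict, List


def measure_latency(url: str, n: int = 5, timeout_s: float = 600.0) -> List[float]:
    """Sends n sequential GET requests and returns each duration in seconds."""
    timings = []
    for _ in range(n):
        start = time.perf_counter()
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()  # include the body transfer in the timing
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings: List[float]) -> Dict[str, float]:
    """Min/median/max and spread; a large spread means inconsistent responses."""
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
        "spread": max(timings) - min(timings),
    }
```

For example, `print(summarize(measure_latency("http://localhost:7860/warmup")))`. Running it before and after enabling the optimizations gives a direct comparison of the spread.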

## Rollback Plan

If something breaks, you can roll back quickly (this assumes the optimization commit is the most recent one):

```bash
git revert HEAD
git push
```

Alternatively, remove the `apply_hf_spaces_optimizations(app)` line from `app.py`.

## Need More Help?

See the full guide: `docs/HF_SPACES_PERFORMANCE_GUIDE.md`

---

**Estimated Time to Implement**: 5-10 minutes

**Expected Performance Improvement**: 60-80% more consistent response times

**Risk Level**: Low (all changes are additive and easy to roll back)