# Quick Fix Guide: Reduce Variable Response Times on HF Spaces

## Problem

Your HF Space running on T4 hardware has inconsistent response times:

- Sometimes: **1 minute** ✅
- Sometimes: **5+ minutes** ❌

## Root Causes

1. **Lazy model loading** - The model loads on the first request instead of at startup
2. **Model unloading** - Models unload after a period of inactivity
3. **Request queueing** - Only 2 concurrent requests are allowed, so extra requests wait in line
4. **Cold starts** - HF Spaces may go to sleep after inactivity

## Quick Fix (5 Minutes)

### Step 1: Update `app.py` (Root Level)

Add these lines to your root-level `app.py`:

```python
# At the top of app.py, after the existing imports
# (logging, create_app and initialize_agents are assumed to be imported already)
from services.ai_service.src.ai_med_extract.utils.hf_spaces_optimizations import (
    configure_hf_spaces_env,
    apply_hf_spaces_optimizations,
)

# Before creating the app: configure the HF Spaces environment
configure_hf_spaces_env()

# Existing app creation (around line 42)
app = create_app(initialize=False)
initialize_agents(app, preload_small_models=False)

# ADD THIS right after the app is created:
apply_hf_spaces_optimizations(app)

logging.info("Application initialized successfully")
```

### Step 2: Configuration Applied

The optimizations automatically configure:

**Request Queue Settings:**

- **Max Concurrent Requests**: 6 (increased from 2)
- **Max Queue Size**: 10 requests
- **Queue Timeout**: 20 minutes

**Model Loading:**

- **Eager Loading**: Enabled (models preload at startup)
- **Keep-Alive Service**: Enabled (prevents model unloading)
- **Keep-Alive Interval**: 5 minutes

**Logging:**

- **Detailed Logging**: Enabled for all operations
- **Model Operation Logs**: Track loading and generation start/end
- **Generation Metrics**: Track tokens/second, duration, etc.

These settings are applied automatically when you call `apply_hf_spaces_optimizations(app)`.

### Step 3: Set Up External Monitoring (Optional but Recommended)

Use a free service such as **UptimeRobot** or **Cron-job.org** to ping your warmup endpoint every 5 minutes:

- **URL to ping**: `https://your-space-name.hf.space/warmup`
- **Interval**: Every 5 minutes

Regular external traffic keeps your Space from going to sleep, so requests do not hit a cold start.
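The Step 2 queue settings combine three separate limits, and it can help to see how they interact. Below is a minimal asyncio sketch that illustrates the idea. It is not the project's `request_queue.py`. `BoundedRequestQueue` and `QueueFullError` are hypothetical names, and the sketch assumes the 20-minute timeout covers only the time a request waits for a free slot.

```python
"""Sketch: how max concurrency, queue size and queue timeout can interact.

Hypothetical stand-in, NOT the project's request_queue.py. Names mirror the
Step 2 settings; the timeout is assumed to cover only the wait for a slot.
"""
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class QueueFullError(Exception):
    """Raised when the running + waiting limit is already reached."""


class BoundedRequestQueue:
    def __init__(self, max_concurrent: int = 6, max_queue_size: int = 10,
                 queue_timeout_s: float = 20 * 60) -> None:
        # Create inside a running event loop (e.g. at app startup).
        self._slots = asyncio.Semaphore(max_concurrent)  # requests allowed to run at once
        self._max_pending = max_concurrent + max_queue_size  # running + waiting
        self._pending = 0
        self._timeout = queue_timeout_s

    async def run(self, job: Callable[[], Awaitable[T]]) -> T:
        """Run `job` when a slot frees up; reject at once if the queue is full."""
        if self._pending >= self._max_pending:
            raise QueueFullError("request queue is full")
        self._pending += 1
        try:
            # Raises asyncio.TimeoutError if no slot frees up within the timeout.
            await asyncio.wait_for(self._slots.acquire(), timeout=self._timeout)
            try:
                return await job()
            finally:
                self._slots.release()
        finally:
            self._pending -= 1
```

With the defaults, up to 6 requests run at once and up to 10 more wait. A 17th simultaneous request is rejected immediately instead of waiting. Raising the concurrency cap from 2 to 6 only helps if the T4's 16 GB of GPU memory can hold that many generations at once. This is why the OOM fix in Troubleshooting lowers the cap again.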
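If you would rather not depend on a third-party monitor, a small script on any always-on machine can do the same job as Step 3. The sketch below uses only the Python standard library. `ping_once` and `run_pinger` are hypothetical helpers, `SPACE_URL` is the placeholder from this guide, and the sketch assumes your Space serves `GET /warmup` as described above. It also logs how long each warmup call takes, which gives you a rough record of response-time consistency.

```python
"""Self-hosted warmup pinger: a sketch, not part of this project.

Assumes the Space serves GET /warmup (see Step 3). SPACE_URL is a placeholder.
"""
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple

SPACE_URL = "https://your-space-name.hf.space"  # placeholder: use your Space URL
PING_INTERVAL_S = 5 * 60  # every 5 minutes, as in Step 3
REQUEST_TIMEOUT_S = 120.0  # hypothetical: give a slow warmup time to finish


def ping_once(url: str, timeout: float = REQUEST_TIMEOUT_S) -> Tuple[Optional[int], float]:
    """GET `url` once; return (HTTP status or None on network error, elapsed seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # the server answered, but with an error status
    except OSError:
        status = None  # DNS failure, refused connection, or timeout
    return status, time.monotonic() - start


def run_pinger(base_url: str = SPACE_URL, interval: float = PING_INTERVAL_S,
               max_pings: Optional[int] = None,
               timeout: float = REQUEST_TIMEOUT_S) -> int:
    """Ping <base_url>/warmup every `interval` seconds.

    Runs until stopped when max_pings is None; returns the number of pings sent.
    """
    warmup_url = base_url.rstrip("/") + "/warmup"
    sent = 0
    while max_pings is None or sent < max_pings:
        status, elapsed = ping_once(warmup_url, timeout)
        sent += 1
        print(f"{time.strftime('%H:%M:%S')} {warmup_url} -> {status} in {elapsed:.1f}s",
              flush=True)
        if max_pings is None or sent < max_pings:
            time.sleep(interval)  # wait before the next ping
    return sent
```

For example, save it as `pinger.py` and run `python -c "import pinger; pinger.run_pinger()"`. The loop continues until you stop it. A steadily short warmup time in the log suggests the Space and its models are staying warm.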
### Step 4: Deploy to HF Spaces

```bash
git add .
git commit -m "Add HF Spaces performance optimizations"
git push
```

## Expected Results

| Metric | Before | After |
|--------|--------|-------|
| First request (cold) | 2-5 min | 30-60 sec |
| Subsequent requests | 30-60 sec | 30-60 sec |
| After 15 min idle | 2-5 min | 30-60 sec |
| Consistency | ❌ Variable | ✅ Consistent |

## Monitoring Endpoints

After deployment, you can check these endpoints:

1. **Model Status**: `https://your-space-name.hf.space/model-status` - shows which models are loaded
2. **Queue Status**: `https://your-space-name.hf.space/queue-status` - shows the request queue state
3. **Keep-Alive Status**: `https://your-space-name.hf.space/keepalive-status` - shows keep-alive service stats
4. **Warmup**: `https://your-space-name.hf.space/warmup` - manually triggers model warmup

## Troubleshooting

### Issue: "Module not found" error

**Solution**: Make sure you created the new files:

- `services/ai-service/src/ai_med_extract/utils/model_keepalive.py`
- `services/ai-service/src/ai_med_extract/utils/hf_spaces_optimizations.py`

Also check that the import path in `app.py` matches the package directory on disk. The files above live under `ai-service` (hyphen), while the import in Step 1 uses `ai_service` (underscore).

### Issue: GPU OOM (Out of Memory) errors

**Solution**: Reduce `max_concurrent` back to 2 in `request_queue.py`.

### Issue: Keep-alive not working

**Solution**: Check the `/keepalive-status` endpoint to verify that the service is running.

## Advanced: Manual Testing

Test the optimizations locally:

```bash
# Start the app
python -m uvicorn services.ai-service.src.ai_med_extract.main:app --reload --port 7860

# In another terminal, trigger a warmup
curl http://localhost:7860/warmup

# Check model status
curl http://localhost:7860/model-status

# Check queue status
curl http://localhost:7860/queue-status
```

## Rollback Plan

If something breaks, you can quickly roll back. This assumes the optimizations are your most recent commit:

```bash
git revert HEAD
git push
```

Alternatively, remove the `apply_hf_spaces_optimizations(app)` line from `app.py`.

## Need More Help?
Check the full guide: `docs/HF_SPACES_PERFORMANCE_GUIDE.md`

---

**Estimated Time to Implement**: 5-10 minutes

**Expected Performance Improvement**: 60-80% more consistent response times

**Risk Level**: Low (all changes are additive and easy to roll back)