# Comprehensive Streaming Fix - 20 Second Timeout Issue
## Problem Summary
Streaming stopped after 20 seconds for three reasons:
1. **Detection Issue**: The system did not reliably detect GGUF mode
2. **Generator Issue**: Requests were served by the regular `sse_generator` instead of the extended one
3. **Timeout Issue**: A 20-second HTTP/2 protocol timeout applies on Hugging Face Spaces
## Complete Solution Implemented
### **1. Universal Extended Streaming**
```python
# ALWAYS use extended streaming to prevent 20-second timeout issues
print("Using extended streaming generator for ALL requests to prevent timeout issues")
return StreamingResponse(
    sse_generator_extended(job_id),  # Use extended generator for ALL cases
    media_type="text/event-stream",
    headers={...}
)
```
### **2. Enhanced GGUF Detection**
```python
# Check multiple fields for GGUF detection
is_gguf_mode = (
    data.get('generation_mode') == 'gguf'
    or data.get('patient_summarizer_model_type') == 'gguf'
    # `or ''` guards against an explicit null model name
    or 'gguf' in (data.get('patient_summarizer_model_name') or '').lower()
)
```
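The detection check above can be wrapped in a small helper so it is easy to test in isolation. This is a minimal sketch, not the project's actual code: the function name `is_gguf_request` is hypothetical, the field names are taken from the snippet above, and treating a null model name as an empty string is an added assumption.

```python
def is_gguf_request(data: dict) -> bool:
    """Return True if any of the three request fields indicates GGUF mode."""
    # Assumption: a missing or null model name is treated as an empty string.
    model_name = data.get("patient_summarizer_model_name") or ""
    return (
        data.get("generation_mode") == "gguf"
        or data.get("patient_summarizer_model_type") == "gguf"
        or "gguf" in model_name.lower()
    )
```

Checking all three fields means a request that only names a `.gguf` model file, without setting a mode or type, is still routed as GGUF.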
### **3. Extended Timeout Configuration**
```python
# Extended timeout for GGUF operations
max_wait_time = 600 # 10 minutes for GGUF operations
heartbeat_interval = 5 # Every 5 seconds
```
### **4. Detailed Progress Updates**
#### **Model Loading Progress:**
- `📦 GGUF Model Loading: Downloading model from microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf`
- `✅ GGUF Model Loading: Model downloaded successfully`
- `🔧 GGUF Model Loading: Initializing with context=4096, threads=2, gpu_layers=-1`
- `✅ GGUF Model Loading Complete: Model loaded in 19.40s (GPU layers=-1)`
#### **Generation Progress:**
- `🔧 GGUF Model Loading: Initializing model pipeline...`
- `📦 GGUF Model Loading: Downloading model files...`
- `GGUF Model Ready: Starting text generation...`
- `GGUF Generation: Starting text generation (max_tokens=8192)`
- `✅ GGUF Generation Complete: Generated 1500 words in 45.2s`
- `✅ GGUF Generation Complete: Processing generated summary...`
### **5. Enhanced SSE Generator**
```python
def sse_generator_extended(job_id):
    max_wait_time = 600  # 10 minutes for GGUF operations
    heartbeat_interval = 5  # Every 5 seconds
    # Enhanced logging and progress updates
```
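The stub above only shows the two settings. As an illustration of how a heartbeat keeps the stream active, here is a minimal sketch under stated assumptions: progress events arrive on a `queue.Queue` as dicts, an event with `"done": True` ends the stream, and each frame carries a JSON payload. The name `sse_with_heartbeat`, the queue, and the payload fields are all hypothetical; the real `sse_generator_extended(job_id)` looks up its job in a way this document does not show.

```python
import json
import queue
import time


def sse_with_heartbeat(events, max_wait_time=600, heartbeat_interval=5):
    """Yield SSE frames from `events`, sending a heartbeat while it is idle."""
    deadline = time.monotonic() + max_wait_time
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Overall time budget exhausted: report it and close the stream.
            yield "data: " + json.dumps({"status": "error", "message": "Timed out waiting for result"}) + "\n\n"
            return
        try:
            # Block until an event arrives or the heartbeat interval elapses.
            event = events.get(timeout=min(heartbeat_interval, remaining))
        except queue.Empty:
            # Nothing new: emit a heartbeat so the connection is not idle.
            yield "data: " + json.dumps({"status": "heartbeat", "message": "GGUF model operation in progress..."}) + "\n\n"
            continue
        yield "data: " + json.dumps(event) + "\n\n"
        if event.get("done"):
            return
```

Because the heartbeat interval (5 seconds) is shorter than the 20-second limit, the client receives data several times within every 20-second window, even while generation itself produces no output.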
## Expected Behavior Now
### **Timeline for 5-Minute GGUF Generation:**
```
0:00 - Request starts
0:01 - "Using extended streaming generator for ALL requests"
0:02 - "📦 GGUF Model Loading: Downloading model from microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf"
0:05 - "✅ GGUF Model Loading: Model downloaded successfully"
0:10 - "🔧 GGUF Model Loading: Initializing with context=4096, threads=2, gpu_layers=-1"
0:20 - "✅ GGUF Model Loading Complete: Model loaded in 19.40s (GPU layers=-1)"
0:21 - "GGUF Model Ready: Starting text generation..."
0:22 - "GGUF Generation: Starting text generation (max_tokens=8192)"
0:25 - Heartbeat: "GGUF model operation in progress..."
0:30 - Heartbeat: "GGUF model operation in progress..."
...
4:55 - Heartbeat: "GGUF model operation in progress..."
5:00 - "✅ GGUF Generation Complete: Generated 1500 words in 45.2s"
5:01 - "✅ GGUF Generation Complete: Processing generated summary..."
5:02 - Final result delivered
```
## Key Benefits
### **✅ No More 20-Second Timeout**
- Extended 10-minute timeout instead of 20 seconds
- Universal extended streaming for all requests
- Proper detection of GGUF mode
### **✅ Detailed Progress Updates**
- Every step of model loading is tracked
- Generation progress is monitored
- Heartbeat every 5 seconds during long operations
### **✅ Better User Experience**
- Continuous feedback throughout the process
- Clear status messages for each step
- No more silent timeouts
### **✅ Robust Error Handling**
- Proper timeout management
- Clear error messages
- Graceful degradation
## Testing
The fix should now work with your exact request format:
```json
{
"mode": "stream",
"patientid": 5635,
"patient_summarizer_model_type": "gguf",
"patient_summarizer_model_name": "microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf"
}
```
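To check the stream from the client side, the response can be read line by line and each `data:` line decoded. The helper below is a rough sketch, assuming the server sends JSON payloads on `data:` lines; the payload shape and the name `parse_sse_lines` are assumptions, since this document does not specify the event format.

```python
import json


def parse_sse_lines(lines):
    """Yield decoded JSON payloads from an iterable of SSE text lines."""
    for line in lines:
        line = line.strip()
        # Skip blank frame separators and SSE comment lines (starting with ":").
        if not line or line.startswith(":"):
            continue
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            yield json.loads(payload)
```

With the `requests` library, the lines could come from `response.iter_lines(decode_unicode=True)` on a POST made with `stream=True` and the JSON body above. The endpoint URL is not given in this document, so it has to be supplied by the reader.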
## Debug Output
The system now logs:
- `"Using extended streaming generator for ALL requests to prevent timeout issues"`
- `"✅ GGUF mode detected - using extended streaming approach"`
- Detailed progress updates for every step
- Heartbeat messages every 5 seconds
This ensures you can monitor the entire process and track progress throughout the GGUF model loading and generation.