# Model Recommendations for Medical Text Summarization

## Executive Summary

**Recommended Model**: `microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf`

This is the **PRIMARY** model configured in `models_config.json` with `"is_active": true`.

---

## ⚠️ Models NOT Recommended for Medical Text

### 1. patrickvonplaten/longformer2roberta-cnn_dailymail-fp16

**Status**: ❌ **DEPRECATED - DO NOT USE**

**Problem**: This model produces **irrelevant summaries** for medical text because:

1. **Training Mismatch**: Trained on news articles (CNN/DailyMail dataset), NOT medical text
2. **Domain Gap**: Cannot understand:
   - Clinical terminology and medical abbreviations
   - Structured visit data and medical codes
   - ICD codes, medications, dosages
   - Clinical narrative style
3. **Not Instruction-Tuned**: Cannot follow medical summarization instructions properly

**What Happens**: The model tries to summarize medical data as if it were a news article, resulting in nonsensical output that misses critical clinical information.

**Solution**: Use Phi-3-mini-4k-instruct-q4.gguf instead.

---

### 2. facebook/bart-large-cnn

**Status**: ⚠️ **NOT RECOMMENDED FOR MEDICAL TEXT**

**Problem**: Similar to Longformer:

- Trained on news articles (CNN/DailyMail)
- Limited medical domain knowledge
- May produce suboptimal results for clinical text

**Better Alternative**: Use Phi-3-mini-4k-instruct-q4.gguf

---

## ✅ Recommended Models
### 1. microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf (PRIMARY - ACTIVE)

**Why This Model?**

✅ **Instruction-tuned**: Understands and follows complex medical summarization prompts

✅ **General domain knowledge**: Trained on diverse data, including medical and technical content

✅ **Efficient**: 4-bit (Q4) GGUF quantization delivers strong output quality with lower memory and compute requirements

✅ **Reliable**: Produces coherent, relevant medical summaries

✅ **Fast**: Optimized for CPU inference and works well in production

**Configuration**:

```json
{
  "name": "microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf",
  "type": "gguf",
  "is_active": true,
  "cached": true,
  "description": "Phi-3 Mini GGUF Q4 quantized - PRIMARY MODEL",
  "use_case": "Fast patient summary generation with CPU/GPU"
}
```

---

### 2. google/flan-t5-large (ALTERNATIVE)

**Status**: ✅ **Good Alternative**

**Advantages**:

- Instruction-tuned (FLAN methodology)
- Can follow summarization instructions
- Smaller than Phi-3, so inference is faster
- Better than BART/Longformer for structured text

**Use When**:

- You need faster inference than Phi-3 provides
- Memory is constrained
- The summarization task is simple

---

## Technical Background: Why News Models Fail on Medical Text

### Training Data Mismatch

**News Articles (CNN/DailyMail)**:

```
Title: New Study Shows Coffee Benefits
Body: A recent study published in the Journal of Medicine found that...
Summary: Research indicates coffee may have health benefits including...
```

**Medical Records**:

```
Visit 2024-01-15:
Chief Complaint: SOB, DOE
HPI: 65F w/ PMH of HTN, DM2, presents with 3d progressive DOE...
PE: RRR, no m/r/g. Lungs CTAB. +1 bilateral LE edema...
A/P: 1. CHF exacerbation - start Lasix 40mg PO daily...
```

### What News Models Do Wrong

1. **Terminology**: Can't understand medical abbreviations (SOB, DOE, HTN, DM2, CTAB, etc.)
2. **Structure**: Expect a narrative news format, not structured clinical data
3. **Priority**: News models prioritize "interesting" content, whereas medical summaries must prioritize clinical significance
4. **Context**: Medical summaries require understanding the relationships between symptoms, diagnoses, and medications
5. **Instructions**: Cannot follow complex instructions such as "generate a comprehensive clinical summary focusing on changes over time"

---

## Migration Guide

### If You're Currently Using Longformer or BART:

**Step 1**: Update your API request to use the recommended model. Either name the model explicitly:

```json
{
  "patient_summarizer_model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
  "patient_summarizer_model_type": "gguf",
  "generation_mode": "gguf"
}
```

or omit the model fields entirely, in which case the request falls back to the default (Phi-3):

```json
{
  "patientid": "12345",
  "token": "your-token",
  "key": "your-key"
}
```

**Step 2**: Test the output quality and adjust the generation parameters if needed:

```json
{
  "max_new_tokens": 2048,
  "temperature": 0.1,
  "top_p": 0.5
}
```

- `max_new_tokens`: maximum length of the generated summary, in tokens
- `temperature`: lower values give more focused output, higher values more creative output
- `top_p`: lower values make sampling more deterministic

---

## Configuration Reference

### Current Active Configuration (models_config.json)

The entry with `"is_active": true` is the primary model:

```json
{
  "patient_summary_models": [
    {
      "name": "microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf",
      "type": "gguf",
      "is_active": true,
      "cached": true,
      "description": "Phi-3 Mini GGUF Q4 quantized - PRIMARY MODEL",
      "use_case": "Fast patient summary generation with CPU/GPU",
      "repo_id": "microsoft/Phi-3-mini-4k-instruct-gguf",
      "filename": "Phi-3-mini-4k-instruct-q4.gguf"
    }
  ]
}
```

---

## Performance Comparison

| Model | Medical Text Quality | Speed | Memory | Instruction Following |
|-------|---------------------|-------|--------|----------------------|
| **Phi-3 GGUF Q4** | ⭐⭐⭐⭐⭐ Excellent | Fast | Low | ✅ Yes |
| FLAN-T5 Large | ⭐⭐⭐⭐ Good | Very Fast | Low | ✅ Yes |
| Longformer | ⭐ Poor (Irrelevant) | Slow | High | ❌ No |
| BART-CNN | ⭐⭐ Poor | Medium | Medium | ❌ No |
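The Configuration Reference and the Step 2 parameters lead to two checks a client or loader might run before generating a summary. Below is a minimal, hypothetical Python sketch of both. `select_active_model` and `max_prompt_tokens` are illustrative names and are not part of this service's code. The optional boolean `"deprecated"` key is an assumption based on the FAQ's statement that retired entries are marked as deprecated. The 4,096-token figure is the context window of Phi-3-mini-4k. The prompt and the generated summary must fit in that window together.

```python
import json

# Context window of Phi-3-mini-4k-instruct; prompt tokens and generated
# tokens must fit in it together.
PHI3_MINI_4K_CONTEXT = 4096


def select_active_model(config_text: str) -> dict:
    """Return the single active, non-deprecated entry of models_config.json.

    Hypothetical helper: assumes the layout shown in the Configuration
    Reference, plus an optional boolean "deprecated" key on retired models.
    """
    config = json.loads(config_text)
    active = [
        model
        for model in config.get("patient_summary_models", [])
        if model.get("is_active", False) and not model.get("deprecated", False)
    ]
    if len(active) != 1:
        raise ValueError(f"expected exactly one active model, found {len(active)}")
    return active[0]


def max_prompt_tokens(max_new_tokens: int, context_window: int = PHI3_MINI_4K_CONTEXT) -> int:
    """Tokens left for the prompt (instructions plus visit notes)."""
    if not 0 < max_new_tokens < context_window:
        raise ValueError("max_new_tokens must be positive and leave room for the prompt")
    return context_window - max_new_tokens


if __name__ == "__main__":
    example_config = """
    {
      "patient_summary_models": [
        {"name": "patrickvonplaten/longformer2roberta-cnn_dailymail-fp16",
         "is_active": false, "deprecated": true},
        {"name": "microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf",
         "type": "gguf", "is_active": true,
         "repo_id": "microsoft/Phi-3-mini-4k-instruct-gguf",
         "filename": "Phi-3-mini-4k-instruct-q4.gguf"}
      ]
    }
    """
    model = select_active_model(example_config)
    print(model["repo_id"], model["filename"])
    print(max_prompt_tokens(2048))  # 2048 tokens left for the prompt
```

With `max_new_tokens` set to 2048, as in Step 2, at most 2,048 tokens remain for the prompt, including any chat-template tokens. A long visit history may therefore need trimming, or a smaller `max_new_tokens`. Raising an error when zero or several models are active makes a misconfigured `models_config.json` fail loudly instead of silently picking one.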
---

## FAQs

**Q: Can I still use Longformer/BART?**

A: Technically yes (they're still cached), but this is **strongly not recommended**: they will produce irrelevant summaries.

**Q: Why are these models still in the config?**

A: For backward compatibility and documentation. They're marked as `deprecated` and `is_active: false`.

**Q: What if Phi-3 is too slow?**

A: Try `google/flan-t5-large`. It is still instruction-tuned, but smaller and faster.

**Q: Can you fix Longformer to work with medical text?**

A: No. Its training is fundamentally mismatched with clinical text, and making it useful would require retraining on medical data.

---

## Summary

✅ **DO USE**: Phi-3-mini-4k-instruct-q4.gguf (default/recommended)

✅ **ALTERNATIVE**: google/flan-t5-large

⚠️ **AVOID**: facebook/bart-large-cnn

❌ **DO NOT USE**: patrickvonplaten/longformer2roberta-cnn_dailymail-fp16

The Longformer model's irrelevant summaries stem from a fundamental mismatch between its training data and the medical domain, not from a bug that can be fixed.