# Fix: BART and Longformer2Roberta Summarization Models

## Issue Description

The `facebook/bart-large-cnn` and `patrickvonplaten/longformer2roberta-cnn_dailymail-fp16` models were producing inaccurate or "rubbish" summaries.

## Root Cause

These models are **encoder-decoder summarization models** trained on the CNN/DailyMail dataset. They are **NOT instruction-tuned models**.

### Key Distinction:

**Instruction-tuned models** (like Phi-3, FLAN-T5, GPT models):
- Understand and follow instructions like "Generate a summary based on..."
- Can handle complex prompts with multiple directives
- Trained on instruction-following datasets

**Non-instruction-tuned summarization models** (like BART, Longformer2Roberta):
- Trained on simple article → summary tasks
- Do NOT understand instructions
- Only trained to condense/extract key information from raw text
- When given instructions, they try to **summarize the instruction itself** instead of following it

## The Problem

Previously, these models were receiving prompts like:

```
Patient Visit Data: [data]
Baseline: [baseline]
Changes: [delta_text]

Generate a comprehensive patient summary based on the above information.
```

The models would try to **summarize this instruction text** rather than follow it, resulting in nonsensical output.

## The Solution

Modified the `build_summarization_context()` function in `routes_fastapi.py` to:

1. **Detect non-instruction-tuned models** (BART, Longformer2Roberta)
2. **Send ONLY raw text** to these models without any instructions
3. **Structure the data** with simple labels (like section headers in an article)

### Before (Incorrect):

```python
prompt = f"Patient Data:\nBaseline: {baseline}\nChanges: {delta_text}\n\n" \
         f"Generate a comprehensive patient summary based on the above information."
```

### After (Correct):

```python
# For BART/Longformer - NO instructions, just data
prompt = f"Patient Information and Visit History:\n{visit_data}\n" \
         f"\nBaseline Status:\n{baseline}\n" \
         f"\nRecent Changes and Updates:\n{delta_text}"
```

## Implementation Details

### Modified Files:

1. **`services/ai-service/src/ai_med_extract/api/routes_fastapi.py`**
   - Updated the `build_summarization_context()` function
   - Added model detection logic
   - Updated all call sites to pass the `model_name` parameter
2. **`models_config.json`**
   - Added notes about these models being non-instruction-tuned
   - Clarified their proper usage

### Code Changes:

```python
def build_summarization_context(custom_prompt, visit_data_text, baseline, delta_text, model_name=None):
    """
    Build context for summarization models.

    Non-instruction-tuned models (BART, Longformer2Roberta) need ONLY raw text
    to summarize, without any instructions. They were trained on
    article->summary tasks, not instruction following.
    """
    # List of models that are NOT instruction-tuned
    NON_INSTRUCTION_MODELS = [
        "facebook/bart-large-cnn",
        "patrickvonplaten/longformer2roberta-cnn_dailymail-fp16"
    ]

    # Check if this is a non-instruction-tuned model
    # (bool() keeps the flag a real boolean when model_name is None or empty)
    is_non_instruction_model = bool(model_name) and any(
        m in model_name for m in NON_INSTRUCTION_MODELS
    )

    if is_non_instruction_model:
        # For non-instruction models: Send ONLY the data to be summarized
        # Structure it like an article with section headers
        data_text = f"Patient Information and Visit History:\n{visit_data_text}\n"
        if baseline:
            data_text += f"\nBaseline Status:\n{baseline}\n"
        if delta_text:
            data_text += f"\nRecent Changes and Updates:\n{delta_text}"
        return data_text.strip()
    else:
        # For instruction-tuned models: Include explicit instructions
        return f"{custom_prompt}\n\nPatient Visit Data:\n{visit_data_text}\n\n" \
               f"Baseline: {baseline}\n\nChanges: {delta_text}\n\n" \
               f"Generate a comprehensive patient summary based on the above information."
```

## Expected Results

After this fix:

- ✅ **BART and Longformer2Roberta models** now receive properly formatted input
- ✅ Models will extract and condense key information (their intended purpose)
- ✅ Output should be coherent summaries rather than garbled text
- ✅ No changes to instruction-tuned models (Phi-3, FLAN-T5, etc.)

## Model Comparison

| Model | Type | Instruction-Tuned? | Best For |
|-------|------|-------------------|----------|
| `facebook/bart-large-cnn` | Summarization | ❌ No | Extracting key points from documents |
| `patrickvonplaten/longformer2roberta-cnn_dailymail-fp16` | Seq2Seq | ❌ No | Long document summarization (Longformer encoder, up to 4,096 tokens) |
| `google/flan-t5-large` | Summarization | ✅ Yes | Instruction-following summarization |
| `microsoft/Phi-3-mini-4k-instruct-gguf` | Text Generation | ✅ Yes | Complex patient summaries with instructions |

## Recommendations

### For Best Results:

1. **Use instruction-tuned models** (Phi-3, FLAN-T5) for patient summaries
   - They tend to handle medical context better
   - Can follow specific formatting requirements
   - Handle complex multi-step instructions
2. **Use BART/Longformer for simple extraction tasks**
   - Quick key point extraction
   - Document length reduction
   - When you just need "the highlights"
3. **Current PRIMARY model** (`Phi-3 GGUF`) is already optimal
   - Instruction-tuned
   - Quantized for efficiency
   - Best quality for patient summaries

## Testing

To test the fix:

```bash
# Test with BART
curl -X POST http://localhost:8000/api/patient_summary \
  -H "Content-Type: application/json" \
  -d '{
    "patient_info": {...},
    "model_name": "facebook/bart-large-cnn",
    "model_type": "summarization"
  }'

# Test with Longformer
curl -X POST http://localhost:8000/api/patient_summary \
  -H "Content-Type: application/json" \
  -d '{
    "patient_info": {...},
    "model_name": "patrickvonplaten/longformer2roberta-cnn_dailymail-fp16",
    "model_type": "seq2seq"
  }'
```

## Future Considerations

If adding new models, check if they're instruction-tuned:

**Instruction-tuned models typically have:**
- "instruct" in the model name
- "chat" in the model name
- "flan" prefix (FLAN-T5, etc.)
- Fine-tuning on instruction datasets such as Flan or Alpaca, or an InstructGPT-style training procedure

**Non-instruction-tuned models:**
- Trained on simple task datasets (CNN/DailyMail, XSum, etc.)
- Base models without instruction fine-tuning
- Should receive raw text only

## References

- BART Paper: https://arxiv.org/abs/1910.13461
- CNN/DailyMail Dataset: https://arxiv.org/abs/1506.03340
- Longformer Paper: https://arxiv.org/abs/2004.05150
- HuggingFace Model Cards:
  - https://huggingface.co/facebook/bart-large-cnn
  - https://huggingface.co/patrickvonplaten/longformer2roberta-cnn_dailymail-fp16

---

**Date**: 2025-11-07  
**Status**: ✅ Fixed  
**Impact**: Medium - Affects BART and Longformer model quality  
**Backward Compatibility**: ✅ Yes - No breaking changes to API
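
## Appendix: Name-Based Check Sketch

The naming hints listed under Future Considerations can be turned into a quick first-pass check when a new model is added. The sketch below is illustrative only: `looks_instruction_tuned` and `INSTRUCTION_NAME_HINTS` are hypothetical names that do not exist in `routes_fastapi.py`, and a name match is only a hint, so the model card should still be consulted before a model is routed to the instruction-style prompt.

```python
from typing import Optional

# Hypothetical name fragments taken from the Future Considerations checklist.
# The list is not exhaustive and is an assumption of this sketch.
INSTRUCTION_NAME_HINTS = ("instruct", "chat", "flan")


def looks_instruction_tuned(model_name: Optional[str]) -> bool:
    """Return True if the model name contains a common instruction-tuning hint.

    This is a naming heuristic only. A False result does not prove that the
    model is a plain summarizer, and a True result does not prove that it
    follows instructions well.
    """
    if not model_name:
        return False
    lowered = model_name.lower()
    return any(hint in lowered for hint in INSTRUCTION_NAME_HINTS)


if __name__ == "__main__":
    # Print the heuristic's verdict for the four models in the comparison table.
    for name in (
        "google/flan-t5-large",
        "microsoft/Phi-3-mini-4k-instruct-gguf",
        "facebook/bart-large-cnn",
        "patrickvonplaten/longformer2roberta-cnn_dailymail-fp16",
    ):
        print(f"{name}: {looks_instruction_tuned(name)}")
```

A helper like this would complement, not replace, the explicit `NON_INSTRUCTION_MODELS` list: the explicit list stays authoritative for known models, while the heuristic only flags unfamiliar names for manual review.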