# Fix: BART and Longformer2Roberta Summarization Models

## Issue Description

The `facebook/bart-large-cnn` and `patrickvonplaten/longformer2roberta-cnn_dailymail-fp16` models were producing inaccurate or "rubbish" summaries.

## Root Cause

These models are **encoder-decoder summarization models** trained on the CNN/DailyMail dataset. They are **NOT instruction-tuned models**.

### Key Distinction:

**Instruction-tuned models** (like Phi-3 Instruct, FLAN-T5, and chat-tuned GPT models):
- Understand and follow instructions like "Generate a summary based on..."
- Can handle complex prompts with multiple directives
- Trained on instruction-following datasets

**Non-instruction-tuned summarization models** (like BART, Longformer2Roberta):
- Trained on simple article → summary tasks
- Do NOT understand instructions
- Only trained to condense/extract key information from raw text
- When given instructions, they try to **summarize the instruction itself** instead of following it

## The Problem

Previously, these models were receiving prompts like:

```
Patient Visit Data: [data]

Baseline: [baseline]

Changes: [delta_text]

Generate a comprehensive patient summary based on the above information.
```

The models would try to **summarize this instruction text** rather than follow it, resulting in nonsensical output.

## The Solution

Modified the `build_summarization_context()` function in `routes_fastapi.py` to:

1. **Detect non-instruction-tuned models** (BART, Longformer2Roberta)
2. **Send ONLY raw text** to these models without any instructions
3. **Structure the data** with simple labels (like section headers in an article)

### Before (Incorrect):
```python
prompt = f"Patient Data:\nBaseline: {baseline}\nChanges: {delta_text}\n\n" \
         f"Generate a comprehensive patient summary based on the above information."
```

### After (Correct):
```python
# For BART/Longformer - NO instructions, just data
prompt = f"Patient Information and Visit History:\n{visit_data_text}\n" \
         f"\nBaseline Status:\n{baseline}\n" \
         f"\nRecent Changes and Updates:\n{delta_text}"
```

## Implementation Details

### Modified Files:

1. **`services/ai-service/src/ai_med_extract/api/routes_fastapi.py`**
   - Updated `build_summarization_context()` function
   - Added model detection logic
   - Updated all function calls to pass `model_name` parameter

2. **`models_config.json`**
   - Added notes about these models being non-instruction-tuned
   - Clarified their proper usage

### Code Changes:

```python
def build_summarization_context(custom_prompt, visit_data_text, baseline, delta_text, model_name=None):
    """
    Build context for summarization models.
    
    Non-instruction-tuned models (BART, Longformer2Roberta) need ONLY raw text to summarize,
    without any instructions. They were trained on article->summary tasks, not instruction following.
    """
    # List of models that are NOT instruction-tuned
    NON_INSTRUCTION_MODELS = [
        "facebook/bart-large-cnn",
        "patrickvonplaten/longformer2roberta-cnn_dailymail-fp16"
    ]
    
    # Check if this is a non-instruction-tuned model
    is_non_instruction_model = bool(model_name) and any(m in model_name for m in NON_INSTRUCTION_MODELS)
    
    if is_non_instruction_model:
        # For non-instruction models: Send ONLY the data to be summarized
        # Structure it like an article with section headers
        data_text = f"Patient Information and Visit History:\n{visit_data_text}\n"
        if baseline:
            data_text += f"\nBaseline Status:\n{baseline}\n"
        if delta_text:
            data_text += f"\nRecent Changes and Updates:\n{delta_text}"
        return data_text.strip()
    else:
        # For instruction-tuned models: Include explicit instructions
        return f"{custom_prompt}\n\nPatient Visit Data:\n{visit_data_text}\n\n" \
               f"Baseline: {baseline}\n\nChanges: {delta_text}\n\n" \
               f"Generate a comprehensive patient summary based on the above information."
```
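
To see the two branches side by side, the sketch below exercises a condensed copy of the function with placeholder inputs. The sample values and the `INSTRUCTION` constant are invented for illustration, and the copy is only an approximation of the code in `routes_fastapi.py`, not the deployed implementation.

```python
NON_INSTRUCTION_MODELS = (
    "facebook/bart-large-cnn",
    "patrickvonplaten/longformer2roberta-cnn_dailymail-fp16",
)
# Hypothetical constant holding the trailing instruction, for easier checking.
INSTRUCTION = "Generate a comprehensive patient summary based on the above information."


def build_summarization_context(custom_prompt, visit_data_text, baseline, delta_text, model_name=None):
    """Condensed copy of the function above, for illustration only."""
    if model_name and any(m in model_name for m in NON_INSTRUCTION_MODELS):
        # Non-instruction-tuned branch: labeled data only, no directives.
        data_text = f"Patient Information and Visit History:\n{visit_data_text}\n"
        if baseline:
            data_text += f"\nBaseline Status:\n{baseline}\n"
        if delta_text:
            data_text += f"\nRecent Changes and Updates:\n{delta_text}"
        return data_text.strip()
    # Instruction-tuned branch: custom prompt, data, then the explicit instruction.
    return (
        f"{custom_prompt}\n\nPatient Visit Data:\n{visit_data_text}\n\n"
        f"Baseline: {baseline}\n\nChanges: {delta_text}\n\n{INSTRUCTION}"
    )


# Placeholder inputs, invented for this example.
sample = dict(
    custom_prompt="You are a clinical assistant.",
    visit_data_text="2024-01-10: routine follow-up visit.",
    baseline="Blood pressure stable.",
    delta_text="Reported mild headaches.",
)

bart_context = build_summarization_context(**sample, model_name="facebook/bart-large-cnn")
instruct_context = build_summarization_context(**sample, model_name="google/flan-t5-large")

print(bart_context)
```

The BART context contains neither the custom prompt nor the trailing instruction, while the FLAN-T5 context keeps both.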

## Expected Results

After this fix:

- ✅ **BART and Longformer2Roberta models** now receive properly formatted input
- ✅ Models will extract and condense key information (their intended purpose)
- ✅ Output should be coherent summaries rather than garbled text
- ✅ No changes to instruction-tuned models (Phi-3, FLAN-T5, etc.)

## Model Comparison

| Model | Type | Instruction-Tuned? | Best For |
|-------|------|-------------------|----------|
| `facebook/bart-large-cnn` | Summarization | ❌ No | Extracting key points from documents |
| `patrickvonplaten/longformer2roberta-cnn_dailymail-fp16` | Seq2Seq | ❌ No | Long document summarization (inputs up to 4096 tokens) |
| `google/flan-t5-large` | Summarization | ✅ Yes | Instruction-following summarization |
| `microsoft/Phi-3-mini-4k-instruct-gguf` | Text Generation | ✅ Yes | Complex patient summaries with instructions |

## Recommendations

### For Best Results:

1. **Use instruction-tuned models** (Phi-3, FLAN-T5) for patient summaries
   - They can be directed through the prompt to focus on clinically relevant details
   - Can follow specific formatting requirements
   - Handle complex multi-step instructions

2. **Use BART/Longformer for simple extraction tasks**
   - Quick key point extraction
   - Document length reduction
   - When you just need "the highlights"

3. **Current PRIMARY model** (`Phi-3 GGUF`) is already a good fit
   - Instruction-tuned
   - Quantized for efficiency
   - Best suited to patient summaries among the models listed above

## Testing

To test the fix:

```bash
# Test with BART
curl -X POST http://localhost:8000/api/patient_summary \
  -H "Content-Type: application/json" \
  -d '{
    "patient_info": {...},
    "model_name": "facebook/bart-large-cnn",
    "model_type": "summarization"
  }'

# Test with Longformer
curl -X POST http://localhost:8000/api/patient_summary \
  -H "Content-Type: application/json" \
  -d '{
    "patient_info": {...},
    "model_name": "patrickvonplaten/longformer2roberta-cnn_dailymail-fp16",
    "model_type": "seq2seq"
  }'
```
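
The same two requests can also be scripted. The sketch below uses only the Python standard library and takes the endpoint, model names, and model types from the curl commands above. The helper names `build_request` and `run_all` are hypothetical, `patient_info` must be supplied by the caller, and the assumption that the endpoint returns a JSON body has not been verified against the service.

```python
import json
import urllib.request

# Endpoint taken from the curl commands above.
ENDPOINT = "http://localhost:8000/api/patient_summary"

# (model_name, model_type) pairs taken from the curl commands above.
MODELS_UNDER_TEST = [
    ("facebook/bart-large-cnn", "summarization"),
    ("patrickvonplaten/longformer2roberta-cnn_dailymail-fp16", "seq2seq"),
]


def build_request(patient_info, model_name, model_type):
    """Build (but do not send) the POST request for one model."""
    body = json.dumps({
        "patient_info": patient_info,
        "model_name": model_name,
        "model_type": model_type,
    }).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_all(patient_info, timeout=120):
    """Send one request per model; assumes each response body is JSON."""
    results = {}
    for model_name, model_type in MODELS_UNDER_TEST:
        request = build_request(patient_info, model_name, model_type)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            results[model_name] = json.loads(response.read().decode("utf-8"))
    return results
```

Calling `run_all(patient_info)` against a running service returns one decoded response per model, which makes it easy to compare the two summaries by eye.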

## Future Considerations

If adding new models, check if they're instruction-tuned:

**Instruction-tuned models typically have:**
- "instruct" in the model name
- "chat" in the model name
- "flan" prefix (FLAN-T5, etc.)
- Trained on instruction-following datasets such as the Flan collection or Alpaca

**Non-instruction-tuned models:**
- Trained on simple task datasets (CNN/DailyMail, XSum, etc.)
- Base models, or models fine-tuned on a single task, without instruction fine-tuning
- Should receive raw text only
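
The naming hints above can be turned into a rough first-pass check. The sketch below is a heuristic only: the function name `looks_instruction_tuned` and the hint tuple are hypothetical, and a name match does not prove how a model was trained, so the model card should still be consulted before adding a model to either group.

```python
# Name fragments listed above as typical of instruction-tuned checkpoints.
INSTRUCTION_NAME_HINTS = ("instruct", "chat", "flan")


def looks_instruction_tuned(model_name):
    """Return True if the model name contains one of the hint fragments (case-insensitive)."""
    name = model_name.lower()
    return any(hint in name for hint in INSTRUCTION_NAME_HINTS)


for candidate in (
    "microsoft/Phi-3-mini-4k-instruct-gguf",
    "google/flan-t5-large",
    "facebook/bart-large-cnn",
    "patrickvonplaten/longformer2roberta-cnn_dailymail-fp16",
):
    print(candidate, looks_instruction_tuned(candidate))
```

A model whose name matches none of the hints is not necessarily non-instruction-tuned, so treating unmatched names as "unknown" rather than "raw text only" may be the safer default.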

## References

- BART Paper: https://arxiv.org/abs/1910.13461
- CNN/DailyMail Dataset: https://arxiv.org/abs/1506.03340
- Longformer Paper: https://arxiv.org/abs/2004.05150
- HuggingFace Model Cards:
  - https://huggingface.co/facebook/bart-large-cnn
  - https://huggingface.co/patrickvonplaten/longformer2roberta-cnn_dailymail-fp16

---

- **Date**: 2025-11-07
- **Status**: ✅ Fixed
- **Impact**: Medium - Affects BART and Longformer model quality
- **Backward Compatibility**: ✅ Yes - No breaking changes to API