---
license: mit
language:
- en
metrics:
- f1
- accuracy
- roc_auc
base_model:
- microsoft/deberta-v3-base
pipeline_tag: text-classification
tags:
- medical
- pharmacovigilance
- clinical-nlp
- drug-safety
- adverse-drug-reactions
- deberta
- classification
datasets:
  - custom
model-index:
  - name: MedSentinel ADR Severity Classifier
    results:
      - task:
          type: text-classification
          name: ADR Severity Classification
        metrics:
          - type: f1
            value: 0.9272
            name: F1 (Kaggle, ensemble)
          - type: accuracy
            value: 0.9440
            name: Accuracy (Validation)
---

# MedSentinel — ADR Severity Classifier

**MedSentinel** is a fine-tuned [DeBERTa-v3-Base](https://huggingface.co/microsoft/deberta-v3-base) 
model for classifying the severity of Adverse Drug Reactions (ADRs) from patient-reported 
narrative text. It is the core AI component of the MedSentinel ADR Intelligence Platform, 
designed to assist clinical practitioners in triaging pharmacovigilance signals.

## Model Details

| Property | Value |
|---|---|
| **Base model** | microsoft/deberta-v3-base |
| **Architecture** | DeBERTa-v3 (12 layers, 768 hidden, ~86M backbone params excluding embeddings) |
| **Task** | Binary text classification (Severe / Non-Severe) |
| **Training strategy** | 5-fold stratified cross-validation ensemble |
| **Kaggle F1** | 0.92720 (ensemble) · 0.91544 (single model) |
| **Tokenizer** | SentencePiece (max length 256) |
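
The card does not say how the five fold models are combined into the ensemble. A minimal sketch, assuming soft voting (averaging each fold's softmax probabilities); the logits below are hypothetical values, not outputs of the released models:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(fold_logits):
    """Average per-fold class probabilities for one input (soft voting).

    Assumption: this averaging rule is illustrative; the actual ensemble
    rule used for MedSentinel is not documented in the card.
    """
    fold_probs = [softmax(logits) for logits in fold_logits]
    n_folds = len(fold_probs)
    avg = [sum(p[c] for p in fold_probs) / n_folds for c in range(2)]
    label = "Severe" if avg[1] > 0.5 else "Non-Severe"
    return label, avg

# Hypothetical [non-severe, severe] logits from five fold models.
fold_logits = [[-1.2, 1.5], [-0.4, 0.9], [0.3, -0.1], [-2.0, 2.2], [-0.8, 1.1]]
label, avg = ensemble_predict(fold_logits)
print(label, [round(p, 3) for p in avg])
```

Averaging probabilities lets a confident majority outweigh a single dissenting fold, as with the third fold in this example.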

## Intended Use

This model is intended for **research and clinical decision support** in the context of 
pharmacovigilance. It classifies free-text patient ADR reports as either severe or 
non-severe to help clinicians prioritize signals requiring immediate attention.

**Intended users:** Clinical practitioners, pharmacovigilance researchers, healthcare data scientists.

**Out-of-scope uses:** This model should not be used as the sole basis for clinical decisions. 
It is a decision-support tool, and its outputs should always be reviewed by a qualified healthcare professional.

## Training Data

The model was trained on a dataset of **8,153 patient-reported drug experience narratives** 
sourced from drug review platforms. Labels indicate ADR severity:

- `0` — Non-severe adverse drug reaction
- `1` — Severe adverse drug reaction

**Class distribution:** 53.4% severe · 46.6% non-severe (near-balanced)
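
Stratified cross-validation keeps this class balance roughly constant in every fold. The sketch below is one simple way to build such folds, not the project's actual splitting code; the label counts are approximations derived from the rounded percentages above, and `stratified_folds` is a hypothetical helper:

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=42):
    """Assign sample indices to k folds, preserving per-class proportions."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets an equal share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

# Synthetic labels: ~53.4% severe (1) and ~46.6% non-severe (0) of 8,153.
labels = [1] * 4354 + [0] * 3799
folds = stratified_folds(labels)

for i, fold in enumerate(folds):
    severe_share = sum(labels[j] for j in fold) / len(fold)
    print(f"fold {i}: n={len(fold)}, severe share={severe_share:.3f}")
```

In practice a library routine such as scikit-learn's `StratifiedKFold` does the same job; the hand-rolled version is shown only to make the mechanism explicit.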

## Training Configuration

```python
# Key hyperparameters
learning_rate         = 2e-5
optimizer             = "adafactor"
batch_size            = 16  # effective 64 with gradient accumulation
gradient_accumulation = 4
epochs                = 8   # with early stopping (patience=3)
warmup_ratio          = 0.1
lr_scheduler          = "cosine"
weight_decay          = 0.01
max_seq_length        = 256
fp16                  = False  # disabled; DeBERTa-v3 can be numerically unstable in fp16
cv_folds              = 5
```
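
To make these settings concrete, the following sketch works out the approximate number of optimizer steps per fold and the resulting learning-rate curve. It assumes each fold trains on roughly four fifths of the 8,153 narratives and that the schedule is linear warmup followed by cosine decay to zero. Exact step counts depend on the trainer's rounding, and early stopping may end training sooner.

```python
import math

n_samples = 8153
cv_folds = 5
batch_size = 16
gradient_accumulation = 4
epochs = 8
warmup_ratio = 0.1
base_lr = 2e-5

# One fold is held out for validation; the rest is used for training.
train_size = n_samples - math.ceil(n_samples / cv_folds)        # 6522
effective_batch = batch_size * gradient_accumulation            # 64
steps_per_epoch = math.ceil(train_size / effective_batch)       # 102
total_steps = steps_per_epoch * epochs                          # 816
warmup_steps = math.ceil(warmup_ratio * total_steps)            # 82

def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:4d}: lr = {lr_at(step):.2e}")
```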

## Evaluation Results

| Metric | Score |
|---|---|
| **Kaggle F1 (ensemble)** | **0.92720** |
| Kaggle F1 (single model) | 0.91544 |
| Validation F1 (macro) | 0.9050 |
| Validation accuracy | 94.4% |
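
Macro F1 and accuracy measure different things, which is why the table reports both. The sketch below shows how each is computed for the binary case. The labels are a toy example chosen to make the two metrics diverge, not MedSentinel predictions.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)

def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Toy case: a classifier that always predicts "severe" (1).
y_true = [1] * 8 + [0] * 2
y_pred = [1] * 10

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

Accuracy rewards the always-severe classifier for matching the majority class, while macro F1 penalizes it for never identifying a non-severe report.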

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
from scipy.special import softmax

model_id  = "Izziemirg/medsentinel-adr-deberta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model     = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

def classify_adr(text):
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=256,
        padding=True
    )
    with torch.no_grad():
        logits = model(**inputs).logits.numpy()

    probs    = softmax(logits, axis=-1)[0]
    label    = "Severe" if probs[1] > 0.5 else "Non-Severe"
    return {"label": label, "confidence": round(float(probs.max()), 4)}

# Example
text = (
    "I experienced severe insomnia, heart palpitations, and extreme anxiety "
    "after taking this medication for two weeks."
)
print(classify_adr(text))
# {'label': 'Severe', 'confidence': 0.9731}
```
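
The example above labels a report as severe when the severe-class probability exceeds 0.5. In a triage setting, one might lower that cutoff so fewer severe reports are missed, at the cost of more false alarms. The helper below is a hypothetical sketch of that idea; `triage` and the 0.3 threshold are illustrative assumptions, not part of the released model or a validated operating point.

```python
def triage(prob_severe, threshold=0.5):
    """Map a severe-class probability to a label using a configurable cutoff."""
    if not 0.0 <= prob_severe <= 1.0:
        raise ValueError("prob_severe must be in [0, 1]")
    label = "Severe" if prob_severe > threshold else "Non-Severe"
    return {"label": label, "prob_severe": round(prob_severe, 4)}

# The same borderline score under the default and a more cautious threshold.
print(triage(0.42))                  # default cutoff of 0.5
print(triage(0.42, threshold=0.3))   # hypothetical recall-oriented cutoff
```

Any non-default threshold should be chosen on held-out data, for example by inspecting the precision-recall trade-off.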

## Limitations

- Trained on English-language patient-reported text only
- Performance may degrade on formal clinical notes (different register than training data)
- Mixed-sentiment texts (severe symptoms but positive drug efficacy) remain a known 
  edge case — the model may under-predict severity in these cases
- Not validated on real-world clinical deployment data

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{mirghani2025medsentinel,
  title        = {MedSentinel: ADR Severity Classification with DeBERTa-v3},
  author       = {Mirghani, Izzie},
  year         = {2026},
  howpublished = {HuggingFace Hub},
  url          = {https://huggingface.co/Izziemirg/medsentinel-adr-deberta}
}
```

## Developed By

**Izzie Mirghani**, MS Business Analytics, UVA Darden  
Part of the **MedSentinel ADR Intelligence Platform** project.