---
license: mit
language:
- en
metrics:
- f1
- accuracy
- roc_auc
base_model:
- microsoft/deberta-v3-base
pipeline_tag: text-classification
tags:
- medical
- pharmacovigilance
- clinical-nlp
- drug-safety
- adverse-drug-reactions
- deberta
- classification
datasets:
- custom
model-index:
- name: MedSentinel ADR Severity Classifier
  results:
  - task:
      type: text-classification
      name: ADR Severity Classification
    metrics:
    - type: f1
      value: 0.9272
      name: F1 (Kaggle)
    - type: accuracy
      value: 0.9440
      name: Accuracy (Test set)
---

# MedSentinel: ADR Severity Classifier

**MedSentinel** is a fine-tuned [DeBERTa-v3-Base](https://huggingface.co/microsoft/deberta-v3-base) model that classifies the severity of Adverse Drug Reactions (ADRs) from patient-reported narrative text. It is the core AI component of the MedSentinel ADR Intelligence Platform, which is designed to help clinical practitioners triage pharmacovigilance signals.

## Model Details

| Property | Value |
|---|---|
| **Base model** | microsoft/deberta-v3-base |
| **Architecture** | DeBERTa-v3 (12 layers, 768 hidden, ~86M params) |
| **Task** | Binary text classification (Severe / Non-Severe) |
| **Training strategy** | 5-fold stratified cross-validation ensemble |
| **Kaggle score** | 0.92720 (ensemble) · 0.91544 (single model) |
| **Tokenizer** | SentencePiece (max length 256) |

## Intended Use

This model is intended for **research and clinical decision support** in pharmacovigilance. It classifies free-text patient ADR reports as either severe or non-severe to help clinicians prioritize signals that require immediate attention.

**Intended users:** Clinical practitioners, pharmacovigilance researchers, healthcare data scientists.

**Out-of-scope uses:** This model should not be used as the sole basis for clinical decisions. It is a decision-support tool, and its output should always be reviewed by a qualified healthcare professional.

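As an illustration of the prioritization use case described above, the following minimal sketch orders already-scored reports so that the most likely severe ones are reviewed first. The `ScoredReport` class, the `build_review_queue` helper, the report IDs, and the probabilities are all hypothetical and are not part of the released model or platform; the 0.5 review threshold is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class ScoredReport:
    report_id: str      # hypothetical identifier, not produced by the model
    severe_prob: float  # probability assigned to the "Severe" class (label 1)


def build_review_queue(reports, review_threshold=0.5):
    """Keep reports at or above the threshold, most likely severe first."""
    flagged = [r for r in reports if r.severe_prob >= review_threshold]
    return sorted(flagged, key=lambda r: r.severe_prob, reverse=True)


# Made-up scores for illustration only; real values would come from the model.
reports = [
    ScoredReport("r1", 0.12),
    ScoredReport("r2", 0.97),
    ScoredReport("r3", 0.55),
]
queue = build_review_queue(reports)
print([r.report_id for r in queue])  # ['r2', 'r3']
```

A queue like this only orders reports for human review; every flagged report would still need assessment by a qualified healthcare professional.
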
## Training Data

The model was trained on a dataset of **8,153 patient-reported drug experience narratives** sourced from drug review platforms. Labels indicate ADR severity:

- `0`: Non-severe adverse drug reaction
- `1`: Severe adverse drug reaction

**Class distribution:** 53.4% severe · 46.6% non-severe (near-balanced)

## Training Configuration

```python
# Key hyperparameters
learning_rate = 2e-5
optimizer = "adafactor"
batch_size = 16  # effective 64 with gradient accumulation
gradient_accumulation = 4
epochs = 8  # with early stopping (patience=3)
warmup_ratio = 0.1
lr_scheduler = "cosine"
weight_decay = 0.01
max_seq_length = 256
fp16 = False  # DeBERTa-v3 incompatibility
cv_folds = 5
```

## Evaluation Results

| Metric | Score |
|---|---|
| **Kaggle F1 (ensemble)** | **0.92720** |
| Kaggle F1 (single model) | 0.91544 |
| Validation F1 (macro) | 0.9050 |
| Validation accuracy | 94.4% |

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
from scipy.special import softmax

model_id = "Izziemirg/medsentinel-adr-deberta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()


def classify_adr(text):
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=256,
        padding=True
    )
    with torch.no_grad():
        logits = model(**inputs).logits.numpy()
    probs = softmax(logits, axis=-1)[0]
    label = "Severe" if probs[1] > 0.5 else "Non-Severe"
    return {"label": label, "confidence": round(float(probs.max()), 4)}


# Example
text = "I experienced severe insomnia, heart palpitations, and extreme anxiety after taking this medication for two weeks."
print(classify_adr(text))
# {'label': 'Severe', 'confidence': 0.9731}
```

## Limitations

- Trained on English-language patient-reported text only
- Performance may degrade on formal clinical notes (different register than training data)
- Mixed-sentiment texts (severe symptoms but positive drug efficacy) remain a known edge case; the model may under-predict severity in these cases
- Not validated on real-world clinical deployment data

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{mirghani2025medsentinel,
  title        = {MedSentinel: ADR Severity Classification with DeBERTa-v3},
  author       = {Mirghani, Izzie},
  year         = {2026},
  howpublished = {HuggingFace Hub},
  url          = {https://huggingface.co/Izziemirg/medsentinel-adr-deberta}
}
```

## Developed By

**Izzie Mirghani**

MS Business Analytics, UVA Darden

Part of the **MedSentinel ADR Intelligence Platform** project.