BiomedBERT-Triage-ESI v42
A BiomedBERT-based clinical field extractor for Emergency Severity Index (ESI) triage classification. The model extracts structured clinical fields from free-text triage notes, which are then processed by a deterministic ESI v5 algorithm to produce ESI levels 1-5.
Key Results
| Metric | Value |
|---|---|
| Algorithm-correct accuracy | 91.7% (33/36) |
| Expert-labeled accuracy | 86.1% (31/36) |
| Within-1 accuracy | 97.2% (35/36) |
| High-risk recall (ESI 1-2) | 92.0% |
| Under-triage rate | 8.3% |
| Over-triage rate | 5.6% |
| Inference speed | 21ms/sample (MPS) |
Algorithm-correct accuracy (91.7%): 2 of the 5 disagreements with the expert labels are cases where the model correctly follows the ESI algorithm rules, but the expert applied clinical judgment that overrides the algorithm (CHF-related chest pain scored ESI 3 by expert vs ESI 2 by algorithm; isolated pelvic pain with stable vitals scored ESI 3 by expert vs ESI 2 by algorithm). Counting those 2 cases as correct raises 31/36 to 33/36.
Per-ESI Performance
| ESI Level | Accuracy | Cases | Description |
|---|---|---|---|
| ESI 1 (Resuscitation) | 92.9% | 13/14 | Cardiac arrest, respiratory failure, septic shock |
| ESI 2 (Emergent) | 81.8% | 9/11 | Chest pain, stroke, active seizure, sepsis |
| ESI 3 (Urgent) | 60.0% | 3/5 | 2+ resources: labs, imaging, IV |
| ESI 4 (Less urgent) | 100% | 4/4 | 1 resource: X-ray or simple procedure |
| ESI 5 (Non-urgent) | 100% | 2/2 | 0 resources: med refill, suture removal |
Architecture
Triage Note (free text)
                │
                ▼
┌─────────────────────────────────────┐
│  BiomedBERT Encoder (110M)          │
│  [CLS] token → hidden state         │
└───────────────┬─────────────────────┘
                │
   ┌──────┬─────┼───────┬────────┐
   ▼      ▼     ▼       ▼        ▼
Symptom  Flag  Pain  Arrival  Resource
  Head   Head  Head    Head     Head
  (50)   (5)   (1)     (5)      (11)
   │      │     │       │        │
   ▼      ▼     ▼       ▼        ▼
┌─────────────────────────────────────┐
│  Deterministic ESI v5 Engine        │
│  Step A → B1 → B2 → B3 → C → D      │
│  (~250 lines pure Python)           │
└───────────────┬─────────────────────┘
                ▼
       ESI 1-5 + reasoning
The model never predicts ESI directly. It extracts structured fields, and a transparent, auditable ESI algorithm makes the final decision. Every prediction comes with step-by-step reasoning.
Extraction Heads
| Head | Output | Val Accuracy |
|---|---|---|
| Symptom | 50 binary labels (chest_pain, fracture, sepsis_signs, ...) | 99.7% |
| Resource | 11 binary labels (labs, ecg, xray, iv_fluids, ...) | 99.9% |
| Flag | 5 binary flags (altered_mentation, needs_immediate_airway, ...) | not reported |
| Pain | Regression 0-10 | MAE 0.09 |
| Arrival | 5-class (ambulance, walk-in, helicopter, wheelchair, unknown) | not reported |
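As a rough illustration of how the raw outputs of the five heads could be turned into structured fields, here is a minimal pure-Python sketch. The function name `decode_heads`, the dict layout, the index order of the arrival classes, and the clamping of the pain value to 0-10 are assumptions made for this example; only the 0.5 threshold is taken from the Usage section of this card.

```python
import math

# Arrival classes as listed in the table; the index order is an assumption.
ARRIVAL_CLASSES = ["ambulance", "walk-in", "helicopter", "wheelchair", "unknown"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_heads(raw: dict, labels: dict, threshold: float = 0.5) -> dict:
    """Turn raw head outputs into structured fields (hypothetical helper).

    raw:    head name -> list of logits (a single float for "pain")
    labels: head name -> label names, in the same order as the logits
    """
    def active(head: str) -> list:
        # Multi-label heads: keep every label whose probability clears the threshold.
        return [name for name, z in zip(labels[head], raw[head]) if sigmoid(z) > threshold]

    arrival_logits = raw["arrival"]
    best = max(range(len(arrival_logits)), key=lambda i: arrival_logits[i])
    return {
        "symptoms": active("symptom"),
        "resources": active("resource"),
        "flags": active("flag"),
        # Regression head: clamp to the 0-10 pain scale (assumption).
        "pain": min(10.0, max(0.0, float(raw["pain"]))),
        # Single-label head: pick the highest-scoring class.
        "arrival": ARRIVAL_CLASSES[best],
    }


# Toy inputs with two labels per head, for illustration only.
toy_raw = {
    "symptom": [2.0, -3.0],
    "resource": [-1.0, 0.5],
    "flag": [-4.0],
    "pain": 11.3,
    "arrival": [0.1, 2.2, -1.0, 0.0, -0.5],
}
toy_labels = {
    "symptom": ["chest_pain", "fracture"],
    "resource": ["labs", "ecg"],
    "flag": ["altered_mentation"],
}
fields = decode_heads(toy_raw, toy_labels)
print(fields)
```

The multi-label heads are decoded independently per label, which is why a note can yield several symptoms at once, while the arrival head yields exactly one class.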
ESI Algorithm (Deterministic)
The ESI v5 algorithm is implemented as ~250 lines of pure Python with no text extraction:
- Step A: Immediate lifesaving intervention? (airway, IV resuscitation, GCS ≤ 8, SBP < 80, SpO2 < 85) → ESI 1
- Step B1: High-risk symptoms? (chest_pain, sepsis_signs, stroke, seizure, GI bleed, ...) → ESI 2
- Step B2: Altered mental status? → ESI 2
- Step B3: Severe pain/distress? (pain 10/10, systemic pain ≥ 9) → ESI 2
- Step C: Resource counting (2+ → ESI 3, 1 → ESI 4, 0 → ESI 5)
- Step D: Vital sign danger zones → uptriage
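The steps above can be sketched as a small rule cascade. This is not the card's ~250-line engine, only a minimal illustration under stated assumptions: `TriageFields` and `esi_level` are hypothetical names, the high-risk symptom set contains only the examples named in Step B1 (the full list is elided in the card), `systemic_pain` and `vitals_in_danger_zone` are stand-in boolean inputs because the card does not say how they are determined, and Step D is assumed to move an ESI 3 case up to ESI 2.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Partial list: Step B1 names these and elides the rest with "...".
HIGH_RISK_SYMPTOMS = {"chest_pain", "sepsis_signs", "stroke_symptoms",
                      "active_seizure", "gi_bleed"}
LIFESAVING_FLAGS = {"needs_immediate_airway", "needs_immediate_iv_resuscitation"}


@dataclass
class TriageFields:
    symptoms: Set[str] = field(default_factory=set)
    resources: Set[str] = field(default_factory=set)
    flags: Set[str] = field(default_factory=set)
    pain: float = 0.0
    systemic_pain: bool = False          # hypothetical input
    gcs: Optional[int] = None
    sbp: Optional[int] = None
    spo2: Optional[int] = None
    vitals_in_danger_zone: bool = False  # hypothetical input; thresholds not given


def esi_level(f: TriageFields) -> Tuple[int, List[str]]:
    """Return (ESI level, reasoning trail) for already-extracted fields."""
    # Step A: immediate lifesaving intervention
    if (LIFESAVING_FLAGS & f.flags
            or (f.gcs is not None and f.gcs <= 8)
            or (f.sbp is not None and f.sbp < 80)
            or (f.spo2 is not None and f.spo2 < 85)):
        return 1, ["Step A: immediate lifesaving intervention"]
    # Step B1: high-risk symptoms
    hits = sorted(HIGH_RISK_SYMPTOMS & f.symptoms)
    if hits:
        return 2, [f"Step B1: high-risk symptoms {hits}"]
    # Step B2: altered mental status
    if "altered_mentation" in f.flags:
        return 2, ["Step B2: altered mental status"]
    # Step B3: severe pain or distress
    if f.pain >= 10 or (f.systemic_pain and f.pain >= 9):
        return 2, ["Step B3: severe pain/distress"]
    # Step C: resource counting
    n = len(f.resources)
    level = 3 if n >= 2 else 4 if n == 1 else 5
    reasons = [f"Step C: {n} resource(s) -> ESI {level}"]
    # Step D: danger-zone vitals (assumed to uptriage ESI 3 to ESI 2)
    if level == 3 and f.vitals_in_danger_zone:
        level = 2
        reasons.append("Step D: danger-zone vitals -> uptriage to ESI 2")
    return level, reasons


level, reasons = esi_level(TriageFields(resources={"labs", "xray"},
                                        vitals_in_danger_zone=True))
print(level, reasons)
```

Returning the reasoning trail alongside the level mirrors the card's point that every prediction comes with step-by-step reasoning.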
Training
Dataset
113,801 records from multiple sources:
| Source | Records | Format |
|---|---|---|
| MIMIC-IV-ED structured (gold ESI) | 96,099 | Compact |
| MIETIC narrative (from MIMIC-IV-ED) | 9,629 | Narrative |
| MIMIC-IV-ED generated narrative | 5,000 | Narrative |
| Targeted augmentation (critical care) | 3,073 | Narrative |
Key insight: Training data must include both compact and narrative formats. A model trained only on compact text ("CC: Chest pain | HR 110 BP 130/80...") fails on narrative clinical notes ("A 63-year-old male presents to the ED via ambulance with palpitations and dizziness..."), which is the format seen in real-world use.
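To make the format gap concrete, the sketch below renders one structured record in both styles. The record keys and both templates are hypothetical; the card shows only truncated examples of each format and does not describe how its narrative data was generated, so this illustrates the idea rather than the actual data pipeline.

```python
def render_compact(rec: dict) -> str:
    """Compact style, modeled on the visible part of the card's example."""
    return f"CC: {rec['chief_complaint']} | HR {rec['hr']} BP {rec['sbp']}/{rec['dbp']}"


def render_narrative(rec: dict) -> str:
    """Narrative style, modeled on the opening of the card's example."""
    return (
        f"A {rec['age']}-year-old {rec['sex']} presents to the ED via {rec['arrival']} "
        f"with {rec['chief_complaint'].lower()}. "
        f"Heart rate is {rec['hr']} and blood pressure is {rec['sbp']}/{rec['dbp']}."
    )


# Hypothetical record, for illustration only.
record = {
    "age": 63, "sex": "male", "arrival": "ambulance",
    "chief_complaint": "Chest pain", "hr": 110, "sbp": 130, "dbp": 80,
}
compact = render_compact(record)
narrative = render_narrative(record)
print(compact)
print(narrative)
```

Both strings carry the same facts, but their token sequences differ sharply, which is one plausible reason a model trained on only one style transfers poorly to the other.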
Training Configuration
- Base model: microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
- Epochs: 5
- Batch size: 32
- Learning rate: 2e-5 (with linear warmup)
- Max sequence length: 256 tokens
- Hardware: NVIDIA DGX Spark (GB10 GPU)
- Training time: ~2.5 hours (114K records)
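For a sense of scale, the configuration above can be turned into a step count and a learning-rate curve. The sketch assumes all 113,801 records are used for training, a warmup of 500 steps, and linear decay to zero after warmup; the card specifies only the peak rate and that warmup is linear, so `warmup_steps` and the decay shape are assumptions.

```python
import math


def linear_warmup_lr(step: int, total_steps: int,
                     base_lr: float = 2e-5, warmup_steps: int = 500) -> float:
    """Linear warmup to base_lr, then linear decay to zero (decay is an assumption)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumes every record is a training example (no held-out split).
steps_per_epoch = math.ceil(113_801 / 32)   # 3557
total_steps = 5 * steps_per_epoch           # 17785

for step in (0, 250, 500, total_steps // 2, total_steps):
    print(step, linear_warmup_lr(step, total_steps))
```

If part of the data is held out for validation, the step counts shrink proportionally.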
Training Approach
- Multi-head extraction: Single BERT encoder with 5 task heads (4 classification, 1 regression) trained jointly
- Gold labels from MIMIC-IV-ED: ESI labels from real nurse triage decisions (acuity field), not synthetic
- Arrival mode from text: Labels derived from text cues in training data ("ambulance", "transferred", "walk-in")
- Intervention flags from ICD codes: needs_immediate_airway labeled based on ICD codes for respiratory failure (J96), cardiac arrest (I46); needs_immediate_iv_resuscitation from sepsis (A41/R65), shock (R57)
- Iterative surgical augmentation: 40+ experiments targeting specific extraction errors with MIMIC-IV-ED data
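The ICD-based flag labeling described above could look roughly like the following. The code families come from the list above; the function name `flags_from_icd`, the prefix-matching rule, and the normalization of codes (dropping dots, upper-casing) are assumptions for this sketch.

```python
# ICD-10 code families named in the card, mapped to the flag they trigger.
ICD_PREFIX_TO_FLAG = {
    "J96": "needs_immediate_airway",            # respiratory failure
    "I46": "needs_immediate_airway",            # cardiac arrest
    "A41": "needs_immediate_iv_resuscitation",  # sepsis
    "R65": "needs_immediate_iv_resuscitation",  # sepsis (SIRS / severe sepsis)
    "R57": "needs_immediate_iv_resuscitation",  # shock
}


def flags_from_icd(codes):
    """Derive intervention flags from a visit's ICD-10 codes (hypothetical helper)."""
    flags = set()
    for code in codes:
        normalized = code.replace(".", "").upper()
        for prefix, flag in ICD_PREFIX_TO_FLAG.items():
            if normalized.startswith(prefix):
                flags.add(flag)
    return flags


print(flags_from_icd(["J96.01", "I10"]))
print(flags_from_icd(["A419", "R6521"]))
print(flags_from_icd(["S52.501A"]))
```

Labels derived this way reflect diagnoses recorded for the visit rather than what the triage note says, which may contribute to the rare-label difficulty noted in the findings.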
Key Findings from 40+ Experiments
- Data quality > model architecture: Label cleaning, format diversity, and targeted augmentation improved accuracy more than architectural changes (contrastive learning, fusion heads, ESI direct prediction all failed)
- Multi-task dilution: Every head added beyond symptom + resource + flag + pain + arrival hurt accuracy. Diagnosis, severity, ESI, and resource-count heads all degraded extraction
- Narrative format gap: The single biggest improvement came from adding narrative-format training data (72.2% → 86.1%)
- Rare label challenge: Intervention flags (airway, IV resuscitation) at ~4% of data are hard for BERT-base to learn reliably
- Two-stage fine-tuning fails: Freezing extraction layers and training ESI head always corrupts extraction representations
Usage
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and encoder from the Hub
model_dir = "vadimbelsky/biomedbert-triage-esi-v42"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
encoder = AutoModel.from_pretrained(model_dir)
# classifier_heads.pt lives in the Hub repo, so download it before calling torch.load
heads_path = hf_hub_download(repo_id=model_dir, filename="classifier_heads.pt")
heads = torch.load(heads_path, map_location="cpu", weights_only=True)

# Build heads (label order must match the order used in training)
SYMPTOM_LABELS = [
    "chest_pain", "diaphoresis", "syncope", "stroke_symptoms",
    "altered_mental_status", "anaphylaxis", "sepsis_signs", "active_seizure",
    "suicidal_ideation", "homicidal", "psychotic",
    "abdominal_pain", "gi_bleed", "testicular_pain", "ovarian_torsion",
    "ectopic_pregnancy", "dka_signs", "toxic_ingestion", "post_ictal",
    "respiratory_distress", "shortness_of_breath", "wheezing",
    "headache", "nausea_vomiting", "back_pain",
    "laceration", "sprain", "fracture", "uri", "rash",
    "eye_pain", "ear_pain", "dental_pain", "wound", "burn",
    "fever", "hypothermia", "vaginal_bleeding", "urinary_symptoms",
    "active_hemorrhage",
    "dizziness", "palpitations", "cough", "diarrhea", "weakness",
    "anxiety", "dehydration", "allergic_reaction", "limb_pain", "constipation",
]
FLAG_LABELS = ["altered_mentation", "severe_pain_distress", "active_hemorrhage",
               "needs_immediate_airway", "needs_immediate_iv_resuscitation"]
h = encoder.config.hidden_size
symptom_head = nn.Linear(h, len(SYMPTOM_LABELS))
symptom_head.load_state_dict(heads["symptom_head"])
flag_head = nn.Linear(h, len(FLAG_LABELS))
flag_head.load_state_dict(heads["flag_head"])

# Extract
note = "A 52-year-old male was brought to the ED via ambulance with sepsis and hypotension. Critically hypotensive, requiring mechanical ventilation."
enc = tokenizer(note, max_length=256, padding="max_length", truncation=True, return_tensors="pt")
encoder.eval()
with torch.no_grad():
    # The [CLS] hidden state feeds every head
    cls = encoder(**enc).last_hidden_state[:, 0, :]
    sym_probs = torch.sigmoid(symptom_head(cls)).squeeze()
    flag_probs = torch.sigmoid(flag_head(cls)).squeeze()

# Keep labels whose probability exceeds 0.5
symptoms = [SYMPTOM_LABELS[i] for i, p in enumerate(sym_probs) if p > 0.5]
flags = {FLAG_LABELS[i]: bool(p > 0.5) for i, p in enumerate(flag_probs)}
print(f"Symptoms: {symptoms}")
print(f"Flags: {flags}")
# → Symptoms: ['sepsis_signs', 'respiratory_distress', 'shortness_of_breath', 'fever']
# → Flags: {'needs_immediate_airway': True, 'needs_immediate_iv_resuscitation': True, ...}
# → Feed into ESI algorithm → Step A → ESI 1
Evaluation
Evaluated on 36 expert-labeled cases from MIETIC (narrative clinical cases derived from MIMIC-IV-ED, reviewed by 2-3 emergency medicine experts).
Confusion Matrix
GT\Pred   1   2   3   4   5
─────────────────────────────
   1     13   1   0   0   0
   2      0   9   1   1   0
   3      0   2   3   0   0
   4      0   0   0   4   0
   5      0   0   0   0   2
Error Analysis
| Error | GT→Pred | Root Cause |
|---|---|---|
| Cardiac arrest | 1→2 | "Vital signs absent": rare phrasing, model can't extract intervention flag |
| CHF chest pain | 3→2 | Chest pain genuinely in text. Algorithm-correct: ESI rules say chest_pain → B1 → ESI 2. Expert overrides with clinical context. |
| Pelvic pain | 3→2 | Pain 9/10 with stable vitals. Algorithm-correct: B3 fires on severe distress. Expert considers stable vitals → ESI 3. |
| Open fracture transfer | 2→4 | "Open fracture" compound term not understood. Model extracts fracture (1 resource) but misses hemorrhage severity. |
| Crohn's flare | 2→3 | Unstable extraction: sepsis_signs detection for IBD presentations is marginal at BERT-base capacity. |
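The headline metrics can be re-derived from the per-ESI correct counts and the five errors listed in the table above. The sketch below does that arithmetic; the metric definitions are assumptions, not taken from the card's evaluation code: under-triage means a predicted level less urgent than ground truth, over-triage means more urgent, and high-risk recall means ground-truth ESI 1-2 cases predicted as ESI 1 or 2.

```python
from collections import Counter

# Correct predictions per ground-truth level, from the Per-ESI Performance table.
CORRECT = {1: 13, 2: 9, 3: 3, 4: 4, 5: 2}
# (ground truth, predicted) pairs, from the Error Analysis table.
ERRORS = [(1, 2), (3, 2), (3, 2), (2, 4), (2, 3)]

pairs = Counter()
for esi, n in CORRECT.items():
    pairs[(esi, esi)] += n
for gt, pred in ERRORS:
    pairs[(gt, pred)] += 1

total = sum(pairs.values())


def share(condition) -> float:
    """Fraction of all cases whose (gt, pred) pair satisfies the condition."""
    return sum(n for (gt, pred), n in pairs.items() if condition(gt, pred)) / total


accuracy = share(lambda gt, pred: gt == pred)
within_1 = share(lambda gt, pred: abs(gt - pred) <= 1)
under_triage = share(lambda gt, pred: pred > gt)   # predicted less urgent than truth
over_triage = share(lambda gt, pred: pred < gt)    # predicted more urgent than truth

high_risk_total = sum(n for (gt, _), n in pairs.items() if gt <= 2)
high_risk_hit = sum(n for (gt, pred), n in pairs.items() if gt <= 2 and pred <= 2)
high_risk_recall = high_risk_hit / high_risk_total

print(f"cases={total} accuracy={accuracy:.3f} within_1={within_1:.3f}")
print(f"under={under_triage:.3f} over={over_triage:.3f} high_risk_recall={high_risk_recall:.3f}")
```

Under these definitions the figures reproduce the Key Results table: 31/36 exact, 35/36 within one level, 3/36 under-triaged, 2/36 over-triaged, and 23/25 high-risk cases kept at ESI 1-2.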
Limitations
- Single-center data: Based on MIMIC-IV-ED (Beth Israel Deaconess Medical Center)
- BERT-base capacity: 110M parameters limits rare pattern learning (intervention flags, compound medical terms)
- Binary extraction: Can't distinguish primary vs secondary symptoms (e.g., CHF chest pain vs ACS chest pain)
- English only: Trained on English clinical text
- Not a medical device: Research use only. Not validated for clinical deployment.
Citation
@misc{belsky2026biomedbert-triage,
title={BiomedBERT-Triage-ESI: Clinical Field Extraction for Emergency Triage},
author={Belsky, Vadim},
year={2026},
url={https://huggingface.co/vadimbelsky/biomedbert-triage-esi-v42}
}