BERT v47 — Medical Triage Decision Support (19-head)

A 19-head BERT model for emergency department triage decision support. It predicts ESI (Emergency Severity Index) levels 1-5 from free-text triage narratives, with supplemental heads for symptoms, resources, vitals, flags, and clinical context.

Architecture: BiomedBERT encoder (109M params) + 19 task heads, trained with focal loss, label smoothing, ordinal-distance penalty, layer-wise LR decay, and effective-number-of-samples class weighting.
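To make the class weighting and focal loss concrete, here is a minimal pure-Python sketch of the effective-number-of-samples weights from Cui et al. 2019 combined with a focal cross-entropy term. The per-class counts are hypothetical, the normalization is one common convention, and the label smoothing and ordinal-distance penalty mentioned above are left out because the card does not give their form. This is not the code in train_bert_v47.py.

```python
import math

def effective_number_weights(counts, beta=0.999):
    """Class weights proportional to (1 - beta) / (1 - beta**n_c), per Cui et al. 2019."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    # Assumption: rescale so the weights sum to the number of classes.
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]

def focal_ce(probs, target, weights, gamma=2.0):
    """Focal cross-entropy for one example: -w_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = probs[target]
    return -weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)

# Hypothetical record counts for ESI 1..5 (not the real training distribution).
counts = [2_000, 60_000, 180_000, 100_000, 5_000]
weights = effective_number_weights(counts)

probs = [0.02, 0.03, 0.90, 0.04, 0.01]
easy = focal_ce(probs, target=2, weights=weights)  # confident and correct: tiny loss
hard = focal_ce(probs, target=0, weights=weights)  # true class got 2%: large loss
print(weights, easy, hard)
```

With β=0.999 the weights saturate for classes with more than a few thousand records, so only the rarest classes receive a noticeable boost.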

Intended use: clinical decision support for triage nurses. The model produces an ESI prediction with confidence, detected symptoms, suggested resources, and uncertainty signals. It is not a standalone diagnostic system.

Eval results (epoch 3, 4 clean holdouts)

Dataset	n	Exact	Adjacent	ESI 1 recall	ESI 5 recall
MIETIC clean (narrative)	200	85.0%	94.5%	56.7%	92.5%
MIMIC-IV-ED holdout	7,917	62.9%	97.9%	65.3%	25.0%
Lukina v3 (curated narrative)	201	58.2%	86.1%	80.0%	25.0%
MC-MED Stanford clean	1,000	57.2%	96.0%	18.0%	6.0%
ER-REASON (unseen variants, 200)	200	50.5%	93.5%	n/a	n/a
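The card does not define the table columns, so the sketch below assumes the usual reading: "Exact" is the fraction of cases where the predicted ESI equals the reference, "Adjacent" is the fraction within one level, and per-level recall is computed over cases whose reference is that level. The function name and toy labels are illustrative only.

```python
def triage_metrics(y_true, y_pred, levels=(1, 2, 3, 4, 5)):
    """Exact accuracy, within-one-level accuracy, and per-level recall."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    exact = sum(t == p for t, p in pairs) / n
    adjacent = sum(abs(t - p) <= 1 for t, p in pairs) / n
    recall = {}
    for level in levels:
        support = sum(t == level for t, _ in pairs)
        hits = sum(t == level and p == level for t, p in pairs)
        # None mirrors an "n/a" cell: the level has no reference cases.
        recall[level] = hits / support if support else None
    return exact, adjacent, recall

# Toy labels, not drawn from any of the datasets above.
y_true = [1, 2, 3, 3, 4, 5, 5, 2]
y_pred = [2, 2, 3, 4, 4, 3, 5, 2]
exact, adjacent, recall = triage_metrics(y_true, y_pred)
print(exact, adjacent, recall[1], recall[5])  # 0.625 0.875 0.0 0.5
```

The toy example shows how a high adjacent score can coexist with poor ESI 1 recall, which is the pattern in several rows of the table.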

Validation metrics at best checkpoint (composite=0.791):

esi_exact: 0.827
esi_adjacent: 0.993
symptom_f1_micro: 0.706
symptom_p_micro: 0.641, symptom_r_micro: 0.786
flag_f1_macro: 0.994
ner_entity_f1: 1.000
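As a consistency check, the reported numbers can be reproduced with simple arithmetic: micro F1 is the harmonic mean of micro precision and recall, and the composite follows the checkpoint formula given in the Training section (0.7 × esi_exact + 0.3 × symptom_f1_micro).

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

symptom_f1 = f1(0.641, 0.786)
composite = 0.7 * 0.827 + 0.3 * 0.706

print(round(symptom_f1, 3), round(composite, 3))  # 0.706 0.791
```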

19 head outputs

PRIMARY (1 head)           — the answer
  esi_head            5    softmax over ESI levels 1-5

CRITICAL DISPLAY (3)       — trust drivers for nurse UI
  symptom_head        176  multi-label concepts (chest_pain, sepsis_signs, etc.)
  resource_head       12   multi-label resource types
  resource_count_head 3    bucket 0 / 1 / 2+

PERCEPTION (8)             — engine-input + context
  flag_head           3    severe_pain_distress, on_anticoag, altered_ms
  vitals_head         6    HR, BP_sys, BP_dia, SpO2, RR, Temp_C
  ner_head            21   BIO spans
  medrec_head         2    on_anticoag, on_antiplatelet
  pain_head           1    0-10 regression
  age_head            1    years regression
  arrival_head        5    ambulance/walk-in/EMS/etc
  gender_head         3    M/F/U

SAFETY (2)                 — rare-positive critical
  airway_head         1    pos_weight=2500
  resus_head          1    pos_weight=200

AUXILIARY (5)              — regularization + outcome signals
  gestalt_head        5    outcome tier (OUT-1..5)
  disposition_head    9
  syndrome_head       15
  history_visits_head 3
  history_admits_head 3
  last_dx_head        30
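The multi-label heads (symptom_head, resource_head) emit one logit per label. A minimal decoding sketch follows, assuming an independent sigmoid per label and a 0.5 threshold. The card states neither the threshold nor the label order, and the label "dyspnea" is a hypothetical example.

```python
import math

def decode_multilabel(logits, labels, threshold=0.5):
    """Return (label, probability) pairs at or above the threshold, most probable first."""
    kept = []
    for name, z in zip(labels, logits):
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid
        if p >= threshold:
            kept.append((name, round(p, 3)))
    return sorted(kept, key=lambda item: -item[1])

# chest_pain and sepsis_signs appear in the card; dyspnea is hypothetical.
labels = ["chest_pain", "sepsis_signs", "dyspnea"]
logits = [2.0, -3.0, 0.4]
detected = decode_multilabel(logits, labels)
print(detected)  # [('chest_pain', 0.881), ('dyspnea', 0.599)]
```

In practice a per-label threshold tuned on validation data may suit the rare-positive labels better than a single global cutoff.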

Quick start

import torch
from transformers import AutoTokenizer

# 1. Get the architecture code
#    Either clone the source repo or copy train_bert_v47.py from this repo
from train_bert_v47 import V47MultiHeadBERT

ENCODER = "microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext"
tokenizer = AutoTokenizer.from_pretrained("vadimbelsky/bert-v47-medical-triage", subfolder="tokenizer")

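model = V47MultiHeadBERT(ENCODER)
# Path to the downloaded checkpoint. weights_only=False unpickles arbitrary
# objects, so only load checkpoints you trust.
state = torch.load("model.pt", map_location="cpu", weights_only=False)
# strict=False tolerates key mismatches; print them rather than ignore them.
result = model.load_state_dict(state, strict=False)
print("missing:", result.missing_keys, "unexpected:", result.unexpected_keys)
model.eval()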

text = """52-year-old female arrived by ambulance with chest pain.
Vital signs: HR 86, BP 134/78, RR 16, SpO2 99%, T 36.4°C. Pain 7/10."""

enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding="max_length")
with torch.no_grad():
    out = model(enc["input_ids"], enc["attention_mask"])
esi_pred = int(out["esi_logits"].argmax(-1)) + 1     # 1..5
print(f"ESI: {esi_pred}")
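The quick start prints only the argmax. The following sketch shows one way to turn the five ESI logits into a confidence value and a simple uncertainty signal (normalized entropy). It is pure Python with made-up logits. How the model's own "uncertainty signals" are computed is not described here, so treat this as an illustration rather than the author's method.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def esi_summary(logits):
    """Predicted ESI level (1..5), its probability, and entropy scaled to [0, 1]."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return {
        "esi": best + 1,
        "confidence": probs[best],
        "entropy": entropy / math.log(len(probs)),
    }

# Made-up logits for illustration.
summary = esi_summary([0.1, 2.3, 1.9, -0.5, -2.0])
print(summary)
```

Assuming out["esi_logits"] has shape (1, 5) as the quick start implies, the same function could be called with out["esi_logits"][0].tolist().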

Training

Encoder:        microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
Total params:   109.7M
Loss:           focal CE γ=2 + ordinal-distance penalty (esi)
                focal BCE γ=2 + pos_weight (airway/resus)
                BCE multi-label (symptom/resource heads)
Class weights:  effective-number-of-samples β=0.999 (Cui et al. 2019)
Optimization:   AdamW + cosine schedule + layer-wise LR decay (0.9/layer)
Precision:      bf16 mixed
Checkpoint:     0.7 × esi_exact + 0.3 × symptom_f1_micro (composite)
Best checkpoint composite: 0.791 (epoch 3 of 6, early stopped at epoch 3)
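Layer-wise LR decay at 0.9 per layer means each encoder layer trains with a learning rate 0.9 times that of the layer above it. A minimal sketch for a 12-layer BERT-base encoder follows. The base learning rate is a hypothetical value, and the treatment of the embeddings and task heads is not specified in the card.

```python
def layerwise_lrs(base_lr, num_layers=12, decay=0.9):
    """Learning rate per encoder layer: the top layer gets base_lr,
    and each layer below it is scaled by one more factor of decay."""
    return {
        layer: base_lr * decay ** (num_layers - 1 - layer)
        for layer in range(num_layers)
    }

# base_lr=2e-5 is a hypothetical value; the card does not state the learning rate.
lrs = layerwise_lrs(base_lr=2e-5)
for layer, lr in lrs.items():
    print(f"encoder layer {layer:2d}: lr = {lr:.2e}")
```

With these settings the bottom layer trains at roughly 31% of the top layer's rate (0.9^11 ≈ 0.314), which keeps the pretrained lower layers closer to their initial weights.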

Limitations

Training data is dominated by the compact chief-complaint (CC) dialect: MIMIC-IV-ED contributes 290K of 354K records, so the model over-fits to the short, telegraphic CC + vitals format.
Lukina-style structured narratives have no representation in the training set; exact accuracy on this dialect is 58%, versus 85% on MIETIC, whose style is represented in training.
MC-MED Stanford: 57% exact and 18% ESI 1 recall; the Stanford triage dialect is under-represented in training.
ESI 5 (non-urgent) underperforms: recall is 25% on MIMIC-IV-ED; the class is rare (~5K records) and is easily mis-routed to ESI 3-4.
Long-note formats (full ED notes): max_length=512 severely truncates ER-REASON discharge summaries and H&P notes; the ER-REASON unseen variants score 50.5% exact (93.5% adjacent).

Citation

@misc{belsky2026berttriage,
  title  = {BERT v47: Multi-head Decision Support for Emergency Triage},
  author = {Belski, Vadzim},
  year   = {2026},
  url    = {https://huggingface.co/vadimbelsky/bert-v47-medical-triage}
}

Disclaimer

This model is research software for clinical decision support, not a standalone diagnostic system. ESI predictions are advisory only and must be reviewed by a licensed clinician. The model has known limitations on rare ESI classes (1 and 5) and out-of-distribution narrative formats. Do not deploy in production triage workflows without thorough validation on your local patient population, IRB approval, and physician oversight.
