BERT v47: Medical Triage Decision Support (19-head)

A 19-head BERT model for emergency department triage decision support. It predicts ESI (Emergency Severity Index) levels 1-5 from free-text triage narratives, with supplemental heads for symptoms, resources, vitals, flags, and clinical context.

Architecture: BiomedBERT encoder (109M params) + 19 task heads, trained with focal loss, label smoothing, ordinal-distance penalty, layer-wise LR decay, and effective-number-of-samples class weighting.
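To make the ESI loss terms concrete, here is a minimal single-example sketch of focal cross-entropy combined with an ordinal-distance penalty. The model card does not give the exact form of the penalty, so the expected-distance term and the `ordinal_weight` coefficient below are assumptions for illustration, and label smoothing is left out.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_ordinal_loss(logits, target, gamma=2.0, ordinal_weight=0.5):
    """Focal cross-entropy plus an expected-distance penalty for one example.

    target is a 0-based class index (ESI level minus 1).
    ordinal_weight is a hypothetical mixing coefficient, not taken from the model card.
    """
    probs = softmax(logits)
    p_true = probs[target]
    # Focal term: down-weights examples the model already classifies confidently
    focal = -((1.0 - p_true) ** gamma) * math.log(p_true)
    # Assumed ordinal term: expected |predicted level - true level|
    distance = sum(p * abs(k - target) for k, p in enumerate(probs))
    return focal + ordinal_weight * distance

# Same probability on the true class (ESI 1), but the error lands at different distances
loss_near = focal_ordinal_loss([0.0, 3.0, 0.0, 0.0, 0.0], target=0)  # mass on ESI 2
loss_far = focal_ordinal_loss([0.0, 0.0, 0.0, 0.0, 3.0], target=0)   # mass on ESI 5
print(loss_near < loss_far)
```

Both calls share the same focal term, so the difference comes entirely from the distance penalty: confusing ESI 1 with ESI 5 costs more than confusing it with ESI 2.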

Intended use: clinical decision support for triage nurses. The model produces an ESI prediction with confidence, detected symptoms, suggested resources, and uncertainty signals. It is not a standalone diagnostic system.

Eval results (epoch 3, 4 clean holdouts)

Dataset                            n      Exact   Adjacent  ESI 1 recall  ESI 5 recall
MIETIC clean (narrative)           200    85.0%   94.5%     56.7%         92.5%
MIMIC-IV-ED holdout                7,917  62.9%   97.9%     65.3%         25.0%
Lukina v3 (curated narrative)      201    58.2%   86.1%     80.0%         25.0%
MC-MED Stanford clean              1,000  57.2%   96.0%     18.0%         6.0%
ER-REASON (unseen variants, 200)   200    50.5%   93.5%     n/a           n/a
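The columns above can be computed along the following lines. This is a sketch that assumes "Adjacent" means the prediction falls within one ESI level of the reference label (the card does not define it) and that recall is reported as n/a when a dataset has no cases of that level. The function name and toy labels are illustrative.

```python
def triage_metrics(y_true, y_pred):
    """Exact accuracy, within-one-level accuracy, and recall for ESI 1 and ESI 5."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    exact = sum(t == p for t, p in pairs) / n
    adjacent = sum(abs(t - p) <= 1 for t, p in pairs) / n

    def recall(level):
        preds = [p for t, p in pairs if t == level]
        if not preds:
            return None  # no reference cases at this level
        return sum(p == level for p in preds) / len(preds)

    return {
        "exact": exact,
        "adjacent": adjacent,
        "esi1_recall": recall(1),
        "esi5_recall": recall(5),
    }

# Toy labels, not taken from any of the evaluation sets
y_true = [1, 2, 3, 3, 4, 5]
y_pred = [2, 2, 3, 4, 4, 3]
metrics = triage_metrics(y_true, y_pred)
print(metrics)
```

High adjacent accuracy alongside low ESI 1 recall, as in the MC-MED row, means most errors are off by one level but the most critical class is still frequently under-triaged.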

Validation metrics at best checkpoint (composite=0.791):

  • esi_exact: 0.827
  • esi_adjacent: 0.993
  • symptom_f1_micro: 0.706
  • symptom_p_micro: 0.641, symptom_r_micro: 0.786
  • flag_f1_macro: 0.994
  • ner_entity_f1: 1.000
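As a quick consistency check, the reported micro-F1 and composite can be reproduced from the other figures in this list, using the composite weights (0.7 and 0.3) given in the Training section.

```python
# Micro-F1 is the harmonic mean of micro-precision and micro-recall
precision, recall = 0.641, 0.786
f1 = 2 * precision * recall / (precision + recall)

# Composite checkpoint score: 0.7 * esi_exact + 0.3 * symptom_f1_micro
composite = 0.7 * 0.827 + 0.3 * 0.706

print(round(f1, 3), round(composite, 3))  # 0.706 0.791
```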

19 head outputs

PRIMARY (1 head)           β€” the answer
  esi_head            5    softmax over ESI levels 1-5

CRITICAL DISPLAY (3)       - trust drivers for nurse UI
  symptom_head        176  multi-label concepts (chest_pain, sepsis_signs, etc.)
  resource_head       12   multi-label resource types
  resource_count_head 3    bucket 0 / 1 / 2+

PERCEPTION (8)             - engine-input + context
  flag_head           3    severe_pain_distress, on_anticoag, altered_ms
  vitals_head         6    HR, BP_sys, BP_dia, SpO2, RR, Temp_C
  ner_head            21   BIO spans
  medrec_head         2    on_anticoag, on_antiplatelet
  pain_head           1    0-10 regression
  age_head            1    years regression
  arrival_head        5    ambulance/walk-in/EMS/etc
  gender_head         3    M/F/U

SAFETY (2)                 - rare-positive critical
  airway_head         1    pos_weight=2500
  resus_head          1    pos_weight=200

AUXILIARY (5)              - regularization + outcome signals
  gestalt_head        5    outcome tier (OUT-1..5)
  disposition_head    9
  syndrome_head       15
  history_visits_head 3
  history_admits_head 3
  last_dx_head        30
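The multi-label heads (symptom_head, resource_head) emit one logit per concept. One common way to turn these into a displayed list is an independent sigmoid with a threshold. The sketch below assumes a threshold of 0.5, and the label `fever` is a hypothetical name; the card does not specify how the model's own post-processing works.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decode_multilabel(logits, labels, threshold=0.5):
    """Return (label, probability) pairs at or above the threshold, highest first."""
    scored = [(name, sigmoid(z)) for name, z in zip(labels, logits)]
    hits = [(name, p) for name, p in scored if p >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Three of the 176 symptom concepts; "fever" is a hypothetical label name
labels = ["chest_pain", "sepsis_signs", "fever"]
logits = [2.2, -3.0, 0.4]  # illustrative values, not model output
detected = decode_multilabel(logits, labels)
print([name for name, _ in detected])
```

Given the reported symptom precision of 0.641 against recall of 0.786, a deployment might tune the threshold per concept on local validation data instead of using a single global value.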

Quick start

import torch
from transformers import AutoTokenizer

# 1. Get the architecture code
#    Either clone the source repo or copy train_bert_v47.py from this repo
from train_bert_v47 import V47MultiHeadBERT

ENCODER = "microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext"
tokenizer = AutoTokenizer.from_pretrained("vadimbelsky/bert-v47-medical-triage", subfolder="tokenizer")

model = V47MultiHeadBERT(ENCODER)
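# weights_only=False unpickles arbitrary objects; only load checkpoints you trust
state = torch.load("model.pt", map_location="cpu", weights_only=False)
# strict=False ignores key mismatches, so print them to catch a bad load
report = model.load_state_dict(state, strict=False)
print("missing:", report.missing_keys, "unexpected:", report.unexpected_keys)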
model.eval()

text = """52-year-old female arrived by ambulance with chest pain.
Vital signs: HR 86, BP 134/78, RR 16, SpO2 99%, T 36.4°C. Pain 7/10."""

enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding="max_length")
with torch.no_grad():
    out = model(enc["input_ids"], enc["attention_mask"])
esi_pred = int(out["esi_logits"].argmax(-1)) + 1     # 1..5
print(f"ESI: {esi_pred}")
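The snippet above returns only the argmax. A confidence value and a simple uncertainty signal can be derived from the same logits, as in this sketch that uses the softmax maximum and normalized entropy. These are common choices, not necessarily the signals the model's UI uses, and raw softmax probabilities are not guaranteed to be calibrated. The function name and logits are illustrative.

```python
import math

def esi_summary(logits):
    """Predicted ESI level, softmax confidence, and entropy normalized to [0, 1]."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return {
        "esi": best + 1,  # class index 0..4 maps to ESI 1..5
        "confidence": probs[best],
        "normalized_entropy": entropy / math.log(len(probs)),
    }

# Hypothetical logits; with the real model, pass out["esi_logits"][0].tolist()
summary = esi_summary([0.1, 2.5, 1.0, -0.5, -2.0])
print(summary)
```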

Training

Encoder:        microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
Total params:   109.7M
Loss:           focal CE γ=2 + ordinal-distance penalty (esi)
                focal BCE γ=2 + pos_weight (airway/resus)
                BCE multi-label (symptom/resource heads)
Class weights:  effective-number-of-samples β=0.999 (Cui et al. 2019)
Optimization:   AdamW + cosine schedule + layer-wise LR decay (0.9/layer)
Precision:      bf16 mixed
Checkpoint:     0.7 × esi_exact + 0.3 × symptom_f1_micro (composite)
Best checkpoint composite: 0.791 (epoch 3 of 6, early stopped at epoch 3)
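Two of the settings above reduce to short formulas. The sketch below computes effective-number class weights, proportional to (1 - β) / (1 - β^n) following Cui et al. 2019, and per-layer learning rates with a 0.9 decay across the 12 encoder layers of a BERT-base model. The class counts, the base learning rate, the normalization of the weights, and the choice to give the top layer the full base rate are all assumptions; the card does not state them.

```python
def effective_number_weights(class_counts, beta=0.999):
    """Class weights proportional to (1 - beta) / (1 - beta**n), scaled to sum to the class count."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]

def layerwise_lrs(base_lr, num_layers=12, decay=0.9):
    """Learning rate per encoder layer; index 0 is the lowest layer, the top layer keeps base_lr."""
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]

counts = [2_000, 40_000, 200_000, 100_000, 5_000]  # hypothetical ESI 1..5 record counts
weights = effective_number_weights(counts)
lrs = layerwise_lrs(base_lr=2e-5)                  # hypothetical base learning rate

print([round(w, 3) for w in weights])
print(f"lowest layer lr={lrs[0]:.2e}, top layer lr={lrs[-1]:.2e}")
```

With β=0.999 the effective number saturates near 1/(1 - β) = 1,000 samples, so classes with many thousands of records end up with nearly equal weights. Only classes with counts on the order of a few thousand or fewer are noticeably up-weighted.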

Limitations

  • Training corpus skews toward the compact chief-complaint (CC) dialect: MIMIC-IV-ED contributes 290K of 354K records, so the model over-fits to the short, telegraphic CC + vitals format
  • Lukina-style structured narratives have no representation in the training set; exact accuracy on this dialect is 58%, versus 85% on MIETIC, whose style is represented in training
  • MC-MED Stanford: 57% exact, ESI 1 recall 18%; the Stanford triage dialect is under-represented in training
  • ESI 5 underperformance: ESI 5 (non-urgent) recall is 25% on MIMIC-IV-ED; the class is rare (~5K records) and is easily mis-routed to ESI 3-4
  • Long-note format (full ED notes): max_length=512 severely truncates ER-REASON discharge summaries and H&P notes; ER-REASON unseen variants score 50.5% exact (but 93.5% adjacent)

Citation

@misc{belsky2026berttriage,
  title  = {BERT v47: Multi-head Decision Support for Emergency Triage},
  author = {Belski, Vadzim},
  year   = {2026},
  url    = {https://huggingface.co/vadimbelsky/bert-v47-medical-triage}
}

Disclaimer

This model is research software for clinical decision support, not a standalone diagnostic system. ESI predictions are advisory only and must be reviewed by a licensed clinician. The model has known limitations on rare ESI classes (1 and 5) and out-of-distribution narrative formats. Do not deploy in production triage workflows without thorough validation on your local patient population, IRB approval, and physician oversight.
