im-bin-tf-abstr

This model is a fine-tuned version of microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext, trained on PubMed article titles. It achieves the following results on the evaluation set (the values match the epoch 3 row of the training results):

  • Loss: 0.1908
  • Accuracy: 0.9222
  • F1: 0.9220
  • Precision: 0.9267
  • Recall: 0.9174
  • ROC AUC: 0.9781
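
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the two values above. The helper name `f1_score` is only illustrative and is not taken from the training code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the evaluation set.
print(f"{f1_score(0.9267, 0.9174):.4f}")  # prints 0.9220
```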

Model description

im-bin-tf-abstr is a fine-tuned [PubMedBERT](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext)
model for binary classification of biomedical text as Internal Medicine (IM) or Other.
It was trained on 300,000 PubMed article titles using journal provenance as a distant
supervision signal: titles from journals covering any of the eleven IM sub-specialties
recognised by the German Society of Internal Medicine (DGIM) were labeled IM; titles from
six other disciplines (anesthesiology, otorhinolaryngology, gynecology, surgery, psychiatry,
neurology) were labeled Other. No manual annotation was used.

The model was developed as part of a dissertation at the DGIM and evaluated in a prospective
human study at the 129th Congress of Internal Medicine (Wiesbaden, 2023), where it
significantly outperformed board-certified internists on the same classification task.

Intended uses & limitations

  • Automatic specialty triage of biomedical abstracts or titles
  • Proof-of-concept specialty routing of clinical free text (see limitations)
  • Research on distant supervision and transfer learning in biomedical NLP

Not intended for: clinical decision support, diagnostic use, or any safety-critical
application.

Training and evaluation data

300,000 PubMed article titles from 77 medical journals, aggregated via the PubMed API.
Labels derived from the primary editorial scope of each source journal (distant supervision).

  • IM sub-specialties (positive class): angiology, cardiology, gastroenterology,
    nephrology, hematology, pulmonology, endocrinology, rheumatology, geriatrics,
    intensive care medicine, infectiology
  • Other disciplines (negative class): anesthesiology, otorhinolaryngology,
    gynecology, surgery, psychiatry, neurology

Dataset: Internal medicine and other specialties (Kaggle)
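
The distant-supervision rule described above can be sketched as follows. The specialty lists come from this card. The journal names and the `journal_specialty` mapping are hypothetical placeholders, since the card does not list the 77 source journals, and the function names are illustrative rather than taken from the original pipeline.

```python
IM_SPECIALTIES = frozenset({
    "angiology", "cardiology", "gastroenterology", "nephrology",
    "hematology", "pulmonology", "endocrinology", "rheumatology",
    "geriatrics", "intensive care medicine", "infectiology",
})
OTHER_SPECIALTIES = frozenset({
    "anesthesiology", "otorhinolaryngology", "gynecology",
    "surgery", "psychiatry", "neurology",
})


def label_for_specialty(specialty: str) -> int:
    """Return 1 (Internal Medicine) or 0 (Other) for a journal's specialty."""
    key = specialty.strip().lower()
    if key in IM_SPECIALTIES:
        return 1
    if key in OTHER_SPECIALTIES:
        return 0
    raise ValueError(f"specialty not covered by the labeling scheme: {specialty!r}")


def label_titles(titles_by_journal, journal_specialty):
    """Give every title the label of its source journal (no manual annotation)."""
    labeled = []
    for journal, titles in titles_by_journal.items():
        label = label_for_specialty(journal_specialty[journal])
        labeled.extend((title, label) for title in titles)
    return labeled


# Hypothetical journals and titles, for illustration only.
journal_specialty = {"Example Cardiology Journal": "cardiology",
                     "Example Surgery Journal": "surgery"}
titles_by_journal = {"Example Cardiology Journal": ["Example title A"],
                     "Example Surgery Journal": ["Example title B"]}
print(label_titles(titles_by_journal, journal_specialty))
```

Because the label follows the journal rather than the individual article, a title on a borderline topic inherits whatever class its journal was assigned.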

Training procedure

Fine-tuned from microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext:

| Hyperparameter | Value |
|---|---|
| Epochs | 4 |
| Learning rate | 1e-5 |
| Batch size (train / eval) | 640 / 1280 |
| Scheduler | Cosine with 0.1 warmup ratio |
| Max sequence length | 35 tokens |
| Optimizer | AdamW (β₁=0.9, β₂=0.999, ε=1e-8) |
| Seed | 42 |

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 640
  • eval_batch_size: 1280
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 4
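
To make the scheduler settings concrete, the sketch below computes the learning rate at a given optimizer step for a linear warmup followed by a half-cosine decay. It assumes the common formulation of this schedule and the 1,500 total steps reported in the training results. The function name `cosine_lr` is illustrative, and the card does not state which scheduler implementation was actually used.

```python
import math


def cosine_lr(step: int, total_steps: int = 1500,
              base_lr: float = 1e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 150, 825, 1500):
    print(step, f"{cosine_lr(step):.2e}")
```

Under these assumptions the rate peaks at 1e-5 after 150 warmup steps, passes 5e-6 halfway through the decay, and reaches zero at step 1,500.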

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall | ROC AUC |
|---|---|---|---|---|---|---|---|---|
| No log | 1.0 | 375 | 0.2136 | 0.9124 | 0.9131 | 0.9087 | 0.9175 | 0.9733 |
| 0.3086 | 2.0 | 750 | 0.1971 | 0.9195 | 0.9190 | 0.9277 | 0.9104 | 0.9770 |
| 0.1917 | 3.0 | 1125 | 0.1908 | 0.9222 | 0.9220 | 0.9267 | 0.9174 | 0.9781 |
| 0.1791 | 4.0 | 1500 | 0.1909 | 0.9224 | 0.9224 | 0.9247 | 0.9202 | 0.9785 |

Evaluation results

Evaluated on a held-out test set of 100,000 titles (stratified, withheld from all training
steps):

| Metric | Value |
|---|---|
| Accuracy | 0.926 |
| F1 Score | 0.926 |
| Precision | 0.924 |
| Recall | 0.927 |
| ROC-AUC | 0.926 |

In a prospective human study (n = 20 board-certified internists, 594 classifications),
the model achieved accuracy 0.931 vs. participant aggregate accuracy 0.773 (superiority test p < 10⁻¹⁴).

Label mapping

| Label | Class | Interpretation |
|---|---|---|
| LABEL_0 | Other | Not internal medicine |
| LABEL_1 | Internal Medicine | IM sub-specialty |

softmax(logits)[:, 1] gives P(Internal Medicine).
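
A minimal sketch of that computation is shown below. The two helpers are plain Python and only illustrate the softmax step. `classify_titles` additionally assumes that `torch` and `transformers` are installed and that the repository `tgamstaetter/im-bin-tf-abstr` ships its tokenizer alongside the weights. The function names are illustrative, and `max_length=35` mirrors the training setting listed in this card.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def p_internal_medicine(logit_rows):
    """Return P(Internal Medicine), i.e. the LABEL_1 column, for each row."""
    return [softmax(row)[1] for row in logit_rows]


def classify_titles(titles, model_id="tgamstaetter/im-bin-tf-abstr"):
    """Score a list of titles with the published checkpoint.

    Imports are local so the helpers above work without torch/transformers.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    inputs = tokenizer(titles, padding=True, truncation=True,
                       max_length=35, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return p_internal_medicine(logits.tolist())
```

Calling `classify_titles(["<some article title>"])` would return one probability per title. Any decision threshold (for example 0.5) is a choice left to the user and is not specified by this card.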

Limitations

  • Trained on article titles only — short, structured academic text. Performance
    attenuates on clinical free text (discharge summaries, notes) due to domain shift.
  • Labels derived from journal scope may not reflect clinical ground truth at disciplinary
    boundaries (e.g., intensive care / anesthesiology overlap).
  • Evaluated on English-language text only.

Framework versions

  • Transformers 4.31.0.dev0
  • PyTorch 2.0.0
  • Datasets 2.1.0
  • Tokenizers 0.13.3

Citation

    @misc{gamstaetter2023model,
      author       = {Gamstaetter, Thomas},
      title        = {im-bin-tf-abstr: Fine-tuned {PubMedBERT} for binary internal medicine classification},
      year         = {2023},
      howpublished = {Hugging Face},
      url          = {https://huggingface.co/tgamstaetter/im-bin-tf-abstr}
    }

Associated preregistration: OSF, DOI 10.17605/OSF.IO/XFDBV
