How to use tgamstaetter/im-bin-tf-titles with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="tgamstaetter/im-bin-tf-titles")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("tgamstaetter/im-bin-tf-titles")
model = AutoModelForSequenceClassification.from_pretrained("tgamstaetter/im-bin-tf-titles")
```
im-bin-tf-abstr
This model is a fine-tuned version of microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext, trained on PubMed article titles (see Training and evaluation data). It achieves the following results on the evaluation set:
- Loss: 0.1908
- Accuracy: 0.9222
- F1: 0.9220
- Precision: 0.9267
- Recall: 0.9174
- Roc Auc: 0.9781
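The reported F1 can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. A minimal sketch follows; the function name `f1_from_pr` is illustrative and not taken from the model's training code.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the evaluation set.
f1 = f1_from_pr(0.9267, 0.9174)
print(round(f1, 4))  # 0.922, consistent with the reported F1 of 0.9220
```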
Model description
im-bin-tf-abstr is a fine-tuned [PubMedBERT](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext)
model for binary classification of biomedical text as Internal Medicine (IM) or Other.
It was trained on 300,000 PubMed article titles using journal provenance as a distant
supervision signal: titles from journals covering any of the eleven IM sub-specialties
recognised by the German Society of Internal Medicine (DGIM) were labeled IM; titles from
six other disciplines (anesthesiology, otorhinolaryngology, gynecology, surgery, psychiatry,
neurology) were labeled Other. No manual annotation was used.
The model was developed as part of a dissertation at the DGIM and evaluated in a prospective
human study at the 129th Congress of Internal Medicine (Wiesbaden, 2023), where it
significantly outperformed board-certified internists on the same classification task.
Intended uses & limitations
- Automatic specialty triage of biomedical abstracts or titles
- Proof-of-concept specialty routing of clinical free text (see limitations)
- Research on distant supervision and transfer learning in biomedical NLP
Not intended for: clinical decision support, diagnostic use, or any safety-critical
application.
Training and evaluation data
300,000 PubMed article titles from 77 medical journals, aggregated via the PubMed API.
Labels derived from the primary editorial scope of each source journal (distant supervision).
- IM sub-specialties (positive class): angiology, cardiology, gastroenterology,
  nephrology, hematology, pulmonology, endocrinology, rheumatology, geriatrics,
  intensive care medicine, infectiology
- Other disciplines (negative class): anesthesiology, otorhinolaryngology,
  gynecology, surgery, psychiatry, neurology
Dataset: Internal medicine and other specialties (Kaggle)
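The distant-supervision labeling described above can be sketched as a lookup from a journal's primary scope to a binary label. This is only an illustration under stated assumptions: the journal names and titles below are hypothetical placeholders (the 77 source journals are not listed in this card), and `label_for_specialty` is not the author's code.

```python
# Specialty lists as given in this card.
IM_SPECIALTIES = {
    "angiology", "cardiology", "gastroenterology", "nephrology",
    "hematology", "pulmonology", "endocrinology", "rheumatology",
    "geriatrics", "intensive care medicine", "infectiology",
}
OTHER_SPECIALTIES = {
    "anesthesiology", "otorhinolaryngology", "gynecology",
    "surgery", "psychiatry", "neurology",
}


def label_for_specialty(specialty: str) -> int:
    """Return 1 (Internal Medicine) or 0 (Other) for a journal's primary scope."""
    key = specialty.strip().lower()
    if key in IM_SPECIALTIES:
        return 1
    if key in OTHER_SPECIALTIES:
        return 0
    raise ValueError(f"Specialty not covered by the labeling scheme: {specialty!r}")


# Hypothetical journal-to-scope table and titles, for illustration only.
journal_scope = {
    "Example Cardiology Journal": "cardiology",
    "Example Surgery Journal": "surgery",
}
titles = [
    ("Example Cardiology Journal", "A placeholder title about heart failure"),
    ("Example Surgery Journal", "A placeholder title about hernia repair"),
]

# Every title inherits the label of its source journal; no title is read by a human.
labeled = [(title, label_for_specialty(journal_scope[journal])) for journal, title in titles]
print(labeled)
```

Because every title inherits its journal's label, a cardiology paper published in a surgery journal would be labeled Other under this scheme, which is the kind of boundary noise noted under Limitations.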
Training procedure
Fine-tuned from microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext:
| Hyperparameter | Value |
|---|---|
| Epochs | 4 |
| Learning rate | 1e-5 |
| Batch size (train / eval) | 640 / 1280 |
| Scheduler | Cosine with 0.1 warmup ratio |
| Max sequence length | 35 tokens |
| Optimizer | AdamW (β₁=0.9, β₂=0.999, ε=1e-8) |
| Seed | 42 |
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 640
- eval_batch_size: 1280
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
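To make the scheduler settings concrete, the following sketch computes the learning rate for a cosine schedule with linear warmup. It assumes a half-cosine decay to zero and a warmup length of `ceil(total_steps * warmup_ratio)`; the function name is illustrative, and the total of 1,500 steps is taken from the training results table (4 epochs of 375 steps).

```python
import math


def cosine_lr_with_warmup(step: int, total_steps: int,
                          base_lr: float = 1e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1500  # 4 epochs x 375 steps, per the training results table

for step in (0, 75, 150, 825, 1500):
    print(step, cosine_lr_with_warmup(step, TOTAL_STEPS))
```

Under these assumptions the rate peaks at 1e-5 after 150 warmup steps, is half that at step 825, and reaches zero at step 1,500.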
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall | Roc Auc |
|---|---|---|---|---|---|---|---|---|
| No log | 1.0 | 375 | 0.2136 | 0.9124 | 0.9131 | 0.9087 | 0.9175 | 0.9733 |
| 0.3086 | 2.0 | 750 | 0.1971 | 0.9195 | 0.9190 | 0.9277 | 0.9104 | 0.9770 |
| 0.1917 | 3.0 | 1125 | 0.1908 | 0.9222 | 0.9220 | 0.9267 | 0.9174 | 0.9781 |
| 0.1791 | 4.0 | 1500 | 0.1909 | 0.9224 | 0.9224 | 0.9247 | 0.9202 | 0.9785 |
Evaluation results
Evaluated on a held-out test set of 100,000 titles (stratified, withheld from all training
steps):
| Metric | Value |
|---|---|
| Accuracy | 0.926 |
| F1 Score | 0.926 |
| Precision | 0.924 |
| Recall | 0.927 |
| ROC-AUC | 0.926 |
In a prospective human study (n = 20 board-certified internists, 594 classifications),
the model achieved accuracy 0.931 vs. participant aggregate accuracy 0.773
(superiority test p < 10⁻¹⁴).
Label mapping
| Label | Class | Interpretation |
|---|---|---|
| LABEL_0 | Other | Not internal medicine |
| LABEL_1 | Internal Medicine | IM sub-specialty |
softmax(logits)[:, 1] gives P(Internal Medicine).
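The mapping from logits to a class can be sketched in plain Python without loading the model. The logits below are made-up numbers rather than model output, and the 0.5 decision threshold is an assumption, not a value stated in this card.

```python
import math

ID2LABEL = {0: "Other", 1: "Internal Medicine"}


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=0.5):
    """Return (class name, P(Internal Medicine)) for one row of two logits."""
    p_im = softmax(logits)[1]  # index 1 corresponds to LABEL_1
    label = ID2LABEL[1] if p_im >= threshold else ID2LABEL[0]
    return label, p_im


# Made-up logits for two titles, in [LABEL_0, LABEL_1] order.
batch_logits = [[-1.2, 2.3], [1.5, -0.7]]
results = [classify(row) for row in batch_logits]
for label, p_im in results:
    print(f"{label}: P(IM) = {p_im:.3f}")
```

Raising the threshold trades recall for precision on the Internal Medicine class, which may matter when the model is used for triage.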
Limitations
- Trained on article titles only: short, structured academic text. Performance
  attenuates on clinical free text (discharge summaries, notes) due to domain shift.
- Labels derived from journal scope may not reflect clinical ground truth at disciplinary
  boundaries (e.g., intensive care / anesthesiology overlap).
- Evaluated on English-language text only.
Framework versions
Transformers 4.31.0.dev0
PyTorch 2.0.0
Datasets 2.1.0
Tokenizers 0.13.3
Citation
```bibtex
@misc{gamstaetter2023model,
  author       = {Gamstaetter, Thomas},
  title        = {im-bin-tf-abstr: Fine-tuned {PubMedBERT} for binary internal medicine classification},
  year         = {2023},
  howpublished = {Hugging Face},
  url          = {https://huggingface.co/tgamstaetter/im-bin-tf-abstr}
}
```

Associated preregistration: OSF, DOI 10.17605/OSF.IO/XFDBV