birgermoell/saga-swedish-health-linear-probe

This repository is an example model card for Assignment 1 in the 5LN712 Information Retrieval course. It shows one complete submission pattern: define a domain problem, create a custom dataset, encode text with embeddings, train classifiers, evaluate them on a held-out split, and deploy a working Hugging Face demo.

Assignment fit

Domain issue: health-information retrieval systems often need to route short texts to an appropriate source family before search, filtering, or question answering.
Embedding challenge: all labels discuss health, but they differ in audience, vocabulary, genre, and institutional purpose.
Embedding model: nicher92/saga-embed_v1.
Classifier: regularized logistic regression trained on frozen embeddings.
Deliverables represented here: custom dataset, trained model, metrics, model card, and Gradio Space.
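
The split between a frozen embedding model and a small trained classifier can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the repository's training code: the three-dimensional vectors are made-up stand-ins for embeddings from nicher92/saga-embed_v1, every function name is hypothetical, and the L2-regularized softmax regression is written out in plain Python only to show the mechanics. In practice a library implementation such as scikit-learn's LogisticRegression would take this role.

```python
import math

LABELS = ["1177.se", "socialstyrelsen.se", "lakemedelsverket.se"]

# Made-up 3-d vectors standing in for frozen sentence embeddings (assumption).
TRAIN = [
    ([0.9, 0.1, 0.0], 0), ([0.8, 0.2, 0.1], 0),
    ([0.1, 0.9, 0.1], 1), ([0.2, 0.8, 0.0], 1),
    ([0.0, 0.1, 0.9], 2), ([0.1, 0.2, 0.8], 2),
]


def softmax(scores):
    """Turn raw class scores into probabilities (numerically stable)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, x):
    """Linear scores per class followed by softmax."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return softmax(scores)


def train_linear_probe(data, n_classes, l2=0.01, lr=0.5, epochs=300):
    """Batch gradient descent on cross-entropy with an L2 penalty on the weights."""
    dim = len(data[0][0])
    weights = [[0.0] * dim for _ in range(n_classes)]
    biases = [0.0] * n_classes
    n = len(data)
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, y in data:
            probs = predict_proba(weights, biases, x)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == y else 0.0)
                grad_b[k] += err
                for j in range(dim):
                    grad_w[k][j] += err * x[j]
        for k in range(n_classes):
            biases[k] -= lr * grad_b[k] / n
            for j in range(dim):
                # Only the probe's weights change; the embeddings stay fixed.
                weights[k][j] -= lr * (grad_w[k][j] / n + l2 * weights[k][j])
    return weights, biases


def predict_label(weights, biases, x):
    probs = predict_proba(weights, biases, x)
    return LABELS[probs.index(max(probs))]


weights, biases = train_linear_probe(TRAIN, n_classes=len(LABELS))
print(predict_label(weights, biases, [0.85, 0.1, 0.05]))
```

Because the embeddings are never updated, the only learned parameters are one weight vector and one bias per source family, which is what keeps this baseline fast and easy to inspect.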

Current example project

The example task is Swedish Health Source Triage. Given a short health-information text, the system predicts the most likely source family:

1177.se
socialstyrelsen.se
lakemedelsverket.se

The labels are intentionally source-oriented rather than diagnosis-oriented. That keeps the project aligned with Information Retrieval: routing, source selection, filtering, collection analysis, and evaluation.
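
To make the routing idea concrete, the sketch below uses a predicted source family to narrow the collection before a simple term-overlap search. Everything in it is hypothetical: the mini-collections, the document titles, and the keyword-based `stub_classifier`, which only stands in for the trained classifier so that the example runs on its own.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical mini-collections keyed by source family (illustration only).
COLLECTIONS: Dict[str, List[str]] = {
    "1177.se": ["Self-care advice for fever", "Living with pollen allergy"],
    "socialstyrelsen.se": ["National guidelines and regional comparisons"],
    "lakemedelsverket.se": ["Recall of a medicine batch", "Updated product information"],
}


def stub_classifier(text: str) -> str:
    """Placeholder for the trained probe: a keyword rule, not a real model."""
    return "lakemedelsverket.se" if "recall" in text.lower() else "1177.se"


def route_then_search(
    query: str,
    classify: Callable[[str], str],
    collections: Dict[str, List[str]],
) -> Tuple[str, List[str]]:
    """Pick a source family first, then search only inside that collection."""
    source = classify(query)
    candidates = collections.get(source, [])
    terms = set(query.lower().split())
    hits = [doc for doc in candidates if terms & set(doc.lower().split())]
    return source, hits


source, hits = route_then_search("medicine recall notice", stub_classifier, COLLECTIONS)
print(source, hits)
```

The classifier's output is used here as a filter rather than as a final answer, which is why source-oriented labels suit a retrieval course better than diagnosis-oriented ones.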

Files in this model repository

model.joblib: fitted scikit-learn classifier.
resolved_config.yaml: resolved training configuration.
metrics.json: machine-readable evaluation metrics.
predictions.csv: held-out predictions for inspection.
report.md: short generated training report.
embedding_classifier_pipeline_explainer.pdf: student-facing explanation of how embeddings and the classifier work together.

Evaluation

Test size: 0.25
Random seed: 712
Accuracy: 1.0
Macro F1: 1.0
Weighted F1: 1.0
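
The test size and seed above describe a reproducible held-out split. A minimal sketch of such a split follows. It assumes the split is stratified by source family, which the card does not state, and it uses placeholder examples rather than the real dataset. The function name is hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(examples, test_size=0.25, seed=712):
    """Hold out roughly `test_size` of each label's examples, reproducibly."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    train, test = [], []
    for label in sorted(by_label):  # fixed label order keeps the split deterministic
        group = by_label[label]
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_size))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Placeholder data: 8 examples per source family (not the real dataset).
examples = [
    (f"text {i} from {source}", source)
    for source in ("1177.se", "socialstyrelsen.se", "lakemedelsverket.se")
    for i in range(8)
]
train, test = stratified_split(examples)
print(len(train), len(test))  # 18 6
```

Fixing the seed means the same examples land in the test set on every run, so metrics from different classifiers remain comparable.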

The current dataset is small and deliberately clean, so the perfect scores above show that the pipeline runs end to end, not that the classifier is robust enough for medical or public-sector use. A stronger student submission should expand the dataset, include harder negative examples, and discuss the errors the classifier makes.
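
The three reported scores can diverge as soon as the classifier makes mistakes on an unbalanced test set, which is why they are worth reporting separately. The sketch below computes them from scratch and lists the misclassified items. The gold and predicted labels are invented for illustration and are not the repository's results. The function names are hypothetical.

```python
from collections import Counter


def f1_per_class(gold, pred):
    """F1 for every label seen in gold or pred, via 2TP / (2TP + FP + FN)."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def summarize(gold, pred):
    """Accuracy, macro F1 (plain mean), and weighted F1 (mean weighted by support)."""
    per_class = f1_per_class(gold, pred)
    support = Counter(gold)
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    macro = sum(per_class.values()) / len(per_class)
    weighted = sum(per_class[l] * support[l] for l in per_class) / len(gold)
    return {"accuracy": accuracy, "macro_f1": macro, "weighted_f1": weighted}


# Invented labels with one error on the smaller class (not real results).
gold = ["1177.se"] * 4 + ["socialstyrelsen.se"] * 2 + ["lakemedelsverket.se"] * 2
pred = ["1177.se"] * 4 + ["socialstyrelsen.se", "1177.se"] + ["lakemedelsverket.se"] * 2

metrics = summarize(gold, pred)
errors = [(i, g, p) for i, (g, p) in enumerate(zip(gold, pred)) if g != p]
print(metrics)
print(errors)
```

In this invented case macro F1 falls below weighted F1 because the single error hits a class with only two test examples. Listing the misclassified rows is the starting point for the error discussion a stronger submission should include.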

Reproduce the run

saga-ir inspect --config configs/assignment1_swedish_health.yaml
saga-ir linear-probe --config configs/assignment1_swedish_health.yaml --batch-size 8 --device cpu
saga-ir prototypes --config configs/assignment1_swedish_health.yaml --shots-per-source 4 --batch-size 8 --device cpu
saga-ir one-vs-rest --config configs/assignment1_swedish_health.yaml \
  --target-sources 1177.se \
  --target-sources socialstyrelsen.se \
  --target-sources lakemedelsverket.se \
  --batch-size 8 --device cpu

These runs are designed to work on the CPU of a MacBook with 32 GB of memory. The frozen-embedding methods are the recommended baseline because they are fast, inspectable, and easy to explain.
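
One plausible reading of the prototypes run is a nearest-prototype classifier: average a few embedded examples per source family, then assign a new text to the family whose average is closest by cosine similarity. That reading is an assumption, since the card does not document what `saga-ir prototypes` does internally. The sketch below uses made-up three-dimensional vectors in place of real embeddings and two shots per source to stay short. All function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_prototypes(shots):
    """One prototype per label: the mean of that label's example vectors."""
    return {label: mean_vector(vectors) for label, vectors in shots.items()}


def classify(vector, prototypes):
    """Return the label whose prototype is most similar to the input vector."""
    return max(prototypes, key=lambda label: cosine(vector, prototypes[label]))


# Made-up vectors standing in for embedded example texts (assumption).
SHOTS = {
    "1177.se": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "socialstyrelsen.se": [[0.1, 0.9, 0.1], [0.2, 0.8, 0.0]],
    "lakemedelsverket.se": [[0.0, 0.1, 0.9], [0.1, 0.2, 0.8]],
}

prototypes = build_prototypes(SHOTS)
print(classify([0.1, 0.15, 0.7], prototypes))
```

A classifier of this kind has no trained parameters beyond the averaged vectors, so each decision can be explained by pointing at the examples behind the winning prototype.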

What students should copy from this example

State the problem before training the model.
Separate the embedding model from the classifier trained for the assignment.
Publish a custom dataset, not only code.
Save metrics and prediction examples.
Link the dataset, model, Space, GitHub repo, and report.
Explain limitations honestly, especially when a dataset is small.
Reflect on what AI tools helped with and what was manually checked.

Downloadable student explainer

A longer PDF explanation is included in this repository: embedding_classifier_pipeline_explainer.pdf.

Demo inputs to try

Patient guidance: fever, back pain, pollen allergy, sleep problems.
Authority/statistics: national indicators, regional comparisons, guidelines.
Medicine regulation: recalls, product information updates, safety warnings.

Intended use and limits

This model is for teaching embeddings-based text classification in an Information Retrieval course. It is not medical advice, not a clinical triage system, and not a replacement for human review of health information.

Model tree for birgermoell/saga-swedish-health-linear-probe

Base model: answerdotai/ModernBERT-base
Finetuned: AI-Sweden-Models/ModernBERT-base
Finetuned: nicher92/saga-embed_v1
Finetuned: this model
