🌍 Multilingual Topic Classifier

A multilingual text classification model, fine-tuned from xlm-roberta-base on the SIB-200 dataset, that assigns text to one of 7 topics across 205 languages.

Model Details

Base model: xlm-roberta-base
Task: Text Classification (Topic)
Languages: 205
Developed by: Keshav0308

Topics

Label	Description
🌍 geography	Geographic content
🔬 science/technology	Science and tech content
🎬 entertainment	Entertainment content
🏛️ politics	Political content
🏥 health	Health and medical content
✈️ travel	Travel content
⚽ sports	Sports content

Performance

Metric	Score
Test Accuracy	69.17%
Test F1 Macro	67.62%
Languages	205
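
The card does not say which tooling produced these scores, so the following is only a minimal sketch of the standard definitions of accuracy and macro-averaged F1. The function names and the toy labels are hypothetical and are not taken from the model's evaluation code or test set.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in gold or predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # gold class gains a false negative
    f1_scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy labels, used only to exercise the functions.
gold = ["health", "sports", "travel", "health", "politics", "sports"]
pred = ["health", "sports", "health", "health", "travel", "sports"]

acc = accuracy(gold, pred)
f1 = macro_f1(gold, pred)
print(f"accuracy={acc:.4f} macro_f1={f1:.4f}")
```

Because macro F1 weights every topic equally, a topic the model handles poorly pulls the score down even if it is rare, which is one possible reason the macro F1 above sits slightly below the accuracy.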

Usage

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Keshav0308/multilingual-topic-classifier"
)

# Inputs can be in any of the supported languages.
classifier("The patient was diagnosed with pneumonia.")
# [{'label': 'health', 'score': 0.999}]

classifier("El equipo ganó el campeonato mundial de fútbol.")
# [{'label': 'sports', 'score': 0.999}]
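
Given the reported test accuracy of 69.17%, a caller may prefer to treat low-confidence predictions as "unknown" rather than accept every top label. The sketch below shows one way to do that. The helper name `confident_label` and the 0.5 threshold are assumptions made for illustration, and the prediction lists are hard-coded so the example runs without downloading the model.

```python
from typing import Optional


def confident_label(predictions: list, threshold: float = 0.5) -> Optional[str]:
    """Return the top label if its score reaches the threshold, otherwise None.

    `predictions` uses the list-of-dicts shape that the pipeline returns,
    e.g. [{"label": "health", "score": 0.93}].
    """
    if not predictions:
        return None
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= threshold else None


# Hypothetical outputs; the scores are invented for this example.
sure = [{"label": "health", "score": 0.93}]
unsure = [{"label": "travel", "score": 0.41}, {"label": "geography", "score": 0.38}]

print(confident_label(sure))
print(confident_label(unsure))
```

With the real pipeline the call would look like `confident_label(classifier(text))`. A suitable threshold depends on the application and is best chosen on held-out data.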

Training Data

Fine-tuned on SIB-200, a massively multilingual topic classification dataset covering 205 languages.

Train samples: 143,705
Validation samples: 20,295
Test samples: 41,820
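
As a quick sanity check on these figures, the sketch below computes each split's share of the data and divides each split by the 205 languages. It assumes every language contributes the same number of samples, which the card does not state. A remainder of zero is consistent with that assumption but does not prove it.

```python
# Split sizes as reported in this card.
splits = {"train": 143_705, "validation": 20_295, "test": 41_820}
num_languages = 205

total = sum(splits.values())
per_language = {}

for name, count in splits.items():
    share = count / total
    # Assumption: samples are spread evenly across languages.
    per_lang, remainder = divmod(count, num_languages)
    per_language[name] = (per_lang, remainder)
    print(f"{name}: {share:.1%} of all samples, "
          f"{per_lang} per language (remainder {remainder})")

print(f"total samples: {total}")
```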

Safetensors

Model size: 0.3B params
Tensor type: F32
