EuroVoc Domain Classifier (bert-base-uncased)
A fine-tuned BERT model that classifies EU legislation into the 21 top-level EuroVoc thematic domains.
What it does
Given the preamble of an EU regulation, decision, or directive, the model predicts which of the 21 EuroVoc domains apply (multi-label classification). For example, a regulation about carbon border adjustments might be classified under Energy, Trade, and Environment.
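Concretely, "multi-label" means each document's target is a 21-element 0/1 vector rather than a single class index, so several domains can be switched on at once. The sketch below shows that encoding for the carbon-border example. It assumes that the label order listed in this card matches the model's output indices. `to_multi_hot` and `from_multi_hot` are hypothetical helper names.

```python
# The 21 top-level EuroVoc domains, in the order listed in this card
# (assumption: this matches the model's output indices).
LABELS = [
    "AGRI-FOODSTUFFS", "AGRICULTURE, FORESTRY AND FISHERIES",
    "BUSINESS AND COMPETITION", "ECONOMICS", "EDUCATION AND COMMUNICATIONS",
    "EMPLOYMENT AND WORKING CONDITIONS", "ENERGY", "ENVIRONMENT",
    "EUROPEAN UNION", "FINANCE", "GEOGRAPHY", "INDUSTRY",
    "INTERNATIONAL ORGANISATIONS", "INTERNATIONAL RELATIONS", "LAW",
    "POLITICS", "PRODUCTION, TECHNOLOGY AND RESEARCH", "SCIENCE",
    "SOCIAL QUESTIONS", "TRADE", "TRANSPORT",
]
LABEL_TO_INDEX = {label: i for i, label in enumerate(LABELS)}


def to_multi_hot(domains):
    """Encode a collection of domain names as a 21-element 0/1 target vector."""
    vector = [0] * len(LABELS)
    for domain in domains:
        vector[LABEL_TO_INDEX[domain]] = 1
    return vector


def from_multi_hot(vector):
    """Decode a 0/1 vector back into domain names, in label order."""
    return [LABELS[i] for i, bit in enumerate(vector) if bit]


# The carbon-border example: three domains apply at once.
target = to_multi_hot(["ENERGY", "TRADE", "ENVIRONMENT"])
print(sum(target))             # 3
print(from_multi_hot(target))  # ['ENERGY', 'ENVIRONMENT', 'TRADE']
```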
Performance
| Metric | Score |
|---|---|
| F1 micro | 0.900 |
| F1 macro | 0.800 |
| Decision threshold (optimised on the test set) | 0.40 |
Evaluated on 890 held-out EU regulations published between September 2025 and March 2026. Ground-truth labels were assigned by professional librarians at the EU Publications Office.
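Micro-F1 and macro-F1 measure different things. Micro-F1 pools true positives, false positives, and false negatives across all 21 domains, so frequent domains dominate the score. Macro-F1 computes F1 separately for each domain and averages the results, so a rare domain counts as much as a common one. A gap like 0.900 vs 0.800 therefore usually suggests weaker performance on less frequent domains. The plain-Python sketch below shows the difference on toy data. The function names are hypothetical, and this is not the evaluation script used for this card.

```python
def f1(tp, fp, fn):
    """F1 from raw counts (2TP / (2TP + FP + FN)); 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """y_true, y_pred: lists of equal-length 0/1 vectors, one per document."""
    n_labels = len(y_true[0])
    tp, fp, fn = [0] * n_labels, [0] * n_labels, [0] * n_labels
    for truth, pred in zip(y_true, y_pred):
        for j in range(n_labels):
            if pred[j] and truth[j]:
                tp[j] += 1
            elif pred[j]:
                fp[j] += 1
            elif truth[j]:
                fn[j] += 1
    # Micro: pool counts over all labels, so frequent labels dominate.
    micro = f1(sum(tp), sum(fp), sum(fn))
    # Macro: average the per-label F1 scores, so every label counts equally.
    macro = sum(f1(tp[j], fp[j], fn[j]) for j in range(n_labels)) / n_labels
    return micro, macro


# Toy data: 4 documents, 3 labels. Label 0 is common; labels 1 and 2 are rare.
y_true = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]
micro, macro = micro_macro_f1(y_true, y_pred)
print(round(micro, 3), round(macro, 3))  # 0.8 0.333
```

In this example, label 0 is predicted perfectly but the two rarer labels are missed entirely. The pooled counts still look good (micro 0.8), while the per-label average drops sharply (macro 0.333).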
Context: Comparison with Other Methods
| Method | F1 (micro) | Cost |
|---|---|---|
| bert-base (this model) | 0.900 | €10 |
| Llama 3.1 8B (QLoRA) | 0.892 | €83 |
| EUBERT (fine-tuned) | 0.891 | free |
| TF-IDF + Logistic Regression | 0.799 | free |
| DeepSeek-R1-70B (zero-shot) | 0.562 | ~€12 |
Full write-up: "Pimp My LM: A Fine-Tuning Tale of Bling and Basic"
The 21 Labels
- AGRI-FOODSTUFFS
- AGRICULTURE, FORESTRY AND FISHERIES
- BUSINESS AND COMPETITION
- ECONOMICS
- EDUCATION AND COMMUNICATIONS
- EMPLOYMENT AND WORKING CONDITIONS
- ENERGY
- ENVIRONMENT
- EUROPEAN UNION
- FINANCE
- GEOGRAPHY
- INDUSTRY
- INTERNATIONAL ORGANISATIONS
- INTERNATIONAL RELATIONS
- LAW
- POLITICS
- PRODUCTION, TECHNOLOGY AND RESEARCH
- SCIENCE
- SOCIAL QUESTIONS
- TRADE
- TRANSPORT
Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "jngb-labs/eurovoc-bert-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

text = "Regulation establishing a carbon border adjustment mechanism..."
# BERT accepts at most 512 tokens; longer preambles are truncated
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, 21)

# Multi-label: an independent sigmoid per domain, not a softmax across domains
probs = torch.sigmoid(logits)

# Domain names in the order of the model's 21 outputs
LABELS = [
    "AGRI-FOODSTUFFS", "AGRICULTURE, FORESTRY AND FISHERIES",
    "BUSINESS AND COMPETITION", "ECONOMICS", "EDUCATION AND COMMUNICATIONS",
    "EMPLOYMENT AND WORKING CONDITIONS", "ENERGY", "ENVIRONMENT",
    "EUROPEAN UNION", "FINANCE", "GEOGRAPHY", "INDUSTRY",
    "INTERNATIONAL ORGANISATIONS", "INTERNATIONAL RELATIONS", "LAW",
    "POLITICS", "PRODUCTION, TECHNOLOGY AND RESEARCH", "SCIENCE",
    "SOCIAL QUESTIONS", "TRADE", "TRANSPORT",
]

threshold = 0.40
# Keep every domain whose probability exceeds the threshold (zero, one, or several)
predictions = [LABELS[i] for i, p in enumerate(probs[0].tolist()) if p > threshold]
print(predictions)
```
Training Details
- Base model: bert-base-uncased (110M parameters)
- Training data: 63,918 EU regulations (preambles), sourced from EUR-Lex via the CELLAR API
- Test data: 890 held-out regulations, labels assigned by EU Publications Office librarians
- Architecture: BERT pooler → Dropout(0.1) → Linear(768, 21) → Sigmoid (the model outputs raw logits; the sigmoid is applied at inference, as in the usage example)
- Loss: BCEWithLogitsLoss
- Epochs: 3
- Hardware: Nvidia T4 (Google Colab)
- Training time: 162 minutes
- Cost: €10 (Colab compute units)
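`BCEWithLogitsLoss` treats each of the 21 outputs as an independent binary decision and folds the sigmoid into the loss. PyTorch documents this fused form as more numerically stable than a separate sigmoid followed by BCE. The plain-Python sketch below illustrates what the loss computes for one document and why the fused form is preferred. It is not the PyTorch implementation: the function names are hypothetical, and it averages over labels, mirroring PyTorch's default `reduction="mean"`.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed directly from logits.

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)), which is
    algebraically equal to -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))].
    """
    losses = [
        max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
        for x, y in zip(logits, targets)
    ]
    return sum(losses) / len(losses)


def naive_bce(logits, targets):
    """Same loss via an explicit sigmoid; fails for large |x| (overflow or log(0))."""
    total = 0.0
    for x, y in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-x))
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)


# One document, three labels (a toy stand-in for the 21 EuroVoc outputs).
logits = [2.0, -1.0, 0.5]
targets = [1, 0, 1]
print(bce_with_logits(logits, targets))
print(naive_bce(logits, targets))
```

The two functions agree on moderate logits. However, `naive_bce([1000.0], [0])` raises a math domain error because the sigmoid rounds to exactly 1.0 and `log(0)` is undefined. The fused form returns 1000.0 for the same input.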
Limitations
- Trained on English-language preambles only (EU legislation is published in 24 languages)
- The multi-label decision threshold (0.40) was optimised on the test set, so the reported F1 scores may be optimistic; for other corpora, re-tune it on a separate validation set
- Classification granularity is limited to 21 top-level domains; finer EuroVoc concepts are not predicted
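The 0.40 threshold was chosen on the test set. For a new corpus, a cleaner procedure is to sweep candidate thresholds on a separate validation split and keep the one with the best micro-F1. The sketch below assumes you already have sigmoid probabilities (for example, `probs[0].tolist()` per document from the usage example) and 0/1 gold vectors for the validation documents. The function names and the 0.05 to 0.95 grid are hypothetical choices.

```python
def micro_f1(y_true, y_pred):
    """Micro-F1 over lists of 0/1 vectors (counts pooled across all labels)."""
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        for t, p in zip(truth, pred):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(y_true, probs, candidates=None):
    """Return (threshold, micro-F1) maximising micro-F1 on validation data."""
    if candidates is None:
        candidates = [round(0.05 * k, 2) for k in range(1, 20)]  # 0.05 ... 0.95
    best_t, best_f1 = None, -1.0
    for t in candidates:
        # Same decision rule as the usage example: predict a label if p > t
        y_pred = [[1 if p > t else 0 for p in row] for row in probs]
        score = micro_f1(y_true, y_pred)
        if score > best_f1:  # strict ">" keeps the lowest threshold on ties
            best_t, best_f1 = t, score
    return best_t, best_f1


# Toy validation set: 3 documents, 2 labels.
val_true = [[1, 0], [0, 1], [1, 1]]
val_probs = [[0.9, 0.3], [0.2, 0.45], [0.6, 0.55]]
print(tune_threshold(val_true, val_probs))
```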
License
Apache 2.0
Author