---
library_name: transformers
license: apache-2.0
language:
- tr
tags:
- modernbert
- turkish
- encoder
- fill-mask
- masked-language-modeling
- nlp
datasets:
- HuggingFaceFW/fineweb-2
pipeline_tag: fill-mask
model-index:
- name: ModernBERT-TR
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: custom
      name: Turkish NLP Benchmark (11 tasks)
    metrics:
    - type: f1
      value: 60.2
      name: Avg Score (Frozen Linear Probe)
  - task:
      type: text-classification
      name: TabiBench (28 tasks)
    dataset:
      type: custom
      name: TabiBench
    metrics:
    - type: f1
      value: 77.28
      name: Avg Score (Full Fine-Tuning)
---

# ModernBERT-TR
**A Modern Encoder Foundation Model for Turkish**

Besher Alkurdi, Himmet Toprak Kesgin, Muzaffer Kaan Yuce, Mehmet Fatih Amasyali

[Web Page](https://cosmos-ytu.github.io/modernbert-tr-1k/) · Paper (coming soon) · [Training Code](https://github.com/Cosmos-YTU/ModernBERT) · [Evaluation Code](https://github.com/mrbesher/encoder-fast-eval)
## Overview
ModernBERT-TR is a 150M-parameter Turkish encoder pretrained from scratch on **144.4B tokens** using the [ModernBERT](https://huggingface.co/answerdotai/ModernBERT-base) architecture. It uses a custom 50K WordPiece tokenizer optimized for Turkish morphology.

**Architecture:** 22 layers, hidden size 768, 12 attention heads, rotary position embeddings (RoPE), gated linear unit (GLU) feed-forward layers, alternating local-global attention, Flash Attention, and sequence-packed training.
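
To make "alternating local-global attention" concrete, the sketch below lists which of the 22 layers would attend globally and which would use a sliding window. It assumes the upstream ModernBERT-base defaults (a global layer every third layer, a 128-token window elsewhere). This card does not state these values for ModernBERT-TR, so `GLOBAL_EVERY` and `LOCAL_WINDOW` are assumptions, and the helper names are hypothetical.

```python
# Assumed values, taken from the upstream ModernBERT-base configuration;
# they are not confirmed for ModernBERT-TR by this card.
NUM_LAYERS = 22      # from the architecture summary
GLOBAL_EVERY = 3     # assumption: every third layer uses global attention
LOCAL_WINDOW = 128   # assumption: sliding-window width for local layers


def attention_kind(layer_idx: int, global_every: int = GLOBAL_EVERY) -> str:
    """Return "global" or "local" for a zero-based layer index."""
    return "global" if layer_idx % global_every == 0 else "local"


def can_attend(query: int, key: int, layer_idx: int) -> bool:
    """Whether token `query` may attend to token `key` in the given layer."""
    if attention_kind(layer_idx) == "global":
        return True
    # Local layers: a symmetric window of LOCAL_WINDOW tokens around the query.
    return abs(query - key) <= LOCAL_WINDOW // 2


layout = [attention_kind(i) for i in range(NUM_LAYERS)]
global_layers = [i for i, kind in enumerate(layout) if kind == "global"]
print(global_layers)
```

Under these assumptions, 8 of the 22 layers are global and the rest only see nearby tokens, which is what keeps attention cost manageable as the context grows.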
## Results
### Frozen Linear Probing (11 Turkish NLP tasks)
| Model | Params | Avg |
|---|---|---|
| **ModernBERT-TR (ours)** | **150M** | **60.2** |
| Turkish-E5-large | 560M | 53.2 |
| mmBERT | 307M | 54.9 |
| TabiBERT | ~150M | 49.1 |
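| BERTurk | 111M | 35.3 |

Computed from the averages above, ModernBERT-TR scores about 10% higher in relative terms than the next-best model (mmBERT, 54.9), about 13% higher than Turkish-E5-large, and about 70% higher than BERTurk, while outperforming models up to roughly 4x larger.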
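
The relative gains can be recomputed from the table with a few lines of arithmetic. This is a minimal sketch that uses the rounded averages shown above, so its output can differ in the last digit from figures derived from unrounded scores. The helper name `relative_gain` is hypothetical.

```python
# Frozen linear-probe averages copied from the table above.
scores = {
    "ModernBERT-TR": 60.2,
    "Turkish-E5-large": 53.2,
    "mmBERT": 54.9,
    "TabiBERT": 49.1,
    "BERTurk": 35.3,
}


def relative_gain(ours: float, baseline: float) -> float:
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


ours = scores["ModernBERT-TR"]
gains = {
    name: relative_gain(ours, value)
    for name, value in scores.items()
    if name != "ModernBERT-TR"
}

# Print baselines from the smallest gap to the largest.
for name, gain in sorted(gains.items(), key=lambda item: item[1]):
    print(f"{name:<18} +{gain:.1f}%")
```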
### TabiBench Full Fine-Tuning (28 tasks)
| Model | Params | Avg |
|---|---|---|
| ModernBERT-TR (ours) | 150M | 77.28 |
| **TabiBERT** | **~150M** | **77.58** |
| BERTurk | 110M | 75.96 |

ModernBERT-TR leads in 5 of 8 task categories (text classification, STS, NLI, academic understanding, information retrieval) and trails TabiBERT by 0.30 points on the overall average. TabiBERT, which was trained on code and math data, leads in code retrieval and QA.
## Usage
```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ytu-ce-cosmos/modernbert-tr-base-1k")
model = AutoModelForMaskedLM.from_pretrained("ytu-ce-cosmos/modernbert-tr-base-1k")

text = "Türkiye'nin başkenti [MASK]'dır."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():  # inference only, so gradients are not needed
    outputs = model(**inputs)  # outputs.logits: (batch, seq_len, vocab_size)
```
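The snippet above stops at the raw model outputs. As a dependency-free sketch of the remaining step, the hypothetical helper `top_k_from_logits` below turns one row of vocabulary logits into the `k` most likely token ids with their softmax probabilities. The toy logits are made up for illustration; in practice the row would come from `outputs.logits` at the position of the mask token.

```python
import math


def top_k_from_logits(logits, k=5):
    """Return (token_id, probability) pairs for the k largest logits."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [(i, exps[i] / total) for i in ranked[:k]]


# Toy example with a 4-token vocabulary (illustrative values only).
toy_logits = [0.1, 2.0, -1.0, 1.0]
for token_id, prob in top_k_from_logits(toy_logits, k=2):
    print(token_id, round(prob, 3))
```

With the usage snippet, one way to obtain the row is `mask_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)` followed by `outputs.logits[0, mask_index].tolist()`. The returned ids can then be mapped back to strings with `tokenizer.convert_ids_to_tokens`.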
## Training Details
| Setting | Value |
|---|---|
| **Data** | FineWeb-2 Turkish (41.2B tokens) + BertTurk Corpus repeated 5x (31.0B tokens) = 72.2B tokens per epoch, 2 epochs (144.4B tokens total) |
| **Tokenizer** | 50K WordPiece, trained on Turkish data |
| **Optimizer** | StableAdamW, peak LR 2e-4, cosine schedule |
| **Batch size** | 256 sequences (262K tokens/step) |
| **MLM masking** | 30% (train) / 15% (eval) |
| **Hardware** | 4x NVIDIA H100, 623 GPU-hours |
| **Precision** | BF16 mixed precision |
| **Context** | 1,024 tokens |
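
The figures in the table are mutually consistent, which the back-of-the-envelope sketch below checks. The step count and throughput are derived estimates, not reported values: they assume every sequence is packed to the full 1,024-token context and that all 623 GPU-hours went into the main pretraining run.

```python
# Figures copied from the training table above.
FINEWEB2_TOKENS = 41.2e9
BERTTURK_TOKENS = 31.0e9   # "BertTurk Corpus 5x", as listed
EPOCHS = 2
BATCH_SEQUENCES = 256
CONTEXT_LENGTH = 1024
GPU_HOURS = 623
NUM_GPUS = 4

tokens_per_epoch = FINEWEB2_TOKENS + BERTTURK_TOKENS
total_tokens = tokens_per_epoch * EPOCHS
tokens_per_step = BATCH_SEQUENCES * CONTEXT_LENGTH

# Derived estimates (assumptions: fully packed sequences, and all GPU-hours
# spent on the main run).
approx_steps = total_tokens / tokens_per_step
wall_clock_hours = GPU_HOURS / NUM_GPUS
tokens_per_gpu_second = total_tokens / (GPU_HOURS * 3600)

print(f"tokens per epoch:     {tokens_per_epoch / 1e9:.1f}B")
print(f"total tokens:         {total_tokens / 1e9:.1f}B")
print(f"tokens per step:      {tokens_per_step:,}")
print(f"approx. steps:        {approx_steps:,.0f}")
print(f"wall-clock hours:     {wall_clock_hours:.2f}")
print(f"tokens / GPU / sec:   {tokens_per_gpu_second:,.0f}")
```

The per-step token count (256 x 1,024 = 262,144) matches the "262K tokens/step" entry, and two epochs of 72.2B tokens give the 144.4B total quoted in the overview.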
## Citation
```bibtex
@article{alkurdi2025modernberttr,
title={ModernBERT-TR: A Modern Encoder Foundation Model for Turkish},
author={Alkurdi, Besher and Kesgin, Himmet Toprak and Yuce, Muzaffer Kaan and Amasyali, Mehmet Fatih},
year={2025}
}
```
## Acknowledgments
Supported by Yildiz Technical University (FDK-2024-6070) and TUBITAK (124E055). Built on the [ModernBERT](https://github.com/AnswerDotAI/ModernBERT) codebase with [FineWeb-2](https://huggingface.co/datasets/HuggingFaceFW/fineweb-2) and [BertTurk Corpus](https://github.com/dbmdz/berts) data.