MohammadJRanjbar/PersianPunc
This model is a fine-tuned version of ParsBERT for Persian punctuation restoration.
Evaluated on 1,000 test sentences:
| Punctuation | Precision | Recall | F1-Score |
|---|---|---|---|
| Persian Comma (،) | 84.08% | 76.35% | 80.03% |
| Period (.) | 98.55% | 98.86% | 98.71% |
| Question (؟) | 87.50% | 90.32% | 88.89% |
| Colon (:) | 91.37% | 89.55% | 90.45% |
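The F1 column can be sanity-checked against the precision and recall columns, since F1 is the harmonic mean of the two. The snippet below is a minimal sketch of that check, not part of the model's evaluation code; the helper name `f1_score` is hypothetical. Because the table values are already rounded to two decimals, a recomputed F1 may differ from the reported one in the last digit.

```python
# Precision, recall, and reported F1 (all in percent), copied from the table above.
reported = {
    "Persian Comma": (84.08, 76.35, 80.03),
    "Period": (98.55, 98.86, 98.71),
    "Question": (87.50, 90.32, 88.89),
    "Colon": (91.37, 89.55, 90.45),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recompute F1 from the rounded precision/recall values.
recomputed = {name: f1_score(p, r) for name, (p, r, _) in reported.items()}

for name, (p, r, f1) in reported.items():
    print(f"{name}: reported F1 {f1:.2f}%, recomputed {recomputed[name]:.2f}%")
```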
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch
# Load model and tokenizer
model_name = "MohammadJRanjbar/parsbert-persian-punctuation"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
# Prepare input (text without punctuation)
text = "سلام چطوری امروز هوا خیلی خوبه"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)
# Label mapping
id2label = {
    0: "EMPTY",
    1: "COMMA",
    2: "QUESTION",
    3: "PERIOD",
    4: "COLON",
}
# Map predictions to punctuation
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
labels = [id2label[p.item()] for p in predictions[0]]
# Reconstruct text with punctuation
punct_map = {
    "COMMA": "،",
    "QUESTION": "؟",
    "PERIOD": ".",
    "COLON": ":",
}
result = []
for token, label in zip(tokens, labels):
    # Skip special tokens added by the tokenizer
    if token in ["[CLS]", "[SEP]", "[PAD]"]:
        continue
    result.append(token)
    if label in punct_map:
        result.append(punct_map[label])
punctuated_text = " ".join(result).replace(" ##", "")
print(punctuated_text)
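The reconstruction loop above works at the WordPiece level, so a punctuation label predicted on a non-final piece of a word ends up inside that word, and every mark is preceded by a space. A word-level variant avoids both effects. The following is a minimal sketch under stated assumptions: the helper `restore_punctuation` is hypothetical, it assumes continuation pieces carry the standard `##` prefix, and it assumes that when several pieces of one word have punctuation labels the last one should win. Which piece the model is actually trained to label is not stated here, so treat that rule as a guess. The tokens and labels in the example are hand-written for illustration, not real model output.

```python
from typing import Dict, List

PUNCT_MAP: Dict[str, str] = {"COMMA": "،", "QUESTION": "؟", "PERIOD": ".", "COLON": ":"}
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def restore_punctuation(tokens: List[str], labels: List[str]) -> str:
    """Merge WordPiece tokens into words and attach punctuation after each word."""
    words: List[str] = []
    marks: List[str] = []  # one punctuation mark (or "") per word
    for token, label in zip(tokens, labels):
        if token in SPECIAL_TOKENS:
            continue
        if token.startswith("##") and words:
            # Continuation piece: glue it onto the current word.
            words[-1] += token[2:]
        else:
            # Start of a new word (strip a stray "##" if there is no word yet).
            words.append(token[2:] if token.startswith("##") else token)
            marks.append("")
        if label in PUNCT_MAP:
            # Assumption: the last punctuation label seen within a word wins.
            marks[-1] = PUNCT_MAP[label]
    return " ".join(word + mark for word, mark in zip(words, marks))


# Hand-written example: the QUESTION label sits on a non-final piece.
example_tokens = ["[CLS]", "سلام", "چطور", "##ی", "[SEP]"]
example_labels = ["EMPTY", "COMMA", "QUESTION", "EMPTY", "EMPTY"]
print(restore_punctuation(example_tokens, example_labels))
```

With real model output, `tokens` and `labels` from the snippet above can be passed to this helper in place of the final loop.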
If you use this model, please cite:
@misc{kalahroodi2026persianpunclargescaledatasetbertbased,
  title={PersianPunc: A Large-Scale Dataset and BERT-Based Approach for Persian Punctuation Restoration},
  author={Mohammad Javad Ranjbar Kalahroodi and Heshaam Faili and Azadeh Shakery},
  year={2026},
  eprint={2603.05314},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2603.05314},
}
MIT License
For questions or issues, please contact: mohammadjranjbar@ut.ac.ir
Base model: HooshvareLab/bert-base-parsbert-uncased