Persian Sentence Completion Classifier

A BERT-based classifier that determines whether a Persian sentence is Complete or Incomplete.
Designed for use in ASR post-processing pipelines (e.g. after speech-to-text).

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

MODEL_ID = "MohammadJRanjbar/persian-sentence-completion"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

label_map = {0: "Incomplete", 1: "Complete"}

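def classify_sentences(sentences, batch_size=16):
    """Return "Complete" or "Incomplete" for each input sentence."""
    results = []
    # Process the input in batches to keep memory use bounded.
    for i in range(0, len(sentences), batch_size):
        batch = sentences[i:i+batch_size]
        # Pad to the longest sentence in the batch; truncate at 128 tokens.
        inputs = tokenizer(batch, return_tensors="pt", padding=True,
                           truncation=True, max_length=128)
        # Inference only, so gradient tracking is disabled.
        with torch.no_grad():
            outputs = model(**inputs)
        # The highest-scoring logit selects the predicted class index.
        preds = torch.argmax(outputs.logits, dim=-1)
        results.extend(label_map[p] for p in preds.tolist())
    return results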

# Example
texts = ["امروز هوا خیلی عالی", "امروز هوا خیلی عالی است."]
print(classify_sentences(texts))
# → ['Incomplete', 'Complete']
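
Because the model is intended for ASR post-processing, one possible way to use it is to join consecutive speech-to-text segments until the classifier reports a complete sentence. The sketch below is illustrative only and is not part of the model or its training code: the helper merge_until_complete, its max_segments cap, and the stub_classify stand-in are all hypothetical names introduced here. The stub only checks for a trailing period so that the example runs without downloading the model; in practice you would pass classify_sentences from the Usage code above.

```python
from typing import Callable, Iterable, List


def merge_until_complete(
    segments: Iterable[str],
    classify: Callable[[List[str]], List[str]],
    max_segments: int = 5,
) -> List[str]:
    """Join consecutive ASR segments until `classify` labels the text "Complete".

    Hypothetical helper: `classify` takes a list of strings and returns one
    label per string ("Complete" or "Incomplete"), like classify_sentences.
    """
    merged: List[str] = []
    buffer: List[str] = []
    for segment in segments:
        segment = segment.strip()
        if not segment:
            continue  # skip empty ASR chunks
        buffer.append(segment)
        candidate = " ".join(buffer)
        label = classify([candidate])[0]
        # Flush on a complete sentence, or when the buffer reaches the cap
        # (assumed safeguard so one sentence cannot grow without bound).
        if label == "Complete" or len(buffer) >= max_segments:
            merged.append(candidate)
            buffer = []
    if buffer:
        # Keep any trailing, still-incomplete text instead of dropping it.
        merged.append(" ".join(buffer))
    return merged


def stub_classify(sentences: List[str]) -> List[str]:
    """Stand-in for the real model: treats a trailing period as "Complete"."""
    return ["Complete" if s.endswith(".") else "Incomplete" for s in sentences]


chunks = ["امروز هوا", "خیلی عالی است.", "فردا"]
print(merge_until_complete(chunks, stub_classify))
```

With the real model, replace stub_classify with classify_sentences. Note that this approach calls the classifier once per incoming segment, which is simple but not batched.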

Citation

If you use this model, please cite the following works:

@misc{kalahroodi2026persianpunclargescaledatasetbertbased,
    title = {PersianPunc: A Large-Scale Dataset and BERT-Based Approach for Persian Punctuation Restoration},
    author = {Mohammad Javad Ranjbar Kalahroodi and Heshaam Faili and Azadeh Shakery},
    year = {2026},
    eprint = {2603.05314},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL},
    url = {https://arxiv.org/abs/2603.05314},
}

@misc{kalahroodi2025parsvoicelargescalemultispeakerpersian,
    title = {ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis},
    author = {Mohammad Javad Ranjbar Kalahroodi and Heshaam Faili and Azadeh Shakery},
    year = {2025},
    eprint = {2510.10774},
    archivePrefix = {arXiv},
    primaryClass = {cs.SD},
    url = {https://arxiv.org/abs/2510.10774},
}