Dataset: CAMeL-Lab/BAREC-Corpus-v1.0
This model performs fine-grained, sentence-level Arabic readability assessment and was developed for the BAREC Shared Task 2025 (Strict Track). It is based on AraBERTv2 and fine-tuned on the BAREC corpus to classify each sentence into one of 19 readability levels. The model uses the D3Tok input variant and is trained with a combination of Cross-Entropy (CE) loss and a Quadratic Weighted Kappa loss (WKL). Results below are reported as Quadratic Weighted Kappa (QWK).
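The card does not say how the CE and kappa terms are formulated or weighted, so the following is only a minimal sketch of one common way to combine them in PyTorch. The function name `ce_plus_weighted_kappa_loss`, the mixing weight `alpha`, and the soft confusion matrix formulation are assumptions for illustration, not the training code behind this model.

```python
import torch
import torch.nn.functional as F


def ce_plus_weighted_kappa_loss(logits, targets, num_levels=19, alpha=0.5, eps=1e-8):
    """Hypothetical CE + quadratic weighted kappa loss.

    logits:  (batch, num_levels) raw scores
    targets: (batch,) zero-based class ids (level - 1)
    alpha:   assumed mixing weight between the two terms
    """
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes=num_levels).to(probs.dtype)

    # Quadratic penalty: grows with the squared distance between levels.
    levels = torch.arange(num_levels, dtype=probs.dtype, device=logits.device)
    weights = (levels[:, None] - levels[None, :]) ** 2 / (num_levels - 1) ** 2

    # Soft confusion matrix (rows: gold level, columns: predicted level).
    observed = one_hot.T @ probs
    # Agreement expected by chance, from the two marginal histograms.
    expected = torch.outer(one_hot.sum(dim=0), probs.sum(dim=0)) / logits.shape[0]

    # This ratio equals 1 - (soft) QWK, so minimizing it pushes QWK up.
    kappa_term = (weights * observed).sum() / ((weights * expected).sum() + eps)

    ce_term = F.cross_entropy(logits, targets)
    return alpha * ce_term + (1.0 - alpha) * kappa_term
```

Unlike plain cross-entropy, the kappa term penalizes a prediction more the further it lands from the gold level, which suits an ordinal scale such as the 19 readability levels.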
| Split | QWK |
|---|---|
| Validation | 82.0% |
| Test (Public) | 84.2% |
| Blind Test* | 84.1% |
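For readers unfamiliar with the metric, here is a small dependency-free sketch of how Quadratic Weighted Kappa can be computed from gold and predicted levels. The function name and the convention of returning 1.0 when all mass falls in a single agreeing cell are choices made for this example; the official shared task scorer may differ in details.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_levels=19):
    """QWK between two equal-length lists of integer levels in 1..num_levels."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    # Observed confusion matrix (rows: gold level, columns: predicted level).
    observed = [[0] * num_levels for _ in range(num_levels)]
    for gold, pred in zip(y_true, y_pred):
        observed[gold - 1][pred - 1] += 1

    gold_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(num_levels))
                 for j in range(num_levels)]

    numerator = 0.0
    denominator = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            expected = gold_hist[i] * pred_hist[j] / n  # chance agreement
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0.0:
        # Every item has the same gold level and the same matching prediction.
        return 1.0
    return 1.0 - numerator / denominator
```

The table reports QWK as a percentage, so a return value of 0.82 from this function would correspond to 82.0%.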
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
model_name = "shymaa25/barec-readability-sent-arabertv2-d3tok-ce-wkl-strict"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode (disables dropout)

# Predict readability for a single sentence
sentence = "هذه الجملة تتطلب مستوى قراءة متقدم."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Class indices are zero-based; readability levels run from 1 to 19
pred = torch.argmax(outputs.logits, dim=1).item() + 1
print(f"Sentence readability level: {pred}")
```
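If you want more than the single best level, the logits can also be turned into a short ranked list of levels with probabilities. The helper below is a hypothetical convenience function, not part of the model or of `transformers`; it assumes, as in the snippet above, that class index `k` corresponds to readability level `k + 1`.

```python
import torch


def logits_to_levels(logits, top_k=3):
    """Return, per sentence, the top_k (level, probability) pairs.

    logits: (batch, num_levels) tensor, e.g. `outputs.logits` from the model.
    """
    probs = torch.softmax(logits, dim=-1)
    top_probs, top_idx = probs.topk(top_k, dim=-1)
    return [
        [(int(idx) + 1, float(p)) for idx, p in zip(idx_row, prob_row)]
        for idx_row, prob_row in zip(top_idx, top_probs)
    ]
```

Called as `logits_to_levels(outputs.logits)`, it returns one list per input sentence, which can help spot cases where the model is split between neighboring levels.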
Base model: aubmindlab/bert-base-arabertv2