---
license: mit
language:
- ar
base_model:
- CAMeL-Lab/bert-base-arabic-camelbert-msa-pos-msa
pipeline_tag: token-classification
---

# CAMeLBERT-MSA-POS-MSA-Lemma-Clustering
# Model Description

CAMeLBERT-MSA-POS-MSA-Lemma-Clustering is a Modern Standard Arabic (MSA) lemmatization model.
It is built by fine-tuning the [CAMeLBERT-MSA-POS-MSA](https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-msa-pos-msa) model on the [Penn Arabic Treebank (PATB)](https://dl.acm.org/doi/pdf/10.5555/1621804.1621808) training set. This model approaches lemmatization as a classification task, where each lemma is represented as a unique class within a clustered lemma vocabulary.
The fine-tuning procedure, hyperparameters, and detailed methodology are presented in our paper [“Lemmatization as a Classification Task: Results from Arabic across Multiple Genres”](https://aclanthology.org/2025.emnlp-main.1525/).
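
To make the classification framing concrete, here is a minimal sketch of how a per-token score vector over a lemma class vocabulary could be turned into a predicted lemma. The class vocabulary `ID2LEMMA`, the scores, and the helper names are illustrative assumptions, not taken from the paper or the released model.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical class vocabulary: class id -> lemma label.
# The real model's vocabulary is far larger and is defined by its training setup.
ID2LEMMA: Dict[int, str] = {0: "كِتَاب", 1: "كَاتِب", 2: "كَتَب"}


def softmax(scores: List[float]) -> List[float]:
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_lemma(scores: List[float], id2lemma: Dict[int, str]) -> Tuple[str, float]:
    """Return the highest-probability lemma class and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2lemma[best], probs[best]


# Made-up scores for a single token; class 1 has the highest score.
lemma, confidence = classify_lemma([0.2, 2.5, -1.0], ID2LEMMA)
print(lemma, round(confidence, 3))
```

The point of the sketch is only that the model chooses a lemma from a fixed set of classes rather than generating it character by character.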


# Intended Uses
This model is integrated into the lemmatization workflow available in our [GitHub repository](https://github.com/CAMeL-Lab/lemmatization-as-classification).
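
For quick inspection outside that workflow, the following is a rough sketch of loading the checkpoint with the Hugging Face `transformers` library. It assumes the checkpoint loads as a standard token-classification model (as the `pipeline_tag` suggests) and that `model.config.id2label` holds the lemma class names. Neither is confirmed here, and `model_id` is left as a parameter because this card does not state the repository id. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def first_subword_positions(word_ids: Sequence[Optional[int]]) -> List[int]:
    """Return the position of the first sub-word of each word.

    Special tokens carry a word id of None and are skipped.
    """
    positions, seen = [], set()
    for position, word_id in enumerate(word_ids):
        if word_id is not None and word_id not in seen:
            seen.add(word_id)
            positions.append(position)
    return positions


def predict_lemma_classes(model_id: str, words: List[str]) -> List[str]:
    """Predict one class label per input word (assumed to be a lemma class)."""
    # Imported lazily so the alignment helper above works without these packages.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(model_id)
    model.eval()

    # Pre-split words are tokenized into sub-words; word_ids() maps them back.
    encoded = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoded).logits[0]  # shape: (num_subwords, num_classes)

    class_ids = logits.argmax(dim=-1).tolist()
    keep = first_subword_positions(encoded.word_ids(0))
    return [model.config.id2label[class_ids[i]] for i in keep]
```

Calling `predict_lemma_classes(model_id, ["ذهب", "الولد"])` with the correct repository id would return one label per word. Any post-processing applied by the repository's workflow is not reproduced here, so the repository remains the reference way to run the model.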