🥤 SODA-BERT

SODA-BERT is an Arabic language model fine-tuned from UBC-NLP/MARBERTv2 on OmanSent, the first dataset produced with the SODA data collection framework. It targets sentiment analysis and text classification in Arabic, with particular emphasis on Omani and Gulf dialects.

📊 Model Details

Base model: UBC-NLP/MARBERTv2
Fine-tuning dataset: OmanSent (Omani dialect sentiment dataset collected using the SODA framework; not yet publicly released)
Languages: Arabic (Modern Standard Arabic + Gulf/Omani dialects)
Task: Sentiment Analysis / Text Classification

🛠️ How to Use

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("mktr/SODA-BERT")
model = AutoModelForSequenceClassification.from_pretrained("mktr/SODA-BERT")

# Example input sentence
text = "الي يقول العماني ما مال شغل تفل في وجهه"
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Gradients are not needed for inference
with torch.no_grad():
    outputs = model(**inputs)
predictions = outputs.logits.argmax(dim=-1)

# Map prediction to sentiment label
label_map = {0: "Negative", 1: "Positive", 2: "Neutral"}
predicted_label = label_map[predictions.item()]

print(f"Predicted Sentiment: {predicted_label}")
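
The snippet above reports only the top label. If a confidence score is also useful, the raw logits can be passed through a softmax. The following is a minimal sketch using only the standard library. The helper names (softmax, logits_to_sentiment) are hypothetical, the label order is copied from the label_map above, and the input logits are made-up values rather than real model output.

```python
import math

# Label order copied from the usage snippet; assumed to match the model's output indices.
LABEL_MAP = {0: "Negative", 1: "Positive", 2: "Neutral"}


def softmax(logits):
    """Convert raw logits to probabilities, shifting by the max for numerical stability."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_sentiment(logits, label_map=LABEL_MAP):
    """Return (label, confidence, per-label probabilities) for a single example."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    distribution = {label_map[i]: p for i, p in enumerate(probs)}
    return label_map[best], probs[best], distribution


# Made-up logits for illustration only, not real model output.
label, confidence, distribution = logits_to_sentiment([2.0, 0.5, -1.0])
print(f"{label} ({confidence:.2f})")
```

With the usage snippet, outputs.logits[0].tolist() would supply the input list for a single sentence.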

Model size: 0.2B params
Tensor type: F32 (Safetensors)
