---
license: mit
datasets:
- Arseniy-Sandalov/Georgian-Sentiment-Analysis
language:
- ka
metrics:
- f1
- roc_auc
- accuracy
base_model:
- google-bert/bert-base-multilingual-cased
pipeline_tag: text-classification
tags:
- Sentiment
---

# Sentiment Analysis with Fine-tuned Multilingual BERT for Georgian 🇬🇪

## 📄 Model Overview
This is a **BERT model fine-tuned for Georgian sentiment analysis**. It starts from **`bert-base-multilingual-cased`** and was trained on the **Georgian Sentiment Analysis dataset**.

- **Base Model:** `bert-base-multilingual-cased`
- **Fine-tuned on:** `Arseniy-Sandalov/Georgian-Sentiment-Analysis`
- **Task:** Sentiment classification (positive, negative, neutral)
- **Tokenizer:** BERT multilingual cased tokenizer
- **License:** MIT (per the card metadata); for the dataset's terms, [check the dataset source](http://data.europa.eu/89h/9f04066a-8cc0-4669-99b4-f1f0627fdbbf)

## 👉 Usage Example
You can load and use this model with Hugging Face Transformers:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "Arseniy-Sandalov/GeorgianBert-Sent"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

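def predict_sentiment(text):
    """Return the predicted sentiment label for a single Georgian text."""
    # Tokenize, truncating anything beyond BERT's 512-token limit.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    # Inference only, so skip gradient tracking.
    with torch.no_grad():
        outputs = model(**inputs)
    # Pick the highest-scoring class and map its index to a label name.
    prediction = torch.argmax(outputs.logits, dim=1).item()
    return ["negative", "neutral", "positive"][prediction]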

text = "ახალი მეარი კარგია ერთილა"
print(predict_sentiment(text))
```
## 📊 Training Details

**Dataset Preprocessing:**

- Removed irrelevant columns (e.g., perturbation)

- Stratified split: 80% train, 10% validation, 10% test
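
The card does not include the splitting code, so the following is only a minimal sketch of what an 80/10/10 stratified split means: each sentiment class is shuffled and divided separately, so all three subsets keep roughly the same class balance. The function name `stratified_split`, the seed, and the toy labels are hypothetical and not taken from the original training script.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=42):
    """Return (train, val, test) index lists that preserve per-class proportions."""
    rng = random.Random(seed)

    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Split each class on its own; the remainder goes to the test set.
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test

# Toy labels standing in for the dataset's sentiment column (hypothetical data).
labels = ["negative"] * 50 + ["neutral"] * 30 + ["positive"] * 20
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 80 10 10
```

In practice, the same effect is commonly obtained with scikit-learn's `train_test_split` and its `stratify` argument, applied twice (first to carve out the training set, then to halve the remainder).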

**Evaluation Metric:**

- ROC AUC Score (computed on validation & test sets)
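
The card does not say how ROC AUC was extended to three classes. One common choice, assumed here, is the macro-averaged one-vs-rest score: compute a binary AUC for each class against the other two, then average. The sketch below works on plain Python lists; `probs` stands for per-class probabilities (for example, a softmax over the model's logits), and the toy values are hypothetical.

```python
def binary_auc(is_positive, scores):
    """AUC as the chance that a random positive outranks a random negative."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Count pairwise wins; ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_ovr_auc(y_true, probs):
    """Average the one-vs-rest AUC over all classes (assumed averaging scheme)."""
    n_classes = len(probs[0])
    aucs = [
        binary_auc([y == c for y in y_true], [row[c] for row in probs])
        for c in range(n_classes)
    ]
    return sum(aucs) / n_classes

# Hypothetical predictions: columns are negative, neutral, positive.
y_true = [0, 0, 1, 1, 2, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.5, 0.3, 0.2],  # a neutral example the model leans negative on
    [0.1, 0.2, 0.7],
    [0.3, 0.3, 0.4],
]
print(f"{macro_ovr_auc(y_true, probs):.4f}")  # 0.9375
```

The pairwise loop is quadratic and only meant to make the definition visible. For real evaluation sets, scikit-learn's `roc_auc_score` with `multi_class="ovr"` computes the same quantity efficiently.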

## 📖 Citation

If you use this model, please cite the original dataset:
```bibtex
@misc{Stefanovitch2023Sentiment,
  author = {Stefanovitch, Nicolas and Piskorski, Jakub and Kharazi, Sopho},
  title = {Sentiment analysis for Georgian},
  year = {2023},
  publisher = {European Commission, Joint Research Centre (JRC)},
  howpublished = {\url{http://data.europa.eu/89h/9f04066a-8cc0-4669-99b4-f1f0627fdbbf}},
  url = {http://data.europa.eu/89h/9f04066a-8cc0-4669-99b4-f1f0627fdbbf},
  type = {dataset},
  note = {PID: http://data.europa.eu/89h/9f04066a-8cc0-4669-99b4-f1f0627fdbbf}
}
```