Sentiment Analysis with Fine-tuned Multilingual BERT for Georgian 🇬🇪

📄 Model Overview

This model is bert-base-multilingual-cased fine-tuned for sentiment analysis of Georgian text. It was trained on the Georgian Sentiment Analysis dataset and classifies each input as positive, negative, or neutral.

Base Model: bert-base-multilingual-cased
Fine-tuned on: Arseniy-Sandalov/Georgian-Sentiment-Analysis
Task: Sentiment classification (positive, negative, neutral)
Tokenizer: BERT multilingual cased tokenizer
License: Not stated here; check the license of the dataset source

👉 Usage Example

You can load and use this model with Hugging Face Transformers:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "Arseniy-Sandalov/GeorgianBert-Sent"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

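def predict_sentiment(text):
    """Return the predicted sentiment label for a single text."""
    # Tokenize and truncate to BERT's 512-token input limit
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():  # inference only, so gradients are not needed
        outputs = model(**inputs)
    # Pick the class index with the highest logit
    prediction = torch.argmax(outputs.logits, dim=1).item()
    # Class order used by this card: 0 = negative, 1 = neutral, 2 = positive
    return ["negative", "neutral", "positive"][prediction]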

text = "ახალი მეარი კარგია ერთილა"
print(predict_sentiment(text))
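
The function above returns only the top label. If a confidence score is also useful, the logits can be passed through a softmax. The following is a minimal sketch in plain Python, not part of the original card: it works on a list of logits (for example, what `outputs.logits[0].tolist()` would give), the helper names `softmax` and `label_with_confidence` are hypothetical, and the example logits are made up for illustration.

```python
import math

# Label order assumed to match the list used in predict_sentiment.
LABELS = ["negative", "neutral", "positive"]

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_with_confidence(logits, labels=LABELS):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical logits, standing in for outputs.logits[0].tolist()
example_logits = [-1.2, 0.3, 2.1]
label, confidence = label_with_confidence(example_logits)
print(label, round(confidence, 3))
```

A low top probability can be treated as a signal to fall back to "neutral" or to flag the text for review, depending on the application.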

📊 Training Details

Dataset Preprocessing:

Removed irrelevant columns (e.g., perturbation)
Stratified split: 80% train, 10% validation, 10% test
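
The preprocessing steps above can be sketched as follows. This is an illustrative reconstruction in plain Python, not the actual training script: the toy rows, the `label` column name, the seed, and the helper `stratified_split` are all assumptions. Only the `perturbation` column and the 80/10/10 stratified split come from the description above.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=42):
    """Return (train, val, test) index lists, splitting class by class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # the remainder goes to test
    return train, val, test

# Hypothetical toy data with a label and an unused "perturbation" column.
rows = [{"text": f"t{i}", "label": i % 3, "perturbation": None} for i in range(300)]

# Drop the irrelevant column before splitting.
rows = [{k: v for k, v in row.items() if k != "perturbation"} for row in rows]

train_idx, val_idx, test_idx = stratified_split([row["label"] for row in rows])
print(len(train_idx), len(val_idx), len(test_idx))  # prints: 240 30 30
```

Splitting within each class keeps the positive, negative, and neutral proportions roughly equal across the three subsets, which matters when the classes are imbalanced.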

Evaluation Metric:

ROC AUC Score (computed on validation & test sets)
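
For a three-class task, ROC AUC has to be extended beyond the binary case. The card does not say which variant was used, so the sketch below assumes a macro-averaged one-vs-rest AUC. The function names `binary_auc` and `macro_ovr_auc`, the labels, and the probabilities are hypothetical and only illustrate the computation.

```python
def binary_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_ovr_auc(y_true, probs):
    """Average the one-vs-rest AUC over all classes (unweighted)."""
    n_classes = len(probs[0])
    aucs = []
    for c in range(n_classes):
        binary_labels = [1 if y == c else 0 for y in y_true]
        class_scores = [p[c] for p in probs]
        aucs.append(binary_auc(binary_labels, class_scores))
    return sum(aucs) / n_classes

# Hypothetical labels (0 = negative, 1 = neutral, 2 = positive) and class probabilities.
y_true = [0, 1, 2, 2, 0, 1]
probs = [
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
    [0.3, 0.4, 0.3],
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],
]
print(round(macro_ovr_auc(y_true, probs), 3))
```

The pairwise loop is quadratic and meant only to make the definition explicit. In practice, scikit-learn's `roc_auc_score` with `multi_class="ovr"` computes the same kind of score on softmax probabilities.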

📖 Citation

If you use this model, please cite the original dataset:

@misc{Stefanovitch2023Sentiment,
  author = {Stefanovitch, Nicolas and Piskorski, Jakub and Kharazi, Sopho},
  title = {Sentiment analysis for Georgian},
  year = {2023},
  publisher = {European Commission, Joint Research Centre (JRC)},
  howpublished = {\url{http://data.europa.eu/89h/9f04066a-8cc0-4669-99b4-f1f0627fdbbf}},
  url = {http://data.europa.eu/89h/9f04066a-8cc0-4669-99b4-f1f0627fdbbf},
  type = {dataset},
  note = {PID: http://data.europa.eu/89h/9f04066a-8cc0-4669-99b4-f1f0627fdbbf}
}
