Arseniy-Sandalov committed
Commit f197e0a · verified
1 Parent(s): 52a17c1

Update README.md

Files changed (1)
  1. README.md +61 -1
@@ -13,4 +13,64 @@ base_model:
  pipeline_tag: text-classification
  tags:
  - Sentiment
- ---
+ ---
+
+ # Sentiment Analysis with Fine-tuned Multilingual BERT for Georgian 🇬🇪
+
+ ## 📄 Model Overview
+ This is a **fine-tuned BERT model** for **Georgian sentiment analysis**, based on **`bert-base-multilingual-cased`**. The model was trained using the **Georgian Sentiment Analysis dataset**.
+
+ - **Base Model:** `bert-base-multilingual-cased`
+ - **Fine-tuned on:** `Arseniy-Sandalov/Georgian-Sentiment-Analysis`
+ - **Task:** Sentiment classification (positive, negative, neutral)
+ - **Tokenizer:** BERT multilingual cased tokenizer
+ - **License:** [Check dataset source](http://data.europa.eu/89h/9f04066a-8cc0-4669-99b4-f1f0627fdbbf)
+
+ ## 👉 Usage Example
+ You can load and use this model with Hugging Face Transformers:
+
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+ import torch
+
+ # Placeholder: replace with this model's repository ID on the Hugging Face Hub.
+ model_name = "your_huggingface_model_name"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ def predict_sentiment(text):
+     # Tokenize, truncating to BERT's 512-token limit.
+     inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
+     with torch.no_grad():  # inference only, no gradients needed
+         outputs = model(**inputs)
+     # Index of the highest-scoring class.
+     prediction = torch.argmax(outputs.logits, dim=1).item()
+     return ["negative", "neutral", "positive"][prediction]
+
+ text = "แƒแƒฎแƒแƒšแƒ˜ แƒ›แƒ”แƒแƒ แƒ˜ แƒ™แƒแƒ แƒ’แƒ˜แƒ แƒ”แƒ แƒ—แƒ˜แƒšแƒ"
+ print(predict_sentiment(text))
+ ```
+ ## 📊 Training Details
+
+ **Dataset Preprocessing:**
+
+ - Removed irrelevant columns (e.g., perturbation)
+
+ - Stratified split: 80% train, 10% validation, 10% test
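
The README does not say which tool produced the stratified 80/10/10 split (scikit-learn's `train_test_split` with `stratify=` is a common choice). As a dependency-free illustration of the idea, the sketch below splits indices per class so each subset keeps roughly the same label proportions. The function name `stratified_split`, the seed, and the toy labels are assumptions made for this example, not part of the original training code.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=42):
    """Return (train, val, test) index lists that preserve per-class proportions."""
    rng = random.Random(seed)

    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy labels standing in for the dataset's sentiment column (hypothetical data).
labels = ["negative"] * 50 + ["neutral"] * 30 + ["positive"] * 20
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting each class separately is what makes the split stratified: a rare class cannot end up entirely in one subset by chance.
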
+
+ **Evaluation Metric:**
+
+ - ROC AUC Score (computed on validation & test sets)
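
ROC AUC is defined for binary problems, and the README does not state how it was aggregated over the three sentiment classes. One common convention, assumed here, is macro-averaged one-vs-rest: compute a binary AUC for each class against the other two, then average. The sketch below does this in plain Python on hypothetical logits; in practice scikit-learn's `roc_auc_score` with `multi_class="ovr"` is the usual route. All function names and data below are illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert one row of logits into class probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_auc(scores, is_positive):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5).

    Assumes at least one positive and one negative example.
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(probs, y_true, n_classes=3):
    """Average the one-vs-rest binary AUC over all classes."""
    aucs = []
    for c in range(n_classes):
        class_scores = [p[c] for p in probs]
        aucs.append(binary_auc(class_scores, [y == c for y in y_true]))
    return sum(aucs) / n_classes


# Hypothetical logits for four texts; class order: negative, neutral, positive.
logits = [[2.0, 0.1, -1.0], [0.2, 1.5, 0.3], [-1.0, 0.0, 2.5], [0.0, 2.0, 0.0]]
y_true = [0, 1, 2, 0]  # the last example is deliberately misclassified
probs = [softmax(row) for row in logits]
auc = macro_ovr_auc(probs, y_true)
print(f"macro one-vs-rest ROC AUC: {auc:.3f}")
```

Unlike accuracy, this metric uses the full probability output, so it rewards a model for ranking the correct class highly even when the top prediction is wrong.
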
+
+ ## 📖 Citation
+
+ If you use this model, please cite the original dataset:
+ ```
+ @misc{Stefanovitch2023Sentiment,
+   author = {Stefanovitch, Nicolas and Piskorski, Jakub and Kharazi, Sopho},
+   title = {Sentiment analysis for Georgian},
+   year = {2023},
+   publisher = {European Commission, Joint Research Centre (JRC)},
+   howpublished = {\url{http://data.europa.eu/89h/9f04066a-8cc0-4669-99b4-f1f0627fdbbf}},
+   url = {http://data.europa.eu/89h/9f04066a-8cc0-4669-99b4-f1f0627fdbbf},
+   type = {dataset},
+   note = {PID: http://data.europa.eu/89h/9f04066a-8cc0-4669-99b4-f1f0627fdbbf}
+ }
+ ```