Multilingual Emotional Classifier XLM-RoBERTa-Large

This model is a fine-tuned FacebookAI/xlm-roberta-large sequence classifier for multilingual emotion recognition in English and Spanish dialogue utterances.

Source code: Mario-RC/multilingual-emotion-classifier

Model Details

Emotion Labels

The model predicts one of seven normalized emotion labels:

anger, disgust, fear, happiness, neutral, sadness, surprise

Training Data

The training pipeline combines DailyDialog and EmpatheticDialogues-derived CSV resources into a multilingual English/Spanish dataset. EmpatheticDialogues labels were mapped into the seven normalized categories above, and ambiguous or underrepresented labels were removed. The training split was then resampled: the majority neutral class was downsampled and the minority classes were upsampled.
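The label mapping and resampling described above can be sketched roughly as follows. This is a minimal illustration, not the author's pipeline: the `ED_TO_NORMALIZED` mapping, the `normalize` and `rebalance` helpers, and the per-class target are all hypothetical, since the actual mapping and sampling ratios are not given here.

```python
import random
from collections import Counter

# Hypothetical mapping from a few EmpatheticDialogues labels to the seven
# normalized categories. The mapping actually used in training is not stated.
ED_TO_NORMALIZED = {
    "angry": "anger",
    "furious": "anger",
    "disgusted": "disgust",
    "afraid": "fear",
    "terrified": "fear",
    "joyful": "happiness",
    "sad": "sadness",
    "surprised": "surprise",
}


def normalize(rows):
    """Map source labels to normalized ones and drop rows with unmapped labels."""
    return [
        (text, ED_TO_NORMALIZED[label])
        for text, label in rows
        if label in ED_TO_NORMALIZED
    ]


def rebalance(rows, target_per_class, seed=42):
    """Downsample classes above the target and upsample (with replacement) below it."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in rows:
        by_label.setdefault(label, []).append((text, label))
    balanced = []
    for items in by_label.values():
        if len(items) >= target_per_class:
            balanced.extend(rng.sample(items, target_per_class))
        else:
            extra = rng.choices(items, k=target_per_class - len(items))
            balanced.extend(items + extra)
    rng.shuffle(balanced)
    return balanced


# Toy data: ten neutral utterances plus three EmpatheticDialogues-style rows.
dd_rows = [("utterance %d" % i, "neutral") for i in range(10)]
ed_rows = [
    ("I was so mad", "angry"),
    ("That scared me", "afraid"),
    ("Those were the days", "nostalgic"),  # unmapped, so it is dropped
]
rows = dd_rows + normalize(ed_rows)
balanced = rebalance(rows, target_per_class=4)
print(Counter(label for _, label in balanced))
```

Upsampling with replacement duplicates minority examples, so it should only be applied to the training split, as the card describes.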

Training Setup

  • Framework: Hugging Face Transformers
  • Base checkpoint: FacebookAI/xlm-roberta-large
  • Task: sequence classification
  • Max sequence length: 128
  • Epochs: 3
  • Learning rate: 5e-6
  • Batch size: 32
  • Dropout: 0.2
  • Seed: 42
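As a rough sketch, the hyperparameters above could be collected as keyword arguments for `transformers.TrainingArguments`. Several points are assumptions: that the batch size of 32 is per device on a single device, that dropout is set on the model config rather than in the training arguments, and that there is no gradient accumulation. The names `HPARAMS`, `total_optimizer_steps`, and `build_training_args` are illustrative only.

```python
import math

# Values copied from the list above; key names follow TrainingArguments.
HPARAMS = {
    "num_train_epochs": 3,
    "learning_rate": 5e-6,
    "per_device_train_batch_size": 32,  # assumption: 32 is the per-device size
    "seed": 42,
}
MAX_LENGTH = 128  # passed to the tokenizer, not to TrainingArguments
DROPOUT = 0.2     # assumption: applied through the model config


def total_optimizer_steps(num_examples):
    """Optimizer steps for one device with no gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / HPARAMS["per_device_train_batch_size"])
    return steps_per_epoch * HPARAMS["num_train_epochs"]


def build_training_args(output_dir="out"):
    """Build TrainingArguments lazily so the sketch imports without transformers."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HPARAMS)


print(total_optimizer_steps(1000))
```

With a hypothetical 1,000 training examples, this gives 32 steps per epoch and 96 steps in total.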

Evaluation

The confusion matrix shows true vs. predicted emotion labels on the multilingual test split. Diagonal cells indicate correct classifications, while off-diagonal cells show the main confusions between emotion classes.
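For readers who want to reproduce this kind of matrix from their own predictions, a minimal pure-Python sketch follows. The label order is simply the order listed on this card (the model's internal id order is not stated), and the `confusion_matrix` and `top_confusions` helpers and the toy labels are illustrative.

```python
LABELS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, pred in zip(y_true, y_pred):
        matrix[index[true]][index[pred]] += 1
    return matrix


def top_confusions(matrix, labels=LABELS, k=3):
    """Return the k largest off-diagonal cells as (count, true, predicted)."""
    pairs = [
        (matrix[i][j], labels[i], labels[j])
        for i in range(len(labels))
        for j in range(len(labels))
        if i != j and matrix[i][j] > 0
    ]
    return sorted(pairs, reverse=True)[:k]


# Toy predictions, not results from this model.
y_true = ["anger", "anger", "fear", "neutral", "neutral"]
y_pred = ["anger", "disgust", "fear", "sadness", "sadness"]
matrix = confusion_matrix(y_true, y_pred)
print(top_confusions(matrix))
```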

Confusion matrix

Model Comparison

Usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model_id = "mario-rc/multilingual-emotional-classifier-xlm-roberta-large"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

print(classifier("I feel great today."))
print(classifier("Estoy preocupado por mañana."))

Limitations

The model is designed for short dialogue utterances and seven broad emotion categories. Predictions may be less reliable for long documents, sarcasm, mixed emotions, domain-specific language, or languages beyond English and Spanish.
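One way to act on these limitations is to flag inputs that look unlike short utterances, or predictions with a low top score. The sketch below assumes the classifier returns a list like `[{"label": ..., "score": ...}]` for a single string, as the text-classification pipeline does by default. The `classify_with_caveats` helper, the 0.5 score threshold, and the 60-word limit are arbitrary illustrative choices, and `stub_classifier` stands in for the real pipeline so the example runs offline.

```python
def classify_with_caveats(classifier, text, min_score=0.5, max_words=60):
    """Return the top prediction plus warnings about likely unreliable cases."""
    top = classifier(text)[0]
    warnings = []
    if len(text.split()) > max_words:
        # Word count is a crude proxy for the 128-token training length.
        warnings.append("input longer than a short utterance")
    if top["score"] < min_score:
        warnings.append("low confidence")
    return {"label": top["label"], "score": top["score"], "warnings": warnings}


def stub_classifier(text):
    """Stand-in for the real pipeline; always returns the same prediction."""
    return [{"label": "neutral", "score": 0.41}]


result = classify_with_caveats(stub_classifier, "Well, that went great.")
print(result)
```

A low score does not detect sarcasm or mixed emotions by itself; it only marks predictions that merit a second look.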

Citation

This work is detailed in Section 4.4.3, User Emotion Recognition, of:

Personal Assistant with Emotional and Multilingual Capabilities for Social Robots
M. Rodriguez-Cantelar, PhD Dissertation, Universidad Politécnica de Madrid, 2025.

Evaluation results

  • Test Accuracy on Multilingual DailyDialog and EmpatheticDialogues-derived corpus: 0.664 (self-reported)
  • Test Macro F1 on Multilingual DailyDialog and EmpatheticDialogues-derived corpus: 0.666 (self-reported)
  • Test Macro Precision on Multilingual DailyDialog and EmpatheticDialogues-derived corpus: 0.675 (self-reported)
  • Test Macro Recall on Multilingual DailyDialog and EmpatheticDialogues-derived corpus: 0.664 (self-reported)
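
For reference, accuracy and the macro-averaged metrics of the kind reported above can be derived from a confusion matrix as sketched below. This assumes macro F1 is the unweighted mean of per-class F1 scores, which is a common convention but is not confirmed as the one used for this card. The `macro_metrics` helper and the 3-class matrix are toy examples, not the model's results.

```python
def macro_metrics(matrix):
    """Compute accuracy and macro precision/recall/F1 from a square matrix.

    Rows are true labels and columns are predicted labels.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = matrix[i][i]
        predicted = sum(matrix[r][i] for r in range(n))  # column sum
        actual = sum(matrix[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total if total else 0.0,
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Toy 3-class confusion matrix.
toy = [
    [2, 1, 0],
    [0, 3, 0],
    [1, 0, 1],
]
metrics = macro_metrics(toy)
print({name: round(value, 3) for name, value in metrics.items()})
```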