---
license: mit
language: en
library_name: scikit-learn
tags:
- emotion-classification
- text-classification
- affective-computing
- scikit-learn
- xgboost
- lightgbm
model_description:
  title: Pre-Trained Emotion Classifiers for AffectiveLens
  text: >
    This repository contains a collection of 6 machine learning models trained
    for the AffectiveLens Project. Each model is a classifier designed to
    predict the emotional valence (Positive, Negative, or Neutral) of a given
    text.

    These models were trained on 768-dimensional sentence embeddings generated
    by the distilbert-base-uncased model. The source text data comes from the
    GoEmotions dataset, which was processed and is available at
    psyrishi/MoodPulse.

    This repository allows you to easily download and use the final, trained
    artifacts of the study without needing to retrain them.
models:
- name: LightGBM_MicroF1_0.6240.pkl
  description: Champion Model
- name: XGBoost_MicroF1_0.6205.pkl
- name: Random_Forest_MicroF1_0.6192.pkl
- name: CatBoost_MicroF1_0.6175.pkl
- name: Logistic_Regression_MicroF1_0.6026.pkl
- name: Linear_SVM_MicroF1_0.6005.pkl
training_data:
  source: GoEmotions dataset
  repository: psyrishi/MoodPulse
evaluation:
  champion_model: LightGBM
  f1_score_micro: 0.6223
  accuracy: 62.23%
citation:
  paper:
    authors:
    - Demszky, Dorottya
    - Movshovitz-Attias, Dana
    - Ko, Jeongwoo
    - Cowen, Alan
    - Nemade, Gaurav
    - Ravi, Sujith
    title: "GoEmotions: A Dataset of Fine-Grained Emotions"
    conference: 58th Annual Meeting of the Association for Computational Linguistics (ACL)
    year: 2020
datasets:
- psyrishi/MoodPulse
---


# Pre-Trained Emotion Classifiers for AffectiveLens

## Model Description

This repository contains six machine learning models trained for the AffectiveLens Project. Each model is a classifier that predicts the emotional valence (Positive, Negative, or Neutral) of a given text.

These models were trained on 768-dimensional sentence embeddings generated by the `distilbert-base-uncased` model. The source text data comes from the GoEmotions dataset, which was processed and is available at [psyrishi/MoodPulse](https://huggingface.co/psyrishi/MoodPulse).

This repository lets you download and use the final trained artifacts of the study without retraining them.

## Models in this Repository

The following scikit-learn compatible models are available, saved as `.pkl` files using `joblib`. Each filename includes the model's micro-averaged F1-score on the validation set, which was used for initial model selection.

- **LightGBM_MicroF1_0.6240.pkl** (Champion Model)
- **XGBoost_MicroF1_0.6205.pkl**
- **Random_Forest_MicroF1_0.6192.pkl**
- **CatBoost_MicroF1_0.6175.pkl**
- **Logistic_Regression_MicroF1_0.6026.pkl**
- **Linear_SVM_MicroF1_0.6005.pkl**
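
Because the validation score is embedded in each filename, it can be recovered programmatically, for example to pick which file to download. The following is a minimal sketch, assuming the `<Name>_MicroF1_<score>.pkl` pattern seen in the list above; `parse_model_filename` is a hypothetical helper and not part of this repository.

```python
import re
from typing import Tuple

# Filenames exactly as listed in this repository.
MODEL_FILES = [
    "LightGBM_MicroF1_0.6240.pkl",
    "XGBoost_MicroF1_0.6205.pkl",
    "Random_Forest_MicroF1_0.6192.pkl",
    "CatBoost_MicroF1_0.6175.pkl",
    "Logistic_Regression_MicroF1_0.6026.pkl",
    "Linear_SVM_MicroF1_0.6005.pkl",
]

# Assumed naming pattern: "<Name>_MicroF1_<score>.pkl".
FILENAME_PATTERN = re.compile(r"^(?P<name>.+)_MicroF1_(?P<score>\d+\.\d+)\.pkl$")


def parse_model_filename(filename: str) -> Tuple[str, float]:
    """Split a model filename into (readable model name, validation micro-F1)."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected filename format: {filename!r}")
    return match.group("name").replace("_", " "), float(match.group("score"))


# Rank the models from highest to lowest validation score.
ranked = sorted(
    (parse_model_filename(f) for f in MODEL_FILES),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name:<20} {score:.4f}")
```

Note that these are validation scores; the test-set result for the champion model is reported separately under Evaluation Results.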

## How to Use

You can download any model from this repository with the `huggingface_hub` library. Because the classifiers were trained on `distilbert-base-uncased` embeddings, you also need that same Transformer model to embed your input text.

Here is an example of how to download and use the champion model (LightGBM):

```python
import joblib
import torch
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download

# --- 1. Load the Embedding Model and Tokenizer ---
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
embedding_model = AutoModel.from_pretrained("distilbert-base-uncased").to(device)
embedding_model.eval()

# --- 2. Download the Champion Classifier from the Hub ---
REPO_ID = "psyrishi/affectivelens-emotion-models"
FILENAME = "LightGBM_MicroF1_0.6240.pkl"

print(f"Downloading model '{FILENAME}' from '{REPO_ID}'...")
model_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# --- 3. Load the Classifier ---
classifier = joblib.load(model_path)
print("Successfully loaded the champion classifier.")

# --- 4. Create a Prediction Function ---
def predict_emotion(text: str):
    # Tokenize the input text
    inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Get the embedding from the Transformer model
    with torch.no_grad():
        outputs = embedding_model(**inputs)
        embedding = outputs.last_hidden_state[:, 0, :].cpu().numpy()

    # Use the classifier to predict
    prediction_index = classifier.predict(embedding)[0]
    emotion_labels = ['negative', 'neutral', 'positive']
    
    return emotion_labels[prediction_index]

# --- 5. Make a Prediction ---
my_text = "This was an amazing experience, I am so happy!"
predicted_emotion = predict_emotion(my_text)
print(f"\nText: '{my_text}'")
print(f"--> Predicted Emotion: {predicted_emotion}")
```
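
If a confidence value is useful alongside the label, a helper along the following lines may work. It is only a sketch: it assumes the loaded classifier exposes scikit-learn's `predict_proba` (some estimators, such as `LinearSVC`, do not) and that the probability columns follow the same label order as the example above. `predict_with_confidence` and `_StubClassifier` are hypothetical names; the stub stands in for a downloaded model so the snippet runs on its own.

```python
from typing import Optional, Tuple

# Label order copied from the usage example above.
EMOTION_LABELS = ["negative", "neutral", "positive"]


def predict_with_confidence(classifier, embedding) -> Tuple[str, Optional[float]]:
    """Return (label, probability); probability is None without predict_proba.

    `embedding` is expected to hold a single row, e.g. shape (1, 768).
    Assumes probability columns are ordered like EMOTION_LABELS.
    """
    if hasattr(classifier, "predict_proba"):
        probabilities = list(classifier.predict_proba(embedding)[0])
        best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
        return EMOTION_LABELS[best_index], float(probabilities[best_index])
    # Fallback for estimators that only provide hard predictions.
    best_index = int(classifier.predict(embedding)[0])
    return EMOTION_LABELS[best_index], None


class _StubClassifier:
    """Stand-in for a loaded model so the sketch runs without downloads."""

    def predict_proba(self, embedding):
        return [[0.1, 0.2, 0.7] for _ in embedding]


label, confidence = predict_with_confidence(_StubClassifier(), [[0.0] * 768])
print(label, confidence)
```

In real use, pass the `classifier` loaded with `joblib` and the embedding array computed inside `predict_emotion` instead of the stub and the zero vector.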

## Training Procedure

### Training Data

The models were trained on a processed version of the GoEmotions dataset. The full data pipeline, including the raw data, tokenized data, and final embeddings, is available at the [psyrishi/MoodPulse repository](https://huggingface.co/psyrishi/MoodPulse). The training set was balanced with random oversampling.
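
Random oversampling duplicates randomly chosen rows from the smaller classes until every class is as large as the biggest one. The sketch below illustrates the idea in plain Python on toy data. It is not the project's pipeline code: `random_oversample`, the seed, and the sample values are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(features, labels, seed=0):
    """Duplicate random minority-class rows until all classes match the largest."""
    rng = random.Random(seed)

    # Group rows by their class label.
    rows_by_class = defaultdict(list)
    for row, label in zip(features, labels):
        rows_by_class[label].append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(rows) for rows in rows_by_class.values())

    out_features, out_labels = [], []
    for label, rows in rows_by_class.items():
        # Sample with replacement to fill the gap; originals are always kept.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            out_features.append(row)
            out_labels.append(label)
    return out_features, out_labels


# Toy data: class 0 has three rows, class 1 has two, class 2 has one.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
y = [0, 0, 0, 1, 1, 2]

X_balanced, y_balanced = random_oversample(X, y)
print(Counter(y_balanced))
```

Oversampling is applied to the training split only, so duplicated rows cannot leak into the validation or test data.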

### Training Workflow

The complete end-to-end workflow, from data preparation to model training and evaluation, is documented in the main project repository: [AffectiveLens on GitHub](https://github.com/psywarrior1998/AffectiveLens).

### Evaluation Results

The models were evaluated on a held-out, unseen test set. The **LightGBM** model emerged as the champion performer.

#### Champion Model (LightGBM) Performance on Test Set:

* **F1-Score (Micro)**: 0.6223
* **Accuracy**: 62.23%
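
The two figures coincide because, in single-label multiclass classification, micro-averaged F1 equals accuracy: each misclassified example counts as one false positive (for the predicted class) and one false negative (for the true class). The following check uses made-up labels to illustrate this; the data and the `micro_f1` helper are illustrative and not taken from the study.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1."""
    classes = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for c in classes:
        for t, p in zip(y_true, y_pred):
            tp += (t == c and p == c)
            fp += (t != c and p == c)
            fn += (t == c and p != c)
    if tp == 0:
        return 0.0  # no correct predictions at all
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels for three classes (0 = negative, 1 = neutral, 2 = positive).
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 1, 1, 0, 2, 0]

print(f"micro-F1: {micro_f1(y_true, y_pred):.4f}")
print(f"accuracy: {accuracy(y_true, y_pred):.4f}")
```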

## Citation

If you use these models, please cite the original GoEmotions paper:

```
@inproceedings{demszky2020goemotions,
  title={GoEmotions: A Dataset of Fine-Grained Emotions},
  author={Demszky, Dorottya and Movshovitz-Attias, Dana and Ko, Jeongwoo and Cowen, Alan and Nemade, Gaurav and Ravi, Sujith},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL)},
  year={2020}
}
```

## Licensing Information

The models in this repository are licensed under the MIT License.