---
license: mit
language: en
library_name: scikit-learn
tags:
- emotion-classification
- text-classification
- affective-computing
- scikit-learn
- xgboost
- lightgbm
model_description:
  title: Pre-Trained Emotion Classifiers for AffectiveLens
  text: >
    This repository contains a collection of 6 machine learning models trained
    for the AffectiveLens Project. Each model is a classifier designed to
    predict the emotional valence (Positive, Negative, or Neutral) of a given
    text. These models were trained on 768-dimensional sentence embeddings
    generated by the distilbert-base-uncased model. The source text data comes
    from the GoEmotions dataset, which was processed and is available at
    psyrishi/MoodPulse. This repository allows you to easily download and use
    the final, trained artifacts of the study without needing to retrain them.
  models:
  - name: LightGBM_MicroF1_0.6240.pkl
    description: Champion Model
  - name: XGBoost_MicroF1_0.6205.pkl
  - name: Random_Forest_MicroF1_0.6192.pkl
  - name: CatBoost_MicroF1_0.6175.pkl
  - name: Logistic_Regression_MicroF1_0.6026.pkl
  - name: Linear_SVM_MicroF1_0.6005.pkl
  training_data:
    source: GoEmotions dataset
    repository: psyrishi/MoodPulse
  evaluation:
    champion_model: LightGBM
    f1_score_micro: 0.6223
    accuracy: 62.23%
  citation:
    paper:
      authors:
      - Demszky, Dorottya
      - Movshovitz-Attias, Dana
      - Ko, Jeongwoo
      - Cowen, Alan
      - Nemade, Gaurav
      - Ravi, Sujith
      title: "GoEmotions: A Dataset of Fine-Grained Emotions"
      conference: 58th Annual Meeting of the Association for Computational Linguistics (ACL)
      year: 2020
datasets:
- psyrishi/MoodPulse
---

# Pre-Trained Emotion Classifiers for AffectiveLens

## Model Description

This repository contains a collection of 6 machine learning models trained for the AffectiveLens Project. Each model is a classifier designed to predict the emotional valence (Positive, Negative, or Neutral) of a given text.

These models were trained on 768-dimensional sentence embeddings generated by the `distilbert-base-uncased` model.
The source text data comes from the GoEmotions dataset, which was processed and is available at [psyrishi/MoodPulse](https://huggingface.co/psyrishi/MoodPulse). This repository lets you download and use the final trained artifacts of the study without retraining them.

## Models in this Repository

The following scikit-learn compatible models are available. They are saved as `.pkl` files using `joblib`. Each filename includes the model's micro-averaged F1 score on the validation set, which was used for initial model selection.

- **LightGBM_MicroF1_0.6240.pkl** (Champion Model)
- **XGBoost_MicroF1_0.6205.pkl**
- **Random_Forest_MicroF1_0.6192.pkl**
- **CatBoost_MicroF1_0.6175.pkl**
- **Logistic_Regression_MicroF1_0.6026.pkl**
- **Linear_SVM_MicroF1_0.6005.pkl**

## How to Use

You can download any model from this repository using the `huggingface_hub` library. You will also need the Transformer model used during training (`distilbert-base-uncased`) to generate embeddings for your input text, because the classifiers expect embeddings from the same model they were trained on. Loading a pickled model also requires the library it was built with to be installed (for example, `lightgbm` for the champion model).

Here is an example of how to download and use the champion model (LightGBM):

```python
import joblib
import torch
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download

# --- 1. Load the Embedding Model and Tokenizer ---
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
embedding_model = AutoModel.from_pretrained("distilbert-base-uncased").to(device)
embedding_model.eval()  # Inference mode: disables dropout

# --- 2. Download the Champion Classifier from the Hub ---
REPO_ID = "psyrishi/affectivelens-emotion-models"
FILENAME = "LightGBM_MicroF1_0.6240.pkl"

print(f"Downloading model '{FILENAME}' from '{REPO_ID}'...")
model_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# --- 3. Load the Classifier ---
classifier = joblib.load(model_path)
print("Successfully loaded the champion classifier.")

# --- 4. Create a Prediction Function ---
def predict_emotion(text: str) -> str:
    # Tokenize the input text
    inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Get the embedding from the Transformer model
    with torch.no_grad():
        outputs = embedding_model(**inputs)
        # Hidden state of the first ([CLS]) token, shape (1, 768)
        embedding = outputs.last_hidden_state[:, 0, :].cpu().numpy()

    # Use the classifier to predict the class index, then map it to a label
    prediction_index = int(classifier.predict(embedding)[0])
    emotion_labels = ['negative', 'neutral', 'positive']
    return emotion_labels[prediction_index]

# --- 5. Make a Prediction ---
my_text = "This was an amazing experience, I am so happy!"
predicted_emotion = predict_emotion(my_text)

print(f"\nText: '{my_text}'")
print(f"--> Predicted Emotion: {predicted_emotion}")
```

## Training Procedure

### Training Data

The models were trained on a processed version of the GoEmotions dataset. The full data pipeline, including the raw data, tokenized data, and final embeddings, is available at the [psyrishi/MoodPulse repository](https://huggingface.co/psyrishi/MoodPulse). The training set was balanced using random oversampling.

### Training Workflow

The complete end-to-end workflow, from data preparation to model training and evaluation, is documented in the main project repository: [AffectiveLens on GitHub](https://github.com/psywarrior1998/AffectiveLens).

### Evaluation Results

The models were evaluated on a held-out, unseen test set. The **LightGBM** model emerged as the champion performer.
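
For readers unfamiliar with the metric, the following is a minimal sketch of how a micro-averaged F1 score is computed for a single-label, three-class problem. It is not the project's evaluation code: the `micro_f1` helper and the toy labels are hypothetical and exist only for illustration. It also shows why micro F1 and accuracy take the same value in this setting, since every misclassification counts once as a false positive and once as a false negative.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1  # correctly predicted this class
            elif p == label and t != label:
                fp += 1  # predicted this class, but the truth differs
            elif t == label and p != label:
                fn += 1  # missed an example of this class
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (0 = negative, 1 = neutral, 2 = positive);
# these are NOT the study's test data.
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 0, 1]

score = micro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"Micro F1: {score:.4f}")
print(f"Accuracy: {accuracy:.4f}")
```

In practice, scikit-learn's `f1_score(y_true, y_pred, average="micro")` and `accuracy_score(y_true, y_pred)` compute the same two quantities.
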

#### Champion Model (LightGBM) Performance on Test Set:

* **F1-Score (Micro)**: 0.6223
* **Accuracy**: 62.23%

## Citation

If you use these models, please cite the original GoEmotions paper:

```
@inproceedings{demszky2020goemotions,
  title={GoEmotions: A Dataset of Fine-Grained Emotions},
  author={Demszky, Dorottya and Movshovitz-Attias, Dana and Ko, Jeongwoo and Cowen, Alan and Nemade, Gaurav and Ravi, Sujith},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL)},
  year={2020}
}
```

## Licensing Information

The models in this repository are licensed under the MIT License.