---
license: mit
tags:
- sentiment-analysis
- text-classification
- multiclass-classification
- sentence-transformers
- xgboost
- reddit
- hybrid-model
language:
- en
metrics:
- accuracy
- f1
pipeline_tag: text-classification
widget:
- text: "I love this product! It's amazing and works perfectly."
  example_title: "Positive Example"
- text: "This is terrible. I hate it so much."
  example_title: "Negative Example"
- text: "The weather is okay today."
  example_title: "Neutral Example"
---
# Reddit Sentiment Analysis - Hybrid Model
🎯 **Test Accuracy: 0.9966**
## Model Description
This hybrid sentiment analysis model combines **Sentence Transformers** for semantic embeddings with **XGBoost** for classification. It was trained on Reddit comments for three-class sentiment analysis: **Negative**, **Positive**, and **Neutral**.
### Architecture
```
Input Text → SentenceTransformer → Embeddings (768D) →
Feature Engineering (Length + Sentiment + POS) → XGBoost → Prediction
```
## Quick Start
```python
import os
import pickle

import nltk
import numpy as np
from huggingface_hub import hf_hub_download, snapshot_download
from sentence_transformers import SentenceTransformer
from textblob import TextBlob

# Repo id as shown on the model page
REPO_ID = "mahekgheewala/sentimental_analysis_updated"

# Download NLTK data. Newer NLTK releases look for the "punkt_tab" and
# "averaged_perceptron_tagger_eng" resources, so request both name variants.
for resource in ("punkt", "punkt_tab",
                 "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(resource, quiet=True)

# Download the pickled XGBoost pipeline (a single file)
xgb_path = hf_hub_download(repo_id=REPO_ID, filename="xgboost_model.pkl")

# hf_hub_download fetches single files only. "sentence_transformer" is assumed
# to be a top-level folder in the repo, so fetch it from a snapshot instead.
repo_dir = snapshot_download(repo_id=REPO_ID, allow_patterns=["sentence_transformer/*"])
sentence_path = os.path.join(repo_dir, "sentence_transformer")

# Load XGBoost model
with open(xgb_path, 'rb') as f:
    pipeline_data = pickle.load(f)
xgb_model = pipeline_data['xgboost_model']
label_names = pipeline_data['label_names']

# Load SentenceTransformer
sentence_model = SentenceTransformer(sentence_path)


def predict_sentiment(text):
    # Extract features: sentence embedding, word count, TextBlob polarity
    embedding = sentence_model.encode([text])
    comment_length = np.array([len(text.split())]).reshape(-1, 1)
    sentiment_polarity = np.array([TextBlob(text).sentiment.polarity]).reshape(-1, 1)

    # POS counts; fall back to zeros if tokenizing or tagging fails
    try:
        tags = nltk.pos_tag(nltk.word_tokenize(text))
        pos_counts = np.array([[
            sum(1 for _, tag in tags if tag.startswith('J')),  # Adjectives
            sum(1 for _, tag in tags if tag.startswith('N')),  # Nouns
            sum(1 for _, tag in tags if tag.startswith('V'))   # Verbs
        ]])
    except Exception:
        pos_counts = np.array([[0, 0, 0]])

    # Combine features into a single row
    features = np.hstack([embedding, comment_length, sentiment_polarity, pos_counts])

    # Predict the class id and the probability of the most likely class
    prediction = int(xgb_model.predict(features)[0])
    confidence = float(xgb_model.predict_proba(features)[0].max())

    return {
        'label': label_names[prediction],
        'confidence': confidence,
        'prediction_id': prediction
    }


# Example usage
result = predict_sentiment("I love this new phone! It's amazing!")
print(f"Sentiment: {result['label']} (confidence: {result['confidence']:.3f})")
```
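The Quick Start function reports only the probability of the top class. When the full distribution is useful, a row from `predict_proba` can be paired with the class names and sorted. The sketch below is a minimal illustration: the helper name `rank_labels` and the example probabilities are made up, and the label order is assumed to follow the class ids listed under Model Details.

```python
import numpy as np

# Assumed class order: 0=Negative, 1=Positive, 2=Neutral
LABEL_NAMES = ["Negative", "Positive", "Neutral"]


def rank_labels(probabilities, label_names=LABEL_NAMES):
    """Pair each class name with its probability, most likely class first."""
    probabilities = np.asarray(probabilities, dtype=float)
    order = np.argsort(probabilities)[::-1]  # indices sorted by descending probability
    return [(label_names[i], float(probabilities[i])) for i in order]


# Hypothetical probability row, standing in for xgb_model.predict_proba(features)[0]
example_row = [0.05, 0.80, 0.15]
ranked = rank_labels(example_row)
print(ranked)
```

With the real model, `example_row` would be replaced by the first row returned from `xgb_model.predict_proba(features)`.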
## Model Details
- **Base Model**: `paraphrase-mpnet-base-v2`
- **Classifier**: XGBoost with GPU acceleration
- **Features**: 772 dimensions (768 embeddings + 4 engineered)
- **Classes**: 0=Negative, 1=Positive, 2=Neutral
- **Training Data**: Reddit comments
- **Test Accuracy**: 0.9966
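As a sanity check on the feature count, the sketch below assembles one feature row in the same order the Quick Start code uses (embedding, word count, polarity, three POS counts). With a 768-dimensional embedding that layout gives 773 columns, one more than the 772 listed above, so it is worth comparing against the loaded classifier's `n_features_in_` before relying on either number. The helper name `assemble_features` and the dummy values are illustrative and not part of the released code.

```python
import numpy as np

EMBEDDING_DIM = 768  # output size of paraphrase-mpnet-base-v2
ID2LABEL = {0: "Negative", 1: "Positive", 2: "Neutral"}


def assemble_features(embedding, n_words, polarity, pos_counts):
    """Stack the embedding and the engineered features into one row."""
    embedding = np.asarray(embedding, dtype=np.float32).reshape(1, -1)
    # Engineered part: word count, polarity, then adjective/noun/verb counts
    extras = np.array([[n_words, polarity, *pos_counts]], dtype=np.float32)
    return np.hstack([embedding, extras])


# Dummy inputs: a zero embedding and made-up engineered values
features = assemble_features(np.zeros(EMBEDDING_DIM), 7, 0.5, (2, 1, 1))
print(features.shape)  # (1, 773)
```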
## Training Configuration
- **XGBoost Parameters**: n_estimators=300, learning_rate=0.05, max_depth=6
- **Features**: Embeddings + Comment Length + TextBlob Sentiment + POS Counts
- **Class Balancing**: Sample weights for imbalanced data
- **Validation**: Stratified train/val/test split
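The class balancing and stratified split listed above can be sketched with NumPy alone. This is an illustration under stated assumptions, not the training script: the "balanced" weighting formula (the same one `sklearn.utils.class_weight.compute_sample_weight("balanced", y)` uses), the 80/10/10 split fractions, the seed, and the helper names are all assumptions. Only the three XGBoost hyperparameters come from the list above.

```python
import numpy as np

# Hyperparameters quoted in the Training Configuration list
XGB_PARAMS = {"n_estimators": 300, "learning_rate": 0.05, "max_depth": 6}


def balanced_sample_weights(y):
    """Weight each sample by n_samples / (n_classes * size_of_its_class)."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    class_weight = len(y) / (len(classes) * counts)
    return class_weight[np.searchsorted(classes, y)]


def stratified_split(y, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return train/val/test index arrays with similar class proportions."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    parts = ([], [], [])
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        parts[0].extend(idx[:n_train])
        parts[1].extend(idx[n_train:n_train + n_val])
        parts[2].extend(idx[n_train + n_val:])
    return tuple(np.array(sorted(p), dtype=int) for p in parts)


# Toy imbalanced labels: 10 negative, 30 positive, 60 neutral
y = np.array([0] * 10 + [1] * 30 + [2] * 60)
weights = balanced_sample_weights(y)
train_idx, val_idx, test_idx = stratified_split(y)
print(len(train_idx), len(val_idx), len(test_idx))  # 80 10 10
```

In a real run the weights would be computed from the training labels only and passed as `sample_weight` to `xgboost.XGBClassifier(**XGB_PARAMS).fit(...)`.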
## Citation
```bibtex
@misc{reddit-sentiment-hybrid,
  title={Reddit Sentiment Analysis - Hybrid Model},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/mahekgheewala/sentimental_analysis_updated}
}
```
## License
MIT License