BenjaminZ24 committed e88809d (verified) · Parent: a191527

Update README.md

Files changed (1): README.md +251 -95

---
library_name: transformers
datasets:
- SetFit/amazon_reviews_multi_en
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- sentiment-analysis
- text-classification
- roberta
- transformers
- ecommerce
- customer-reviews
- amazon-reviews
- streamlit
---

# EcoPulse AI Sentiment Classifier

This is the fine-tuned sentiment classifier used in **EcoPulse AI**, an e-commerce system that classifies the sentiment of customer reviews and produces voice reports. The model assigns Amazon-style customer reviews to one of three sentiment categories:

* **Negative**
* **Neutral**
* **Positive**

The model is designed to help e-commerce customer support teams quickly identify dissatisfied customers, monitor neutral feedback, and summarize positive customer experiences.

---

## Model Details

### Model Description

This model is a fine-tuned version of `cardiffnlp/twitter-roberta-base-sentiment-latest`. The base model was selected after comparing three Hugging Face transformer models on customer review sentiment classification: it achieved the highest baseline accuracy among the candidates and was then fine-tuned on a balanced Amazon review dataset.

The model serves as **Pipeline 1** in the EcoPulse AI application. It takes raw customer review text as input and outputs a sentiment label with a confidence score. The Streamlit application then aggregates the predictions into sentiment distribution summaries, business recommendations, written reports, and audio briefings.

* **Developed by:** Junlei Wang and Zhuoyuan Zhang
* **Project:** EcoPulse AI
* **Model type:** RoBERTa-based sequence classification model
* **Language:** English
* **Task:** 3-class sentiment classification
* **Fine-tuned from:** `cardiffnlp/twitter-roberta-base-sentiment-latest`
* **Final model repository:** `zoeywwww/cardiffnlp-sentiment-3class-finetuned`

---

## Model Sources

* **Base model:** `cardiffnlp/twitter-roberta-base-sentiment-latest`
* **Fine-tuned model:** `zoeywwww/cardiffnlp-sentiment-3class-finetuned`
* **Dataset:** `SetFit/amazon_reviews_multi_en`
* **Demo application:** https://group10finalproject-ee3nfmeyomxalcieln8f8a.streamlit.app/
* **GitHub repository:** https://github.com/zoeywang524-beep/Group10_Final_project/blob/main/group10app.py

---

## Intended Use

### Direct Use

This model can be used directly for English e-commerce customer review sentiment classification. Given a customer review, it predicts one of three labels:

| Label ID | Label |
| -------: | -------- |
| 0 | Negative |
| 1 | Neutral |
| 2 | Positive |

Example use cases include:

* Classifying Amazon-style product reviews
* Monitoring customer satisfaction
* Identifying negative feedback for customer service escalation
* Supporting review summarization dashboards
* Generating structured sentiment inputs for business reports

### Downstream Use

This model runs inside the EcoPulse AI Streamlit Cloud application, where it performs review-level sentiment classification. The app then uses the predictions to calculate the sentiment distribution, generate support recommendations, produce a written customer sentiment report, and trigger a text-to-speech pipeline for an audio dashboard briefing.

The full system follows this workflow:

```text
Customer Review Text
↓
Fine-Tuned RoBERTa Sentiment Classifier
↓
Negative / Neutral / Positive Prediction + Confidence Score
↓
Streamlit Business Logic Layer
↓
Sentiment Summary + Support Recommendation + Written Report
↓
Text-to-Speech Pipeline
↓
Audio Dashboard Briefing
```
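
The card does not publish the business logic layer itself. As a minimal sketch, assuming the app simply counts predicted labels, the sentiment distribution step could look like the following; the function `sentiment_distribution` and the sample predictions are hypothetical.

```python
from collections import Counter

LABELS = ("Negative", "Neutral", "Positive")

def sentiment_distribution(predicted_labels):
    """Return each label's share of all predictions, as a percentage (hypothetical helper)."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    if total == 0:
        # No reviews yet: report 0% for every label instead of dividing by zero
        return {label: 0.0 for label in LABELS}
    return {label: round(100 * counts[label] / total, 1) for label in LABELS}

# Hypothetical classifier outputs for five reviews
predictions = ["Negative", "Positive", "Positive", "Neutral", "Positive"]
print(sentiment_distribution(predictions))
# {'Negative': 20.0, 'Neutral': 20.0, 'Positive': 60.0}
```

A summary like this is the kind of structured input the written report and audio briefing could be generated from.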

### Out-of-Scope Use

This model is not intended for high-stakes decision-making without human review. It should not be used as the sole basis for customer compensation, employee evaluation, legal judgments, or automated enforcement decisions.

The model may perform poorly on:

* Sarcastic reviews
* Ambiguous or mixed-emotion reviews
* Very short reviews that lack context
* Non-English text
* Highly domain-specific product terminology
* Reviews that require external context to interpret correctly

---

## Bias, Risks, and Limitations

The model was fine-tuned on English Amazon-style review data. Its performance is therefore most relevant to e-commerce review classification, and it may not generalize as well to other domains such as healthcare, finance, legal complaints, or social media conversations.

A known limitation is sarcasm. For example, a review such as:

> "Brilliant delivery, my package arrived completely crushed."

may be misclassified because the word “Brilliant” is positive while the sentence as a whole is negative. In the project’s manual Streamlit application test, the only misclassification was a sarcastic review of this type.

Users should treat the model as a **first-line decision-support tool**, not a replacement for human judgment.

---

## Recommendations

Users should manually review low-confidence predictions and ambiguous cases. In business settings, the model is best applied as an initial screening tool that helps support teams prioritize reviews for further investigation.

Recommended use:

* Use the model to flag likely negative reviews.
* Review sarcastic, mixed, or unclear cases manually.
* Combine model predictions with business rules and human oversight.
* Periodically update or fine-tune the model with newer customer review data.
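
The screening workflow above can be sketched as a small routing rule on each prediction and its confidence score. The 0.6 threshold and the bucket names are illustrative assumptions, not values used by EcoPulse AI; in practice a threshold would be tuned on validation data.

```python
# Hypothetical threshold: predictions below this confidence go to a human reviewer.
REVIEW_THRESHOLD = 0.6

def triage(label: str, confidence: float, threshold: float = REVIEW_THRESHOLD) -> str:
    """Route one prediction (label + softmax confidence) to a hypothetical action bucket."""
    if confidence < threshold:
        return "manual review"  # unclear case: let a person decide
    if label == "Negative":
        return "escalate"       # likely dissatisfied customer
    return "no action"

print(triage("Negative", 0.91))  # escalate
print(triage("Positive", 0.45))  # manual review
print(triage("Neutral", 0.80))   # no action
```

Checking confidence before the label means a low-confidence "Negative" is sent to a person rather than escalated automatically.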

---

## How to Get Started with the Model

You can use the model with the Hugging Face `transformers` library.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import numpy as np

model_name = "zoeywwww/cardiffnlp-sentiment-3class-finetuned"

# Download the fine-tuned tokenizer and classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Label IDs used by this model (see the Direct Use table)
id2label = {
    0: "Negative",
    1: "Neutral",
    2: "Positive"
}

text = "The product arrived damaged and customer service did not respond."

# Tokenize the review, truncating anything longer than 128 tokens
inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    padding=True,
    max_length=128
)

# Forward pass without gradient tracking (inference only)
with torch.no_grad():
    outputs = model(**inputs)

# Softmax turns the logits into probabilities; argmax picks the predicted class
probabilities = torch.softmax(outputs.logits, dim=-1)[0].numpy()
predicted_id = int(np.argmax(probabilities))
predicted_label = id2label[predicted_id]
confidence = float(probabilities[predicted_id])

print("Predicted sentiment:", predicted_label)
print("Confidence:", round(confidence, 4))
```

---

## Training Details

### Training Data

The model was fine-tuned on the `SetFit/amazon_reviews_multi_en` dataset from Hugging Face, which contains English Amazon review text with the original star-rating labels.

The original 5-star labels were mapped to three sentiment classes:

| Original Rating Label | Star Rating | New Sentiment Label |
| --------------------: | ----------- | ------------------- |
| 0 | 1-star | Negative |
| 1 | 2-star | Negative |
| 2 | 3-star | Neutral |
| 3 | 4-star | Positive |
| 4 | 5-star | Positive |
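
The mapping in the table can be expressed as a simple lookup. This is a minimal sketch, not the project's preprocessing code; the names `STAR_TO_SENTIMENT` and `map_rating` are hypothetical.

```python
# Maps the dataset's 0-4 rating labels (1-5 stars) to the 3-class IDs in the table above.
STAR_TO_SENTIMENT = {0: 0, 1: 0, 2: 1, 3: 2, 4: 2}
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}

def map_rating(rating_label: int) -> int:
    """Return the 3-class sentiment ID for an original 0-4 rating label."""
    return STAR_TO_SENTIMENT[rating_label]

for rating_label in range(5):
    print(f"{rating_label + 1}-star -> {ID2LABEL[map_rating(rating_label)]}")
```

With the Hugging Face Datasets library, a function like this could be applied to every example through `Dataset.map`, assuming the rating is stored in a `label` column; check the dataset's schema before relying on that column name.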

### Dataset Splits Used in the Project

| Split | Number of Samples | Class Balance | Purpose |
| --------------------------- | ----------------: | --------------- | ------------------------------ |
| Preliminary training sample | 9,000 | 3,000 per class | Candidate model comparison |
| Fine-tuning training set | 6,000 | 2,000 per class | Fine-tuning the selected model |
| Validation set | 1,500 | 500 per class | Monitoring fine-tuning |
| Test set | 1,500 | 500 per class | Final evaluation |

The fine-tuning, validation, and test sets were each balanced across the three sentiment classes to reduce the effects of class imbalance.
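
The card does not describe how the balanced splits were drawn. One way to build disjoint, class-balanced splits with the per-class sizes from the table is sketched below; the function `balanced_splits`, the seed, and the synthetic data are assumptions for illustration.

```python
import random
from collections import defaultdict

def balanced_splits(examples, per_class_sizes, seed=42):
    """Split (text, label) pairs into disjoint splits with equal counts per class.

    per_class_sizes maps a split name to the number of examples per class.
    """
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    needed = sum(per_class_sizes.values())
    splits = {name: [] for name in per_class_sizes}
    for label in sorted(by_label):
        items = by_label[label]
        if len(items) < needed:
            raise ValueError(f"label {label}: {len(items)} examples, need {needed}")
        rng.shuffle(items)
        start = 0
        # Consecutive, non-overlapping slices keep the splits disjoint.
        for name, size in per_class_sizes.items():
            splits[name].extend(items[start:start + size])
            start += size
    for items in splits.values():
        rng.shuffle(items)  # mix the classes within each split
    return splits

# Per-class sizes from the table; the data is a synthetic stand-in.
sizes = {"train": 2000, "validation": 500, "test": 500}
data = [(f"review {i}", i % 3) for i in range(9000)]  # 3,000 examples per label
splits = balanced_splits(data, sizes)
print({name: len(items) for name, items in splits.items()})
# {'train': 6000, 'validation': 1500, 'test': 1500}
```

Slicing each shuffled class list into consecutive chunks guarantees that no review appears in more than one split.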

### Preprocessing

The preprocessing steps were:

1. Loading English Amazon review data.
2. Mapping the 5-star labels to 3 sentiment labels.
3. Creating balanced negative, neutral, and positive samples.
4. Tokenizing review text with the tokenizer from `cardiffnlp/twitter-roberta-base-sentiment-latest`.
5. Truncating and padding the tokenized inputs so they fit the model's input length and can be batched.

---

## Training Procedure

The base model `cardiffnlp/twitter-roberta-base-sentiment-latest` was selected after comparing three candidate transformer models:

| Candidate Model | Baseline Accuracy | Runtime (sec) |
| -------------------------------------------------- | ----------------: | ------------: |
| `cardiffnlp/twitter-roberta-base-sentiment-latest` | 0.6228 | 60.71 |
| `distilbert-base-uncased` | 0.3287 | 32.17 |
| `roberta-base` | 0.3306 | 61.68 |

The Cardiff RoBERTa model achieved the highest baseline accuracy and was selected for fine-tuning.
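
Because the candidates were compared on a balanced three-class sample, uniform random guessing would score about 1/3 accuracy. The short sketch below (not from the project code) repeats the selection and compares each baseline with that chance level: only the Cardiff model is clearly above it, while the other two are within about half a percentage point of chance.

```python
# Baseline accuracies copied from the comparison table above.
baselines = {
    "cardiffnlp/twitter-roberta-base-sentiment-latest": 0.6228,
    "distilbert-base-uncased": 0.3287,
    "roberta-base": 0.3306,
}
chance = 1 / 3  # expected accuracy of uniform guessing on a balanced 3-class set

selected = max(baselines, key=baselines.get)
print("Selected:", selected)
for name, accuracy in baselines.items():
    # Signed difference from chance, e.g. +0.2895 for the Cardiff model
    print(f"{name}: {accuracy:.4f} ({accuracy - chance:+.4f} vs. chance)")
```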

The selected model was fine-tuned for 1, 2, and 3 epochs. The Epoch 1 model was selected for deployment because it offered the best balance of validation loss, test performance, generalization stability, and runtime.

### Fine-Tuning Results

| Epoch | Validation Loss | Validation Accuracy | Train Accuracy | Test Accuracy | Test Runtime |
| ----: | --------------: | ------------------: | -------------: | ------------: | -----------: |
| 1 | 0.6777 | 0.7127 | 0.7848 | 0.7040 | 10.66 sec |
| 2 | 0.7371 | 0.7167 | 0.8613 | 0.7093 | 10.98 sec |
| 3 | 0.9523 | 0.7140 | 0.9205 | 0.7093 | 10.78 sec |

Although Epochs 2 and 3 achieved slightly higher test accuracy, the gain was only 0.0053 (about half a percentage point). Training accuracy rose sharply from 0.7848 at Epoch 1 to 0.9205 at Epoch 3 while test accuracy stayed almost unchanged. Validation loss also climbed after Epoch 1, from 0.6777 to 0.9523. Together, these trends suggest a growing risk of overfitting in later epochs.

Therefore, the Epoch 1 model was selected for deployment.
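
Two of the signals discussed above, validation loss and the gap between train and test accuracy, can be read directly off the table. The sketch below recomputes both from the reported numbers; it reproduces only part of the authors' reasoning, since their choice also weighed test performance and runtime.

```python
# Fine-tuning results copied from the table above.
results = [
    {"epoch": 1, "val_loss": 0.6777, "train_acc": 0.7848, "test_acc": 0.7040},
    {"epoch": 2, "val_loss": 0.7371, "train_acc": 0.8613, "test_acc": 0.7093},
    {"epoch": 3, "val_loss": 0.9523, "train_acc": 0.9205, "test_acc": 0.7093},
]

# A widening train-test gap is a common symptom of overfitting.
gaps = {r["epoch"]: r["train_acc"] - r["test_acc"] for r in results}
for r in results:
    print(f"epoch {r['epoch']}: val_loss={r['val_loss']:.4f}, train-test gap={gaps[r['epoch']]:.4f}")

best = min(results, key=lambda r: r["val_loss"])
print("Lowest validation loss at epoch", best["epoch"])
```

The gap grows from about 0.08 at Epoch 1 to about 0.21 at Epoch 3, matching the overfitting concern stated above.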

---

## Evaluation

### Testing Data

Final evaluation was conducted on a held-out, balanced test set of 1,500 Amazon-style reviews that was not used for training:

* 500 negative reviews
* 500 neutral reviews
* 500 positive reviews

### Metrics

The main evaluation metric was accuracy. Runtime was also recorded during model comparison and testing to assess deployment feasibility.

### Results

The deployed fine-tuned model (Epoch 1) achieved:

| Metric | Value |
| ------------------- | --------: |
| Test Accuracy | 0.7040 |
| Test Runtime | 10.66 sec |
| Validation Loss | 0.6777 |
| Validation Accuracy | 0.7127 |
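
The headline numbers translate into concrete counts. This back-of-the-envelope sketch derives the number of correct test predictions and an approximate throughput; the throughput describes only the recorded evaluation run, whose hardware and batch size are not reported.

```python
test_size = 1500          # balanced test set: 500 reviews per class
test_accuracy = 0.7040    # from the results table
test_runtime_sec = 10.66  # from the results table

correct = round(test_accuracy * test_size)
throughput = test_size / test_runtime_sec

print(f"Correct predictions: {correct} / {test_size}")
print(f"Errors: {test_size - correct}")
print(f"Approximate throughput: {throughput:.1f} reviews/sec")
```

On this run, the model classified 1,056 of the 1,500 test reviews correctly, at roughly 140 reviews per second.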

### Streamlit Application Test

The deployed Streamlit Cloud application was manually tested on 10 unseen e-commerce customer reviews and correctly classified 9 of them.

| Application Test Setting | Test Sample Size | Accuracy |
| --------------------------------------------- | ---------------: | -------: |
| Streamlit Cloud sentiment classification test | 10 | 90% |

The only misclassification was a sarcastic review, illustrating a known weakness of sentiment models when handling sarcasm. Because the sample is so small, this result is best read as a functional check of the deployed app rather than a precise accuracy estimate.
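
To gauge how much a 10-review test can tell us, one option (not part of the original project) is a binomial confidence interval. The sketch below uses the Wilson score interval with z = 1.96 for roughly 95% coverage, and it assumes the 10 reviews behave like independent samples.

```python
import math

def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Approximate Wilson score interval for a binomial proportion (95% when z = 1.96)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    margin = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - margin, center + margin

low, high = wilson_interval(9, 10)
print(f"9/10 correct -> 95% interval approx. [{low:.3f}, {high:.3f}]")
print("Test-set accuracy 0.7040 inside interval:", low <= 0.7040 <= high)
```

The interval is wide (roughly 0.60 to 0.98) and contains the 0.7040 test-set accuracy, so the 90% app result is consistent with the larger evaluation rather than evidence of better performance.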

---

## Technical Specifications

### Model Architecture and Objective

This model uses a RoBERTa-based transformer architecture for sequence classification. The review text is tokenized and passed through the transformer encoder, and a classification head maps the encoded representation to one logit per sentiment category. A softmax over these logits produces the class probabilities.

Simplified architecture:

```text
Review Text
↓
Tokenizer
↓
RoBERTa Transformer Encoder
↓
Classification Head
↓
Softmax Probabilities
↓
Negative / Neutral / Positive
```
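
The last two stages of the diagram can be illustrated without PyTorch. This dependency-free sketch applies a numerically stable softmax to made-up logits and picks the most probable label; in the getting-started example, the logits come from `outputs.logits` instead.

```python
import math

ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}

def softmax(logits):
    # Subtract the max logit before exponentiating to avoid overflow.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits):
    """Return (label, confidence) for one review's three logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]

# Hypothetical logits for one review (not real model output)
label, confidence = classify([2.1, 0.3, -1.8])
print(label, round(confidence, 4))
```

The confidence score reported by the app corresponds to the largest softmax probability, which is why low values signal cases worth a manual look.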

### Software

The project used:

* Python
* PyTorch
* Hugging Face Transformers
* Hugging Face Datasets
* Hugging Face Hub
* Google Colab
* Streamlit

### Compute Infrastructure

Fine-tuning and experiments were conducted in Google Colab. Exact hardware may vary depending on the assigned Colab runtime.

---

## Environmental Impact

Carbon emissions were not formally measured for this course project. Fine-tuning was conducted in Google Colab, and training time was kept short by using a relatively small balanced fine-tuning dataset and only a few epochs.

---

## Citation

If you use this model, please cite the base model and dataset sources:

* Base model: `cardiffnlp/twitter-roberta-base-sentiment-latest`
* Dataset: `SetFit/amazon_reviews_multi_en`

---

## Model Card Authors

* Junlei Wang
* Zhuoyuan Zhang

---

## Model Card Contact

For questions about this course project, please refer to the EcoPulse AI project report, GitHub repository, and Streamlit application.