FLAN-T5 COVID-19 Vaccine Stance Classification
This repository contains my submission for the take-home coding assessment regarding the LLM Research Opportunity under Sean Yun-Shiuan Chuang, Junjie Hu, and Tim Rogers.
This model is currently public for easy visibility during the coding assessment, but will be made private afterwards.
Task Summary
Predict the stance of each tweet (in-favor, against, or neutral-or-unclear) in a CSV of 5,751 tweets about COVID-19 vaccination, using flan-t5-large.
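As a rough illustration of the task, the sketch below wraps a tweet in an instruction prompt and maps the generated text back onto one of the three labels. The prompt wording, the function names `build_prompt` and `normalize_label`, and the choice to fall back to neutral-or-unclear are all assumptions made for this example; the prompt actually used in predict.py may differ.

```python
# Hypothetical helpers; not taken from the repository's utils.py.
LABELS = ("in-favor", "against", "neutral-or-unclear")


def build_prompt(tweet: str) -> str:
    """Wrap a tweet in a short classification instruction (assumed wording)."""
    options = ", ".join(LABELS)
    return (
        "Classify the stance of this tweet toward COVID-19 vaccination. "
        f"Options: {options}.\nTweet: {tweet.strip()}\nStance:"
    )


def normalize_label(generated: str) -> str:
    """Map raw generated text to one of LABELS.

    Anything that does not start with a known label falls back to
    neutral-or-unclear (an assumption made for this sketch).
    """
    text = generated.strip().lower()
    for label in LABELS:
        if text.startswith(label):
            return label
    return "neutral-or-unclear"
```

The string returned by `build_prompt` would be tokenized and passed to the model, and `normalize_label` would be applied to the decoded output before writing predictions.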
Project Structure
- predict.py - Predicts the model's labels on a given dataset and saves the result into output/.
- eval.py - Model evaluation.
- utils.py - Shared helper functions.
- train.py - Fine-tuning code on a given dataset.
- requirements.txt - Package installs for reproducibility.
- data/ - Contains the original dataset.
- output/ - Contains prediction output files and held-out dataset files from train/test splitting.
- finetune/ - Contains all files of the fine-tuned model, including epoch checkpoint files.
Setup
Install dependencies:
pip install transformers torch pandas scikit-learn sentencepiece datasets
OR
pip install -r requirements.txt
Quick Start
To run the fine-tuned model as-is:
python3 predict.py # manually change dataset path if needed
python3 eval.py # for evaluation
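The F1 and accuracy figures reported in this README can be reproduced in spirit with a few lines of plain Python. The sketch below assumes the "overall F1" is macro-averaged across the three classes; the README does not state which averaging eval.py uses, so treat that as an assumption. The function names are illustrative only.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (macro averaging is assumed)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

A class that is never predicted, as happened with neutral-or-unclear in the zero-shot run, contributes an F1 of 0 under macro averaging, which is one reason the overall score can sit well below accuracy.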
Development Summary
- Initial zero-shot prompting (no fine-tuning) revealed that the model never predicted neutral-or-unclear. Overall F1 score was 0.428.
- To speed up initial fine-tuning on a T4 GPU, flan-t5-base was used until the final evaluations, which were done using flan-t5-large.
  - The initial attempt at fine-tuning (no upsampling) had poor neutral-or-unclear recall (0.18). Overall F1 score was 0.518.
  - Fine-tuning with upsampling on neutral-or-unclear with an 80/20 train/test split on the first 2,000 records, then running predictions on the following 1,500 records, yielded an F1 score of 0.562. (3 epochs)
  - Fine-tuning with upsampling only on neutral-or-unclear on the entire dataset with an 80/20 train/test split showed only average precision for against (0.59). Overall F1 score was 0.690. (3 epochs)
  - Fine-tuning with upsampling on both neutral-or-unclear and against led to an F1 score of 0.724. (3 epochs)
- Final fine-tuning on flan-t5-large was done for 2 epochs in bf16 format to account for T4 GPU limitations.
  - Fine-tuning flan-t5-large on an 80/20 split, with predictions run on the held-out dataset, resulted in an F1 score of 0.782.
  - The final version of the fine-tuned model, with predictions run on the entirety of the original dataset, gave a final F1 score of 0.772 with an accuracy of 0.801.
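The upsampling step described above can be sketched as follows. This is a minimal standard-library version that duplicates randomly chosen rows of the target classes until each matches the size of the largest class. The function name `upsample`, the `label` key, and the "match the largest class" target are assumptions for illustration; the README does not say how much each class was upsampled or how train.py implements it.

```python
import random
from collections import defaultdict


def upsample(rows, label_key="label",
             targets=("neutral-or-unclear", "against"), seed=0):
    """Return a shuffled copy of rows with the target classes upsampled.

    Rows of each target class are duplicated at random (with replacement)
    until that class is as large as the largest class. Hypothetical helper.
    """
    if not rows:
        return []
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    largest = max(len(group) for group in by_label.values())
    balanced = list(rows)
    for label in targets:
        group = by_label.get(label, [])
        if group:
            # k is 0 when the class is already the largest one.
            balanced.extend(rng.choices(group, k=largest - len(group)))
    rng.shuffle(balanced)
    return balanced
```

If this is applied, it would normally be run on the training portion of the 80/20 split only, so that duplicated tweets do not also appear in the held-out set.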
Potential Improvements
- Further tinkering with the prompt could yield improved results. Brevity is a key obstacle in prompt generation.
- Fine-tuning flan-t5-large for 3 epochs instead of 2 could also improve F1 on more powerful GPUs.
- Experimentation on train/test splits.
Model tree for akashmohan/finetuned-flan-t5-large
Base model
google/flan-t5-large