How to use LaurenGurgiolo/VIT_finetuned_9emotions with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("image-classification", model="LaurenGurgiolo/VIT_finetuned_9emotions")
    pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")

    # Load model directly
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained("LaurenGurgiolo/VIT_finetuned_9emotions")
    model = AutoModelForImageClassification.from_pretrained("LaurenGurgiolo/VIT_finetuned_9emotions")

😐 ViT Facial Expression Recognition (9-Class Baseline Model)
This repository hosts a Vision Transformer (ViT)–based facial expression recognition model trained using an iterative fine-tuning strategy. The model was developed by further training LaurenGurgiolo/vit-micro-facial-expressions, which itself was fine-tuned from mo-thecreator/vit-Facial-Expression-Recognition.
The objective of this model is to classify facial images into nine distinct facial expression categories using robust transformer-based visual representations.
📌 Model Details
Base model: mo-thecreator/vit-Facial-Expression-Recognition
Intermediate model: LaurenGurgiolo/vit-micro-facial-expressions
Architecture: Vision Transformer (ViT)
Task: Facial Expression Classification
Final model type: Iteratively fine-tuned baseline model
📂 Dataset
9_Facial_Expressions Dataset
Source: LaurenGurgiolo/9_Facial_Expressions
Task: Multi-class facial expression classification
Classes: 9 facial expression categories
This dataset was used to further refine the intermediate ViT model through iterative training.
🧠 Training Methodology
Iterative Fine-Tuning (Baseline Model)
The LaurenGurgiolo/vit-micro-facial-expressions model was iteratively fine-tuned on the 9_Facial_Expressions dataset, allowing the model to progressively integrate new facial expression patterns.
Training Configuration:
Batch size: 16
Epochs: 10
Learning rate: 2e-5
Warmup steps: 500
Scheduler: Cosine learning rate with restarts (2 cycles)
Weight decay: 0.01
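The configuration above implies a particular learning-rate curve. The sketch below reimplements, in plain Python, the multiplier formula behind the Transformers `get_cosine_with_hard_restarts_schedule_with_warmup` scheduler, using the values listed above. Two things are assumptions: that this is the scheduler variant the author used, and the total step count, which depends on a dataset size this card does not state (`TOTAL_STEPS = 2500` is a hypothetical value for illustration only).

```python
import math

PEAK_LR = 2e-5        # learning rate from the configuration above
WARMUP_STEPS = 500    # warmup steps from the configuration above
NUM_CYCLES = 2        # restart cycles from the configuration above
TOTAL_STEPS = 2500    # hypothetical: the real value depends on dataset size


def lr_at(step, total_steps=TOTAL_STEPS, warmup_steps=WARMUP_STEPS,
          num_cycles=NUM_CYCLES, peak_lr=PEAK_LR):
    """Learning rate at `step`: linear warmup, then cosine decay with hard restarts."""
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Position inside the current cycle: each cycle decays from the peak
    # toward 0, then the next cycle jumps back to the peak.
    cycle_pos = (num_cycles * progress) % 1.0
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * cycle_pos)))


for step in (0, 250, 500, 1000, 1499, 1500, 2500):
    print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

With the Transformers `Trainer`, this schedule shape is normally selected with `lr_scheduler_type="cosine_with_restarts"` in `TrainingArguments` rather than written by hand.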
This iterative training procedure achieved a final accuracy of 75%, which is designated as the baseline performance.
Non-Iterative Fine-Tuning (Comparison Model)
For comparison, the pretrained mo-thecreator/vit-Facial-Expression-Recognition model was directly fine-tuned on the 9_Facial_Expressions dataset without iterative training.
Training approach: Single-stage fine-tuning
Final accuracy: 66%
This result is 9 percentage points below the iterative baseline, which suggests that sequential fine-tuning was the more effective strategy on this dataset.
📊 Results Summary

| Training Strategy | Accuracy |
|---|---|
| Iterative fine-tuning | 75% |
| Non-iterative fine-tuning | 66% |
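To put the two reported accuracies side by side, the short calculation below derives the absolute gain and the relative reduction in error rate. It uses only the two figures from the results summary; the evaluation set size is not stated in this card, so nothing here speaks to statistical significance.

```python
# Accuracies reported in the results summary.
iterative_acc = 0.75
single_stage_acc = 0.66

# Absolute difference, expressed in percentage points.
abs_gain_points = (iterative_acc - single_stage_acc) * 100

# Share of the single-stage model's errors that the iterative model removes.
single_stage_err = 1 - single_stage_acc
iterative_err = 1 - iterative_acc
rel_error_reduction = (single_stage_err - iterative_err) / single_stage_err

print(f"Absolute gain: {abs_gain_points:.1f} percentage points")
print(f"Relative error reduction: {rel_error_reduction:.1%}")
```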
Figure: Training and validation performance across 10 epochs, illustrating stable convergence and improved generalization under iterative training.
🧠 Why Iterative Training?
Iterative training is a sequential learning methodology in which a facial expression recognition model is trained on multiple datasets in succession, each stage starting from the weights of the previous one. This approach enables:
Progressive knowledge refinement
Improved generalization to unseen facial variations
Enhanced feature discrimination
By exposing the model to increasingly diverse data distributions, iterative training improves adaptability to novel conditions (Mohan, 2024).
🧬 Architecture Choice
A Vision Transformer (ViT) architecture was selected due to its strong performance in facial expression recognition tasks. Unlike convolutional neural networks (CNNs), which build up context through local receptive fields, ViTs apply global self-attention across all image patches, which has been reported to improve accuracy and generalization.
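To illustrate what "global self-attention" means here, the following is a minimal single-head sketch in plain Python. It is not this model's implementation: real ViT layers use learned query/key/value projections, multiple heads, and far larger embeddings, and the three 2-dimensional "patch" vectors are made-up toy values. The point it shows is that every patch's output is a weighted mix of all patches, not only its neighbors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections.

    `tokens` is a list of equal-length float vectors, one per image patch.
    Returns the attended outputs and the attention weight matrix.
    """
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this patch to every patch, including itself.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        attn = softmax(scores)
        weights.append(attn)
        # Output is the attention-weighted average of all patch vectors.
        outputs.append([sum(w * tok[i] for w, tok in zip(attn, tokens))
                        for i in range(dim)])
    return outputs, weights


# Toy input: three hypothetical patch embeddings.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(patches)

for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row sums to 1 and has no zero entries, which is the "global" property: every patch receives some weight from every other patch in a single layer.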
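🚀 Usage Example

    from transformers import AutoImageProcessor, AutoModelForImageClassification
    from PIL import Image
    import torch

    # Load the image processor and the fine-tuned classifier from the Hub
    model_id = "LaurenGurgiolo/VIT_finetuned_9emotions"
    processor = AutoImageProcessor.from_pretrained(model_id)
    model = AutoModelForImageClassification.from_pretrained(model_id)

    # ViT expects 3-channel input, so convert in case the file is grayscale or RGBA
    image = Image.open("face.jpg").convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Inference only, so skip gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)

    # Index of the highest-scoring class, mapped to its label name
    predicted_label = outputs.logits.argmax(dim=-1).item()
    print(predicted_label, model.config.id2label[predicted_label])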
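The usage example reports only the single top prediction. Because expression labels are often ambiguous, it can help to look at the ranked probabilities instead. The helper below is a minimal sketch of that post-processing step in plain Python. The label names and logits are hypothetical placeholders, since this card does not list the nine class names; with the real model, pass `model.config.id2label` and `outputs.logits[0].tolist()`.

```python
import math


def top_k_predictions(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs for one image."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # stable softmax numerator
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical stand-ins: nine placeholder labels and made-up logits.
id2label = {i: f"class_{i}" for i in range(9)}
logits = [0.1, 2.3, -1.0, 0.4, 1.7, -0.5, 0.0, 0.9, -2.1]

for label, prob in top_k_predictions(logits, id2label):
    print(f"{label}: {prob:.3f}")
```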
⚠️ Limitations
Performance may be affected by:
Low-resolution images
Occlusions or extreme facial poses
Unbalanced class distributions
Emotion classification remains inherently subjective.
📜 License & Attribution
Base model: mo-thecreator/vit-Facial-Expression-Recognition
Datasets: LaurenGurgiolo/9_Facial_Expressions
Please consult the original model and dataset licenses on Hugging Face before use.
🙌 Acknowledgements
Hugging Face for model hosting and tools
Dataset contributors
Prior research on Vision Transformers and iterative learning strategies