# classla/wav2vec2-large-slavic-voxpopuli-v2_hr_SER
This model for Croatian speech emotion recognition (SER) is based on facebook/wav2vec2-large-slavic-voxpopuli-v2 and was fine-tuned on the CrES 2.1 dataset (Croatian Emotional Speech corpus).
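A minimal usage sketch, assuming the `pipeline("audio-classification", model=...)` call suggested on the model page works for this checkpoint. The `top_emotion` and `classify_file` helpers are hypothetical conveniences added for illustration, and the card does not list the emotion labels the model emits (they come from the checkpoint's config).

```python
from typing import Callable, Dict, List, Optional

MODEL_ID = "classla/wav2vec2-large-slavic-voxpopuli-v2_hr_SER"


def load_classifier():
    """Build the audio-classification pipeline (downloads the checkpoint on first use)."""
    # Imported here so the helpers below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("audio-classification", model=MODEL_ID)


def top_emotion(predictions: List[Dict]) -> str:
    """Return the highest-scoring label from a list of {"label", "score"} dicts."""
    return max(predictions, key=lambda p: p["score"])["label"]


def classify_file(path: str, classifier: Optional[Callable] = None) -> str:
    """Hypothetical helper: classify one audio file and return the most likely label."""
    # Passing a file path makes the pipeline decode and resample the audio with
    # ffmpeg, so ffmpeg must be installed on the system.
    classifier = classifier or load_classifier()
    return top_emotion(classifier(path))
```

For example, `classify_file("utterance.wav")` (a hypothetical file name) would return the single most likely label; call the pipeline directly to get all scores.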
If you use this model, please cite the following paper describing the dataset:
@inproceedings{Dropuljić_Chmura_Kolak_Petrinović_2011, title={Emotional speech corpus of Croatian language}, ISSN={1845-5921}, booktitle={2011 7th International Symposium on Image and Signal Processing and Analysis (ISPA)}, author={Dropuljić, Branimir and Chmura, Miłosz Thomasz and Kolak, Antonio and Petrinović, Davor}, year={2011}, month={Sep}, pages={95–100} }
## Metrics
Evaluation was performed on the dev and test portions of the CrES 2.1 dataset. The dataset was re-split for this purpose, stratified by emotion and with no speaker leakage (i.e., no speaker appears in more than one split).
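The no-leakage condition means whole speakers, not individual utterances, are assigned to splits. The sketch below illustrates that idea under stated assumptions: samples are hypothetical `(utterance_id, speaker_id, emotion)` tuples, and speakers are shuffled at random. It does not reproduce the emotion stratification used for this card, whose exact procedure is not described.

```python
import random
from collections import defaultdict


def speaker_disjoint_split(samples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Assign every utterance of a speaker to the same split (train/dev/test).

    samples: list of hypothetical (utterance_id, speaker_id, emotion) tuples.
    Note: this is a random speaker split, not an emotion-stratified one.
    """
    speakers = sorted({spk for _, spk, _ in samples})
    random.Random(seed).shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_frac))
    n_dev = max(1, round(len(speakers) * dev_frac))
    test_spk = set(speakers[:n_test])
    dev_spk = set(speakers[n_test:n_test + n_dev])

    splits = defaultdict(list)
    for sample in samples:
        spk = sample[1]
        name = "test" if spk in test_spk else "dev" if spk in dev_spk else "train"
        splits[name].append(sample)
    return dict(splits)


def has_speaker_leakage(splits):
    """Return True if any speaker appears in more than one split."""
    seen = {}
    for name, items in splits.items():
        for _, spk, _ in items:
            if seen.setdefault(spk, name) != name:
                return True
    return False
```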
| accuracy | macro F1 | split |
|---|---|---|
| 0.6796 | 0.6461 | test |
| 0.7277 | 0.7232 | dev |
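For reference, the sketch below shows how accuracy, macro F1 and a confusion matrix can be computed from gold and predicted labels. It assumes the common definition of macro F1 (the unweighted mean of per-class F1 scores, as in scikit-learn's `average="macro"`); the card does not state which implementation produced the reported numbers.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are gold labels, columns are predicted labels, in the order of `labels`."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    m = confusion_matrix(y_true, y_pred, labels)
    f1_scores = []
    for i in range(len(labels)):
        tp = m[i][i]
        fp = sum(row[i] for row in m) - tp  # predicted as class i, gold is another class
        fn = sum(m[i]) - tp                 # gold class i, predicted as another class
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

Macro F1 weights every emotion class equally, so it falls below accuracy when the model does worse on less frequent classes.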
Confusion matrix on test:
## Training hyperparameters
The following arguments were used for fine-tuning:
| arg | value |
|---|---|
| per_device_train_batch_size | 2 |
| per_device_eval_batch_size | 2 |
| gradient_accumulation_steps | 2 |
| num_train_epochs | 20 |
| learning_rate | 1e-4 |
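The argument names in the table match fields of Hugging Face `transformers.TrainingArguments`, which suggests, though the card does not say so, that the Trainer API was used. With gradients accumulated over 2 steps, each optimizer update sees 2 × 2 = 4 examples per device. The sketch below makes that arithmetic explicit; the number of devices and the training-set size are not given in the card, so `num_devices` and `num_train_examples` are hypothetical inputs.

```python
import math

# Values copied from the hyperparameter table above.
HPARAMS = {
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 2,
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 20,
    "learning_rate": 1e-4,
}


def effective_batch_size(hp, num_devices=1):
    """Examples contributing to one optimizer update (num_devices is an assumption)."""
    return hp["per_device_train_batch_size"] * hp["gradient_accumulation_steps"] * num_devices


def optimizer_steps(hp, num_train_examples, num_devices=1):
    """Approximate total optimizer updates; Trainer's exact count can differ slightly."""
    steps_per_epoch = math.ceil(num_train_examples / effective_batch_size(hp, num_devices))
    return steps_per_epoch * hp["num_train_epochs"]
```

For instance, a hypothetical training split of 1,000 utterances on one device would give 250 updates per epoch and 5,000 over 20 epochs.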