How to use jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli")

# Load model directly
from transformers import AutoProcessor, AutoModelForCTC

processor = AutoProcessor.from_pretrained("jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli")
model = AutoModelForCTC.from_pretrained("jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli")
```
This checkpoint is a wav2vec2-large model that is useful for generating transcriptions with punctuation. It is intended for use in building transcriptions for TTS models, where punctuation is very important for prosody.
This model was created by fine-tuning the facebook/wav2vec2-large-robust-ft-libri-960h checkpoint on the LibriTTS and VoxPopuli datasets with a new vocabulary that includes punctuation.
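To illustrate why the vocabulary matters, here is a minimal sketch of greedy CTC decoding, the usual way per-frame predictions from a CTC model are turned into text. The toy `VOCAB`, the `BLANK_ID`, and the `greedy_ctc_decode` helper are all hypothetical and do not reflect the model's real symbol table; the point is only that punctuation appears in the output because it exists as ordinary tokens in the vocabulary.

```python
from itertools import groupby
from typing import Sequence

# Hypothetical toy vocabulary for illustration only; the real symbol
# table used by the checkpoint is different.
VOCAB = ["<pad>", " ", "h", "i", ",", "y", "o", "u", "."]
BLANK_ID = 0  # assumed: index 0 doubles as the CTC blank token


def greedy_ctc_decode(
    frame_ids: Sequence[int],
    vocab: Sequence[str] = VOCAB,
    blank_id: int = BLANK_ID,
) -> str:
    """Collapse consecutive repeats, then drop blanks (standard CTC rule)."""
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(vocab[i] for i in collapsed if i != blank_id)


# Per-frame argmax ids, as a CTC model might emit them.
frames = [2, 2, 0, 3, 4, 4, 1, 0, 5, 6, 6, 7, 8, 8]
print(greedy_ctc_decode(frames))  # prints "hi, you."
```

With a vocabulary that lacks "," and ".", the same decoding step could never emit them, which is why the fine-tune needed a new vocabulary rather than only new training data.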
The model gets a respectable WER of 4.45% on the LibriSpeech validation set. The baseline, facebook/wav2vec2-large-robust-ft-libri-960h, got 4.3%.
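For reference, word error rate is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The sketch below is a plain-Python version of that definition; `word_error_rate` is a hypothetical helper, not the script used to produce the numbers above, and this card does not say whether punctuation was stripped before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Punctuation attached to a word makes it a different token, so a
# punctuated hypothesis scores worse against an unpunctuated reference
# unless both sides are normalized first.
print(word_error_rate("hello world", "hello, world"))  # prints 0.5
```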
Since the model was fine-tuned on clean audio, it is not well suited to noisy audio like Common Voice (though I may upload a checkpoint for that soon too). It still does pretty well, though.
The vocabulary is uploaded to the model hub as well: jbetker/tacotron_symbols.
Check out my speech transcription script repo, ocotillo, for usage examples: https://github.com/neonbjb/ocotillo