Instructions to use microsoft/VibeVoice-ASR with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use microsoft/VibeVoice-ASR with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="microsoft/VibeVoice-ASR")

# Load model directly
from transformers import VibeVoiceForASRTraining

model = VibeVoiceForASRTraining.from_pretrained("microsoft/VibeVoice-ASR", dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
VibeVoice ASR is part of Transformers from v5.3.0
#20
by bezzam - opened
Here is the checkpoint compatible with Transformers 🤗: https://huggingface.co/microsoft/VibeVoice-ASR-HF
TODO: update the LoRA fine-tuning tutorial, since the state dict has changed. You can also see the mapping from the original checkpoint to the Transformers one here.
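For anyone who has weights or adapters saved against the original layout, a state-dict change usually means the parameter names need to be renamed before loading. Below is a minimal sketch of that idea in plain Python. The rename rules and key names are hypothetical placeholders for illustration, not the actual VibeVoice mapping (that is the one linked above), and the values are dummy integers standing in for tensors.

```python
import re

# Hypothetical rename rules (regex pattern -> replacement).
# These are NOT the real VibeVoice key names; take the real ones from the linked mapping.
HYPOTHETICAL_RULES = [
    (r"^old_encoder\.", "model.encoder."),
    (r"\.attn\.qkv_proj\.", ".self_attn.qkv."),
]


def rename_keys(state_dict, rules):
    """Return a new dict with every rule applied, in order, to each key."""
    renamed = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, replacement in rules:
            new_key = re.sub(pattern, replacement, new_key)
        # Guard against two source keys collapsing onto the same target key.
        if new_key in renamed:
            raise ValueError(f"two source keys map to {new_key!r}")
        renamed[new_key] = value
    return renamed


# Dummy values stand in for tensors.
example = {
    "old_encoder.layers.0.attn.qkv_proj.weight": 1,
    "lm_head.weight": 2,
}
converted = rename_keys(example, HYPOTHETICAL_RULES)
```

Keys that match no rule pass through unchanged, which makes it easy to spot leftovers by comparing the converted keys against the keys the target model expects.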
I tried it with a 90-second sample file, but all it did was immediately OOM on me on a 24 GB 4090. The Gradio demo could handle a 30-minute file without issues.
@andypotato thanks for trying it out! Could you try adjusting the acoustic tokenizer chunk size as described here? Maybe the Gradio demo used a different value.
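As a rough illustration of why the chunk size matters for memory: if the audio is encoded in pieces, the largest piece bounds how many samples are processed at once, rather than the whole file. The sketch below is plain Python and does not call the Transformers API; the 24 kHz sample rate and the 30-second chunk are assumed values for the example, and the actual parameter name and default are the ones in the linked description. It also ignores any overlap or state carried across chunk boundaries.

```python
def iter_chunks(num_samples, chunk_size):
    """Yield (start, end) index pairs covering num_samples in pieces of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, num_samples, chunk_size):
        yield start, min(start + chunk_size, num_samples)


SAMPLE_RATE = 24_000                    # assumed sample rate, for illustration only
num_samples = 90 * SAMPLE_RATE          # a 90-second file
chunk_size = 30 * SAMPLE_RATE           # assumed chunk of 30 seconds

chunks = list(iter_chunks(num_samples, chunk_size))
# The largest chunk, not the full file, determines how much audio is encoded at once.
largest = max(end - start for start, end in chunks)
```

With these numbers the 90-second file is split into three equal pieces; a smaller chunk size lowers the per-step footprint at the cost of more steps.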