VibeVoice ASR is part of Transformers from v5.3.0

#20

by bezzam - opened Mar 5

Mar 5

Here is the checkpoint compatible with Transformers 🤗: https://huggingface.co/microsoft/VibeVoice-ASR-HF

TODO: update the LoRA fine-tuning tutorial, as the state dict has changed. You can also see the mapping from the original state dict to the Transformers one here.

andypotato

Mar 8

I tried it with a 90-second sample file, but it immediately OOMed on a 24 GB 4090. The Gradio demo could handle a 30-minute file without issues.

bezzam

Mar 9

@andypotato thanks for trying it out! Could you try adjusting the acoustic tokenizer chunk size as described here? Maybe the Gradio demo used a different value.
