How to use from the VibeVoice library
import torch, soundfile as sf, librosa
from vibevoice.processor.vibevoice_processor import VibeVoiceProcessor
from vibevoice.modular.modeling_vibevoice_inference import VibeVoiceForConditionalGenerationInference

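# Load the voice sample; the model expects 24 kHz mono audio
voice, sr = sf.read("path/to/voice_sample.wav")
if voice.ndim > 1: voice = voice.mean(axis=1)  # downmix to mono
# Recent librosa releases accept the sample rates as keyword arguments only
if sr != 24000: voice = librosa.resample(voice, orig_sr=sr, target_sr=24000)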

processor = VibeVoiceProcessor.from_pretrained("gafiatulin/vibevoice-1.5b-coreai")
model = VibeVoiceForConditionalGenerationInference.from_pretrained(
    "gafiatulin/vibevoice-1.5b-coreai", torch_dtype=torch.bfloat16
).to("cuda").eval()
model.set_ddpm_inference_steps(5)

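inputs = processor(text=["Speaker 0: Hello!\nSpeaker 1: Hi there!"],
                   voice_samples=[[voice]], return_tensors="pt")
# Move tensor inputs to the same device as the model
inputs = {k: v.to("cuda") if torch.is_tensor(v) else v for k, v in inputs.items()}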
audio = model.generate(**inputs, cfg_scale=1.3,
                       tokenizer=processor.tokenizer).speech_outputs[0]
# Cast to float32 before converting: NumPy has no bfloat16 dtype
sf.write("output.wav", audio.float().cpu().numpy().squeeze(), 24000)
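
The `text` argument in the snippet above uses one `Speaker N: ...` line per turn. As a minimal sketch, that string could be assembled from a list of turns like this. `build_script` is a hypothetical helper, not part of the VibeVoice library, and it assumes the line format shown above is all the processor expects.

```python
from typing import Iterable, Tuple


def build_script(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker_index, utterance) pairs into 'Speaker N: text' lines.

    Hypothetical helper that mirrors the format used in the snippet above.
    """
    lines = []
    for speaker, utterance in turns:
        if speaker < 0:
            raise ValueError("speaker index must be non-negative")
        # Collapse internal whitespace so each turn stays on a single line.
        lines.append(f"Speaker {speaker}: {' '.join(utterance.split())}")
    return "\n".join(lines)


script = build_script([(0, "Hello!"), (1, "Hi there!")])
```

The result can then be passed as `text=[script]` in place of the hard-coded string.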

VibeVoice 1.5B: multi-speaker TTS (Core AI)

Multi-speaker text-to-speech, 1.5B Qwen2 backbone. INT8 LM.

Source & export pipeline: github.com/gafiatulin/vibevoice-coreai

On-device performance (M4 Max, Core AI): 4.99× RTF.
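
For readers who want to reproduce a figure like this, here is a rough sketch of the arithmetic. It assumes the "×" means seconds of audio generated per second of wall-clock time, which the card does not state explicitly. `speed_factor` is a hypothetical name, and the numbers in the example are illustrative rather than a measurement.

```python
SAMPLE_RATE = 24_000  # Hz, the output rate used in the usage snippet


def speed_factor(num_samples: int, wall_seconds: float,
                 sample_rate: int = SAMPLE_RATE) -> float:
    """Seconds of audio produced per second of wall-clock time.

    ASSUMPTION: this is the quantity the card reports as "x RTF".
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    audio_seconds = num_samples / sample_rate
    return audio_seconds / wall_seconds


# Illustrative only: 239,520 samples is 9.98 s of audio at 24 kHz.
example = speed_factor(239_520, wall_seconds=2.0)
```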

⚠️ Beta artifacts. These .aimodel bundles are compiled for macOS 27 / Xcode 27 beta (Core AI). They may need re-export on the GA toolchain. The original weights are Microsoft VibeVoice (see upstream for the model license).

Layout

vibevoice-1.5b-coreai/
  manifest.json        # role → {variant: path} + recommended flags
  embed_tokens.f16     # host-side embed table
  tokenizer/           # tokenizer files
  lm/lm-embeds.aimodel/
  diffusion/diffusion-head.aimodel/
  diffusion/fused-sampler.aimodel/
  codec/acoustic-decoder.aimodel/
  codec/acoustic-encoder.aimodel/
  codec/semantic-encoder.aimodel/
  connector/acoustic-connector.aimodel/
  connector/semantic-connector.aimodel/
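
After downloading, it can be useful to confirm that a local copy matches the layout above. The following is a minimal sketch: the path list is copied from the listing, while `missing_assets` and the local directory argument are hypothetical.

```python
from pathlib import Path
from typing import List, Union

# Relative paths copied from the layout listing above.
EXPECTED = [
    "manifest.json",
    "embed_tokens.f16",
    "tokenizer",
    "lm/lm-embeds.aimodel",
    "diffusion/diffusion-head.aimodel",
    "diffusion/fused-sampler.aimodel",
    "codec/acoustic-decoder.aimodel",
    "codec/acoustic-encoder.aimodel",
    "codec/semantic-encoder.aimodel",
    "connector/acoustic-connector.aimodel",
    "connector/semantic-connector.aimodel",
]


def missing_assets(root: Union[str, Path]) -> List[str]:
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

An empty list from `missing_assets("vibevoice-1.5b-coreai")` would indicate that every listed file and bundle is present.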

Roles

Resolve assets by role via manifest.json (default = recommended variant):

{
  "lm": {
    "default": "lm/lm-embeds.aimodel"
  },
  "diffusion": {
    "default": "diffusion/fused-sampler.aimodel",
    "per_step": "diffusion/diffusion-head.aimodel"
  },
  "acoustic_encoder": {
    "default": "codec/acoustic-encoder.aimodel"
  },
  "acoustic_decoder": {
    "default": "codec/acoustic-decoder.aimodel"
  },
  "semantic_encoder": {
    "default": "codec/semantic-encoder.aimodel"
  },
  "acoustic_connector": {
    "default": "connector/acoustic-connector.aimodel"
  },
  "semantic_connector": {
    "default": "connector/semantic-connector.aimodel"
  }
}
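
A minimal sketch of role-based lookup over a mapping shaped like the one above. How this mapping is nested inside manifest.json is not shown here, so the sketch takes the role mapping directly. `resolve` is a hypothetical helper, not an API from the repository.

```python
from typing import Mapping


def resolve(roles: Mapping[str, Mapping[str, str]], role: str,
            variant: str = "default") -> str:
    """Return the relative asset path for a role and variant."""
    if role not in roles:
        raise KeyError(f"unknown role: {role!r}")
    variants = roles[role]
    if variant not in variants:
        raise KeyError(f"role {role!r} has no variant {variant!r}")
    return variants[variant]


# Two entries copied from the listing above, for illustration.
roles = {
    "lm": {"default": "lm/lm-embeds.aimodel"},
    "diffusion": {
        "default": "diffusion/fused-sampler.aimodel",
        "per_step": "diffusion/diffusion-head.aimodel",
    },
}

fused_path = resolve(roles, "diffusion")
per_step_path = resolve(roles, "diffusion", "per_step")
```

Falling back to `"default"` keeps callers that do not care about variants on the recommended asset.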

Recommended flags

{
  "fused": true,
  "steps": 10,
  "lm": "int8",
  "cfg": 3.0,
  "decode_compute": "gpu",
  "sem_compute": "gpu"
}
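
The flags above could be loaded into a typed structure along these lines. This is a sketch under stated assumptions: `RecommendedFlags` is a hypothetical class, and the link between `fused` and the diffusion variants (`default` for the fused sampler, `per_step` otherwise) is inferred from the role listing rather than documented.

```python
import json
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class RecommendedFlags:
    fused: bool
    steps: int
    lm: str
    cfg: float
    decode_compute: str
    sem_compute: str

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "RecommendedFlags":
        flags = cls(
            fused=bool(raw["fused"]),
            steps=int(raw["steps"]),
            lm=str(raw["lm"]),
            cfg=float(raw["cfg"]),
            decode_compute=str(raw["decode_compute"]),
            sem_compute=str(raw["sem_compute"]),
        )
        if flags.steps < 1:
            raise ValueError("steps must be at least 1")
        return flags

    def diffusion_variant(self) -> str:
        # ASSUMPTION: fused=True maps to the "default" (fused sampler) variant.
        return "default" if self.fused else "per_step"


# The JSON text is copied from the recommended flags shown above.
flags = RecommendedFlags.from_mapping(json.loads(
    '{"fused": true, "steps": 10, "lm": "int8", "cfg": 3.0,'
    ' "decode_compute": "gpu", "sem_compute": "gpu"}'
))
```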