VibeVoice 1.5B: multi-speaker TTS (Core AI)

Multi-speaker text-to-speech built on a 1.5B Qwen2 backbone. The LM is quantized to INT8.

Source & export pipeline: github.com/gafiatulin/vibevoice-coreai

On-device performance (M4 Max, Core AI): 4.99× RTF.

⚠️ Beta artifacts. These .aimodel bundles are compiled for macOS 27 / Xcode 27 beta (Core AI) and may need to be re-exported on the GA toolchain. The original weights are Microsoft VibeVoice (see upstream for the model license).

Layout

vibevoice-1.5b-coreai/
  manifest.json        # role → {variant: path} + recommended flags
  embed_tokens.f16     # host-side embed table
  tokenizer/           # tokenizer files
  lm/lm-embeds.aimodel/
  diffusion/diffusion-head.aimodel/
  diffusion/fused-sampler.aimodel/
  codec/acoustic-decoder.aimodel/
  codec/acoustic-encoder.aimodel/
  codec/semantic-encoder.aimodel/
  connector/acoustic-connector.aimodel/
  connector/semantic-connector.aimodel/

Roles

Resolve assets by role via manifest.json (default = recommended variant):

{
  "lm": {
    "default": "lm/lm-embeds.aimodel"
  },
  "diffusion": {
    "default": "diffusion/fused-sampler.aimodel",
    "per_step": "diffusion/diffusion-head.aimodel"
  },
  "acoustic_encoder": {
    "default": "codec/acoustic-encoder.aimodel"
  },
  "acoustic_decoder": {
    "default": "codec/acoustic-decoder.aimodel"
  },
  "semantic_encoder": {
    "default": "codec/semantic-encoder.aimodel"
  },
  "acoustic_connector": {
    "default": "connector/acoustic-connector.aimodel"
  },
  "semantic_connector": {
    "default": "connector/semantic-connector.aimodel"
  }
}
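
The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the loader from the export pipeline: the helper name `resolve_asset` is hypothetical, and it assumes the role table is available as a plain mapping shaped like the JSON shown above (how the recommended flags are stored inside manifest.json is not assumed here).

```python
import json

# Excerpt of the role table shown above (two roles are enough for the sketch).
MANIFEST_EXCERPT = json.loads("""
{
  "lm": {"default": "lm/lm-embeds.aimodel"},
  "diffusion": {
    "default": "diffusion/fused-sampler.aimodel",
    "per_step": "diffusion/diffusion-head.aimodel"
  }
}
""")


def resolve_asset(manifest: dict, role: str, variant: str = "default") -> str:
    """Hypothetical helper: return the bundle path for a role/variant pair."""
    variants = manifest.get(role)
    if not isinstance(variants, dict):
        raise KeyError(f"unknown role: {role!r}")
    if variant not in variants:
        raise KeyError(f"role {role!r} has no variant {variant!r}")
    return variants[variant]


# With no variant given, the recommended ("default") bundle is returned.
print(resolve_asset(MANIFEST_EXCERPT, "diffusion"))
print(resolve_asset(MANIFEST_EXCERPT, "diffusion", "per_step"))
```

The returned paths are relative to the repository root, so a caller would join them with the directory where vibevoice-1.5b-coreai/ was downloaded.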

Recommended flags

{
  "fused": true,
  "steps": 10,
  "lm": "int8",
  "cfg": 3.0,
  "decode_compute": "gpu",
  "sem_compute": "gpu"
}
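
The card does not spell out how these flags map onto the variants in the role table. One plausible reading, sketched below in Python, is that `fused: true` selects the fused sampler (the `default` diffusion variant) and `fused: false` falls back to the per-step diffusion head. That mapping and the helper name `diffusion_variant` are assumptions made for illustration, not confirmed behaviour of the export pipeline.

```python
# Recommended flags, copied from the block above.
RECOMMENDED_FLAGS = {
    "fused": True,
    "steps": 10,
    "lm": "int8",
    "cfg": 3.0,
    "decode_compute": "gpu",
    "sem_compute": "gpu",
}


def diffusion_variant(flags: dict) -> str:
    """Hypothetical helper: choose the diffusion variant key from the flags.

    Assumption: fused=True means the fused sampler ("default"),
    fused=False means the per-step diffusion head ("per_step").
    """
    return "default" if flags.get("fused", True) else "per_step"


print(diffusion_variant(RECOMMENDED_FLAGS))
print(diffusion_variant({**RECOMMENDED_FLAGS, "fused": False}))
```

The resulting key can then be used as the variant name when resolving the `diffusion` role through manifest.json.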