---
license: mit
language:
- en
pipeline_tag: text-to-speech
tags:
- vibevoice
- core-ai
- apple-silicon
- on-device
- m4-max
- text-to-speech
base_model:
- vibevoice/VibeVoice-7B
---

# VibeVoice 7B: multi-speaker TTS (Core AI)

High-quality multi-speaker TTS on a 7B Qwen2 backbone, with INT8 and INT4 quantized LM variants.

**Source & export pipeline:** [github.com/gafiatulin/vibevoice-coreai](https://github.com/gafiatulin/vibevoice-coreai)

**On-device performance (M4 Max, Core AI):** 2.37× RTF.

> ⚠️ **Beta artifacts.** These `.aimodel` bundles are compiled for macOS 27 / Xcode 27 beta (Core AI). They may need re-export on the GA toolchain.

The original weights are Microsoft VibeVoice (see upstream for the model license).

## Layout

```
vibevoice-7b-coreai/
  manifest.json          # role → {variant: path} + recommended flags
  embed_tokens.f16       # host-side embed table
  tokenizer/             # tokenizer files
  lm/lm-embeds-int4.aimodel/
  lm/lm-embeds.aimodel/
  diffusion/fused-sampler-s10.aimodel/
  diffusion/fused-sampler.aimodel/
  codec/acoustic-decoder.aimodel/
  codec/acoustic-encoder.aimodel/
  codec/semantic-encoder.aimodel/
  connector/acoustic-connector.aimodel/
  connector/semantic-connector.aimodel/
```

## Roles

Resolve assets by role via `manifest.json` (`default` = recommended variant):

```json
{
  "lm": {
    "default": "lm/lm-embeds.aimodel",
    "int4": "lm/lm-embeds-int4.aimodel"
  },
  "diffusion": {
    "default": "diffusion/fused-sampler.aimodel",
    "s10": "diffusion/fused-sampler-s10.aimodel"
  },
  "acoustic_encoder": { "default": "codec/acoustic-encoder.aimodel" },
  "acoustic_decoder": { "default": "codec/acoustic-decoder.aimodel" },
  "semantic_encoder": { "default": "codec/semantic-encoder.aimodel" },
  "acoustic_connector": { "default": "connector/acoustic-connector.aimodel" },
  "semantic_connector": { "default": "connector/semantic-connector.aimodel" }
}
```

## Recommended flags

```json
{
  "model": "7b",
  "fused": true,
  "steps": 8,
  "lm": "int8",
  "auto_chunk": 40,
  "decode_compute": "gpu",
  "sem_compute": "gpu"
}
```
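
## Example: resolving a role

The role lookup described under Roles can be sketched in a few lines of Python. This is a minimal illustration and not code from the export pipeline: `resolve_asset` and the inline `MANIFEST` excerpt are hypothetical names, and the sketch assumes the role map sits at the top level of `manifest.json` exactly as shown in the Roles snippet (the real file also carries the recommended flags, so its nesting may differ).

```python
# Hypothetical excerpt mirroring the Roles snippet; in practice, load it with
# json.loads(Path(bundle_dir, "manifest.json").read_text()).
MANIFEST = {
    "lm": {
        "default": "lm/lm-embeds.aimodel",
        "int4": "lm/lm-embeds-int4.aimodel",
    },
    "diffusion": {
        "default": "diffusion/fused-sampler.aimodel",
        "s10": "diffusion/fused-sampler-s10.aimodel",
    },
    "acoustic_decoder": {"default": "codec/acoustic-decoder.aimodel"},
}


def resolve_asset(manifest: dict, role: str, variant: str = "default") -> str:
    """Return the bundle-relative path for a role and variant.

    Raises KeyError when the role or the variant is not in the manifest.
    """
    if role not in manifest:
        raise KeyError(f"unknown role {role!r}; available: {sorted(manifest)}")
    variants = manifest[role]
    if variant not in variants:
        raise KeyError(
            f"role {role!r} has no variant {variant!r}; available: {sorted(variants)}"
        )
    return variants[variant]
```

A caller would typically join the result with the bundle root, for example `Path(bundle_dir) / resolve_asset(MANIFEST, "lm", "int4")`. Failing loudly on an unknown variant, instead of silently falling back to `default`, is a design choice made for this sketch.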