VoxCPM: AfriSpeech Multilingual TTS (50 African Languages)
Full fine-tune of openbmb/VoxCPM-0.5B on all 50 language subsets of AfriSpeech/africa-speech, merged into a single training manifest (mono 16 kHz WAV).
Try it live: AfriSpeech/VoxCPM-AfriSpeech
Supported languages
Afar, Akan (Twi), Amharic, Baoule, Bemba, Burkina Faso Fulfulde, Dan, Ewe, Fon, Fulani, Ganda (Luganda), Hausa, Igbo, Jola-Kasa, Kalanga, Kalenjin, Kikuyu, Lingala, Lozi, Luba-Lulua, Makhuwa-Shirima, Malgache, Mankanya, Mbunda, Mende, Mossi, Ngambay, Northeastern Dinka, Nyanja, Oromo (Borana-Arsi-Guji), Pular, Punu, Rundi (Kirundi), Rwandan (Kinyarwanda), Sango, Shilluk, Shona, Somali, Sukuma, Swahili, Tarifit, Tashelhayt, Tigrinya, Tiv, Tumbuka, West Central Oromo, Western Niger Fulfulde, Wolof (Senegal), Yaka, Yoruba.
Training details
| Setting | Value |
| --- | --- |
| Base model | openbmb/VoxCPM-0.5B (full fine-tune, no LoRA) |
| Data | all 50 configs of AfriSpeech/africa-speech, mono 16 kHz WAV |
| Epochs | 2 (12,364 optimizer steps, effective batch 16) |
| LR / warmup | 1e-5 / 200 steps |
| Hardware | 1× A100-80GB (Modal) |
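The step count, batch size, and warmup length in the table imply a rough training-set size and warmup share. The sketch below works that arithmetic out. It assumes every optimizer step consumed exactly 16 clips, and it assumes a linear warmup that then holds at the peak rate; the card states neither the warmup shape nor any later decay, so `warmup_lr` is illustrative only.

```python
# Figures copied from the training table.
OPTIMIZER_STEPS = 12_364
EFFECTIVE_BATCH = 16
EPOCHS = 2
WARMUP_STEPS = 200
PEAK_LR = 1e-5

# Assumption: each optimizer step consumes exactly EFFECTIVE_BATCH clips.
clips_seen = OPTIMIZER_STEPS * EFFECTIVE_BATCH
clips_per_epoch = clips_seen // EPOCHS
warmup_fraction = WARMUP_STEPS / OPTIMIZER_STEPS


def warmup_lr(step: int) -> float:
    """Assumed linear warmup to PEAK_LR; the schedule shape is not documented."""
    return PEAK_LR * min(1.0, step / WARMUP_STEPS)


print(clips_seen)                 # 197824
print(clips_per_epoch)            # 98912
print(f"{warmup_fraction:.2%}")   # 1.62%
```

Under these assumptions the merged manifest holds on the order of 99,000 clips, and warmup covers under 2% of training.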
Weights are stored as pytorch_model.bin wrapped in {"state_dict": ...},
the format VoxCPM.from_pretrained() expects. Optimizer/scheduler states are
not included.
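To inspect the weights outside VoxCPM, the wrapper has to be peeled off first. The following is a minimal sketch, assuming pytorch_model.bin is an ordinary `torch.save` pickle and has already been downloaded locally. The helper names `unwrap_state_dict` and `load_weights` are hypothetical and are not part of the VoxCPM API.

```python
from typing import Any, Mapping


def unwrap_state_dict(checkpoint: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return the parameter mapping, whether or not it is wrapped."""
    inner = checkpoint.get("state_dict")
    if isinstance(inner, Mapping):
        return inner
    # Already a bare state dict: hand it back unchanged.
    return checkpoint


def load_weights(path: str = "pytorch_model.bin") -> Mapping[str, Any]:
    """Load the checkpoint on CPU and strip the {"state_dict": ...} wrapper."""
    import torch  # imported lazily so unwrap_state_dict works without torch

    # Assumption: the file contains only tensors and plain containers.
    checkpoint = torch.load(path, map_location="cpu", weights_only=True)
    return unwrap_state_dict(checkpoint)
```

Calling `load_weights()` and listing the first few keys and tensor shapes is a quick way to confirm a download is intact.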
Usage
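from voxcpm import VoxCPM
import soundfile as sf

# Download the fine-tuned checkpoint from the Hub and load it.
model = VoxCPM.from_pretrained("AfriSpeech/voxcpm-afrispeech-full-inference-20260606")

# Synthesize a Swahili sentence.
wav = model.generate(
    text="Karibu sana! Tunafurahi kukuona hapa leo.",
    inference_timesteps=10,
    cfg_value=2.0,
)

# Write the waveform at 16 kHz, matching the training audio.
sf.write("out.wav", wav, 16000)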
For voice cloning, pass prompt_wav_path and prompt_text (a 3-10 s reference
clip and its transcript) to model.generate(...).
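The sketch below checks that a reference clip falls in the suggested 3-10 s range before it is passed to `model.generate`. The helpers `wav_duration_seconds` and `clone_voice` are hypothetical names introduced here. The duration check uses the standard-library `wave` module, which assumes a PCM-encoded WAV file.

```python
import wave

MIN_SECONDS, MAX_SECONDS = 3.0, 10.0  # reference-clip range suggested above


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file, read from its header."""
    with wave.open(path, "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def clone_voice(model, text: str, reference_wav: str, reference_text: str):
    """Generate `text` in the voice of `reference_wav` (illustrative wrapper)."""
    duration = wav_duration_seconds(reference_wav)
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        raise ValueError(
            f"reference clip is {duration:.1f} s; "
            f"expected {MIN_SECONDS:.0f} to {MAX_SECONDS:.0f} s"
        )
    # prompt_wav_path and prompt_text are the generate() arguments named above.
    return model.generate(
        text=text,
        prompt_wav_path=reference_wav,
        prompt_text=reference_text,
    )
```

With the model loaded as in the Usage snippet, a call looks like `wav = clone_voice(model, "Karibu sana!", "reference.wav", "transcript of the reference clip")`, where `reference.wav` is a placeholder path.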
Related
- Demo Space: AfriSpeech/VoxCPM-AfriSpeech
- Source dataset: AfriSpeech/africa-speech
- Base model: openbmb/VoxCPM-0.5B
Built by AfriSpeech × GhanaNLP.