VoxCPM: AfriSpeech Multilingual TTS (50 African Languages)

Full fine-tune of openbmb/VoxCPM-0.5B on all 50 language subsets of AfriSpeech/africa-speech, merged into a single training manifest (mono 16 kHz WAV).
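The data preparation described above (per-language subsets merged into one manifest of mono 16 kHz WAV clips) can be sketched roughly as follows. This is an illustration, not the script used for this model: the JSONL layout, the field names "audio", "text" and "language", and the helper names is_mono_16k and merge_manifests are all assumptions.

```python
import json
import wave


def is_mono_16k(wav_path, expected_rate=16000):
    """Return True if the WAV file has one channel at the expected sample rate."""
    with wave.open(str(wav_path), "rb") as wf:
        return wf.getnchannels() == 1 and wf.getframerate() == expected_rate


def merge_manifests(rows_by_language, out_path):
    """Write one JSONL manifest from per-language (wav_path, transcript) pairs.

    rows_by_language maps a language name to a list of (wav_path, transcript).
    Clips that are not mono 16 kHz are skipped. Returns the number of rows written.
    The record layout is hypothetical; adapt it to whatever the trainer expects.
    """
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for language, rows in rows_by_language.items():
            for wav_path, text in rows:
                if not is_mono_16k(wav_path):
                    continue  # would need resampling or downmixing first
                record = {"audio": str(wav_path), "text": text, "language": language}
                # ensure_ascii=False keeps non-ASCII transcripts readable in the file
                out.write(json.dumps(record, ensure_ascii=False) + "\n")
                written += 1
    return written
```

Skipping non-conforming clips keeps the sketch short; a real pipeline would more likely resample them than drop them.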

Try it live: AfriSpeech/VoxCPM-AfriSpeech

Supported languages

Afar, Akan (Twi), Amharic, Baoule, Bemba, Burkina Faso Fulfulde, Dan, Ewe, Fon, Fulani, Ganda (Luganda), Hausa, Igbo, Jola-Kasa, Kalanga, Kalenjin, Kikuyu, Lingala, Lozi, Luba-Lulua, Makhuwa-Shirima, Malgache, Mankanya, Mbunda, Mende, Mossi, Ngambay, Northeastern Dinka, Nyanja, Oromo (Borana-Arsi-Guji), Pular, Punu, Rundi (Kirundi), Rwandan (Kinyarwanda), Sango, Shilluk, Shona, Somali, Sukuma, Swahili, Tarifit, Tashelhayt, Tigrinya, Tiv, Tumbuka, West Central Oromo, Western Niger Fulfulde, Wolof (Senegal), Yaka, Yoruba.

Training details

Base model: openbmb/VoxCPM-0.5B (full fine-tune, no LoRA)
Data: all 50 configs of AfriSpeech/africa-speech, mono 16 kHz WAV
Epochs: 2 (12,364 optimizer steps, effective batch 16)
LR / warmup: 1e-5 / 200 steps
Hardware: 1× A100-80GB (Modal)
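The figures above imply a rough training-set size. The sketch below works it out, assuming the 12,364 steps cover both epochs and every step consumes a full effective batch (a partial final batch per epoch would make the true count slightly lower). The warmup_lr function assumes a linear ramp followed by a constant rate; the card states only the peak rate and the warmup length, so the shape of the schedule is a guess.

```python
OPTIMIZER_STEPS = 12_364   # from the training details
EFFECTIVE_BATCH = 16
EPOCHS = 2
WARMUP_STEPS = 200
PEAK_LR = 1e-5

# Total clips processed across the run, and an upper bound per epoch.
samples_seen = OPTIMIZER_STEPS * EFFECTIVE_BATCH      # 197,824
samples_per_epoch = samples_seen // EPOCHS            # 98,912

# Share of the run spent warming up (about 1.6%).
warmup_fraction = WARMUP_STEPS / OPTIMIZER_STEPS


def warmup_lr(step):
    """Linear warmup to PEAK_LR, then constant (assumed schedule shape)."""
    return PEAK_LR * min(1.0, step / WARMUP_STEPS)


if __name__ == "__main__":
    print(f"samples seen: {samples_seen:,}")
    print(f"samples per epoch (upper bound): {samples_per_epoch:,}")
    print(f"warmup fraction: {warmup_fraction:.2%}")
```

So the merged manifest holds on the order of 98,900 clips, and warmup covers under 2% of training.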

Weights are stored as pytorch_model.bin wrapped in {"state_dict": ...}, the format VoxCPM.from_pretrained() expects. Optimizer/scheduler states are not included.
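To inspect the checkpoint outside of VoxCPM, the wrapper has to be peeled off first. A minimal sketch, assuming the file is a plain dict with a "state_dict" key as described above; the helper names unwrap_state_dict and wrap_state_dict are hypothetical and not part of the voxcpm package.

```python
def unwrap_state_dict(checkpoint):
    """Return the bare parameter mapping from a possibly wrapped checkpoint."""
    if isinstance(checkpoint, dict) and "state_dict" in checkpoint:
        return checkpoint["state_dict"]
    return checkpoint  # already a bare state dict


def wrap_state_dict(state_dict):
    """Wrap a bare parameter mapping in the {"state_dict": ...} layout."""
    return {"state_dict": state_dict}


# Typical use with PyTorch (not executed here, requires torch and the file):
#   import torch
#   checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
#   state_dict = unwrap_state_dict(checkpoint)
#   print(len(state_dict), "tensors")
```

Because optimizer and scheduler states are not included, the checkpoint is suited to inference or to starting a fresh fine-tune, not to resuming this exact run.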

Usage

from voxcpm import VoxCPM
import soundfile as sf

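# Download (or load from cache) the fine-tuned checkpoint
model = VoxCPM.from_pretrained("AfriSpeech/voxcpm-afrispeech-full-inference-20260606")

# Synthesize a Swahili sentence
wav = model.generate(
    text="Karibu sana! Tunafurahi kukuona hapa leo.",
    inference_timesteps=10,
    cfg_value=2.0,
)
# Write the waveform at 16 kHz, the sample rate of the training audio
sf.write("out.wav", wav, 16000)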

For voice cloning, pass prompt_wav_path and prompt_text (a 3–10 s reference clip and its transcript) to model.generate(...).
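A voice-cloning call might look like the sketch below. The prompt_wav_path and prompt_text arguments come from the description above; the helpers clip_seconds and clone_voice, and the duration check itself, are illustrative additions rather than part of the voxcpm API. The check uses the standard-library wave module, so it only handles uncompressed PCM WAV files.

```python
import wave


def clip_seconds(wav_path):
    """Duration of a PCM WAV file in seconds."""
    with wave.open(str(wav_path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def clone_voice(model, text, prompt_wav_path, prompt_text, min_s=3.0, max_s=10.0):
    """Generate `text` in the voice of the reference clip.

    `model` is expected to be a loaded VoxCPM instance. The reference clip
    is checked against the suggested 3 to 10 second range before generation.
    """
    duration = clip_seconds(prompt_wav_path)
    if not (min_s <= duration <= max_s):
        raise ValueError(
            f"reference clip is {duration:.1f} s; expected {min_s} to {max_s} s"
        )
    return model.generate(
        text=text,
        prompt_wav_path=prompt_wav_path,  # reference audio
        prompt_text=prompt_text,          # transcript of the reference audio
        inference_timesteps=10,
        cfg_value=2.0,
    )
```

With a loaded model, usage would be along the lines of wav = clone_voice(model, "Karibu sana!", "ref.wav", "transcript of ref.wav"), followed by the same sf.write call as in the basic example.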

Related

Built by AfriSpeech × GhanaNLP.
