Bengali CosyVoice3 TTS

This is a Bengali / Bangla text-to-speech model fine-tuned from CosyVoice3.

Model Description

This model was fine-tuned to improve Bengali speech synthesis quality.

  • Base model: FunAudioLLM/Fun-CosyVoice3-0.5B-2512
  • Language: Bengali / Bangla
  • Task: Text-to-Speech
  • Model type: Fine-tuned CosyVoice3 TTS model
  • Output: Synthetic speech audio
  • Recommended mode: Cross-lingual voice clone

Demo

A Hugging Face Space demo is available here:

https://huggingface.co/spaces/kawshikbuet17/bengali-cosyvoice3-tts-demo

The demo uses cross-lingual voice cloning mode. Users provide Bengali text and a short prompt audio as the reference voice.

Use prompt audio only when you have permission to use that speaker's voice.
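
Before uploading a reference clip, it can help to confirm that it really is short. The following is a minimal sketch using only the Python standard library. The helper names and the 30-second bound are assumptions made for illustration; this model card only says "short" and does not state a limit.

```python
import wave

# Illustrative upper bound in seconds (an assumption, not a documented limit).
MAX_PROMPT_SECONDS = 30.0


def prompt_duration_seconds(wav_path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_short_prompt(wav_path, max_seconds=MAX_PROMPT_SECONDS):
    """Return True if the clip is non-empty and no longer than max_seconds."""
    duration = prompt_duration_seconds(wav_path)
    return 0.0 < duration <= max_seconds
```

For example, `is_short_prompt("prompt.wav")` returns `False` for an empty file or for a clip longer than the chosen bound. The `wave` module reads uncompressed PCM WAV only, so other formats would need converting first.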

Download Model

Install the Hugging Face Hub CLI if it is not already installed:

pip install -U "huggingface_hub>=0.30.0,<1.0.0" hf-xet

Download the model:

mkdir -p pretrained_models/Fun-CosyVoice3-0.5B

hf download kawshikbuet17/bengali-cosyvoice3-tts \
  --repo-type model \
  --local-dir pretrained_models/Fun-CosyVoice3-0.5B

Verify the downloaded files:

ls -lah pretrained_models/Fun-CosyVoice3-0.5B
ls -lah pretrained_models/Fun-CosyVoice3-0.5B/CosyVoice-BlankEN
ls -lah pretrained_models/Fun-CosyVoice3-0.5B/asset
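
The same check can be scripted. This is a minimal Python sketch, not part of the original repository. The helper name `missing_entries` is hypothetical, and the expected subdirectory names are taken only from the `ls` commands above, so passing the check does not prove the download is complete.

```python
from pathlib import Path

# Subdirectories named in the `ls` commands above. The full set of files the
# model needs is not listed in this card, so this is only a partial check.
EXPECTED_SUBDIRS = ("CosyVoice-BlankEN", "asset")


def missing_entries(model_dir, expected=EXPECTED_SUBDIRS):
    """Return the expected subdirectories that are absent or empty."""
    root = Path(model_dir)
    missing = []
    for name in expected:
        sub = root / name
        # An existing but empty directory suggests an interrupted download.
        if not sub.is_dir() or not any(sub.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_entries("pretrained_models/Fun-CosyVoice3-0.5B")
    print("Missing or empty:", problems if problems else "none")
```

An empty result means both subdirectories exist and contain at least one entry.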

Space Implementation Files

The demo application code, Dockerfile, and runtime files are available in the Space repository:

https://huggingface.co/spaces/kawshikbuet17/bengali-cosyvoice3-tts-demo/tree/main

For source code and related development files, see:

https://github.com/kawshikbuet17/cosyvoice-bengali-tts

Training Data

This model was fine-tuned using an internal/private Bengali speech dataset.

The training data is not included in this repository and is not linked publicly.

Important Disclosure

This model generates synthetic speech.

Generated audio should be treated as synthetic. Do not use this model to impersonate a real person, clone voices without permission, or create misleading audio.

Intended Use

This model is intended for:

  • Bengali text-to-speech research
  • Bengali speech synthesis experiments
  • Low-resource Bengali TTS development
  • Academic and demo use
  • Internal product prototyping, subject to license, consent, and data-rights requirements

Out-of-Scope Use

Do not use this model to:

  • Impersonate a real person without consent
  • Clone voices without permission
  • Generate misleading, fraudulent, or deceptive speech
  • Violate privacy, publicity, copyright, or data rights
  • Generate harmful, abusive, or deceptive speech content

Limitations

  • The model may mispronounce uncommon Bengali words, names, numbers, abbreviations, or English-Bengali mixed text.
  • The model may produce unstable output for very long text.
  • The model may not generalize equally across all Bengali dialects, speaking styles, or domains.
  • Generated speech should be reviewed before production use.
  • Voice-cloning or prompt-audio-based generation should only be used with proper permission from the speaker.

Attribution

This model is fine-tuned from CosyVoice3.

Please also acknowledge the original CosyVoice3 project when using this model.

Prepared By

Kawshik Kumar Paul
Dept of CSE, BUET
kawshikbuet17@gmail.com
