---
license: cc-by-4.0
datasets:
- amphion/Emilia-Dataset
language:
- fr
base_model:
- ResembleAI/chatterbox
pipeline_tag: text-to-speech
tags:
- french
- audio
- speech
- tts
- fine-tuning
- chatterbox
- Emilia
- voice-cloning
- zero-shot
---

# Chatterbox TTS French 🥖

**Chatterbox TTS French** is a text-to-speech model fine-tuned from [Chatterbox](https://huggingface.co/ResembleAI/chatterbox) and specialized for French. It was trained on high-quality voice data to produce natural, expressive speech.

<div align="center"><img width="400px" src="https://ih1.redbubble.net/image.5397735048.6235/bg,f8f8f8-flat,750x,075,f-pad,750x1000,f8f8f8.jpg" alt="baguette-france-tour-eiffel-image" /></div>

- 🔊 **Language**: French 🇫🇷
- 🗣️ **Training dataset**: [Emilia Dataset (FR branch)](https://huggingface.co/datasets/amphion/Emilia-Dataset)
- ⏱️ **Data quantity**: 1400 hours of audio

## Usage Example

Here’s how to generate speech using Chatterbox-TTS French:

```python
from typing import Optional

import soundfile as sf
import torch
from chatterbox.tts import ChatterboxTTS
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Configuration
MODEL_REPO = "Thomcles/Chatterbox-TTS-French"
CHECKPOINT_FILENAME = "t3_cfg.safetensors"
OUTPUT_PATH = "output_cloned_voice.wav"
TEXT_TO_SYNTHESIZE = "Jean-Paul Sartre laisse à la postérité une œuvre considérable, tant littéraire que philosophique, ayant influencé à la fois la vie politique française d'après-guerre et les penseurs de son temps (Merleau-Ponty et Alain Badiou notamment)."


def get_device() -> str:
    """Use the GPU when one is available, otherwise fall back to the CPU."""
    return "cuda" if torch.cuda.is_available() else "cpu"


def download_checkpoint(repo: str, filename: str) -> str:
    """Download the checkpoint from the Hub and return its local path."""
    return hf_hub_download(repo_id=repo, filename=filename)


def load_tts_model(repo: str, checkpoint_file: str, device: str) -> ChatterboxTTS:
    """Load the base Chatterbox model, then swap in the French T3 weights."""
    model = ChatterboxTTS.from_pretrained(device=device)
    checkpoint_path = download_checkpoint(repo, checkpoint_file)
    t3_state = load_file(checkpoint_path, device="cpu")
    model.t3.load_state_dict(t3_state)
    return model


def synthesize_speech(
    model: ChatterboxTTS,
    text: str,
    audio_prompt_path: Optional[str] = None,
    **kwargs,
) -> torch.Tensor:
    """Generate a waveform; pass audio_prompt_path to clone a reference voice."""
    with torch.inference_mode():
        return model.generate(
            text=text,
            audio_prompt_path=audio_prompt_path,
            **kwargs,
        )


def save_audio(waveform: torch.Tensor, path: str, sample_rate: int) -> None:
    """Write the waveform to disk as a mono audio file."""
    sf.write(path, waveform.squeeze().cpu().numpy(), sample_rate)


def main() -> None:
    print("Loading model...")
    device = get_device()
    model = load_tts_model(MODEL_REPO, CHECKPOINT_FILENAME, device)

    print(f"Generating speech on {device}...")
    wav = synthesize_speech(
        model,
        TEXT_TO_SYNTHESIZE,
        audio_prompt_path=None,  # set to a WAV file path to clone that voice
        exaggeration=0.5,
        temperature=0.6,
        cfg_weight=0.3,
    )

    print(f"Saving output to: {OUTPUT_PATH}")
    save_audio(wav, OUTPUT_PATH, model.sr)
    print("Done.")


if __name__ == "__main__":
    main()
```

Here is the output:

<audio controls src="https://huggingface.co/Thomcles/Chatterbox-TTS-French/resolve/main/example.mp3">Your browser does not support audio.</audio>
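
The usage script passes `audio_prompt_path=None`, so it does not clone any voice. For zero-shot voice cloning, the same `generate` call can be pointed at a reference recording. The snippet below is a minimal sketch rather than part of the original example: `generate_with_reference` and `my_reference.wav` are hypothetical names, and the helper only assumes that the model object exposes `generate(text=..., audio_prompt_path=...)` as used in the script. It checks that the reference file exists before starting a potentially slow generation.

```python
from pathlib import Path
from typing import Any, Optional, Union


def generate_with_reference(
    model: Any,
    text: str,
    reference_wav: Optional[Union[str, Path]] = None,
    **generate_kwargs: Any,
) -> Any:
    """Synthesize `text`, cloning the voice in `reference_wav` when one is given.

    `model` is any object with a Chatterbox-style `generate` method.
    Hypothetical helper for illustration; not part of the chatterbox package.
    """
    prompt_path = None
    if reference_wav is not None:
        ref = Path(reference_wav)
        # Fail early with a clear message instead of deep inside generation.
        if not ref.is_file():
            raise FileNotFoundError(f"Reference audio not found: {ref}")
        prompt_path = str(ref)

    # Extra keyword arguments (e.g. exaggeration, temperature, cfg_weight)
    # are forwarded unchanged to the model.
    return model.generate(text=text, audio_prompt_path=prompt_path, **generate_kwargs)


# Usage with the model loaded by load_tts_model() from the script
# ("my_reference.wav" is a placeholder for your own recording):
#
#   wav = generate_with_reference(
#       model, TEXT_TO_SYNTHESIZE, "my_reference.wav",
#       exaggeration=0.5, temperature=0.6, cfg_weight=0.3,
#   )
#   save_audio(wav, OUTPUT_PATH, model.sr)
```

Because the helper takes the model as an argument, it can be exercised with a stub object before the real weights are downloaded.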

### Base model license

The base model is licensed under the MIT License.

- Base model: [Chatterbox](https://huggingface.co/ResembleAI/chatterbox)
- License: [MIT](https://choosealicense.com/licenses/mit/)

### Training data license

This model was fine-tuned using a dataset licensed under Creative Commons Attribution 4.0 (CC BY 4.0).

- Dataset: [Emilia](https://huggingface.co/datasets/amphion/Emilia-Dataset)
- License: [Creative Commons Attribution 4.0 International](https://choosealicense.com/licenses/cc-by-4.0/)

### Contact me

Interested in fine-tuning a TTS model in a specific language or building a multilingual voice solution? Don’t hesitate to reach out.