	---
	license: cc-by-4.0
	datasets:
	- amphion/Emilia-Dataset
	language:
	- fr
	base_model:
	- ResembleAI/chatterbox
	pipeline_tag: text-to-speech
	tags:
	- french
	- audio
	- speech
	- tts
	- fine-tuning
	- chatterbox
	- Emilia
	- voice-cloning
	- zero-shot
	---

	# Chatterbox TTS French 🥖

	Chatterbox TTS French is a text-to-speech model fine-tuned from [Chatterbox](https://huggingface.co/ResembleAI/chatterbox) and specialized for French. It was trained on 1400 hours of French speech from the Emilia dataset to produce natural, expressive speech.

	<div align="center"><img width="400px" src="https://ih1.redbubble.net/image.5397735048.6235/bg,f8f8f8-flat,750x,075,f-pad,750x1000,f8f8f8.jpg" alt="baguette-france-tour-eiffel-image" /></div>

	- 🔊 Language: French 🇫🇷
	- 🗣️ Training dataset: [Emilia Dataset (FR branch)](https://huggingface.co/datasets/amphion/Emilia-Dataset)
	- ⏱️ Data quantity: 1400 hours of audio

	## Usage Example

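	The script below generates speech with Chatterbox TTS French. It loads the base Chatterbox model, replaces its T3 weights with the fine-tuned French checkpoint, and writes the result to a WAV file: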

	```python
	import torch
	import soundfile as sf
	from chatterbox.tts import ChatterboxTTS
	from huggingface_hub import hf_hub_download
	from safetensors.torch import load_file

	# Configuration
	MODEL_REPO = "Thomcles/Chatterbox-TTS-French"
	CHECKPOINT_FILENAME = "t3_cfg.safetensors"
	OUTPUT_PATH = "output_cloned_voice.wav"
	TEXT_TO_SYNTHESIZE = "Jean-Paul Sartre laisse à la postérité une œuvre considérable, tant littéraire que philosophique, ayant influencé à la fois la vie politique française d'après-guerre et les penseurs de son temps (Merleau-Ponty et Alain Badiou notamment)."

	def get_device() -> str:
	    # Prefer the GPU when one is available.
	    return "cuda" if torch.cuda.is_available() else "cpu"

	def download_checkpoint(repo: str, filename: str) -> str:
	    # Returns the local path of the cached checkpoint file.
	    return hf_hub_download(repo_id=repo, filename=filename)

	def load_tts_model(repo: str, checkpoint_file: str, device: str) -> ChatterboxTTS:
	    # Load the base Chatterbox model, then overwrite its T3 weights
	    # with the fine-tuned French checkpoint.
	    model = ChatterboxTTS.from_pretrained(device=device)
	    checkpoint_path = download_checkpoint(repo, checkpoint_file)
	    t3_state = load_file(checkpoint_path, device="cpu")
	    model.t3.load_state_dict(t3_state)
	    return model

	def synthesize_speech(model: ChatterboxTTS, text: str, audio_prompt_path: str, **kwargs) -> torch.Tensor:
	    with torch.inference_mode():
	        return model.generate(
	            text=text,
	            audio_prompt_path=audio_prompt_path,
	            **kwargs
	        )

	def save_audio(waveform: torch.Tensor, path: str, sample_rate: int):
	    # soundfile expects a NumPy array on the CPU.
	    sf.write(path, waveform.squeeze().cpu().numpy(), sample_rate)

	def main():
	    print("Loading model...")
	    device = get_device()
	    model = load_tts_model(MODEL_REPO, CHECKPOINT_FILENAME, device)

	    print(f"Generating speech on {device}...")
	    wav = synthesize_speech(
	        model,
	        TEXT_TO_SYNTHESIZE,
	        audio_prompt_path=None,  # set to a reference audio file path to clone a voice
	        exaggeration=0.5,
	        temperature=0.6,
	        cfg_weight=0.3
	    )

	    print(f"Saving output to: {OUTPUT_PATH}")
	    save_audio(wav, OUTPUT_PATH, model.sr)
	    print("Done.")

	if __name__ == "__main__":
	    main()
	```

	Here is the output:

	<audio controls src="https://huggingface.co/Thomcles/Chatterbox-TTS-French/resolve/main/example.mp3">Your browser does not support audio.</audio>
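
The script above passes `audio_prompt_path=None`, so no reference voice is supplied. To try voice cloning, the same `generate` call can be given the path to a reference recording. The sketch below is one possible way to wrap that call: the helper names `build_generation_kwargs` and `clone_voice` and the file name `speaker_reference.wav` are hypothetical and not part of Chatterbox. It assumes only the `generate(text=..., audio_prompt_path=..., **kwargs)` call already used in the script.

```python
from pathlib import Path
from typing import Any, Optional

# Sampling settings copied from the usage script above.
DEFAULT_GENERATION_KWARGS = {"exaggeration": 0.5, "temperature": 0.6, "cfg_weight": 0.3}


def build_generation_kwargs(reference_wav: Optional[str] = None, **overrides: float) -> dict:
    """Merge the default settings with overrides and attach the optional reference clip.

    Hypothetical helper: fails early if the reference file is missing,
    before any synthesis work starts.
    """
    kwargs: dict = {**DEFAULT_GENERATION_KWARGS, **overrides}
    if reference_wav is None:
        kwargs["audio_prompt_path"] = None
    else:
        path = Path(reference_wav)
        if not path.is_file():
            raise FileNotFoundError(f"Reference audio not found: {path}")
        kwargs["audio_prompt_path"] = str(path)
    return kwargs


def clone_voice(model: Any, text: str, reference_wav: Optional[str] = None, **overrides: float):
    """Call model.generate with the merged settings.

    `model` is assumed to be the object returned by load_tts_model in the
    script above; any object exposing the same generate call works.
    """
    return model.generate(text=text, **build_generation_kwargs(reference_wav, **overrides))
```

With the model loaded as in the script above, a call such as `wav = clone_voice(model, TEXT_TO_SYNTHESIZE, "speaker_reference.wav", temperature=0.7)` would synthesize the text using the reference recording, and the result can be written to disk with `save_audio` as before.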

	### Base Model License

	The base model is licensed under the MIT License.

	- Base model: [Chatterbox](https://huggingface.co/ResembleAI/chatterbox)
	- License: [MIT](https://choosealicense.com/licenses/mit/)

	### Training Data License

	This model was fine-tuned on a dataset licensed under Creative Commons Attribution 4.0 (CC BY 4.0).

	- Dataset: [Emilia](https://huggingface.co/datasets/amphion/Emilia-Dataset)
	- License: [Creative Commons Attribution 4.0 International](https://choosealicense.com/licenses/cc-by-4.0/)


	### Contact Me

	Interested in fine-tuning a TTS model in a specific language or building a multilingual voice solution? Don’t hesitate to reach out.