Model Card for yourtts-htia-240704
yourtts-htia-240704 is an experimental Taiwanese Hakka text-to-speech (TTS) model based on YourTTS.
The model is designed for synthesizing Taiwanese Hakka speech and is part of the VoxHakka project. For more details, audio samples, and system information, please refer to the project page.
This checkpoint was trained on multi-speaker speech data covering the following Taiwanese Hakka dialects:
- Sixian
- Hailu
- Dapu
- Raoping
- Zhaoan
Model Details
- Architecture: YourTTS
- Task: Text-to-speech
- Language: Taiwanese Hakka (hak)
- Supported dialects: Sixian, Hailu, Dapu, Raoping, and Zhaoan
- Sample rate: 22,050 Hz
- Training data: Multi-speaker Taiwanese Hakka speech data from more than 19 speakers
- Speaker conditioning: Speaker encoder
- Language conditioning: Language embedding
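Language conditioning is selected by name at inference time (the script in the Usage section passes language_name="sixian"). The following is a minimal sketch for listing the names a checkpoint accepts. It assumes that language_ids.json in the model repository is a JSON object mapping language names to integer IDs, as is usual for Coqui TTS checkpoints; that layout is not confirmed by this model card, and list_language_names is a hypothetical helper.

```python
import json


def list_language_names(path: str) -> list[str]:
    """Return the language names in a language_ids.json file, ordered by ID.

    Assumption: the file holds a JSON object such as {"name": 0, "other": 1}.
    """
    with open(path, "r", encoding="utf-8") as f:
        mapping = json.load(f)
    return [name for name, _ in sorted(mapping.items(), key=lambda kv: kv[1])]
```

With the model downloaded through snapshot_download, the path would be the language_ids.json file inside the returned directory.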
Intended Use
This model is intended for:
- Taiwanese Hakka speech synthesis research
- Taiwanese Hakka language technology development
- Educational and non-commercial demonstrations
- Experiments on multi-speaker and dialect-aware text-to-speech
Commercial use is outside the intended scope: the model is released under the CC BY-NC 4.0 license, which does not permit it.
Usage
Local Demo
The recommended way to run this model locally is to use the official Space implementation, since it includes the Taiwanese Hakka G2P frontend and the required YourTTS configuration patch.
git clone https://huggingface.co/spaces/united-link/taiwanese-hakka-tts
cd taiwanese-hakka-tts
pip install -r requirements.txt
python app.py
Programmatic Inference
The following example is adapted from the Space implementation. It assumes that you run the script inside the cloned Space repository so that replace/tts.py and the required dependencies are available.
import os
import re

import numpy as np
import torch
import TTS
from formog2p.hakka import g2p
from huggingface_hub import snapshot_download
from scipy.io.wavfile import write as write_wav
from TTS.utils.synthesizer import Synthesizer

from replace.tts import ChangedVitsConfig

# Apply the YourTTS configuration patch shipped with the Space.
TTS.tts.configs.vits_config.VitsConfig = ChangedVitsConfig

MODEL_ID = "formospeech/yourtts-htia-240704"

# This example uses Sixian Taiwanese Hakka.
DIALECT = "sixian"
G2P_DIALECT = "hak_sx"

# Example default speaker.
SPEAKER_NAME = "XF"


def parse_ipa(ipa: str, delete_chars=r"\+\-\|\_", as_space: str = "") -> list[str]:
    """Split a G2P string into per-character tokens, keeping tone digits whole."""
    text = []
    # Split at every boundary between a digit run and a non-digit run.
    ipa_list = re.split(r"(?<![\d])(?=[\d])|(?<=[\d])(?![\d])", ipa)
    for word in ipa_list:
        if word.isdigit():
            text.append(word)
        else:
            if len(as_space) > 0:
                word = re.sub(r"[{}]".format(as_space), " ", word)
            if len(delete_chars) > 0:
                word = re.sub(r"[{}]".format(delete_chars), "", word)
            word = word.replace(",", " , ")
            text.extend(word)
    return text


def load_model(model_id: str = MODEL_ID) -> Synthesizer:
    """Download the checkpoint and build a Synthesizer from a path-corrected config."""
    model_dir = snapshot_download(model_id)
    config_file_path = os.path.join(model_dir, "config.json")
    model_ckpt_path = os.path.join(model_dir, "model.pth")
    speaker_file_path = os.path.join(model_dir, "speakers.pth")
    language_file_path = os.path.join(model_dir, "language_ids.json")
    speaker_embedding_file_path = os.path.join(model_dir, "speaker_embs.pth")
    temp_config_path = "temp_config.json"

    # Rewrite the file names in the config so they point into the download directory.
    with open(config_file_path, "r", encoding="utf-8") as f:
        content = f.read()
    content = content.replace("speakers.pth", speaker_file_path)
    content = content.replace("language_ids.json", language_file_path)
    content = content.replace("speaker_embs.pth", speaker_embedding_file_path)
    with open(temp_config_path, "w", encoding="utf-8") as f:
        f.write(content)

    return Synthesizer(
        tts_checkpoint=model_ckpt_path,
        tts_config_path=temp_config_path,
        use_cuda=torch.cuda.is_available(),
    )


def synthesize(
    text: str,
    output_path: str = "output.wav",
    speed: float = 1.0,
):
    model = load_model()

    # Convert Hakka text to pronunciations; stop if any word is not covered.
    result = g2p(text, G2P_DIALECT, include_eng=True)
    if len(result.unknown_words) > 0:
        raise ValueError(
            "The following words could not be converted to IPA: "
            f"{', '.join(result.unknown_words)}"
        )
    parsed_ipa = [p.replace(" ", "|") for p in result.pronunciations]
    parsed_ipa = parse_ipa(" ".join(parsed_ipa))

    # Larger values produce slower speech.
    model.tts_model.length_scale = speed

    wav = model.tts(
        parsed_ipa,
        speaker_name=SPEAKER_NAME,
        language_name=DIALECT,
        split_sentences=False,
    )

    sample_rate = model.tts_model.config.audio.sample_rate
    wav = np.asarray(wav, dtype=np.float32)
    write_wav(output_path, sample_rate, wav)
    return output_path


if __name__ == "__main__":
    synthesize(
        text="食飯愛正經食,正毋會食到半出半入",
        output_path="output.wav",
        speed=1.0,
    )
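The script above writes a 32-bit float WAV file, which some players and tools do not read. If that is a problem, the samples can be stored as 16-bit PCM instead. The following is a minimal sketch using only the Python standard library; write_pcm16_wav is a hypothetical helper, and it assumes the synthesized samples are floats roughly within [-1.0, 1.0] (values outside that range are clipped).

```python
import array
import wave


def write_pcm16_wav(path: str, samples, sample_rate: int = 22050) -> int:
    """Write float samples as a mono 16-bit PCM WAV file.

    Assumption: samples lie roughly in [-1.0, 1.0]; anything beyond is clipped.
    Returns the number of frames written.
    """
    pcm = array.array("h")  # signed 16-bit integers, native byte order
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to avoid integer overflow
        pcm.append(int(round(s * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16 bits
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
    return len(pcm)
```

Inside synthesize, this could replace the write_wav call, for example write_pcm16_wav(output_path, wav, sample_rate).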
Input Format
The model is intended for Taiwanese Hakka text. The official Space uses formog2p.hakka.g2p to convert Taiwanese Hakka text into the phonetic representation expected by the model.
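As a rough illustration of the token sequence the model receives, the sketch below repeats the parse_ipa helper from the script above (lightly condensed, same behavior) and runs it on a made-up string with tone digits. The input string is an assumption for demonstration only and is not actual formog2p output.

```python
import re


def parse_ipa(ipa: str, delete_chars: str = r"\+\-\|\_", as_space: str = "") -> list[str]:
    """Split a pronunciation string into per-character tokens, keeping tone digits whole."""
    tokens = []
    # Split at every boundary between a digit run and a non-digit run.
    for chunk in re.split(r"(?<![\d])(?=[\d])|(?<=[\d])(?![\d])", ipa):
        if chunk.isdigit():
            tokens.append(chunk)  # a tone number such as "24" stays one token
            continue
        if as_space:
            chunk = re.sub(r"[{}]".format(as_space), " ", chunk)
        if delete_chars:
            chunk = re.sub(r"[{}]".format(delete_chars), "", chunk)
        tokens.extend(chunk.replace(",", " , "))  # one token per character
    return tokens


# Hypothetical input: "|" joins syllables within one word, " " separates words.
tokens = parse_ipa("ngai24|he55 ho31")
print(tokens)
# ['n', 'g', 'a', 'i', '24', 'h', 'e', '55', ' ', 'h', 'o', '31']
```

Tone numbers remain single tokens, the "|" joiner is dropped, and the space between words survives as its own token.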
If the G2P frontend cannot convert some input words, inference fails; the script above raises a ValueError that lists them. In that case, rewrite the sentence using Taiwanese Hakka words or orthography that the frontend supports.
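When processing many sentences, it can be more convenient to collect the unconvertible words than to raise on the first failure. The following is a minimal sketch of that check. It relies only on the unknown_words and pronunciations attributes used in the script above; pronunciations_or_report is a hypothetical helper, and the SimpleNamespace objects are invented stand-ins for g2p results.

```python
from types import SimpleNamespace


def pronunciations_or_report(result):
    """Return (pronunciations, None) on success, or (None, message) if words are unknown."""
    unknown = list(result.unknown_words)
    if unknown:
        return None, "Could not convert: " + ", ".join(unknown)
    return list(result.pronunciations), None


# Stand-ins for g2p() results; the values are invented for illustration.
ok = SimpleNamespace(unknown_words=[], pronunciations=["ngai24", "he55"])
bad = SimpleNamespace(unknown_words=["XYZ"], pronunciations=[])

print(pronunciations_or_report(ok))
print(pronunciations_or_report(bad))
```

A batch job could log the returned message and skip that sentence instead of stopping.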
Limitations
- This is an experimental Taiwanese Hakka TTS model.
- Output quality may vary by dialect, speaker, sentence style, and G2P coverage.
- The model is expected to work best on Taiwanese Hakka text similar to the training data.
- The model is not designed for Mandarin Chinese, general Chinese TTS, or non-Hakka languages.
- As with other voice synthesis systems, users should avoid misleading, deceptive, or unauthorized voice impersonation use cases.
License
This model is released under the CC BY-NC 4.0 license.
By downloading or using the public release of this model, you agree to comply with the terms and conditions of the CC BY-NC 4.0 license.
Commercial use is not permitted under this license.
Citation
If you use this model, please cite the following paper:
@article{chen2024voxhakka,
  title={VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka},
  author={Chen, Li-Wei and Lee, Hung-Shin and Chang, Chen-Chi},
  journal={arXiv preprint arXiv:2409.01548},
  year={2024}
}