language:
  - de
license: apache-2.0
tags:
  - text-to-speech
  - german
  - kokoro
  - styletts2
  - multi-speaker
pipeline_tag: text-to-speech

Kikiri German Base — 51 Speakers Synthetic

A German multi-speaker TTS base model trained with StyleTTS2 on a synthetic dataset of 51 German speakers.

This is a Stage 1 base model — it provides the acoustic foundation for speaker-adapted fine-tuning (Stage 2). It is compatible with the Kokoro inference architecture.

Model Details

Property	Value
Architecture	StyleTTS2 Stage 1 (Kokoro-compatible)
Language	German (de)
Speakers	51 synthetic voices
Training data	~30,800 samples (synthetic, TTS-generated)
Training epochs	4
Validation Mel Loss	0.286
Sample rate	24 kHz
G2P	misaki 0.9.4 + espeak-ng 1.50

Audio Samples

Generated with the Victoria voice (Stage 2 fine-tune, voices/victoria.pt).

"Schön, dass du da bist. Die Bücher liegen auf dem großen Tisch."

"Ich mache mich auf den Weg nach Aachen, um auch nachts wach zu sein."

"Er aß die Maße in der Straße, aber das Maß war voll."

"Zwei weiße Zwerge zwängen sich zwischen zwei Zweige."

"Ein Pfau pflegt seine Federn an der Pfütze."

"Warum hast du das getan? Das ist ja unglaublich!"

"Das kostet genau einhundertdreiundzwanzig Millionen Euro."

Files

File	Description
`kikiri_german_base_51spk_ep4.pth`	Model weights (Kokoro-compatible format)
`voices/victoria.pt`	Victoria speaker voicepack (512-dim style embedding)
`audio/test_*.wav`	German phonetic test sentences

Usage

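# Uses the kokoro library as the underlying framework
from kokoro import KPipeline

# Load the German pipeline with the Stage 1 weights from this repository
pipeline = KPipeline(lang_code="de", model_path="kikiri_german_base_51spk_ep4.pth")
# Load the Victoria voicepack (Stage 2 fine-tune)
voicepack = pipeline.load_voice("voices/victoria.pt")

text = "Guten Tag, wie geht es Ihnen?"
# Synthesize the text. Upstream kokoro returns a generator of result chunks
# from this call, so iterate over the return value if it is not a single array.
audio = pipeline(text, voice=voicepack)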
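
The usage snippet leaves the synthesized audio in memory. As a minimal sketch of saving it at the model's 24 kHz sample rate, the helper below writes 16-bit mono PCM using only the Python standard library. It assumes the audio is a one-dimensional sequence of floats in [-1.0, 1.0]. `write_wav` is a hypothetical helper, not part of kokoro, and a sine tone stands in for model output.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24_000  # the model card lists a 24 kHz sample rate


def write_wav(samples, target, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1.0, 1.0] as 16-bit mono PCM.

    `target` may be a file path or a writable, seekable binary file object.
    """
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip so the int16 conversion cannot overflow
        pcm += struct.pack("<h", int(round(s * 32767)))
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16 bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))


# Stand-in for model output: one second of a 440 Hz tone (assumption, not real TTS audio).
tone = [0.3 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]

buffer = io.BytesIO()      # pass a path such as "output.wav" to write to disk instead
write_wav(tone, buffer)
```

Writing with the wrong frame rate changes playback speed and pitch, which is why the sample rate is fixed as a constant here.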

Training

Stage 1 trains the acoustic model on mel spectrogram reconstruction across all 51 speakers
Stage 2 fine-tunes a single speaker using WavLM adversarial training (bf16)
Data pipeline: text → misaki G2P (de) → Kokoro 178-token IPA vocabulary
All training data is phoneme-validated: no ?? artifacts and no out-of-vocabulary (OOV) symbols
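
The phoneme validation step can be illustrated with a small sketch. It is not the author's actual script. It assumes that `??` is the placeholder the G2P stage leaves for words it cannot resolve and that every vocabulary token is a single code point. `TOY_VOCAB` is a hypothetical stand-in, far smaller than the real 178-token Kokoro vocabulary.

```python
# Hypothetical stand-in vocabulary; the real Kokoro vocabulary has 178 tokens.
TOY_VOCAB = set("abdefhiklmnoprstuvzçøŋɐɔəɛɪʁʃʊː ˈ,.!?")


def find_phoneme_issues(phonemes: str, vocab: set[str]) -> list[str]:
    """Return a list of problems found in one phoneme string (empty if clean)."""
    issues = []
    # Assumed meaning of "??": an unresolved word left behind by the G2P stage.
    if "??" in phonemes:
        issues.append("unresolved G2P placeholder '??'")
    # Any symbol outside the vocabulary cannot be mapped to a token id.
    oov = sorted({ch for ch in phonemes if ch not in vocab})
    if oov:
        issues.append("out-of-vocabulary symbols: " + " ".join(oov))
    return issues


clean = find_phoneme_issues("ˈʃøːn", TOY_VOCAB)
dirty = find_phoneme_issues("ˈʃøːn ?? θ", TOY_VOCAB)
```

A sample would be kept for training only when the returned list is empty.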

Limitations

Trained entirely on synthetic (TTS-generated) audio; training on real human recordings may improve naturalness
Stage 1 alone is not production-ready: production-quality single-speaker output requires Stage 2 fine-tuning
German number and date normalization is not built in; the caller must expand numbers and dates into words before synthesis
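
Because normalization is left to the caller, text should be expanded before it reaches the model. The following is a minimal sketch of that pre-processing step, limited to standalone cardinal numbers from 0 to 999. Both function names are hypothetical. Ordinals, dates, decimals, larger numbers, and grammatical agreement (for example "eine Million") are out of scope and left unchanged or handled naively.

```python
import re

UNITS = [
    "null", "eins", "zwei", "drei", "vier", "fünf", "sechs", "sieben",
    "acht", "neun", "zehn", "elf", "zwölf", "dreizehn", "vierzehn",
    "fünfzehn", "sechzehn", "siebzehn", "achtzehn", "neunzehn",
]
TENS = {
    2: "zwanzig", 3: "dreißig", 4: "vierzig", 5: "fünfzig",
    6: "sechzig", 7: "siebzig", 8: "achtzig", 9: "neunzig",
}


def number_to_german(n: int) -> str:
    """Spell out an integer in the range 0..999 as a German cardinal."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only supports 0..999")
    if n < 20:
        return UNITS[n]
    if n < 100:
        tens, unit = divmod(n, 10)
        if unit == 0:
            return TENS[tens]
        # "eins" drops its final "s" inside compounds: einundzwanzig
        unit_word = "ein" if unit == 1 else UNITS[unit]
        return f"{unit_word}und{TENS[tens]}"
    hundreds, rest = divmod(n, 100)
    prefix = ("ein" if hundreds == 1 else UNITS[hundreds]) + "hundert"
    return prefix if rest == 0 else prefix + number_to_german(rest)


# Match runs of 1 to 3 digits that are not part of a longer or separated number
# such as "1234", "1.000" or "3,5"; those are left untouched by this sketch.
_SMALL_NUMBER = re.compile(r"(?<![\d.,])\d{1,3}(?![.,]?\d)")


def normalize_numbers(text: str) -> str:
    """Replace standalone 1-3 digit numbers with their German spelling."""
    return _SMALL_NUMBER.sub(lambda m: number_to_german(int(m.group())), text)


normalized = normalize_numbers("Das kostet genau 123 Millionen Euro.")
```

For production use, a dedicated text-normalization library is likely a better fit than hand-written rules like these.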

License

Apache 2.0