---
language: fa
tags:
- tts
- piper
- persian
- fa-ir
- manta-tts
- neural-tts
- single-speaker
license: apache-2.0
pretty_name: Persian TTS - Piper EN base → ManaTTS (v1)
datasets:
- kiarashQ/farsi-asr-unified-cleaned
base_model:
- rhasspy/piper-voices
---

# 🇮🇷 Persian TTS - Piper EN Base → ManaTTS (v1)

- **Model name:** `fa-ir-tts-piper-en-mantatts-v1`
- **Previous name:** `kiarashQ/fa_IR-mantatts`
- **Sampling rate:** 22,050 Hz
- **Base checkpoint:** `ar/ar_JO/kareem/medium/epoch=5079-step=1682020.ckpt` (Piper AR, medium)

This is a Persian (fa-IR) single-speaker TTS model fine-tuned from the **Arabic Piper medium checkpoint** on the **ManaTTS** dataset.

---

## ⭐ Highlights

- ✔️ Arabic phoneme system provides **better accuracy for certain Persian words**
- ✔️ Produces stable, smooth speech
- ✔️ Complements the EN-based model; each excels at different phonemes
- ✔️ Output at **22.05 kHz**

---

## 🧪 Training Details

- **Training script:** `piper_train`
- **Hardware:** 1× A4000 GPU
- **Dataset:** ManaTTS
- **Batch size:** 16
- **Precision:** 32-bit
- **Validation split:** 1%
- **Test samples:** 5
- **Training epochs:** 20
- **Logging:** every 2000 steps
- **Quality setting:** `medium`
- **Checkpoint frequency:** every 1 epoch
- **No resume checkpoint** (fresh fine-tune)

Training command:

```bash
piper_train \
  --dataset-dir /workspace/piper_full/piper_dataset \
  --accelerator gpu --devices 1 \
  --batch-size 16 \
  --validation-split 0.01 \
  --num-test-examples 5 \
  --quality medium \
  --checkpoint-epochs 1 \
  --max_epochs 20 \
  --precision 32 \
  --log_every_n_steps 2000
```

## 🔊 Inference Example

```bash
piper \
  --model model.onnx \
  --config config.json \
  --text "سلام! حال شما چطور است؟" \
  --output_file out.wav
```

Python:

```python
import subprocess

# Persian input sentence to synthesize.
text = "سلام! امروز هوا چطور است؟"

# Invoke the Piper CLI; check=True raises CalledProcessError if synthesis fails.
subprocess.run([
    "piper",
    "--model", "model.onnx",
    "--config", "config.json",
    "--text", text,
    "--output_file", "out.wav",
], check=True)
```

## 🔍 Observations

- The Arabic-base version sometimes pronounces Persian words more accurately than the EN-base model.
- Overall accent naturalness is slightly lower than with the EN-base model.
- Useful as a complementary voice to the EN-base model.

## 📜 License

Apache-2.0

## 🙏 Credits

- Piper TTS
- ManaTTS dataset
- Model fine-tuning by @kiarashQ
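
## 📎 Appendix: Inference via standard input

The inference examples in this card pass the sentence with `--text`. Some Piper builds read the input text from standard input instead, so if `--text` is rejected by your installation, a variant along these lines may work. This is a minimal sketch under that assumption; `build_piper_command` and `synthesize` are hypothetical helper names introduced here for illustration and are not part of Piper.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_piper_command(model: str, config: str, output_file: str) -> list[str]:
    """Return the argv list for a Piper call that takes its text from stdin."""
    return [
        "piper",
        "--model", model,
        "--config", config,
        "--output_file", output_file,
    ]


def synthesize(
    text: str,
    model: str = "model.onnx",
    config: str = "config.json",
    output_file: str = "out.wav",
) -> Path:
    """Pipe `text` to Piper as UTF-8 and return the path of the WAV file."""
    cmd = build_piper_command(model, config, output_file)
    # Assumption: the installed Piper build reads the input text from stdin.
    # check=True raises CalledProcessError if Piper exits with an error.
    subprocess.run(cmd, input=text.encode("utf-8"), check=True)
    return Path(output_file)
```

With `piper` on the `PATH` and the model files in the working directory, `synthesize("سلام! حال شما چطور است؟")` would be expected to write `out.wav`. Encoding the text explicitly as UTF-8 avoids depending on the platform's default encoding for Persian input.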
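
## 📎 Appendix: Checking the output sampling rate

This card states an output sampling rate of 22,050 Hz. A quick way to confirm that a generated file matches is to read its header with Python's standard `wave` module. The sketch below assumes the output is an uncompressed PCM WAV file; `wav_info` and `has_expected_rate` are hypothetical helper names, not part of Piper.

```python
from __future__ import annotations

import wave

EXPECTED_RATE = 22050  # sampling rate stated in this model card, in Hz


def wav_info(path: str) -> tuple[int, int, float]:
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def has_expected_rate(path: str, expected: int = EXPECTED_RATE) -> bool:
    """Return True if the file's sample rate equals the expected rate."""
    return wav_info(path)[0] == expected
```

For example, `has_expected_rate("out.wav")` can be called after synthesis; a `False` result would suggest the file was resampled or produced by a different voice configuration.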