---
language: fa
tags:
- tts
- piper
- persian
- fa-ir
- manta-tts
- neural-tts
- single-speaker
license: apache-2.0
pretty_name: "Persian TTS - Piper AR base → ManaTTS (v1)"
datasets:
- kiarashQ/farsi-asr-unified-cleaned
base_model:
- rhasspy/piper-voices
---

# 🇮🇷 Persian TTS: Piper AR Base → ManaTTS (v1)

**Model name:** `fa-ir-tts-piper-en-mantatts-v1`  
**Previous name:** `kiarashQ/fa_IR-mantatts`  
**Sampling rate:** 22,050 Hz  
**Base checkpoint:**  
`ar/ar_JO/kareem/medium/epoch=5079-step=1682020.ckpt` (Piper AR, medium)

This is a Persian (fa-IR) single-speaker TTS model fine-tuned from the **Arabic Piper medium checkpoint** on the **ManaTTS** dataset.

---

## ⭐ Highlights
- ✔️ Starting from the Arabic phoneme system gives **more accurate pronunciation of certain Persian words** than the EN-base model
- ✔️ Produces stable, smooth speech
- ✔️ Complements the EN-base model: each handles different phonemes better
- ✔️ Outputs audio at **22.05 kHz**

---

## 🧪 Training Details

**Training script:** `piper_train`  
**Hardware:** 1× GPU A4000  
**Dataset:** ManaTTS  
**Batch size:** 16  
**Precision:** 32-bit  
**Validation split:** 1%  
**Test samples:** 5  
**Training epochs:** 20  
**Logging:** every 2000 steps  
**Quality setting:** `medium`  
**Checkpoint frequency:** every 1 epoch  
**Resume checkpoint:** none (fresh fine-tune)

Training command:

```bash
piper_train \
  --dataset-dir /workspace/piper_full/piper_dataset \
  --accelerator gpu --devices 1 \
  --batch-size 16 \
  --validation-split 0.01 \
  --num-test-examples 5 \
  --quality medium \
  --checkpoint-epochs 1 \
  --max_epochs 20 \
  --precision 32 \
  --log_every_n_steps 2000
```

## 🔊 Inference Example

The `piper` CLI reads the text to synthesize from standard input:

```bash
echo "سلام! حال شما چطور است؟" | piper \
  --model model.onnx \
  --config config.json \
  --output_file out.wav
```

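Python:
```python
import subprocess

text = "سلام! امروز هوا چطور است؟"
# piper reads the text to synthesize from standard input.
subprocess.run(
    ["piper", "--model", "model.onnx", "--config", "config.json",
     "--output_file", "out.wav"],
    input=text.encode("utf-8"),
    check=True,  # raise CalledProcessError if piper exits with an error
)
```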
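
To confirm that a synthesized file matches the 22,050 Hz sampling rate stated above, a small check with Python's standard `wave` module can help. This is a minimal sketch, not part of Piper: the helper names `wav_info` and `check_output` are made up for illustration, and it assumes the output is a plain PCM WAV file.

```python
import wave

EXPECTED_RATE = 22050  # sampling rate stated in this model card


def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
    return rate, frames / rate


def check_output(path, expected_rate=EXPECTED_RATE):
    """Return the duration in seconds, or raise ValueError on a rate mismatch."""
    rate, duration = wav_info(path)
    if rate != expected_rate:
        raise ValueError(f"{path}: expected {expected_rate} Hz, got {rate} Hz")
    return duration
```

For example, `check_output("out.wav")` returns the clip length in seconds when the file was written at the expected rate.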

## 🔍 Observations

- The Arabic-base version sometimes pronounces Persian words more correctly than the EN-base model.
- Its accent sounds slightly less natural overall than the EN-base model's.
- It is useful as a complementary voice to the EN-base model.
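
Because the two voices complement each other, it can help to synthesize the same sentence with both and listen side by side. The sketch below is one way to do that under stated assumptions: the `en_base/` paths are hypothetical placeholders for wherever the EN-base model is stored, the helper names are illustrative, and the `piper` CLI is assumed to be on `PATH` and to read text from standard input.

```python
import subprocess
from pathlib import Path

# Hypothetical file locations; replace with the actual exported models.
VOICES = {
    "ar_base": ("model.onnx", "config.json"),
    "en_base": ("en_base/model.onnx", "en_base/config.json"),
}


def build_command(model, config, out_path):
    """Build the piper argument list; the text itself is sent on stdin."""
    return ["piper", "--model", str(model), "--config", str(config),
            "--output_file", str(out_path)]


def synthesize_all(text, out_dir="compare"):
    """Synthesize the same text with every voice in VOICES."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = {}
    for name, (model, config) in VOICES.items():
        out_path = out_dir / f"{name}.wav"
        subprocess.run(build_command(model, config, out_path),
                       input=text.encode("utf-8"), check=True)
        outputs[name] = out_path
    return outputs
```

Calling `synthesize_all("سلام! حال شما چطور است؟")` would write one WAV file per voice into `compare/`, which makes it easier to pick the better rendering for a given word.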

## 📜 License
Apache-2.0

## 🙏 Credits
- Piper TTS
- ManaTTS dataset
- Model fine-tuning by @kiarashQ