---
license: mit
language:
- ar
base_model:
- ResembleAI/chatterbox
pipeline_tag: text-to-speech
tags:
- Saudi
- Arabic
- Saudi-Dialect
- Chatterbox
- TTS
- voice-cloning
- multilingual-tts
library_name: chatterbox
---

![NAMAA Saudi TTS Banner](https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/2d4VIgVYji-CS2w8n_3tS.png)

# 🇸🇦 NAMAA-Saudi-TTS

**NAMAA-Saudi-TTS** is a Saudi Arabic Text-to-Speech (TTS) model built on top of the **Chatterbox Multilingual TTS** architecture.  
The model is configured and refined to generate **natural Saudi dialect speech**, targeting everyday conversational usage rather than Modern Standard Arabic (MSA).

The model is developed and released by the **NAMAA Community (Network for Advancing Modern Arabic AI)** as part of its efforts to advance high-quality Arabic speech and language technologies.

---

## 🔊 Live Demo (Hugging Face Space)

👉 **Try the model here:**  
https://huggingface.co/spaces/omarelshehy/NAMAA-Saudi-Voice

---

## ✨ Model Capabilities

The model supports:

- **Saudi Arabic text input** (`language_id = "ar"`)
- Natural conversational prosody
- Saudi dialect phrasing and rhythm
- Optional **reference audio prompting** for:
  - Speaker similarity
  - Style and tone transfer
- GPU-accelerated inference

This repository contains all required **model checkpoints and assets** for local or hosted inference.

---

## 🗣️ Example Text (Saudi Dialect)

```text
آبي أروح البقالة أشتري كم غرض وأرجع بسرعة.
```

Approximate English meaning: "I want to go to the grocery store, buy a few things, and come back quickly."

## ⚠️ Limitations

Please be aware of the following current limitations:

- Input text without tashkeel (diacritics) may be pronounced less accurately.
- Numeric normalization is limited and will be improved in future releases.
- This is a known limitation of the current flow-based generation.


These limitations are actively being addressed in upcoming versions.
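
Until those improvements land, a caller may want to flag inputs that fall under the limitations above before sending them to the model. The following is a minimal sketch using only the Python standard library; the helper names (`has_tashkeel`, `has_digits`, `input_warnings`) are hypothetical and not part of this repository or of Chatterbox, and the suggestion to spell numbers out is an assumption rather than documented guidance.

```python
import unicodedata

# Arabic tashkeel (harakat) are Unicode combining marks in U+064B..U+0652,
# plus the superscript alef U+0670.
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def has_tashkeel(text: str) -> bool:
    """Return True if the text contains at least one Arabic diacritic."""
    return any(ch in TASHKEEL for ch in text)


def has_digits(text: str) -> bool:
    """Return True if the text contains a decimal digit (Western or Arabic-Indic)."""
    return any(unicodedata.category(ch) == "Nd" for ch in text)


def input_warnings(text: str) -> list:
    """Hypothetical pre-flight check mirroring the limitations listed above."""
    warnings = []
    if not has_tashkeel(text):
        warnings.append("no tashkeel: pronunciation may be less accurate")
    if has_digits(text):
        warnings.append("contains digits: consider spelling numbers out")
    return warnings
```

An empty list from `input_warnings` only means that neither condition was detected; it does not guarantee correct pronunciation.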

## 🧪 Example Usage (Inference)

```python
import torchaudio as ta
from huggingface_hub import snapshot_download
from safetensors.torch import load_file as load_safetensors
from chatterbox import mtl_tts

device = "cuda"  # or "cpu" / "mps"

ckpt_dir = snapshot_download(
    repo_id="NAMAA-Space/NAMAA-Saudi-TTS",
    repo_type="model",
    revision="main"
)

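# Load the base Chatterbox multilingual model, then overwrite its T3 weights
# with the checkpoint downloaded from this repository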
model = mtl_tts.ChatterboxMultilingualTTS.from_pretrained(device=device)

t3_state = load_safetensors(
    f"{ckpt_dir}/t3_mtl23ls_v2.safetensors",
    device=device
)
model.t3.load_state_dict(t3_state)
model.t3.to(device).eval()

# Saudi Arabic text (roughly: "I'm heading to work now, and when I get back
# I'll stop by the grocery store")
text = "أنا الحين بروح الشغل وإذا رجعت بمرّ البقالة"

wav = model.generate(text, language_id="ar")
ta.save("namaa_saudi.wav", wav, model.sr)
```
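
The example above hard-codes `device = "cuda"` and mentions `"cpu"` and `"mps"` as alternatives. One way to choose among them at runtime is sketched below. The function names `pick_device` and `detect_device` are hypothetical helpers, not part of Chatterbox, and the CUDA-then-MPS-then-CPU preference order is an assumption.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available backends; return "cpu" if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)
```

With this in place, `device = detect_device()` could replace the hard-coded string. Whether inference on `"mps"` or `"cpu"` is fast enough for a given use case is not something this sketch checks.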

### 🔹 Inference with Reference Audio (Voice / Style Transfer)

```python
# Reuses `model` and `ta` from the previous example.
# The text means roughly: "I want to finish work today and rest tomorrow."
text = "آبي أخلص الشغل اليوم وأرتاح بكرة"

wav = model.generate(
    text,
    language_id="ar",
    audio_prompt_path="/content/reference_saudi.wav"
)

ta.save("namaa_saudi_ref.wav", wav, model.sr)
```
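
Because `audio_prompt_path` points at a file on disk, it can help to confirm that the reference clip exists and is readable before calling `generate`. The sketch below uses only the standard-library `wave` module, so it handles uncompressed PCM WAV files only. The helper name `describe_reference_wav` is hypothetical, and the check says nothing about which formats or clip lengths the model itself accepts.

```python
import wave
from pathlib import Path


def describe_reference_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file, or raise if it is unusable."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"reference audio not found: {p}")
    # wave.open raises wave.Error for files that are not PCM WAV.
    with wave.open(str(p), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        channels = wf.getnchannels()
    if frames == 0:
        raise ValueError(f"reference audio is empty: {p}")
    return {"sample_rate": rate, "channels": channels, "seconds": frames / rate}
```

Calling `describe_reference_wav("/content/reference_saudi.wav")` before the `generate` call above would surface a missing or empty file early, with a clearer error than one raised deep inside the inference pipeline.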

## 🧠 Base Model

This model is built on top of:

- **ResembleAI/chatterbox**
- **Chatterbox Multilingual TTS architecture**

The Saudi dialect behavior is achieved through **specialized configuration, prompting, and curated usage patterns**, rather than training focused on Modern Standard Arabic (MSA).

---

## 📜 License

This model is released under the **MIT License**, allowing both **research and commercial usage** with proper attribution.

---

## 🤝 Community & Contributions

Developed and maintained by **NAMAA Community**  
*(Network for Advancing Modern Arabic NLP & AI)*

We welcome:

- Feedback and evaluations  
- Dialect-specific test cases  
- Contributions toward improving Arabic Text-to-Speech systems  

---

## 📌 Citation

If you use this model in research or production, please cite:

```bibtex
@misc{namaa_saudi_tts,
  title = {NAMAA-Saudi-TTS: Saudi Dialect Text-to-Speech},
  author = {{NAMAA Community}},
  year = {2026},
  url = {https://huggingface.co/NAMAA-Space/NAMAA-Saudi-TTS}
}
```