---
language:
- adj
tags:
- text-to-speech
- mms-tts
- vits
- adj
- west-africa
- eyaa-tom
- finetuned
license: cc-by-nc-4.0
base_model: facebook/mms-tts-adj
pipeline_tag: text-to-speech
---

# MMS-TTS Adja — Eyaa-Tom Fine-tuned

Fine-tuned version of [facebook/mms-tts-adj](https://huggingface.co/facebook/mms-tts-adj)
on the **Eyaa-Tom** dataset for **Adja** (Aja-Gbe, ISO 639-3 `ajg`).

> Adja is also known as Aja-Gbe. This model starts from facebook/mms-tts-adj, the closest MMS checkpoint to the target ISO code `ajg`.

## Language Details
| Field | Value |
|-------|-------|
| Language | Adja |
| ISO 639-3 code of the MMS base model | `adj` |
| ISO 639-3 code of the target language | `ajg` |
| Region | Togo/Benin |
| Family | Gbe (Niger-Congo) |
| Base model | [facebook/mms-tts-adj](https://huggingface.co/facebook/mms-tts-adj) |

## Training Statistics
| Metric | Value |
|--------|-------|
| Training samples | 5 |
| Validation samples | 1 |
| Best validation mel-spectrogram L1 loss | 3.3801 |
| Uploaded checkpoint | `best` (lowest validation loss) |
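
The `best` checkpoint is, in common practice, the one with the lowest validation loss; the card does not spell out the exact selection rule. A minimal sketch of that rule, using a hypothetical `BestCheckpointTracker` class and assuming one validation loss is computed per epoch:

```python
class BestCheckpointTracker:
    """Hypothetical helper: track the lowest validation loss seen so far (lower is better)."""

    def __init__(self):
        self.best_loss = float("inf")
        self.best_epoch = None

    def update(self, epoch, val_loss):
        """Return True if this epoch improves on the best loss, i.e. it should be saved as `best`."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            return True
        return False
```

In a training loop, `update` would be called after each validation pass, and the model saved only when it returns `True`, so the final saved weights correspond to `best_epoch`.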

## Usage

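```python
from transformers import VitsModel, VitsTokenizer
import torch
import torchaudio

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model     = VitsModel.from_pretrained("Umbaji001/eyaa-tom-mms-tts-adj")
tokenizer = VitsTokenizer.from_pretrained("Umbaji001/eyaa-tom-mms-tts-adj")

# Replace the placeholder with Adja text
inputs = tokenizer("your text here", return_tensors="pt")

# VITS sampling is stochastic; fix the seed for reproducible output
torch.manual_seed(0)
with torch.no_grad():
    waveform = model(**inputs).waveform[0]  # 1-D tensor of audio samples

# torchaudio.save expects a (channels, frames) tensor
torchaudio.save("output.wav", waveform.unsqueeze(0), model.config.sampling_rate)
```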
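
If torchaudio is not installed, the generated waveform can also be written with Python's standard `wave` module. A minimal sketch, assuming the samples are mono floats in [-1.0, 1.0]; `save_wav_pcm16` is a hypothetical helper name, not part of Transformers:

```python
import struct
import wave

def save_wav_pcm16(path, samples, sample_rate):
    """Hypothetical helper: write mono float samples to a 16-bit PCM WAV file.

    Values outside [-1.0, 1.0] are clipped before conversion.
    """
    frames = bytearray()
    for x in samples:
        x = max(-1.0, min(1.0, float(x)))                    # clip to the valid range
        frames += struct.pack("<h", int(round(x * 32767)))   # little-endian int16
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)          # mono
        wf.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))
```

With the variables from the usage example, a call would look like `save_wav_pcm16("output.wav", waveform.tolist(), model.config.sampling_rate)`.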

## Training Details
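- **Loss**: mel-spectrogram L1 between generated and reference audio, used in place of the full VITS training objective (the Transformers `VitsModel` does not implement a training loss)
- **Optimizer**: AdamW, lr=2e-4, betas=(0.8, 0.99)
- **Scheduler**: ExponentialLR, γ=0.999
- **Epochs**: 6
- **Batch size**: 4 (effective batch size 16 with gradient accumulation)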
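
The hyperparameters listed above imply a few simple quantities. A minimal pure-Python sketch of them, with hypothetical helper names: the number of gradient-accumulation steps, the learning rate under `ExponentialLR` (lr = base_lr · γ^step, where `step` counts scheduler steps; whether the scheduler was stepped per epoch or per optimizer step is not stated), and a mean-absolute-error mel loss on equally shaped spectrograms. How the training script aligned generated and reference mel lengths is also not stated, so equal shapes are an assumption here.

```python
BASE_LR = 2e-4   # AdamW learning rate from the card
GAMMA = 0.999    # ExponentialLR decay factor from the card

def accumulation_steps(batch_size, effective_batch):
    """Micro-batches whose gradients are accumulated before one optimizer step."""
    if effective_batch % batch_size:
        raise ValueError("effective batch must be a multiple of the batch size")
    return effective_batch // batch_size

def exponential_lr(step, base_lr=BASE_LR, gamma=GAMMA):
    """Learning rate after `step` scheduler steps: base_lr * gamma ** step."""
    return base_lr * gamma ** step

def mel_l1(pred, target):
    """Mean absolute error between two equally shaped mel spectrograms (lists of frames)."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("mel spectrograms must have the same shape")
    diffs = [abs(a - b) for p, t in zip(pred, target) for a, b in zip(p, t)]
    if not diffs:
        raise ValueError("mel spectrograms must not be empty")
    return sum(diffs) / len(diffs)
```

In PyTorch the same pieces map onto `torch.nn.functional.l1_loss` (default `reduction="mean"`), `torch.optim.AdamW(params, lr=2e-4, betas=(0.8, 0.99))` and `torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)`; with a batch size of 4 and an effective batch of 16, `accumulation_steps(4, 16)` gives 4.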

## Citation
```bibtex
@article{pratap2023mms,
  title={Scaling Speech Technology to 1,000+ Languages},
  author={Pratap, Vineel and others},
  journal={arXiv preprint arXiv:2305.13516},
  year={2023}
}
```
*Fine-tuned: 2026-02-25 — Eyaa-Tom project*