---
language:
- adj
tags:
- text-to-speech
- mms-tts
- vits
- adj
- west-africa
- eyaa-tom
- finetuned
license: cc-by-nc-4.0
base_model: facebook/mms-tts-adj
pipeline_tag: text-to-speech
---

# MMS-TTS Adja: Eyaa-Tom Fine-tuned

Fine-tuned version of [facebook/mms-tts-adj](https://huggingface.co/facebook/mms-tts-adj) on the **Eyaa-Tom** dataset for **Adja** (`adj`).

> Adja (Aja-Gbe). Fine-tuned from `facebook/mms-tts-adj`, the closest MMS checkpoint to ISO 639-3 `ajg`.

## Language Details

| Field | Value |
|-------|-------|
| Language | Adja |
| MMS checkpoint code (ISO 639-3) | `adj` |
| Target language code (ISO 639-3) | `ajg` |
| Region | Togo/Benin |
| Family | Gbe (Niger-Congo) |
| Base model | [facebook/mms-tts-adj](https://huggingface.co/facebook/mms-tts-adj) |

## Training Statistics

| Metric | Value |
|--------|-------|
| Training samples | 5 |
| Validation samples | 1 |
| Best validation mel-L1 | 3.3801 |
| Uploaded variant | `best` |

## Usage

```python
import torch
import torchaudio
from transformers import VitsModel, VitsTokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hub.
model = VitsModel.from_pretrained("Umbaji001/eyaa-tom-mms-tts-adj")
tokenizer = VitsTokenizer.from_pretrained("Umbaji001/eyaa-tom-mms-tts-adj")

inputs = tokenizer("your text here", return_tensors="pt")

# Synthesize without tracking gradients; take the first waveform in the batch.
with torch.no_grad():
    waveform = model(**inputs).waveform[0]

# torchaudio.save expects a (channels, samples) tensor.
torchaudio.save("output.wav", waveform.unsqueeze(0), model.config.sampling_rate)
```

## Training Details

- **Loss**: Mel-spectrogram L1 (used to work around the VITS training restriction)
- **Optimizer**: AdamW, lr=2e-4, betas=(0.8, 0.99)
- **Scheduler**: ExponentialLR, γ=0.999
- **Epochs**: 6
- **Batch size**: 4 (effective 16 with gradient accumulation)

## Citation

```bibtex
@article{pratap2023mms,
  title={Scaling Speech Technology to 1,000+ Languages},
  author={Pratap, Vineel and others},
  journal={arXiv preprint arXiv:2305.13516},
  year={2023}
}
```

*Fine-tuned: 2026-02-25, Eyaa-Tom project*
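
## Training Configuration Sketch

The hyperparameters listed under Training Details can be wired together roughly as shown below. This is a minimal sketch, not the project's training script. The names `mel_l1_loss` and `train`, the `(inputs, target_mel)` batch format, and the toy `nn.Linear` stand-in model are all hypothetical. It also assumes that the effective batch size of 16 comes from 4 accumulation steps over micro-batches of 4, and that the scheduler decays once per epoch; the card states neither.

```python
import torch
from torch import nn

# Hyperparameters as listed in the card.
LEARNING_RATE = 2e-4
BETAS = (0.8, 0.99)
GAMMA = 0.999
BATCH_SIZE = 4
EPOCHS = 6
# Assumption: 4 micro-batches of 4 give the stated effective batch of 16.
ACCUM_STEPS = 4


def mel_l1_loss(pred_mel: torch.Tensor, target_mel: torch.Tensor) -> torch.Tensor:
    """Mean absolute error between predicted and reference mel-spectrograms."""
    return torch.mean(torch.abs(pred_mel - target_mel))


def train(model: nn.Module, batches, epochs: int = EPOCHS) -> float:
    """Run the sketch loop and return the final learning rate.

    `batches` is a list of (inputs, target_mel) tensor pairs (hypothetical format).
    """
    optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE, betas=BETAS)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=GAMMA)
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        for step, (inputs, target_mel) in enumerate(batches, start=1):
            loss = mel_l1_loss(model(inputs), target_mel)
            # Scale so a full window of ACCUM_STEPS micro-batches averages its gradients.
            (loss / ACCUM_STEPS).backward()
            # Update after every ACCUM_STEPS micro-batches and at the end of the epoch.
            if step % ACCUM_STEPS == 0 or step == len(batches):
                optimizer.step()
                optimizer.zero_grad()
        scheduler.step()  # assumption: decay applied once per epoch
    return scheduler.get_last_lr()[0]


if __name__ == "__main__":
    torch.manual_seed(0)
    toy_model = nn.Linear(8, 80)  # stand-in: 8 input features to 80 mel bins
    toy_batches = [
        (torch.randn(BATCH_SIZE, 8), torch.randn(BATCH_SIZE, 80)) for _ in range(2)
    ]
    final_lr = train(toy_model, toy_batches)
    print(f"learning rate after {EPOCHS} epochs: {final_lr:.6e}")
```

In a real run the toy model would be replaced by the VITS checkpoint, and the mel-spectrograms would be computed from the generated and reference audio; neither step is shown here because the card does not describe them.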