Request for reproducible Magpie-TTS fine-tuning setup for new language adaptation

by tauqeersajid - opened 9 days ago

Hi NMikka,

Thank you for sharing the Georgian Magpie-TTS fine-tuned model. I am also trying to fine-tune NVIDIA Magpie-TTS for a new language, specifically Slovene, and your model seems to be one of the few public examples of successful language adaptation.

I checked the model card and saw that the model was fine-tuned from nvidia/magpie_tts_multilingual_357m using NeMo (full SFT, LR 2e-5, 37 epochs, bf16-mixed precision) at NeMo commit:

3d73c48aca1ae3be44657267b81f25dc3201161a

Would you be willing to share the exact fine-tuning setup you used?

Specifically, it would be very helpful if you could share:

The exact magpietts.yaml / Hydra config used for training
The full training command with all overrides
Whether you modified any files in the NeMo repo
If yes, could you share the changed files, patch, or commit diff?
The exact dataset manifest format you used
Whether you precomputed target_audio_codes_path and context_audio_codes_path
How you selected context_audio_filepath and context_text for each sample
Which tokenizer configuration you used for Georgian
Whether you used google/byt5-small as a byte-level tokenizer or made any language-specific tokenizer changes
Whether you changed alignment_loss_scale, prior_scaling_factor, cfg_unconditional_prob, context_duration_min/max, or any decoder settings
Whether you used trainer.precision=32 first and later switched to bf16-mixed, or trained directly with bf16-mixed
Any inference settings that helped avoid repetitions or artifacts, such as temperature, topk, cfg_scale, max_decoder_steps, or use_local_transformer_for_inference
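
For reference, here is a hypothetical sketch of the kind of manifest entry I mean by "manifest format". The audio_filepath, text, and duration keys follow the usual NeMo JSON-lines manifest convention. The context_audio_filepath, context_text, target_audio_codes_path, and context_audio_codes_path keys are the Magpie-TTS fields mentioned above. All paths, texts, and values are made-up placeholders, not my actual data, and I am not certain this is the exact layout the training recipe expects.

```python
import json

# Hypothetical manifest entry: every path, text, and duration is a placeholder.
# audio_filepath / text / duration follow the common NeMo manifest convention;
# the context_* and *_codes_path keys are the Magpie-TTS fields asked about above.
entry = {
    "audio_filepath": "wavs/spk01_0001.wav",
    "text": "Dober dan, kako ste?",
    "duration": 2.4,
    "context_audio_filepath": "wavs/spk01_0042.wav",
    "context_text": "Hvala za pomoč.",
    "target_audio_codes_path": "codes/spk01_0001.pt",
    "context_audio_codes_path": "codes/spk01_0042.pt",
}


def to_manifest_line(record: dict) -> str:
    """Serialize one record as a single JSON line (one entry per line)."""
    # ensure_ascii=False keeps Slovene characters such as č, š, ž readable
    return json.dumps(record, ensure_ascii=False)


line = to_manifest_line(entry)
print(line)
```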

I am asking because my fine-tuned model trains, but the generated audio sometimes has artifacts, repeated words, or duplicated segments. I want to understand whether the issue is coming from my data preparation, tokenizer setup, cached codec extraction, NeMo version, training config, or inference settings.

Thanks again for releasing the model. It would really help others who are trying to adapt Magpie-TTS to low-resource or unsupported languages.

Best regards,
Tauqeer
