---
license: apache-2.0
datasets:
- keithito/lj_speech
language:
- en
base_model:
- Qwen/Qwen3-TTS-12Hz-1.7B-Base
tags:
- text-to-speech
- tts
- voice-cloning
- ljspeech
- qwen
library_name: transformers
pipeline_tag: text-to-speech
---

# Qwen3-TTS Fine-tuned on LJSpeech

This model is a fine-tuned version of [Qwen/Qwen3-TTS-12Hz-1.7B-Base](https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-Base), trained on the [LJSpeech dataset](https://keithito.com/LJ-Speech-Dataset/).

## Model Description

- **Base Model:** Qwen3-TTS-12Hz-1.7B-Base
- **Training Data:** LJSpeech-1.1 (200-sample subset)
- **Voice:** Linda Johnson (female, American English)
- **Training:** 3 epochs; loss reduced from 20.4 to about 10.7

## Voice Characteristics

The model produces speech in the voice of **Linda Johnson**, featuring:

- Clear, professional female voice
- American English accent
- Natural reading style (audiobook quality)
- Consistent tone and pacing

## Use Cases

- **Audiobook narration** - Professional reading voice for long-form content
- **Virtual assistants** - Clear, friendly voice for AI applications
- **Accessibility tools** - Text-to-speech for visually impaired users
- **Content creation** - Voiceovers for videos and presentations
- **Educational content** - Clear pronunciation for learning materials

## Training Details

| Parameter | Value |
|-----------|-------|
| Epochs | 3 |
| Batch Size | 1 (gradient accumulation: 4) |
| Learning Rate | 5e-6 |
| Mixed Precision | bf16 |
| Starting Loss | 20.4 |
| Final Loss | ~10.7 |

## License and Attribution

- **Training Data:** LJSpeech dataset (Public Domain)
- **Base Model:** Qwen3-TTS (Apache 2.0)
- **This Fine-tuned Model:** Apache 2.0

### Credits

- Original recordings by Linda Johnson
- LJSpeech dataset by [Keith Ito](https://keithito.com/LJ-Speech-Dataset/)
- Base model by [Qwen Team](https://github.com/QwenLM/Qwen3-TTS)
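
## Training Arithmetic (Illustrative)

As a rough sanity check on the figures in the Training Details table, the sketch below works out the effective batch size, the approximate number of optimizer updates, and the relative loss reduction. It is not the training script. It assumes that all 200 samples were used for training (no held-out split) and that a trailing partial accumulation window still triggers an update; neither assumption is stated in this card. The function name `training_summary` is hypothetical.

```python
import math

# Values copied from the Training Details table.
NUM_SAMPLES = 200        # size of the LJSpeech subset
EPOCHS = 3
BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 4
START_LOSS = 20.4
FINAL_LOSS = 10.7        # approximate, reported as ~10.7


def training_summary(num_samples, epochs, batch_size, grad_accum_steps,
                     start_loss, final_loss):
    """Derive simple quantities from the reported hyperparameters."""
    # Samples contributing to each optimizer update.
    effective_batch = batch_size * grad_accum_steps
    # Assumption: every sample is used and a partial window still updates.
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    total_steps = steps_per_epoch * epochs
    # Fraction of the starting loss removed by the end of training.
    relative_drop = (start_loss - final_loss) / start_loss
    return {
        "effective_batch_size": effective_batch,
        "optimizer_steps_per_epoch": steps_per_epoch,
        "total_optimizer_steps": total_steps,
        "relative_loss_reduction": relative_drop,
    }


summary = training_summary(NUM_SAMPLES, EPOCHS, BATCH_SIZE,
                           GRAD_ACCUM_STEPS, START_LOSS, FINAL_LOSS)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these assumptions the run amounts to an effective batch size of 4 and on the order of 150 optimizer updates, with the loss falling by a little under half of its starting value.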