---
license: apache-2.0
datasets:
- keithito/lj_speech
language:
- en
base_model:
- Qwen/Qwen3-TTS-12Hz-1.7B-Base
tags:
- text-to-speech
- tts
- voice-cloning
- ljspeech
- qwen
library_name: transformers
pipeline_tag: text-to-speech
---

# Qwen3-TTS Fine-tuned on LJSpeech

This model is a fine-tuned version of [Qwen/Qwen3-TTS-12Hz-1.7B-Base](https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-Base), trained on a 200-sample subset of the [LJSpeech dataset](https://keithito.com/LJ-Speech-Dataset/).

## Model Description

- **Base Model:** Qwen3-TTS-12Hz-1.7B-Base
- **Training Data:** LJSpeech-1.1 (200-sample subset)
- **Voice:** Linda Johnson (female, American English)
- **Training:** 3 epochs; training loss fell from 20.4 to about 10.7
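
This card does not say how the 200 training samples were chosen. The sketch below shows one plausible way to draw a reproducible subset from the LJSpeech `metadata.csv` file, which is pipe-delimited (`clip id|transcription|normalized transcription`). The function names, the fixed seed, and the use of uniform random sampling are assumptions made for illustration, not the procedure used for this model. The demo lines are synthetic stand-ins for the real file.

```python
import random
from typing import Iterable, List, NamedTuple


class Utterance(NamedTuple):
    clip_id: str  # e.g. "LJ001-0001"; the audio lives at wavs/<clip_id>.wav
    text: str     # transcription used as the training target


def parse_metadata(lines: Iterable[str]) -> List[Utterance]:
    """Parse pipe-delimited LJSpeech metadata lines into utterances."""
    utterances = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        parts = line.split("|")
        if len(parts) < 2:
            continue  # skip malformed lines
        # Prefer the normalized transcription (third field) when present.
        text = parts[2] if len(parts) > 2 and parts[2] else parts[1]
        utterances.append(Utterance(parts[0], text))
    return utterances


def select_subset(utterances: List[Utterance], n: int = 200, seed: int = 0) -> List[Utterance]:
    """Draw a reproducible random subset (hypothetical selection strategy)."""
    rng = random.Random(seed)
    return rng.sample(utterances, min(n, len(utterances)))


# Synthetic demo data; with the real dataset, pass the lines of metadata.csv.
demo_lines = [f"LJ001-{i:04d}|Raw text {i}|Normalized text {i}" for i in range(500)]
subset = select_subset(parse_metadata(demo_lines))
print(len(subset))  # 200
```

Fixing the seed keeps the subset stable across runs, which makes it possible to compare fine-tuning runs on identical data.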

## Voice Characteristics

The model produces speech in the voice of **Linda Johnson**, featuring:
- Clear, professional female voice
- American English accent
- Natural read-speech style (the source recordings are passages read aloud from non-fiction books)
- Consistent tone and pacing

## Use Cases

- **Audiobook narration** - Professional reading voice for long-form content
- **Virtual assistants** - Clear, friendly voice for AI applications
- **Accessibility tools** - Text-to-speech for visually impaired users
- **Content creation** - Voiceovers for videos and presentations
- **Educational content** - Clear pronunciation for learning materials

## Training Details

| Parameter | Value |
|-----------|-------|
| Epochs | 3 |
| Batch Size | 1 (gradient accumulation: 4) |
| Learning Rate | 5e-6 |
| Mixed Precision | bf16 |
| Starting Loss | 20.4 |
| Final Loss | ~10.7 |
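
The numbers in the table imply a fairly short run. As a rough worked example, the following sketch derives the effective batch size and the number of optimizer steps from the values above, together with the relative loss reduction. It assumes a single device and the 200-sample subset stated in the model description; `training_schedule` and `relative_reduction` are hypothetical helper names, not part of any training library.

```python
import math


def training_schedule(num_samples: int, micro_batch_size: int,
                      grad_accum_steps: int, epochs: int) -> dict:
    """Derive step counts for gradient-accumulated training on one device."""
    micro_batches_per_epoch = math.ceil(num_samples / micro_batch_size)
    # One optimizer step is taken every `grad_accum_steps` micro-batches.
    steps_per_epoch = math.ceil(micro_batches_per_epoch / grad_accum_steps)
    return {
        "effective_batch_size": micro_batch_size * grad_accum_steps,
        "optimizer_steps_per_epoch": steps_per_epoch,
        "total_optimizer_steps": steps_per_epoch * epochs,
    }


def relative_reduction(start: float, end: float) -> float:
    """Fraction by which a value dropped relative to its starting point."""
    return (start - end) / start


schedule = training_schedule(num_samples=200, micro_batch_size=1,
                             grad_accum_steps=4, epochs=3)
print(schedule["effective_batch_size"])       # 4
print(schedule["optimizer_steps_per_epoch"])  # 50
print(schedule["total_optimizer_steps"])      # 150
print(f"{relative_reduction(20.4, 10.7):.1%}")  # 47.5%
```

Under these assumptions the model saw about 150 weight updates in total, which is worth keeping in mind when judging how far the output voice has moved from the base model.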

## License and Attribution

- **Training Data:** LJSpeech dataset (Public Domain)
- **Base Model:** Qwen3-TTS (Apache 2.0)
- **This Fine-tuned Model:** Apache 2.0

### Credits

- Original recordings by Linda Johnson
- LJSpeech dataset by [Keith Ito](https://keithito.com/LJ-Speech-Dataset/)
- Base model by [Qwen Team](https://github.com/QwenLM/Qwen3-TTS)