---
license: apache-2.0
language:
- it
- en
- pl
- de
- fr
base_model:
- nari-labs/Dia-1.6B
pipeline_tag: text-to-speech
tags:
- speech
- dia
- text-to-speech
- vocal
- voice
---

# Aurora-1.6B: Multilingual Emotion and Singing TTS Model

A fine-tuned version of Dia-1.6B trained on multilingual and singing datasets, with full emotion control and zero-shot voice cloning.

## Features

- **Multilingual Support**: Natural speech in Italian, English, Polish, German, French, and more.
- **Emotion Control**: Combine speaker tags (e.g. `[S1]`) with emotion tokens (e.g. `[happy]`, `[sad]`) to modulate expressiveness.
- **Singing Capabilities**: Generate melodic vocals by providing singing prompts or style references.
- **Zero-Shot Voice Cloning**: Clone a speaker's voice from a short audio sample.
- **Nonverbal Vocalizations**: Embed realistic effects such as `(laughs)`, `(coughs)`, or `(sighs)` inline in the text.

## Usage

```python
from dia.model import Dia
import soundfile as sf

# Load the Aurora-1.6B model
model = Dia.from_pretrained("Lorenzob/aurora-1.6b")

# Generate a happy spoken line followed by singing
text = "[S1][happy] Hello world! Now sing 'Happy Birthday to You'"
audio = model.generate(text)

# Save the output at 44.1 kHz
sf.write("output.wav", audio, 44100)
```
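
The tagging convention in the usage example above can also be assembled programmatically. Each utterance takes a speaker tag, an optional emotion token, and then the text, with nonverbal cues in parentheses. The sketch below uses hypothetical helpers (`tag_line`, `build_script`) that are not part of the `dia` package. It only builds the prompt string and assumes emotion tokens are bare words in square brackets, as with `[happy]` above. This card does not list which emotion tokens the model honours, so the helpers do not validate them.

```python
from typing import List, Optional, Tuple


def tag_line(speaker: int, text: str, emotion: Optional[str] = None) -> str:
    """Prefix one utterance with a speaker tag and an optional emotion token.

    Hypothetical helper: the syntax ([S1], [happy]) mirrors the usage example;
    emotion names are passed through unvalidated.
    """
    if speaker < 1:
        raise ValueError("speaker index must be >= 1")
    prefix = f"[S{speaker}]"
    if emotion:
        # Accept either "happy" or "[happy]" and emit the bracketed form.
        prefix += f"[{emotion.strip('[]')}]"
    return f"{prefix} {text.strip()}"


def build_script(lines: List[Tuple[int, str, Optional[str]]]) -> str:
    """Join several (speaker, text, emotion) utterances into one prompt string."""
    return " ".join(tag_line(speaker, text, emotion) for speaker, text, emotion in lines)


if __name__ == "__main__":
    script = build_script([
        (1, "Hello world! Now sing 'Happy Birthday to You'", "happy"),
        (2, "That was lovely. (laughs)", None),
    ])
    print(script)
    # [S1][happy] Hello world! Now sing 'Happy Birthday to You' [S2] That was lovely. (laughs)
```

The resulting string can then be passed to `model.generate(...)` exactly as `text` is in the usage example.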