🦜 VieNeu-TTS v2 Turbo — GGUF
Ultra-fast Vietnamese & English TTS — runs entirely on CPU, no GPU required.
📖 Model Description
VieNeu-TTS v2 Turbo is the lightweight, CPU-optimized edition of the VieNeu-TTS family, a state-of-the-art Vietnamese text-to-speech (TTS) system. Its language-model backbone is quantized to GGUF format and paired with an ONNX neural codec, so the model synthesizes speech in near real time on commodity hardware: laptops, edge devices, and even Raspberry Pi-class machines.
This repository hosts the GGUF quantized weights intended for use with llama-cpp-python as the inference backend, alongside the companion ONNX codec for waveform generation.
What makes it special?
- 🇻🇳🇺🇸 Bilingual (Code-switching): Naturally handles mixed Vietnamese–English sentences, powered by sea-g2p. No need to pre-label language boundaries.
- ⚡ Extreme Speed: Optimized GGUF quantization achieves real-time or faster inference on a standard CPU.
- 💻 Zero GPU Dependency: Runs fully offline on any x86_64 / ARM64 machine with sufficient RAM.
- 🔇 AI Watermarking: Audio output embeds an imperceptible identifier for responsible AI content tracing.
- 🔊 24 kHz Audio: High-fidelity waveform output suitable for production applications.
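The 24 kHz output rate gives a simple way to check the speed claims above on your own machine: divide the wall-clock synthesis time by the duration of the audio produced. This ratio is the real-time factor (RTF), and an RTF of 1.0 or lower means real-time or faster. The sketch below is not part of the vieneu SDK. `measure_rtf` accepts any callable that returns a flat (mono) sequence of samples, which is an assumption about what `tts.infer` returns.

```python
import time
from typing import Callable, Sequence

SAMPLE_RATE = 24_000  # Hz, the output rate stated in this card


def audio_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration in seconds of a mono clip with `num_samples` samples."""
    return num_samples / sample_rate


def real_time_factor(synthesis_seconds: float, num_samples: int,
                     sample_rate: int = SAMPLE_RATE) -> float:
    """RTF = synthesis time / audio duration; <= 1.0 is real-time or faster."""
    duration = audio_seconds(num_samples, sample_rate)
    if duration <= 0:
        raise ValueError("expected at least one audio sample")
    return synthesis_seconds / duration


def measure_rtf(synthesize: Callable[[str], Sequence[float]], text: str,
                sample_rate: int = SAMPLE_RATE) -> float:
    """Time a single synthesis call and return its RTF (assumes mono samples)."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples), sample_rate)


if __name__ == "__main__":
    # 1.5 s of compute for 72,000 samples, i.e. 3.0 s of 24 kHz audio
    print(real_time_factor(1.5, 72_000))  # 0.5
```

With the SDK, a call such as `measure_rtf(lambda t: tts.infer(text=t), "Xin chào!")` should work if `tts.infer` returns a 1-D sample array. The first call may be slower than later ones, so timing a second call gives a steadier number.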
🗂️ Repository Contents
| File | Description |
|---|---|
| `vieneu-v2-turbo-*.gguf` | GGUF quantized LLM backbone (multiple quant levels) |
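When choosing one of the quantization levels by hand, a small filter over the repository's file list is enough. The helper below is hypothetical: this card does not say which quant levels are published, so the tags in `preferred` (standard llama.cpp quant type names such as `Q4_K_M`) are assumptions, and the file list could come from, for example, `huggingface_hub.list_repo_files`.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Optional, Sequence

GGUF_PATTERN = "vieneu-v2-turbo-*.gguf"  # naming pattern from the table above


def pick_gguf(files: Iterable[str],
              preferred: Sequence[str] = ("Q4_K_M", "Q5_K_M", "Q8_0")) -> Optional[str]:
    """Return the first GGUF file whose name contains a preferred quant tag.

    `preferred` is a hypothetical priority list; edit it to taste. Falls back
    to the alphabetically first matching file, or None if nothing matches.
    """
    candidates = sorted(f for f in files if fnmatchcase(f, GGUF_PATTERN))
    for tag in preferred:
        for name in candidates:
            if tag.lower() in name.lower():
                return name
    return candidates[0] if candidates else None


if __name__ == "__main__":
    # Hypothetical listing; the real file names may differ.
    listing = ["README.md", "vieneu-v2-turbo-Q8_0.gguf", "vieneu-v2-turbo-Q4_K_M.gguf"]
    print(pick_gguf(listing))  # vieneu-v2-turbo-Q4_K_M.gguf
```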
🚀 Quickstart
Option 1 — Install via vieneu SDK (Recommended)
```bash
# Minimal installation (Turbo/CPU only)
pip install vieneu

# Optional: pre-built llama-cpp-python for CPU (if building fails)
pip install vieneu --extra-index-url https://pnnbao97.github.io/llama-cpp-python-v0.3.16/cpu/

# Optional: macOS Metal acceleration
pip install vieneu --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal/
```

```python
from vieneu import Vieneu

# Turbo mode is the default, so no GPU is needed
tts = Vieneu()

# Vietnamese only ("Hello! This is VieNeu TTS, Turbo edition.")
audio = tts.infer(text="Xin chào! Đây là VieNeu TTS phiên bản Turbo.")
tts.save(audio, "output.wav")

# Bilingual code-switching
# ("Previously, the power system used direct current, but Tesla proved
#   alternating current is more efficient.")
audio = tts.infer(
    text="Trước đây, hệ thống điện sử dụng direct current, nhưng Tesla đã chứng minh alternating current is more efficient."
)
tts.save(audio, "output_bilingual.wav")
```
Option 2 — Web UI (Full repo)
```bash
git clone https://github.com/pnnbao97/VieNeu-TTS.git
cd VieNeu-TTS
uv sync  # minimal install (Turbo/CPU)
uv run vieneu-web
# → Open http://127.0.0.1:7860
```
🌐 Bilingual Code-Switching
VieNeu-TTS v2 Turbo can handle natural Vietnamese–English mixed text without any special markup. The sea-g2p engine automatically identifies language boundaries and generates accurate phonemes for both languages.
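```python
from vieneu import Vieneu

tts = Vieneu()

examples = [
    "Hôm nay tôi sẽ trình bày về machine learning và deep learning.",  # "Today I will talk about machine learning and deep learning."
    "The new feature là rất hữu ích cho developers.",  # "The new feature is very useful for developers."
    "VieNeu supports both Vietnamese và English seamlessly.",  # "và" = "and"
]

# Synthesize each sentence to its own file: bilingual_0.wav, bilingual_1.wav, ...
for i, text in enumerate(examples):
    audio = tts.infer(text=text)
    tts.save(audio, f"bilingual_{i}.wav")
```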
🎙️ Preset Voices
The model ships with multiple preset voices. List and use them via the SDK:
```python
from vieneu import Vieneu

tts = Vieneu()

# List available preset voices as (description, voice_id) pairs
voices = tts.list_preset_voices()
for description, voice_id in voices:
    print(f" {description} → ID: {voice_id}")

# Use a specific voice
voice_data = tts.get_preset_voice("xuan_vinh")  # default: Southern Male
audio = tts.infer(
    text="Giọng đọc này được tổng hợp bởi VieNeu Turbo.",  # "This voice was synthesized by VieNeu Turbo."
    voice=voice_data,
)
tts.save(audio, "preset_voice.wav")
```
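Because `list_preset_voices()` yields `(description, voice_id)` pairs, as the loop above unpacks them, choosing a voice by a word in its description takes only a few lines. `find_voice_id` is a hypothetical helper, not part of the SDK, and the sample descriptions are invented; check the real descriptions printed by the loop above before relying on a keyword.

```python
from typing import Iterable, Optional, Tuple


def find_voice_id(voices: Iterable[Tuple[str, str]], keyword: str) -> Optional[str]:
    """Return the ID of the first voice whose description contains `keyword`.

    Matching is case-insensitive; returns None when no description matches.
    """
    needle = keyword.casefold()
    for description, voice_id in voices:
        if needle in description.casefold():
            return voice_id
    return None


if __name__ == "__main__":
    # Invented sample data shaped like the (description, voice_id) pairs above.
    sample_voices = [
        ("Northern Female (example)", "voice_a"),
        ("Southern Male (example)", "voice_b"),
    ]
    print(find_voice_id(sample_voices, "southern"))  # voice_b
```

With the SDK, the returned ID would then go to `tts.get_preset_voice(...)`, after checking that it is not `None`.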
Note: Instant Voice Cloning is not yet available in Turbo mode; it is planned for a future release. For cloning, use the standard GPU-based VieNeu-TTS-v2 model.
🔬 Model Architecture
VieNeu-TTS v2 Turbo is a two-stage TTS system:
- LLM Backbone (GGUF): A transformer language model conditioned on text tokens and speaker embeddings. It predicts discrete audio codec tokens autoregressively.
- Neural Codec (ONNX): A VQ-VAE-based neural codec (VieNeu-Codec) decodes the predicted token sequence into a 24 kHz waveform.
The bilingual capability is enabled by sea-g2p, which converts mixed-language graphemes to phonemes before the LLM backbone processes them.
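The data flow through the two stages can be sketched in plain Python. Every component here is a toy stand-in: `g2p`, `next_token`, and `decode` are hypothetical callables playing the roles of sea-g2p, the GGUF backbone, and the ONNX codec, and the end-of-sequence token and `max_tokens` cap are assumptions about how generation stops. The sketch shows the shape of autoregressive decoding, not VieNeu's actual implementation.

```python
from typing import Callable, List, Sequence

SAMPLE_RATE = 24_000  # codec output rate stated in this card


def synthesize(
    text: str,
    speaker: Sequence[float],
    g2p: Callable[[str], List[str]],
    next_token: Callable[[List[str], Sequence[float], List[int]], int],
    decode: Callable[[List[int]], List[float]],
    eos_token: int,
    max_tokens: int = 2048,
) -> List[float]:
    """Toy pipeline: text -> phonemes -> codec tokens -> waveform samples."""
    # Front end: mixed-language text becomes phonemes (the role of sea-g2p).
    phonemes = g2p(text)
    # Stage 1: the backbone emits codec tokens one at a time, each conditioned
    # on the phonemes, the speaker embedding, and the tokens emitted so far.
    tokens: List[int] = []
    while len(tokens) < max_tokens:
        token = next_token(phonemes, speaker, tokens)
        if token == eos_token:  # hypothetical stop token
            break
        tokens.append(token)
    # Stage 2: the codec decodes the whole token sequence into audio samples.
    return decode(tokens)


if __name__ == "__main__":
    # Toy stand-ins: "phonemes" are words, the "backbone" emits one token per
    # phoneme and then the stop token, and the "codec" maps each token to 480
    # silent samples (an arbitrary 20 ms at 24 kHz).
    EOS = -1
    audio = synthesize(
        "Xin chào Tesla",
        speaker=[0.0] * 4,
        g2p=lambda s: s.lower().split(),
        next_token=lambda ph, spk, hist: len(hist) if len(hist) < len(ph) else EOS,
        decode=lambda toks: [0.0] * (480 * len(toks)),
        eos_token=EOS,
    )
    print(f"{len(audio)} samples = {len(audio) / SAMPLE_RATE:.2f} s")  # 1440 samples = 0.06 s
```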
📊 Training Data
The model was trained on over 20,000 hours of combined Vietnamese and English speech data, covering a wide range of speakers, accents, recording conditions, and speaking styles.
| Dataset | Language | Description |
|---|---|---|
| pnnbao-ump/VieNeu-TTS-1000h | Vietnamese | Curated studio-quality Vietnamese speech corpus |
| pnnbao-ump/vietnamese-audio-corpus | Vietnamese | Diverse multi-speaker Vietnamese audio |
| amphion/Emilia-Dataset | Multilingual | Large-scale multilingual speech dataset |
| facebook/multilingual_librispeech | English + others | Multilingual read speech |
🗺️ Roadmap
- GGUF/ONNX Turbo engine
- Bilingual (Vietnamese–English) code-switching
- Turbo Voice Cloning
- Mobile SDK (Android / iOS)
- Streaming output API
🤝 Related Resources
| Resource | Link |
|---|---|
| 📦 PyPI Package | pip install vieneu |
| 🐙 GitHub | pnnbao97/VieNeu-TTS |
| 📖 Documentation | docs.vieneu.io |
| 🤗 Full Model (GPU) | pnnbao-ump/VieNeu-TTS |
| 💬 Discord Community | Join here |
| ☕ Support the project | buymeacoffee.com/pnnbao |
📄 License
This model is released under the Apache License 2.0 — free for personal and commercial use.
Made with ❤️ for the Vietnamese TTS community by @pnnbao97 and contributors.