---
language:
- zh
- en
- ja
- ko
- es
- pt
- ar
- ru
- fr
- de
- sv
- it
- tr
- "no"
- nl
- cy
- eu
- ca
- da
- gl
- ta
- hu
- fi
- pl
- et
- hi
- la
- ur
- th
- vi
- jw
- bn
- yo
- sl
- cs
- sw
- nn
- he
- ms
- uk
- id
- kk
- bg
- lv
- my
- tl
- sk
- ne
- fa
- af
- el
- bo
- hr
- ro
- sn
- mi
- yi
- am
- be
- km
- is
- az
- sd
- br
- sq
- ps
- mn
- ht
- ml
- sr
- sa
- te
- ka
- bs
- pa
- lt
- kn
- si
- hy
- mr
- as
- gu
- fo
license: other
license_name: fish-audio-research-license
license_link: LICENSE.md
pipeline_tag: text-to-speech
library_name: pytorch
base_model: fishaudio/s2-pro
base_model_relation: quantized
quantized_by: groxaxo
inference: false
tags:
- text-to-speech
- instruction-following
- multilingual
- s2
- pro
- tts
- cuda
- "2026"
- quantized
- nf4
- bitsandbytes
extra_gated_prompt: >-
  You agree to not use the model to generate contents that violate DMCA or local laws.
extra_gated_fields:
  Country: country
  Specific date: date_picker
  I agree to use this model for non-commercial use ONLY: checkbox
---

# S2-Pro NF4

[**GitHub Fork**](https://github.com/groxaxo/fish-speech-int4-patch) | [**Upstream Fish Speech**](https://github.com/fishaudio/fish-speech) | [**Technical Report**](https://huggingface.co/papers/2603.08823) | [**Fish Audio**](https://fish.audio)

[![GitHub stars](https://img.shields.io/github/stars/groxaxo/fish-speech-int4-patch?style=for-the-badge&label=Star%20the%20Fork)](https://github.com/groxaxo/fish-speech-int4-patch/stargazers)
[![GitHub repo](https://img.shields.io/badge/GitHub-groxaxo%2Ffish--speech--int4--patch-111827?style=for-the-badge&logo=github)](https://github.com/groxaxo/fish-speech-int4-patch)
[![Upstream](https://img.shields.io/badge/Upstream-fishaudio%2Ffish--speech-1f7a8c?style=for-the-badge)](https://github.com/fishaudio/fish-speech)

This repository hosts the **Groxaxo NF4 release of Fish Audio S2-Pro** for lower-VRAM inference.

- **Base model:** Fish Audio S2-Pro
- **Relation:** Quantized release
- **Format:** bitsandbytes **NF4** prequantized `model.pth`
- **Target hardware:** practical single-GPU inference on **12 GB+ VRAM** setups
- **Best paired with:** `groxaxo/fish-speech-int4-patch`

This is a community-hosted release of the original Fish Audio model. Credit for the base model, research, and architecture belongs to the [Fish Audio](https://fish.audio/) team.

Huge thanks to the original creators at [Fish Audio](https://fish.audio/) and the upstream [fishaudio/fish-speech](https://github.com/fishaudio/fish-speech) project for building and open-sourcing S2-Pro.

If this NF4 release helps you, please star the companion GitHub project here:

**https://github.com/groxaxo/fish-speech-int4-patch**

The goal is simple: make the flagship S2-Pro experience easier to run, easier to share, and easier to deploy on real-world single-GPU machines.

## What is in this repo

- `model.pth`: prequantized NF4 checkpoint
- `codec.pth`: codec weights
- tokenizer/config assets needed by the patched loader

The checkpoint is meant to be loaded through the fork's **bnb4** path. It is not a legacy `int4` or `int8` export.

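
Before pointing a loader at a downloaded copy, it can help to confirm that the two weight files listed above are actually present. The snippet below is a minimal sketch, not part of the fork: the helper name `missing_checkpoint_files` is hypothetical, `/path/to/s2-pro` is a placeholder, and the tokenizer/config assets are skipped because their file names are not listed here.

```python
from pathlib import Path

# File names taken from the "What is in this repo" list. Tokenizer/config
# assets are not enumerated there, so they are not checked here.
EXPECTED_FILES = ("model.pth", "codec.pth")


def missing_checkpoint_files(checkpoint_dir):
    """Return the expected file names that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Placeholder path; replace with the local download directory.
    missing = missing_checkpoint_files("/path/to/s2-pro")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("model.pth and codec.pth are present.")
```

An empty list only means the files exist. It does not verify that `model.pth` is the NF4 checkpoint rather than some other export.
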
## Recommended usage

Use the patched repo that defaults to the right settings for this checkpoint:

```bash
git clone https://github.com/groxaxo/fish-speech-int4-patch
cd fish-speech-int4-patch
./install_bnb4_3060.sh
./start_bnb4_3060.sh
```

That path starts the API/WebUI with the intended defaults:

* `--bnb4`
* `--half`
* lazy loading
* `s2-pro` as the canonical model name

## Why people use this release

* lower-VRAM **NF4** deployment path for S2-Pro
* companion GitHub fork with API, WebUI, Docker, and export tooling
* smoke-tested prequantized `model.pth` reload support
* clearer self-hosting path for 12 GB and 24 GB cards

## Quick commands

### WebUI

```bash
git clone https://github.com/groxaxo/fish-speech-int4-patch
cd fish-speech-int4-patch
./install_bnb4_3060.sh
./start_bnb4_3060.sh
```

### API server

```bash
PYTHONPATH=. python tools/api_server.py \
  --checkpoint-path /path/to/s2-pro \
  --bnb4 \
  --half \
  --host 0.0.0.0 \
  --port 8880
```

### OpenAI-style request

```bash
curl http://127.0.0.1:8880/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "s2-pro",
    "input": "[warm, calm] Hello from the Groxaxo NF4 S2-Pro release.",
    "voice": "default"
  }' \
  --output speech.wav
```

## Manual loading

If you want to point the repo at this checkpoint directly, keep `--bnb4` enabled:

```bash
PYTHONPATH=. python tools/api_server.py \
  --checkpoint-path /path/to/s2-pro \
  --bnb4 \
  --half
```

Or in Python:

```python
import torch
from fish_speech.models.text2semantic.inference import init_model

model, decode_one_token = init_model(
    checkpoint_path="/path/to/s2-pro",
    device="cuda:0",
    precision=torch.float16,
    compile=False,
    bnb4=True,
)
```

## Why this release exists

Upstream S2-Pro is excellent, but many single-card workstations do not have enough VRAM for a comfortable default setup. This NF4 release makes S2-Pro much easier to run on common cards like the RTX 3060 while preserving the flagship model path.

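
To give a rough sense of why 4-bit weights matter here, the sketch below compares weight storage at 16 bits and 4 bits per parameter. It is back-of-the-envelope arithmetic under stated assumptions, not a measurement: the parameter count is the approximate 4B slow AR stack from the model notes, every weight is assumed to be quantized, and the function name `weight_gib` is made up for illustration.

```python
# Back-of-the-envelope weight-memory estimate. The 4e9 parameter count is the
# "4B slow AR stack" from the model notes; everything else is an assumption.
GIB = 1024 ** 3


def weight_gib(num_params, bits_per_weight):
    """Memory needed to store num_params weights at bits_per_weight, in GiB."""
    return num_params * bits_per_weight / 8 / GIB


# Approximate; the fast AR stack and the codec are not counted.
SLOW_AR_PARAMS = 4e9

fp16_gib = weight_gib(SLOW_AR_PARAMS, 16)
nf4_gib = weight_gib(SLOW_AR_PARAMS, 4)

print(f"fp16 weights: ~{fp16_gib:.1f} GiB")
print(f"NF4 weights:  ~{nf4_gib:.1f} GiB")
```

The real footprint is higher than the NF4 figure suggests: quantization scales, any layers left unquantized, the fast AR stack, the codec, activations, and the KV cache all add to it. The estimate only illustrates the order of magnitude of the saving on weights.
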
## Model notes

* S2-Pro uses a **Dual-Autoregressive** architecture with a 4B slow AR stack and a fast residual AR stack.
* It supports fine-grained inline control with natural-language tags such as `[whisper]`, `[laugh]`, and `[sad]`.
* It supports multilingual generation, multi-speaker prompting, and strong voice cloning workflows.

## Prompt examples

```text
[whisper] We need to leave quietly before sunrise.
[excited] We actually got it working on a 12 GB card.
[sad] I waited for you at the station all night.
```

## Links

* [Groxaxo fork README](https://github.com/groxaxo/fish-speech-int4-patch#readme)
* [Star the GitHub project](https://github.com/groxaxo/fish-speech-int4-patch)
* [GitHub issues and feature requests](https://github.com/groxaxo/fish-speech-int4-patch/issues)
* [Fish Audio blog post](https://fish.audio/blog/fish-audio-open-sources-s2/)
* [Fish Audio S2 technical report](https://huggingface.co/papers/2603.08823)

## License

This model remains under the [Fish Audio Research License](LICENSE.md). Research and non-commercial use is permitted under that license. Commercial use requires a separate agreement with Fish Audio.