---
language:
  - zh
  - en
  - ja
  - ko
  - es
  - pt
  - ar
  - ru
  - fr
  - de
  - sv
  - it
  - tr
  - "no"
  - nl
  - cy
  - eu
  - ca
  - da
  - gl
  - ta
  - hu
  - fi
  - pl
  - et
  - hi
  - la
  - ur
  - th
  - vi
  - jw
  - bn
  - yo
  - sl
  - cs
  - sw
  - nn
  - he
  - ms
  - uk
  - id
  - kk
  - bg
  - lv
  - my
  - tl
  - sk
  - ne
  - fa
  - af
  - el
  - bo
  - hr
  - ro
  - sn
  - mi
  - yi
  - am
  - be
  - km
  - is
  - az
  - sd
  - br
  - sq
  - ps
  - mn
  - ht
  - ml
  - sr
  - sa
  - te
  - ka
  - bs
  - pa
  - lt
  - kn
  - si
  - hy
  - mr
  - as
  - gu
  - fo
license: other
license_name: fish-audio-research-license
license_link: LICENSE.md
pipeline_tag: text-to-speech
library_name: pytorch
base_model: fishaudio/s2-pro
base_model_relation: quantized
quantized_by: groxaxo
inference: false
tags:
  - text-to-speech
  - instruction-following
  - multilingual
  - s2
  - pro
  - tts
  - cuda
  - "2026"
  - quantized
  - nf4
  - bitsandbytes
extra_gated_prompt: >-
  You agree not to use the model to generate content that violates the DMCA or
  local laws.
extra_gated_fields:
  Country: country
  Specific date: date_picker
  I agree to use this model for non-commercial use ONLY: checkbox
---

# S2-Pro NF4

<img src="overview.png" alt="S2-Pro overview" width="100%">

[**GitHub Fork**](https://github.com/groxaxo/fish-speech-int4-patch) | [**Upstream Fish Speech**](https://github.com/fishaudio/fish-speech) | [**Technical Report**](https://huggingface.co/papers/2603.08823) | [**Fish Audio**](https://fish.audio)

[![GitHub stars](https://img.shields.io/github/stars/groxaxo/fish-speech-int4-patch?style=for-the-badge&label=Star%20the%20Fork)](https://github.com/groxaxo/fish-speech-int4-patch/stargazers)
[![GitHub repo](https://img.shields.io/badge/GitHub-groxaxo%2Ffish--speech--int4--patch-111827?style=for-the-badge&logo=github)](https://github.com/groxaxo/fish-speech-int4-patch)
[![Upstream](https://img.shields.io/badge/Upstream-fishaudio%2Ffish--speech-1f7a8c?style=for-the-badge)](https://github.com/fishaudio/fish-speech)

This repository hosts the **Groxaxo NF4 release of Fish Audio S2-Pro** for lower-VRAM inference.

- **Base model:** Fish Audio S2-Pro
- **Relation:** Quantized release
- **Format:** bitsandbytes **NF4** prequantized `model.pth`
- **Target hardware:** practical single-GPU inference on **12 GB+ VRAM** setups
- **Best paired with:** `groxaxo/fish-speech-int4-patch`

This is a community-hosted release of the original Fish Audio model. Credit for the base model, research, and architecture belongs to the [Fish Audio](https://fish.audio/) team.

Huge thanks to the original creators at [Fish Audio](https://fish.audio/) and the upstream [fishaudio/fish-speech](https://github.com/fishaudio/fish-speech) project for building and open-sourcing S2-Pro.

If this NF4 release helps you, please star the companion GitHub project here:

**https://github.com/groxaxo/fish-speech-int4-patch**

The goal is simple: make the flagship S2-Pro experience easier to run, easier to share, and easier to deploy on real-world single-GPU machines.

## What is in this repo

- `model.pth`: prequantized NF4 checkpoint
- `codec.pth`: codec weights
- tokenizer/config assets needed by the patched loader

The checkpoint is meant to be loaded through the fork's **bnb4** path. It is not a legacy `int4` or `int8` export.
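Before starting the server, it can help to confirm that a downloaded checkpoint directory actually contains the two weight files listed above. The following is a minimal sketch, not part of the fork: `missing_checkpoint_files` is a hypothetical helper, and it checks only `model.pth` and `codec.pth` because the tokenizer/config file names are not spelled out here.

```python
from pathlib import Path

# Only the two files named in this README are checked. Tokenizer/config
# asset names are not listed here, so they are deliberately left out.
REQUIRED_FILES = ("model.pth", "codec.pth")


def missing_checkpoint_files(checkpoint_dir):
    """Return the required file names that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_checkpoint_files("/path/to/s2-pro")` returns an empty list when both files are present, and the names of the missing files otherwise.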

## Recommended usage

Use the patched repo that defaults to the right settings for this checkpoint:

```bash
git clone https://github.com/groxaxo/fish-speech-int4-patch
cd fish-speech-int4-patch

./install_bnb4_3060.sh
./start_bnb4_3060.sh
```

These scripts start the API/WebUI with the intended defaults:

* `--bnb4`
* `--half`
* lazy loading
* `s2-pro` as the canonical model name

## Why people use this release

* lower-VRAM **NF4** deployment path for S2-Pro
* companion GitHub fork with API, WebUI, Docker, and export tooling
* smoke-tested prequantized `model.pth` reload support
* clearer self-hosting path for 12 GB and 24 GB cards

## Quick commands

### WebUI

```bash
git clone https://github.com/groxaxo/fish-speech-int4-patch
cd fish-speech-int4-patch
./install_bnb4_3060.sh
./start_bnb4_3060.sh
```

### API server

```bash
PYTHONPATH=. python tools/api_server.py \
  --checkpoint-path /path/to/s2-pro \
  --bnb4 \
  --half \
  --host 0.0.0.0 \
  --port 8880
```

### OpenAI-style request

```bash
curl http://127.0.0.1:8880/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "s2-pro",
    "input": "[warm, calm] Hello from the Groxaxo NF4 S2-Pro release.",
    "voice": "default"
  }' \
  --output speech.wav
```
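The same request can be sent from Python using only the standard library. This is a rough sketch that mirrors the curl command above. It assumes the API server is listening on `127.0.0.1:8880` and that the response body is the audio file itself. `build_speech_request` and `synthesize` are illustrative names, not functions provided by the fork.

```python
import json
import urllib.request

# Assumption: the API server from the previous section is running locally on port 8880.
SPEECH_URL = "http://127.0.0.1:8880/v1/audio/speech"


def build_speech_request(text, model="s2-pro", voice="default", url=SPEECH_URL):
    """Build the same JSON POST request that the curl command sends."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.wav"):
    """Send the request and write the raw response body to out_path."""
    with urllib.request.urlopen(build_speech_request(text)) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the server running, `synthesize("[warm, calm] Hello from the Groxaxo NF4 S2-Pro release.")` should write `speech.wav` to the current directory, matching the curl example.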

## Manual loading

If you want to point the repo at this checkpoint directly, keep `--bnb4` enabled:

```bash
PYTHONPATH=. python tools/api_server.py \
  --checkpoint-path /path/to/s2-pro \
  --bnb4 \
  --half
```

Or in Python:

```python
import torch
from fish_speech.models.text2semantic.inference import init_model

model, decode_one_token = init_model(
    checkpoint_path="/path/to/s2-pro",
    device="cuda:0",
    precision=torch.float16,
    compile=False,
    bnb4=True,
)
```

## Why this release exists

Upstream S2-Pro is excellent, but many single-card workstations lack the VRAM to run it comfortably with default settings. This NF4 release makes S2-Pro much easier to run on common cards such as the RTX 3060 while still running the flagship S2-Pro model.

## Model notes

* S2-Pro uses a **Dual-Autoregressive** architecture with a 4B slow AR stack and a fast residual AR stack.
* It supports fine-grained inline control with natural-language tags such as `[whisper]`, `[laugh]`, and `[sad]`.
* It supports multilingual generation, multi-speaker prompting, and strong voice cloning workflows.

## Prompt examples

```text
[whisper] We need to leave quietly before sunrise.
[excited] We actually got it working on a 12 GB card.
[sad] I waited for you at the station all night.
```
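When prompts are assembled programmatically, a small helper can keep the tag syntax consistent. The sketch below is illustrative only: `tag_line` is a hypothetical name, and joining several tags with commas inside one bracket pair simply copies the `[warm, calm]` form used in the request example earlier in this README.

```python
def tag_line(text, *tags):
    """Prefix text with one bracketed tag group, e.g. "[warm, calm] Hello"."""
    # Drop empty or whitespace-only tags so they do not produce "[]" or ", ,".
    cleaned = [t.strip() for t in tags if t.strip()]
    if not cleaned:
        return text
    return "[" + ", ".join(cleaned) + "] " + text
```

For example, `tag_line("We need to leave quietly before sunrise.", "whisper")` reproduces the first prompt above.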

## Links

* [Groxaxo fork README](https://github.com/groxaxo/fish-speech-int4-patch#readme)
* [Star the GitHub project](https://github.com/groxaxo/fish-speech-int4-patch)
* [GitHub issues and feature requests](https://github.com/groxaxo/fish-speech-int4-patch/issues)
* [Fish Audio blog post](https://fish.audio/blog/fish-audio-open-sources-s2/)
* [Fish Audio S2 technical report](https://huggingface.co/papers/2603.08823)

## License

This model remains under the [Fish Audio Research License](LICENSE.md). Research and non-commercial use are permitted under that license. Commercial use requires a separate agreement with Fish Audio.
