GGUF for the 12 Hz codec tokenizer in CrispASR (companion to Qwen3-TTS talkers)

by cstr - opened May 1


The 12 Hz codec tokenizer is the required companion to all Qwen3-TTS-12Hz talkers in CrispASR.

We ship it at cstr/qwen3-tts-tokenizer-12hz-GGUF. The qwen3-tts backend (and its qwen3-tts-1.7b-base alias) loads it automatically.

Implementation note: the codec decoder (8-layer sliding-window transformer + ConvNeXt + 4× SnakeBeta + transposed conv, producing a 24 kHz waveform) hung the M1 GPU command buffer with kIOGPUCommandBufferCallbackErrorImpactingInteractivity. The cause: each kernel_conv_transpose_1d output thread ran 320 inner-loop iterations, of which only 2 contributed to the result. We patched the Metal kernel to compute the contributing input range up front (marked // CrispASR patch in ggml/src/ggml-metal/ggml-metal.metal). This patch MUST be re-applied after every ggml bump; see LEARNINGS.md, "Metal conv_transpose_1d input range tightening".
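To illustrate the idea behind the range tightening, here is a minimal single-channel sketch in Python. It is not the Metal kernel and not the actual patch; the function names (contributing_range, conv_transpose_1d_naive, conv_transpose_1d_tight) are hypothetical, and the sizes in the usage note are assumed values chosen only to reproduce the "320 iterations, 2 contributing" ratio. For output index o, input index i contributes only when 0 <= o - i*stride < kernel_size, so the loop bounds can be computed directly instead of scanning every input position.

```python
def contributing_range(o, kernel_size, stride, input_len):
    """Half-open range [lo, hi) of input indices i with 0 <= o - i*stride < kernel_size."""
    # Lower bound: i >= (o - kernel_size + 1) / stride, rounded up (ceil division).
    lo = max(0, -(-(o - kernel_size + 1) // stride))
    # Upper bound: i <= o / stride, rounded down, clamped to the last input index.
    hi = min(input_len - 1, o // stride)
    return lo, hi + 1


def conv_transpose_1d_naive(x, w, stride):
    """Reference version: every output element scans all inputs and filters."""
    n_in, k_size = len(x), len(w)
    out = [0] * ((n_in - 1) * stride + k_size)
    for o in range(len(out)):
        for i in range(n_in):          # n_in iterations per output element
            k = o - i * stride
            if 0 <= k < k_size:        # most iterations fail this test
                out[o] += x[i] * w[k]
    return out


def conv_transpose_1d_tight(x, w, stride):
    """Same result, but the inner loop visits only contributing inputs."""
    n_in, k_size = len(x), len(w)
    out = [0] * ((n_in - 1) * stride + k_size)
    for o in range(len(out)):
        lo, hi = contributing_range(o, k_size, stride, n_in)
        for i in range(lo, hi):        # about kernel_size / stride iterations
            out[o] += x[i] * w[o - i * stride]
    return out
```

With an assumed input length of 320, kernel size 16, and stride 8, contributing_range(1000, 16, 8, 320) covers exactly 2 input indices, whereas the naive loop would run all 320 iterations for that output element.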

Example with a sibling talker GGUF:

./build/bin/crispasr --backend qwen3-tts \
    -m qwen3-tts-0.6b-base-q4_k.gguf \
    --tts "Hello world" --tts-output out.wav
# (the tokenizer is auto-fetched on first run via the model registry)
