GGUF + pure-C++ runtime in CrispASR: OmniASR-CTC-300M
We've added OmniASR-CTC-300M to CrispASR as the omniasr backend.
Three things bit hard during the port (all documented in our LEARNINGS.md "OmniASR-CTC: three critical findings"):
- Input must be layer-normalised waveform (zero mean, unit variance); without this the CTC head emits mostly blanks.
- CTC blank = token 0 (<s>), not token 1 (<pad>) like HF wav2vec2. This is the fairseq2 convention.
- pos_conv padding = K // 2 (= 64 for K = 128), not (K-1) // 2. This fixes same-padding for the fairseq2 Conv1d.
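The three findings can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the CrispASR C++ code: the function names, the `eps` value, and the stride-1 convolution length formula are assumptions made for the example. The point of the last helper is that with K = 128 a padding of 64 yields T + 1 frames, while 63 yields T - 1. Neither is exactly T, so an implementation has to pick the one its reference model uses (wav2vec 2.0 style code typically pads K // 2 and trims one trailing frame; whether CrispASR trims the same way is not stated here).

```python
import math

BLANK_ID = 0  # <s> in the fairseq2 convention described above


def layer_norm_waveform(samples, eps=1e-5):
    """Normalise raw PCM samples to zero mean and unit variance.

    eps is an assumed stabiliser value, not taken from the source.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    scale = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * scale for x in samples]


def ctc_greedy_decode(frame_ids, blank_id=BLANK_ID):
    """Collapse consecutive repeats, then drop blank frames."""
    out = []
    prev = None
    for t in frame_ids:
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out


def conv1d_out_len(t, kernel, padding):
    """Output length of a stride-1, dilation-1 1-D convolution."""
    return t + 2 * padding - kernel + 1


if __name__ == "__main__":
    print(ctc_greedy_decode([0, 5, 5, 0, 5, 7, 7, 0]))  # [5, 5, 7]
    print(conv1d_out_len(100, 128, 128 // 2))            # 101
    print(conv1d_out_len(100, 128, (128 - 1) // 2))      # 99
```

Decoding the same frame sequence with `blank_id=1` would keep every 0 as a real token, which is the kind of silent failure the second finding describes.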
Architecture: 7-layer CNN (Conv1d strides [5,2,2,2,2,2,2] = 320× downsampling) + Linear(512→1024) + 24L transformer (d=1024, 16 heads, FFN=4096, pre-norm, GELU) + CTC head (1024→9812). Raw 16 kHz PCM, no mel. ~1600 languages.
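The downsampling arithmetic can be checked with a short sketch. The strides come from the summary above. The kernel sizes are an assumption (the usual wav2vec 2.0 feature-extractor values, unpadded) and are used only to estimate the exact frame count, so treat that number as indicative rather than as a statement about this model.

```python
from math import prod

STRIDES = (5, 2, 2, 2, 2, 2, 2)    # from the architecture summary
KERNELS = (10, 3, 3, 3, 3, 2, 2)   # ASSUMPTION: wav2vec 2.0 defaults, no padding
SAMPLE_RATE = 16_000               # raw 16 kHz PCM input


def downsampling_factor(strides=STRIDES):
    """Total stride of the CNN front end."""
    return prod(strides)


def num_frames(num_samples, kernels=KERNELS, strides=STRIDES):
    """Frames left after each unpadded Conv1d layer is applied in turn."""
    n = num_samples
    for k, s in zip(kernels, strides):
        n = (n - k) // s + 1
    return n


if __name__ == "__main__":
    factor = downsampling_factor()
    print(factor)                       # 320
    print(1000 * factor / SAMPLE_RATE)  # 20.0 (ms of audio per frame hop)
    print(num_frames(SAMPLE_RATE))      # 49 frames for one second of audio
```

So each CTC frame advances 20 ms of audio, and one second yields slightly fewer than 50 frames under the assumed kernels because the unpadded convolutions consume samples at the edges.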
CTC = no native punctuation; pair with --punc-model fullstop-punc-q4_k.gguf (XLM-R-large, DE/EN/FR/IT) or fireredpunc-q8_0.gguf (BERT-base, EN+CN).
Pre-quantised GGUFs (Apache-2.0): cstr/omniASR-CTC-300M-v2-GGUF
./build/bin/crispasr --backend omniasr -m omniasr-ctc-300m-q4_k.gguf -l fr \
-f audio.wav --punc-model fullstop-punc-q4_k.gguf
CrispASR's omniasr backend also handles the 1B CTC variant and the autoregressive LLM variants: same source, GGUF metadata dispatch.