--- license: mit language: - zh - en - yue pipeline_tag: automatic-speech-recognition tags: - audio - speech-recognition - transcription - gguf - glm - zhipu - multilingual library_name: ggml base_model: zai-org/GLM-ASR-Nano-2512 --- # GLM-ASR-Nano-2512 — GGUF GGUF conversions and quantisations of [`zai-org/GLM-ASR-Nano-2512`](https://huggingface.co/zai-org/GLM-ASR-Nano-2512) for use with **[CrispStrobe/CrispASR](https://github.com/CrispStrobe/CrispASR)**. ## Available variants | File | Quant | Size | Notes | |---|---|---|---| | `glm-asr-nano.gguf` | F16 | 4.3 GB | Full precision | | `glm-asr-nano-q8_0.gguf` | Q8_0 | 2.3 GB | High quality | | `glm-asr-nano-q4_k.gguf` | Q4_K | 1.3 GB | Best size/quality tradeoff | All variants produce correct transcription on test audio. ## Model details - **Architecture:** Whisper encoder (1280d, 32L, partial RoPE) + 4-frame projector + Llama LLM (2048d, 28L, GQA 16/4) - **Parameters:** 1.5B - **Languages:** 17 (Mandarin, English, Cantonese, + 14 more) - **License:** MIT - **Outperforms OpenAI Whisper V3** on benchmarks (lowest avg error rate 4.10) ## Usage with CrispASR ```bash git clone https://github.com/CrispStrobe/CrispASR && cd CrispASR cmake -S . -B build && cmake --build build -j8 # Auto-detect backend from GGUF ./build/bin/crispasr -m glm-asr-nano-q4_k.gguf -f audio.wav # Explicit backend ./build/bin/crispasr --backend glm-asr -m glm-asr-nano-q4_k.gguf -f audio.wav -osrt ``` ## Conversion ```bash python models/convert-glm-asr-to-gguf.py --input zai-org/GLM-ASR-Nano-2512 --output glm-asr-nano.gguf crispasr-quantize glm-asr-nano.gguf glm-asr-nano-q4_k.gguf q4_k ```