---
license: apache-2.0
library_name: gguf
pipeline_tag: image-text-to-text
language:
- en
- zh
tags:
- zen5
- zenlm
- hanzo
- gguf
- moe
- 3b-active
- long-context
- multimodal
- vision-language
- zen-5
---

# Zen5

Canonical default of the [Zen5](https://zenlm.org) family. **Multimodal** sparse MoE (image + text in → text out) with 35B total / 3B active parameters per token, 256K context.

The everyday Zen5 model: agentic-trained, fast at scale, frontier-quality vision-language reasoning at a 3B-active compute budget.

Part of the canonical Zen5 ladder:

| SKU | Hardware fit | Repo |
|---|---|---|
| `zen5-flash` | anything (4 GB VRAM) | [zen-5-flash-gguf](https://huggingface.co/zenlm/zen-5-flash-gguf) |
| `zen5-mini` | 32 GB | [zen-5-mini-gguf](https://huggingface.co/zenlm/zen-5-mini-gguf) |
| **`zen5`** (default) | 24 GB+ VRAM (Q4_K) | **← you are here** |
| `zen5-pro` | Mac M4 Max / DGX Spark / H100 80GB | [zen-5-pro-gguf](https://huggingface.co/zenlm/zen-5-pro-gguf) |
| `zen5-max` | Mac Studio M3 Ultra 512GB / 8x H100 | [zen-5-max-gguf](https://huggingface.co/zenlm/zen-5-max-gguf) |

## Files

| File | Format |
|---|---|
| main GGUF (`*-Q4_K.gguf`) | GGUF Q4_K (text + vision), refusal-orthogonalized |
| `mmproj-model-f16.gguf` | multimodal vision projector; load alongside the main GGUF for image input |

## Run

Hosted via the Hanzo gateway (`api.hanzo.ai`) as `zen5`. Local with `llama.cpp` (CLI / server) or `zen5-engine`:

```sh
hf download zenlm/zen-5-gguf --local-dir gguf
MAIN=$(ls gguf/*-Q4_K.gguf | head -1)

# text-only chat
llama-cli -m "$MAIN" -p "Explain MoE inference."

# vision-language (image input)
llama-cli -m "$MAIN" \
  --mmproj gguf/mmproj-model-f16.gguf \
  --image path/to/screenshot.png \
  -p "Describe this UI and propose a fix."
```
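
The commands above cover the CLI path. For the server path, a minimal sketch follows, assuming `llama-server` from `llama.cpp` is on `PATH` and the files were downloaded into `gguf/` as shown. The helper names `find_main_gguf` and `serve_zen5` and the default port 8080 are hypothetical, chosen for illustration; they are not part of this repo or of `llama.cpp`. The helpers also fail loudly when the main GGUF or the projector is missing, rather than passing an empty path to the binary.

```sh
#!/bin/sh
# Hypothetical helpers (not shipped with this repo or llama.cpp).

# Print the first *-Q4_K.gguf in the given directory; fail if none is present.
find_main_gguf() {
  dir=$1
  for f in "$dir"/*-Q4_K.gguf; do
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  echo "error: no *-Q4_K.gguf in $dir" >&2
  return 1
}

# Start llama-server with the vision projector attached.
# Defaults (directory "gguf", port 8080) are illustrative assumptions.
serve_zen5() {
  dir=${1:-gguf}
  port=${2:-8080}
  main=$(find_main_gguf "$dir") || return 1
  proj="$dir/mmproj-model-f16.gguf"
  if [ ! -f "$proj" ]; then
    echo "error: missing $proj" >&2
    return 1
  fi
  llama-server -m "$main" --mmproj "$proj" --port "$port"
}

# Usage: serve_zen5 gguf 8080
```

Once `serve_zen5` is running, `llama-server` should expose its OpenAI-compatible `/v1/chat/completions` endpoint on the chosen port, so existing OpenAI-style clients can be pointed at it.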