---
license: apache-2.0
library_name: gguf
pipeline_tag: image-text-to-text
language:
- en
- zh
tags:
- zen5
- zenlm
- hanzo
- gguf
- moe
- 3b-active
- long-context
- multimodal
- vision-language
- zen-5
---

# Zen5

Canonical default of the [Zen5](https://zenlm.org) family. **Multimodal** sparse MoE (image + text in → text out) with 35B total / 3B active parameters per token, 256K context. The everyday Zen5 model — agentic-trained, fast at scale, frontier-quality vision-language reasoning at a 3B-active compute budget.

Part of the canonical Zen5 ladder:

| SKU | Hardware fit | This repo |
|---|---|---|
| `zen5-flash` | anything (4 GB VRAM) | [zen-5-flash-gguf](https://huggingface.co/zenlm/zen-5-flash-gguf) |
| `zen5-mini` | 32 GB | [zen-5-mini-gguf](https://huggingface.co/zenlm/zen-5-mini-gguf) |
| **`zen5`** (default) | 24 GB+ VRAM (Q4_K) | **← you are here** |
| `zen5-pro` | Mac M4 Max / DGX Spark / H100 80GB | [zen-5-pro-gguf](https://huggingface.co/zenlm/zen-5-pro-gguf) |
| `zen5-max` | Mac Studio M3 Ultra 512GB / 8x H100 | [zen-5-max-gguf](https://huggingface.co/zenlm/zen-5-max-gguf) |

## Files

| File | Format |
|---|---|
| main GGUF (`*-Q4_K.gguf`) | GGUF Q4_K (text + vision), refusal-orthogonalized |
| `mmproj-model-f16.gguf` | multimodal vision projector — load alongside the main GGUF for image input |

## Run

Hosted via the Hanzo gateway (`api.hanzo.ai`) as `zen5`.

Local with `llama.cpp` (CLI / server) or `zen5-engine`:

```sh
hf download zenlm/zen-5-gguf --local-dir gguf
MAIN=$(ls gguf/*-Q4_K.gguf | head -1)

# text-only chat
llama-cli -m "$MAIN" -p "Explain MoE inference."

# vision-language (image input)
llama-cli -m "$MAIN" \
          --mmproj gguf/mmproj-model-f16.gguf \
          --image path/to/screenshot.png \
          -p "Describe this UI and propose a fix."
```