How to use from
Docker Model Runner
docker model run hf.co/arealicehole/gemma-4-hermes-agent-v2-GGUF:Q4_K_M
Quick Links

Gemma 4 Hermes Agent v2 — GGUF

Converted from NF4 bitsandbytes quantization to GGUF Q4_K_M via Akash GPU.

Source

  • Original: ning423/gemma-4-e4b-hermes-agent-v2-bf16 (NF4 bitsandbytes, HuggingFace)
  • Base model: google/gemma-4-E4B-it
  • Architecture: Gemma4ForConditionalGeneration (4B parameters)
  • Conversion pipeline: NF4 → BF16 (GPU dequant) → GGUF F16 → Q4_K_M

Files

File Size Format
gemma-4-hermes-Q4_K_M.gguf 6.4 GB GGUF v3, Q4_K_M

Usage

llama-server -m gemma-4-hermes-Q4_K_M.gguf -ngl 99 -c 131072

Notes

  • Q8_0 quantization FAILS on Gemma 4 architecture (ncols=1 tensors) — use Q4_K_M
  • Supports tool use / function calling (Hermes agent fine-tune)
  • 128k context with KV cache compression (--cache-type-k q8_0 --cache-type-v q8_0)
Downloads last month
551
GGUF
Model size
5B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for arealicehole/gemma-4-hermes-agent-v2-GGUF

Quantized
(245)
this model