---
license: apache-2.0
base_model:
- nvidia/Gemma-4-26B-A4B-NVFP4
pipeline_tag: image-text-to-text
---

# Gemma 4 26B A4B NVFP4 GGUF

## About
This model is an unmodified GGUF quantization of [nvidia/Gemma-4-26B-A4B-NVFP4](https://huggingface.co/nvidia/Gemma-4-26B-A4B-NVFP4), made with the llama.cpp conversion tool.
Please refer to the official NVIDIA repository for quality metrics.
As of now, this repository contains NVFP4 GGUF in two variants: NVFP4 and NVFP4_FP8 of Gemma 4 26B A4B, and a projector file.

## Inference
It is recommended to use llamacpp with docker. You can start inferencing this model with the command below:
```bash
docker run --rm \
  --runtime nvidia \
  --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HF_HUB_CACHE=/root/.cache/huggingface/hub \
  -p 8080:8080 \
  ghcr.io/ggml-org/llama.cpp:server-cuda13 \
  -hf catlilface/Gemma-4-26B-A4B-NVFP4-GGUF:NVFP4 \
  -c 4000
```

Adjust llama.cpp parameters to better fit your hardware.

## Acknowledgements
Special thanks to [ynankani](https://github.com/ynankani) for his contribution, which made this quantization possible.