How to use from
Lemonade
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull catlilface/Gemma-4-26B-A4B-NVFP4-GGUF:NVFP4
Run and chat with the model
lemonade run user.Gemma-4-26B-A4B-NVFP4-GGUF-NVFP4
List all available models
lemonade list
Quick Links

Gemma 4 26B A4B NVFP4 GGUF

About

This model is an unmodified GGUF quantization of nvidia/Gemma-4-26B-A4B-NVFP4, made with the llama.cpp conversion tool. Please refer to the official NVIDIA repository for quality metrics. As of now, this repository contains NVFP4 GGUF in two variants: NVFP4 and NVFP4_FP8 of Gemma 4 26B A4B, and a projector file.

Inference

It is recommended to use llamacpp with docker. You can start inferencing this model with the command below:

docker run --rm \
  --runtime nvidia \
  --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HF_HUB_CACHE=/root/.cache/huggingface/hub \
  -p 8080:8080 \
  ghcr.io/ggml-org/llama.cpp:server-cuda13 \
  -hf catlilface/Gemma-4-26B-A4B-NVFP4-GGUF:NVFP4 \
  -c 4000

Adjust llama.cpp parameters to better fit your hardware.

Acknowledgements

Special thanks to ynankani for his contribution, which made this quantization possible.

Downloads last month
2,846
GGUF
Model size
25B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for catlilface/Gemma-4-26B-A4B-NVFP4-GGUF

Quantized
(1)
this model