--- license: apache-2.0 base_model: - nvidia/Gemma-4-26B-A4B-NVFP4 pipeline_tag: image-text-to-text --- # Gemma 4 26B A4B NVFP4 GGUF ## About This model is an unmodified GGUF quantization of [nvidia/Gemma-4-26B-A4B-NVFP4](https://huggingface.co/nvidia/Gemma-4-26B-A4B-NVFP4), made with the llama.cpp conversion tool. Please refer to the official NVIDIA repository for quality metrics. As of now, this repository contains NVFP4 GGUF in two variants: NVFP4 and NVFP4_FP8 of Gemma 4 26B A4B, and a projector file. ## Inference It is recommended to use llamacpp with docker. You can start inferencing this model with the command below: ```bash docker run --rm \ --runtime nvidia \ --gpus all \ -v ~/.cache/huggingface:/root/.cache/huggingface \ -e HF_HUB_CACHE=/root/.cache/huggingface/hub \ -p 8080:8080 \ ghcr.io/ggml-org/llama.cpp:server-cuda13 \ -hf catlilface/Gemma-4-26B-A4B-NVFP4-GGUF:NVFP4 \ -c 4000 ``` Adjust llama.cpp parameters to better fit your hardware. ## Acknowledgements Special thanks to [ynankani](https://github.com/ynankani) for his contribution, which made this quantization possible.