--- license: apache-2.0 base_model: - nvidia/Gemma-4-26B-A4B-NVFP4 pipeline_tag: image-text-to-text --- # Gemma 4 26B A4B NVFP4 GGUF ## About This model is an unmodified GGUF quantization of [nvidia/Gemma-4-26B-A4B-NVFP4](https://huggingface.co/nvidia/Gemma-4-26B-A4B-NVFP4), made with the llama.cpp conversion tool. Please refer to the official NVIDIA repository for quality metrics. As of now, this repository contains NVFP4 GGUF in two variants: NVFP4 and NVFP4_FP8 of Gemma 4 26B A4B, and a projector file. ## Inference It is not currently possible to inference Gemma 4 26B A4B NVFP4 using official llama.cpp. I suggest launching this model with my custom llama.cpp Docker image: `catlilface/llama.cpp:gemma4_26b_nvfp4`. It contains the server version of llama.cpp with CUDA 13. Feel free to use it until the official repo is available. This quantization has been tested on 1x 5070 Ti 16 GB with partial CPU offloading. ## Acknowledgements Special thanks to [ynankani](https://github.com/ynankani) for his contribution, which made this quantization possible.