---
license: apache-2.0
language:
- en
base_model:
- google/gemma-4-E4B-it
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- text-generation-inference
- llama.cpp
---

# **gemma-4-E4B-it-F32-GGUF**

> Gemma-4-E4B-it from Google is a 4.5B effective parameter (8B total with Per-Layer Embeddings) multimodal dense model in the Gemma 4 family, optimized for edge deployment on laptops, high-end smartphones, and consumer GPUs with native support for text, images (variable aspect ratio/resolution), audio processing, and configurable thinking modes for step-by-step reasoning. Featuring 42 layers, 512-token sliding window, 128K context length, and 262K vocabulary, it delivers frontier-level performance in agentic workflows, multilingual OCR/handwriting recognition, document/PDF parsing, UI/screen analysis, chart interpretation, object detection with pointing, coding assistance, and low-latency speech-to-text understanding—rivaling models 10-20x larger while maintaining Google's production-grade safety alignments. The instruction-tuned variant excels at on-device autonomous agents via Android AICore/Qualcomm optimizations, with open weights enabling local-first inference (MediaTek/ARM CPUs, NVIDIA RTX) for privacy-focused applications like mobile IDEs, real-time document processing, and structured data extraction in resource-constrained environments.

## Quick start with llama.cpp

```
llama-server -hf prithivMLmods/gemma-4-E4B-it-F32-GGUF:F32
```

## Model Files

   File Name | Quant Type | File Size | File Link |
 | - | - | - | - |
 | gemma-4-E4B-it.BF16.gguf | BF16 | 15.1 GB | [Download](https://huggingface.co/prithivMLmods/gemma-4-E4B-it-F32-GGUF/blob/main/GGUF/gemma-4-E4B-it.BF16.gguf) |
 | gemma-4-E4B-it.F16.gguf | F16 | 15.1 GB | [Download](https://huggingface.co/prithivMLmods/gemma-4-E4B-it-F32-GGUF/blob/main/GGUF/gemma-4-E4B-it.F16.gguf) |
 | gemma-4-E4B-it.F32.gguf | F32 | 30.1 GB | [Download](https://huggingface.co/prithivMLmods/gemma-4-E4B-it-F32-GGUF/blob/main/GGUF/gemma-4-E4B-it.F32.gguf) |
 | gemma-4-E4B-it.Q8_0.gguf | Q8_0 | 8.01 GB | [Download](https://huggingface.co/prithivMLmods/gemma-4-E4B-it-F32-GGUF/blob/main/GGUF/gemma-4-E4B-it.Q8_0.gguf) |
 | gemma-4-E4B-it.mmproj-bf16.gguf | mmproj-bf16 | 992 MB | [Download](https://huggingface.co/prithivMLmods/gemma-4-E4B-it-F32-GGUF/blob/main/GGUF/gemma-4-E4B-it.mmproj-bf16.gguf) |
 | gemma-4-E4B-it.mmproj-f16.gguf | mmproj-f16 | 992 MB | [Download](https://huggingface.co/prithivMLmods/gemma-4-E4B-it-F32-GGUF/blob/main/GGUF/gemma-4-E4B-it.mmproj-f16.gguf) |
 | gemma-4-E4B-it.mmproj-f32.gguf | mmproj-f32 | 1.91 GB | [Download](https://huggingface.co/prithivMLmods/gemma-4-E4B-it-F32-GGUF/blob/main/GGUF/gemma-4-E4B-it.mmproj-f32.gguf) |
 | gemma-4-E4B-it.mmproj-q8_0.gguf | mmproj-q8_0 | 560 MB | [Download](https://huggingface.co/prithivMLmods/gemma-4-E4B-it-F32-GGUF/blob/main/GGUF/gemma-4-E4B-it.mmproj-q8_0.gguf) |

## Quants Usage 

(sorted by size, not necessarily quality. IQ-quants are often preferable over similar sized non-IQ quants)

Here is a handy graph by ikawrakow comparing some lower-quality quant
types (lower is better):

![image.png](https://www.nethype.de/huggingface_embed/quantpplgraph.png)