---
license: apache-2.0
base_model: google/gemma-3n-e4b
library_name: gguf
tags:
  - gguf
  - ollama
  - claude-code
  - coding
  - agent
  - function-calling
  - gemma
language:
  - en
pipeline_tag: text-generation
---

# Gemma 4 Claude Coder — local model family

A family of custom models built on **Gemma 4** (edge variants E2B and E4B), tuned to act as
**autonomous coding and administration agents**. The models speak the Anthropic-compatible API,
so they drive **Claude Code** fully locally — your code never leaves your machine and cloud token
cost drops to zero.

Each model ships with a system prompt focused on real work inside a codebase: use tools instead
of guessing, make minimal and precise code changes, return complete and runnable output, and
verify after acting. Sampling follows Google's official Gemma 4 recommendation
(temperature 1.0, top_k 64, top_p 0.95), with thinking mode enabled for better planning before
a tool call.

## The idea

The whole point of this family is to run **Claude Code on small, popular, consumer-grade hardware**.
No datacenter GPU, no cloud bill — just an everyday Mac Mini (or similar 16 GB machine) acting as a
fully local, agentic coding assistant. These models make that practical: light enough to fit, smart
enough to drive real tool-calling agent loops.

In a time of **RAM shortages and the big tech giants tightening usage limits and quotas**, owning a
capable agent that runs entirely on your own modest hardware stops being a hobby and becomes
leverage: no rate limits, no surprise pricing, no dependency on someone else's quota.


## Models in the family

| Model | Base | Context | Purpose |
|---|---|---|---|
| **gemma4-e2b-claude-coder** | Gemma 4 E2B (eff. 2B / 5.1B with embeddings) | 64K | Fast everyday coding agent — edits, autocomplete, short agent loops. Lightest on memory. |
| **gemma4-e4b-claude-coder** | Gemma 4 E4B (eff. 4B / 8B with embeddings) | 64K | Stronger coding agent — better reasoning and tool use on larger tasks. |
| **gemma4-e4b-claude-coder-admin** | Gemma 4 E4B | 32K | Administration and system tasks (scripts, shell, devops). Smaller context fits 100% in GPU for higher, stable throughput. |

## What it's for

- Driving **Claude Code** locally (`ollama launch claude --model <name>`).
- Agentic code writing and editing with native **function calling / tool use**.
- Administration and devops tasks on a server (the *admin* variant).
- Full privacy and offline operation — no code sent to the cloud.

## Context

- **Coders (E2B / E4B):** 64K tokens — matching Claude Code's recommendation (64K minimum).
- **Admin (E4B):** 32K tokens — a deliberate trade-off for 16 GB hardware that keeps the model
  entirely on the GPU.
- Base Gemma 4 E2B/E4B natively supports up to 128K, so context can be raised on stronger hardware.

## Test hardware

The models were built and tested on:

- **Mac Mini (Apple Silicon, M-series), 16 GB RAM, macOS 15.6**
- **Ollama 0.24**, GPU (Metal) inference

### Measured performance (16 GB RAM)

| Model | Placement | Speed | Tool calling |
|---|---|---|---|
| gemma4-e2b-claude-coder | 100% GPU | ~55 tok/s | ✅ valid JSON |
| gemma4-e4b-claude-coder (64K) | 39% GPU / 61% CPU | ~27 tok/s (drops under load) | ✅ |
| gemma4-e4b-claude-coder-admin (32K) | **100% GPU** | ~30 tok/s (stable) | ✅ |

All three passed an end-to-end test through Claude Code: real turns with tool calls and correct
responses (HTTP 200 on `/v1/messages`).

## How they were made

These models were designed, built and tested with the help of **Claude Opus 4.8** — the best
coding model in the world. Their system prompts, parameter choices and context configuration draw
directly on its knowledge. In other words: the world's best coding model prepared local models
that take that work over right on your desk.

## License

Apache 2.0 (inherited from the base Gemma 4).