--- license: apache-2.0 base_model: google/gemma-3n-e4b library_name: gguf tags: - gguf - ollama - claude-code - coding - agent - function-calling - gemma language: - en pipeline_tag: text-generation --- # Gemma 4 Claude Coder — local model family A family of custom models built on **Gemma 4** (edge variants E2B and E4B), tuned to act as **autonomous coding and administration agents**. The models speak the Anthropic-compatible API, so they drive **Claude Code** fully locally — your code never leaves your machine and cloud token cost drops to zero. Each model ships with a system prompt focused on real work inside a codebase: use tools instead of guessing, make minimal and precise code changes, return complete and runnable output, and verify after acting. Sampling follows Google's official Gemma 4 recommendation (temperature 1.0, top_k 64, top_p 0.95), with thinking mode enabled for better planning before a tool call. ## The idea The whole point of this family is to run **Claude Code on small, popular, consumer-grade hardware**. No datacenter GPU, no cloud bill — just an everyday Mac Mini (or similar 16 GB machine) acting as a fully local, agentic coding assistant. These models make that practical: light enough to fit, smart enough to drive real tool-calling agent loops. In a time of **RAM shortages and the big tech giants tightening usage limits and quotas**, owning a capable agent that runs entirely on your own modest hardware stops being a hobby and becomes leverage: no rate limits, no surprise pricing, no dependency on someone else's quota. ## Models in the family | Model | Base | Context | Purpose | |---|---|---|---| | **gemma4-e2b-claude-coder** | Gemma 4 E2B (eff. 2B / 5.1B with embeddings) | 64K | Fast everyday coding agent — edits, autocomplete, short agent loops. Lightest on memory. | | **gemma4-e4b-claude-coder** | Gemma 4 E4B (eff. 4B / 8B with embeddings) | 64K | Stronger coding agent — better reasoning and tool use on larger tasks. | | **gemma4-e4b-claude-coder-admin** | Gemma 4 E4B | 32K | Administration and system tasks (scripts, shell, devops). Smaller context fits 100% in GPU for higher, stable throughput. | ## What it's for - Driving **Claude Code** locally (`ollama launch claude --model `). - Agentic code writing and editing with native **function calling / tool use**. - Administration and devops tasks on a server (the *admin* variant). - Full privacy and offline operation — no code sent to the cloud. ## Context - **Coders (E2B / E4B):** 64K tokens — matching Claude Code's recommendation (64K minimum). - **Admin (E4B):** 32K tokens — a deliberate trade-off for 16 GB hardware that keeps the model entirely on the GPU. - Base Gemma 4 E2B/E4B natively supports up to 128K, so context can be raised on stronger hardware. ## Test hardware The models were built and tested on: - **Mac Mini (Apple Silicon, M-series), 16 GB RAM, macOS 15.6** - **Ollama 0.24**, GPU (Metal) inference ### Measured performance (16 GB RAM) | Model | Placement | Speed | Tool calling | |---|---|---|---| | gemma4-e2b-claude-coder | 100% GPU | ~55 tok/s | ✅ valid JSON | | gemma4-e4b-claude-coder (64K) | 39% GPU / 61% CPU | ~27 tok/s (drops under load) | ✅ | | gemma4-e4b-claude-coder-admin (32K) | **100% GPU** | ~30 tok/s (stable) | ✅ | All three passed an end-to-end test through Claude Code: real turns with tool calls and correct responses (HTTP 200 on `/v1/messages`). ## How they were made These models were designed, built and tested with the help of **Claude Opus 4.8** — the best coding model in the world. Their system prompts, parameter choices and context configuration draw directly on its knowledge. In other words: the world's best coding model prepared local models that take that work over right on your desk. ## License Apache 2.0 (inherited from the base Gemma 4).