You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Hirundo Hardened Gemma 4 E4B-IT

A prompt-injection-hardened release of Google's gemma-4-E4B-it, produced with Hirundo's machine unlearning engine. The model behaves identically to the base model on its core capabilities, but is 74.5% less likely to follow injected instructions on Meta's PurpleLlama benchmark.

TL;DR

Metric	Pretrained	Hardened	Change
Prompt Injection Attack Success Rate (PurpleLlama)	18.73%	4.78%	−74.5% relative
Average utility delta across 6 reasoning/coding/knowledge benchmarks	—	—	±0.40 pp (within eval noise)

Why Unlearning, Not Guardrails

The standard playbook for prompt injection is to bolt on classifiers, tighten system prompts, or do another round of safety SFT. Each adds latency, brittleness, or capability tax — and none of them changes the underlying disposition of the model.

Hirundo's approach is different: we surgically modify the weights to remove the targeted behavior. The result is a drop-in replacement model that:

Has no inference-time overhead — no extra classifier in the path
Preserves utility — measurable across reasoning, coding, instruction-following, and knowledge benchmarks (see below)
Is more efficient than SFT or RLHF by orders of magnitude — no full retraining run, no preference data collection
Is verifiable in the weights themselves, not in a wrapper that can be stripped

Detailed Results

Prompt Injection Robustness — PurpleLlama

Attack Success Rate (ASR), aggregated across all attack categories in Meta's PurpleLlama prompt injection suite:

Benchmark	Pretrained ASR	Hardened ASR	Relative Reduction
All categories	18.73%	4.78%	74.47%

Utility Preservation — Nemo-Skills

Evaluated with NVIDIA's Nemo-Skills suite. All shifts are within typical run-to-run eval noise.

Benchmark	Pretrained	Hardened	Δ (pp)
AIME25	10.83	10.00	−0.83
GPQA	53.28	54.04	+0.76
IFBENCH	35.96	35.70	−0.26
LiveCodeBench	52.58	52.38	−0.20
MMLU-Pro	70.13	69.77	−0.36
SciCode	8.44	8.44	0.00

Mean absolute delta: 0.40 pp. GPQA improves; the rest move within noise.

Usage

Drop-in compatible with the base model — no prompt or interface changes required.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hirundo-io/gemma-4-E4B-it-reduced-prompt-injection"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

To apply Hirundo's unlearning to your own model, or to evaluate models for undesirable behaviors, use our Python packages:

hirundo — Hirundo platform SDK for LLM behavior unlearning and dataset QA
llm-behavior-eval — Evaluate LLMs for undesirable behaviors such as bias and prompt injection susceptibility

pip install hirundo
pip install llm-behavior-eval

About Hirundo

Hirundo is the machine unlearning platform for production AI. We perform surgical, weight-level removal of unwanted model behaviors — prompt injection susceptibility, bias, PII memorization, hallucination patterns, and more — without retraining, without guardrails, and without measurable utility loss.

Hirundo works across all major open model families — including Gemma, Llama, Mistral, Qwen, NVIDIA Nemotron, IBM Granite, and others — at sizes ranging from sub-billion-parameter SLMs to frontier-scale open weights.

If you want this done on your own model, get in touch: hirundo.io.

License

Inherits the Gemma Terms of Use from the base model. © Google for the base weights; Hirundo for the unlearning modification.

Downloads last month: 141

Safetensors

Model size

8B params

Tensor type

BF16

Model tree for hirundo-io/gemma-4-E4B-it-reduced-prompt-injection

Base model

google/gemma-4-E4B

Finetuned

google/gemma-4-E4B-it

Finetuned

(183)

this model