LoRA adapter that improves math reasoning in Gemma 3 1B-IT, trained with a three-stage pipeline (SFT → SFT → GRPO) using Tunix on TPU v5e.
Writeup: gemma3-reasoning.
Stage 1 (SFT) builds the reasoning foundation using structured math and science datasets.
Stage 2 (SFT) expands capabilities to code, summarization, and creative writing while retaining math performance through weighted sampling.
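Weighted sampling can be pictured as drawing each training example from a mixture of source datasets, with the source chosen in proportion to a weight. The sketch below is a minimal illustration under assumptions: the function name `weighted_mixture`, the dataset names, and the weights are hypothetical and are not the values used to train this adapter.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def weighted_mixture(
    datasets: Dict[str, Sequence[str]],
    weights: Dict[str, float],
    num_samples: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Return (source, example) pairs. Each source is chosen with probability
    proportional to its weight, then an example is drawn uniformly from it."""
    rng = random.Random(seed)
    names = list(datasets)
    source_weights = [weights[name] for name in names]
    samples = []
    for _ in range(num_samples):
        name = rng.choices(names, weights=source_weights, k=1)[0]
        samples.append((name, rng.choice(datasets[name])))
    return samples


if __name__ == "__main__":
    # Hypothetical mixture: math stays dominant so stage 1 skills are retained
    data = {
        "math": ["m1", "m2", "m3"],
        "code": ["c1", "c2"],
        "summarization": ["s1"],
        "creative_writing": ["w1"],
    }
    mix = {"math": 0.5, "code": 0.2, "summarization": 0.15, "creative_writing": 0.15}
    counts = Counter(source for source, _ in weighted_mixture(data, mix, 10_000))
    print(counts)
```

Sampling the source first (rather than concatenating and shuffling) keeps the mixture ratio fixed regardless of how large each dataset is.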
Stage 3 (GRPO) applies reinforcement learning with Group Relative Policy Optimization. The reward function checks answer correctness against GSM8K gold labels, so no judge model is needed.
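A correctness reward of this kind can be sketched in a few lines. GSM8K gold solutions end with a line of the form `#### <answer>`. The sketch assumes the model's final answer is the last number in its completion, which is a simplification; the reward used in this project may parse a specific answer format instead. It also shows the group-relative step that gives GRPO its name: each completion's reward is normalized by the mean and standard deviation of the rewards within the group of completions sampled for the same prompt. All function names here are hypothetical.

```python
import re
import statistics
from typing import List, Optional

# Matches integers or decimals, allowing thousands separators such as "1,234"
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def gold_answer(gsm8k_answer: str) -> str:
    """GSM8K solutions end with '#### <final answer>'."""
    return gsm8k_answer.split("####")[-1].strip().replace(",", "")


def extract_prediction(completion: str) -> Optional[str]:
    """Assumption: the last number in the completion is the model's answer."""
    matches = _NUMBER.findall(completion)
    return matches[-1].replace(",", "") if matches else None


def correctness_reward(completion: str, gsm8k_answer: str) -> float:
    """1.0 if the predicted number equals the gold label, else 0.0."""
    pred = extract_prediction(completion)
    if pred is None:
        return 0.0
    try:
        return 1.0 if float(pred) == float(gold_answer(gsm8k_answer)) else 0.0
    except ValueError:
        return 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one prompt's group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    gold = "25 * 13 = 325\n#### 325"
    group = ["25 * 13 = 325", "The answer is 300.", "I am not sure.", "So we get 325."]
    rewards = [correctness_reward(c, gold) for c in group]
    print(rewards)  # [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(rewards))
```

Because advantages are computed relative to the group, completions that beat their siblings are reinforced without a learned value model, and a group where every answer scores the same contributes zero advantage.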
| Parameter | Value |
|---|---|
| Rank | 32 |
| Alpha | 32.0 |
| Target modules | q_einsum, kv_einsum, gate_proj, down_proj, up_proj, attn_vec_einsum |
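For reference, rank and alpha enter the standard LoRA update as a low-rank correction to a frozen weight, scaled by alpha / rank; with rank 32 and alpha 32.0 that scale is exactly 1.0. The NumPy sketch below shows the idea for a plain matrix multiply. It illustrates standard LoRA and is not Qwix's implementation: the target modules above are einsum and projection layers in Tunix's Gemma 3 model, and the function name, layer sizes, and initialization here are assumptions.

```python
import numpy as np


def lora_forward(x, w, a, b, alpha: float, rank: int):
    """y = x @ W + (alpha / rank) * (x @ A) @ B, with W frozen and only A, B trained."""
    scale = alpha / rank
    return x @ w + scale * (x @ a) @ b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_in, d_out, rank, alpha = 1024, 1024, 32, 32.0  # hypothetical layer sizes
    x = rng.normal(size=(2, d_in))
    w = rng.normal(size=(d_in, d_out))        # frozen base weight
    a = rng.normal(size=(d_in, rank)) * 0.01  # trainable down-projection
    b = np.zeros((rank, d_out))               # trainable up-projection, zero-init
    # With B = 0 the adapter starts as an exact no-op
    print(np.allclose(lora_forward(x, w, a, b, alpha, rank), x @ w))  # True
    # Trainable parameters per adapted matrix: rank * (d_in + d_out) vs d_in * d_out
    print(rank * (d_in + d_out), "vs", d_in * d_out)
```

Because the update is linear, a trained adapter can also be folded into the base weight as W + (alpha / rank) * A @ B for inference.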
This adapter was trained with Tunix/Qwix (JAX). To load and use:
```python
from flax import nnx
from safetensors.numpy import load_file
import qwix
from tunix.generate import sampler as sampler_lib
from tunix.models.gemma3 import model, params

# Load the base Gemma 3 1B-IT model
base = params.create_model_from_checkpoint(
    params.GEMMA3_1B_IT,
    model.ModelConfig.gemma3_1b_it(),
)

# Wrap the target modules with LoRA layers (rank 32, alpha 32.0, as in the table above)
lora_model = qwix.apply_lora_to_model(
    base,
    qwix.LoraProvider(
        module_path=".*q_einsum|.*kv_einsum|.*gate_proj|.*down_proj|.*up_proj|.*attn_vec_einsum",
        rank=32,
        alpha=32.0,
    ),
    rngs=nnx.Rngs(0),
    **base.get_model_input(),
)

# Load the trained adapter weights (a dict mapping tensor names to NumPy arrays)
adapter = load_file("adapter_model.safetensors")
# Merge adapter weights into lora_model state
# (tensor names must be mapped to the LoRA parameter paths; omitted here)

# Build a sampler whose KV cache holds the prompt plus generated tokens
tokenizer = params.create_tokenizer()
sampler = sampler_lib.Sampler(
    transformer=lora_model,
    tokenizer=tokenizer,
    cache_config=sampler_lib.CacheConfig(
        cache_size=1536,
        num_layers=lora_model.config.num_layers,
        num_kv_heads=lora_model.config.num_kv_heads,
        head_dim=lora_model.config.head_dim,
    ),
)

# Gemma chat format: a user turn followed by an open model turn
prompt = "<start_of_turn>user\nWhat is 25 * 13? Think step by step.<end_of_turn>\n<start_of_turn>model\n"
out = sampler(
    input_strings=[prompt],
    max_generation_steps=512,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    echo=False,
    eos_tokens=[106],  # 106 is the <end_of_turn> token id in the Gemma tokenizer
)
print(out.text[0])
```
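The prompt in the example follows Gemma's chat turn format. A small helper, hypothetical and not part of Tunix, can build the same string for any question so the format stays consistent across prompts:

```python
def gemma_chat_prompt(user_message: str) -> str:
    """Wrap a single user message in Gemma turn markers and open the model's turn."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


if __name__ == "__main__":
    print(repr(gemma_chat_prompt("What is 25 * 13? Think step by step.")))
```

Leaving the model turn open is what lets generation continue as the model's reply, and stopping on `<end_of_turn>` (the `eos_tokens=[106]` argument) ends it at the close of that turn.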
```bibtex
@misc{kotlar2025gemma3reasoning,
  title={Gemma 3 1B-IT Reasoning LoRA},
  author={Kotlar, Milos},
  year={2025},
  url={https://github.com/kotlarmilos/gemma3-reasoning}
}
```