---
library_name: transformers
license: gemma
base_model: google/gemma-4-E4B-it
base_model_relation: finetune
tags:
  - gemma4
  - reasoning
  - chain-of-thought
  - distillation
  - lora
  - unsloth
  - fine-tuned
  - thinking
datasets:
  - nohurry/Opus-4.6-Reasoning-3000x-filtered
  - Roman1111111/claude-opus-4.6-10000x
  - AI-MO/NuminaMath-CoT
  - TIGER-Lab/MathInstruct
language:
  - en
pipeline_tag: text-generation
model-index:
  - name: gemma-4-e4b-opus-reasoning-v2
    results: []
---

# Gemma 4 E4B — Opus Reasoning V2

A reasoning-enhanced fine-tune of [google/gemma-4-E4B-it](https://huggingface.co/google/gemma-4-E4B-it), distilled from Claude Opus 4.6 reasoning traces with supplementary math Chain-of-Thought data.

## Model Details

| | |
|---|---|
| **Base Model** | `google/gemma-4-E4B-it` (4.5B effective params, 8B with embeddings) |
| **Architecture** | Dense transformer with Per-Layer Embeddings (PLE), 128K context |
| **Fine-tuning Method** | LoRA via [Unsloth](https://github.com/unslothai/unsloth) |
| **Precision** | float16 (LoRA adapters merged into the base weights) |
| **Training Hardware** | NVIDIA A100 80GB (RunPod) |
| **Training Framework** | Unsloth + HuggingFace TRL (SFTTrainer) |

### LoRA Configuration

| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0 |
| Bias | None |
| Target Modules | Attention + MLP (language layers only) |
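
As a rough illustration of what rank 16 and alpha 32 imply, the sketch below assumes standard LoRA scaling (alpha / r, not rsLoRA) and counts the parameters one adapter adds to a single weight matrix. The 2560 × 2560 projection shape is a hypothetical example, not a confirmed Gemma 4 dimension, and the helper names are made up for this sketch.

```python
# LoRA hyperparameters taken from the table above.
RANK = 16
ALPHA = 32


def lora_scaling(rank: int, alpha: int) -> float:
    """Scale applied to the low-rank update B @ A (standard LoRA: alpha / r)."""
    return alpha / rank


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical square projection; real layer shapes depend on the base model.
D_IN, D_OUT = 2560, 2560

scaling = lora_scaling(RANK, ALPHA)
added = lora_trainable_params(D_IN, D_OUT, RANK)
full = D_IN * D_OUT

print(f"scaling = {scaling}")
print(f"adapter params = {added:,} ({added / full:.2%} of the full matrix)")
```

With these values the update is scaled by 2.0, and the adapter for the example matrix is a small fraction of the full weight count, which is why only the adapter weights need optimizer state during training.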

### Training Configuration

| Parameter | Value |
|---|---|
| Epochs | 2 |
| Learning Rate | 1e-4 (cosine schedule) |
| Effective Batch Size | 8 (2 per device × 4 gradient accumulation steps) |
| Warmup Steps | 100 |
| Optimizer | AdamW 8-bit |
| Weight Decay | 0.01 |
| Max Sequence Length | 4096 |
| Response-only Training | Yes (user turns masked out of the loss) |
| Final Training Loss | ~0.54 |
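
The learning-rate settings above can be sketched as a linear warmup followed by cosine decay. This is a plain-Python approximation of a typical warmup-plus-cosine schedule, not the exact scheduler used in training. It assumes a single GPU, decay to zero, and that the sample count is the sum of the dataset rows listed in the Training Data section.

```python
import math

PEAK_LR = 1e-4           # from the table
WARMUP_STEPS = 100       # from the table
EPOCHS = 2               # from the table
EFFECTIVE_BATCH = 2 * 4  # per-device batch x gradient accumulation steps

# Assumption: sample count is the sum of the dataset rows listed in this card.
NUM_SAMPLES = 19_959
TOTAL_STEPS = math.ceil(NUM_SAMPLES / EFFECTIVE_BATCH) * EPOCHS


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the learning rate at a few points along the run.
for step in (0, 50, 100, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the run is roughly 5,000 optimizer steps, so the 100 warmup steps cover about 2% of training.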

## Training Data

19,959 samples combining reasoning distillation and math Chain-of-Thought data. The two math datasets contribute 8,000 of them (about 40%):

| Dataset | Samples | Purpose |
|---|---|---|
| [nohurry/Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) | 2,326 | Claude Opus 4.6 reasoning traces |
| [Roman1111111/claude-opus-4.6-10000x](https://huggingface.co/datasets/Roman1111111/claude-opus-4.6-10000x) | 9,633 | Claude Opus 4.6 extended reasoning |
| [AI-MO/NuminaMath-CoT](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT) | 4,000 (sampled) | Math Chain-of-Thought solutions |
| [TIGER-Lab/MathInstruct](https://huggingface.co/datasets/TIGER-Lab/MathInstruct) | 4,000 (sampled) | Math CoT + Program-of-Thought |

All assistant responses were formatted with `<think>...</think>` blocks to teach the model structured reasoning before answering.
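
A minimal sketch of how such a training example might be assembled is shown below. The function names, the input fields, and the exact whitespace around the tags are assumptions for illustration; the card does not specify the preprocessing script.

```python
def format_with_think(reasoning: str, answer: str) -> str:
    """Wrap the reasoning trace in <think> tags and append the final answer."""
    return f"<think>\n{reasoning.strip()}\n</think>\n\n{answer.strip()}"


def to_chat_example(question: str, reasoning: str, answer: str) -> list[dict]:
    """Build a two-turn conversation in the common role/content message format."""
    return [
        {"role": "user", "content": question.strip()},
        {"role": "assistant", "content": format_with_think(reasoning, answer)},
    ]


# Hypothetical sample, not taken from any of the datasets above.
example = to_chat_example(
    question="What is 12 * 7?",
    reasoning="12 * 7 = 10 * 7 + 2 * 7 = 70 + 14 = 84.",
    answer="84",
)
print(example[1]["content"])
```

Keeping the reasoning and the final answer in one assistant turn means response-only training applies the loss to both the `<think>` block and the answer.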

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "naazimsnh02/gemma-4-e4b-opus-reasoning-v2",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("naazimsnh02/gemma-4-e4b-opus-reasoning-v2")

messages = [{"role": "user", "content": "A train travels 60 km/h. How long does it take to cover 255 km?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to(model.device)

# Sampling parameters only apply when do_sample=True, so set it explicitly.
output = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
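
To separate the reasoning from the final answer, the decoded text can be split on the `<think>` tags. The helper below is a sketch with a hypothetical name. It assumes the tags survive decoding as plain text and that generation was not cut off before `</think>`; if no complete block is found, the whole text is returned as the answer.

```python
import re

THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no complete block exists."""
    match = THINK_PATTERN.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


# Hypothetical decoded output, standing in for the string printed above.
decoded = "<think>\n255 / 60 = 4.25 hours.\n</think>\n\nIt takes 4 hours and 15 minutes."
reasoning, answer = split_reasoning(decoded)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

If outputs are frequently truncated mid-reasoning, raising `max_new_tokens` is the simplest adjustment.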

## Limitations & Disclaimers

- **This is a reasoning-focused model, not a benchmark-optimized release.** It has not been evaluated on standard benchmarks (MMLU, GSM8K, HumanEval, etc.). Performance on such benchmarks is unknown and may differ from the base model.
- **Reasoning style, not reasoning ability.** This fine-tune teaches the model to *externalize* its reasoning in `<think>` blocks. It does not guarantee improved accuracy over the base model on any given task.
- **Distillation artifacts.** The reasoning traces were generated by Claude Opus 4.6. The model may reproduce stylistic patterns, phrasing, or reasoning structures characteristic of the teacher model.
- **Not safety-tuned beyond base.** This fine-tune does not add safety training beyond what exists in the base `gemma-4-E4B-it` model. Users should apply their own safety measures for production use.
- **English-focused.** Training data is predominantly English, and performance in other languages is untested.
- **Small model limitations.** At 4.5B effective parameters, the model has inherent capacity limits. Complex multi-step reasoning, nuanced analysis, and knowledge-intensive tasks may be unreliable.
- **No guarantees of factual accuracy.** Like all language models, this model can hallucinate, produce incorrect calculations, or generate plausible-sounding but wrong answers.

## Intended Use

- Research and experimentation with reasoning distillation techniques
- Exploring chain-of-thought behavior in smaller models
- Personal and educational projects requiring a lightweight reasoning model
- As a starting point for further fine-tuning

## Out of Scope

- Production systems requiring high reliability or factual accuracy
- Safety-critical applications (medical, legal, financial advice)
- Use cases requiring multilingual support
- Tasks requiring knowledge beyond the base model's training cutoff

## Acknowledgments

- **[Google](https://ai.google.dev/)** for the Gemma 4 model family
- **[Unsloth](https://github.com/unslothai/unsloth)** for efficient fine-tuning infrastructure
- **[nohurry](https://huggingface.co/nohurry)** for the curated Opus 4.6 Reasoning dataset
- **[Roman1111111](https://huggingface.co/Roman1111111)** for the Claude Opus 4.6 10K dataset
- **[AI-MO](https://huggingface.co/AI-MO)** for NuminaMath-CoT
- **[TIGER-Lab](https://huggingface.co/TIGER-Lab)** for MathInstruct

## License

This model inherits the [Gemma license](https://ai.google.dev/gemma/terms) from the base model. Please review and comply with Google's Gemma Terms of Use.