---
library_name: transformers
license: gemma
base_model: google/gemma-4-E4B-it
base_model_relation: finetune
tags:
- gemma4
- reasoning
- chain-of-thought
- distillation
- lora
- unsloth
- fine-tuned
- thinking
datasets:
- nohurry/Opus-4.6-Reasoning-3000x-filtered
- Roman1111111/claude-opus-4.6-10000x
- AI-MO/NuminaMath-CoT
- TIGER-Lab/MathInstruct
language:
- en
pipeline_tag: text-generation
model-index:
- name: gemma-4-e4b-opus-reasoning-v2
  results: []
---

# Gemma 4 E4B — Opus Reasoning V2

A reasoning-enhanced fine-tune of [google/gemma-4-E4B-it](https://huggingface.co/google/gemma-4-E4B-it), distilled from Claude Opus 4.6 reasoning traces and supplemented with math Chain-of-Thought data.

## Model Details

| | |
|---|---|
| **Base Model** | `google/gemma-4-E4B-it` (4.5B effective params, 8B with embeddings) |
| **Architecture** | Dense transformer with Per-Layer Embeddings (PLE), 128K context |
| **Fine-tuning Method** | LoRA via [Unsloth](https://github.com/unslothai/unsloth) |
| **Precision** | float16 (LoRA adapters merged into the base weights) |
| **Training Hardware** | NVIDIA A100 80GB (RunPod) |
| **Training Framework** | Unsloth + Hugging Face TRL (`SFTTrainer`) |

### LoRA Configuration

| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0 |
| Bias | None |
| Target Modules | Attention + MLP (language layers only) |

### Training Configuration

| Parameter | Value |
|---|---|
| Epochs | 2 |
| Learning Rate | 1e-4 (cosine schedule) |
| Effective Batch Size | 8 (2 per device × 4 gradient accumulation steps) |
| Warmup Steps | 100 |
| Optimizer | AdamW 8-bit |
| Weight Decay | 0.01 |
| Max Sequence Length | 4096 |
| Response-only Training | Yes (loss on assistant turns only; user turns masked) |
| Final Training Loss | ~0.54 |

## Training Data

About 20,000 samples (19,959 in total) combining reasoning-distillation data with math Chain-of-Thought data; the two math datasets make up ~40% of the mix:

| Dataset | Samples | Purpose |
|---|---|---|
| [nohurry/Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) | 2,326 | Claude Opus 4.6 reasoning traces |
| [Roman1111111/claude-opus-4.6-10000x](https://huggingface.co/datasets/Roman1111111/claude-opus-4.6-10000x) | 9,633 | Claude Opus 4.6 extended reasoning |
| [AI-MO/NuminaMath-CoT](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT) | 4,000 (sampled) | Math Chain-of-Thought solutions |
| [TIGER-Lab/MathInstruct](https://huggingface.co/datasets/TIGER-Lab/MathInstruct) | 4,000 (sampled) | Math CoT + Program-of-Thought |

All assistant responses were formatted with `...` blocks so the model learns to write out structured reasoning before giving its answer.

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "naazimsnh02/gemma-4-e4b-opus-reasoning-v2",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("naazimsnh02/gemma-4-e4b-opus-reasoning-v2")

messages = [{"role": "user", "content": "A train travels 60 km/h. How long does it take to cover 255 km?"}]

# Apply the chat template and move the resulting tensors to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to(model.device)

# Enable sampling explicitly so temperature/top_p/top_k are applied
output = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

## Limitations & Disclaimers

- **This is a reasoning-focused model, not a benchmark-optimized release.** It has not been evaluated on standard benchmarks (MMLU, GSM8K, HumanEval, etc.). Performance on such benchmarks is unknown and may differ from the base model.
- **Reasoning style, not reasoning ability.** This fine-tune teaches the model to *externalize* its reasoning in `` blocks. It does not guarantee improved accuracy over the base model on any given task.
- **Distillation artifacts.** The reasoning traces were generated by Claude Opus 4.6. The model may reproduce stylistic patterns, phrasing, or reasoning structures characteristic of the teacher model.
- **Not safety-tuned beyond base.** This fine-tune does not add safety training beyond what exists in the base `gemma-4-E4B-it` model. Users should apply their own safety measures for production use.
- **English only.** Training data is predominantly English. Performance in other languages is untested.
- **Small model limitations.** At 4.5B effective parameters, the model has inherent capacity limits. Complex multi-step reasoning, nuanced analysis, and knowledge-intensive tasks may be unreliable.
- **No guarantees of factual accuracy.** Like all language models, this model can hallucinate, produce incorrect calculations, or generate plausible-sounding but wrong answers.

## Intended Use

- Research and experimentation with reasoning distillation techniques
- Exploring chain-of-thought behavior in smaller models
- Personal and educational projects requiring a lightweight reasoning model
- As a starting point for further fine-tuning

## Out of Scope

- Production systems requiring high reliability or factual accuracy
- Safety-critical applications (medical, legal, financial advice)
- Use cases requiring multilingual support
- Tasks requiring knowledge beyond the base model's training cutoff

## Acknowledgments

- **[Google](https://ai.google.dev/)** for the Gemma 4 model family
- **[Unsloth](https://github.com/unslothai/unsloth)** for efficient fine-tuning infrastructure
- **[nohurry](https://huggingface.co/nohurry)** for the curated Opus 4.6 Reasoning dataset
- **[Roman1111111](https://huggingface.co/Roman1111111)** for the Claude Opus 4.6 10K dataset
- **[AI-MO](https://huggingface.co/AI-MO)** for NuminaMath-CoT
- **[TIGER-Lab](https://huggingface.co/TIGER-Lab)** for MathInstruct

## License

This model inherits the [Gemma license](https://ai.google.dev/gemma/terms) from the base model. Please review and comply with Google's Gemma Terms of Use.
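For readers who want to sanity-check the figures in the Training Data and Training Configuration sections above, the short script below recomputes the total sample count, the math share of the mix, the effective batch size, and an approximate optimizer step count. It is a sketch under stated assumptions, not a record of the actual run: it assumes a single device (one A100, per Training Hardware), no sequence packing, and no samples dropped by the 4096-token limit. The `lr_at` helper is hypothetical and assumes the Transformers-style "cosine" schedule shape (linear warmup, then cosine decay to zero).

```python
import math

# Sample counts taken from the Training Data table
samples = {
    "nohurry/Opus-4.6-Reasoning-3000x-filtered": 2326,
    "Roman1111111/claude-opus-4.6-10000x": 9633,
    "AI-MO/NuminaMath-CoT": 4000,
    "TIGER-Lab/MathInstruct": 4000,
}
math_datasets = {"AI-MO/NuminaMath-CoT", "TIGER-Lab/MathInstruct"}

total = sum(samples.values())
math_share = sum(n for name, n in samples.items() if name in math_datasets) / total

# Effective batch size = per-device batch * gradient accumulation * number of devices
per_device_batch = 2
grad_accum = 4
num_devices = 1  # assumption: single A100, as listed under Training Hardware
effective_batch = per_device_batch * grad_accum * num_devices

# Approximate optimizer steps, assuming no packing and no dropped samples
epochs = 2
steps_per_epoch = math.ceil(total / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = 100


def lr_at(step, base_lr=1e-4, warmup=warmup_steps, total=total_steps):
    """Hypothetical helper: linear warmup to base_lr, then cosine decay to 0."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"total samples:   {total}")
print(f"math share:      {math_share:.1%}")
print(f"effective batch: {effective_batch}")
print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"warmup fraction: {warmup_steps / total_steps:.1%}")
print(f"lr at midpoint:  {lr_at((warmup_steps + total_steps) // 2):.2e}")
```

Under these assumptions the mix totals 19,959 samples with about 40% math content, and two epochs at an effective batch of 8 come to roughly 5,000 optimizer steps, so the 100 warmup steps cover about 2% of training.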