Gemma4 26B MoE — Hermes Function Calling + Thinking LoRA 💭

A LoRA adapter for google/gemma-4-26B-A4B-it, fine-tuned on Hermes Function Calling Thinking V1: 3,570 function-calling conversations whose model turns include `<think>` reasoning blocks. Trained by UKA (Hermes Agent) 🤖

📋 Summary

Detail            Value
Base model        google/gemma-4-26B-A4B-it (26B MoE, 128 experts)
Dataset           Jofthomas/hermes-function-calling-thinking-V1 (3,570 examples)
Method            Custom NF4 per-expert quantization + LoRA
Pipeline          AndriejusNak/gemma4-26b-moe-finetune
GPU               NVIDIA RTX 5090 32GB (Vast.ai cloud)
Training time     70 minutes
Best loss         0.5149 (epoch 2 average)
NaN explosions    0
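The pipeline's own "per-expert NF4" code is not reproduced in this card. As a rough illustration of the general idea only, here is a minimal sketch of standard blockwise NF4 quantization (absmax scale per block, then snapping each normalized value to the nearest of the 16 NormalFloat-4 levels published with QLoRA/bitsandbytes), applied to each expert's weights independently. The block size of 64, the flat-list expert layout, and all helper names are assumptions for illustration, not details of the pipeline.

```python
# Illustrative blockwise NF4 quantization, applied separately to each expert.
# Assumptions: BLOCK_SIZE = 64, experts given as flat Python lists, and all
# helper names are hypothetical. This is NOT the pipeline's implementation.

# The 16 NormalFloat-4 code values (as defined for bitsandbytes' NF4 type).
NF4_CODE = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]
BLOCK_SIZE = 64  # hypothetical block size


def _nearest_code(x):
    """Index (0..15, i.e. 4 bits) of the NF4 level closest to x in [-1, 1]."""
    return min(range(16), key=lambda i: abs(NF4_CODE[i] - x))


def quantize_nf4(values, block_size=BLOCK_SIZE):
    """Return (code indices, per-block absmax scales) for a flat list."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block) or 1.0  # all-zero block: avoid /0
        scales.append(absmax)
        codes.extend(_nearest_code(v / absmax) for v in block)
    return codes, scales


def dequantize_nf4(codes, scales, block_size=BLOCK_SIZE):
    """Map code indices back to floats using each block's absmax scale."""
    return [NF4_CODE[c] * scales[i // block_size] for i, c in enumerate(codes)]


def quantize_experts(expert_weights, block_size=BLOCK_SIZE):
    """Quantize each expert's flattened weights independently ("per expert")."""
    return [quantize_nf4(w, block_size) for w in expert_weights]


if __name__ == "__main__":
    experts = [[0.5, -0.25, 0.1, 0.0], [1.0, 2.0, -3.0, 0.3]]
    for w, (codes, scales) in zip(experts, quantize_experts(experts)):
        print(codes, [round(v, 3) for v in dequantize_nf4(codes, scales)])
```

Storing one scale per block keeps a single outlier from flattening the resolution of the whole tensor; quantizing experts separately means each expert's blocks never mix with another expert's weights.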

🖥️ Hardware

Component   Specification
GPU         NVIDIA GeForce RTX 5090 32GB GDDR7
CPU         Intel Core i7-14700K (20 cores / 28 threads)
RAM         94 GB DDR5
Disk        200 GB NVMe SSD
Cloud       Vast.ai
PyTorch     2.12.0.dev (nightly, cu128)

🔧 Training Configuration

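# v6_26b_pipeline.py (excerpt)
MODEL_NAME = "google/gemma-4-26B-A4B-it"
MAX_SEQ_LENGTH = 1024          # max tokens per training example
LORA_R = 32                    # LoRA rank
LORA_ALPHA = 32                # LoRA scaling = alpha / r = 1.0
INCLUDE_MLP_LORA = True        # also adapt gate_proj, up_proj, down_proj
SFT_EPOCHS = 2
SFT_BATCH_SIZE = 3             # micro-batch size per forward pass
SFT_GRAD_ACCUM = 8             # effective batch = 3 x 8 = 24
SFT_LR = 2e-5
SFT_FILES = ["data/hermes_fc_thinking.jsonl"]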
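To show what `LORA_R` and `LORA_ALPHA` control, here is a minimal pure-Python sketch of a LoRA-adapted linear layer in the standard formulation y = W·x + (alpha / r)·B·(A·x), where W is frozen and only A (r × d_in) and B (d_out × r) are trained. The helper names and the tiny matrices in the example are hypothetical; the pipeline applies this to the attention and MLP projections listed below, not to toy lists.

```python
# Standard LoRA forward pass on plain Python lists (illustrative sketch).
# Helper names are hypothetical; W is frozen, only A and B would be trained.

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_scaling(alpha, r):
    """Scale applied to the low-rank update; 32 / 32 = 1.0 for this adapter."""
    return alpha / r


def lora_linear(x, W, A, B, alpha, r):
    """y = W x + (alpha / r) * B (A x), with A: r x d_in and B: d_out x r."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = lora_scaling(alpha, r)
    return [b + scale * d for b, d in zip(base, delta)]


def lora_param_count(d_in, d_out, r):
    """Trainable parameters added to one d_out x d_in linear layer."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    A = [[1.0, 0.0], [0.0, 1.0]]
    B_init = [[0.0, 0.0], [0.0, 0.0]]  # B starts at zero, so output == base
    print(lora_linear([1.0, 2.0], W, A, B_init, alpha=2, r=2))
    print(lora_scaling(32, 32))
```

Because B is conventionally initialized to zero, the adapted model starts out identical to the base model; with alpha equal to r, the learned update is added at unit scale.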

LoRA Details

  • Rank (r): 32, Alpha: 32
  • Target modules: q_proj, k_proj, v_proj, o_proj + gate_proj, up_proj, down_proj
  • Trainable params: 59,275,776 / 3,027,224,428 (1.96%)
  • Optimizer steps: 296 (148 per epoch × 2 epochs); forward passes: 2,368 (296 steps × 8 accumulation micro-batches)
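The step and parameter counts above can be reproduced from the configuration with a little arithmetic. This sketch assumes the trainer drops trailing micro-batches that do not fill a complete gradient-accumulation window; that assumption is what makes 3,570 examples yield 148 steps per epoch. The function names are hypothetical.

```python
import math

# Reproduce the schedule numbers from the config (illustrative sketch).
# Assumption: micro-batches that don't complete an accumulation window are
# dropped at the end of each epoch.

def sft_schedule(num_examples, batch_size, grad_accum, epochs):
    """Return (steps per epoch, total optimizer steps, counted forward passes)."""
    micro_batches_per_epoch = math.ceil(num_examples / batch_size)  # 1,190 here
    steps_per_epoch = micro_batches_per_epoch // grad_accum          # drop remainder
    optimizer_steps = steps_per_epoch * epochs
    forward_passes = optimizer_steps * grad_accum
    return steps_per_epoch, optimizer_steps, forward_passes


def trainable_pct(trainable, total):
    """Share of parameters that receive gradients, in percent."""
    return 100.0 * trainable / total


if __name__ == "__main__":
    print(sft_schedule(3570, batch_size=3, grad_accum=8, epochs=2))
    print(round(trainable_pct(59_275_776, 3_027_224_428), 2))
```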

Loss Progression

Step  50: Loss 1.3242  (epoch 1)
Step 100: Loss 0.6698
  → Epoch 1 avg: 0.8476
Step 150: Loss 0.5368  (epoch 2)
Step 200: Loss 0.5244
Step 250: Loss 0.5012
  → Epoch 2 avg: 0.5149 🎯 Best!
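One way to read these loss values is as perplexity. Assuming the logged loss is mean per-token cross-entropy in nats (PyTorch's default), exp(loss) gives the perplexity; since the figures below are averages of step losses rather than token-weighted totals, the result is only approximate.

```python
import math

# Convert the logged epoch-average losses to approximate perplexity.
# Assumption: loss = mean token cross-entropy in nats.

def loss_to_perplexity(mean_ce_loss):
    """Perplexity = exp(mean cross-entropy in nats)."""
    return math.exp(mean_ce_loss)


if __name__ == "__main__":
    epoch_avgs = {1: 0.8476, 2: 0.5149}
    for epoch, loss in epoch_avgs.items():
        print(f"epoch {epoch}: loss {loss:.4f} -> perplexity ~{loss_to_perplexity(loss):.2f}")
```

By this reading, the epoch 2 average corresponds to a perplexity of roughly 1.67, down from roughly 2.33 after epoch 1.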

🚀 Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

BASE_ID = "google/gemma-4-26B-A4B-it"
ADAPTER_ID = "hotdogs/gemma4-26b-fc-thinking-reasoning-lora"

# Load the base model in bf16, then attach the LoRA adapter on top of it
model = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, ADAPTER_ID)

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
messages = [
    {"role": "system", "content": "You are a deep thinking AI. Use tools with <tools>...</tools> XML tags."},
    {"role": "user", "content": "Calculate the factorial of 42 and search for its significance."},
]
# return_dict=True yields input_ids and attention_mask for generate()
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
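The generated text mixes reasoning and tool calls, so it usually needs to be split before acting on it. Below is a minimal parser sketch. It assumes tool calls are wrapped in `<tool_call>...</tool_call>` tags (the Hermes function-calling convention; the dataset example below only shows "[tool call]"), and it assumes the tags survive decoding; if they are registered as special tokens, decode with `skip_special_tokens=False`. The `factorial` tool in the example is hypothetical.

```python
import ast
import json
import re

# Split a generated model turn into reasoning, tool calls, and remaining text.
# Assumption: Hermes-style <tool_call>...</tool_call> tags around each call.

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def _parse_call(raw):
    """Parse a call payload as JSON, then as a Python literal, else keep raw."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    try:
        return ast.literal_eval(raw)  # handles {'name': ...} with single quotes
    except (ValueError, SyntaxError):
        return {"raw": raw}


def split_response(text):
    """Return (reasoning, tool_calls, remainder) for one model turn."""
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    tool_calls = [_parse_call(m.strip()) for m in TOOL_CALL_RE.findall(text)]
    remainder = TOOL_CALL_RE.sub("", THINK_RE.sub("", text)).strip()
    return reasoning, tool_calls, remainder


if __name__ == "__main__":
    sample = (
        "<think>\nNeed factorial first.\n</think>\n\n"
        '<tool_call>\n{"name": "factorial", "arguments": {"n": 42}}\n</tool_call>'
    )
    print(split_response(sample))
```

Falling back to `ast.literal_eval` keeps the parser tolerant of Python-dict-style payloads, and unparseable calls are returned as `{"raw": ...}` instead of being silently dropped.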

📊 Dataset Format

The dataset uses a ShareGPT-style conversation format (role/content pairs), with a <think> block at the start of each model turn:

{
  "conversations": [
    {"role": "system", "content": "You are a function calling AI..."},
    {"role": "human", "content": "..."},
    {"role": "model", "content": "<think>\n...\n</think>\n\n[tool call]"},
    {"role": "tool", "content": "..."},
    {"role": "model", "content": "<think>\n...\n</think>\n\n[final answer]"}
  ]
}
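Records in this format can be checked and converted before training. The sketch below maps the dataset roles "human"/"model" to the "user"/"assistant" roles that Hugging Face chat templates conventionally expect, and verifies that every model turn opens with a `<think>` block. The role mapping and all function names are assumptions for illustration; the pipeline may handle roles differently. The file path is the one listed in `SFT_FILES`.

```python
import json

# Hypothetical role mapping from the dataset's roles to chat-template roles;
# "system" and "tool" pass through unchanged.
ROLE_MAP = {"human": "user", "model": "assistant"}


def to_chat_messages(record):
    """Convert one dataset record into a list of chat-template messages."""
    return [
        {"role": ROLE_MAP.get(turn["role"], turn["role"]), "content": turn["content"]}
        for turn in record["conversations"]
    ]


def model_turns_have_think(record):
    """True if every model turn starts with a closed <think>...</think> block."""
    return all(
        turn["content"].lstrip().startswith("<think>") and "</think>" in turn["content"]
        for turn in record["conversations"]
        if turn["role"] == "model"
    )


def load_jsonl(path):
    """Read one JSON record per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    records = load_jsonl("data/hermes_fc_thinking.jsonl")
    bad = [i for i, r in enumerate(records) if not model_turns_have_think(r)]
    print(f"{len(records)} records, {len(bad)} without <think> blocks")
```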

📊 Comparison

Adapter        Dataset             Examples   Loss   Time
Kimi K2        Reasoning distill   7.8K       1.07   128 min
Claude Opus    Reasoning distill   8.1K       1.21   142 min
Hermes Tool    Tool-use            10K        0.54   346 min
FC-Thinking    Tool+Think          3.6K       0.51   70 min

📦 Files

adapter_model.safetensors   — LoRA weights (227 MB)
adapter_config.json         — r=32, alpha=32
tokenizer.json              — Gemma 4 tokenizer (31 MB)
v6_26b_pipeline.py          — Training script
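After downloading, a quick sanity check can confirm that `adapter_config.json` matches the settings on this card. The sketch reads the standard PEFT keys `r`, `lora_alpha`, and `target_modules` (which PEFT may store as a list or as a single regex string); the function name is hypothetical.

```python
import json

# Compare a PEFT adapter_config.json against the values stated on this card.
EXPECTED_MODULES = {
    "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj",
}


def check_adapter_config(text, expected_r=32, expected_alpha=32):
    """Return a list of human-readable mismatches (empty if all match)."""
    cfg = json.loads(text)
    problems = []
    if cfg.get("r") != expected_r:
        problems.append(f"r={cfg.get('r')} (expected {expected_r})")
    if cfg.get("lora_alpha") != expected_alpha:
        problems.append(f"lora_alpha={cfg.get('lora_alpha')} (expected {expected_alpha})")
    modules = cfg.get("target_modules") or []
    if isinstance(modules, str):  # regex form: check each name appears in it
        present = {m for m in EXPECTED_MODULES if m in modules}
    else:
        present = set(modules)
    missing = EXPECTED_MODULES - present
    if missing:
        problems.append(f"missing target modules: {sorted(missing)}")
    return problems


if __name__ == "__main__":
    with open("adapter_config.json", encoding="utf-8") as f:
        print(check_adapter_config(f.read()) or "adapter_config.json matches")
```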

🙏 Credits

  • Base Model: Google Gemma 4 26B
  • Dataset: Jofthomas/hermes-function-calling-thinking-V1
  • Pipeline: AndriejusNak/gemma4-26b-moe-finetune
  • Trainer: UKA (Hermes Agent)