Gemma4 26B MoE — Hermes Function Calling + Thinking LoRA 💭

LoRA adapter fine-tuned from google/gemma-4-26B-A4B-it on Hermes Function Calling Thinking V1: 3,570 high-quality function-calling conversations with `<think>` reasoning blocks. Trained by UKA (Hermes Agent) 🤖

📋 Summary

Detail	Value
Base Model	`google/gemma-4-26B-A4B-it` (26B MoE, 128 experts)
Dataset	`Jofthomas/hermes-function-calling-thinking-V1` (3,570 examples)
Method	Custom NF4 per-expert quantization + LoRA
Pipeline	AndriejusNak/gemma4-26b-moe-finetune
GPU	NVIDIA RTX 5090 32GB (Vast.ai Cloud)
Training Time	70 minutes (~1h 10m)
Best Loss	0.5149 (epoch 2 average training loss)
NaN Explosions	0
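The "Custom NF4 per-expert quantization" method refers to the pipeline's own code, which is not reproduced here. As a rough illustration only, the sketch below shows generic block-wise NF4 quantization in plain Python. Each expert's flattened weight matrix is split into blocks, every block is scaled by its absolute maximum, and each value is snapped to the nearest of 16 NF4 levels (the code book used by QLoRA and bitsandbytes, rounded to 4 decimals here). The block size of 64, the `quantize_experts` helper, and the toy expert names are assumptions, not taken from the pipeline.

```python
# NF4 code book: 16 levels derived from normal-distribution quantiles, normalized to [-1, 1]
# (values as used by QLoRA / bitsandbytes, rounded to 4 decimals for this sketch).
NF4_LEVELS = [
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
]


def quantize_nf4(weights, block_size=64):
    """Quantize a flat list of floats to 4-bit NF4 codes with one absmax scale per block."""
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block) or 1.0  # avoid dividing by zero for all-zero blocks
        scales.append(absmax)
        for w in block:
            x = w / absmax  # normalize into [-1, 1]
            # Store the index (0..15) of the nearest NF4 level, i.e. a 4-bit code
            codes.append(min(range(16), key=lambda i: abs(NF4_LEVELS[i] - x)))
    return codes, scales


def dequantize_nf4(codes, scales, block_size=64):
    """Map 4-bit codes back to floats using each block's absmax scale."""
    return [NF4_LEVELS[c] * scales[i // block_size] for i, c in enumerate(codes)]


def quantize_experts(expert_weights, block_size=64):
    """Hypothetical per-expert wrapper: every expert matrix gets its own blocks and scales."""
    return {name: quantize_nf4(w, block_size) for name, w in expert_weights.items()}


# Toy example with two tiny "experts" and a block size of 4
experts = {"expert_0": [0.5, -0.25, 0.0, 1.0], "expert_1": [2.0, -2.0, 0.1, 0.0]}
packed = quantize_experts(experts, block_size=4)
codes, scales = packed["expert_1"]
restored = dequantize_nf4(codes, scales, block_size=4)
print(codes, scales, restored)
```

Real implementations work on GPU tensors, pack two 4-bit codes into each byte, and may further compress the scales; this sketch keeps everything as Python lists so the rounding behaviour is easy to inspect.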

🖥️ Hardware

Component	Specification
GPU	NVIDIA GeForce RTX 5090 32GB GDDR7
CPU	Intel Core i7-14700K (20 cores / 28 threads)
RAM	94 GB DDR5
Disk	200 GB NVMe SSD
Cloud	Vast.ai
PyTorch	2.12.0.dev (nightly, cu128)

🔧 Training Configuration

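# v6_26b_pipeline.py (training hyperparameters)
MODEL_NAME = "google/gemma-4-26B-A4B-it"
MAX_SEQ_LENGTH = 1024          # maximum sequence length in tokens
LORA_R = 32                    # LoRA rank
LORA_ALPHA = 32                # LoRA scaling = alpha / r = 1.0
INCLUDE_MLP_LORA = True        # also adapt gate_proj, up_proj, down_proj
SFT_EPOCHS = 2
SFT_BATCH_SIZE = 3             # examples per forward pass
SFT_GRAD_ACCUM = 8             # Effective batch = 3 x 8 = 24
SFT_LR = 2e-5                  # learning rate
SFT_FILES = ["data/hermes_fc_thinking.jsonl"]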
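The batch settings above, together with the 3,570-example dataset, determine the step counts reported under LoRA Details. The sketch below reproduces that arithmetic. It assumes that the incomplete gradient-accumulation window at the end of each epoch (6 leftover micro-batches) is dropped; that assumption happens to match the reported figures but is not confirmed from the pipeline code.

```python
import math


def training_schedule(num_examples, batch_size, grad_accum, epochs):
    """Derive effective batch size, optimizer steps, and forward passes from the config."""
    effective_batch = batch_size * grad_accum
    micro_batches_per_epoch = math.ceil(num_examples / batch_size)
    # Assumption: an incomplete accumulation window at the end of an epoch is dropped.
    steps_per_epoch = micro_batches_per_epoch // grad_accum
    optimizer_steps = steps_per_epoch * epochs
    forward_passes = optimizer_steps * grad_accum
    return effective_batch, optimizer_steps, forward_passes


# Dataset size from the summary table, batch settings from the config above
print(training_schedule(num_examples=3570, batch_size=3, grad_accum=8, epochs=2))
# → (24, 296, 2368)
```

Under the same assumption, epoch 1 ends after optimizer step 148, which is why step 150 in the loss log already belongs to epoch 2.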

LoRA Details

Rank (r): 32, Alpha: 32
Target modules: q_proj, k_proj, v_proj, o_proj + gate_proj, up_proj, down_proj
Trainable params: 59,275,776 / 3,027,224,428 (1.96%)
Optimizer steps: 296, Forward passes: 2,368
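For reference, standard LoRA adds two low-rank matrices to each adapted linear layer, A (r × d_in) and B (d_out × r), and scales their product by alpha / r, so with r = alpha = 32 the scaling factor is 1.0. The sketch below encodes those two formulas and checks the trainable-parameter percentage quoted above. The 4096 × 4096 layer is a hypothetical size chosen for illustration, not a Gemma 4 dimension.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in linear layer: A (r x d_in) + B (d_out x r)."""
    return r * d_in + d_out * r


def lora_scaling(alpha, r):
    """Factor applied to the low-rank update: W' = W + (alpha / r) * B @ A."""
    return alpha / r


# Hypothetical 4096 x 4096 projection at rank 32
print(lora_param_count(4096, 4096, 32))  # → 262144
print(lora_scaling(32, 32))  # → 1.0

# Trainable share reported under LoRA Details
print(round(100 * 59_275_776 / 3_027_224_428, 2))  # → 1.96
```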

Loss Progression

Step  50: Loss 1.3242  (epoch 1)
Step 100: Loss 0.6698
  → Epoch 1 avg: 0.8476
Step 150: Loss 0.5368  (epoch 2)
Step 200: Loss 0.5244
Step 250: Loss 0.5012
  → Epoch 2 avg: 0.5149 🎯 Best!

🚀 Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model in bfloat16; device_map="auto" places layers across available GPU/CPU memory
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-26B-A4B-it",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
# Attach the LoRA adapter on top of the base weights
model = PeftModel.from_pretrained(model, "hotdogs/gemma4-26b-fc-thinking-reasoning-lora")

tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-26B-A4B-it")
messages = [
    {"role": "system", "content": "You are a deep thinking AI. Use tools with <tools>...</tools> XML tags."},
    {"role": "user", "content": "Calculate the factorial of 42 and search for its significance."}
]
# Build the prompt with the chat template and append the generation prompt for the model turn
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_dict=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, not the echoed prompt
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
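The decoded reply should contain a `<think>` block followed by either a tool call or a final answer. Assuming the adapter emits Hermes-style calls, i.e. a JSON object with `name` and `arguments` wrapped in `<tool_call>...</tool_call>` tags (an assumption about the output format, not something this card specifies), a small parser could split the reply as follows. The `calculate_factorial` tool in the example is hypothetical.

```python
import json
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_response(text):
    """Split a generated reply into its reasoning, parsed tool calls, and remaining answer text."""
    think = THINK_RE.search(text)
    reasoning = think.group(1).strip() if think else ""
    tool_calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            tool_calls.append(json.loads(raw))
        except json.JSONDecodeError:
            pass  # skip malformed calls instead of crashing
    # Whatever is left after removing both tag types is treated as the visible answer
    answer = TOOL_CALL_RE.sub("", THINK_RE.sub("", text)).strip()
    return reasoning, tool_calls, answer


sample = (
    "<think>\nI need the factorial first.\n</think>\n\n"
    '<tool_call>\n{"name": "calculate_factorial", "arguments": {"n": 42}}\n</tool_call>'
)
reasoning, tool_calls, answer = parse_response(sample)
print(reasoning)   # → I need the factorial first.
print(tool_calls)  # → [{'name': 'calculate_factorial', 'arguments': {'n': 42}}]
```

When the reply is a final answer rather than a tool call, `tool_calls` comes back empty and `answer` holds the text after the `</think>` tag.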

📊 Dataset Format

The dataset uses a ShareGPT-style conversation format, with <think> blocks at the start of each model response:

{
  "conversations": [
    {"role": "system", "content": "You are a function calling AI..."},
    {"role": "human", "content": "..."},
    {"role": "model", "content": "<think>\n...\n</think>\n\n[tool call]"},
    {"role": "tool", "content": "..."},
    {"role": "model", "content": "<think>\n...\n</think>\n\n[final answer]"}
  ]
}
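The dataset's role names (`human`, `model`) differ from the `user`/`assistant` roles used in the Usage example. Below is a minimal sketch of reading the JSONL training file and renaming the roles before applying a chat template, assuming one JSON object per line with the `conversations` layout shown above. The `ROLE_MAP` mapping and the helper names are hypothetical, and `system`/`tool` turns are passed through unchanged.

```python
import json

# Hypothetical mapping from the dataset's role names to chat-template role names
ROLE_MAP = {"human": "user", "model": "assistant"}


def iter_jsonl(lines):
    """Yield one parsed record per non-empty JSONL line (also works with an open file object)."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def to_chat_messages(record):
    """Convert one dataset record into a list of {"role", "content"} chat messages."""
    return [
        {"role": ROLE_MAP.get(turn["role"], turn["role"]), "content": turn["content"]}
        for turn in record["conversations"]
    ]


# With a file: with open("data/hermes_fc_thinking.jsonl") as f: chats = [to_chat_messages(r) for r in iter_jsonl(f)]
example_line = json.dumps({"conversations": [
    {"role": "human", "content": "What is 6 x 7?"},
    {"role": "model", "content": "<think>\nSimple multiplication.\n</think>\n\n42"},
]})
messages = [to_chat_messages(r) for r in iter_jsonl([example_line, ""])][0]
print([m["role"] for m in messages])  # → ['user', 'assistant']
```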

📊 Comparison

Adapter	Dataset	Examples	Loss	Training Time
Kimi K2	Reasoning distill	7.8K	1.07	128 min
Claude Opus	Reasoning distill	8.1K	1.21	142 min
Hermes Tool	Tool-use	10K	0.54	346 min
FC-Thinking (this adapter)	Tool+Think	3.6K	0.51	70 min

📦 Files

adapter_model.safetensors   — LoRA weights (227 MB)
adapter_config.json         — r=32, alpha=32
tokenizer.json              — Gemma 4 tokenizer (31 MB)
v6_26b_pipeline.py          — Training script
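As a rough consistency check, the size of the LoRA weight file can be estimated from the trainable-parameter count. Assuming the adapter tensors are stored in fp32 (4 bytes per parameter) and ignoring the small safetensors header, 59,275,776 parameters come to about 226 MiB, close to the ~227 MB listed above; in bf16 the same weights would be roughly half that. The storage dtype is an assumption, not something this card states.

```python
def adapter_size_bytes(num_params, bytes_per_param=4):
    """Approximate tensor payload size in bytes, excluding the safetensors header."""
    return num_params * bytes_per_param


TRAINABLE_PARAMS = 59_275_776  # from LoRA Details

fp32_mib = adapter_size_bytes(TRAINABLE_PARAMS, 4) / 2**20
bf16_mib = adapter_size_bytes(TRAINABLE_PARAMS, 2) / 2**20
print(f"fp32: {fp32_mib:.1f} MiB, bf16: {bf16_mib:.1f} MiB")  # → fp32: 226.1 MiB, bf16: 113.1 MiB
```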

🙏 Credits

Base Model: Google Gemma 4 26B
Dataset: Jofthomas/hermes-function-calling-thinking-V1
Pipeline: AndriejusNak/gemma4-26b-moe-finetune
Trainer: UKA (Hermes Agent)
