LoRA adapter for google/gemma-4-26B-A4B-it, fine-tuned on Hermes Function Calling Thinking V1: 3,570 high-quality function-calling conversations with `<think>` reasoning blocks. Trained by UKA (Hermes Agent) 🤖
| Detail | Value |
|---|---|
| Base Model | google/gemma-4-26B-A4B-it (26B MoE, 128 experts) |
| Dataset | Jofthomas/hermes-function-calling-thinking-V1 (3,570 examples) |
| Method | Custom NF4 per-expert quantization + LoRA |
| Pipeline | AndriejusNak/gemma4-26b-moe-finetune |
| GPU | NVIDIA RTX 5090 32GB (Vast.ai Cloud) |
| Training Time | 70 minutes (~1h 10m) |
| Best Loss | 0.5149 (epoch 2 average training loss) |
| NaN Loss Events | 0 |

| Component | Specification |
|---|---|
| GPU | NVIDIA GeForce RTX 5090 32GB GDDR7 |
| CPU | Intel Core i7-14700K (20 cores, 28 threads) |
| RAM | 94 GB DDR5 |
| Disk | 200 GB NVMe SSD |
| Cloud | Vast.ai |
| PyTorch | 2.12.0.dev (nightly, cu128) |
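
The weight-only arithmetic below suggests why the custom NF4 quantization listed in the training summary matters on a single 32 GB card. It is a rough sketch under stated assumptions: "26B" is treated as the total parameter count, and it ignores quantization scales, activations, LoRA parameters, optimizer state and framework overhead. `weight_gb` is a hypothetical helper, not part of the training pipeline.

```python
# Rough, weight-only memory estimate for a 26B-parameter model.
# Assumption: ignores quantization constants (per-block scales), activations,
# LoRA/optimizer state and CUDA overhead, so real usage is higher.

PARAMS = 26e9   # total parameters, taken from the model name
GPU_GB = 32     # RTX 5090 VRAM as listed in the hardware table

BITS_PER_PARAM = {
    "bf16": 16,
    "nf4": 4,
}

def weight_gb(n_params: float, bits: int) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits / 8 / 1e9

for fmt, bits in BITS_PER_PARAM.items():
    gb = weight_gb(PARAMS, bits)
    verdict = "fits" if gb < GPU_GB else "does not fit"
    print(f"{fmt}: ~{gb:.0f} GB of weights ({verdict} in {GPU_GB} GB)")
```

At 16 bits per parameter the weights alone exceed the card, while at 4 bits they leave headroom for activations and the LoRA training state.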
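
```python
# v6_26b_pipeline.py (training configuration)
MODEL_NAME = "google/gemma-4-26B-A4B-it"
MAX_SEQ_LENGTH = 1024      # maximum tokens per training example
LORA_R = 32
LORA_ALPHA = 32            # LoRA scale = LORA_ALPHA / LORA_R = 1.0
INCLUDE_MLP_LORA = True    # also adapt gate_proj / up_proj / down_proj
SFT_EPOCHS = 2
SFT_BATCH_SIZE = 3
SFT_GRAD_ACCUM = 8         # effective batch = 3 * 8 = 24
SFT_LR = 2e-5
SFT_FILES = ["data/hermes_fc_thinking.jsonl"]
```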
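
`LORA_R` and `LORA_ALPHA` set the rank and the scale of the low-rank update. In the standard LoRA formulation (PEFT's default, without rsLoRA), a frozen weight `W` is used as `W x + (alpha / r) * B (A x)`, so `alpha = r = 32` gives a scale of 1.0. The pure-Python sketch below illustrates this on a toy 256 x 256 layer; the layer size, random initialization and helper names are hypothetical, not taken from the Gemma model or the training script.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, r, alpha):
    """Standard LoRA: y = W x + (alpha / r) * B (A x).

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) train.
    """
    scale = alpha / r
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]

def lora_param_count(d_in, d_out, r):
    """Trainable parameters added by one LoRA pair (A and B)."""
    return r * (d_in + d_out)

r, alpha = 32, 32        # LORA_R and LORA_ALPHA from the config
d_in = d_out = 256       # toy layer size (hypothetical, far smaller than Gemma's)

rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # B starts at zero, so the adapter is a no-op

x = [rng.gauss(0, 1) for _ in range(d_in)]
y = lora_forward(W, A, B, x, r, alpha)

print(f"scale = {alpha / r}")
print(f"frozen weights: {d_in * d_out}, LoRA weights: {lora_param_count(d_in, d_out, r)}")
print(f"output unchanged at init: {y == matvec(W, x)}")
```

Because `B` starts at zero, the adapted layer initially reproduces the base model exactly; training then moves only `A` and `B`, which here are a quarter of the size of `W`.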
LoRA target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj` + `gate_proj`, `up_proj`, `down_proj`

Training log:

```text
Step 50: Loss 1.3242 (epoch 1)
Step 100: Loss 0.6698
→ Epoch 1 avg: 0.8476
Step 150: Loss 0.5368 (epoch 2)
Step 200: Loss 0.5244
Step 250: Loss 0.5012
→ Epoch 2 avg: 0.5149 🎯 Best!
```
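
The step numbers in the log line up with the batch settings. A minimal sketch of the arithmetic, assuming every one of the 3,570 examples is kept (none dropped by `MAX_SEQ_LENGTH`), that a partial final micro-batch or accumulation window still triggers an optimizer step, and that the logged "Step" counts optimizer steps; the pipeline's exact accounting is not shown here, and `optimizer_steps_per_epoch` is a hypothetical helper.

```python
import math

NUM_EXAMPLES = 3570   # Hermes Function Calling Thinking V1
SFT_BATCH_SIZE = 3
SFT_GRAD_ACCUM = 8
SFT_EPOCHS = 2

def optimizer_steps_per_epoch(n_examples, batch_size, grad_accum):
    """Optimizer steps per epoch, rounding partial batches and windows up."""
    micro_batches = math.ceil(n_examples / batch_size)
    return math.ceil(micro_batches / grad_accum)

effective_batch = SFT_BATCH_SIZE * SFT_GRAD_ACCUM
steps = optimizer_steps_per_epoch(NUM_EXAMPLES, SFT_BATCH_SIZE, SFT_GRAD_ACCUM)

print(f"effective batch: {effective_batch}")
print(f"steps per epoch: {steps}, total steps: {steps * SFT_EPOCHS}")
```

Under these assumptions epoch 1 covers roughly steps 1 to 149, which is consistent with step 150 being logged as the start of epoch 2.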
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "google/gemma-4-26B-A4B-it"
ADAPTER = "hotdogs/gemma4-26b-fc-thinking-reasoning-lora"

# Load the base model in bfloat16. At 2 bytes per parameter the weights alone
# take roughly 52 GB, so device_map="auto" may spread them across GPUs/CPU.
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Attach the LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(model, ADAPTER)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

messages = [
    {"role": "system", "content": "You are a deep thinking AI. Use tools with <tools>...</tools> XML tags."},
    {"role": "user", "content": "Calculate the factorial of 42 and search for its significance."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, not the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
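
Generated replies mix a `<think>` block with tool calls or a final answer. The sketch below separates them, assuming Hermes-style tags where each tool call is a JSON object inside `<tool_call>...</tool_call>`; the tag name, the `factorial` tool and `split_response` are illustrative assumptions, not something this card specifies. If the tags are registered as special tokens in the tokenizer, decode with `skip_special_tokens=False` so they survive.

```python
import json
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def split_response(text):
    """Split a reply into (reasoning, remainder, tool_calls).

    Assumes reasoning in <think>...</think> and each tool call as JSON inside
    <tool_call>...</tool_call> (Hermes-style tags, an assumption).
    """
    m = THINK_RE.search(text)
    reasoning = m.group(1).strip() if m else ""
    remainder = THINK_RE.sub("", text, count=1).strip() if m else text.strip()
    tool_calls = []
    for raw in TOOL_CALL_RE.findall(remainder):
        try:
            tool_calls.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # skip malformed calls instead of crashing
    return reasoning, remainder, tool_calls

# Hypothetical model output for the factorial prompt.
sample = (
    "<think>\nI need the factorial first.\n</think>\n\n"
    '<tool_call>\n{"name": "factorial", "arguments": {"n": 42}}\n</tool_call>'
)
reasoning, rest, calls = split_response(sample)
print(reasoning)
print(calls)
```

Each parsed call is a plain dict, so dispatching it to a Python function and appending the result as a `tool` turn is straightforward.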
The dataset uses a ShareGPT-style `conversations` list with `role`/`content` keys, and model turns contain `<think>` blocks:

```json
{
  "conversations": [
    {"role": "system", "content": "You are a function calling AI..."},
    {"role": "human", "content": "..."},
    {"role": "model", "content": "<think>\n...\n</think>\n\n[tool call]"},
    {"role": "tool", "content": "..."},
    {"role": "model", "content": "<think>\n...\n</think>\n\n[final answer]"}
  ]
}
```
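
Records in this shape can be read line by line from the JSONL training file. A minimal sketch, assuming a role mapping (`human` to `user`, `model` to `assistant`) that suits typical Hugging Face chat templates; the actual pipeline may map roles differently, and `load_conversations` / `model_turns_have_think` are hypothetical helpers.

```python
import json

# Assumed mapping from the dataset's role names to common chat-template roles.
ROLE_MAP = {"system": "system", "human": "user", "model": "assistant", "tool": "tool"}

def load_conversations(lines):
    """Parse JSONL lines into lists of {"role", "content"} messages."""
    convs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        convs.append([
            {"role": ROLE_MAP[turn["role"]], "content": turn["content"]}
            for turn in record["conversations"]
        ])
    return convs

def model_turns_have_think(messages):
    """True if every assistant turn opens with a <think> block."""
    return all(
        m["content"].lstrip().startswith("<think>")
        for m in messages
        if m["role"] == "assistant"
    )

example = json.dumps({"conversations": [
    {"role": "system", "content": "You are a function calling AI..."},
    {"role": "human", "content": "What is 2 + 2?"},
    {"role": "model", "content": "<think>\nSimple arithmetic.\n</think>\n\n4"},
]})
convs = load_conversations([example])
print([m["role"] for m in convs[0]])
print(model_turns_have_think(convs[0]))
```

For the real file, pass an open handle such as `open("data/hermes_fc_thinking.jsonl")`, since iterating a file yields its lines.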

| Adapter | Dataset | Examples | Train Loss | Training Time |
|---|---|---|---|---|
| Kimi K2 | Reasoning distill | 7.8K | 1.07 | 128 min |
| Claude Opus | Reasoning distill | 8.1K | 1.21 | 142 min |
| Hermes Tool | Tool-use | 10K | 0.54 | 346 min |
| FC-Thinking | Tool+Think | 3.6K | 0.51 | 70 min |

- `adapter_model.safetensors`: LoRA weights (227 MB)
- `adapter_config.json`: LoRA configuration (r=32, alpha=32)
- `tokenizer.json`: Gemma 4 tokenizer (31 MB)
- `v6_26b_pipeline.py`: training script
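
`adapter_config.json` is plain JSON, so a downloaded copy can be checked against this card (r=32, alpha=32) before loading. The keys `r`, `lora_alpha`, `target_modules` and `base_model_name_or_path` are standard PEFT `LoraConfig` fields; the sample dict below is a minimal stand-in built from values on this card, not the actual file, and `summarize_adapter_config` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path

def summarize_adapter_config(path):
    """Read a PEFT adapter_config.json and return the LoRA settings of interest."""
    cfg = json.loads(Path(path).read_text())
    targets = cfg.get("target_modules") or []
    if isinstance(targets, str):  # PEFT also accepts a single regex string
        targets = [targets]
    return {
        "base_model": cfg.get("base_model_name_or_path"),
        "r": cfg["r"],
        "lora_alpha": cfg["lora_alpha"],
        "scale": cfg["lora_alpha"] / cfg["r"],
        "target_modules": sorted(targets),
    }

# Minimal stand-in for the real file (which contains more keys).
sample = {
    "base_model_name_or_path": "google/gemma-4-26B-A4B-it",
    "r": 32,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"],
}
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "adapter_config.json"
    cfg_path.write_text(json.dumps(sample))
    info = summarize_adapter_config(cfg_path)

print(info)
```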