🛠️ LLM Tool Call Fine-Tuning — SFT + GRPO

Fine-tuning Qwen2.5-1.5B to make structured JSON tool calls, using supervised fine-tuning (SFT) and reinforcement learning with GRPO (Group Relative Policy Optimization).

🧪 Experiment

Train a small LLM to respond with structured tool calls instead of plain text:

{"name": "get_weather", "arguments": {"location": "Paris"}}
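
To make the target format concrete, here is a minimal Python sketch of how such an output could be checked. The helper name `parse_tool_call` and the rule that a call needs a string `name` plus a dict `arguments` are assumptions for illustration, not code taken from this repo.

```python
import json
from typing import Any, Dict, Optional


def parse_tool_call(text: str) -> Optional[Dict[str, Any]]:
    """Return the tool call as a dict, or None if `text` is not a valid call."""
    try:
        call = json.loads(text.strip())
    except json.JSONDecodeError:
        return None  # plain text or malformed JSON
    # Assumed schema: an object with a string "name" and a dict "arguments".
    if not isinstance(call, dict):
        return None
    if not isinstance(call.get("name"), str):
        return None
    if not isinstance(call.get("arguments"), dict):
        return None
    return call


if __name__ == "__main__":
    print(parse_tool_call('{"name": "get_weather", "arguments": {"location": "Paris"}}'))
    print(parse_tool_call("The weather in Paris is sunny."))
```

A plain-text answer fails at the `json.loads` step, which is the behaviour the experiment tries to train away.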

Three steps:

1. SFT: teach the model using 500 real examples from glaive-function-calling-v2
2. GRPO: improve using reward functions (no labeled answers needed)
3. Eval: compare SFT vs GRPO side by side on 12 test queries
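
The GRPO step can be pictured with a small sketch. It assumes a rule-based reward (valid JSON, known tool name, non-empty arguments) and the group-relative advantage described in the DeepSeekMath paper, where each sampled completion is scored against the mean and standard deviation of its own group. The weights, `KNOWN_TOOLS` and both function names are hypothetical; the actual reward functions live in `EXP_STEP2_grpo.py` and may differ.

```python
import json
import statistics
from typing import List

# Hypothetical tool names, used only for this illustration.
KNOWN_TOOLS = {"get_weather", "calculator", "search"}


def tool_call_reward(completion: str) -> float:
    """Score one completion in [0, 1] without needing a labeled answer."""
    try:
        call = json.loads(completion.strip())
    except json.JSONDecodeError:
        return 0.0  # plain text or broken JSON earns nothing
    if not isinstance(call, dict):
        return 0.0
    reward = 0.5  # the output is a valid JSON object
    name = call.get("name")
    if isinstance(name, str) and name in KNOWN_TOOLS:
        reward += 0.25  # names a tool we know about
    args = call.get("arguments")
    if isinstance(args, dict) and args:
        reward += 0.25  # carries at least one argument
    return reward


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of samples for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt, three sampled completions (a "group").
group = [
    '{"name": "get_weather", "arguments": {"location": "Paris"}}',
    '{"name": "get_weather", "arguments": {}}',
    "It is sunny in Paris.",
]
rewards = [tool_call_reward(c) for c in group]
advantages = group_advantages(rewards)

if __name__ == "__main__":
    print(rewards)
    print(advantages)
```

Completions that score above their group's mean get a positive advantage and are reinforced; the plain-text answer gets a negative one. This is why no labeled answer is needed: the signal comes from comparing samples with each other.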

📊 Results

Metric	SFT	GRPO	Winner
JSON Valid	0%	92%	GRPO ✅
Correct Tool	0%	50%	GRPO ✅
Has Arguments	0%	42%	GRPO ✅
Clean Output	0%	92%	GRPO ✅
Avg Quality Score	0.0	0.59	GRPO ✅

Key finding: the SFT model answered every query directly in plain text and never emitted a tool call, while the GRPO model responded with valid JSON tool calls on 92% of queries, although it chose the correct tool only 50% of the time. Both models were tested on the same 12 queries covering weather, calculator, search, stocks, translation and unit conversion.
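
The first four metrics in the table could be computed along the lines below. This is a sketch under assumed definitions: "JSON Valid" means a JSON object can be extracted from the output, "Clean Output" means the output contains nothing but that object, and "Has Arguments" means a non-empty `arguments` dict. The function names and sample data are hypothetical, and the Avg Quality Score is left out because its formula is not given here; `EXP_STEP3_compare.py` may define all of these differently.

```python
import json
from typing import Dict, List, Optional, Tuple


def extract_json(text: str) -> Optional[dict]:
    """Pull the outermost {...} span out of `text` and parse it, if possible."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        obj = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def score(samples: List[Tuple[str, str]]) -> Dict[str, float]:
    """Return each metric as a percentage over (output, expected_tool) pairs."""
    counts = {"json_valid": 0, "correct_tool": 0, "has_arguments": 0, "clean_output": 0}
    for output, expected_tool in samples:
        call = extract_json(output)
        if call is None:
            continue  # nothing parseable, so every metric stays at zero
        counts["json_valid"] += 1
        if call.get("name") == expected_tool:
            counts["correct_tool"] += 1
        args = call.get("arguments")
        if isinstance(args, dict) and args:
            counts["has_arguments"] += 1
        stripped = output.strip()
        if stripped.startswith("{") and stripped.endswith("}"):
            counts["clean_output"] += 1  # no chatter around the JSON
    return {key: 100 * value / len(samples) for key, value in counts.items()}


# Hypothetical model outputs paired with the tool each query should trigger.
samples = [
    ('{"name": "get_weather", "arguments": {"location": "Paris"}}', "get_weather"),
    ('Sure! {"name": "search", "arguments": {}}', "calculator"),
    ("It is sunny in Paris.", "get_weather"),
    ('{"name": "calculator", "arguments": {"expression": "2+2"}}', "calculator"),
]
metrics = score(samples)

if __name__ == "__main__":
    print(metrics)
```

With only 12 test queries, each query moves a metric by roughly 8 percentage points, so small differences between runs should be read with caution.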

🚀 Run on Colab

!pip install -q transformers datasets peft trl accelerate bitsandbytes
!python EXP_STEP1_sft.py    # ~20 min
!python EXP_STEP2_grpo.py   # ~30 min
!python EXP_STEP3_compare.py

📁 Files

File	Description
`EXP_STEP1_sft.py`	SFT training (Colab)
`EXP_STEP2_grpo.py`	GRPO RL training (Colab)
`EXP_STEP3_compare.py`	Evaluation + comparison

🔧 Stack

Qwen2.5-1.5B • QLoRA • GRPO • glaive-function-calling-v2 • trl • peft

📎 References

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv:2402.03300, Feb 5, 2024.

QLoRA: Efficient Finetuning of Quantized LLMs. arXiv:2305.14314, May 23, 2023.