🛠️ LLM Tool Call Fine-Tuning: SFT + GRPO

Fine-tuning Qwen2.5-1.5B to emit structured JSON tool calls using supervised fine-tuning (SFT) and reinforcement learning with Group Relative Policy Optimization (GRPO).


🧪 Experiment

Train a small LLM to respond with structured tool calls instead of plain text:

{"name": "get_weather", "arguments": {"location": "Paris"}}
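Assuming a response counts as a tool call when it is a single JSON object with a string `name` and an object-valued `arguments`, that shape can be checked with Python's standard `json` module. This is a minimal sketch; `parse_tool_call` is a hypothetical helper, not code from this repo's scripts.

```python
import json
from typing import Any, Dict, Optional


def parse_tool_call(text: str) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: return the tool call as a dict if `text` is a JSON
    object with a string "name" and an object-valued "arguments", else None."""
    try:
        obj = json.loads(text.strip())
    except json.JSONDecodeError:
        return None  # not JSON at all, e.g. a plain-text answer
    if not isinstance(obj, dict):
        return None  # valid JSON, but not an object
    if not isinstance(obj.get("name"), str):
        return None  # missing or non-string tool name
    if not isinstance(obj.get("arguments"), dict):
        return None  # arguments must be a JSON object
    return obj


call = parse_tool_call('{"name": "get_weather", "arguments": {"location": "Paris"}}')
print(call["name"], call["arguments"])  # get_weather {'location': 'Paris'}
print(parse_tool_call("The weather in Paris is sunny."))  # None
```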

Three steps:

  1. SFT: teach the model using 500 real examples from glaive-function-calling-v2
  2. GRPO: improve using reward functions (no labeled answers needed)
  3. Eval: compare SFT and GRPO side by side on 12 test queries
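Step 2 replaces labeled answers with reward functions. As a minimal sketch (the rewards actually used in EXP_STEP2_grpo.py may differ), the block below scores completions for JSON validity and tool-call shape, then computes GRPO's group-relative advantage: each completion's reward is normalized by the mean and standard deviation of the rewards of all completions sampled for the same prompt. The reward functions take a list of completion strings plus `**kwargs` and return one float per completion, which is the shape TRL's `GRPOTrainer` documents for custom reward functions on standard (non-conversational) data. The function names and reward weights are assumptions.

```python
import json
import statistics

# Hypothetical reward functions, not the ones in EXP_STEP2_grpo.py.


def json_format_reward(completions, **kwargs):
    """1.0 if the completion parses as a JSON object, else 0.0."""
    rewards = []
    for text in completions:
        try:
            rewards.append(1.0 if isinstance(json.loads(text.strip()), dict) else 0.0)
        except json.JSONDecodeError:
            rewards.append(0.0)
    return rewards


def tool_shape_reward(completions, **kwargs):
    """0.5 for a string "name" plus 0.5 for a non-empty "arguments" object."""
    rewards = []
    for text in completions:
        try:
            obj = json.loads(text.strip())
        except json.JSONDecodeError:
            rewards.append(0.0)
            continue
        if not isinstance(obj, dict):
            rewards.append(0.0)
            continue
        score = 0.5 if isinstance(obj.get("name"), str) else 0.0
        args = obj.get("arguments")
        if isinstance(args, dict) and args:
            score += 0.5
        rewards.append(score)
    return rewards


def group_relative_advantages(rewards, eps=1e-4):
    """GRPO-style advantage: (reward - group mean) / (group std + eps),
    where the group is all completions sampled for one prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std (an assumption)
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt ("weather in Paris?") with three sampled completions.
group = [
    '{"name": "get_weather", "arguments": {"location": "Paris"}}',
    '{"name": "get_weather", "arguments": {}}',
    "It is sunny in Paris today.",
]
total = [a + b for a, b in zip(json_format_reward(group), tool_shape_reward(group))]
print(total)  # [2.0, 1.5, 0.0]
print([round(a, 2) for a in group_relative_advantages(total)])
```

A group in which every completion earns the same reward gets zero advantage for every sample, so it contributes no learning signal. Implementations also differ on whether they normalize by the population or the sample standard deviation.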

📊 Results

| Metric | SFT | GRPO | Winner |
|---|---|---|---|
| JSON Valid | 0% | 92% | GRPO ✅ |
| Correct Tool | 0% | 50% | GRPO ✅ |
| Has Arguments | 0% | 42% | GRPO ✅ |
| Clean Output | 0% | 92% | GRPO ✅ |
| Avg Quality Score | 0.0 | 0.59 | GRPO ✅ |

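Key finding: the SFT model answered the test queries directly in plain text and never produced a tool call, so it scored 0% on every metric. The GRPO model responded with structured JSON tool calls in most cases (92% valid JSON), although it picked the correct tool only half the time. The evaluation used 12 queries covering weather, calculator, search, stocks, translation, and unit conversion.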
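The table does not define its metrics precisely, and the README does not say how the average quality score is computed. The sketch below shows one plausible set of per-response checks and how percentages over the test queries are derived; with 12 queries, for example, 11 passing responses would be reported as 92%. The check definitions and the names `METRICS`, `score_response`, and `summarize` are assumptions, not taken from EXP_STEP3_compare.py, and the quality score is deliberately left out.

```python
import json

# Hypothetical metric names mirroring the results table (quality score omitted).
METRICS = ("json_valid", "correct_tool", "has_arguments", "clean_output")


def score_response(response, expected_tool):
    """Run assumed per-response checks; returns {metric: bool}."""
    text = response.strip()
    result = dict.fromkeys(METRICS, False)
    start = text.find("{")  # only the first "{" is tried
    if start == -1:
        return result  # plain-text answer: fails every check
    try:
        obj, end = json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError:
        return result
    result["json_valid"] = True
    result["correct_tool"] = obj.get("name") == expected_tool
    args = obj.get("arguments")
    result["has_arguments"] = isinstance(args, dict) and len(args) > 0
    # Assumed meaning of "clean": the JSON object is the entire response.
    result["clean_output"] = start == 0 and end == len(text)
    return result


def summarize(results):
    """Percentage of responses passing each check, rounded to whole numbers."""
    n = len(results)
    return {m: round(100 * sum(r[m] for r in results) / n) for m in METRICS}


responses = [
    ('{"name": "get_weather", "arguments": {"location": "Paris"}}', "get_weather"),
    ('Sure! {"name": "calculator", "arguments": {}}', "calculator"),
    ("The capital of France is Paris.", "search"),
]
print(summarize([score_response(r, tool) for r, tool in responses]))
# {'json_valid': 67, 'correct_tool': 67, 'has_arguments': 33, 'clean_output': 33}
```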


🚀 Run on Colab

!pip install -q transformers datasets peft trl accelerate bitsandbytes
!python EXP_STEP1_sft.py    # ~20 min
!python EXP_STEP2_grpo.py   # ~30 min
!python EXP_STEP3_compare.py

📁 Files

| File | Description |
|---|---|
| EXP_STEP1_sft.py | SFT training (Colab) |
| EXP_STEP2_grpo.py | GRPO RL training (Colab) |
| EXP_STEP3_compare.py | Evaluation and comparison |

🔧 Stack

Qwen2.5-1.5B • QLoRA • GRPO • glaive-function-calling-v2 • trl • peft

