openai/gsm8k
Benchmark • Updated • 17.6k • 958k • 1.33k
This model is a VERL (Volcano Engine Reinforcement Learning for LLMs) fine-tuned version of Qwen2.5-0.5B-Instruct on the GSM8K mathematical reasoning dataset using PPO.
from transformers import AutoTokenizer
# Note: This repository contains the tokenizer and config
# Model weights are in VERL/FSDP format and need conversion
tokenizer = AutoTokenizer.from_pretrained("karthik/verl-qwen2.5-0.5b-gsm8k-ppo-step360")
# For the full model, you would need to convert the VERL checkpoint:
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
# # Then load the VERL checkpoint weights
This model was trained using the VERL framework with:
Shows significant improvement in mathematical reasoning:
config.json - Model configurationtokenizer.json - Tokenizertokenizer_config.json - Tokenizer configuration vocab.json - Vocabularymerges.txt - BPE mergesspecial_tokens_map.json - Special tokenschat_template.jinja - Chat templateThe actual model weights are stored in VERL/FSDP format and would need conversion for direct use with transformers. This repository provides the tokenizer and configuration for reference.
@misc{verl-qwen-gsm8k,
title={VERL Fine-tuned Qwen2.5-0.5B on GSM8K},
author={karthik},
year={2024},
howpublished={\url{https://huggingface.co/karthik/verl-qwen2.5-0.5b-gsm8k-ppo-step360}},
}
Trained using VERL - Versatile Reinforcement Learning framework.