tinker-rl-w1_qwen3-8b-base-qwen3-8b-base-s42-run2
LoRA adapters trained with GRPO on top of Qwen/Qwen3-8B-Base using the
Tinker cloud training service.
Part of the TinkerRL-Bench release for our NeurIPS submission
"A Unified Benchmark for RL Post-Training of Language Models"
(repo).
Training configuration
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen3-8B-Base |
| Experiment tag | w1_qwen3-8b-base |
| Campaign | bitter_lesson_v2 |
| Task | gsm8k |
| Seed | 42 |
| LoRA rank | 32 |
| Learning rate | 1e-05 |
| Group size | 8 |
| Training steps | 30 |
| Platform | Tinker (tinker) |
| Training run ID | 9f27c001-b92c-55a7-9e12-1a8bd858e16d |
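
The group size of 8 is the number of completions sampled per prompt in GRPO. As a minimal sketch, assuming the commonly described GRPO normalization (reward minus the group mean, divided by the group standard deviation), which may differ from what Tinker actually implements, per-completion advantages could be computed as follows. The function name, the `eps` parameter, and the example rewards are illustrative and not taken from the training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of sampled completions.

    Assumption: advantage = (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of 8 binary correctness rewards for one GSM8K prompt.
group = [1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0]
advantages = group_relative_advantages(group)
```

Under this assumption, a group in which every completion receives the same reward yields all-zero advantages, so that prompt contributes no learning signal.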
Metrics
| Metric | Value |
|---|---|
| First-5 reward avg | 0.875 |
| Last-10 reward avg | 0.9875 |
| Peak reward | 1.0 |
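
The card does not define how these summary metrics are derived. A plausible reading, stated here as an assumption, is that they are unweighted means of the per-step mean reward over the first 5 and last 10 of the 30 training steps, plus the maximum over all steps. The sketch below uses a synthetic history for illustration only; it is not the logged data of this run, and `window_averages` is a hypothetical helper.

```python
def window_averages(rewards, first_n=5, last_n=10):
    """Summarize a per-step reward history with head/tail means and the peak."""
    if len(rewards) < max(first_n, last_n):
        raise ValueError("reward history is shorter than the requested window")
    return {
        "first_avg": sum(rewards[:first_n]) / first_n,
        "last_avg": sum(rewards[-last_n:]) / last_n,
        "peak": max(rewards),
    }


# Synthetic 30-step history (NOT the values logged for this run).
history = [0.5] * 5 + [0.75] * 15 + [1.0] * 10
summary = window_averages(history)
```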
Checkpoints in this repo
| Step | Original Tinker URI | Local path |
|---|---|---|
| sampler_weights/final | tinker://9f27c001-b92c-55a7-9e12-1a8bd858e16d:train:0/sampler_weights/final | final |
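
To relate a Tinker URI back to the training run ID and checkpoint name, the URI can be split into its parts. The sketch below assumes the layout `tinker://<run-id>:<qualifier>:<qualifier>/<path>`, inferred only from the single URI in the table above; it is not a documented Tinker URI grammar, and `parse_tinker_uri` is a hypothetical helper.

```python
def parse_tinker_uri(uri):
    """Split a Tinker URI into run ID, qualifiers, and checkpoint path.

    Assumption: the layout is inferred from one example URI only.
    """
    prefix = "tinker://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a tinker:// URI: {uri!r}")
    # Everything before the first "/" identifies the run; the rest is the path.
    authority, _, path = uri[len(prefix):].partition("/")
    run_id, *qualifiers = authority.split(":")
    return {"run_id": run_id, "qualifiers": qualifiers, "path": path}


parsed = parse_tinker_uri(
    "tinker://9f27c001-b92c-55a7-9e12-1a8bd858e16d:train:0/sampler_weights/final"
)
```

Here `parsed["run_id"]` matches the training run ID listed under the training configuration.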
How to load
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen3-8B-Base"
adapter = "arvindcr4/tinker-rl-w1_qwen3-8b-base-qwen3-8b-base-s42-run2"

tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter, subfolder="final")  # or "<step>"
```
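
Once the adapter is loaded, a completion can be generated with the usual `generate` call. The following is a rough sketch, not the evaluation code used for the benchmark: the prompt format in `build_prompt` is an assumption (the card does not state the prompt used during training), and both helper names are hypothetical. It expects the `model` and `tok` objects created above.

```python
def build_prompt(question):
    """Format a GSM8K-style question as a plain-text prompt (assumed format)."""
    return f"Question: {question.strip()}\nAnswer:"


def generate_answer(model, tok, question, max_new_tokens=256):
    """Greedy-decode a completion from the adapter-wrapped model."""
    inputs = tok(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Drop the prompt tokens so only the newly generated text is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tok.decode(new_tokens, skip_special_tokens=True)
```

For example, `generate_answer(model, tok, "What is 12 * 7?")` would return the model's continuation after `Answer:`. Since this is a base model with a LoRA adapter, the output may run on past the answer unless a stop condition is added.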
Companion releases
- Dataset: arvindcr4/tinker-rl-bench-wandb (all 334 W&B runs + 9,255 history rows)
- Manifest: arvindcr4/tinker-rl-bench-checkpoints (full catalogue of every Tinker URI)
- Code: pes-llm-research/tinker-rl-lab
Citation
@misc{tinkerrlbench2026,
title = {A Unified Benchmark for RL Post-Training of Language Models},
author = {Arvind, C. R. and Jeyaraj, Sandhya},
year = {2026},
note = {NeurIPS submission, https://github.com/pes-llm-research/tinker-rl-lab}
}
License
Apache 2.0. The underlying base model retains its original license —
please check Qwen/Qwen3-8B-Base for any usage restrictions.