---
license: apache-2.0
library_name: peft
base_model: Qwen/Qwen3-8B-Base
tags:
- lora
- peft
- tinker
- grpo
- reinforcement-learning
- rlhf
---

# tinker-rl-w1_qwen3-8b-base-qwen3-8b-base-s42-run2

LoRA adapters trained with **GRPO** on top of `Qwen/Qwen3-8B-Base` using the [Tinker](https://tinker-console.thinkingmachines.ai) cloud training service. These adapters are part of the TinkerRL-Bench release for our NeurIPS submission *"A Unified Benchmark for RL Post-Training of Language Models"* ([repo](https://github.com/pes-llm-research/tinker-rl-lab)).

## Training configuration

| Parameter | Value |
|---|---|
| Base model | `Qwen/Qwen3-8B-Base` |
| Experiment tag | `w1_qwen3-8b-base` |
| Campaign | `bitter_lesson_v2` |
| Task | `gsm8k` |
| Seed | `42` |
| LoRA rank | `32` |
| Learning rate | `1e-05` |
| Group size | `8` |
| Training steps | `30` |
| Platform | Tinker (`tinker`) |
| Training run ID | `9f27c001-b92c-55a7-9e12-1a8bd858e16d` |

## Metrics

| Metric | Value |
|---|---|
| First-5 reward avg | 0.875 |
| Last-10 reward avg | 0.9875 |
| Peak reward | 1.0 |

## Checkpoints in this repo

| Checkpoint | Original Tinker URI | Local path |
|---|---|---|
| `sampler_weights/final` | `tinker://9f27c001-b92c-55a7-9e12-1a8bd858e16d:train:0/sampler_weights/final` | `final` |

## How to load

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen3-8B-Base"
adapter = "arvindcr4/tinker-rl-w1_qwen3-8b-base-qwen3-8b-base-s42-run2"

tok = AutoTokenizer.from_pretrained(base)
# device_map="auto" requires the `accelerate` package
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
# `subfolder` selects the checkpoint directory listed under "Local path" above
model = PeftModel.from_pretrained(model, adapter, subfolder="final")
```

## Companion releases

- Dataset: [`arvindcr4/tinker-rl-bench-wandb`](https://huggingface.co/datasets/arvindcr4/tinker-rl-bench-wandb) (all 334 W&B runs and 9,255 history rows)
- Manifest:
[`arvindcr4/tinker-rl-bench-checkpoints`](https://huggingface.co/datasets/arvindcr4/tinker-rl-bench-checkpoints) (full catalogue of every Tinker URI)
- Code: [`pes-llm-research/tinker-rl-lab`](https://github.com/pes-llm-research/tinker-rl-lab)

## Citation

```bibtex
@misc{tinkerrlbench2026,
  title  = {A Unified Benchmark for RL Post-Training of Language Models},
  author = {Arvind, C. R. and Jeyaraj, Sandhya},
  year   = {2026},
  note   = {NeurIPS submission, https://github.com/pes-llm-research/tinker-rl-lab}
}
```

## License

Apache 2.0. The underlying base model retains its original license; please check `Qwen/Qwen3-8B-Base` for any usage restrictions.