---
license: apache-2.0
library_name: peft
base_model: Qwen/Qwen3-8B-Base
tags:
  - lora
  - peft
  - tinker
  - grpo
  - reinforcement-learning
  - rlhf
---

# tinker-rl-w1_qwen3-8b-base-qwen3-8b-base-s42-run2

This repository contains a LoRA adapter trained with **GRPO** (Group Relative
Policy Optimization) on top of `Qwen/Qwen3-8B-Base` using the
[Tinker](https://tinker-console.thinkingmachines.ai) cloud training service.
It is part of the TinkerRL-Bench release accompanying our NeurIPS submission
*"A Unified Benchmark for RL Post-Training of Language Models"*
([repo](https://github.com/pes-llm-research/tinker-rl-lab)).

## Training configuration

| Setting | Value |
|---|---|
| Base model | `Qwen/Qwen3-8B-Base` |
| Experiment tag | `w1_qwen3-8b-base` |
| Campaign | `bitter_lesson_v2` |
| Task | `gsm8k` |
| Seed | `42` |
| LoRA rank | `32` |
| Learning rate | `1e-05` |
| Group size | `8` |
| Training steps | `30` |
| Platform | Tinker (`tinker`) |
| Training run ID | `9f27c001-b92c-55a7-9e12-1a8bd858e16d` |
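
To illustrate what the group size controls, here is a minimal sketch of a group-relative advantage computation in the usual GRPO style. It assumes that "group size" means the number of completions sampled per prompt and that rewards are standardized within each group; this card does not state whether the Tinker run normalizes by the standard deviation, and `group_relative_advantages` and `eps` are hypothetical names introduced only for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of sampled completions.

    Assumption: advantage = (reward - group mean) / (group std + eps).
    Some GRPO variants only subtract the mean; the exact form used for
    this run is not documented in the card.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Synthetic example: one prompt, 8 completions, 7 correct and 1 wrong.
example_group = [1, 1, 1, 1, 1, 1, 1, 0]
example_advantages = group_relative_advantages(example_group)
```

In this synthetic group the single incorrect completion receives a negative advantage and the correct ones a small positive advantage. When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.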

## Metrics

| Metric | Value |
|---|---|
| First-5 reward avg | 0.875 |
| Last-10 reward avg | 0.9875 |
| Peak reward | 1.0 |
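
The card does not spell out how these summary values are derived. A plausible reading, sketched below, is that each is a simple statistic over the per-step mean reward logged during the 30 training steps. The function name `summarize_rewards` and the input series are hypothetical, and the numbers in the example are synthetic, not the run's actual history.

```python
from statistics import mean


def summarize_rewards(step_rewards):
    """Summarize a per-step reward series.

    Assumption: the table values are plain averages and a maximum over
    the logged per-step mean rewards.
    """
    if not step_rewards:
        raise ValueError("step_rewards must not be empty")
    return {
        "first5_avg": mean(step_rewards[:5]),    # mean of the first 5 steps
        "last10_avg": mean(step_rewards[-10:]),  # mean of the final 10 steps
        "peak": max(step_rewards),               # best single step
    }


# Synthetic 30-step series, for illustration only.
synthetic_series = [0.5] * 5 + [1.0] * 25
summary = summarize_rewards(synthetic_series)
```

The actual per-step history for this run is available in the W&B dataset listed under Companion releases.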


## Checkpoints in this repo

| Step | Original Tinker URI | Local path |
|---|---|---|
| `sampler_weights/final` | `tinker://9f27c001-b92c-55a7-9e12-1a8bd858e16d:train:0/sampler_weights/final` | `final` |


## How to load

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen3-8B-Base"
adapter = "arvindcr4/tinker-rl-w1_qwen3-8b-base-qwen3-8b-base-s42-run2"

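tok = AutoTokenizer.from_pretrained(base)
# Load the base model first, then attach the LoRA weights on top of it.
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
# "final" is the only checkpoint subfolder in this repo.
model = PeftModel.from_pretrained(model, adapter, subfolder="final")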
```
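
Once `model` and `tok` are loaded as above, generation might look like the following sketch. The prompt template used during training is not documented in this card, so `build_prompt` is a hypothetical plain-text format, and `extract_last_number` is an illustrative helper for reading a GSM8K-style numeric answer, not the benchmark's actual grader.

```python
import re


def build_prompt(question: str) -> str:
    # Hypothetical prompt format; the training-time template is not stated.
    return f"Question: {question}\nAnswer:"


def extract_last_number(text: str):
    """Return the last number in `text` as a string (commas removed), or None."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def generate_answer(model, tok, question: str, max_new_tokens: int = 256) -> str:
    """Greedy-decode a completion and return only the newly generated text."""
    import torch  # imported lazily so the helpers above work without torch

    inputs = tok(build_prompt(question), return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Drop the prompt tokens so only the completion is decoded.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tok.decode(new_tokens, skip_special_tokens=True)
```

A call such as `extract_last_number(generate_answer(model, tok, "What is 12 * 7?"))` would then yield the model's final numeric answer as a string, if one is present in the completion.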

## Companion releases

- Dataset: [`arvindcr4/tinker-rl-bench-wandb`](https://huggingface.co/datasets/arvindcr4/tinker-rl-bench-wandb) — all 334 W&B runs + 9,255 history rows
- Manifest: [`arvindcr4/tinker-rl-bench-checkpoints`](https://huggingface.co/datasets/arvindcr4/tinker-rl-bench-checkpoints) — full catalogue of every Tinker URI
- Code: [`pes-llm-research/tinker-rl-lab`](https://github.com/pes-llm-research/tinker-rl-lab)

## Citation

```bibtex
@misc{tinkerrlbench2026,
  title   = {A Unified Benchmark for RL Post-Training of Language Models},
  author  = {Arvind, C. R. and Jeyaraj, Sandhya},
  year    = {2026},
  note    = {NeurIPS submission, https://github.com/pes-llm-research/tinker-rl-lab}
}
```

## License

Apache 2.0. The underlying base model retains its original license —
please check `Qwen/Qwen3-8B-Base` for any usage restrictions.