tinker-rl-frontier_gsm8k_nemotron-120b-nemotron-120b
LoRA adapters trained with GRPO on top of nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 using the
Tinker cloud training service.
Part of the TinkerRL-Bench release for our NeurIPS submission,
"A Unified Benchmark for RL Post-Training of Language Models"
(repo: pes-llm-research/tinker-rl-lab).
Training configuration
| Setting | Value |
|---|---|
| Base model | nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 |
| Experiment tag | frontier_gsm8k_nemotron-120b |
| Campaign | None |
| Task | gsm8k |
| Seed | 42 |
| LoRA rank | 16 |
| Learning rate | 1e-05 |
| Group size | 4 |
| Training steps | 20 |
| Platform | Tinker (tinker) |
| Training run ID | 657a920a-9e74-55d2-9354-71a6ec2f1f61 |
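To make the "Group size" setting concrete: in GRPO, each prompt is sampled several times (here, 4) and every completion is scored relative to the others in its group. The sketch below assumes the commonly described mean/standard-deviation normalization and a binary correctness reward; the exact variant used by the Tinker training run is not stated in this card, and the function name `group_relative_advantages` is illustrative only.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    Assumption: standard GRPO-style normalization; the training service
    may use a different variant (e.g. mean-centering only).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled completions (group size 4), one of them correct.
group = [1.0, 0.0, 0.0, 0.0]
print(group_relative_advantages(group))

# If every completion in a group gets the same reward, all advantages
# are zero and that group contributes no learning signal.
print(group_relative_advantages([0.0, 0.0, 0.0, 0.0]))
```

With a group of only 4 and a low success rate, many groups may end up all-zero, which is one plausible reason a short run shows little movement in average reward.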
Metrics
| Metric | Value |
|---|---|
| First-5 reward avg | 0.175 |
| Last-10 reward avg | 0.1625 |
| Peak reward | 0.875 |
| Peak accuracy | 0.875 |
| Last-10 accuracy | 0.1625 |
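The reported averages can be compared directly: the last-10 reward average (0.1625) is 0.0125 below the first-5 average (0.175), so this 20-step run shows no upward trend despite a single-step peak of 0.875. The snippet below reproduces that arithmetic and sketches how such summaries could be derived from a per-step reward series. The `summarize` helper and its definitions (mean of the first five steps, mean of the last ten steps) are assumptions inferred from the metric names, not the benchmark's actual code.

```python
def summarize(rewards):
    """Summary statistics of a per-step reward series (assumed definitions)."""
    head = rewards[:5]
    tail = rewards[-10:]
    return {
        "first5_avg": sum(head) / len(head),
        "last10_avg": sum(tail) / len(tail),
        "peak": max(rewards),
    }


# Values reported in the metrics table.
first5_avg, last10_avg = 0.175, 0.1625

abs_change = last10_avg - first5_avg
rel_change = abs_change / first5_avg

print(f"absolute change: {abs_change:+.4f}")
print(f"relative change: {rel_change:+.1%}")
```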
Checkpoints in this repo
| Step | Original Tinker URI | Local path |
|---|---|---|
| sampler_weights/final | tinker://657a920a-9e74-55d2-9354-71a6ec2f1f61:train:0/sampler_weights/final | final |
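For scripting against the checkpoint table, a Tinker URI can be split into its parts. The layout assumed here (`tinker://<run id>:<kind>:<index>/<path>`) is inferred only from the single URI listed above, not from Tinker documentation, and the names `TinkerURI`, `kind`, and `index` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TinkerURI:
    run_id: str  # training run ID
    kind: str    # e.g. "train" (field meaning is an assumption)
    index: int   # e.g. 0 (field meaning is an assumption)
    path: str    # checkpoint path within the run


def parse_tinker_uri(uri: str) -> TinkerURI:
    """Split a URI of the assumed form tinker://<run_id>:<kind>:<index>/<path>."""
    prefix = "tinker://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a tinker:// URI: {uri!r}")
    authority, _, path = uri[len(prefix):].partition("/")
    parts = authority.split(":")
    if len(parts) != 3:
        raise ValueError(f"unexpected authority segment: {authority!r}")
    run_id, kind, index = parts
    return TinkerURI(run_id=run_id, kind=kind, index=int(index), path=path)


uri = "tinker://657a920a-9e74-55d2-9354-71a6ec2f1f61:train:0/sampler_weights/final"
parsed = parse_tinker_uri(uri)
print(parsed.run_id, parsed.path)
```

The parsed `run_id` matches the training run ID in the configuration table, which gives a quick consistency check when working with the full checkpoint manifest.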
How to load
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
adapter = "arvindcr4/tinker-rl-frontier_gsm8k_nemotron-120b-nemotron-120b"

# Load the tokenizer and the base model (sharded across available devices).
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")

# Attach the LoRA adapter stored in the "final" subfolder of this repo.
model = PeftModel.from_pretrained(model, adapter, subfolder="final")  # or "<step>"
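Once the adapter is loaded, a GSM8K-style question can be answered with a small helper like the one below. This is a minimal sketch: the prompt format, the `build_prompt`, `extract_final_number`, and `answer` helpers, and the "last number in the output" extraction rule are all assumptions for illustration, not the prompt or reward function used during training. It expects the `model` and `tok` objects created by the loading code above.

```python
import re


def build_prompt(question: str) -> str:
    # Hypothetical plain-text prompt; the training prompt format is not documented here.
    return f"Question: {question}\nAnswer: Let's think step by step."


def extract_final_number(text: str):
    """Return the last number in the text as a string (commas removed), or None."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def answer(model, tok, question: str, max_new_tokens: int = 256):
    """Generate a completion and pull out the final numeric answer."""
    inputs = tok(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    text = tok.decode(new_tokens, skip_special_tokens=True)
    return text, extract_final_number(text)
```

A call such as `answer(model, tok, "Natalia sold 48 clips in April and half as many in May. How many clips did she sell in total?")` would return the generated reasoning along with the extracted number. If the tokenizer ships a chat template, wrapping the question with `tok.apply_chat_template` may match the base model's expected input more closely.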
Companion releases
- Dataset: arvindcr4/tinker-rl-bench-wandb (all 334 W&B runs + 9,255 history rows)
- Manifest: arvindcr4/tinker-rl-bench-checkpoints (full catalogue of every Tinker URI)
- Code: pes-llm-research/tinker-rl-lab
Citation
@misc{tinkerrlbench2026,
title = {A Unified Benchmark for RL Post-Training of Language Models},
author = {Arvind, C. R. and Jeyaraj, Sandhya},
year = {2026},
note = {NeurIPS submission, https://github.com/pes-llm-research/tinker-rl-lab}
}
License
Apache 2.0. The underlying base model retains its original license —
please check nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 for any usage restrictions.