tinker-rl-arch_gsm8k_kimi-k2-kimi-k2
LoRA adapters trained with GRPO on top of moonshotai/Kimi-K2-Thinking using the
Tinker cloud training service.
Part of the TinkerRL-Bench release for our NeurIPS submission
"A Unified Benchmark for RL Post-Training of Language Models"
(repo: pes-llm-research/tinker-rl-lab).
Training configuration
| Field | Value |
|---|---|
| Base model | moonshotai/Kimi-K2-Thinking |
| Experiment tag | arch_gsm8k_kimi-k2 |
| Campaign | None |
| Task | gsm8k |
| Seed | 42 |
| LoRA rank | 16 |
| Learning rate | 1e-05 |
| Group size | 4 |
| Training steps | 20 |
| Platform | Tinker (tinker) |
| Training run ID | 51a8ef9e-15ef-5f8f-bda1-78ee51387a12 |
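For readers unfamiliar with how the group size enters GRPO, here is a minimal sketch of the group-relative advantage computation, assuming the common formulation in which each completion's reward is normalized by the mean and standard deviation of its own sampling group. The function name, the `eps` constant, and the example rewards are illustrative assumptions, not values or code from this training run.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other samples in its group.

    Assumed formulation: (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# One hypothetical group of 4 completions for a single GSM8K prompt
# (group size 4 matches the table above; the rewards are made up).
rewards = [1.0, 0.0, 1.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

With this normalization, a group whose completions all receive the same reward yields zero advantage for every sample, so such a group contributes no learning signal.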
Metrics
| Metric | Value |
|---|---|
| Last-10 reward avg | 0.8 |
| Peak reward | 1.0 |
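The two metrics above can be read as simple summaries of a per-step reward series. The sketch below shows one plausible way to compute them, assuming "Last-10 reward avg" is the mean over the final 10 logged steps and "Peak reward" is the maximum over all steps. The helper name and the reward history are hypothetical and are not the logged values of this run.

```python
def summarize_rewards(step_rewards, window=10):
    """Return the trailing-window mean and the peak of a reward series."""
    if not step_rewards:
        raise ValueError("step_rewards must not be empty")
    tail = step_rewards[-window:]  # shorter series simply use all steps
    return {
        "last_avg": sum(tail) / len(tail),
        "peak": max(step_rewards),
    }


# Hypothetical 20-step history (illustrative values only).
history = [0.25] * 10 + [1.0] * 5 + [0.5] * 5
summary = summarize_rewards(history)
print(summary)
```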
Checkpoints in this repo
| Step | Original Tinker URI | Local path |
|---|---|---|
| sampler_weights/final | tinker://51a8ef9e-15ef-5f8f-bda1-78ee51387a12:train:0/sampler_weights/final | final |
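To cross-reference a checkpoint URI with the training run ID in the configuration table, the URI can be split with plain string operations. This is a sketch under an assumption: it treats the text between `tinker://` and the first colon as the run ID and everything after the first `/` as the checkpoint path, which matches the single URI listed here but is not a documented Tinker format. The helper name is hypothetical.

```python
def split_tinker_uri(uri):
    """Split a tinker:// URI into (run_id, checkpoint_path).

    Assumed layout: tinker://<run_id>:<suffix>/<checkpoint_path>
    """
    prefix = "tinker://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a tinker URI: {uri!r}")
    head, _, path = uri[len(prefix):].partition("/")
    run_id = head.split(":", 1)[0]
    return run_id, path


uri = "tinker://51a8ef9e-15ef-5f8f-bda1-78ee51387a12:train:0/sampler_weights/final"
run_id, path = split_tinker_uri(uri)
print(run_id, path)
```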
How to load
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "moonshotai/Kimi-K2-Thinking"
adapter = "arvindcr4/tinker-rl-arch_gsm8k_kimi-k2-kimi-k2"

# Load the base tokenizer and model. If the base repo ships custom model code,
# both calls may also need trust_remote_code=True.
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")

# Attach the LoRA adapter stored in this repo's "final" subfolder.
model = PeftModel.from_pretrained(model, adapter, subfolder="final")  # or "<step>"
Companion releases
- Dataset: arvindcr4/tinker-rl-bench-wandb (all 334 W&B runs + 9,255 history rows)
- Manifest: arvindcr4/tinker-rl-bench-checkpoints (full catalogue of every Tinker URI)
- Code: pes-llm-research/tinker-rl-lab
Citation
@misc{tinkerrlbench2026,
title = {A Unified Benchmark for RL Post-Training of Language Models},
author = {Arvind, C. R. and Jeyaraj, Sandhya},
year = {2026},
note = {NeurIPS submission, https://github.com/pes-llm-research/tinker-rl-lab}
}
License
Apache 2.0. The underlying base model retains its original license —
please check moonshotai/Kimi-K2-Thinking for any usage restrictions.