AngLv/NoisyRewards-in-RL-RM-acc-65

Text Classification


File information

The repository contains the following file information:

The model was presented in the paper "The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason".

Github code: https://github.com/trestad/Noisy-Rewards-in-Learning-to-Reason


Safetensors · Model size: 7B params · Tensor type: BF16

Paper for AngLv/NoisyRewards-in-RL-RM-acc-65: The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason (Paper 2505.22653, published May 28, 2025)