ThinkTank PRM — Process Reward Model for Reasoning Efficiency

ThinkTank PRM is a process reward model that assigns each reasoning step a scalar score: positive for useful, negative for wasteful.

It is trained on crowdsourced human judgments from ThinkTank, a Game With a Purpose (GWAP) in which players identify wasteful steps in AI reasoning chains.

Results

Metric	Value
Pairwise accuracy	95.7%
Eval loss	0.071
Training pairs	92
Eval pairs	23
Training time	105 seconds
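
The card does not define pairwise accuracy. The sketch below assumes the usual definition for reward models: the fraction of (chosen, rejected) pairs in which the chosen step receives the higher score. The function name and the example scores are hypothetical, not the model's actual eval outputs.

```python
def pairwise_accuracy(pairs):
    """Fraction of (chosen_score, rejected_score) pairs ranked correctly.

    Assumed definition: a pair counts as correct when the chosen step
    scores strictly higher than the rejected step.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)


# Hypothetical scores for illustration only.
example_pairs = [(3.21, -0.33), (1.08, -0.50), (0.10, 0.40), (2.00, 1.50)]
print(f"{pairwise_accuracy(example_pairs):.3f}")  # 3 of 4 pairs ranked correctly

# With 23 eval pairs, 22 correct pairs rounds to the reported figure.
print(f"{22 / 23:.1%}")
```

Under this definition, the reported 95.7% on 23 eval pairs is consistent with 22 pairs ranked correctly and one ranked incorrectly.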

Usage

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel

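# Load the tokenizer, the base model with a single-logit head, and the LoRA adapter
tokenizer = AutoTokenizer.from_pretrained("vanthienha199/thinktank-prm-qwen2.5-0.5b")
base = AutoModelForSequenceClassification.from_pretrained("Qwen/Qwen2.5-0.5B", num_labels=1)
# Precaution for batched scoring: the classification head pools at the last non-padding token
base.config.pad_token_id = tokenizer.pad_token_id
model = PeftModel.from_pretrained(base, "vanthienha199/thinktank-prm-qwen2.5-0.5b")
model.eval()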

# Score a reasoning step
text = "Question: What is 25% of 200?\n\nReasoning step (step 3, calculation): 25% = 0.25. 0.25 * 200 = 50."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    score = model(**inputs).logits.item()

print(f"Score: {score:.3f}")  # Positive = useful, negative = wasteful
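
To score several steps, it may help to build the input string with a small helper. The sketch below reproduces the layout of the single example above; `format_step` is a hypothetical name, and whether training used exactly this template for every step is an assumption inferred from that one example.

```python
def format_step(question, step_number, step_type, content):
    """Build the model input in the layout shown in the usage example.

    Assumption: the template seen in the single example above applies
    to all steps. This helper is illustrative, not part of the release.
    """
    return (
        f"Question: {question}\n\n"
        f"Reasoning step (step {step_number}, {step_type}): {content}"
    )


text = format_step(
    "What is 25% of 200?", 3, "calculation", "25% = 0.25. 0.25 * 200 = 50."
)
print(text)
```

The resulting string can be passed to the tokenizer in place of the hand-written `text` above.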

Example Scores

Step Type	Content	Score	Label
thinking	"I need to find 25% of 200..."	-0.33	WASTEFUL
calculation	"25/100 = 0.25. 0.25 * 200 = 50"	+3.21	USEFUL
conclusion	"The answer is 50"	+3.25	USEFUL
verification	"Let me double-check: 200/4 = 50"	+1.08	USEFUL
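
The labels in the table follow the sign rule stated in the usage example (positive = useful, negative = wasteful). A minimal sketch of that mapping is below; `label_step` is a hypothetical helper, and treating a score of exactly 0 as wasteful is an assumption, since the card does not cover that case.

```python
def label_step(score, threshold=0.0):
    """Map a scalar score to a label using the sign rule from the card.

    Assumption: scores at or below the threshold count as WASTEFUL.
    """
    return "USEFUL" if score > threshold else "WASTEFUL"


# Scores copied from the Example Scores table.
example_scores = {
    "thinking": -0.33,
    "calculation": 3.21,
    "conclusion": 3.25,
    "verification": 1.08,
}
for step_type, score in example_scores.items():
    print(f"{step_type:<13}{score:+.2f}  {label_step(score)}")
```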

Training Details

Base model: Qwen/Qwen2.5-0.5B
Method: LoRA (r=16, alpha=32, dropout=0.1)
Target modules: q_proj, v_proj + score head
Epochs: 5
Learning rate: 1e-4
Hardware: Apple M4 (MPS), 105 seconds total
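
The LoRA settings above can be written out as a plain dictionary. The key names mirror fields of `peft.LoraConfig`, but the training script is not part of this card, so this is a sketch of how the settings could be expressed rather than the author's configuration. In particular, mapping "score head" to `modules_to_save=["score"]` is an assumption.

```python
# Hyperparameters from the Training Details list, keyed by the
# corresponding peft.LoraConfig field names.
lora_hyperparameters = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "target_modules": ["q_proj", "v_proj"],
    # Assumption: the score head was trained in full via modules_to_save.
    "modules_to_save": ["score"],
}

# Standard (non rank-stabilized) LoRA scales the low-rank update by alpha / r.
scaling = lora_hyperparameters["lora_alpha"] / lora_hyperparameters["r"]
print(scaling)  # 2.0
```

With r=16 and alpha=32, the adapter update is scaled by a factor of 2 under standard LoRA scaling.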

The Pipeline

ThinkTank GWAP (19 users, 206 judgments)
    → Consensus labels (165 steps)
    → Reward pairs (115 chosen/rejected: 92 train, 23 eval)
    → This PRM (95.7% pairwise accuracy)
    → Score any LLM reasoning chain
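
The card does not say how the 206 judgments were reduced to consensus labels. One plausible rule is a per-step majority vote, sketched below. The function name, the label strings, and the choice to drop ties are all assumptions made for illustration.

```python
from collections import Counter


def consensus_label(judgments, min_votes=1):
    """Return the majority label for one step, or None if there is no consensus.

    Assumptions (not from the card): simple majority vote, ties are
    dropped, and steps with fewer than min_votes judgments are dropped.
    """
    if len(judgments) < min_votes:
        return None
    counts = Counter(judgments).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie between the top two labels
    return counts[0][0]


print(consensus_label(["wasteful", "wasteful", "useful"]))
print(consensus_label(["wasteful", "useful"]))
```

The 115 reward pairs in the pipeline match the 92 training pairs plus 23 eval pairs reported under Results, an 80/20 split.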

Citation

@misc{thinktank-prm-2026,
  title={ThinkTank PRM: A Process Reward Model Trained on Crowdsourced Reasoning Labels},
  author={Ha Le},
  year={2026},
  url={https://huggingface.co/vanthienha199/thinktank-prm-qwen2.5-0.5b}
}
