🔍 VeriThinker: Learning to Verify Makes Reasoning Model Efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang, Ruonan Yu, Xinchao Wang
xML Lab, National University of Singapore
The key distinction between VeriThinker and traditional SFT or RL-based long-to-short methods. We uniquely train LRMs on an auxiliary CoT verification task, achieving effective CoT compression without relying on synthetic target reasoning chains.
| 💻 GitHub | Code Repository |
| 📄 Paper | ArXiv-Link |
| 🤖 Model | R1-VeriThinker-7B |
| 📊 Data | CoT-Veirification-340k |
| 📄 Paper (🤗) | Hugging Face Paper |
💡 Introduction
We introduce VeriThinker, a novel approach to CoT compression. Unlike conventional methods that fine-tune LRMs directly on the original reasoning task using synthetic concise CoT data, we fine-tune the model solely on an auxiliary verification task. By learning to accurately verify the correctness of CoT solutions, LRMs become more discerning about whether subsequent self-reflection steps are necessary, which effectively suppresses overthinking. Extensive experiments show that VeriThinker substantially reduces reasoning chain length while maintaining or even slightly improving accuracy. Applied to DeepSeek-R1-Distill-Qwen-7B, our approach reduces reasoning tokens on MATH500 from 3790 to 2125 while improving accuracy by 0.8 percentage points (94.0% to 94.8%); on AIME25, tokens decrease from 14321 to 10287 with a 2.1-point accuracy gain (38.7% to 40.8%). Our experiments also show that VeriThinker generalizes zero-shot to speculative reasoning, boosting throughput.
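To put the reported numbers in perspective, the relative token savings can be derived from the figures quoted above. This is only a small arithmetic sketch: the `reported` dictionary and the `relative_reduction` helper are illustrative names and not part of the released code.

```python
# Token counts and accuracies as quoted in the introduction above.
reported = {
    "MATH500": {"tokens": (3790, 2125), "accuracy": (94.0, 94.8)},
    "AIME25": {"tokens": (14321, 10287), "accuracy": (38.7, 40.8)},
}


def relative_reduction(before, after):
    """Fraction of reasoning tokens removed relative to the baseline."""
    return (before - after) / before


for name, stats in reported.items():
    tokens_before, tokens_after = stats["tokens"]
    acc_before, acc_after = stats["accuracy"]
    print(
        f"{name}: {relative_reduction(tokens_before, tokens_after):.1%} fewer tokens, "
        f"{acc_after - acc_before:+.1f} accuracy points"
    )
```

Under these figures, the reduction is roughly 44% of reasoning tokens on MATH500 and roughly 28% on AIME25.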
🚀 Quick Start:
Reasoning Task:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Zigeng/R1-VeriThinker-7B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# prepare the model input
prompt = "Given a=4, b=7, please help me calculate a*2+b*3+(a+b)^2."
tail = r" Please reason step by step, and put your final answer within \boxed{}."
messages = [
    {"role": "user", "content": prompt + tail}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384,
    do_sample=True,  # sampling must be on for temperature/top_p to take effect
    temperature=0.6,
    top_p=0.95,
)
# strip the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
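Because the prompt asks for the final answer inside `\boxed{}`, the answer can be pulled out of the generated text with a small helper. The sketch below is one possible way to do it; `extract_boxed` is a hypothetical name, not a function shipped with the model or with Transformers, and it assumes the last `\boxed{...}` in the response holds the final answer.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # braces never balanced, e.g. the generation was truncated
```

Calling `extract_boxed(response)` on the output of the snippet above should yield the model's final answer as a string. For the example prompt, the correct value is 150 (8 + 21 + 121). Brace counting is used instead of a simple regular expression so that nested expressions such as `\boxed{\frac{1}{2}}` are returned whole.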
Correctness Verification Task:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Zigeng/R1-VeriThinker-7B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# prepare the model input
prompt_part_1 = """## Instruction:
You will be provided with a question along with a proposed solution. Please carefully verify each step of the solution, tell me if every step is absolutely correct.
"""
# {{}} is an escaped literal {} so that str.format leaves \boxed{} intact
prompt_part_2 = """## Question:
{} Please reason step by step, and put your final answer within \\boxed{{}}.
## Proposed Solution:
{}
"""
question = "Given a=4, b=7, please help me calculate a*2+b*3+(a+b)^2."
cot_solution = """First, let's parse and calculate the given expression a*2+b*3+(a + b)^2 step by step, where a=4 and b=7.
Calculate a * 2: 4 * 2 = 8
Calculate b * 3: 7 * 3 = 21
Calculate (a + b)^2: (4 + 7)^2 = 11^2 = 121
Add the above results: 8 + 21 + 121 = 150
The final result is \\boxed{150}.
"""
prompt = prompt_part_1 + prompt_part_2.format(question, cot_solution)
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion (greedy decoding, short verdict)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=64,
    do_sample=False,
)
# strip the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
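When verifying many question/solution pairs, it can help to wrap the prompt construction in a function. The following is a minimal sketch, assuming the prompt layout shown above (instruction, question with the `\boxed{}` suffix, then the proposed solution). `build_verification_prompt`, `INSTRUCTION`, and `TAIL` are illustrative names, and the exact whitespace of the original training prompt is an assumption.

```python
INSTRUCTION = (
    "## Instruction:\n"
    "You will be provided with a question along with a proposed solution. "
    "Please carefully verify each step of the solution, tell me if every step "
    "is absolutely correct.\n"
)
TAIL = r" Please reason step by step, and put your final answer within \boxed{}."


def build_verification_prompt(question, cot_solution):
    """Assemble the verification prompt for one question/solution pair."""
    # Plain concatenation avoids str.format, so braces inside the question
    # or the solution (e.g. \boxed{150}) pass through untouched.
    return (
        INSTRUCTION
        + "## Question:\n" + question + TAIL + "\n"
        + "## Proposed Solution:\n" + cot_solution + "\n"
    )
```

The returned string can be used as the `content` of the single user message passed to `tokenizer.apply_chat_template`, exactly as in the snippet above.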
📖 Experimental Results
CoT Compression Results:
CoT Correctness Verification Results:
Speculative Reasoning Results:
Speculative reasoning results on three reasoning models. When using Qwen-2.5-Math-Instruct-7B as the draft model, most problems in MATH500 and GSM8K can be solved with the short-CoT model, while only a few (around 10%) require activating the long-CoT model for more complex solutions.
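The control flow implied by this caption can be sketched as follows: a cheap short-CoT draft is produced first, a verifier judges it, and the long-CoT model is only invoked when the draft is rejected. This is a simplified reading of the description above, not the authors' implementation. `speculative_reasoning` and its three callables (`draft_solve`, `verify`, `long_solve`) are hypothetical placeholders that a user would back with real model calls.

```python
def speculative_reasoning(question, draft_solve, verify, long_solve):
    """Return (solution, route), where route is "short" or "long".

    draft_solve(question) -> str          short-CoT draft model (hypothetical)
    verify(question, solution) -> bool    verifier's accept/reject decision (hypothetical)
    long_solve(question) -> str           long-CoT reasoning model (hypothetical)
    """
    draft = draft_solve(question)
    if verify(question, draft):
        # Draft accepted: the expensive long-CoT model is never called.
        return draft, "short"
    # Draft rejected: fall back to the long-CoT model.
    return long_solve(question), "long"


# Toy stand-ins so the control flow can be exercised without any model.
def toy_draft(question):
    return "draft: " + question


def toy_long(question):
    return "long: " + question


def toy_verify(question, solution):
    # Pretend that only questions marked "easy" get a correct draft.
    return "easy" in question


print(speculative_reasoning("easy sum", toy_draft, toy_verify, toy_long))
print(speculative_reasoning("hard proof", toy_draft, toy_verify, toy_long))
```

The throughput gain comes from how often the first branch is taken: if most drafts are accepted, as the caption reports for MATH500 and GSM8K, the long-CoT model runs on only a small share of problems.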

Citation
If our research assists your work, please give us a star ⭐ or cite us using:
Model tree for Zigeng/R1-VeriThinker-7B
Base model
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
