- Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
  Paper • 2407.20798 • Published • 24
- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 38
- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 104
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
  Paper • 2502.18449 • Published • 75
Collections including paper arxiv:2511.08567
- OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
  Paper • 2511.16334 • Published • 96
- Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
  Paper • 2509.07980 • Published • 105
- ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute
  Paper • 2509.04475 • Published • 3
- Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
  Paper • 2512.01374 • Published • 106

- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 662 • 99
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 40
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 88

- The Path Not Taken: RLVR Provably Learns Off the Principals
  Paper • 2511.08567 • Published • 36
- Reasoning with Sampling: Your Base Model is Smarter Than You Think
  Paper • 2510.14901 • Published • 49
- When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA
  Paper • 2510.04849 • Published • 117

- NousResearch/Hermes-4-70B-FP8
  Text Generation • 71B • Updated • 47.1k • 29
- NousResearch/Hermes-4-405B-FP8
  Text Generation • 406B • Updated • 356 • 29
- Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
  Paper • 2508.21365 • Published • 29
- BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
  Paper • 2509.15566 • Published • 14

- Controlled Decoding from Language Models
  Paper • 2310.17022 • Published • 14
- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
  Paper • 2310.20587 • Published • 18
- Inpainting-Guided Policy Optimization for Diffusion Large Language Models
  Paper • 2509.10396 • Published • 16
- The Path Not Taken: RLVR Provably Learns Off the Principals
  Paper • 2511.08567 • Published • 36