This is the trained In-Domain LLM adopted in the ICML 2026 paper:

How Far Ahead Do LLMs Plan? Uncovering the Latent Horizon in Chain-of-Thought Reasoning

This model is trained with GRPO upon Qwen2.5-7B-Instruct, which learns task-aware reasoning behaviors (see the 12 tasks at https://huggingface.co/datasets/lxucs/tele-lens). The resulting CoT trajectories are substantially shorter than those from Qwen3 models.

This model should always be used with the following as SYSTEM PROMPT:

You are a helpful assistant. Now the user asks you to solve a reasoning problem. You need to first think about the solving process in the mind and then provide the user with the answer. The thinking process is enclosed within <think> </think> tags, i.e., <think> thinking process here </think> final answer.

More details on the model and data are provided at this GitHub repository.

Citation

@inproceedings{xu2026globalplan,
      title={How Far Ahead Do LLMs Plan? Uncovering the Latent Horizon in Chain-of-Thought Reasoning}, 
      author={Liyan Xu and Mo Yu and Fandong Meng and Jie Zhou},
      booktitle={Forty-third International Conference on Machine Learning},
      year={2026},
      url={https://arxiv.org/abs/2602.02103}, 
}