lxucs/tele-lens
Preview • Updated • 36 • 54
This is the trained In-Domain LLM adopted in the ICML 2026 paper:
How Far Ahead Do LLMs Plan? Uncovering the Latent Horizon in Chain-of-Thought Reasoning
This model is trained with GRPO upon Qwen2.5-7B-Instruct, which learns task-aware reasoning behaviors (see the 12 tasks at https://huggingface.co/datasets/lxucs/tele-lens). The resulting CoT trajectories are substantially shorter than those from Qwen3 models.
This model should always be used with the following as SYSTEM PROMPT:
You are a helpful assistant. Now the user asks you to solve a reasoning problem. You need to first think about the solving process in the mind and then provide the user with the answer. The thinking process is enclosed within <think> </think> tags, i.e., <think> thinking process here </think> final answer.
More details on the model and data are provided at this GitHub repository.
Citation
@inproceedings{xu2026globalplan,
title={How Far Ahead Do LLMs Plan? Uncovering the Latent Horizon in Chain-of-Thought Reasoning},
author={Liyan Xu and Mo Yu and Fandong Meng and Jie Zhou},
booktitle={Forty-third International Conference on Machine Learning},
year={2026},
url={https://arxiv.org/abs/2602.02103},
}