BigDong commited on
Commit
81f7f25
·
1 Parent(s): b282ae7

update README

Browse files
Files changed (2) hide show
  1. README-cn.md +34 -31
  2. README.md +31 -28
README-cn.md CHANGED
@@ -22,17 +22,21 @@ datasets:
22
  ---
23
 
24
  <div align="center">
25
- <img src="https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm_logo.png" width="500em" />
26
  </div>
27
 
28
  <p align="center">
29
- <a href="https://arxiv.org/pdf/2506.07900" target="_blank">MiniCPM 论文</a> |
30
- <a href="https://github.com/OpenBMB/MiniCPM/tree/minicpm5" target="_blank">GitHub 仓库</a> |
31
- <a href="https://huggingface.co/openbmb/MiniCPM5-1B/blob/main/README.md" target="_blank">English</a> |
32
  <a href="https://ultradata.openbmb.cn/" target="_blank">UltraData</a> |
33
- <a href="https://github.com/OpenBMB/MiniCPM-Desk-Pet" target="_blank">MiniCPM 桌宠</a>
 
34
  </p>
35
 
 
 
 
 
36
 
37
  ## 亮点
38
 
@@ -40,17 +44,13 @@ datasets:
40
 
41
  🏆 **同尺寸开源模型 SOTA**:与同尺寸优秀开源模型相比,MiniCPM5-1B 在该对比范围内达到 SOTA 水平,优势主要体现在 Agentic 工具调用、代码生成和高难推理。
42
 
43
- ![MiniCPM5-1B 各领域能力对比](https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm5/public_leaderboard_radar_cn.png)
44
 
45
  🧠 **双模式推理**:内置 `<think>` chat template,可通过 `enable_thinking` 在思考模式和非思考模式之间切换。同一份权重既可以作为快速助手,也可以承担更复杂的推理任务。
46
 
47
  🛠️ **部署 / 微调资源**:MiniCPM GitHub 仓库提供面向主要推理后端和微调框架的单页 cookbook,并配套 Agent Skills,方便复现部署和微调流程。
48
 
49
- 🐱 **桌宠**:我们也提供了由 MiniCPM5-1B 本地驱动的桌宠应用。点击下方封面可打开演示视频。
50
-
51
- <a href="https://youtu.be/UXtUccouXGY"><img src="https://img.youtube.com/vi/UXtUccouXGY/0.jpg" alt="MiniCPM Desk Pet video demo" width="720"></a>
52
-
53
- **项目仓库**: [OpenBMB/MiniCPM-Desk-Pet](https://github.com/OpenBMB/MiniCPM-Desk-Pet)
54
 
55
  ## 模型列表
56
 
@@ -68,8 +68,8 @@ MiniCPM5-1B 具有以下特性:
68
 
69
  - **类型**:Causal Language Model
70
  - **架构**:标准 `LlamaForCausalLM`
71
- - **Number of Parameters**: 1,080,632,832
72
- - **Number of Non-Embedding Parameters**: 679,552,512
73
  - **层数**:24
74
  - **注意力头(GQA)**:16 个 Q heads / 2 个 KV heads
75
  - **上下文长度**:131,072
@@ -82,7 +82,7 @@ MiniCPM5-1B 是 MiniCPM5 系列的首个模型,面向本地助手、coding age
82
 
83
  我们选取 **LFM2.5-1.2B-Thinking**、**Qwen3-0.6B/think**、**Qwen3.5-0.8B/think** 等同尺寸优秀开源模型进行横向比较。这些模型本身已经很强;在这组对比中,MiniCPM5-1B 达到同尺寸开源模型 SOTA 水平,优势主要体现在工具调用、代码生成和高难推理上,也更适合承担本地 coding agent、工具助手和推理助手的角色。
84
 
85
- ![MiniCPM-5 1B 基准评测成绩](https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm5/public_leaderboard_cn.png)
86
 
87
  ## 训练流程
88
 
@@ -92,7 +92,7 @@ MiniCPM5-1B 的训练过程是 **[UltraData 分级数据管理体系](https://ar
92
 
93
  **后训练阶段**分为 **SFT**、**RL** 与 **OPD** 三步。我们先使用 **200B tokens deep-thinking SFT** 与 **200B tokens hybrid-thinking SFT** 建立深度思考、混合思考和通用对话能力,相关 SFT 数据已同步开源为 [UltraData-SFT-2605](https://huggingface.co/datasets/openbmb/UltraData-SFT-2605)。随后针对数学、代码、闭卷问答和写作等方向训练专用 **RL teacher**,并通过 **On-Policy Distillation (OPD)** 将这些 teacher 的能力蒸馏回同一个发布模型。
94
 
95
- ![MiniCPM5-1B 训练流程](https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm5/training_recipe.png)
96
 
97
  ### RL + OPD 带来了什么?
98
 
@@ -100,13 +100,13 @@ MiniCPM5-1B 的训练过程是 **[UltraData 分级数据管理体系](https://ar
100
 
101
  **RL** 阶段组合了推理、闭卷问答、写作、指令跟随、长上下文理解和通用对话等多类互补训练信号。Reasoning RL 基于 [DAPO-Math-17k](https://huggingface.co/datasets/BytedTsinghua-SIA/DAPO-Math-17k) 采用两阶段长度调度,逐步降低超长率并提升推理准确率。我们还使用 [TriviaQA](https://huggingface.co/datasets/mandarjoshi/trivia_qa)、[NQ-Open](https://huggingface.co/datasets/google-research-datasets/nq_open)、[LongWriter-Zero-RLData](https://huggingface.co/datasets/THU-KEG/LongWriter-Zero-RLData)、合成可验证 RLVR 数据与 pair-wise RLHF 信号,提升可靠性、指令跟随和用户体验。
102
 
103
- ![MiniCPM5-1B RL 两阶段流程](https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm5/rl_two_stage_overview.png)
104
 
105
  **OPD** 阶段参考 Thinking Machines Lab 的 [On-Policy Distillation](https://thinkingmachines.ai/blog/on-policy-distillation/) 思路,并结合 [Rethinking On-Policy Distillation](https://arxiv.org/pdf/2604.13016) 做了实现改进。我们在强化学习框架中使用反向 KL 散度作为优势估计值,替代原有的 verification-based advantage;同时在 response 序列的每个位置分别对学生模型和教师模型 logits 做双边 top-k 采样,取并集后计算反向 KL 散度,以平衡监督信号准确性和训练效率。OPD 直接复用各 RL teacher 训练时的同分布 prompt 作为蒸馏数据,无需额外构造语料。
106
 
107
- ![MiniCPM5-1B RL + OPD 增益](https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm5/rl_gains.png)
108
 
109
- ![MiniCPM5-1B RL + OPD 超长率下降](https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm5/rl_overlong.png)
110
 
111
  ## 快速上手
112
 
@@ -185,7 +185,7 @@ print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
185
 
186
  ## 工具调用
187
 
188
- 工具调用 / function calling **推荐使用 SGLang**。MiniCPM5-1B 以 XML 格式产出工具调用,SGLang 内置的 `minicpm5` parser 会自动将其转换为 OpenAI 兼容的 `tool_calls` 字段。
189
 
190
  ```bash
191
  python -m sglang.launch_server --model-path openbmb/MiniCPM5-1B --port 30000 \
@@ -200,28 +200,31 @@ MiniCPM5-1B 使用**标准 `LlamaForCausalLM` 架构**,主流推理引擎可
200
 
201
  | 后端 | 模型格式 / 适用场景 | Cookbook | Agent Skill |
202
  | --- | --- | --- | --- |
203
- | Transformers | BF16 / FP16,本地 Python 推理,GPU + CPU | [transformers.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/transformers.md) | [minicpm5-deploy-transformers](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-transformers/SKILL.md) |
204
- | vLLM | BF16 / FP16 OpenAI server | [vllm.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/vllm.md) | [minicpm5-deploy-vllm](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-vllm/SKILL.md) |
205
- | SGLang | BF16 / FP16 OpenAI server,推荐用于 tool calling | [sglang.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/sglang.md) | [minicpm5-deploy-sglang](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-sglang/SKILL.md) |
206
- | llama.cpp | GGUF,CPU/GPU 本地推理 | [llama_cpp.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/llama_cpp.md) | [minicpm5-deploy-llama-cpp](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-llama-cpp/SKILL.md) |
207
- | Ollama | GGUF,本地端侧运行 | [ollama.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/ollama.md) | [minicpm5-deploy-ollama](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-ollama/SKILL.md) |
208
- | LM Studio | GGUF,Mac 桌面应用与 OpenAI server | [lmstudio.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/lmstudio.md) | [minicpm5-deploy-lmstudio](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-lmstudio/SKILL.md) |
209
- | MLX | MLX / 4bit,Apple Silicon 本地推理 | [mlx.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/mlx.md) | [minicpm5-deploy-mlx](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-mlx/SKILL.md) |
 
210
 
211
  ### 微调
212
 
213
  | 框架 | 适用场景 | Cookbook | Agent Skill |
214
  | --- | --- | --- | --- |
215
- | TRL + PEFT | LoRA / SFT 微调 | [trl.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/finetune/trl.md) | [minicpm5-finetune-trl](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-finetune-trl/SKILL.md) |
216
- | LLaMA-Factory | 微调 | [llamafactory.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/finetune/llamafactory.md) | [minicpm5-finetune-llamafactory](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-finetune-llamafactory/SKILL.md) |
217
- | ms-swift | 微调 | [ms_swift.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/finetune/ms_swift.md) | [minicpm5-finetune-ms-swift](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-finetune-ms-swift/SKILL.md) |
218
- | unsloth | 微调 | [unsloth.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/finetune/unsloth.md) | [minicpm5-finetune-unsloth](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-finetune-unsloth/SKILL.md) |
219
- | xtuner | 微调 | [xtuner.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/finetune/xtuner.md) | [minicpm5-finetune-xtuner](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-finetune-xtuner/SKILL.md) |
220
 
221
  ## 桌宠
222
 
223
  我们也发布了 **[OpenBMB/MiniCPM-Desk-Pet](https://github.com/OpenBMB/MiniCPM-Desk-Pet)**,一个由 MiniCPM5-1B 本地驱动的桌宠应用。它支持 Apple Silicon / NVIDIA GPU / CPU 路线,可以与 Cursor、Claude Code、Codex 等 coding agent 联动,并支持 LoRA 人格切换。
224
 
 
 
225
  ## 局限性与负责任使用
226
 
227
  MiniCPM5-1B 是一个基于训练数据统计规律生成文本的语言模型,可能生成不准确、有偏见或不安全的内容。在高风险场景中使用前,应对模型输出进行审查和验证。
 
22
  ---
23
 
24
  <div align="center">
25
+ <img src="https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm_logo.png" width="500em" />
26
  </div>
27
 
28
  <p align="center">
29
+ <a href="https://arxiv.org/pdf/2506.07900" target="_blank">MiniCPM 技术报告</a> |
30
+ <a href="https://github.com/OpenBMB/MiniCPM" target="_blank">GitHub 仓库</a> |
 
31
  <a href="https://ultradata.openbmb.cn/" target="_blank">UltraData</a> |
32
+ <a href="https://github.com/OpenBMB/MiniCPM-Desk-Pet" target="_blank">MiniCPM 桌宠</a> |
33
+ <a href="https://huggingface.co/spaces/openbmb/MiniCPM5-1B-Demo" target="_blank">在线 Demo</a>
34
  </p>
35
 
36
+ <p align="center">
37
+ <a href="https://huggingface.co/openbmb/MiniCPM5-1B/blob/main/README.md" target="_blank">English</a> |
38
+ 中文
39
+ </p>
40
 
41
  ## 亮点
42
 
 
44
 
45
  🏆 **同尺寸开源模型 SOTA**:与同尺寸优秀开源模型相比,MiniCPM5-1B 在该对比范围内达到 SOTA 水平,优势主要体现在 Agentic 工具调用、代码生成和高难推理。
46
 
47
+ ![MiniCPM5-1B 各领域能力对比](https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm5/public_leaderboard_radar_cn.png)
48
 
49
  🧠 **双模式推理**:内置 `<think>` chat template,可通过 `enable_thinking` 在思考模式和非思考模式之间切换。同一份权重既可以作为快速助手,也可以承担更复杂的推理任务。
50
 
51
  🛠️ **部署 / 微调资源**:MiniCPM GitHub 仓库提供面向主要推理后端和微调框架的单页 cookbook,并配套 Agent Skills,方便复现部署和微调流程。
52
 
53
+ 🐱 **桌宠**:我们也提供了由 MiniCPM5-1B 本地驱动的桌宠应用。
 
 
 
 
54
 
55
  ## 模型列表
56
 
 
68
 
69
  - **类型**:Causal Language Model
70
  - **架构**:标准 `LlamaForCausalLM`
71
+ - **参数数量**1,080,632,832
72
+ - **非嵌入参数数量**679,552,512
73
  - **层数**:24
74
  - **注意力头(GQA)**:16 个 Q heads / 2 个 KV heads
75
  - **上下文长度**:131,072
 
82
 
83
  我们选取 **LFM2.5-1.2B-Thinking**、**Qwen3-0.6B/think**、**Qwen3.5-0.8B/think** 等同尺寸优秀开源模型进行横向比较。这些模型本身已经很强;在这组对比中,MiniCPM5-1B 达到同尺寸开源模型 SOTA 水平,优势主要体现在工具调用、代码生成和高难推理上,也更适合承担本地 coding agent、工具助手和推理助手的角色。
84
 
85
+ ![MiniCPM-5 1B 基准评测成绩](https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm5/public_leaderboard_cn.png)
86
 
87
  ## 训练流程
88
 
 
92
 
93
  **后训练阶段**分为 **SFT**、**RL** 与 **OPD** 三步。我们先使用 **200B tokens deep-thinking SFT** 与 **200B tokens hybrid-thinking SFT** 建立深度思考、混合思考和通用对话能力,相关 SFT 数据已同步开源为 [UltraData-SFT-2605](https://huggingface.co/datasets/openbmb/UltraData-SFT-2605)。随后针对数学、代码、闭卷问答和写作等方向训练专用 **RL teacher**,并通过 **On-Policy Distillation (OPD)** 将这些 teacher 的能力蒸馏回同一个发布模型。
94
 
95
+ ![MiniCPM5-1B 训练流程](https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm5/training_recipe.png)
96
 
97
  ### RL + OPD 带来了什么?
98
 
 
100
 
101
  **RL** 阶段组合了推理、闭卷问答、写作、指令跟随、长上下文理解和通用对话等多类互补训练信号。Reasoning RL 基于 [DAPO-Math-17k](https://huggingface.co/datasets/BytedTsinghua-SIA/DAPO-Math-17k) 采用两阶段长度调度,逐步降低超长率并提升推理准确率。我们还使用 [TriviaQA](https://huggingface.co/datasets/mandarjoshi/trivia_qa)、[NQ-Open](https://huggingface.co/datasets/google-research-datasets/nq_open)、[LongWriter-Zero-RLData](https://huggingface.co/datasets/THU-KEG/LongWriter-Zero-RLData)、合成可验证 RLVR 数据与 pair-wise RLHF 信号,提升可靠性、指令跟随和用户体验。
102
 
103
+ ![MiniCPM5-1B RL 两阶段流程](https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm5/rl_two_stage_overview.png)
104
 
105
  **OPD** 阶段参考 Thinking Machines Lab 的 [On-Policy Distillation](https://thinkingmachines.ai/blog/on-policy-distillation/) 思路,并结合 [Rethinking On-Policy Distillation](https://arxiv.org/pdf/2604.13016) 做了实现改进。我们在强化学习框架中使用反向 KL 散度作为优势估计值,替代原有的 verification-based advantage;同时在 response 序列的每个位置分别对学生模型和教师模型 logits 做双边 top-k 采样,取并集后计算反向 KL 散度,以平衡监督信号准确性和训练效率。OPD 直接复用各 RL teacher 训练时的同分布 prompt 作为蒸馏数据,无需额外构造语料。
106
 
107
+ ![MiniCPM5-1B RL + OPD 增益](https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm5/rl_gains.png)
108
 
109
+ ![MiniCPM5-1B RL + OPD 超长率下降](https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm5/rl_overlong.png)
110
 
111
  ## 快速上手
112
 
 
185
 
186
  ## 工具调用
187
 
188
+ 工具调用**推荐使用 SGLang**。MiniCPM5-1B 以 XML 格式产出工具调用,SGLang 内置的 `minicpm5` parser 会自动将其转换为 OpenAI 兼容的 `tool_calls` 字段。
189
 
190
  ```bash
191
  python -m sglang.launch_server --model-path openbmb/MiniCPM5-1B --port 30000 \
 
200
 
201
  | 后端 | 模型格式 / 适用场景 | Cookbook | Agent Skill |
202
  | --- | --- | --- | --- |
203
+ | Transformers | BF16 / FP16,本地 Python 推理,GPU + CPU | [transformers.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/transformers.md) | [minicpm5-deploy-transformers](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-transformers/SKILL.md) |
204
+ | vLLM | BF16 / FP16 OpenAI server | [vllm.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/vllm.md) | [minicpm5-deploy-vllm](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-vllm/SKILL.md) |
205
+ | SGLang | BF16 / FP16 OpenAI server,推荐用于 tool calling | [sglang.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/sglang.md) | [minicpm5-deploy-sglang](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-sglang/SKILL.md) |
206
+ | llama.cpp | GGUF,CPU/GPU 本地推理 | [llama_cpp.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/llama_cpp.md) | [minicpm5-deploy-llama-cpp](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-llama-cpp/SKILL.md) |
207
+ | Ollama | GGUF,本地端侧运行 | [ollama.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/ollama.md) | [minicpm5-deploy-ollama](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-ollama/SKILL.md) |
208
+ | LM Studio | GGUF,Mac 桌面应用与 OpenAI server | [lmstudio.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/lmstudio.md) | [minicpm5-deploy-lmstudio](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-lmstudio/SKILL.md) |
209
+ | MLX | MLX / 4bit,Apple Silicon 本地推理 | [mlx.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/mlx.md) | [minicpm5-deploy-mlx](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-mlx/SKILL.md) |
210
+ | ArcLight | GGUF 本地端侧 / CPU / 桌面 / 服务器 | [arclight.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/arclight.md) | [minicpm5-deploy-arclight](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-arclight/SKILL.md) |
211
 
212
  ### 微调
213
 
214
  | 框架 | 适用场景 | Cookbook | Agent Skill |
215
  | --- | --- | --- | --- |
216
+ | TRL + PEFT | LoRA / SFT 微调 | [trl.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/trl.md) | [minicpm5-finetune-trl](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-trl/SKILL.md) |
217
+ | LLaMA-Factory | 微调 | [llamafactory.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/llamafactory.md) | [minicpm5-finetune-llamafactory](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-llamafactory/SKILL.md) |
218
+ | ms-swift | 微调 | [ms_swift.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/ms_swift.md) | [minicpm5-finetune-ms-swift](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-ms-swift/SKILL.md) |
219
+ | unsloth | 微调 | [unsloth.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/unsloth.md) | [minicpm5-finetune-unsloth](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-unsloth/SKILL.md) |
220
+ | xtuner | 微调 | [xtuner.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/xtuner.md) | [minicpm5-finetune-xtuner](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-xtuner/SKILL.md) |
221
 
222
  ## 桌宠
223
 
224
  我们也发布了 **[OpenBMB/MiniCPM-Desk-Pet](https://github.com/OpenBMB/MiniCPM-Desk-Pet)**,一个由 MiniCPM5-1B 本地驱动的桌宠应用。它支持 Apple Silicon / NVIDIA GPU / CPU 路线,可以与 Cursor、Claude Code、Codex 等 coding agent 联动,并支持 LoRA 人格切换。
225
 
226
+ <a href="https://youtu.be/Ee0slMW8SEk"><img src="https://img.youtube.com/vi/UXtUccouXGY/0.jpg" alt="MiniCPM Desk Pet video demo" width="720"></a>
227
+
228
  ## 局限性与负责任使用
229
 
230
  MiniCPM5-1B 是一个基于训练数据统计规律生成文本的语言模型,可能生成不准确、有偏见或不安全的内容。在高风险场景中使用前,应对模型输出进行审查和验证。
README.md CHANGED
@@ -22,17 +22,21 @@ datasets:
22
  ---
23
 
24
  <div align="center">
25
- <img src="https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm_logo.png" width="500em" />
26
  </div>
27
 
28
  <p align="center">
29
- <a href="https://arxiv.org/pdf/2506.07900" target="_blank">MiniCPM Paper</a> |
30
- <a href="https://github.com/OpenBMB/MiniCPM/tree/minicpm5" target="_blank">GitHub Repo</a> |
31
- <a href="https://huggingface.co/openbmb/MiniCPM5-1B/blob/main/README-cn.md" target="_blank">中文</a> |
32
  <a href="https://ultradata.openbmb.cn/" target="_blank">UltraData</a> |
33
- <a href="https://github.com/OpenBMB/MiniCPM-Desk-Pet" target="_blank">MiniCPM Desk Pet</a>
 
34
  </p>
35
 
 
 
 
 
36
 
37
  ## Highlights
38
 
@@ -40,17 +44,13 @@ We are releasing **MiniCPM5-1B**, the first model in the **MiniCPM5** series. It
40
 
41
  🏆 **1B-class open-source SOTA**: compared with strong open-source models in the same size class, MiniCPM5-1B reaches SOTA within this comparison set. Its advantage is most visible in agentic tool use, code generation, and difficult reasoning.
42
 
43
- ![MiniCPM5-1B capability comparison by domain](https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm5/public_leaderboard_radar_en.png)
44
 
45
  🧠 **Dual Mode Reasoning**: built-in `<think>` chat template, switch via `enable_thinking`. The same checkpoint serves as both a fast assistant and a deliberate reasoner.
46
 
47
  🛠️ **Deployment / Fine-tuning Resources**: the MiniCPM GitHub repo provides single-page cookbooks and Agent Skills for major inference backends and fine-tuning frameworks.
48
 
49
- 🐱 **Desktop Pet**: a local-LLM desktop pet driven by MiniCPM5-1B. Click the cover below to open the demo video.
50
-
51
- <a href="https://youtu.be/UXtUccouXGY"><img src="https://img.youtube.com/vi/UXtUccouXGY/0.jpg" alt="MiniCPM Desk Pet video demo" width="720"></a>
52
-
53
- **Project repo**: [OpenBMB/MiniCPM-Desk-Pet](https://github.com/OpenBMB/MiniCPM-Desk-Pet)
54
 
55
  ## Model List
56
 
@@ -82,7 +82,7 @@ MiniCPM5-1B is the first checkpoint in the MiniCPM5 series. It is designed for l
82
 
83
  We compare MiniCPM5-1B with strong open-source models in the same size class, including **LFM2.5-1.2B-Thinking**, **Qwen3-0.6B/think** and **Qwen3.5-0.8B/think**. These are capable baselines; within this comparison set, MiniCPM5-1B reaches 1B-class open-source SOTA, with its advantage most visible in tool use, code generation, and difficult reasoning. This makes it a practical choice for local coding agents, tool assistants, and reasoning assistants.
84
 
85
- ![MiniCPM-5 1B Public Leaderboard](https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm5/public_leaderboard_en.png)
86
 
87
  ## Training Recipe
88
 
@@ -92,7 +92,7 @@ During **base training**, the model goes through stable training and decay train
92
 
93
  During **post-training**, we proceed in three steps: **SFT**, **RL**, and **OPD**. We first use **200B tokens of deep-thinking SFT** and **200B tokens of hybrid-thinking SFT** to establish deep-thinking, hybrid-thinking, and general chat abilities; the SFT data is released as [UltraData-SFT-2605](https://huggingface.co/datasets/openbmb/UltraData-SFT-2605). We then train specialized **RL teachers** for math, code, closed-book QA, writing, and related domains, and use **On-Policy Distillation (OPD)** to distill these teachers back into one release model.
94
 
95
- ![MiniCPM5-1B Training Recipe](https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm5/training_recipe.png)
96
 
97
  ### What does RL + OPD bring?
98
 
@@ -100,13 +100,13 @@ During **post-training**, we proceed in three steps: **SFT**, **RL**, and **OPD*
100
 
101
  **RL** combines complementary training signals for reasoning, closed-book QA, writing, instruction following, long-context understanding, and general dialogue. Reasoning RL is based on [DAPO-Math-17k](https://huggingface.co/datasets/BytedTsinghua-SIA/DAPO-Math-17k) and uses a two-stage length schedule to reduce overlong responses while improving reasoning accuracy. We also use [TriviaQA](https://huggingface.co/datasets/mandarjoshi/trivia_qa), [NQ-Open](https://huggingface.co/datasets/google-research-datasets/nq_open), [LongWriter-Zero-RLData](https://huggingface.co/datasets/THU-KEG/LongWriter-Zero-RLData), synthesized verifiable RLVR data, and pair-wise RLHF signals to improve reliability, instruction following, and user experience.
102
 
103
- ![MiniCPM5-1B RL Two-stage Pipeline](https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm5/rl_two_stage_overview.png)
104
 
105
  **OPD** builds on Thinking Machines Lab's [On-Policy Distillation](https://thinkingmachines.ai/blog/on-policy-distillation/) and incorporates implementation improvements from [Rethinking On-Policy Distillation](https://arxiv.org/pdf/2604.13016). In the RL framework, we use reverse KL divergence as the advantage estimate, replacing the original verification-based advantage. At each response position, we take top-k logits from both the student and teacher models, compute reverse KL on the union of the two token sets, and balance the accuracy of the RKL signal with training efficiency. OPD reuses the in-domain prompts used to train each RL teacher as distillation data, so no additional data curation is required.
106
 
107
- ![MiniCPM5-1B RL + OPD Gains](https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm5/rl_gains.png)
108
 
109
- ![MiniCPM5-1B RL + OPD Overlong Response Rate Drop](https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm5/rl_overlong.png)
110
 
111
  ## Quickstart
112
 
@@ -200,28 +200,31 @@ MiniCPM5-1B uses the **standard `LlamaForCausalLM` architecture**, so mainstream
200
 
201
  | Backend | Model format / use case | Cookbook | Agent Skill |
202
  | --- | --- | --- | --- |
203
- | Transformers | BF16 / FP16 local Python inference, GPU + CPU | [transformers.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/transformers.md) | [minicpm5-deploy-transformers](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-transformers/SKILL.md) |
204
- | vLLM | BF16 / FP16 OpenAI server | [vllm.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/vllm.md) | [minicpm5-deploy-vllm](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-vllm/SKILL.md) |
205
- | SGLang | BF16 / FP16 OpenAI server, recommended for tool calling | [sglang.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/sglang.md) | [minicpm5-deploy-sglang](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-sglang/SKILL.md) |
206
- | llama.cpp | GGUF local inference, CPU/GPU | [llama_cpp.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/llama_cpp.md) | [minicpm5-deploy-llama-cpp](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-llama-cpp/SKILL.md) |
207
- | Ollama | GGUF local on-device runtime | [ollama.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/ollama.md) | [minicpm5-deploy-ollama](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-ollama/SKILL.md) |
208
- | LM Studio | GGUF Mac desktop app and OpenAI server | [lmstudio.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/lmstudio.md) | [minicpm5-deploy-lmstudio](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-lmstudio/SKILL.md) |
209
- | MLX | MLX / 4bit local inference on Apple Silicon | [mlx.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/mlx.md) | [minicpm5-deploy-mlx](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-mlx/SKILL.md) |
 
210
 
211
  ### Fine-tuning
212
 
213
  | Framework | Use case | Cookbook | Agent Skill |
214
  | --- | --- | --- | --- |
215
- | TRL + PEFT | LoRA / SFT fine-tuning | [trl.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/finetune/trl.md) | [minicpm5-finetune-trl](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-finetune-trl/SKILL.md) |
216
- | LLaMA-Factory | Fine-tuning | [llamafactory.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/finetune/llamafactory.md) | [minicpm5-finetune-llamafactory](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-finetune-llamafactory/SKILL.md) |
217
- | ms-swift | Fine-tuning | [ms_swift.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/finetune/ms_swift.md) | [minicpm5-finetune-ms-swift](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-finetune-ms-swift/SKILL.md) |
218
- | unsloth | Fine-tuning | [unsloth.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/finetune/unsloth.md) | [minicpm5-finetune-unsloth](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-finetune-unsloth/SKILL.md) |
219
- | xtuner | Fine-tuning | [xtuner.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/finetune/xtuner.md) | [minicpm5-finetune-xtuner](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-finetune-xtuner/SKILL.md) |
220
 
221
  ## Desktop Pet
222
 
223
  We also ship **[OpenBMB/MiniCPM-Desk-Pet](https://github.com/OpenBMB/MiniCPM-Desk-Pet)**, a desktop pet driven locally by MiniCPM5-1B. It supports Apple Silicon / NVIDIA GPU / CPU paths, can work with coding agents such as Cursor, Claude Code, and Codex, and supports LoRA persona switching.
224
 
 
 
225
  ## Limitations and Responsible Use
226
 
227
  MiniCPM5-1B is a language model that generates content based on learned statistical patterns from training data. It may produce inaccurate, biased, or unsafe outputs, and generated content should be reviewed and verified before use in high-stakes settings.
 
22
  ---
23
 
24
  <div align="center">
25
+ <img src="https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm_logo.png" width="500em" />
26
  </div>
27
 
28
  <p align="center">
29
+ <a href="https://arxiv.org/pdf/2506.07900" target="_blank">MiniCPM Tech Report</a> |
30
+ <a href="https://github.com/OpenBMB/MiniCPM" target="_blank">GitHub Repo</a> |
 
31
  <a href="https://ultradata.openbmb.cn/" target="_blank">UltraData</a> |
32
+ <a href="https://github.com/OpenBMB/MiniCPM-Desk-Pet" target="_blank">MiniCPM Desk Pet</a> |
33
+ <a href="https://huggingface.co/spaces/openbmb/MiniCPM5-1B-Demo" target="_blank">Online Demo</a>
34
  </p>
35
 
36
+ <p align="center">
37
+ English |
38
+ <a href="https://huggingface.co/openbmb/MiniCPM5-1B/blob/main/README-cn.md" target="_blank">中文</a>
39
+ </p>
40
 
41
  ## Highlights
42
 
 
44
 
45
  🏆 **1B-class open-source SOTA**: compared with strong open-source models in the same size class, MiniCPM5-1B reaches SOTA within this comparison set. Its advantage is most visible in agentic tool use, code generation, and difficult reasoning.
46
 
47
+ ![MiniCPM5-1B capability comparison by domain](https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm5/public_leaderboard_radar_en.png)
48
 
49
  🧠 **Dual Mode Reasoning**: built-in `<think>` chat template, switch via `enable_thinking`. The same checkpoint serves as both a fast assistant and a deliberate reasoner.
50
 
51
  🛠️ **Deployment / Fine-tuning Resources**: the MiniCPM GitHub repo provides single-page cookbooks and Agent Skills for major inference backends and fine-tuning frameworks.
52
 
53
+ 🐱 **Desktop Pet**: a local-LLM desktop pet driven by MiniCPM5-1B.
 
 
 
 
54
 
55
  ## Model List
56
 
 
82
 
83
  We compare MiniCPM5-1B with strong open-source models in the same size class, including **LFM2.5-1.2B-Thinking**, **Qwen3-0.6B/think** and **Qwen3.5-0.8B/think**. These are capable baselines; within this comparison set, MiniCPM5-1B reaches 1B-class open-source SOTA, with its advantage most visible in tool use, code generation, and difficult reasoning. This makes it a practical choice for local coding agents, tool assistants, and reasoning assistants.
84
 
85
+ ![MiniCPM-5 1B Public Leaderboard](https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm5/public_leaderboard_en.png)
86
 
87
  ## Training Recipe
88
 
 
92
 
93
  During **post-training**, we proceed in three steps: **SFT**, **RL**, and **OPD**. We first use **200B tokens of deep-thinking SFT** and **200B tokens of hybrid-thinking SFT** to establish deep-thinking, hybrid-thinking, and general chat abilities; the SFT data is released as [UltraData-SFT-2605](https://huggingface.co/datasets/openbmb/UltraData-SFT-2605). We then train specialized **RL teachers** for math, code, closed-book QA, writing, and related domains, and use **On-Policy Distillation (OPD)** to distill these teachers back into one release model.
94
 
95
+ ![MiniCPM5-1B Training Recipe](https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm5/training_recipe.png)
96
 
97
  ### What does RL + OPD bring?
98
 
 
100
 
101
  **RL** combines complementary training signals for reasoning, closed-book QA, writing, instruction following, long-context understanding, and general dialogue. Reasoning RL is based on [DAPO-Math-17k](https://huggingface.co/datasets/BytedTsinghua-SIA/DAPO-Math-17k) and uses a two-stage length schedule to reduce overlong responses while improving reasoning accuracy. We also use [TriviaQA](https://huggingface.co/datasets/mandarjoshi/trivia_qa), [NQ-Open](https://huggingface.co/datasets/google-research-datasets/nq_open), [LongWriter-Zero-RLData](https://huggingface.co/datasets/THU-KEG/LongWriter-Zero-RLData), synthesized verifiable RLVR data, and pair-wise RLHF signals to improve reliability, instruction following, and user experience.
102
 
103
+ ![MiniCPM5-1B RL Two-stage Pipeline](https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm5/rl_two_stage_overview.png)
104
 
105
  **OPD** builds on Thinking Machines Lab's [On-Policy Distillation](https://thinkingmachines.ai/blog/on-policy-distillation/) and incorporates implementation improvements from [Rethinking On-Policy Distillation](https://arxiv.org/pdf/2604.13016). In the RL framework, we use reverse KL divergence as the advantage estimate, replacing the original verification-based advantage. At each response position, we take top-k logits from both the student and teacher models, compute reverse KL on the union of the two token sets, and balance the accuracy of the RKL signal with training efficiency. OPD reuses the in-domain prompts used to train each RL teacher as distillation data, so no additional data curation is required.
106
 
107
+ ![MiniCPM5-1B RL + OPD Gains](https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm5/rl_gains.png)
108
 
109
+ ![MiniCPM5-1B RL + OPD Overlong Response Rate Drop](https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm5/rl_overlong.png)
110
 
111
  ## Quickstart
112
 
 
200
 
201
  | Backend | Model format / use case | Cookbook | Agent Skill |
202
  | --- | --- | --- | --- |
203
+ | Transformers | BF16 / FP16 local Python inference, GPU + CPU | [transformers.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/transformers.md) | [minicpm5-deploy-transformers](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-transformers/SKILL.md) |
204
+ | vLLM | BF16 / FP16 OpenAI server | [vllm.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/vllm.md) | [minicpm5-deploy-vllm](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-vllm/SKILL.md) |
205
+ | SGLang | BF16 / FP16 OpenAI server, recommended for tool calling | [sglang.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/sglang.md) | [minicpm5-deploy-sglang](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-sglang/SKILL.md) |
206
+ | llama.cpp | GGUF local inference, CPU/GPU | [llama_cpp.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/llama_cpp.md) | [minicpm5-deploy-llama-cpp](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-llama-cpp/SKILL.md) |
207
+ | Ollama | GGUF local on-device runtime | [ollama.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/ollama.md) | [minicpm5-deploy-ollama](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-ollama/SKILL.md) |
208
+ | LM Studio | GGUF Mac desktop app and OpenAI server | [lmstudio.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/lmstudio.md) | [minicpm5-deploy-lmstudio](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-lmstudio/SKILL.md) |
209
+ | MLX | MLX / 4bit local inference on Apple Silicon | [mlx.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/mlx.md) | [minicpm5-deploy-mlx](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-mlx/SKILL.md) |
210
+ | ArcLight | GGUF local on-device, CPU, Desktop & Server | [arclight.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/arclight.md) | [minicpm5-deploy-arclight](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-arclight/SKILL.md) |
211
 
212
  ### Fine-tuning
213
 
214
  | Framework | Use case | Cookbook | Agent Skill |
215
  | --- | --- | --- | --- |
216
+ | TRL + PEFT | LoRA / SFT fine-tuning | [trl.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/trl.md) | [minicpm5-finetune-trl](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-trl/SKILL.md) |
217
+ | LLaMA-Factory | Fine-tuning | [llamafactory.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/llamafactory.md) | [minicpm5-finetune-llamafactory](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-llamafactory/SKILL.md) |
218
+ | ms-swift | Fine-tuning | [ms_swift.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/ms_swift.md) | [minicpm5-finetune-ms-swift](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-ms-swift/SKILL.md) |
219
+ | unsloth | Fine-tuning | [unsloth.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/unsloth.md) | [minicpm5-finetune-unsloth](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-unsloth/SKILL.md) |
220
+ | xtuner | Fine-tuning | [xtuner.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/xtuner.md) | [minicpm5-finetune-xtuner](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-xtuner/SKILL.md) |
221
 
222
  ## Desktop Pet
223
 
224
  We also ship **[OpenBMB/MiniCPM-Desk-Pet](https://github.com/OpenBMB/MiniCPM-Desk-Pet)**, a desktop pet driven locally by MiniCPM5-1B. It supports Apple Silicon / NVIDIA GPU / CPU paths, can work with coding agents such as Cursor, Claude Code, and Codex, and supports LoRA persona switching.
225
 
226
+ <a href="https://youtu.be/UXtUccouXGY"><img src="https://img.youtube.com/vi/UXtUccouXGY/0.jpg" alt="MiniCPM Desk Pet video demo" width="720"></a>
227
+
228
  ## Limitations and Responsible Use
229
 
230
  MiniCPM5-1B is a language model that generates content based on learned statistical patterns from training data. It may produce inaccurate, biased, or unsafe outputs, and generated content should be reviewed and verified before use in high-stakes settings.