Text Generation
Transformers
Safetensors
English
Chinese
llama
minicpm
minicpm5
long-context
tool-calling
on-device
edge-ai
conversational
text-generation-inference
Instructions to use openbmb/MiniCPM5-1B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use openbmb/MiniCPM5-1B with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="openbmb/MiniCPM5-1B") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM5-1B") model = AutoModelForCausalLM.from_pretrained("openbmb/MiniCPM5-1B") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use openbmb/MiniCPM5-1B with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "openbmb/MiniCPM5-1B" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "openbmb/MiniCPM5-1B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/openbmb/MiniCPM5-1B
- SGLang
How to use openbmb/MiniCPM5-1B with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "openbmb/MiniCPM5-1B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "openbmb/MiniCPM5-1B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "openbmb/MiniCPM5-1B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "openbmb/MiniCPM5-1B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use openbmb/MiniCPM5-1B with Docker Model Runner:
docker model run hf.co/openbmb/MiniCPM5-1B
update README
Browse files- README-cn.md +34 -31
- README.md +31 -28
README-cn.md
CHANGED
|
@@ -22,17 +22,21 @@ datasets:
|
|
| 22 |
---
|
| 23 |
|
| 24 |
<div align="center">
|
| 25 |
-
<img src="https://raw.githubusercontent.com/OpenBMB/MiniCPM/
|
| 26 |
</div>
|
| 27 |
|
| 28 |
<p align="center">
|
| 29 |
-
<a href="https://arxiv.org/pdf/2506.07900" target="_blank">MiniCPM
|
| 30 |
-
<a href="https://github.com/OpenBMB/MiniCPM
|
| 31 |
-
<a href="https://huggingface.co/openbmb/MiniCPM5-1B/blob/main/README.md" target="_blank">English</a> |
|
| 32 |
<a href="https://ultradata.openbmb.cn/" target="_blank">UltraData</a> |
|
| 33 |
-
<a href="https://github.com/OpenBMB/MiniCPM-Desk-Pet" target="_blank">MiniCPM 桌宠</a>
|
|
|
|
| 34 |
</p>
|
| 35 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 36 |
|
| 37 |
## 亮点
|
| 38 |
|
|
@@ -40,17 +44,13 @@ datasets:
|
|
| 40 |
|
| 41 |
🏆 **同尺寸开源模型 SOTA**:与同尺寸优秀开源模型相比,MiniCPM5-1B 在该对比范围内达到 SOTA 水平,优势主要体现在 Agentic 工具调用、代码生成和高难推理。
|
| 42 |
|
| 43 |
-

|
| 54 |
|
| 55 |
## 模型列表
|
| 56 |
|
|
@@ -68,8 +68,8 @@ MiniCPM5-1B 具有以下特性:
|
|
| 68 |
|
| 69 |
- **类型**:Causal Language Model
|
| 70 |
- **架构**:标准 `LlamaForCausalLM`
|
| 71 |
-
- **
|
| 72 |
-
- **
|
| 73 |
- **层数**:24
|
| 74 |
- **注意力头(GQA)**:16 个 Q heads / 2 个 KV heads
|
| 75 |
- **上下文长度**:131,072
|
|
@@ -82,7 +82,7 @@ MiniCPM5-1B 是 MiniCPM5 系列的首个模型,面向本地助手、coding age
|
|
| 82 |
|
| 83 |
我们选取 **LFM2.5-1.2B-Thinking**、**Qwen3-0.6B/think**、**Qwen3.5-0.8B/think** 等同尺寸优秀开源模型进行横向比较。这些模型本身已经很强;在这组对比中,MiniCPM5-1B 达到同尺寸开源模型 SOTA 水平,优势主要体现在工具调用、代码生成和高难推理上,也更适合承担本地 coding agent、工具助手和推理助手的角色。
|
| 84 |
|
| 85 |
-
。随后针对数学、代码、闭卷问答和写作等方向训练专用 **RL teacher**,并通过 **On-Policy Distillation (OPD)** 将这些 teacher 的能力蒸馏回同一个发布模型。
|
| 94 |
|
| 95 |
-
 采用两阶段长度调度,逐步降低超长率并提升推理准确率。我们还使用 [TriviaQA](https://huggingface.co/datasets/mandarjoshi/trivia_qa)、[NQ-Open](https://huggingface.co/datasets/google-research-datasets/nq_open)、[LongWriter-Zero-RLData](https://huggingface.co/datasets/THU-KEG/LongWriter-Zero-RLData)、合成可验证 RLVR 数据与 pair-wise RLHF 信号,提升可靠性、指令跟随和用户体验。
|
| 102 |
|
| 103 |
-
 思路,并结合 [Rethinking On-Policy Distillation](https://arxiv.org/pdf/2604.13016) 做了实现改进。我们在强化学习框架中使用反向 KL 散度作为优势估计值,替代原有的 verification-based advantage;同时在 response 序列的每个位置分别对学生模型和教师模型 logits 做双边 top-k 采样,取并集后计算反向 KL 散度,以平衡监督信号准确性和训练效率。OPD 直接复用各 RL teacher 训练时的同分布 prompt 作为蒸馏数据,无需额外构造语料。
|
| 106 |
|
| 107 |
-
)
|
|
| 185 |
|
| 186 |
## 工具调用
|
| 187 |
|
| 188 |
-
工具调用
|
| 189 |
|
| 190 |
```bash
|
| 191 |
python -m sglang.launch_server --model-path openbmb/MiniCPM5-1B --port 30000 \
|
|
@@ -200,28 +200,31 @@ MiniCPM5-1B 使用**标准 `LlamaForCausalLM` 架构**,主流推理引擎可
|
|
| 200 |
|
| 201 |
| 后端 | 模型格式 / 适用场景 | Cookbook | Agent Skill |
|
| 202 |
| --- | --- | --- | --- |
|
| 203 |
-
| Transformers | BF16 / FP16,本地 Python 推理,GPU + CPU | [transformers.md](https://github.com/OpenBMB/MiniCPM/blob/
|
| 204 |
-
| vLLM | BF16 / FP16 OpenAI server | [vllm.md](https://github.com/OpenBMB/MiniCPM/blob/
|
| 205 |
-
| SGLang | BF16 / FP16 OpenAI server,推荐用于 tool calling | [sglang.md](https://github.com/OpenBMB/MiniCPM/blob/
|
| 206 |
-
| llama.cpp | GGUF,CPU/GPU 本地推理 | [llama_cpp.md](https://github.com/OpenBMB/MiniCPM/blob/
|
| 207 |
-
| Ollama | GGUF,本地端侧运行 | [ollama.md](https://github.com/OpenBMB/MiniCPM/blob/
|
| 208 |
-
| LM Studio | GGUF,Mac 桌面应用与 OpenAI server | [lmstudio.md](https://github.com/OpenBMB/MiniCPM/blob/
|
| 209 |
-
| MLX | MLX / 4bit,Apple Silicon 本地推理 | [mlx.md](https://github.com/OpenBMB/MiniCPM/blob/
|
|
|
|
| 210 |
|
| 211 |
### 微调
|
| 212 |
|
| 213 |
| 框架 | 适用场景 | Cookbook | Agent Skill |
|
| 214 |
| --- | --- | --- | --- |
|
| 215 |
-
| TRL + PEFT | LoRA / SFT 微调 | [trl.md](https://github.com/OpenBMB/MiniCPM/blob/
|
| 216 |
-
| LLaMA-Factory | 微调 | [llamafactory.md](https://github.com/OpenBMB/MiniCPM/blob/
|
| 217 |
-
| ms-swift | 微调 | [ms_swift.md](https://github.com/OpenBMB/MiniCPM/blob/
|
| 218 |
-
| unsloth | 微调 | [unsloth.md](https://github.com/OpenBMB/MiniCPM/blob/
|
| 219 |
-
| xtuner | 微调 | [xtuner.md](https://github.com/OpenBMB/MiniCPM/blob/
|
| 220 |
|
| 221 |
## 桌宠
|
| 222 |
|
| 223 |
我们也发布了 **[OpenBMB/MiniCPM-Desk-Pet](https://github.com/OpenBMB/MiniCPM-Desk-Pet)**,一个由 MiniCPM5-1B 本地驱动的桌宠应用。它支持 Apple Silicon / NVIDIA GPU / CPU 路线,可以与 Cursor、Claude Code、Codex 等 coding agent 联动,并支持 LoRA 人格切换。
|
| 224 |
|
|
|
|
|
|
|
| 225 |
## 局限性与负责任使用
|
| 226 |
|
| 227 |
MiniCPM5-1B 是一个基于训练数据统计规律生成文本的语言模型,可能生成不准确、有偏见或不安全的内容。在高风险场景中使用前,应对模型输出进行审查和验证。
|
|
|
|
| 22 |
---
|
| 23 |
|
| 24 |
<div align="center">
|
| 25 |
+
<img src="https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm_logo.png" width="500em" />
|
| 26 |
</div>
|
| 27 |
|
| 28 |
<p align="center">
|
| 29 |
+
<a href="https://arxiv.org/pdf/2506.07900" target="_blank">MiniCPM 技术报告</a> |
|
| 30 |
+
<a href="https://github.com/OpenBMB/MiniCPM" target="_blank">GitHub 仓库</a> |
|
|
|
|
| 31 |
<a href="https://ultradata.openbmb.cn/" target="_blank">UltraData</a> |
|
| 32 |
+
<a href="https://github.com/OpenBMB/MiniCPM-Desk-Pet" target="_blank">MiniCPM 桌宠</a> |
|
| 33 |
+
<a href="https://huggingface.co/spaces/openbmb/MiniCPM5-1B-Demo" target="_blank">在线 Demo</a>
|
| 34 |
</p>
|
| 35 |
|
| 36 |
+
<p align="center">
|
| 37 |
+
<a href="https://huggingface.co/openbmb/MiniCPM5-1B/blob/main/README.md" target="_blank">English</a> |
|
| 38 |
+
中文
|
| 39 |
+
</p>
|
| 40 |
|
| 41 |
## 亮点
|
| 42 |
|
|
|
|
| 44 |
|
| 45 |
🏆 **同尺寸开源模型 SOTA**:与同尺寸优秀开源模型相比,MiniCPM5-1B 在该对比范围内达到 SOTA 水平,优势主要体现在 Agentic 工具调用、代码生成和高难推理。
|
| 46 |
|
| 47 |
+

|
| 48 |
|
| 49 |
🧠 **双模式推理**:内置 `<think>` chat template,可通过 `enable_thinking` 在思考模式和非思考模式之间切换。同一份权重既可以作为快速助手,也可以承担更复杂的推理任务。
|
| 50 |
|
| 51 |
🛠️ **部署 / 微调资源**:MiniCPM GitHub 仓库提供面向主要推理后端和微调框架的单页 cookbook,并配套 Agent Skills,方便复现部署和微调流程。
|
| 52 |
|
| 53 |
+
🐱 **桌宠**:我们也提供了由 MiniCPM5-1B 本地驱动的桌宠应用。
|
|
|
|
|
|
|
|
|
|
|
|
|
| 54 |
|
| 55 |
## 模型列表
|
| 56 |
|
|
|
|
| 68 |
|
| 69 |
- **类型**:Causal Language Model
|
| 70 |
- **架构**:标准 `LlamaForCausalLM`
|
| 71 |
+
- **参数数量**:1,080,632,832
|
| 72 |
+
- **非嵌入参数数量**:679,552,512
|
| 73 |
- **层数**:24
|
| 74 |
- **注意力头(GQA)**:16 个 Q heads / 2 个 KV heads
|
| 75 |
- **上下文长度**:131,072
|
|
|
|
| 82 |
|
| 83 |
我们选取 **LFM2.5-1.2B-Thinking**、**Qwen3-0.6B/think**、**Qwen3.5-0.8B/think** 等同尺寸优秀开源模型进行横向比较。这些模型本身已经很强;在这组对比中,MiniCPM5-1B 达到同尺寸开源模型 SOTA 水平,优势主要体现在工具调用、代码生成和高难推理上,也更适合承担本地 coding agent、工具助手和推理助手的角色。
|
| 84 |
|
| 85 |
+

|
| 86 |
|
| 87 |
## 训练流程
|
| 88 |
|
|
|
|
| 92 |
|
| 93 |
**后训练阶段**分为 **SFT**、**RL** 与 **OPD** 三步。我们先使用 **200B tokens deep-thinking SFT** 与 **200B tokens hybrid-thinking SFT** 建立深度思考、混合思考和通用对话能力,相关 SFT 数据已同步开源为 [UltraData-SFT-2605](https://huggingface.co/datasets/openbmb/UltraData-SFT-2605)。随后针对数学、代码、闭卷问答和写作等方向训练专用 **RL teacher**,并通过 **On-Policy Distillation (OPD)** 将这些 teacher 的能力蒸馏回同一个发布模型。
|
| 94 |
|
| 95 |
+

|
| 96 |
|
| 97 |
### RL + OPD 带来了什么?
|
| 98 |
|
|
|
|
| 100 |
|
| 101 |
**RL** 阶段组合了推理、闭卷问答、写作、指令跟随、长上下文理解和通用对话等多类互补训练信号。Reasoning RL 基于 [DAPO-Math-17k](https://huggingface.co/datasets/BytedTsinghua-SIA/DAPO-Math-17k) 采用两阶段长度调度,逐步降低超长率并提升推理准确率。我们还使用 [TriviaQA](https://huggingface.co/datasets/mandarjoshi/trivia_qa)、[NQ-Open](https://huggingface.co/datasets/google-research-datasets/nq_open)、[LongWriter-Zero-RLData](https://huggingface.co/datasets/THU-KEG/LongWriter-Zero-RLData)、合成可验证 RLVR 数据与 pair-wise RLHF 信号,提升可靠性、指令跟随和用户体验。
|
| 102 |
|
| 103 |
+

|
| 104 |
|
| 105 |
**OPD** 阶段参考 Thinking Machines Lab 的 [On-Policy Distillation](https://thinkingmachines.ai/blog/on-policy-distillation/) 思路,并结合 [Rethinking On-Policy Distillation](https://arxiv.org/pdf/2604.13016) 做了实现改进。我们在强化学习框架中使用反向 KL 散度作为优势估计值,替代原有的 verification-based advantage;同时在 response 序列的每个位置分别对学生模型和教师模型 logits 做双边 top-k 采样,取并集后计算反向 KL 散度,以平衡监督信号准确性和训练效率。OPD 直接复用各 RL teacher 训练时的同分布 prompt 作为蒸馏数据,无需额外构造语料。
|
| 106 |
|
| 107 |
+

|
| 108 |
|
| 109 |
+

|
| 110 |
|
| 111 |
## 快速上手
|
| 112 |
|
|
|
|
| 185 |
|
| 186 |
## 工具调用
|
| 187 |
|
| 188 |
+
工具调用**推荐使用 SGLang**。MiniCPM5-1B 以 XML 格式产出工具调用,SGLang 内置的 `minicpm5` parser 会自动将其转换为 OpenAI 兼容的 `tool_calls` 字段。
|
| 189 |
|
| 190 |
```bash
|
| 191 |
python -m sglang.launch_server --model-path openbmb/MiniCPM5-1B --port 30000 \
|
|
|
|
| 200 |
|
| 201 |
| 后端 | 模型格式 / 适用场景 | Cookbook | Agent Skill |
|
| 202 |
| --- | --- | --- | --- |
|
| 203 |
+
| Transformers | BF16 / FP16,本地 Python 推理,GPU + CPU | [transformers.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/transformers.md) | [minicpm5-deploy-transformers](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-transformers/SKILL.md) |
|
| 204 |
+
| vLLM | BF16 / FP16 OpenAI server | [vllm.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/vllm.md) | [minicpm5-deploy-vllm](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-vllm/SKILL.md) |
|
| 205 |
+
| SGLang | BF16 / FP16 OpenAI server,推荐用于 tool calling | [sglang.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/sglang.md) | [minicpm5-deploy-sglang](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-sglang/SKILL.md) |
|
| 206 |
+
| llama.cpp | GGUF,CPU/GPU 本地推理 | [llama_cpp.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/llama_cpp.md) | [minicpm5-deploy-llama-cpp](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-llama-cpp/SKILL.md) |
|
| 207 |
+
| Ollama | GGUF,本地端侧运行 | [ollama.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/ollama.md) | [minicpm5-deploy-ollama](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-ollama/SKILL.md) |
|
| 208 |
+
| LM Studio | GGUF,Mac 桌面应用与 OpenAI server | [lmstudio.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/lmstudio.md) | [minicpm5-deploy-lmstudio](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-lmstudio/SKILL.md) |
|
| 209 |
+
| MLX | MLX / 4bit,Apple Silicon 本地推理 | [mlx.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/mlx.md) | [minicpm5-deploy-mlx](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-mlx/SKILL.md) |
|
| 210 |
+
| ArcLight | GGUF 本地端侧 / CPU / 桌面 / 服务器 | [arclight.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/arclight.md) | [minicpm5-deploy-arclight](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-arclight/SKILL.md) |
|
| 211 |
|
| 212 |
### 微调
|
| 213 |
|
| 214 |
| 框架 | 适用场景 | Cookbook | Agent Skill |
|
| 215 |
| --- | --- | --- | --- |
|
| 216 |
+
| TRL + PEFT | LoRA / SFT 微调 | [trl.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/trl.md) | [minicpm5-finetune-trl](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-trl/SKILL.md) |
|
| 217 |
+
| LLaMA-Factory | 微调 | [llamafactory.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/llamafactory.md) | [minicpm5-finetune-llamafactory](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-llamafactory/SKILL.md) |
|
| 218 |
+
| ms-swift | 微调 | [ms_swift.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/ms_swift.md) | [minicpm5-finetune-ms-swift](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-ms-swift/SKILL.md) |
|
| 219 |
+
| unsloth | 微调 | [unsloth.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/unsloth.md) | [minicpm5-finetune-unsloth](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-unsloth/SKILL.md) |
|
| 220 |
+
| xtuner | 微调 | [xtuner.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/xtuner.md) | [minicpm5-finetune-xtuner](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-xtuner/SKILL.md) |
|
| 221 |
|
| 222 |
## 桌宠
|
| 223 |
|
| 224 |
我们也发布了 **[OpenBMB/MiniCPM-Desk-Pet](https://github.com/OpenBMB/MiniCPM-Desk-Pet)**,一个由 MiniCPM5-1B 本地驱动的桌宠应用。它支持 Apple Silicon / NVIDIA GPU / CPU 路线,可以与 Cursor、Claude Code、Codex 等 coding agent 联动,并支持 LoRA 人格切换。
|
| 225 |
|
| 226 |
+
<a href="https://youtu.be/Ee0slMW8SEk"><img src="https://img.youtube.com/vi/UXtUccouXGY/0.jpg" alt="MiniCPM Desk Pet video demo" width="720"></a>
|
| 227 |
+
|
| 228 |
## 局限性与负责任使用
|
| 229 |
|
| 230 |
MiniCPM5-1B 是一个基于训练数据统计规律生成文本的语言模型,可能生成不准确、有偏见或不安全的内容。在高风险场景中使用前,应对模型输出进行审查和验证。
|
README.md
CHANGED
|
@@ -22,17 +22,21 @@ datasets:
|
|
| 22 |
---
|
| 23 |
|
| 24 |
<div align="center">
|
| 25 |
-
<img src="https://raw.githubusercontent.com/OpenBMB/MiniCPM/
|
| 26 |
</div>
|
| 27 |
|
| 28 |
<p align="center">
|
| 29 |
-
<a href="https://arxiv.org/pdf/2506.07900" target="_blank">MiniCPM
|
| 30 |
-
<a href="https://github.com/OpenBMB/MiniCPM
|
| 31 |
-
<a href="https://huggingface.co/openbmb/MiniCPM5-1B/blob/main/README-cn.md" target="_blank">中文</a> |
|
| 32 |
<a href="https://ultradata.openbmb.cn/" target="_blank">UltraData</a> |
|
| 33 |
-
<a href="https://github.com/OpenBMB/MiniCPM-Desk-Pet" target="_blank">MiniCPM Desk Pet</a>
|
|
|
|
| 34 |
</p>
|
| 35 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 36 |
|
| 37 |
## Highlights
|
| 38 |
|
|
@@ -40,17 +44,13 @@ We are releasing **MiniCPM5-1B**, the first model in the **MiniCPM5** series. It
|
|
| 40 |
|
| 41 |
🏆 **1B-class open-source SOTA**: compared with strong open-source models in the same size class, MiniCPM5-1B reaches SOTA within this comparison set. Its advantage is most visible in agentic tool use, code generation, and difficult reasoning.
|
| 42 |
|
| 43 |
-

|
| 54 |
|
| 55 |
## Model List
|
| 56 |
|
|
@@ -82,7 +82,7 @@ MiniCPM5-1B is the first checkpoint in the MiniCPM5 series. It is designed for l
|
|
| 82 |
|
| 83 |
We compare MiniCPM5-1B with strong open-source models in the same size class, including **LFM2.5-1.2B-Thinking**, **Qwen3-0.6B/think** and **Qwen3.5-0.8B/think**. These are capable baselines; within this comparison set, MiniCPM5-1B reaches 1B-class open-source SOTA, with its advantage most visible in tool use, code generation, and difficult reasoning. This makes it a practical choice for local coding agents, tool assistants, and reasoning assistants.
|
| 84 |
|
| 85 |
-
. We then train specialized **RL teachers** for math, code, closed-book QA, writing, and related domains, and use **On-Policy Distillation (OPD)** to distill these teachers back into one release model.
|
| 94 |
|
| 95 |
-
 and uses a two-stage length schedule to reduce overlong responses while improving reasoning accuracy. We also use [TriviaQA](https://huggingface.co/datasets/mandarjoshi/trivia_qa), [NQ-Open](https://huggingface.co/datasets/google-research-datasets/nq_open), [LongWriter-Zero-RLData](https://huggingface.co/datasets/THU-KEG/LongWriter-Zero-RLData), synthesized verifiable RLVR data, and pair-wise RLHF signals to improve reliability, instruction following, and user experience.
|
| 102 |
|
| 103 |
-
 and incorporates implementation improvements from [Rethinking On-Policy Distillation](https://arxiv.org/pdf/2604.13016). In the RL framework, we use reverse KL divergence as the advantage estimate, replacing the original verification-based advantage. At each response position, we take top-k logits from both the student and teacher models, compute reverse KL on the union of the two token sets, and balance the accuracy of the RKL signal with training efficiency. OPD reuses the in-domain prompts used to train each RL teacher as distillation data, so no additional data curation is required.
|
| 106 |
|
| 107 |
-
**, a desktop pet driven locally by MiniCPM5-1B. It supports Apple Silicon / NVIDIA GPU / CPU paths, can work with coding agents such as Cursor, Claude Code, and Codex, and supports LoRA persona switching.
|
| 224 |
|
|
|
|
|
|
|
| 225 |
## Limitations and Responsible Use
|
| 226 |
|
| 227 |
MiniCPM5-1B is a language model that generates content based on learned statistical patterns from training data. It may produce inaccurate, biased, or unsafe outputs, and generated content should be reviewed and verified before use in high-stakes settings.
|
|
|
|
| 22 |
---
|
| 23 |
|
| 24 |
<div align="center">
|
| 25 |
+
<img src="https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm_logo.png" width="500em" />
|
| 26 |
</div>
|
| 27 |
|
| 28 |
<p align="center">
|
| 29 |
+
<a href="https://arxiv.org/pdf/2506.07900" target="_blank">MiniCPM Tech Report</a> |
|
| 30 |
+
<a href="https://github.com/OpenBMB/MiniCPM" target="_blank">GitHub Repo</a> |
|
|
|
|
| 31 |
<a href="https://ultradata.openbmb.cn/" target="_blank">UltraData</a> |
|
| 32 |
+
<a href="https://github.com/OpenBMB/MiniCPM-Desk-Pet" target="_blank">MiniCPM Desk Pet</a> |
|
| 33 |
+
<a href="https://huggingface.co/spaces/openbmb/MiniCPM5-1B-Demo" target="_blank">Online Demo</a>
|
| 34 |
</p>
|
| 35 |
|
| 36 |
+
<p align="center">
|
| 37 |
+
English |
|
| 38 |
+
<a href="https://huggingface.co/openbmb/MiniCPM5-1B/blob/main/README-cn.md" target="_blank">中文</a>
|
| 39 |
+
</p>
|
| 40 |
|
| 41 |
## Highlights
|
| 42 |
|
|
|
|
| 44 |
|
| 45 |
🏆 **1B-class open-source SOTA**: compared with strong open-source models in the same size class, MiniCPM5-1B reaches SOTA within this comparison set. Its advantage is most visible in agentic tool use, code generation, and difficult reasoning.
|
| 46 |
|
| 47 |
+

|
| 48 |
|
| 49 |
🧠 **Dual Mode Reasoning**: built-in `<think>` chat template, switch via `enable_thinking`. The same checkpoint serves as both a fast assistant and a deliberate reasoner.
|
| 50 |
|
| 51 |
🛠️ **Deployment / Fine-tuning Resources**: the MiniCPM GitHub repo provides single-page cookbooks and Agent Skills for major inference backends and fine-tuning frameworks.
|
| 52 |
|
| 53 |
+
🐱 **Desktop Pet**: a local-LLM desktop pet driven by MiniCPM5-1B.
|
|
|
|
|
|
|
|
|
|
|
|
|
| 54 |
|
| 55 |
## Model List
|
| 56 |
|
|
|
|
| 82 |
|
| 83 |
We compare MiniCPM5-1B with strong open-source models in the same size class, including **LFM2.5-1.2B-Thinking**, **Qwen3-0.6B/think** and **Qwen3.5-0.8B/think**. These are capable baselines; within this comparison set, MiniCPM5-1B reaches 1B-class open-source SOTA, with its advantage most visible in tool use, code generation, and difficult reasoning. This makes it a practical choice for local coding agents, tool assistants, and reasoning assistants.
|
| 84 |
|
| 85 |
+

|
| 86 |
|
| 87 |
## Training Recipe
|
| 88 |
|
|
|
|
| 92 |
|
| 93 |
During **post-training**, we proceed in three steps: **SFT**, **RL**, and **OPD**. We first use **200B tokens of deep-thinking SFT** and **200B tokens of hybrid-thinking SFT** to establish deep-thinking, hybrid-thinking, and general chat abilities; the SFT data is released as [UltraData-SFT-2605](https://huggingface.co/datasets/openbmb/UltraData-SFT-2605). We then train specialized **RL teachers** for math, code, closed-book QA, writing, and related domains, and use **On-Policy Distillation (OPD)** to distill these teachers back into one release model.
|
| 94 |
|
| 95 |
+

|
| 96 |
|
| 97 |
### What does RL + OPD bring?
|
| 98 |
|
|
|
|
| 100 |
|
| 101 |
**RL** combines complementary training signals for reasoning, closed-book QA, writing, instruction following, long-context understanding, and general dialogue. Reasoning RL is based on [DAPO-Math-17k](https://huggingface.co/datasets/BytedTsinghua-SIA/DAPO-Math-17k) and uses a two-stage length schedule to reduce overlong responses while improving reasoning accuracy. We also use [TriviaQA](https://huggingface.co/datasets/mandarjoshi/trivia_qa), [NQ-Open](https://huggingface.co/datasets/google-research-datasets/nq_open), [LongWriter-Zero-RLData](https://huggingface.co/datasets/THU-KEG/LongWriter-Zero-RLData), synthesized verifiable RLVR data, and pair-wise RLHF signals to improve reliability, instruction following, and user experience.
|
| 102 |
|
| 103 |
+

|
| 104 |
|
| 105 |
**OPD** builds on Thinking Machines Lab's [On-Policy Distillation](https://thinkingmachines.ai/blog/on-policy-distillation/) and incorporates implementation improvements from [Rethinking On-Policy Distillation](https://arxiv.org/pdf/2604.13016). In the RL framework, we use reverse KL divergence as the advantage estimate, replacing the original verification-based advantage. At each response position, we take top-k logits from both the student and teacher models, compute reverse KL on the union of the two token sets, and balance the accuracy of the RKL signal with training efficiency. OPD reuses the in-domain prompts used to train each RL teacher as distillation data, so no additional data curation is required.
|
| 106 |
|
| 107 |
+

|
| 108 |
|
| 109 |
+

|
| 110 |
|
| 111 |
## Quickstart
|
| 112 |
|
|
|
|
| 200 |
|
| 201 |
| Backend | Model format / use case | Cookbook | Agent Skill |
|
| 202 |
| --- | --- | --- | --- |
|
| 203 |
+
| Transformers | BF16 / FP16 local Python inference, GPU + CPU | [transformers.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/transformers.md) | [minicpm5-deploy-transformers](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-transformers/SKILL.md) |
|
| 204 |
+
| vLLM | BF16 / FP16 OpenAI server | [vllm.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/vllm.md) | [minicpm5-deploy-vllm](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-vllm/SKILL.md) |
|
| 205 |
+
| SGLang | BF16 / FP16 OpenAI server, recommended for tool calling | [sglang.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/sglang.md) | [minicpm5-deploy-sglang](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-sglang/SKILL.md) |
|
| 206 |
+
| llama.cpp | GGUF local inference, CPU/GPU | [llama_cpp.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/llama_cpp.md) | [minicpm5-deploy-llama-cpp](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-llama-cpp/SKILL.md) |
|
| 207 |
+
| Ollama | GGUF local on-device runtime | [ollama.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/ollama.md) | [minicpm5-deploy-ollama](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-ollama/SKILL.md) |
|
| 208 |
+
| LM Studio | GGUF Mac desktop app and OpenAI server | [lmstudio.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/lmstudio.md) | [minicpm5-deploy-lmstudio](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-lmstudio/SKILL.md) |
|
| 209 |
+
| MLX | MLX / 4bit local inference on Apple Silicon | [mlx.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/mlx.md) | [minicpm5-deploy-mlx](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-mlx/SKILL.md) |
|
| 210 |
+
| ArcLight | GGUF local on-device, CPU, Desktop & Server | [arclight.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/arclight.md) | [minicpm5-deploy-arclight](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-arclight/SKILL.md) |
|
| 211 |
|
| 212 |
### Fine-tuning
|
| 213 |
|
| 214 |
| Framework | Use case | Cookbook | Agent Skill |
|
| 215 |
| --- | --- | --- | --- |
|
| 216 |
+
| TRL + PEFT | LoRA / SFT fine-tuning | [trl.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/trl.md) | [minicpm5-finetune-trl](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-trl/SKILL.md) |
|
| 217 |
+
| LLaMA-Factory | Fine-tuning | [llamafactory.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/llamafactory.md) | [minicpm5-finetune-llamafactory](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-llamafactory/SKILL.md) |
|
| 218 |
+
| ms-swift | Fine-tuning | [ms_swift.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/ms_swift.md) | [minicpm5-finetune-ms-swift](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-ms-swift/SKILL.md) |
|
| 219 |
+
| unsloth | Fine-tuning | [unsloth.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/unsloth.md) | [minicpm5-finetune-unsloth](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-unsloth/SKILL.md) |
|
| 220 |
+
| xtuner | Fine-tuning | [xtuner.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/xtuner.md) | [minicpm5-finetune-xtuner](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-xtuner/SKILL.md) |
|
| 221 |
|
| 222 |
## Desktop Pet
|
| 223 |
|
| 224 |
We also ship **[OpenBMB/MiniCPM-Desk-Pet](https://github.com/OpenBMB/MiniCPM-Desk-Pet)**, a desktop pet driven locally by MiniCPM5-1B. It supports Apple Silicon / NVIDIA GPU / CPU paths, can work with coding agents such as Cursor, Claude Code, and Codex, and supports LoRA persona switching.
|
| 225 |
|
| 226 |
+
<a href="https://youtu.be/UXtUccouXGY"><img src="https://img.youtube.com/vi/UXtUccouXGY/0.jpg" alt="MiniCPM Desk Pet video demo" width="720"></a>
|
| 227 |
+
|
| 228 |
## Limitations and Responsible Use
|
| 229 |
|
| 230 |
MiniCPM5-1B is a language model that generates content based on learned statistical patterns from training data. It may produce inaccurate, biased, or unsafe outputs, and generated content should be reviewed and verified before use in high-stakes settings.
|