Text Generation · Transformers · Safetensors · English · Chinese · llama · minicpm · minicpm5 · long-context · tool-calling · on-device · edge-ai · conversational · text-generation-inference
Instructions to use openbmb/MiniCPM5-1B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use openbmb/MiniCPM5-1B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="openbmb/MiniCPM5-1B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM5-1B")
model = AutoModelForCausalLM.from_pretrained("openbmb/MiniCPM5-1B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (skip the prompt)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use openbmb/MiniCPM5-1B with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server (runs in the foreground):
vllm serve "openbmb/MiniCPM5-1B"

# Call the server using curl (OpenAI-compatible API), from a second terminal:
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "openbmb/MiniCPM5-1B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker

```shell
docker model run hf.co/openbmb/MiniCPM5-1B
```
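The same OpenAI-compatible endpoint can also be called from Python without extra dependencies. The sketch below is a minimal, stdlib-only example, assuming the vLLM server above is running at its default `http://localhost:8000` and replies with the standard OpenAI chat-completion JSON (`choices[0].message.content`). The helper names `build_chat_request` and `chat`, and the `max_tokens=256` default, are hypothetical choices for illustration, not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload with one user turn.

    Hypothetical helper; max_tokens=256 is an illustrative default.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str,
         model: str = "openbmb/MiniCPM5-1B",
         base_url: str = "http://localhost:8000",
         timeout: float = 60.0) -> str:
    """POST the payload to /v1/chat/completions and return the reply text.

    Assumes an OpenAI-compatible server is already listening at base_url.
    """
    payload = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # OpenAI-compatible servers return the reply in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` returns the model's reply as a string. Pointing `base_url` at `http://localhost:30000` should work the same way for an SGLang server, since it exposes the same OpenAI-compatible route.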
- SGLang
How to use openbmb/MiniCPM5-1B with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server (runs in the foreground):
python3 -m sglang.launch_server \
    --model-path "openbmb/MiniCPM5-1B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API), from a second terminal:
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "openbmb/MiniCPM5-1B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "openbmb/MiniCPM5-1B" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API), from a second terminal:
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "openbmb/MiniCPM5-1B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

- Docker Model Runner
How to use openbmb/MiniCPM5-1B with Docker Model Runner:
docker model run hf.co/openbmb/MiniCPM5-1B
update README
- README-cn.md +20 -16
- README.md +20 -16
README-cn.md
CHANGED
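@@ -98,7 +98,7 @@ The training process of MiniCPM5-1B is **[UltraData tiered data management system](https://ar

**RL + OPD** is a key part of MiniCPM5-1B post-training. On math, code, and instruction-following tasks, RL + OPD raises the average score by **↑16 points** while cutting the share of responses that hit the max-tokens budget by **↓29 percentage points**. The figures below show the two-stage Reasoning RL pipeline, the score gains, and the drop in the overlong rate.

-**RL** combines complementary training signals for reasoning, closed-book QA, writing, instruction following, long-context understanding, and general dialogue. Reasoning RL is based on [DAPO-Math-17k](https://huggingface.co/datasets/BytedTsinghua-SIA/DAPO-Math-17k)
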
@@ -219,13 +219,11 @@ MiniCPM5-1B uses the **standard `LlamaForCausalLM` architecture**, so mainstream inference engines can

| unsloth | Fine-tuning | [unsloth.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/unsloth.md) | [minicpm5-finetune-unsloth](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-unsloth/SKILL.md) |
| xtuner | Fine-tuning | [xtuner.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/xtuner.md) | [minicpm5-finetune-xtuner](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-xtuner/SKILL.md) |

-##

-

-
-
-## FlagOS Overview

To enable large-scale deployment across different AI chips, the Beijing Academy of Artificial Intelligence (BAAI), together with many research institutions, chip makers, system vendors, and algorithm and software organizations in China and abroad, jointly initiated and founded the FlagOS open-source community.

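@@ -236,7 +234,7 @@ The FlagOS community is dedicated to building a unified, open-source system soft

<details>
<summary>FlagOS multi-chip support and usage</summary>

-## FlagOS: Supporting Multiple AI Chips

Thanks to FlagOS's unified multi-chip AI system software stack, MiniCPM5-1B was adapted to 9 different AI chips in a very short time. Multi-chip versions of MiniCPM5-1B have been released on FlagRelease, the FlagOS team's platform for automatic migration, adaptation, and release of large models across multi-architecture AI chips. Details are as follows:
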
@@ -252,17 +250,17 @@ The FlagOS community is dedicated to building a unified, open-source system soft

|Ascend|[MiniCPM5-1B-ascend-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|[MiniCPM5-1B-ascend-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|
|ARM-v9|[MiniCPM5-1B-Armv9-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|[MiniCPM5-1B-Armv9-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|

-## FlagOS Usage

-### FlagOS Performance Acceleration on NVIDIA

-#### From FlagRelease (**Recommended**)

FlagRelease is a platform built by the FlagOS team for automatic migration, adaptation, and release of large models across multi-architecture AI chips, and the multi-chip version of MiniCPM5-1B has been released on it. FlagRelease ships with the required software packages built in, so users do not need to install them.

-##### FlagRelease Image Key Versions

-##### FlagRelease Quick Start

|Vendor|ModelScope|Huggingface|
|---|---|---|

@@ -276,11 +274,11 @@ FlagRelease is a platform built by the FlagOS team, targeting multi-architecture AI chips, for large

|Ascend|[MiniCPM5-1B-ascend-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|[MiniCPM5-1B-ascend-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|
|ARM-v9|[MiniCPM5-1B-Armv9-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|[MiniCPM5-1B-Armv9-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|

-#### From Scratch

- Requires Python 3.12, GLIBC_2.39, GLIBCXX_3.4.33, CXXABI_1.3.15

-##### vLLM Version

###### Installing the FlagOS Operator Library

@@ -310,11 +308,11 @@ vllm serve ${model_path} \

--gpu-memory-utilization 0.85
```

-### Using the FlagOS Unified Multi-Chip Backend Plugin

**[vllm-plugin-FL](https://github.com/flagos-ai/vllm-plugin-FL)** is a plugin for the **vLLM** inference/serving framework. Built on **FlagOS's unified multi-chip backend**, it aims to extend vLLM's functionality and performance across a variety of hardware environments.

-#### Using vllm-plugin-FL

|Vendor|From Scratch|From FlagRelease||
|---|---|---|---|

@@ -322,6 +320,12 @@ vllm serve ${model_path} \

</details>

## Limitations and Responsible Use

MiniCPM5-1B is a language model that generates text based on statistical patterns in its training data. It may produce inaccurate, biased, or unsafe content, and its outputs should be reviewed and verified before use in high-stakes settings.

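**RL + OPD** is a key part of MiniCPM5-1B post-training. On math, code, and instruction-following tasks, RL + OPD raises the average score by **↑16 points** while cutting the share of responses that hit the max-tokens budget by **↓29 percentage points**. The figures below show the two-stage Reasoning RL pipeline, the score gains, and the drop in the overlong rate.

+**RL** combines complementary training signals for reasoning, closed-book QA, writing, instruction following, long-context understanding, and general dialogue. Reasoning RL is based on [DAPO-Math-17k](https://huggingface.co/datasets/BytedTsinghua-SIA/DAPO-Math-17k); building on the minimalist recipe of [JustRL](https://arxiv.org/pdf/2512.16649), it adds a two-stage length schedule that gradually lowers the overlong rate while improving reasoning accuracy. We also use [TriviaQA](https://huggingface.co/datasets/mandarjoshi/trivia_qa), [NQ-Open](https://huggingface.co/datasets/google-research-datasets/nq_open), [LongWriter-Zero-RLData](https://huggingface.co/datasets/THU-KEG/LongWriter-Zero-RLData), synthesized verifiable RLVR data, and pair-wise RLHF signals to improve reliability, instruction following, and user experience.
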
| unsloth | Fine-tuning | [unsloth.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/unsloth.md) | [minicpm5-finetune-unsloth](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-unsloth/SKILL.md) |
| xtuner | Fine-tuning | [xtuner.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/xtuner.md) | [minicpm5-finetune-xtuner](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-xtuner/SKILL.md) |

+### Other Supported Frameworks

+In addition to the deployment and fine-tuning frameworks listed above, MiniCPM5-1B also supports multi-chip deployment through FlagOS.

+#### FlagOS Overview

To enable large-scale deployment across different AI chips, the Beijing Academy of Artificial Intelligence (BAAI), together with many research institutions, chip makers, system vendors, and algorithm and software organizations in China and abroad, jointly initiated and founded the FlagOS open-source community.

<details>
<summary>FlagOS multi-chip support and usage</summary>

+#### FlagOS: Supporting Multiple AI Chips

Thanks to FlagOS's unified multi-chip AI system software stack, MiniCPM5-1B was adapted to 9 different AI chips in a very short time. Multi-chip versions of MiniCPM5-1B have been released on FlagRelease, the FlagOS team's platform for automatic migration, adaptation, and release of large models across multi-architecture AI chips. Details are as follows:

|Ascend|[MiniCPM5-1B-ascend-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|[MiniCPM5-1B-ascend-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|
|ARM-v9|[MiniCPM5-1B-Armv9-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|[MiniCPM5-1B-Armv9-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|

+#### FlagOS Usage

+##### FlagOS Performance Acceleration on NVIDIA

+###### From FlagRelease (**Recommended**)

FlagRelease is a platform built by the FlagOS team for automatic migration, adaptation, and release of large models across multi-architecture AI chips, and the multi-chip version of MiniCPM5-1B has been released on it. FlagRelease ships with the required software packages built in, so users do not need to install them.

+###### FlagRelease Image Key Versions

+###### FlagRelease Quick Start

|Vendor|ModelScope|Huggingface|
|---|---|---|

|Ascend|[MiniCPM5-1B-ascend-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|[MiniCPM5-1B-ascend-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|
|ARM-v9|[MiniCPM5-1B-Armv9-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|[MiniCPM5-1B-Armv9-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|

+###### From Scratch

- Requires Python 3.12, GLIBC_2.39, GLIBCXX_3.4.33, CXXABI_1.3.15

+###### vLLM Version

###### Installing the FlagOS Operator Library

--gpu-memory-utilization 0.85
```

+##### Using the FlagOS Unified Multi-Chip Backend Plugin

**[vllm-plugin-FL](https://github.com/flagos-ai/vllm-plugin-FL)** is a plugin for the **vLLM** inference/serving framework. Built on **FlagOS's unified multi-chip backend**, it aims to extend vLLM's functionality and performance across a variety of hardware environments.

+###### Using vllm-plugin-FL

|Vendor|From Scratch|From FlagRelease||
|---|---|---|---|


</details>

+## Desktop Pet
+
+We have also released **[OpenBMB/MiniCPM-Desk-Pet](https://github.com/OpenBMB/MiniCPM-Desk-Pet)**, a desktop pet app driven locally by MiniCPM5-1B. It supports Apple Silicon, NVIDIA GPU, and CPU paths, works with coding agents such as Cursor, Claude Code, and Codex, and supports LoRA persona switching.
+
+<a href="https://youtu.be/Ee0slMW8SEk"><img src="https://img.youtube.com/vi/Ee0slMW8SEk/0.jpg" alt="MiniCPM Desk Pet video demo" width="720"></a>
+

## Limitations and Responsible Use

MiniCPM5-1B is a language model that generates text based on statistical patterns in its training data. It may produce inaccurate, biased, or unsafe content, and its outputs should be reviewed and verified before use in high-stakes settings.

README.md
CHANGED
@@ -98,7 +98,7 @@ During **post-training**, we proceed in three steps: **SFT**, **RL**, and **OPD*

**RL + OPD** is a key part of MiniCPM5-1B post-training. On math, code and instruction-following tasks, RL + OPD raises the average score by **↑16 points** while cutting the share of responses that hit the max-tokens budget by **↓29 percentage points**. The figures below show the two-stage Reasoning RL pipeline, score gains, and the drop in overlong responses.

-**RL** combines complementary training signals for reasoning, closed-book QA, writing, instruction following, long-context understanding, and general dialogue. Reasoning RL is based on [DAPO-Math-17k](https://huggingface.co/datasets/BytedTsinghua-SIA/DAPO-Math-17k) and

@@ -219,13 +219,11 @@ MiniCPM5-1B uses the **standard `LlamaForCausalLM` architecture**, so mainstream

| unsloth | Fine-tuning | [unsloth.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/unsloth.md) | [minicpm5-finetune-unsloth](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-unsloth/SKILL.md) |
| xtuner | Fine-tuning | [xtuner.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/xtuner.md) | [minicpm5-finetune-xtuner](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-xtuner/SKILL.md) |

-##

-

-
-
-## FlagOS Overview

To enable large-scale deployment across different AI chips, Beijing Zhiyuan Research Institute, together with numerous research institutions, chip manufacturers, system vendors, and algorithm and software organizations both domestically and internationally, jointly initiated and established the FlagOS Open Source Community.

@@ -236,7 +234,7 @@ Official website express: [https://flagos.io](https://flagos.io/)

<details>
<summary>FlagOS multi-chip support and usage</summary>

-## FlagOS: Supporting Multiple AI Chips

Thanks to FlagOS’s unified multi-chip AI system software stack, MiniCPM5-1B was adapted to 4–5 different AI chips in an extremely short time. Currently, the multi-chip version of MiniCPM5-1B has been released on FlagRelease, FlagOS’s platform for automatic migration, adaptation, and deployment of large models across multi-architecture AI chips. Details are as follows:

@@ -252,17 +250,17 @@ Thanks to FlagOS’s unified multi-chip AI system software stack, MiniCPM5-1B wa

|Ascend|[MiniCPM5-1B-ascend-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|[MiniCPM5-1B-ascend-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|
|ARM-v9|[MiniCPM5-1B-Armv9-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|[MiniCPM5-1B-Armv9-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|

-## FlagOS Usage

-### FlagOS Performance Acceleration on NVIDIA

-#### From FlagRelease (**Recommended**)

FlagRelease is a platform developed by the FlagOS team for automatic migration, adaptation, and deployment of large models across multi-architecture AI chips. The multi-chip version of MiniCPM5-1B has already been released on FlagRelease. All necessary software packages are pre-installed on the platform, so users do not need to install anything.

-##### FlagRelease Image Key Versions

-##### FlagRelease Quick Start

|Vendor|ModelScope|Huggingface|
|---|---|---|

@@ -276,11 +274,11 @@ FlagRelease is a platform developed by the FlagOS team for automatic migration,

|Ascend|[MiniCPM5-1B-ascend-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|[MiniCPM5-1B-ascend-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|
|ARM-v9|[MiniCPM5-1B-Armv9-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|[MiniCPM5-1B-Armv9-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|

-#### From Scratch

- Dependencies: Python 3.12, GLIBC 2.39, GLIBCXX 3.4.33, CXXABI 1.3.15

-##### vLLM Version

###### Installing the FlagOS Operator Library

@@ -310,11 +308,11 @@ vllm serve ${model_path} \

--gpu-memory-utilization 0.85
```

-### Using the FlagOS Unified Multi-Chip Backend Plugin

[**vllm-plugin-FL**](https://github.com/flagos-ai/vllm-plugin-FL) is a plugin built for the vLLM inference/serving framework. Developed on top of FlagOS’s unified multi-chip backend, it is designed to extend vLLM’s capabilities and performance across a variety of hardware environments.

-#### Using vllm-plugin-FL

|Vendor|From Scratch|From FlagRelease||
|---|---|---|---|

@@ -322,6 +320,12 @@ vllm serve ${model_path} \

</details>

## Limitations and Responsible Use

MiniCPM5-1B is a language model that generates content based on learned statistical patterns from training data. It may produce inaccurate, biased, or unsafe outputs, and generated content should be reviewed and verified before use in high-stakes settings.

**RL + OPD** is a key part of MiniCPM5-1B post-training. On math, code and instruction-following tasks, RL + OPD raises the average score by **↑16 points** while cutting the share of responses that hit the max-tokens budget by **↓29 percentage points**. The figures below show the two-stage Reasoning RL pipeline, score gains, and the drop in overlong responses.

+**RL** combines complementary training signals for reasoning, closed-book QA, writing, instruction following, long-context understanding, and general dialogue. Reasoning RL is based on [DAPO-Math-17k](https://huggingface.co/datasets/BytedTsinghua-SIA/DAPO-Math-17k), follows the minimalist recipe of [JustRL](https://arxiv.org/pdf/2512.16649), and further adds a two-stage length schedule to reduce overlong responses while improving reasoning accuracy. We also use [TriviaQA](https://huggingface.co/datasets/mandarjoshi/trivia_qa), [NQ-Open](https://huggingface.co/datasets/google-research-datasets/nq_open), [LongWriter-Zero-RLData](https://huggingface.co/datasets/THU-KEG/LongWriter-Zero-RLData), synthesized verifiable RLVR data, and pair-wise RLHF signals to improve reliability, instruction following, and user experience.

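The **↓29 percentage points** figure is an absolute difference between two shares, which is not the same as a 29% relative reduction. A minimal sketch of that arithmetic, assuming the overlong rate is the fraction of responses whose length reaches the max-tokens budget; the 8192-token budget and all response lengths below are made-up illustrative values, not MiniCPM5-1B measurements.

```python
def overlong_rate(lengths, max_tokens):
    """Fraction of responses whose length reaches the max-tokens budget."""
    if not lengths:
        raise ValueError("need at least one response length")
    hits = sum(1 for n in lengths if n >= max_tokens)
    return hits / len(lengths)


def change_in_points(before, after):
    """Absolute change between two rates, in percentage points."""
    return (after - before) * 100


def relative_change(before, after):
    """Relative change between two rates, in percent of the starting rate."""
    if before == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (after - before) / before * 100


# Hypothetical budget and response lengths (tokens), for illustration only.
BUDGET = 8192
before = overlong_rate([8192, 8192, 8192, 4000, 3000, 8192, 2500, 6000, 8192, 1200], BUDGET)
after = overlong_rate([8192, 3100, 2900, 4000, 3000, 8192, 2500, 6000, 5100, 1200], BUDGET)

print(f"{change_in_points(before, after):+.1f} pp, {relative_change(before, after):+.1f}%")
```

This prints `-30.0 pp, -60.0%`: going from 50% to 20% overlong is a 30-point drop but a 60% relative reduction, so the two units should not be mixed when comparing reported numbers.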
| unsloth | Fine-tuning | [unsloth.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/unsloth.md) | [minicpm5-finetune-unsloth](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-unsloth/SKILL.md) |
| xtuner | Fine-tuning | [xtuner.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/xtuner.md) | [minicpm5-finetune-xtuner](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-xtuner/SKILL.md) |

+### Other Supported Frameworks

+In addition to the deployment and fine-tuning frameworks listed above, MiniCPM5-1B is also supported by FlagOS for multi-chip deployment.

+#### FlagOS Overview

To enable large-scale deployment across different AI chips, Beijing Zhiyuan Research Institute, together with numerous research institutions, chip manufacturers, system vendors, and algorithm and software organizations both domestically and internationally, jointly initiated and established the FlagOS Open Source Community.

<details>
<summary>FlagOS multi-chip support and usage</summary>

+#### FlagOS: Supporting Multiple AI Chips

Thanks to FlagOS’s unified multi-chip AI system software stack, MiniCPM5-1B was adapted to 4–5 different AI chips in an extremely short time. Currently, the multi-chip version of MiniCPM5-1B has been released on FlagRelease, FlagOS’s platform for automatic migration, adaptation, and deployment of large models across multi-architecture AI chips. Details are as follows:

|Ascend|[MiniCPM5-1B-ascend-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|[MiniCPM5-1B-ascend-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|
|ARM-v9|[MiniCPM5-1B-Armv9-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|[MiniCPM5-1B-Armv9-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|

+#### FlagOS Usage

+##### FlagOS Performance Acceleration on NVIDIA

+###### From FlagRelease (**Recommended**)

FlagRelease is a platform developed by the FlagOS team for automatic migration, adaptation, and deployment of large models across multi-architecture AI chips. The multi-chip version of MiniCPM5-1B has already been released on FlagRelease. All necessary software packages are pre-installed on the platform, so users do not need to install anything.

+###### FlagRelease Image Key Versions

+###### FlagRelease Quick Start

|Vendor|ModelScope|Huggingface|
|---|---|---|

|Ascend|[MiniCPM5-1B-ascend-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|[MiniCPM5-1B-ascend-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|
|ARM-v9|[MiniCPM5-1B-Armv9-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|[MiniCPM5-1B-Armv9-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|

+###### From Scratch

- Dependencies: Python 3.12, GLIBC 2.39, GLIBCXX 3.4.33, CXXABI 1.3.15

+###### vLLM Version

###### Installing the FlagOS Operator Library

--gpu-memory-utilization 0.85
```
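In vLLM, `--gpu-memory-utilization` is the fraction of each GPU's memory the engine may use for model weights, activations, and the KV cache, so `0.85` leaves roughly 15% for other processes. A rough back-of-the-envelope sketch, assuming a hypothetical 24 GiB GPU, bf16 weights (2 bytes per parameter), and about 1 billion parameters inferred from the model name (the exact count is an assumption). Activation memory is ignored, so the remainder is only an upper bound on what is left for the KV cache.

```python
GIB = 1024 ** 3


def vllm_memory_budget(total_gib: float, utilization: float) -> float:
    """Memory (GiB) vLLM may use per GPU for a given --gpu-memory-utilization."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return total_gib * utilization


def weight_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight footprint in GiB (2 bytes/param for bf16/fp16)."""
    return num_params * bytes_per_param / GIB


# Hypothetical 24 GiB GPU; ~1e9 parameters is an assumption from the model name.
budget = vllm_memory_budget(24.0, 0.85)
weights = weight_gib(1e9)
# Upper bound only: activations and runtime overhead also come out of the budget.
kv_upper_bound = budget - weights

print(f"budget={budget:.2f} GiB, weights~{weights:.2f} GiB, left<={kv_upper_bound:.2f} GiB")
```

For a small model like this, the weights take only a small slice of the budget, which is why most of the memory granted by `--gpu-memory-utilization` typically ends up as KV-cache capacity for concurrent or long-context requests.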

+##### Using the FlagOS Unified Multi-Chip Backend Plugin

[**vllm-plugin-FL**](https://github.com/flagos-ai/vllm-plugin-FL) is a plugin built for the vLLM inference/serving framework. Developed on top of FlagOS’s unified multi-chip backend, it is designed to extend vLLM’s capabilities and performance across a variety of hardware environments.

+###### Using vllm-plugin-FL

|Vendor|From Scratch|From FlagRelease||
|---|---|---|---|


</details>

+## Desktop Pet
+
+We also ship **[OpenBMB/MiniCPM-Desk-Pet](https://github.com/OpenBMB/MiniCPM-Desk-Pet)**, a desktop pet driven locally by MiniCPM5-1B. It supports Apple Silicon / NVIDIA GPU / CPU paths, can work with coding agents such as Cursor, Claude Code, and Codex, and supports LoRA persona switching.
+
+<a href="https://youtu.be/Ee0slMW8SEk"><img src="https://img.youtube.com/vi/Ee0slMW8SEk/0.jpg" alt="MiniCPM Desk Pet video demo" width="720"></a>
+

## Limitations and Responsible Use

MiniCPM5-1B is a language model that generates content based on learned statistical patterns from training data. It may produce inaccurate, biased, or unsafe outputs, and generated content should be reviewed and verified before use in high-stakes settings.