## Deploying openPangu-R-7B-2512 on [vllm-ascend](https://github.com/vllm-project/vllm-ascend)
### Deployment Environment
openPangu-R-7B-2512 can be deployed on an Atlas 800T A2 (64 GB) server.
### Building and Starting the A2 Image
Pull the base image:
```bash
docker pull quay.io/ascend/cann:8.3.rc1.alpha003-910b-ubuntu22.04-py3.11
```
Build the image from `./Dockerfile`:
```bash
IMAGE=quay.io/ascend/cann:8.3.rc1.alpha003-910b-ubuntu22.04-py3.11-vllm0.11
docker build -t "$IMAGE" -f ./Dockerfile .
```
Start the container:
```bash
export IMAGE=quay.io/ascend/cann:8.3.rc1.alpha003-910b-ubuntu22.04-py3.11-vllm0.11  # Must match the image tag built above
export NAME=XXX  # Custom container name
# Run the container using the variables defined above.
# Note: if you run Docker with a bridge network, expose the ports needed for multi-node communication in advance.
# To prevent device interference from other Docker containers, add the argument "--privileged".
docker run -itd \
--privileged \
--ipc=host \
--name "$NAME" \
--network host \
--device /dev/davinci0 \
--device /dev/davinci1 \
--device /dev/davinci2 \
--device /dev/davinci3 \
--device /dev/davinci4 \
--device /dev/davinci5 \
--device /dev/davinci6 \
--device /dev/davinci7 \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/Ascend/driver/tools/hccn_tool:/usr/local/Ascend/driver/tools/hccn_tool \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /mnt/:/mnt/ \
-v /data:/data \
-v /home/work:/home/work \
--entrypoint /bin/bash \
"$IMAGE"
```
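The eight `--device /dev/davinciN` lines above map every NPU on the server into the container, followed by the three management devices. If you prefer not to list them by hand, a small helper can generate the same flags. This is a minimal sketch: `npu_device_args` is a hypothetical name, and it assumes the NPU device nodes are numbered contiguously from `davinci0`. Also note that `--privileged` already exposes all host devices, so shortening the list only has an effect when the container is started without it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this project): prints the --device flags for
# NPUs davinci0..davinci(N-1), followed by the three management devices used
# in the docker run command above. Assumes contiguous device numbering from 0.
npu_device_args() {
  local count="${1:-8}" i
  for ((i = 0; i < count; i++)); do
    printf -- '--device /dev/davinci%d ' "$i"
  done
  printf -- '--device /dev/davinci_manager --device /dev/devmm_svm --device /dev/hisi_hdc\n'
}

# Usage: leave the substitution unquoted so it splits into separate arguments.
#   docker run -itd --privileged --ipc=host --name "$NAME" --network host \
#     $(npu_device_args 8) \
#     ...
```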
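Make sure the model weights and this project's code are accessible inside the container (for example, by placing them under one of the mounted directories such as `/data` or `/home/work`). If you are not already inside the container, enter it as root, then install the dependencies and the custom operators:
```bash
# On the host: enter the running container as root
docker exec -itu root $NAME /bin/bash
# Inside the container: run from the root of this project.
# The .run package and .whl file below are expected in the inference directory.
cd inference
pip install -r requirements.txt
bash ./cann910B-omni_inference_custom_ops-0.7.0-8.3.RC1-linux-aarch64.run --install-path=/usr/local/Ascend/ascend-toolkit/latest/opp
source /usr/local/Ascend/ascend-toolkit/latest/opp/vendors/omni_custom_ops/bin/set_env.bash
pip install omni_inference_ascendc_custom_ops-0.7.0+8.3.rc1.pta2.7.1-cp311-cp311-linux_aarch64.whl --force-reinstall
```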
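The `source .../set_env.bash` step only affects the current shell, so a later `docker exec` session will not have the custom-operator environment unless the line is run again. One option, assuming the container's interactive bash reads `~/.bashrc` (the usual behavior for `docker exec -it ... /bin/bash`), is to append the `source` line there exactly once. The helper name `append_once` is hypothetical; this is a sketch, not part of the project's setup.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if an identical line
# is not already present (whole-line, fixed-string match).
append_once() {
  local file="$1" line="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage inside the container (safe to run repeatedly):
#   append_once ~/.bashrc 'source /usr/local/Ascend/ascend-toolkit/latest/opp/vendors/omni_custom_ops/bin/set_env.bash'
```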
### openPangu-R-7B-2512 Inference
Launch script: `inference/launch.sh`

Run the following from the root of this project:
```bash
export LOCAL_CKPT_DIR=XXX/checkpoint/   # Path to the pangu_7b bf16 weights
bash inference/launch.sh
```
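Because the launch script reads the weights from the checkpoint directory, a quick pre-flight check can catch a wrong path before vLLM starts loading. This is a minimal sketch under stated assumptions: `check_ckpt_dir` is a hypothetical helper, and it assumes a Hugging Face-style checkpoint layout with a `config.json` and at least one `*.safetensors` file. Adjust the checks if your weights are stored differently.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of this project).
# Assumes a Hugging Face-style layout: config.json plus *.safetensors shards.
check_ckpt_dir() {
  local dir="$1"
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "checkpoint dir not found: '$dir'" >&2
    return 1
  fi
  if [ ! -f "$dir/config.json" ]; then
    echo "missing $dir/config.json" >&2
    return 1
  fi
  # compgen -G succeeds only if the glob matches at least one file
  if ! compgen -G "$dir/*.safetensors" > /dev/null; then
    echo "no *.safetensors files in $dir" >&2
    return 1
  fi
  return 0
}

# Usage:
#   check_ckpt_dir "$LOCAL_CKPT_DIR" && bash inference/launch.sh
```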
Example launch script:
```bash
# HOST=127.0.0.1 (localhost) makes the server reachable only from the host machine itself.
# HOST=0.0.0.0 allows other devices on the same network, or even the internet, to reach the vLLM server, provided the network is configured correctly (e.g., firewall rules, port forwarding).
HOST=xxx.xxx.xxx.xxx
# $SCRIPT_DIR and $LOCAL_CKPT_DIR must be set before this command runs.
python $SCRIPT_DIR/vllm_register.py \
    --model $LOCAL_CKPT_DIR \
    --served-model-name ${SERVED_MODEL_NAME:=pangu_7b} \
    --tensor-parallel-size ${TENSOR_PARALLEL_SIZE:=8} \
    --trust-remote-code \
    --host $HOST \
    --port ${PORT:=8000} \
    --max-num-seqs ${MAX_NUM_SEQS:=256} \
    --max-model-len ${MAX_MODEL_LEN:=40960} \
    --tokenizer-mode "slow" \
    --dtype bfloat16 \
    --enable-log-requests \
    --distributed-executor-backend mp \
    --gpu-memory-utilization 0.9 \
    --max-num-batched-tokens ${MAX_NUM_BATCHED_TOKENS:=4096} \
    --no-enable-prefix-caching \
    --enforce-eager \
    --reasoning-parser pangu
```
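The launch script sets its tunables with `${VAR:=default}`, which assigns the default only when the variable is unset or empty. So, assuming `inference/launch.sh` uses the same expansions as the example above, you can override a setting by passing it in the environment, e.g. `TENSOR_PARALLEL_SIZE=1 bash inference/launch.sh`. The sketch below reproduces that expansion pattern in isolation; `effective_config` is a hypothetical helper used only to show the behavior.

```bash
#!/usr/bin/env bash
# Mirrors the ${VAR:=default} pattern from the launch script example.
# ${VAR:=default} assigns the default only when VAR is unset or empty;
# a non-empty value already in the environment wins.
effective_config() {
  echo "tp=${TENSOR_PARALLEL_SIZE:=8} port=${PORT:=8000} max_model_len=${MAX_MODEL_LEN:=40960}"
}

# Typical override when starting the service:
#   TENSOR_PARALLEL_SIZE=1 MAX_MODEL_LEN=32768 bash inference/launch.sh
```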
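### Sending a Test Request
Once the service has started, send a test request:
```bash
MASTER_NODE_IP=xxx.xxx.xxx.xxx                    # IP address of the server node
PORT=${PORT:-8000}                                # Must match --port in the launch script
SERVED_MODEL_NAME=${SERVED_MODEL_NAME:-pangu_7b}  # Must match --served-model-name
curl http://${MASTER_NODE_IP}:${PORT}/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "'"$SERVED_MODEL_NAME"'",
        "messages": [
            {
                "role": "user",
                "content": "Who are you?"
            }
        ],
        "max_tokens": 512,
        "temperature": 0
    }'
```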
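Loading the weights can take a while, so a request sent immediately after `launch.sh` may fail. The sketch below first polls vLLM's `/health` endpoint and then sends the same chat request, building the JSON body with a small escaping helper instead of hand-editing quotes. The function names (`wait_for_server`, `json_escape`, `build_chat_payload`), the 5-second poll interval, and the timeout values are illustrative assumptions, not part of this project. The escaping handles only backslashes and double quotes, so use a tool such as `jq` for prompts containing newlines or other control characters.

```bash
#!/usr/bin/env bash
# Minimal client-side sketch; all helper names are hypothetical.

# Poll <base_url>/health until it answers or the timeout (seconds) expires.
wait_for_server() {
  local base_url="$1" timeout="${2:-600}" start=$SECONDS
  until curl -sf --max-time 5 "${base_url}/health" > /dev/null 2>&1; do
    if (( SECONDS - start >= timeout )); then
      echo "server at ${base_url} not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 5
  done
}

# Escape backslashes first, then double quotes, for use inside a JSON string.
# Newlines and other control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a /v1/chat/completions body with a single user message.
build_chat_payload() {
  local model="$1" prompt="$2" max_tokens="${3:-512}"
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}], "max_tokens": %d, "temperature": 0}\n' \
    "$(json_escape "$model")" "$(json_escape "$prompt")" "$max_tokens"
}

# Usage (MASTER_NODE_IP, PORT and SERVED_MODEL_NAME as in the request above):
#   base="http://${MASTER_NODE_IP}:${PORT:-8000}"
#   wait_for_server "$base" 900 &&
#     curl -s "$base/v1/chat/completions" \
#       -H "Content-Type: application/json" \
#       -d "$(build_chat_payload "${SERVED_MODEL_NAME:-pangu_7b}" 'Who are you?')"
```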