Inference code for DeepSeek models

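First, convert the Hugging Face model weight files to this project's format. Set HF_CKPT_PATH to the location of the downloaded weights and SAVE_PATH to where the converted checkpoint should be written.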

export EXPERTS=384
export MP=8
export CONFIG=config.json
python convert.py --hf-ckpt-path ${HF_CKPT_PATH} --save-path ${SAVE_PATH} --n-experts ${EXPERTS} --model-parallel ${MP}
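
Before converting, it can help to check that the expert count and the model-parallel size fit together. The sketch below assumes experts are split evenly across model-parallel ranks; that is an assumption, since this README does not say how convert.py shards experts. The helper name `experts_per_rank` is hypothetical and is not part of this project.

```python
def experts_per_rank(n_experts: int, model_parallel: int) -> int:
    """Return the number of experts each rank would hold under an even split.

    Assumption: experts are divided evenly across ranks. This README does
    not confirm that convert.py works this way.
    """
    if model_parallel <= 0:
        raise ValueError("model_parallel must be a positive integer")
    if n_experts % model_parallel != 0:
        raise ValueError(
            f"{n_experts} experts cannot be split evenly across "
            f"{model_parallel} ranks"
        )
    return n_experts // model_parallel


# Values taken from the export lines above: EXPERTS=384, MP=8.
print(experts_per_rank(384, 8))  # 48
```

With the values above, each rank would hold 48 experts under this assumption.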

Then chat with the DeepSeek model at will!

torchrun --nproc-per-node ${MP} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --interactive

Or run batch inference from a file.

torchrun --nproc-per-node ${MP} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --input-file ${FILE}

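Or run multi-node inference. Run the command on every node, with RANK set to that node's index and ADDR set to the master node's address. MP should be divisible by NODES, because the shell arithmetic below uses integer division.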

torchrun --nnodes ${NODES} --nproc-per-node $((MP / NODES)) --node-rank ${RANK} --master-addr ${ADDR} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --input-file ${FILE}
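
To see what the multi-node command expands to on each node, the following sketch prints one torchrun command per node rank. It only builds strings and does not launch anything. The helper name `multinode_commands`, the address `10.0.0.1`, and the paths are hypothetical placeholders; the torchrun and generate.py flags are the ones used in the command above.

```python
import shlex


def multinode_commands(mp, nodes, addr, save_path, config, input_file):
    """Build the torchrun command for each node rank as a shell string."""
    if nodes <= 0 or mp % nodes != 0:
        raise ValueError("MP must be a positive multiple of NODES")
    per_node = mp // nodes  # same result as $((MP / NODES)) in the shell
    commands = []
    for rank in range(nodes):
        argv = [
            "torchrun",
            "--nnodes", str(nodes),
            "--nproc-per-node", str(per_node),
            "--node-rank", str(rank),
            "--master-addr", addr,
            "generate.py",
            "--ckpt-path", save_path,
            "--config", config,
            "--input-file", input_file,
        ]
        commands.append(shlex.join(argv))
    return commands


# Placeholder values for illustration only.
for cmd in multinode_commands(
    8, 2, "10.0.0.1", "/path/to/converted", "config.json", "prompts.txt"
):
    print(cmd)
```

With MP=8 and NODES=2, each node starts 4 processes, and the two commands differ only in --node-rank.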

To use fp8 instead, remove "expert_dtype": "fp4" from config.json and pass --expert-dtype fp8 to convert.py.
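
The config edit for fp8 can also be scripted. This is a minimal sketch that assumes "expert_dtype" is a top-level key in config.json; the helper name `drop_expert_dtype` is hypothetical, and the demo runs against a throwaway file with a placeholder key rather than a real config. Note that rewriting the file with `json.dumps` may change its indentation.

```python
import json
import tempfile
from pathlib import Path


def drop_expert_dtype(config_path):
    """Remove the "expert_dtype" key from a JSON config file.

    Assumption: the key sits at the top level of the JSON object.
    Returns the removed value, or None if the key was absent.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    removed = config.pop("expert_dtype", None)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return removed


# Demo on a temporary file; "other_key" is a placeholder, not a real field.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "config.json"
    demo.write_text(json.dumps({"expert_dtype": "fp4", "other_key": 1}))
    print(drop_expert_dtype(demo))  # fp4
    print(json.loads(demo.read_text()))  # {'other_key': 1}
```

After editing the real config.json this way, rerun convert.py with --expert-dtype fp8 as described above.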
