---
license: mit
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
library_name: openvino
pipeline_tag: text-generation
tags:
- deepseek
- deepseek-r1
- qwen
- openvino
- openvino-genai
- optimum-intel
- nncf
- int4
- llm
- text-generation
- reasoning
- converted
- conversational
---

# DeepSeek-R1-Distill-Qwen-32B OpenVINO INT4

This repository contains an unofficial OpenVINO™ IR conversion of [`deepseek-ai/DeepSeek-R1-Distill-Qwen-32B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) with INT4 weight compression.

The model was converted using **Optimum Intel** and is intended for local inference with **OpenVINO**. For generative inference, this repository also includes an **OpenVINO GenAI** example, which is the preferred runtime path for getting strong performance from OpenVINO-converted large language models on Intel hardware.

## Original model

- Original model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-32B`
- Original creator: DeepSeek
- Architecture family: Qwen-based DeepSeek-R1 distilled reasoning model
- Converted format: OpenVINO IR
- Weight format: INT4
- Task: text generation / reasoning

This is an unofficial converted model repository. It is not an official DeepSeek or OpenVINO release.

## License and publishability

The original `deepseek-ai/DeepSeek-R1-Distill-Qwen-32B` model card states that both the code repository and model weights are licensed under the **MIT License**. It also states that the DeepSeek-R1 series supports commercial use and allows modifications and derivative works, including distillation.

Because of that, this OpenVINO INT4 conversion can be published publicly under the same MIT license, provided the original license notice is preserved. This repository includes the original MIT `LICENSE` file for attribution and compliance.

Please refer to the original model card for full model details, intended use, safety notes, license terms, and limitations.
## Model summary

DeepSeek-R1-Distill-Qwen-32B is a distilled reasoning model based on Qwen. It belongs to the DeepSeek-R1 family, which was released to provide strong reasoning capabilities in smaller distilled models.

This OpenVINO version is designed for local inference on Intel hardware using OpenVINO and OpenVINO GenAI.

## Conversion

This model was converted with Optimum Intel using the OpenVINO export path.

```bash
optimum-cli export openvino \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
  --weight-format int4 \
  --group-size 128 \
  --ratio 1.0 \
  ov_DeepSeek-R1-Distill-Qwen-32B_int4
```

## Quantization

- Weight format: INT4
- Group size: 128
- Ratio: 1.0
- Export tool: Optimum Intel
- Compression backend: NNCF through Optimum Intel
- Runtime format: OpenVINO IR

INT4 compression is intended to reduce model size and memory usage compared with higher-precision weights. As with any converted and quantized model, output quality and numerical behavior may differ from the original model and should be validated for your use case.

## Installation

```bash
pip install -r examples/requirements.txt
```

## Test with Optimum Intel first

Create or use the included script:

```bash
python examples/test_deepseek32b_ov_optimum.py \
  --model-dir . \
  --device CPU \
  --max-new-tokens 128 \
  --prompt "Explain OpenVINO in one short paragraph."
```

## Test with OpenVINO GenAI

OpenVINO GenAI provides a clean runtime path for generative inference with OpenVINO-converted models.

Run from inside the model directory:

```bash
python examples/test_deepseek32b_ov_genai.py \
  --model-dir . \
  --device CPU \
  --max-new-tokens 128 \
  --prompt "Explain OpenVINO in one short paragraph."
```

## If CPU works, then try GPU

First check that OpenVINO detects GPU devices. The included GenAI script prints the available OpenVINO devices.

Then run:

```bash
python examples/test_deepseek32b_ov_genai.py \
  --model-dir . \
  --device GPU.0 \
  --max-new-tokens 64 \
  --prompt "Explain OpenVINO in one short paragraph."
```

## Notes

This model is text-only.
This repository includes both Optimum Intel and OpenVINO GenAI examples. The Optimum Intel path is useful for validating the exported model with Transformers-style APIs. The OpenVINO GenAI path is recommended for generative inference with OpenVINO-converted models. OpenVINO Model Server compatibility is not claimed unless separately validated.

## Limitations

This repository inherits the limitations of the original `deepseek-ai/DeepSeek-R1-Distill-Qwen-32B` model.

Additional differences may arise from OpenVINO conversion, INT4 compression, runtime package versions, and generation configuration.

## Attribution

This is an unofficial OpenVINO conversion of the original DeepSeek model. All rights to the original model, training, and licensing remain with the original authors.

## Reproducing the conversion

Here is how I converted the original model to OpenVINO IR:

```bash
# Create and activate a dedicated Python 3.11 virtual environment
cd ~
python3.11 -m venv deepseek32b_ov_env
source ~/deepseek32b_ov_env/bin/activate

# Install the export toolchain
python -m pip install --upgrade pip setuptools wheel
pip install -U \
  "openvino>=2025.1.0" \
  "optimum-intel[openvino]>=1.22.0" \
  "nncf>=2.14.0" \
  "transformers>=4.48.0" \
  "accelerate" \
  "safetensors" \
  "huggingface_hub" \
  "sentencepiece" \
  "protobuf"

# Export with INT4 weight compression and keep a log of the run
cd ~/ov_models
MODEL_ID="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
OUT_DIR="ov_DeepSeek-R1-Distill-Qwen-32B_int4"
mkdir -p export_logs

optimum-cli export openvino \
  --model "$MODEL_ID" \
  --weight-format int4 \
  --group-size 128 \
  --ratio 1.0 \
  "$OUT_DIR" \
  2>&1 | tee export_logs/deepseek_r1_distill_qwen_32b_int4_export.log
```
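
As a rough sanity check on the export settings used above (`--weight-format int4`, `--group-size 128`, `--ratio 1.0`), the sketch below estimates how much memory the compressed weights might need. It is a back-of-the-envelope calculation, not a measurement of this repository's files. The function name `estimate_compressed_weight_bytes` is hypothetical, and several inputs are assumptions: the parameter count is taken as 32e9 from the "32B" in the model name, each group of 128 weights is assumed to carry one 16-bit scale and one 4-bit zero point, and any weights not covered by `ratio` are assumed to be stored in 8 bits. The estimate ignores layers the exporter may keep at higher precision (such as embeddings), as well as activations and the KV cache.

```python
def estimate_compressed_weight_bytes(
    num_params: float,
    group_size: int = 128,
    ratio: float = 1.0,
    scale_bits: int = 16,       # assumption: one FP16 scale per group
    zero_point_bits: int = 4,   # assumption: one 4-bit zero point per group
    backup_bits: int = 8,       # assumption: remaining weights stored in 8 bits
) -> float:
    """Return a rough size in bytes for group-wise INT4-compressed weights."""
    # Share of the weights compressed to 4 bits, and the rest kept at 8 bits.
    int4_params = num_params * ratio
    backup_params = num_params - int4_params

    # Each group of `group_size` INT4 weights adds per-group metadata.
    num_groups = int4_params / group_size

    total_bits = (
        int4_params * 4
        + num_groups * (scale_bits + zero_point_bits)
        + backup_params * backup_bits  # per-channel 8-bit metadata is ignored
    )
    return total_bits / 8


if __name__ == "__main__":
    NUM_PARAMS = 32e9  # assumption: read off the "32B" in the model name

    int4_bytes = estimate_compressed_weight_bytes(NUM_PARAMS)
    fp16_bytes = NUM_PARAMS * 2  # 16-bit baseline for comparison

    print(f"Estimated INT4 weights: {int4_bytes / 1e9:.3f} GB")
    print(f"FP16 baseline:          {fp16_bytes / 1e9:.3f} GB")
    print(f"Approximate reduction:  {fp16_bytes / int4_bytes:.2f}x")
```

Under these assumptions the weights come to roughly 16.6 GB (decimal), compared with 64 GB at 16-bit precision. The size of the exported files and the memory actually used at runtime should be checked directly, since both depend on the exporter version and on the device.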