Qwen3-4B-Instruct-2507-heretic OpenVINO NF4 for Intel NPU
This repository contains a ready-to-run OpenVINO IR export of
p-e-w/Qwen3-4B-Instruct-2507-heretic,
prepared for local inference on Intel NPU through OpenVINO Model Server.
This is not a fine-tune. It is an OpenVINO NF4 runtime export of the source model above. In the local benchmark below, this was the best balance of speed and memory among the tested INT4, NF4, and INT8 exports.
Source model
- Source model:
p-e-w/Qwen3-4B-Instruct-2507-heretic - Original base model:
Qwen/Qwen3-4B-Instruct-2507 - Architecture:
Qwen3ForCausalLM - Task: text generation
- License: Apache-2.0, inherited from the source model metadata
OpenVINO export
Compression metadata from this export:
{
"mode": "nf4",
"nncf_mode": "NF4",
"group_size": -1,
"ratio": 1.0,
"all_layers": true
}
Known local artifact size:
openvino_model.bin: about 1.88 GiB
Tested Intel NPU runtime
Tested locally on Windows with OpenVINO Model Server / OpenVINO GenAI:
- Target device:
NPU - OVMS task:
text_generation - Runtime prompt limit:
16384 - Max concurrent sequences:
1 - Cache interval multiplier:
64
Example OVMS command:
ovms.exe `
--model_path Q:/llm/models/OpenVINO/p-e-w--Qwen3-4B-Instruct-2507-heretic-text-fp16-nf4-cw-ov `
--model_name p-e-w--Qwen3-4B-Instruct-2507-heretic-nf4-npu `
--rest_port 8000 `
--rest_bind_address 0.0.0.0 `
--task text_generation `
--target_device NPU `
--max_prompt_len 16384 `
--max_num_seqs 1 `
--cache_interval_multiplier 64 `
--tool_parser hermes3
Local benchmark
Measured on the local Intel NPU setup above with the OVMS OpenAI-compatible chat completions endpoint.
| Quantization | Load time | Avg output speed | Working set after runs | Private memory after runs |
|---|---|---|---|---|
| NF4 | 366.8 s | 12.34 tok/s | 7.42 GiB | 2.57 GiB |
The benchmark prompt was a short three-sentence OpenVINO explanation request
with max_tokens=128.
- Downloads last month
- 23
Model tree for machine-made-Fibre/Qwen3-4B-Instruct-2507-heretic-OpenVINO-NF4-NPU
Base model
Qwen/Qwen3-4B-Instruct-2507