---
license: apache-2.0
language:
- en
- zh
- de
- es
- ru
- ko
- fr
- ja
- pt
- tr
- pl
- ca
- nl
- ar
- sv
- it
- id
- hi
- fi
- vi
- he
- uk
- el
- ms
- cs
- ro
- da
- hu
- ta
- "no"
- th
- ur
- hr
- bg
- lt
- la
- mi
- ml
- cy
- sk
- te
- fa
- lv
- bn
- sr
- az
- sl
- kn
- et
- mk
- br
- eu
- is
- hy
- ne
- mn
- bs
- kk
- sq
- sw
- gl
- mr
- pa
- si
- km
- sn
- yo
- so
- af
- oc
- ka
- be
- tg
- sd
- gu
- am
- yi
- lo
- uz
- fo
- ht
- ps
- tk
- nn
- mt
- sa
- lb
- my
- bo
- tl
- mg
- as
- tt
- haw
- ln
- ha
- ba
- jw
- su
- yue
tags:
- automatic-speech-recognition
- openvino
- whisper
- int8
- quantized
base_model: openai/whisper-large-v3-turbo
library_name: openvino
pipeline_tag: automatic-speech-recognition
---

# ov-whisper_large_v3_turbo-int8-2026.0.0

[openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo) exported to OpenVINO IR with **INT8 asymmetric weight compression** (exported with `--group-size 128`). The model layout targets `openvino_genai.WhisperPipeline` and includes a stateful decoder (`-with-past`), the tokenizer, and the detokenizer.

Whisper large-v3-turbo is a pruned and fine-tuned version of Whisper large-v3: the decoder is reduced from 32 to 4 layers, which makes it reportedly about 6x faster at the cost of a minor loss in accuracy. It supports 100 languages (the 99 of earlier Whisper releases plus Cantonese, `yue`), as listed in the metadata above.
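To make "INT8 asymmetric" concrete, here is a minimal NumPy sketch of asymmetric 8-bit weight quantization. It is an illustration under simplified assumptions, not NNCF's actual implementation, and `quantize_int8_asym` / `dequantize` are hypothetical names. Each group of weights gets its own scale and zero point that map `[min, max]` (widened to include 0) onto the unsigned range `[0, 255]`; `group_size=-1` gives one pair per output channel, while a positive group size gives one pair per run of that many input weights.

```python
import numpy as np


def quantize_int8_asym(w: np.ndarray, group_size: int = -1):
    """Asymmetric 8-bit quantization of a 2-D weight matrix shaped [out, in].

    group_size=-1: one (scale, zero_point) pair per output channel (row).
    group_size>0: each row is split into groups of that many weights.
    """
    out_ch, in_ch = w.shape
    g = in_ch if group_size == -1 else group_size
    if in_ch % g != 0:
        raise ValueError("input dimension must be divisible by the group size")
    groups = w.reshape(out_ch, in_ch // g, g)

    # Widen each group's range to include 0 so the zero point stays in [0, 255].
    w_min = np.minimum(groups.min(axis=-1, keepdims=True), 0.0)
    w_max = np.maximum(groups.max(axis=-1, keepdims=True), 0.0)

    # Map [w_min, w_max] onto the 256 unsigned levels [0, 255].
    scale = (w_max - w_min) / 255.0
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero groups
    zero_point = np.clip(np.round(-w_min / scale), 0, 255)

    q = np.clip(np.round(groups / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale.astype(np.float32), zero_point.astype(np.uint8)


def dequantize(q, scale, zero_point, shape):
    """Reconstruct w ~= (q - zero_point) * scale in float32."""
    w = (q.astype(np.float32) - zero_point.astype(np.float32)) * scale
    return w.reshape(shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((4, 256)).astype(np.float32)
    for gs in (-1, 128):
        q, s, zp = quantize_int8_asym(w, gs)
        err = np.abs(w - dequantize(q, s, zp, w.shape)).max()
        # Per-channel: scale shape (4, 1, 1); group size 128: (4, 2, 1).
        print(f"group_size={gs}: scale shape {s.shape}, max abs error {err:.5f}")
```

The per-element reconstruction error is bounded by half of the group's scale, so smaller groups (tighter ranges) trade extra scale/zero-point storage for lower error.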
## Quantization details

| Parameter | Value |
|-----------|-------|
| Source model | `openai/whisper-large-v3-turbo` |
| Weight format | INT8 asymmetric (per-channel) |
| Group size (export flag) | 128 |
| Encoder weight layers compressed | 194 / 194 (100%) |
| Decoder weight layers compressed | 42 / 42 (100%) |
| Task | `automatic-speech-recognition-with-past` |

## Toolchain

| Package | Version |
|---------|---------|
| Python | 3.11.9 |
| openvino | 2026.0.0 |
| openvino-genai | 2026.0.0.0 |
| openvino-tokenizers | 2026.0.0.0 |
| optimum-intel | 1.27.0 |
| optimum | 2.1.0 |
| nncf | 3.0.0 |
| transformers | 4.57.6 |
| torch | 2.11.0 |

## Usage

```python
import librosa
import numpy as np
import openvino_genai as ov_genai

# Load the exported model directory on the target device ("CPU", "GPU" or "NPU")
pipe = ov_genai.WhisperPipeline("ov-whisper_large_v3_turbo-int8-2026.0.0", "CPU")

# Whisper expects 16 kHz mono float32 audio; librosa resamples and downmixes on load
samples, _ = librosa.load("audio.wav", sr=16000, mono=True)
samples = np.asarray(samples, dtype=np.float32)

result = pipe.generate(samples)
print(result)  # str(result) is the transcribed text
```

Supported devices are `CPU`, `GPU`, and `NPU`, tested on an Intel Core Ultra 7 255H (CPU), its integrated Intel Arc 140T (GPU), and Intel AI Boost (NPU).
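If you would rather avoid the librosa dependency, the sketch below reads a WAV file with Python's standard `wave` module and converts it to the 16 kHz mono float32 array that the usage example passes to `pipe.generate`. It assumes 16-bit PCM input and resamples by plain linear interpolation without an anti-aliasing filter, so it is only a rough substitute for a proper resampler such as `librosa.load`; `load_wav_16k_mono` is a hypothetical helper name.

```python
import wave

import numpy as np

TARGET_SR = 16_000  # Whisper expects 16 kHz mono float32 audio


def load_wav_16k_mono(path: str) -> np.ndarray:
    """Read a 16-bit PCM WAV file; return 16 kHz mono float32 samples in [-1, 1)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        n_channels = wf.getnchannels()
        sr = wf.getframerate()
        frames = wf.readframes(wf.getnframes())

    # Interleaved little-endian int16 -> float32 scaled to [-1, 1)
    pcm = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    # Downmix to mono by averaging the channels of each frame
    audio = pcm.reshape(-1, n_channels).mean(axis=1)

    if sr != TARGET_SR:
        # Naive linear-interpolation resampling (no anti-aliasing filter)
        n_out = int(round(len(audio) * TARGET_SR / sr))
        t_in = np.arange(len(audio)) / sr
        t_out = np.arange(n_out) / TARGET_SR
        audio = np.interp(t_out, t_in, audio)

    return np.asarray(audio, dtype=np.float32)
```

With this helper, `samples = load_wav_16k_mono("audio.wav")` can replace the two librosa lines in the usage example.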
## Reproduce the export

```bash
pip install -r requirements.txt
python export_whisper_int8_ov.py \
  --model openai/whisper-large-v3-turbo \
  --output ov-whisper_large_v3_turbo-int8-2026.0.0 \
  --cache-dir ./cache_dir
```

Or, equivalently, with `optimum-cli` directly:

```bash
optimum-cli export openvino \
  -m openai/whisper-large-v3-turbo \
  --task automatic-speech-recognition-with-past \
  --weight-format int8 \
  --group-size 128 \
  ov-whisper_large_v3_turbo-int8-2026.0.0
```

## Validate

```bash
python validate_whisper_genai.py ov-whisper_large_v3_turbo-int8-2026.0.0 --device CPU
```

## Files

- `openvino_encoder_model.bin/.xml` -- Whisper encoder (INT8)
- `openvino_decoder_model.bin/.xml` -- Whisper decoder with past/beam_idx (INT8)
- `openvino_tokenizer.bin/.xml` -- Tokenizer
- `openvino_detokenizer.bin/.xml` -- Detokenizer
- `config.json`, `generation_config.json` -- Model configuration
- `tokenizer.json`, `vocab.json`, `merges.txt` -- Tokenizer data
- `export_whisper_int8_ov.py` -- Export script used to produce this model
- `validate_whisper_genai.py` -- Smoke-test script
- `requirements.txt` -- Pinned Python dependencies
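Before loading an exported directory, it can help to confirm that every model and tokenizer artifact from the list above is present. The sketch below is a hypothetical helper (`missing_files` and `REQUIRED_FILES` are illustrative names, not part of the bundled scripts); it only checks that the files exist, not that they are valid IR or match this export.

```python
from pathlib import Path

# Model and tokenizer artifacts from the "Files" list of this model card
# (the helper scripts and requirements.txt are not needed at inference time).
REQUIRED_FILES = [
    "openvino_encoder_model.xml", "openvino_encoder_model.bin",
    "openvino_decoder_model.xml", "openvino_decoder_model.bin",
    "openvino_tokenizer.xml", "openvino_tokenizer.bin",
    "openvino_detokenizer.xml", "openvino_detokenizer.bin",
    "config.json", "generation_config.json",
    "tokenizer.json", "vocab.json", "merges.txt",
]


def missing_files(model_dir: str) -> list[str]:
    """Return the required files absent from model_dir (empty list if complete)."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("ov-whisper_large_v3_turbo-int8-2026.0.0")
    if missing:
        raise SystemExit(f"incomplete export, missing: {missing}")
    print("all expected files present")
```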