---
license: apache-2.0
language:
  - en
  - zh
  - de
  - es
  - ru
  - ko
  - fr
  - ja
  - pt
  - tr
  - pl
  - ca
  - nl
  - ar
  - sv
  - it
  - id
  - hi
  - fi
  - vi
  - he
  - uk
  - el
  - ms
  - cs
  - ro
  - da
  - hu
  - ta
  - "no"
  - th
  - ur
  - hr
  - bg
  - lt
  - la
  - mi
  - ml
  - cy
  - sk
  - te
  - fa
  - lv
  - bn
  - sr
  - az
  - sl
  - kn
  - et
  - mk
  - br
  - eu
  - is
  - hy
  - ne
  - mn
  - bs
  - kk
  - sq
  - sw
  - gl
  - mr
  - pa
  - si
  - km
  - sn
  - yo
  - so
  - af
  - oc
  - ka
  - be
  - tg
  - sd
  - gu
  - am
  - yi
  - lo
  - uz
  - fo
  - ht
  - ps
  - tk
  - nn
  - mt
  - sa
  - lb
  - my
  - bo
  - tl
  - mg
  - as
  - tt
  - haw
  - ln
  - ha
  - ba
  - jw
  - su
  - yue
tags:
  - automatic-speech-recognition
  - openvino
  - whisper
  - int8
  - quantized
base_model: openai/whisper-large-v3-turbo
library_name: openvino
pipeline_tag: automatic-speech-recognition
---

# ov-whisper_large_v3_turbo-int8-2026.0.0

[openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo) exported to OpenVINO IR with **INT8 asymmetric weight compression** (group size 128).

The model layout targets `openvino_genai.WhisperPipeline` and includes a stateful decoder (`-with-past`), a tokenizer, and a detokenizer.

Whisper large-v3-turbo is a pruned and fine-tuned version of Whisper large-v3: the decoder is reduced from 32 layers to 4, which makes it much faster at the cost of a minor quality degradation. It supports 99 languages.

## Quantization details

| Parameter | Value |
|-----------|-------|
| Source model | `openai/whisper-large-v3-turbo` |
| Weight format | INT8 asymmetric (per-channel) |
| Group size | 128 |
| Encoder layers compressed | 194 / 194 (100%) |
| Decoder layers compressed | 42 / 42 (100%) |
| Task | `automatic-speech-recognition-with-past` |
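
To make the "INT8 asymmetric (per-channel)" entry concrete, here is a minimal sketch of asymmetric 8-bit quantization applied to a single weight row (one output channel). It illustrates the general scheme only and is not NNCF's actual implementation; the function names `quantize_channel` and `dequantize_channel` are hypothetical.

```python
from typing import List, Tuple


def quantize_channel(weights: List[float]) -> Tuple[List[int], float, int]:
    """Asymmetric uint8 quantization of one weight row (illustrative only)."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    # One scale and one zero point per channel; fall back to 1.0 for all-zero rows.
    scale = (w_max - w_min) / 255.0 or 1.0
    zero_point = max(0, min(255, round(-w_min / scale)))
    # Map each weight to an integer in [0, 255].
    q = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize_channel(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate float weights from the stored integers."""
    return [(v - zero_point) * scale for v in q]


row = [-0.5, 0.0, 0.25, 1.0]
q_row, row_scale, row_zp = quantize_channel(row)
restored = dequantize_channel(q_row, row_scale, row_zp)
```

Each channel stores its integers plus one scale and one zero point, so the reconstruction error per weight is on the order of half a quantization step.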

## Toolchain

| Package | Version |
|---------|---------|
| Python | 3.11.9 |
| openvino | 2026.0.0 |
| openvino-genai | 2026.0.0.0 |
| openvino-tokenizers | 2026.0.0.0 |
| optimum-intel | 1.27.0 |
| optimum | 2.1.0 |
| nncf | 3.0.0 |
| transformers | 4.57.6 |
| torch | 2.11.0 |

## Usage

```python
import librosa
import numpy as np
import openvino_genai as ov_genai

# Load the exported model directory on the chosen device.
pipe = ov_genai.WhisperPipeline("ov-whisper_large_v3_turbo-int8-2026.0.0", "CPU")

# The pipeline expects 16 kHz mono float32 samples; librosa resamples and downmixes on load.
samples, _ = librosa.load("audio.wav", sr=16000, mono=True)
samples = np.asarray(samples, dtype=np.float32)

result = pipe.generate(samples)
# `texts` holds the decoded transcriptions; print the first one.
print(result.texts[0])
```

Supported devices: `CPU`, `GPU`, `NPU` (tested on an Intel Core Ultra 7 255H with its integrated Arc 140T GPU and AI Boost NPU).
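
If `librosa` is not available and the input is already a 16-bit PCM WAV file at 16 kHz, the audio can be prepared with the standard library alone. The following is a minimal sketch under those assumptions; `load_wav_16k_mono` is a hypothetical helper name, and it does no resampling.

```python
import array
import sys
import wave
from typing import List


def load_wav_16k_mono(path: str) -> List[float]:
    """Read a 16-bit PCM, 16 kHz WAV file and return mono floats in [-1, 1)."""
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            raise ValueError("expected 16 kHz audio; resample first (e.g. with librosa)")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())

    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(frames)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian

    # Average the interleaved channels down to mono and scale to [-1, 1).
    return [
        sum(pcm[i:i + channels]) / (channels * 32768.0)
        for i in range(0, len(pcm), channels)
    ]
```

The returned list of floats can be passed to `pipe.generate` in place of `samples`.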

## Reproduce the export

```bash
pip install -r requirements.txt
python export_whisper_int8_ov.py \
    --model openai/whisper-large-v3-turbo \
    --output ov-whisper_large_v3_turbo-int8-2026.0.0 \
    --cache-dir ./cache_dir
```

Or, equivalently, call `optimum-cli` directly:

```bash
optimum-cli export openvino \
    -m openai/whisper-large-v3-turbo \
    --task automatic-speech-recognition-with-past \
    --weight-format int8 \
    --group-size 128 \
    ov-whisper_large_v3_turbo-int8-2026.0.0
```
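
For scripted or repeated exports, the same command can be assembled and launched from Python. The sketch below only wraps the `optimum-cli` invocation shown above; `build_export_command` and `run_export` are hypothetical helper names, and this is not necessarily how `export_whisper_int8_ov.py` is implemented.

```python
import subprocess
from typing import List


def build_export_command(
    model_id: str,
    output_dir: str,
    weight_format: str = "int8",
    group_size: int = 128,
) -> List[str]:
    """Assemble the optimum-cli argument list used for the export."""
    return [
        "optimum-cli", "export", "openvino",
        "-m", model_id,
        "--task", "automatic-speech-recognition-with-past",
        "--weight-format", weight_format,
        "--group-size", str(group_size),
        output_dir,
    ]


def run_export(model_id: str, output_dir: str) -> None:
    """Run the export and raise CalledProcessError if optimum-cli fails."""
    subprocess.run(build_export_command(model_id, output_dir), check=True)
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` makes a failed export stop the calling script.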

## Validate

```bash
python validate_whisper_genai.py ov-whisper_large_v3_turbo-int8-2026.0.0 --device CPU
```

## Files

- `openvino_encoder_model.bin/.xml` -- Whisper encoder (INT8)
- `openvino_decoder_model.bin/.xml` -- Whisper decoder with past/beam_idx (INT8)
- `openvino_tokenizer.bin/.xml` -- Tokenizer
- `openvino_detokenizer.bin/.xml` -- Detokenizer
- `config.json`, `generation_config.json` -- Model configuration
- `tokenizer.json`, `vocab.json`, `merges.txt` -- Tokenizer data
- `export_whisper_int8_ov.py` -- Export script used to produce this model
- `validate_whisper_genai.py` -- Smoke-test script
- `requirements.txt` -- Pinned Python dependencies
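
Before loading the pipeline, it can help to confirm that a downloaded copy contains the model files listed above. This is a minimal sketch; `EXPECTED_FILES` and `missing_model_files` are hypothetical names, and the list simply mirrors this section rather than a documented requirement of `WhisperPipeline`.

```python
from pathlib import Path
from typing import List

# Model files from the list above (scripts and requirements.txt are omitted).
EXPECTED_FILES = [
    "openvino_encoder_model.xml", "openvino_encoder_model.bin",
    "openvino_decoder_model.xml", "openvino_decoder_model.bin",
    "openvino_tokenizer.xml", "openvino_tokenizer.bin",
    "openvino_detokenizer.xml", "openvino_detokenizer.bin",
    "config.json", "generation_config.json",
    "tokenizer.json", "vocab.json", "merges.txt",
]


def missing_model_files(model_dir: str) -> List[str]:
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty result means every listed file is present, for example `missing_model_files("ov-whisper_large_v3_turbo-int8-2026.0.0")`.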