Whisper large-v3-turbo — Apple Core AI export (autoregressive)

A pre-converted .aimodel from Apple's official coreai-models Whisper recipe, packaged so it actually transcribes on the stock Core AI runtime (no engine patch).

Whisper large-v3-turbo (809 M parameters) is OpenAI's multilingual encoder-decoder model for automatic speech recognition (ASR).

What's different from the stock recipe (and why)

Apple's models/whisper/export.py traces the model with decoder_input_ids of shape [1, 1]: a single decode step with no KV cache. That graph cannot be driven autoregressively. With one token and no cache, every step is computed as position 0 and has no access to the tokens generated so far, so the output is not a usable transcript.

This bundle is the same recipe with one change: the decoder is traced at a fixed 128-token window (decoder_input_ids: [1, 128]). You pad the decoder buffer to 128 tokens and read the logits at the last real position. Because decoder self-attention is causal, the real token at position k never attends to the padding that follows it, so the logits read at k are exact. Because the input shape is constant, MPSGraph compiles the graph once; a dynamic-length export instead recompiles at every step (~15 s/token, versus ~0.18 s/token here). Everything else is the upstream recipe, and it runs on the stock runtime.
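The padding argument can be checked on a toy model. The sketch below is not the Whisper decoder: it is a single-head causal self-attention layer in NumPy with random weights (all names and sizes are illustrative assumptions), used only to show that the outputs at the real positions do not depend on what is placed in the padded tail.

```python
import numpy as np

def causal_self_attention(x, wq, wk, wv):
    """Single-head causal self-attention over x of shape [T, D] (toy layer)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.T) / np.sqrt(x.shape[1])
    # Causal mask: position i may only attend to positions j <= i.
    future = np.triu(np.ones((x.shape[0], x.shape[0]), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, D, n_real = 128, 16, 5            # window size, toy width, number of real tokens
wq, wk, wv = (rng.standard_normal((D, D)) for _ in range(3))

real = rng.standard_normal((n_real, D))
pad_a = rng.standard_normal((T - n_real, D))   # one choice of padding
pad_b = rng.standard_normal((T - n_real, D))   # a different choice of padding

out_a = causal_self_attention(np.vstack([real, pad_a]), wq, wk, wv)
out_b = causal_self_attention(np.vstack([real, pad_b]), wq, wk, wv)

# Compare the rows for the real tokens and the rows for the padding separately.
real_rows_match = np.allclose(out_a[:n_real], out_b[:n_real])
pad_rows_match = np.allclose(out_a[n_real:], out_b[n_real:])
print(real_rows_match, pad_rows_match)
```

Here real_rows_match is expected to be True while the padded rows differ, which is why only the logits at the last real index are read.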

# stock single-step recipe:           uv run models/whisper/export.py
# this bundle (fixed 128-token decode): see _export_whisper_fixed.py in the zoo conversion/ dir

Bundle

whisper-large-v3-turbo_float16_fixed128.aimodel/   main.mlirb + main.hash + metadata.json
tokenizer/                                         HF Whisper tokenizer (detokenize output ids)
mel_filters_128.npy                                [201, 128] mel filterbank for the audio frontend
preprocessor_config.json                           n_fft=400, hop=160, 128 mels, 16 kHz

File	SHA-256
`…_fixed128.aimodel/main.mlirb` (~1.5 GB)	`f5824a2e01906ad72bb3241573e75a41ecbf89c2ebe5fb8b87716752cf144881`
`…_fixed128.aimodel/main.hash`	`2bd169f0ca2812f7a6321f973a7bf88ef0ff37bc823265efb13e1beb12a7bf2c`
`mel_filters_128.npy`	`4eb6b0fe7aa985fa2ce80d81260d8b5b30ff908d2808a5d637683228c648db6f`
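One way to check a downloaded file against the digests above is to stream it through hashlib. This is a minimal sketch; the helper names sha256_of and verify are illustrative, and the path assumes mel_filters_128.npy sits in the working directory.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so a ~1.5 GB file is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_hex):
    """Return True when the file's digest matches the published value."""
    return sha256_of(path) == expected_hex.lower()

# Digest published in the table above for the mel filterbank.
MEL_FILTERS_SHA256 = "4eb6b0fe7aa985fa2ce80d81260d8b5b30ff908d2808a5d637683228c648db6f"

if __name__ == "__main__":
    path = Path("mel_filters_128.npy")   # assumed location; adjust as needed
    if path.exists():
        print(path, "OK" if verify(path, MEL_FILTERS_SHA256) else "MISMATCH")
```

The same verify call works for main.mlirb and main.hash with their digests from the table.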

Measured (M4 Max, GPU)

Greedy decode, English clip, vs the HF PyTorch reference (generate, greedy):

Metric	Value
Transcript	token-for-token identical to PyTorch greedy
First step (compile + warmup)	0.68 s
Per token (steady state)	0.18 s

The fixed window caps the decoder sequence for a single 30 s segment at 128 positions, of which the 4-token prompt occupies the first four. That is normally enough for 30 s of speech; chunk longer audio into 30 s segments.
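Chunking can be as simple as cutting the waveform into consecutive 30 s pieces. The sketch below assumes a mono float waveform already resampled to 16 kHz; split_into_chunks is an illustrative helper, not part of the bundle, and a plain non-overlapping cut like this can split a word at a segment boundary.

```python
import numpy as np

SAMPLE_RATE = 16_000                           # from preprocessor_config.json
CHUNK_SECONDS = 30
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS    # 480_000 samples per segment

def split_into_chunks(audio):
    """Split a mono waveform into consecutive 30 s segments, zero-padding the last."""
    audio = np.asarray(audio, dtype=np.float32)
    n_chunks = max(1, -(-len(audio) // CHUNK_SAMPLES))   # ceiling division
    padded = np.zeros(n_chunks * CHUNK_SAMPLES, dtype=np.float32)
    padded[: len(audio)] = audio
    return padded.reshape(n_chunks, CHUNK_SAMPLES)

chunks = split_into_chunks(np.zeros(SAMPLE_RATE * 70))   # 70 s of silence
print(chunks.shape)   # (3, 480000)
```

Each row can then be converted to log-mel features and decoded independently, with the transcripts concatenated afterwards.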

How to run it (the decode loop)

The graph takes input_features [1, 128, 3000] (log-mel) + decoder_input_ids [1, 128] → logits [1, 128, 51866].

1. Audio → log-mel (mel_filters_128.npy, n_fft 400 / hop 160, 16 kHz): STFT → power → mel filterbank → log10, clamp to (max - 8), then (x + 4) / 4; pad or trim to 3000 frames.
2. Prompt: [<|startoftranscript|>, <|en|>, <|transcribe|>, <|notimestamps|>] (50258, 50259, 50360, 50364).
3. Loop: pad the token buffer to 128 (the pad value does not matter, because causal attention hides it from position k), run the graph, take argmax(logits[0, k]) at the last real index k, append the result, and repeat until <|endoftext|> (50257) or the window is full.
4. Detokenize the generated ids with the bundled tokenizer.
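The steps above can be sketched in NumPy as follows. This is a sketch under stated assumptions, not the reference implementation: the frontend follows the usual Whisper recipe (periodic Hann window, reflect padding, last STFT frame dropped), run_model is a hypothetical callable standing in for whatever runtime call executes the .aimodel, and the int32 id dtype and the choice of <|endoftext|> as pad value are assumptions.

```python
import numpy as np

PROMPT = [50258, 50259, 50360, 50364]   # sot, en, transcribe, notimestamps
EOT = 50257                             # <|endoftext|>
WINDOW = 128                            # fixed decoder length of this bundle

def log_mel(audio, mel_filters, n_fft=400, hop=160, n_frames=3000):
    """16 kHz mono waveform -> input_features of shape [1, 128, 3000]."""
    audio = np.asarray(audio, dtype=np.float32)
    n_samples = n_frames * hop                       # 480_000 samples = 30 s
    audio = np.pad(audio[:n_samples], (0, max(0, n_samples - len(audio))))
    padded = np.pad(audio, n_fft // 2, mode="reflect")
    window = np.hanning(n_fft + 1)[:-1]              # periodic Hann window
    frames = np.lib.stride_tricks.sliding_window_view(padded, n_fft)[::hop]
    spec = np.fft.rfft(frames * window, axis=-1)     # [3001, 201]
    power = np.abs(spec[:-1]) ** 2                   # drop last frame -> [3000, 201]
    mel = power @ mel_filters                        # filters are [201, 128]
    log_spec = np.log10(np.maximum(mel, 1e-10))
    log_spec = np.maximum(log_spec, log_spec.max() - 8.0)   # clamp to (max - 8)
    log_spec = (log_spec + 4.0) / 4.0
    return log_spec.T[None].astype(np.float32)

def greedy_decode(run_model, input_features, prompt=PROMPT):
    """Greedy loop over the fixed window.

    run_model is hypothetical: (features [1,128,3000], ids [1,128]) -> logits [1,128,vocab].
    """
    ids = list(prompt)
    while len(ids) < WINDOW:                         # stop when the window is full
        buf = np.full((1, WINDOW), EOT, dtype=np.int32)   # pad value is arbitrary
        buf[0, : len(ids)] = ids
        logits = run_model(input_features, buf)
        k = len(ids) - 1                             # last real position
        next_id = int(np.argmax(logits[0, k]))
        if next_id == EOT:
            break
        ids.append(next_id)
    return ids[len(prompt):]                         # generated ids only

# Usage (mel_filters loaded with np.load("mel_filters_128.npy")):
#   features = log_mel(waveform, mel_filters)
#   token_ids = greedy_decode(run_model, features)
```

The returned ids can then be passed to the bundled tokenizer for detokenization. Note that every step re-runs the graph on the full 128-token buffer, which matches the constant-shape design described earlier.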

Runs on macOS + iOS via coreai.runtime / the Swift CoreAI framework. See the CoreAITranscribe sample app.

Export environment

macOS 27.0 beta · coreai-core 1.0.0b1 · coreai-torch 0.4.0 · transformers 4.57
recipe: Apple models/whisper/export.py + a fixed 128-token decoder trace

License

Whisper is Apache-2.0 (OpenAI). This bundle is a format conversion and inherits that license.

Maintained alongside coreai-model-zoo (official/).

