Whisper large-v3-turbo: Apple Core AI export (autoregressive)
A pre-converted .aimodel from Apple's official
coreai-models Whisper recipe, packaged so it
actually transcribes on the stock Core AI runtime (no engine patch).
Whisper large-v3-turbo (809M parameters) is OpenAI's multilingual ASR encoder-decoder.
What's different from the stock recipe (and why)
Apple's models/whisper/export.py traces the model with decoder_input_ids of shape
[1, 1]: a single decode step, no KV cache. That graph can't be driven
autoregressively: with one token and no cache, every step is "position 0" and loses all
prior context, so it emits nothing useful.
This bundle is the same recipe with one change: the decoder is traced at a fixed
128-token window (decoder_input_ids: [1, 128]). You right-pad the decoder buffer to 128 and
read the logits at the real last position. Because the self-attention is causal, the real
token at position k never attends to the padding that follows it, so the read is exact.
And because the shape is constant, MPSGraph compiles once; a dynamic-length export instead
recompiles every step (~15 s/token, versus ~0.18 s/token here). Everything else is the
upstream recipe, and it runs on the stock runtime.
# stock single-step recipe: uv run models/whisper/export.py
# this bundle (fixed 128-token decode): see _export_whisper_fixed.py in the zoo conversion/ dir
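The padding argument above can be checked on a toy model. The following is a minimal NumPy sketch, not the Whisper decoder: `causal_self_attention` is a hypothetical single-head attention with identity projections, and the sizes (`DIM`, `n_real`) are arbitrary. It only illustrates that, under a causal mask, the output at the last real position is the same whether or not arbitrary values sit in the slots to its right.

```python
import numpy as np

def causal_self_attention(x: np.ndarray) -> np.ndarray:
    """Toy single-head causal self-attention with identity Q/K/V projections."""
    seq_len, dim = x.shape
    scores = x @ x.T / np.sqrt(dim)
    # Causal mask: row i may only attend to columns <= i.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
WINDOW, DIM, n_real = 128, 16, 5  # DIM and n_real are illustrative only

real = rng.standard_normal((n_real, DIM))
padded = np.zeros((WINDOW, DIM))
padded[:n_real] = real
# Put arbitrary values in the padding slots to show they are never read.
padded[n_real:] = rng.standard_normal((WINDOW - n_real, DIM))

out_short = causal_self_attention(real)
out_padded = causal_self_attention(padded)

k = n_real - 1  # the real last position
max_diff = np.abs(out_short[k] - out_padded[k]).max()
print(f"max difference at position {k}: {max_diff:.2e}")
```

In this toy the two results agree up to floating-point rounding; the same reasoning is what lets the fixed-window graph be read at index k.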
Bundle
whisper-large-v3-turbo_float16_fixed128.aimodel/ main.mlirb + main.hash + metadata.json
tokenizer/ HF Whisper tokenizer (detokenize output ids)
mel_filters_128.npy [201, 128] mel filterbank for the audio frontend
preprocessor_config.json n_fft=400, hop=160, 128 mels, 16 kHz
| File | SHA-256 |
|---|---|
| …_fixed128.aimodel/main.mlirb (~1.5 GB) | f5824a2e01906ad72bb3241573e75a41ecbf89c2ebe5fb8b87716752cf144881 |
| …_fixed128.aimodel/main.hash | 2bd169f0ca2812f7a6321f973a7bf88ef0ff37bc823265efb13e1beb12a7bf2c |
| mel_filters_128.npy | 4eb6b0fe7aa985fa2ce80d81260d8b5b30ff908d2808a5d637683228c648db6f |
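One way to check a downloaded file against the digests listed above is a small `hashlib` helper. This is a sketch using only the Python standard library; `sha256_of` and `verify` are hypothetical helper names, and the relative path in the usage comment assumes you run from the bundle directory.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so a ~1.5 GB file never sits in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def verify(path: Path, expected_hex: str) -> bool:
    """Return True when the file's digest matches the published one."""
    return sha256_of(path) == expected_hex.strip().lower()

# Usage (path is an assumption about where the bundle was unpacked):
# ok = verify(
#     Path("mel_filters_128.npy"),
#     "4eb6b0fe7aa985fa2ce80d81260d8b5b30ff908d2808a5d637683228c648db6f",
# )
```

The other rows can be checked the same way by passing their path and digest.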
Measured (M4 Max, GPU)
Greedy decode, English clip, vs the HF PyTorch reference (generate, greedy):
| Metric | Value |
|---|---|
| Transcript | token-for-token identical to PyTorch greedy |
| First step (compile + warmup) | 0.68 s |
| Per token (steady state) | 0.18 s |
The fixed window caps the decode of a single 30 s segment at 128 tokens, prompt included (enough for a 30 s window; chunk longer audio into 30 s segments).
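Chunking longer audio can be as simple as slicing the waveform. The sketch below assumes mono audio already resampled to 16 kHz and uses a naive non-overlapping split; `split_into_chunks` is a hypothetical helper, and a real pipeline may prefer to cut at silences so words are not split across segments.

```python
import numpy as np

SAMPLE_RATE = 16_000
CHUNK_SECONDS = 30
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples per segment

def split_into_chunks(audio: np.ndarray) -> list[np.ndarray]:
    """Split mono 16 kHz audio into 30 s segments, zero-padding the last one."""
    if audio.ndim != 1:
        raise ValueError("expected mono audio with shape [num_samples]")
    chunks = []
    for start in range(0, max(len(audio), 1), CHUNK_SAMPLES):
        piece = audio[start:start + CHUNK_SAMPLES]
        padded = np.zeros(CHUNK_SAMPLES, dtype=np.float32)
        padded[:len(piece)] = piece
        chunks.append(padded)
    return chunks
```

Each returned segment can then go through the frontend and decode loop independently, with the transcripts concatenated afterwards.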
How to run it (the decode loop)
The graph takes input_features [1, 128, 3000] (log-mel) and decoder_input_ids [1, 128],
and returns logits [1, 128, 51866].
- Audio → log-mel (mel_filters_128.npy, n_fft 400 / hop 160): STFT → power → mel filterbank → log10, clamp to max-8, (x+4)/4; pad/trim to 3000 frames.
- Prompt: [<|startoftranscript|>, <|en|>, <|transcribe|>, <|notimestamps|>] (50258, 50259, 50360, 50364).
- Loop: pad the prompt to 128, run, take argmax(logits[0, k]) at the real last index k, append, repeat until <|endoftext|> (50257).
- Detokenize the generated ids with the bundled tokenizer.
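The audio frontend step can be sketched in plain NumPy. This is an illustration under stated assumptions rather than the bundle's own code: the periodic Hann window, the reflect-padded (centered) STFT, the dropped final frame, and the 1e-10 floor before log10 follow the usual Whisper frontend and are not spelled out in this card; `log_mel` is a hypothetical helper name.

```python
import numpy as np

N_FFT, HOP, N_FRAMES, N_SAMPLES = 400, 160, 3000, 480_000

def log_mel(audio: np.ndarray, mel_filters: np.ndarray) -> np.ndarray:
    """audio: mono 16 kHz float array; mel_filters: [201, 128]. Returns [1, 128, 3000]."""
    # Pad or trim the waveform to exactly 30 s (480,000 samples).
    audio = np.pad(audio[:N_SAMPLES], (0, max(0, N_SAMPLES - len(audio))))
    # Centered STFT (assumption): reflect-pad by n_fft // 2, periodic Hann window.
    padded = np.pad(audio, N_FFT // 2, mode="reflect")
    window = np.hanning(N_FFT + 1)[:-1]
    frames = np.lib.stride_tricks.sliding_window_view(padded, N_FFT)[::HOP]
    spectrum = np.fft.rfft(frames * window, axis=-1)   # [3001, 201]
    power = np.abs(spectrum[:N_FRAMES]) ** 2           # drop last frame -> [3000, 201]
    mel = power @ mel_filters                          # [3000, 128]
    log_spec = np.log10(np.maximum(mel, 1e-10))        # floor is an assumption
    log_spec = np.maximum(log_spec, log_spec.max() - 8.0)  # clamp to max - 8
    log_spec = (log_spec + 4.0) / 4.0
    return log_spec.T[np.newaxis].astype(np.float32)   # [1, 128, 3000]

# Usage (file name from the bundle listing; `waveform` is your own 16 kHz mono array):
# mel_filters = np.load("mel_filters_128.npy")
# input_features = log_mel(waveform, mel_filters)
```

If exact parity with the PyTorch reference matters, compare this output against the Hugging Face feature extractor on the same clip before relying on it.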
Runs on macOS + iOS via coreai.runtime / the Swift CoreAI framework. See the
CoreAITranscribe sample app.
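The decode loop described above can be sketched independently of the runtime. In this sketch `run_graph` is a hypothetical callable standing in for whatever invokes the compiled model (it is not a coreai.runtime API); it takes input_features [1, 128, 3000] and decoder_input_ids [1, 128] and returns logits [1, 128, 51866]. The padding id and the int32 dtype are assumptions: because attention is causal, the padding value should not affect the logits read at index k.

```python
from typing import Callable
import numpy as np

WINDOW = 128
PROMPT = [50258, 50259, 50360, 50364]  # sot, en, transcribe, notimestamps
EOT = 50257                            # <|endoftext|>

def greedy_decode(
    run_graph: Callable[[np.ndarray, np.ndarray], np.ndarray],
    input_features: np.ndarray,
    pad_id: int = EOT,  # assumption: any valid id works as right padding
) -> list[int]:
    """Greedy decode with a fixed 128-token buffer; returns generated ids only."""
    tokens = list(PROMPT)
    while len(tokens) < WINDOW:
        # Right-pad the current sequence to the fixed window.
        buffer = np.full((1, WINDOW), pad_id, dtype=np.int32)
        buffer[0, :len(tokens)] = tokens
        logits = run_graph(input_features, buffer)
        k = len(tokens) - 1                  # real last index
        next_id = int(np.argmax(logits[0, k]))
        if next_id == EOT:
            break
        tokens.append(next_id)
    return tokens[len(PROMPT):]
```

The returned ids can be passed to the bundled tokenizer for detokenization. Since the prompt takes 4 slots, this loop yields at most 124 generated tokens per segment.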
Export environment
- macOS 27.0 beta · coreai-core 1.0.0b1 · coreai-torch 0.4.0 · transformers 4.57
- recipe: Apple models/whisper/export.py + a fixed 128-token decoder trace
License
Whisper is Apache-2.0 (OpenAI). This bundle is a format conversion and inherits that license.
Maintained alongside coreai-model-zoo (official/).