Parakeet-TDT-0.6b-v3 Conformer encoder: CoreML / Apple Neural Engine
The Conformer encoder of nvidia/parakeet-tdt-0.6b-v3, converted to CoreML so it runs
on the Apple Neural Engine (ANE). Pair it with the MLX TDT decoder in
mlx-audio-swift: the encoder runs on the ANE
while decoding stays on the GPU/CPU.
This is the encoder only; you still need the MLX model
(beshkenadze/parakeet-tdt-0.6b-v3-mlx-fp16) for the decoder.
Format
- CoreML MLProgram, fp16 weights, fp32 I/O, CPU_AND_NE.
- Fixed input shape: features [1, 128, 1000] (1000 mel frames ≈ 10 s) → encoded [1, 1024, 125]. A fixed shape is required for ANE residency (a dynamic axis drops it to 0%); chunks are padded to 1000 frames and the output is cropped back.
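The pad-and-crop step can be sketched in plain Python as follows. This is only an illustration, not the code used by mlx-audio-swift: the helper names are hypothetical, zero is assumed as the padding value, and the number of valid output frames is assumed to be the input length divided by 8 and rounded up (8 being the 1000 → 125 ratio of the fixed shapes).

```python
MEL_BINS = 128          # feature rows, from the fixed input shape
FIXED_FRAMES = 1000     # fixed encoder input length (frames)
ENCODED_FRAMES = 125    # fixed encoder output length (frames)
SUBSAMPLING = FIXED_FRAMES // ENCODED_FRAMES  # 8x time reduction


def pad_features(features):
    """Right-pad a [128][n] mel matrix to [128][1000].

    Zero padding is an assumption; the real pipeline may pad differently.
    Returns the padded matrix and the original frame count n.
    """
    if len(features) != MEL_BINS:
        raise ValueError("expected 128 mel bins")
    n = len(features[0])
    if n > FIXED_FRAMES:
        raise ValueError("chunk longer than the fixed encoder length")
    padded = [row + [0.0] * (FIXED_FRAMES - n) for row in features]
    return padded, n


def valid_encoded_frames(n_frames):
    """Output frames that correspond to real (unpadded) input, assuming ceil(n / 8)."""
    return min(ENCODED_FRAMES, -(-n_frames // SUBSAMPLING))


def crop_encoded(encoded, n_frames):
    """Drop the output frames of a [1024][125] matrix that stem from padding."""
    keep = valid_encoded_frames(n_frames)
    return [row[:keep] for row in encoded]
```

Under these assumptions, a 640-frame chunk is padded with 360 zero frames, and the first 80 of the 125 output frames are kept.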
Usage (mlx-audio-swift)
mlx-audio-swift-stt \
--model beshkenadze/parakeet-tdt-0.6b-v3-mlx-fp16 \
--audio input.wav --output-path out \
--coreml-encoder parakeet_enc_0.6b_v3.mlpackage \
--chunk-duration 9.95
Keep --chunk-duration ≤ 10 s (the fixed encoder length).
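To see why the limit sits at 10 s, here is a rough frame-count check. The 16 kHz sample rate and 160-sample hop are assumptions, chosen to match the 10 ms per frame implied by "1000 mel frames ≈ 10 s". The exact count also depends on how the feature extractor pads the signal, so a small margin such as 9.95 s is a safe choice. The chunk count assumes non-overlapping chunks, which may not match what the tool does.

```python
SAMPLE_RATE = 16_000   # assumed input sample rate in Hz
HOP_SAMPLES = 160      # assumed hop size: 10 ms per mel frame
FIXED_FRAMES = 1000    # fixed encoder input length


def frames_for(duration_s: float) -> int:
    """Approximate number of mel frames in a chunk of the given duration."""
    samples = round(duration_s * SAMPLE_RATE)
    return samples // HOP_SAMPLES


def fits_encoder(duration_s: float) -> bool:
    """True if the chunk fits into the fixed 1000-frame encoder input."""
    return frames_for(duration_s) <= FIXED_FRAMES


def chunk_count(total_s: float, chunk_s: float) -> int:
    """Number of non-overlapping chunks needed to cover the recording."""
    total = round(total_s * SAMPLE_RATE)
    chunk = round(chunk_s * SAMPLE_RATE)
    return -(-total // chunk)  # ceiling division


if __name__ == "__main__":
    print(frames_for(9.95), fits_encoder(9.95))   # 995 True
    print(fits_encoder(10.5))                     # False
    print(chunk_count(20.8 * 60, 9.95))           # 126
```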
Measured (M1 Max, TED-LIUM 3 talk, 20.8 min)
| | all-MLX | hybrid (this encoder) |
|---|---|---|
| ANE residency | n/a | 100% (0 graph interruptions) |
| WER vs reference | 7.28% | 7.11% (agreement 1.07%) |
| RTF (Swift release) | ~95× | ~131× (~1.38×) |
| GPU power | 17.3 W | 3.0 W (÷5.8) |
The transcript is reproduced ~1:1; CoreML-fp16 is actually closer to fp32 than the
shipped MLX-bf16 encoder. Uses only public MLModel + MLComputeUnits APIs.
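The ratios in the table can be rechecked with a few lines of arithmetic. This sketch assumes the RTF figures are speed factors (audio duration divided by processing time), as the × suffix suggests. The processing times are derived from the rounded table values and are estimates, not measurements.

```python
AUDIO_MINUTES = 20.8     # length of the test recording
RTF_ALL_MLX = 95.0       # approximate speed factor, all-MLX
RTF_HYBRID = 131.0       # approximate speed factor, hybrid
GPU_W_ALL_MLX = 17.3     # GPU power in watts, all-MLX
GPU_W_HYBRID = 3.0       # GPU power in watts, hybrid


def processing_seconds(audio_minutes: float, speed_factor: float) -> float:
    """Estimated processing time, assuming speed_factor = audio time / processing time."""
    return audio_minutes * 60.0 / speed_factor


speedup = RTF_HYBRID / RTF_ALL_MLX            # about 1.38
power_ratio = GPU_W_ALL_MLX / GPU_W_HYBRID    # about 5.8

if __name__ == "__main__":
    print(f"speed-up: {speedup:.2f}x, GPU power ratio: {power_ratio:.1f}")
    print(f"all-MLX: ~{processing_seconds(AUDIO_MINUTES, RTF_ALL_MLX):.1f} s")
    print(f"hybrid:  ~{processing_seconds(AUDIO_MINUTES, RTF_HYBRID):.1f} s")
```

On these numbers, the 20.8-minute talk takes roughly 13 s all-MLX and roughly 9.5 s with the hybrid setup.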
Conversion
Produced with the converter in
mlx-audio-swift tools/coreml-ane/:
NeMo encoder → torch.jit.trace (fixed shape) → coremltools (fp16 MLProgram,
CPU_AND_NE).
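For orientation, the pipeline above could look roughly like this in Python. It is a sketch under stated assumptions, not the converter in tools/coreml-ane/. The wrapper class, the function names, the tensor names "features" and "encoded", and the macOS 13 deployment target are all illustrative. It also assumes the NeMo encoder accepts `audio_signal` and `length` and returns `(encoded, encoded_len)`. Tracing a real checkpoint may need extra steps that are not shown here.

```python
FEATURES_SHAPE = (1, 128, 1000)   # [batch, mel bins, frames], from the Format section
ENCODED_SHAPE = (1, 1024, 125)    # expected encoder output shape


def make_wrapper(encoder):
    """Wrap the encoder so it takes one fixed-shape tensor (hypothetical helper)."""
    import torch

    class FixedShapeEncoder(torch.nn.Module):
        def __init__(self, enc):
            super().__init__()
            self.enc = enc

        def forward(self, features):
            # The length is baked in as a constant because the shape is fixed.
            length = torch.tensor([FEATURES_SHAPE[2]], dtype=torch.int64)
            # Assumed NeMo signature: returns (encoded, encoded_len).
            encoded, _ = self.enc(audio_signal=features, length=length)
            return encoded

    return FixedShapeEncoder(encoder).eval()


def convert_encoder(encoder, out_path="parakeet_enc_0.6b_v3.mlpackage"):
    """Trace with a fixed shape, then convert to an fp16 MLProgram for CPU_AND_NE."""
    import numpy as np
    import torch
    import coremltools as ct

    wrapper = make_wrapper(encoder)
    example = torch.zeros(FEATURES_SHAPE, dtype=torch.float32)
    with torch.no_grad():
        traced = torch.jit.trace(wrapper, example)

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="features", shape=FEATURES_SHAPE, dtype=np.float32)],
        outputs=[ct.TensorType(name="encoded", dtype=np.float32)],
        convert_to="mlprogram",
        compute_precision=ct.precision.FLOAT16,      # fp16 weights and compute
        compute_units=ct.ComputeUnit.CPU_AND_NE,
        minimum_deployment_target=ct.target.macOS13,  # assumed target
    )
    mlmodel.save(out_path)
    return mlmodel
```

The heavy imports sit inside the functions so the module can be imported on machines without torch or coremltools. The conversion itself needs both installed.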
License follows the base model (nvidia/parakeet-tdt-0.6b-v3, CC-BY-4.0).