Omi Med STT v1 MLX q8

Apple Silicon q8 export of Omi Med STT v1.

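This is the default Mac artifact used by the omi-med-stt CLI. It is much smaller than the full-precision MLX export and scores almost the same in the evaluation below: WER 8.61% vs 8.59%, M-WER 2.75% vs 2.65%, Medical Recall 97.63% vs 97.70%, and an identical Drug M-WER of 5.20%.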

Quickstart

pip install -U "omi-med-stt[mlx]"
omi-med-stt audio.wav

Explicit selection:

omi-med-stt audio.wav --runtime mlx --model omi-health/omi-med-stt-v1-mlx-q8
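
For scripted or batch use, the explicit invocation above can be wrapped with Python's standard subprocess module. This is a minimal sketch, assuming the omi-med-stt CLI is installed and on PATH. The helper names build_command and transcribe are hypothetical, and only the --runtime and --model flags shown above are used. Where the CLI writes its transcript is not stated here, so the sketch assumes stdout and returns whatever is captured there.

```python
import subprocess
from typing import List

MODEL_ID = "omi-health/omi-med-stt-v1-mlx-q8"


def build_command(audio_path: str, runtime: str = "mlx", model: str = MODEL_ID) -> List[str]:
    """Build the argument list for the explicit-selection invocation."""
    return ["omi-med-stt", audio_path, "--runtime", runtime, "--model", model]


def transcribe(audio_path: str) -> str:
    """Run the CLI and return its captured stdout.

    Assumption: the transcript is printed to stdout. Adjust if the CLI
    writes to a file instead.
    """
    result = subprocess.run(
        build_command(audio_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command that would be run, without executing it.
    print(" ".join(build_command("audio.wav")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.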

Evaluation

Full evaluation details: omi.health/research/omi-med-stt. Benchmark: 7.18h of real and synthetic clinical speech across dialogue, dictation, medication review, procedures/devices/tests, and general speech. Speed is shown as time to process one hour of audio; lower is faster.
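
The two speed figures in the tables are related by a simple conversion: the realtime factor is 3600 seconds of audio divided by the processing time in seconds. The sketch below is illustrative only, and its function names are hypothetical. The displayed times appear to be rounded to whole seconds while the factors appear to come from unrounded timings, so recomputing a factor from a displayed time can differ slightly (for example, 3600 / 25 = 144, while the table lists 146.3x).

```python
def realtime_factor(seconds_per_audio_hour: float) -> float:
    """Convert processing time for one hour of audio into an x-realtime factor."""
    return 3600.0 / seconds_per_audio_hour


def seconds_per_audio_hour(factor: float) -> float:
    """Convert an x-realtime factor back into seconds per hour of audio."""
    return 3600.0 / factor


def format_duration(seconds: float) -> str:
    """Format a duration the way the tables do, rounded to whole seconds."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m {secs}s" if minutes else f"{secs}s"


if __name__ == "__main__":
    # 5m 20s is 320 seconds of processing per hour of audio.
    print(realtime_factor(320))
    # A listed factor of 146.3x corresponds to roughly 24.6 seconds.
    print(format_duration(seconds_per_audio_hour(146.3)))
```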

NeMo vs Open / Local Models

Local GPU baselines were run on an A10 where applicable; VibeVoice-ASR 9B was run on an H100.

| Model | WER | M-WER | Drug M-WER | Medical Recall | Speed: time / 1 hour audio (formula-derived x realtime) |
| --- | --- | --- | --- | --- | --- |
| VibeVoice-ASR 9B | 11.10% | 1.78% | 1.36% | 98.71% | 5m 20s (11.2x) |
| Omi Med STT v1 NeMo | 8.30% | 2.37% | 4.75% | 97.95% | 25s (146.3x) |
| Qwen3 ASR 1.7B | 10.72% | 3.13% | 6.11% | 97.21% | 44s (81.1x) |
| Whisper Large v3 Turbo (A10) | 11.98% | 3.93% | 5.88% | 96.45% | 1m 19s (45.8x) |
| Cohere Transcribe 03-2026 | 14.88% | 5.05% | 11.09% | 95.16% | 25s (146.3x) |
| Parakeet TDT 0.6B v3 | 15.26% | 8.01% | 9.50% | 96.34% | 23s (157.9x) |
| Parakeet TDT 0.6B v2 base | 16.45% | 8.36% | 8.60% | 96.20% | 23s (153.8x) |

Runtime Artifacts

All artifacts below were scored on the same internal evaluation as the canonical NeMo checkpoint.

| Artifact | WER | M-WER | Drug M-WER | Medical Recall | Speed: time / 1 hour audio (formula-derived x realtime) |
| --- | --- | --- | --- | --- | --- |
| NeMo canonical | 8.30% | 2.37% | 4.75% | 97.95% | 25s (146.3x) |
| MLX full precision | 8.59% | 2.65% | 5.20% | 97.70% | 56s (64.5x) |
| MLX q8 | 8.61% | 2.75% | 5.20% | 97.63% | 53s (67.4x) |

Why q8 is the Mac default: it is smaller than full precision, easy to download, and did not worsen Drug M-WER versus the full MLX export in this evaluation.
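
To make the trade-off concrete, the per-metric differences between artifacts can be computed directly from the Runtime Artifacts table. The sketch below only does arithmetic on the published numbers. The ARTIFACTS dictionary and the delta helper are hypothetical names introduced for illustration.

```python
from typing import Dict

# Values copied from the Runtime Artifacts table, in percent.
ARTIFACTS: Dict[str, Dict[str, float]] = {
    "NeMo canonical": {"WER": 8.30, "M-WER": 2.37, "Drug M-WER": 4.75, "Medical Recall": 97.95},
    "MLX full precision": {"WER": 8.59, "M-WER": 2.65, "Drug M-WER": 5.20, "Medical Recall": 97.70},
    "MLX q8": {"WER": 8.61, "M-WER": 2.75, "Drug M-WER": 5.20, "Medical Recall": 97.63},
}


def delta(artifact: str, baseline: str) -> Dict[str, float]:
    """Return artifact minus baseline for each metric, in percentage points."""
    return {
        metric: round(ARTIFACTS[artifact][metric] - ARTIFACTS[baseline][metric], 2)
        for metric in ARTIFACTS[artifact]
    }


if __name__ == "__main__":
    # For the error rates, positive means q8 is worse; for recall, negative means worse.
    for metric, diff in delta("MLX q8", "MLX full precision").items():
        print(f"{metric}: {diff:+.2f} pp")
```

Against the full-precision MLX export, q8 differs by +0.02 points WER, +0.10 points M-WER, 0.00 points Drug M-WER, and -0.07 points Medical Recall.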

Compatibility

This is not a drop-in parakeet-mlx checkpoint. Omi Med STT v1 includes a medical adapter, and the supported Mac path is the omi-med-stt CLI.

Safety

Omi Med STT v1 is speech-to-text only. It is not a diagnostic, triage, prescribing, or clinical decision model, and it is not clinically validated. Transcripts must be reviewed before any clinical use.
