Omi Med STT v1 MLX q8
Apple Silicon q8 export of Omi Med STT v1.
This is the default Mac artifact used by the omi-med-stt CLI. It is much
smaller than the full MLX export and stays within 0.02 points of its WER (8.61% vs 8.59%) in the evaluation below.
Quickstart
pip install -U "omi-med-stt[mlx]"
omi-med-stt audio.wav
Explicit selection:
omi-med-stt audio.wav --runtime mlx --model omi-health/omi-med-stt-v1-mlx-q8
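For batch jobs, the two invocations above can be wrapped from Python. This is a minimal sketch that assumes only the command shapes shown above (`omi-med-stt <audio>` plus the optional `--runtime` and `--model` flags). The helper names `build_command` and `transcribe` are hypothetical, and the sketch assumes the CLI writes the transcript to standard output, which this card does not state.

```python
import subprocess
from typing import List, Optional


def build_command(
    audio_path: str,
    runtime: Optional[str] = None,
    model: Optional[str] = None,
) -> List[str]:
    """Build the argument list for the omi-med-stt CLI.

    Only the flags documented in this card (--runtime, --model) are used.
    """
    cmd = ["omi-med-stt", audio_path]
    if runtime is not None:
        cmd += ["--runtime", runtime]
    if model is not None:
        cmd += ["--model", model]
    return cmd


def transcribe(audio_path: str, **kwargs: Optional[str]) -> str:
    """Run the CLI and return whatever it prints to stdout.

    Assumption: the transcript is written to stdout. Raises
    subprocess.CalledProcessError if the CLI exits with a non-zero status.
    """
    result = subprocess.run(
        build_command(audio_path, **kwargs),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `transcribe("audio.wav")` mirrors the default quickstart command, while `transcribe("audio.wav", runtime="mlx", model="omi-health/omi-med-stt-v1-mlx-q8")` mirrors the explicit selection.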
Evaluation
Full evaluation details: omi.health/research/omi-med-stt. Benchmark: 7.18h of real and synthetic clinical speech across dialogue, dictation, medication review, procedures/devices/tests, and general speech. Speed is shown as time to process one hour of audio; lower is faster.
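The speed column pairs a processing time with a realtime factor. The card does not spell out the formula, so the sketch below assumes the usual definition: seconds of audio divided by seconds of processing, with one hour of audio being 3600 seconds. The function names are hypothetical. Because the times in the tables are rounded to whole seconds, recomputing a factor from them can differ slightly from the listed value.

```python
def parse_duration(text: str) -> int:
    """Convert a duration such as '5m 20s' or '25s' to whole seconds."""
    total = 0
    for token in text.split():
        if token.endswith("m"):
            total += int(token[:-1]) * 60
        elif token.endswith("s"):
            total += int(token[:-1])
        else:
            raise ValueError(f"unrecognised duration token: {token!r}")
    return total


def realtime_factor(processing_seconds: float, audio_seconds: float = 3600.0) -> float:
    """Assumed definition: audio duration divided by processing time."""
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return audio_seconds / processing_seconds


# '5m 20s' is 320 seconds, so the assumed factor is 3600 / 320 = 11.25,
# in line with the 11.2x listed for VibeVoice-ASR 9B.
vibevoice_factor = realtime_factor(parse_duration("5m 20s"))
```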
NeMo vs Open / Local Models
Local GPU baselines were run on an A10 where applicable; VibeVoice-ASR 9B used an H100.
| Model | WER | M-WER | Drug M-WER | Medical Recall | Speed: time / 1 hour audio (formula-derived x realtime) |
|---|---|---|---|---|---|
| VibeVoice-ASR 9B | 11.10% | 1.78% | 1.36% | 98.71% | 5m 20s (11.2x) |
| Omi Med STT v1 NeMo | 8.30% | 2.37% | 4.75% | 97.95% | 25s (146.3x) |
| Qwen3 ASR 1.7B | 10.72% | 3.13% | 6.11% | 97.21% | 44s (81.1x) |
| Whisper Large v3 Turbo (A10) | 11.98% | 3.93% | 5.88% | 96.45% | 1m 19s (45.8x) |
| Cohere Transcribe 03-2026 | 14.88% | 5.05% | 11.09% | 95.16% | 25s (146.3x) |
| Parakeet TDT 0.6B v3 | 15.26% | 8.01% | 9.50% | 96.34% | 23s (157.9x) |
| Parakeet TDT 0.6B v2 base | 16.45% | 8.36% | 8.60% | 96.20% | 23s (153.8x) |
Runtime Artifacts
These artifacts were measured on the same internal evaluation as the canonical checkpoint.
| Artifact | WER | M-WER | Drug M-WER | Medical Recall | Speed: time / 1 hour audio (formula-derived x realtime) |
|---|---|---|---|---|---|
| NeMo canonical | 8.30% | 2.37% | 4.75% | 97.95% | 25s (146.3x) |
| MLX full precision | 8.59% | 2.65% | 5.20% | 97.70% | 56s (64.5x) |
| MLX q8 | 8.61% | 2.75% | 5.20% | 97.63% | 53s (67.4x) |
Why q8 is the Mac default: it is smaller than full precision, easy to download, and did not worsen Drug M-WER versus the full MLX export in this evaluation.
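The comparison behind that choice can be reproduced from the Runtime Artifacts table. The sketch below copies the table values and computes the percentage-point difference between MLX q8 and MLX full precision; the dictionary layout and the `delta_points` helper are illustrative only.

```python
# Values copied from the Runtime Artifacts table (all in percent).
ARTIFACTS = {
    "NeMo canonical": {"WER": 8.30, "M-WER": 2.37, "Drug M-WER": 4.75, "Medical Recall": 97.95},
    "MLX full precision": {"WER": 8.59, "M-WER": 2.65, "Drug M-WER": 5.20, "Medical Recall": 97.70},
    "MLX q8": {"WER": 8.61, "M-WER": 2.75, "Drug M-WER": 5.20, "Medical Recall": 97.63},
}


def delta_points(candidate: str, baseline: str) -> dict:
    """Percentage-point difference (candidate minus baseline) per metric."""
    return {
        metric: round(ARTIFACTS[candidate][metric] - ARTIFACTS[baseline][metric], 2)
        for metric in ARTIFACTS[baseline]
    }


q8_vs_full = delta_points("MLX q8", "MLX full precision")
# WER +0.02, M-WER +0.10, Drug M-WER 0.00, Medical Recall -0.07
```

For the error-rate metrics a positive delta is worse; for Medical Recall a negative delta is worse.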
Compatibility
This is not a drop-in parakeet-mlx checkpoint. Omi Med STT v1 includes a
medical adapter, and the supported Mac path is the omi-med-stt CLI.
Links
- Canonical model: omi-health/omi-med-stt-v1
- Full MLX export: omi-health/omi-med-stt-v1-mlx
- CPU GGUF export: omi-health/omi-med-stt-v1-gguf
- Runtime CLI: Omi-Health/omi-med-stt-runtime
- Broader evaluation and product context: omi.health/research/omi-med-stt
Safety
Omi Med STT v1 is speech-to-text only. It is not a diagnostic, triage, prescribing, or clinical decision model, and it is not clinically validated. Transcripts must be reviewed before any clinical use.