---
library_name: mlx
language:
- en
tags:
- mlx
- automatic-speech-recognition
- speech
- audio
- FastConformer
- Conformer
- Parakeet
- medical-asr
- clinical-dialogue
license: cc-by-4.0
pipeline_tag: automatic-speech-recognition
base_model: omi-health/omi-med-stt-v1
---

# Omi Med STT v1 MLX

Full-precision Apple Silicon / MLX export of [Omi Med STT v1](https://huggingface.co/omi-health/omi-med-stt-v1).

For most Mac users, the smaller q8 export is the recommended default. Use this repo when you specifically want the full MLX weights.

## Quickstart

```bash
pip install -U "omi-med-stt[mlx]"
omi-med-stt audio.wav --runtime mlx --model omi-health/omi-med-stt-v1-mlx
```

## Evaluation

Full evaluation details: [omi.health/research/omi-med-stt](https://omi.health/research/omi-med-stt/).

Benchmark: 7.18h of real and synthetic clinical speech across dialogue, dictation, medication review, procedures/devices/tests, and general speech. Speed is shown as time to process one hour of audio; lower is faster.

### NeMo vs Open / Local Models

Local GPU baselines were run on A10 where applicable; VibeVoice-ASR 9B used H100.

| Model | WER | M-WER | Drug M-WER | Medical Recall | Speed: time / 1 hour audio (formula-derived x realtime) |
|---|---:|---:|---:|---:|---:|
| VibeVoice-ASR 9B | 11.10% | 1.78% | 1.36% | 98.71% | 5m 20s (11.2x) |
| **Omi Med STT v1 NeMo** | **8.30%** | **2.37%** | **4.75%** | **97.95%** | **25s (146.3x)** |
| Qwen3 ASR 1.7B | 10.72% | 3.13% | 6.11% | 97.21% | 44s (81.1x) |
| Whisper Large v3 Turbo (A10) | 11.98% | 3.93% | 5.88% | 96.45% | 1m 19s (45.8x) |
| Cohere Transcribe 03-2026 | 14.88% | 5.05% | 11.09% | 95.16% | 25s (146.3x) |
| Parakeet TDT 0.6B v3 | 15.26% | 8.01% | 9.50% | 96.34% | 23s (157.9x) |
| Parakeet TDT 0.6B v2 base | 16.45% | 8.36% | 8.60% | 96.20% | 23s (153.8x) |

### Runtime Artifacts

Same internal evaluation as the canonical checkpoint.

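The Speed column pairs a wall-clock time per hour of audio with a realtime factor. The card does not spell out the formula it uses, so the following is only a minimal sketch that assumes the factor is audio duration divided by processing time (3600 s / processing seconds). The function names `seconds_per_audio_hour` and `format_duration` are hypothetical helpers for illustration and are not part of the `omi-med-stt` package.

```python
def seconds_per_audio_hour(realtime_factor: float) -> float:
    """Seconds needed to process one hour (3600 s) of audio.

    Assumes realtime_factor = audio seconds / processing seconds.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return 3600.0 / realtime_factor


def format_duration(seconds: float) -> str:
    """Render a duration as 'Xm Ys', or 'Ys' when under one minute."""
    total = round(seconds)  # nearest whole second
    minutes, secs = divmod(total, 60)
    return f"{minutes}m {secs}s" if minutes else f"{secs}s"


if __name__ == "__main__":
    # Realtime factors copied from the Runtime Artifacts rows of this card.
    rows = [
        ("NeMo canonical", 146.3),
        ("MLX full precision", 64.5),
        ("MLX q8", 67.4),
    ]
    for label, factor in rows:
        print(f"{label}: {format_duration(seconds_per_audio_hour(factor))}")
```

Under that assumption the three factors map back to 25s, 56s, and 53s. Because both the times and the factors in the tables are rounded, a conversion like this can differ from a printed value by about a second.
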

| Artifact | WER | M-WER | Drug M-WER | Medical Recall | Speed: time / 1 hour audio (formula-derived x realtime) |
|---|---:|---:|---:|---:|---:|
| NeMo canonical | 8.30% | 2.37% | 4.75% | 97.95% | 25s (146.3x) |
| **MLX full precision** | **8.59%** | **2.65%** | **5.20%** | **97.70%** | **56s (64.5x)** |
| MLX q8 | 8.61% | 2.75% | 5.20% | 97.63% | 53s (67.4x) |

The full MLX export is slightly ahead of q8 on WER (8.59% vs 8.61%), M-WER (2.65% vs 2.75%), and Medical Recall (97.70% vs 97.63%), and ties it on Drug M-WER. q8 is marginally faster and much smaller, and it is the default Mac artifact.

## Compatibility

This is not a drop-in `parakeet-mlx` checkpoint. Omi Med STT v1 includes a medical adapter, and the supported Mac path is the `omi-med-stt` CLI.

## Links

- Canonical model: [`omi-health/omi-med-stt-v1`](https://huggingface.co/omi-health/omi-med-stt-v1)
- Mac q8 default: [`omi-health/omi-med-stt-v1-mlx-q8`](https://huggingface.co/omi-health/omi-med-stt-v1-mlx-q8)
- CPU GGUF export: [`omi-health/omi-med-stt-v1-gguf`](https://huggingface.co/omi-health/omi-med-stt-v1-gguf)
- Runtime CLI: [`Omi-Health/omi-med-stt-runtime`](https://github.com/Omi-Health/omi-med-stt-runtime)
- Broader evaluation and product context: [omi.health/research/omi-med-stt](https://omi.health/research/omi-med-stt/)

## Safety

Omi Med STT v1 is speech-to-text only. It is not a diagnostic, triage, prescribing, or clinical decision model, and it is not clinically validated. Transcripts must be reviewed before any clinical use.