---
license: cc-by-4.0
language:
- en
library_name: gguf
tags:
- automatic-speech-recognition
- medical
- parakeet
- gguf
- parakeet.cpp
- omi-med-stt
pipeline_tag: automatic-speech-recognition
base_model: nvidia/parakeet-tdt-0.6b-v2
---

# Omi Med STT v1 GGUF

GGUF export of [Omi Med STT v1](https://huggingface.co/omi-health/omi-med-stt-v1)
for Linux and Windows CPU use through the `omi-med-stt` CLI.

This is the portability path. If you have Apple Silicon, use the MLX q8 repo. If
you have an NVIDIA GPU, use the canonical NeMo checkpoint.

## Quickstart

```bash
pip install -U omi-med-stt
omi-med-stt install-cpp --cpp-backend cpu
omi-med-stt audio.wav --runtime cpp
```
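
To transcribe a folder of recordings, the same command can be driven from a script. The following is a minimal sketch, not part of the CLI: the helper names `build_command` and `transcribe_all` are hypothetical, and it assumes the CLI takes one audio path per invocation and prints the transcript to stdout, which this card does not confirm.

```python
import subprocess
from pathlib import Path


def build_command(audio_path):
    """Return the argv list for the CPU invocation shown in the Quickstart."""
    # Only arguments that appear in this README are used.
    return ["omi-med-stt", str(audio_path), "--runtime", "cpp"]


def transcribe_all(folder, runner=subprocess.run):
    """Run the CLI once per .wav file in `folder`; return {filename: stdout}."""
    results = {}
    for wav in sorted(Path(folder).glob("*.wav")):
        # Assumption: the transcript is written to stdout.
        proc = runner(build_command(wav), capture_output=True, text=True, check=True)
        results[wav.name] = proc.stdout.strip()
    return results
```

The `runner` parameter exists only so the loop can be exercised without the CLI installed; in normal use, call `transcribe_all("recordings/")` and leave it at its default.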

## Files

| File | Status |
|---|---|
| `omi-med-stt-v1-q8_0.gguf` | Default CPU artifact, benchmarked |
| `omi-med-stt-v1-f16.gguf` | Provided for conversion/experimentation; not independently benchmarked |

## Evaluation

Full evaluation details: [omi.health/research/omi-med-stt](https://omi.health/research/omi-med-stt/).
Benchmark: 7.18 hours of real and synthetic clinical speech covering dialogue, dictation, medication review, procedures/devices/tests, and general speech. Speed is reported as the time needed to process one hour of audio, so lower is faster.
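
The tables below pair each processing time with an "x realtime" figure. The sketch below shows one way to relate the two, assuming the factor is simply 3600 seconds of audio divided by the processing time in seconds; the exact formula used for the benchmark is not stated here, and the helper names are illustrative. Because the displayed times are rounded to whole seconds, recomputed factors can differ slightly from the published ones (for example, `25s` gives 144.0 rather than 146.3).

```python
def parse_duration(text):
    """Convert a table entry such as '2m 53s' or '25s' to seconds."""
    seconds = 0
    for part in text.split():
        if part.endswith("m"):
            seconds += 60 * int(part[:-1])
        elif part.endswith("s"):
            seconds += int(part[:-1])
        else:
            raise ValueError(f"unrecognised duration part: {part!r}")
    return seconds


def realtime_factor(processing_seconds, audio_seconds=3600):
    """Seconds of audio processed per second of compute (assumed formula)."""
    return audio_seconds / processing_seconds


for label in ("5m 20s", "2m 53s"):
    print(label, round(realtime_factor(parse_duration(label)), 1))
```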

### NeMo vs Open / Local Models

Local GPU baselines were run on A10 where applicable; VibeVoice-ASR 9B used H100.

| Model | WER | M-WER | Drug M-WER | Medical Recall | Speed: time / 1 hour audio (formula-derived x realtime) |
|---|---:|---:|---:|---:|---:|
| VibeVoice-ASR 9B | 11.10% | 1.78% | 1.36% | 98.71% | 5m 20s (11.2x) |
| **Omi Med STT v1 NeMo** | **8.30%** | **2.37%** | **4.75%** | **97.95%** | **25s (146.3x)** |
| Qwen3 ASR 1.7B | 10.72% | 3.13% | 6.11% | 97.21% | 44s (81.1x) |
| Whisper Large v3 Turbo (A10) | 11.98% | 3.93% | 5.88% | 96.45% | 1m 19s (45.8x) |
| Cohere Transcribe 03-2026 | 14.88% | 5.05% | 11.09% | 95.16% | 25s (146.3x) |
| Parakeet TDT 0.6B v3 | 15.26% | 8.01% | 9.50% | 96.34% | 23s (157.9x) |
| Parakeet TDT 0.6B v2 base | 16.45% | 8.36% | 8.60% | 96.20% | 23s (153.8x) |

### Runtime Artifacts

These artifacts were scored on the same internal evaluation as the canonical checkpoint.

| Artifact | WER | M-WER | Drug M-WER | Medical Recall | Speed: time / 1 hour audio (formula-derived x realtime) |
|---|---:|---:|---:|---:|---:|
| NeMo canonical | 8.30% | 2.37% | 4.75% | 97.95% | 25s (146.3x) |
| MLX q8 | 8.61% | 2.75% | 5.20% | 97.63% | 53s (67.4x) |
| **GGUF q8_0** | **9.12%** | **3.20%** | **6.33%** | **97.53%** | **2m 53s (20.8x)** |

The GGUF q8_0 build is useful when CPU portability matters. It is not the
quality-leading artifact.
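
To put the quality gap in concrete terms, the short sketch below computes each artifact's WER gap against the NeMo canonical checkpoint, both in absolute percentage points and relative to the baseline. The numbers are copied from the table above; the function name is illustrative.

```python
# WER values (percent) copied from the Runtime Artifacts table.
WER = {"NeMo canonical": 8.30, "MLX q8": 8.61, "GGUF q8_0": 9.12}


def gap_vs_baseline(scores, baseline="NeMo canonical"):
    """Return {artifact: (absolute_gap_points, relative_gap_percent)}."""
    base = scores[baseline]
    return {
        name: (round(value - base, 2), round(100 * (value - base) / base, 1))
        for name, value in scores.items()
        if name != baseline
    }


for name, (abs_gap, rel_gap) in gap_vs_baseline(WER).items():
    print(f"{name}: +{abs_gap} WER points ({rel_gap}% relative)")
```

On these figures, GGUF q8_0 is 0.82 WER points behind the canonical checkpoint, roughly a 9.9% relative increase.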

## Compatibility

These files are **not llama.cpp text-model GGUF files**. They require a Parakeet
ASR runtime. The supported path is:

```bash
omi-med-stt audio.wav --runtime cpp
```

The CLI installs the patched `parakeet.cpp` runtime needed for Omi Med STT v1.
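
GGUF is a container format, so a file can have a valid GGUF header and still be unusable by a text-model loader. As a rough sanity check on a download, the sketch below reads the fixed-size header fields. It assumes GGUF version 2 or later (64-bit counts) and only confirms the container; it does not verify that the file is a Parakeet ASR model. The function name is illustrative.

```python
import struct


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file."""
    with open(path, "rb") as fh:
        header = fh.read(24)
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not start with the GGUF magic bytes")
    # Little-endian: uint32 version, uint64 tensor count, uint64 metadata count.
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count
```

For example, `read_gguf_header("omi-med-stt-v1-q8_0.gguf")` should return a small version number and non-zero counts for an intact file, and raise `ValueError` for a truncated or non-GGUF download.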

## Links

- Canonical model: [`omi-health/omi-med-stt-v1`](https://huggingface.co/omi-health/omi-med-stt-v1)
- Mac q8 default: [`omi-health/omi-med-stt-v1-mlx-q8`](https://huggingface.co/omi-health/omi-med-stt-v1-mlx-q8)
- Runtime CLI: [`Omi-Health/omi-med-stt-runtime`](https://github.com/Omi-Health/omi-med-stt-runtime)
- Broader evaluation and product context: [omi.health/research/omi-med-stt](https://omi.health/research/omi-med-stt/)
- parakeet.cpp: [`mudler/parakeet.cpp`](https://github.com/mudler/parakeet.cpp)

## Safety

Omi Med STT v1 is speech-to-text only. It is not a diagnostic, triage,
prescribing, or clinical decision model, and it is not clinically validated.
Transcripts must be reviewed before any clinical use.