Omi Med STT v1 GGUF

GGUF export of Omi Med STT v1 for Linux and Windows CPU use through the omi-med-stt CLI.

This is the portability path. If you have Apple Silicon, use the MLX q8 repo. If you have an NVIDIA GPU, use the canonical NeMo checkpoint.
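The routing above can be sketched as a small helper. This is an illustration only: `recommend_artifact` and the labels it returns are hypothetical names, not part of the omi-med-stt package, and GPU detection is left to the caller.

```python
import platform


def recommend_artifact(system: str, machine: str, has_nvidia_gpu: bool) -> str:
    """Map a host description to the artifact this card recommends."""
    # Apple Silicon reports system "Darwin" and machine "arm64".
    if system == "Darwin" and machine == "arm64":
        return "MLX q8"
    if has_nvidia_gpu:
        return "NeMo canonical"
    # Everything else falls back to the CPU GGUF build; the card names
    # Linux and Windows as the targets for this path.
    return "GGUF q8_0"


if __name__ == "__main__":
    # has_nvidia_gpu is hard-coded here; detect it however suits your setup.
    print(recommend_artifact(platform.system(), platform.machine(), has_nvidia_gpu=False))
```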

Quickstart

pip install -U omi-med-stt
omi-med-stt install-cpp --cpp-backend cpu
omi-med-stt audio.wav --runtime cpp
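To call the same command from a script, a thin wrapper around the CLI may be enough. The sketch below uses only the flags shown in the Quickstart; the function names are hypothetical, and it assumes the transcript is written to stdout, which this card does not confirm.

```python
import subprocess
from pathlib import Path


def build_transcribe_command(audio_path: str) -> list[str]:
    """Return the Quickstart transcription command as an argument list."""
    return ["omi-med-stt", str(Path(audio_path)), "--runtime", "cpp"]


def transcribe(audio_path: str) -> str:
    """Run the CLI and return whatever it writes to stdout.

    Assumption: the transcript is printed to stdout. Check the CLI's own
    help output before relying on this.
    """
    result = subprocess.run(
        build_transcribe_command(audio_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.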

Files

| File | Status |
| --- | --- |
| omi-med-stt-v1-q8_0.gguf | Default CPU artifact, benchmarked |
| omi-med-stt-v1-f16.gguf | Provided for conversion/experimentation; not independently benchmarked |

Evaluation

Full evaluation details: omi.health/research/omi-med-stt. Benchmark: 7.18h of real and synthetic clinical speech across dialogue, dictation, medication review, procedures/devices/tests, and general speech. Speed is shown as time to process one hour of audio; lower is faster.
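The relationship between the two speed figures can be sketched as follows. The card does not spell out its formula, so this assumes the realtime factor is audio duration divided by processing time; `parse_duration` and `realtime_factor` are hypothetical helper names.

```python
def parse_duration(text: str) -> int:
    """Convert strings such as '5m 20s' or '25s' to whole seconds."""
    total = 0
    for part in text.split():
        if part.endswith("m"):
            total += 60 * int(part[:-1])
        elif part.endswith("s"):
            total += int(part[:-1])
        else:
            raise ValueError(f"unrecognised duration part: {part!r}")
    return total


def realtime_factor(processing_seconds: float, audio_seconds: float = 3600.0) -> float:
    """Return how many times faster than realtime a run was (assumed formula)."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds
```

For example, 5m 20s is 320 seconds, and 3600 / 320 = 11.25. The listed times appear to be rounded to whole seconds, so a factor recomputed from them can differ slightly from the listed one.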

NeMo vs Open / Local Models

Local GPU baselines were run on A10 where applicable; VibeVoice-ASR 9B used H100.

| Model | WER | M-WER | Drug M-WER | Medical Recall | Speed: time per 1 hour of audio (formula-derived x realtime) |
| --- | --- | --- | --- | --- | --- |
| VibeVoice-ASR 9B | 11.10% | 1.78% | 1.36% | 98.71% | 5m 20s (11.2x) |
| Omi Med STT v1 NeMo | 8.30% | 2.37% | 4.75% | 97.95% | 25s (146.3x) |
| Qwen3 ASR 1.7B | 10.72% | 3.13% | 6.11% | 97.21% | 44s (81.1x) |
| Whisper Large v3 Turbo (A10) | 11.98% | 3.93% | 5.88% | 96.45% | 1m 19s (45.8x) |
| Cohere Transcribe 03-2026 | 14.88% | 5.05% | 11.09% | 95.16% | 25s (146.3x) |
| Parakeet TDT 0.6B v3 | 15.26% | 8.01% | 9.50% | 96.34% | 23s (157.9x) |
| Parakeet TDT 0.6B v2 base | 16.45% | 8.36% | 8.60% | 96.20% | 23s (153.8x) |

Runtime Artifacts

These artifacts were scored on the same internal evaluation as the canonical NeMo checkpoint.

| Artifact | WER | M-WER | Drug M-WER | Medical Recall | Speed: time per 1 hour of audio (formula-derived x realtime) |
| --- | --- | --- | --- | --- | --- |
| NeMo canonical | 8.30% | 2.37% | 4.75% | 97.95% | 25s (146.3x) |
| MLX q8 | 8.61% | 2.75% | 5.20% | 97.63% | 53s (67.4x) |
| GGUF q8_0 | 9.12% | 3.20% | 6.33% | 97.53% | 2m 53s (20.8x) |

The GGUF q8_0 build is useful when CPU portability matters. It is not the quality-leading artifact.
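To put the quality gap in numbers, the WER figures from the Runtime Artifacts table can be compared directly. This is a minimal arithmetic sketch; the helper names are hypothetical.

```python
def absolute_gap(baseline_pct: float, candidate_pct: float) -> float:
    """Difference in percentage points (candidate minus baseline)."""
    return candidate_pct - baseline_pct


def relative_increase(baseline_pct: float, candidate_pct: float) -> float:
    """Increase as a fraction of the baseline value."""
    return (candidate_pct - baseline_pct) / baseline_pct


if __name__ == "__main__":
    nemo_wer, gguf_wer = 8.30, 9.12  # WER values from the Runtime Artifacts table
    print(f"absolute gap: {absolute_gap(nemo_wer, gguf_wer):.2f} points")
    print(f"relative increase: {relative_increase(nemo_wer, gguf_wer):.1%}")
```

With those figures, GGUF q8_0 WER is about 0.82 points higher than NeMo canonical, roughly a 9.9% relative increase.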

Compatibility

These files are not llama.cpp text-model GGUF files. They require a Parakeet ASR runtime. The supported path is:

omi-med-stt audio.wav --runtime cpp

The CLI installs the patched parakeet.cpp runtime needed for Omi Med STT v1.
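One way to tell these files apart from text-model GGUF files is to read the architecture string from the file header. The sketch below assumes the files use the standard GGUF container layout (version 2 or later) and store the architecture under the conventional `general.architecture` key; this card does not confirm either point. The hosting page lists the architecture as parakeet. The function names are hypothetical.

```python
import struct
from typing import BinaryIO, Optional

# Byte sizes of fixed-width GGUF metadata value types (type id -> size).
_FIXED_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
_STRING, _ARRAY = 8, 9


def _read_u32(f: BinaryIO) -> int:
    return struct.unpack("<I", f.read(4))[0]


def _read_u64(f: BinaryIO) -> int:
    return struct.unpack("<Q", f.read(8))[0]


def _read_string(f: BinaryIO) -> str:
    length = _read_u64(f)  # strings are length-prefixed, not NUL-terminated
    return f.read(length).decode("utf-8")


def _skip_value(f: BinaryIO, value_type: int) -> None:
    """Advance past one metadata value without decoding it."""
    if value_type in _FIXED_SIZES:
        f.seek(_FIXED_SIZES[value_type], 1)
    elif value_type == _STRING:
        f.seek(_read_u64(f), 1)
    elif value_type == _ARRAY:
        item_type = _read_u32(f)
        for _ in range(_read_u64(f)):
            _skip_value(f, item_type)
    else:
        raise ValueError(f"unknown GGUF value type: {value_type}")


def read_architecture(f: BinaryIO) -> Optional[str]:
    """Return the general.architecture string, or None if the key is absent."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    if _read_u32(f) < 2:
        raise ValueError("GGUF v1 uses 32-bit counts; not handled in this sketch")
    _read_u64(f)  # tensor count, unused here
    for _ in range(_read_u64(f)):  # metadata key/value count
        key = _read_string(f)
        value_type = _read_u32(f)
        if key == "general.architecture" and value_type == _STRING:
            return _read_string(f)
        _skip_value(f, value_type)
    return None
```

A typical call would be `read_architecture(open("omi-med-stt-v1-q8_0.gguf", "rb"))`. A value other than a text-model architecture would suggest that the file needs the Parakeet runtime rather than llama.cpp.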

Links

Safety

Omi Med STT v1 is speech-to-text only. It is not a diagnostic, triage, prescribing, or clinical decision model, and it is not clinically validated. Transcripts must be reviewed before any clinical use.

Model size: 0.6B params
Architecture: parakeet