	---
	license: cc-by-4.0
	language:
	- en
	library_name: gguf
	tags:
	- automatic-speech-recognition
	- medical
	- parakeet
	- gguf
	- parakeet.cpp
	- omi-med-stt
	pipeline_tag: automatic-speech-recognition
	base_model: nvidia/parakeet-tdt-0.6b-v2
	---

	# Omi Med STT v1 GGUF

	GGUF export of [Omi Med STT v1](https://huggingface.co/omi-health/omi-med-stt-v1)
	for Linux and Windows CPU use through the `omi-med-stt` CLI.

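	This is the portability path: it targets Linux and Windows CPUs, at some cost
	in accuracy and speed. If you have Apple Silicon, use the
	[MLX q8 repo](https://huggingface.co/omi-health/omi-med-stt-v1-mlx-q8). If you
	have an NVIDIA GPU, use the
	[canonical NeMo checkpoint](https://huggingface.co/omi-health/omi-med-stt-v1).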

	## Quickstart

	```bash
	pip install -U omi-med-stt
	omi-med-stt install-cpp --cpp-backend cpu
	omi-med-stt audio.wav --runtime cpp
	```
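
	To transcribe several recordings, the same command can be wrapped in a loop.
	The sketch below is illustrative only: the `recordings` directory, the
	`transcribe_dir` helper, and the `OMI_STT_CMD` override variable are
	hypothetical names, and it assumes the CLI takes one audio file per call, as in
	the quickstart above.

	```bash
	# transcribe_dir DIR: run the quickstart command once per .wav file in DIR.
	# OMI_STT_CMD is a hypothetical override, handy for dry runs; it defaults
	# to the real CLI name used in the quickstart.
	transcribe_dir() {
		for f in "$1"/*.wav; do
			# Skip the literal pattern when the directory has no .wav files.
			[ -e "$f" ] || continue
			"${OMI_STT_CMD:-omi-med-stt}" "$f" --runtime cpp
		done
	}

	# "recordings" is a placeholder directory name.
	transcribe_dir recordings
	```

	Setting `OMI_STT_CMD=echo` prints each command instead of running it, which
	gives a cheap dry run before processing a large folder.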

	## Files

	| File | Status |
	|---|---|
	| `omi-med-stt-v1-q8_0.gguf` | Default CPU artifact, benchmarked |
	| `omi-med-stt-v1-f16.gguf` | Provided for conversion/experimentation; not independently benchmarked |

	## Evaluation

	Full evaluation details: [omi.health/research/omi-med-stt](https://omi.health/research/omi-med-stt/).

	Benchmark: 7.18 hours of real and synthetic clinical speech across dialogue,
	dictation, medication review, procedures/devices/tests, and general speech.
	Speed is shown as the time to process one hour of audio; lower is faster.
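
	The tables below list each speed as a time per hour of audio plus an
	x-realtime factor. The exact formula is not spelled out here; the following is
	a minimal sketch, assuming the factor is simply 3600 seconds of audio divided
	by the processing time in seconds. The `realtime_factor` helper is a
	hypothetical name.

	```bash
	# realtime_factor SECONDS: print 3600 / SECONDS with one decimal place.
	# Assumption: x-realtime = audio duration / processing time.
	realtime_factor() {
		awk -v s="$1" 'BEGIN { printf "%.1f\n", 3600 / s }'
	}

	realtime_factor 173   # 2m 53s per hour of audio -> prints 20.8
	```

	Because the displayed times are rounded to whole seconds, recomputing from
	them can differ slightly from the listed factors: 25s gives 144.0 rather than
	the 146.3x shown.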

	### NeMo vs Open / Local Models

	Local GPU baselines were run on an NVIDIA A10 where applicable; VibeVoice-ASR 9B was run on an H100.

	| Model | WER | M-WER | Drug M-WER | Medical Recall | Speed: time / 1 hour audio (formula-derived x realtime) |
	|---|---:|---:|---:|---:|---:|
	| VibeVoice-ASR 9B | 11.10% | 1.78% | 1.36% | 98.71% | 5m 20s (11.2x) |
	| Omi Med STT v1 NeMo | 8.30% | 2.37% | 4.75% | 97.95% | 25s (146.3x) |
	| Qwen3 ASR 1.7B | 10.72% | 3.13% | 6.11% | 97.21% | 44s (81.1x) |
	| Whisper Large v3 Turbo (A10) | 11.98% | 3.93% | 5.88% | 96.45% | 1m 19s (45.8x) |
	| Cohere Transcribe 03-2026 | 14.88% | 5.05% | 11.09% | 95.16% | 25s (146.3x) |
	| Parakeet TDT 0.6B v3 | 15.26% | 8.01% | 9.50% | 96.34% | 23s (157.9x) |
	| Parakeet TDT 0.6B v2 base | 16.45% | 8.36% | 8.60% | 96.20% | 23s (153.8x) |

	### Runtime Artifacts

	These artifacts were scored on the same internal evaluation as the canonical checkpoint.

	| Artifact | WER | M-WER | Drug M-WER | Medical Recall | Speed: time / 1 hour audio (formula-derived x realtime) |
	|---|---:|---:|---:|---:|---:|
	| NeMo canonical | 8.30% | 2.37% | 4.75% | 97.95% | 25s (146.3x) |
	| MLX q8 | 8.61% | 2.75% | 5.20% | 97.63% | 53s (67.4x) |
	| GGUF q8_0 | 9.12% | 3.20% | 6.33% | 97.53% | 2m 53s (20.8x) |

	The GGUF q8_0 build is useful when CPU portability matters. It is not the
	quality-leading artifact: in the table above, its WER is 9.12% against 8.30%
	for the NeMo canonical checkpoint.

	## Compatibility

	These files are not llama.cpp text-model GGUF files. They require a Parakeet
	ASR runtime. The supported path is:

	```bash
	omi-med-stt audio.wav --runtime cpp
	```

	The CLI installs the patched `parakeet.cpp` runtime needed for Omi Med STT v1.
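
	A quick sanity check before pointing the runtime at a download is to confirm
	that the file really is a GGUF container. The sketch below assumes the standard
	GGUF layout, in which a file begins with the four ASCII bytes `GGUF`; the
	`is_gguf` helper is a hypothetical name, and the file is assumed to sit in the
	current directory. It cannot tell a Parakeet ASR model from a llama.cpp text
	model; it only catches missing files, truncated downloads, or Git LFS pointer
	files.

	```bash
	# is_gguf FILE: succeed only if FILE starts with the 4-byte magic "GGUF".
	is_gguf() {
		# tr strips NUL bytes so the shell does not warn on binary input.
		magic=$(head -c 4 -- "$1" 2>/dev/null | tr -d '\0')
		[ "$magic" = "GGUF" ]
	}

	# Assumed location: the default artifact in the current directory.
	if is_gguf omi-med-stt-v1-q8_0.gguf; then
		echo "GGUF container detected"
	else
		echo "not a GGUF file (missing, truncated, or an LFS pointer?)"
	fi
	```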

	## Links

	- Canonical model: [`omi-health/omi-med-stt-v1`](https://huggingface.co/omi-health/omi-med-stt-v1)
	- Mac q8 default: [`omi-health/omi-med-stt-v1-mlx-q8`](https://huggingface.co/omi-health/omi-med-stt-v1-mlx-q8)
	- Runtime CLI: [`Omi-Health/omi-med-stt-runtime`](https://github.com/Omi-Health/omi-med-stt-runtime)
	- Broader evaluation and product context: [omi.health/research/omi-med-stt](https://omi.health/research/omi-med-stt/)
	- parakeet.cpp: [`mudler/parakeet.cpp`](https://github.com/mudler/parakeet.cpp)

	## Safety

	Omi Med STT v1 is speech-to-text only. It is not a diagnostic, triage,
	prescribing, or clinical decision model, and it is not clinically validated.
	Transcripts must be reviewed before any clinical use.