Pathumma-whisper-th-large-v3 (MLX)
Unofficial community conversion. This repository is a community-maintained MLX-format conversion. It is not affiliated with, endorsed by, or maintained by NECTEC. Model weights are unchanged from the upstream release; only the storage format has been converted.
This repository provides NECTEC's Pathumma-whisper-th-large-v3 pre-converted to Apple MLX format, so Apple Silicon users can load the model directly with mlx-whisper without re-running the conversion (which takes ~5–10 minutes plus a 3 GB upstream download).
Apple Silicon required. MLX is not supported on Intel Macs or non-Apple platforms. If you are not on an M-series Mac, use the original nectec/Pathumma-whisper-th-large-v3 with HuggingFace Transformers or faster-whisper instead.
Why Pathumma over generic Whisper-large-v3
Generic Whisper-large-v3 (OpenAI release, or mlx-community/whisper-large-v3-mlx) is widely reported by the community to produce repetition-loop hallucinations on Thai audio — the same word or token repeats indefinitely until the segment ends. Pathumma is a Thai-language full fine-tune that mitigates this issue and reports the lowest CER among open-source offline baselines in the Typhoon ASR Real-time paper.
In informal testing on a 73-second Thai social-media audio clip:
- Generic `mlx-community/whisper-large-v3-mlx`: produced repetition-loop output on the first run.
- This Pathumma MLX conversion: produced fluent Thai output without repetition loops.
This is not a controlled hallucination-rate study, only an anecdotal observation. Readers who care about hallucination behaviour should evaluate on their own data.
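One rough way to screen your own transcripts for the repetition-loop failure mode described above is to look for a short character sequence repeated many times in a row. The sketch below is a hypothetical heuristic, not part of mlx-whisper or Pathumma. `find_repetition_loop`, `max_period`, and `min_repeats` are illustrative names, and the default thresholds are assumptions you should tune on your data. It works on characters rather than words because written Thai does not separate words with spaces.

```python
from typing import Optional, Tuple


def find_repetition_loop(
    text: str, max_period: int = 20, min_repeats: int = 8
) -> Optional[Tuple[int, str, int]]:
    """Return (start_index, repeated_unit, repeat_count) for the first run in
    which a unit of 1..max_period characters repeats back-to-back at least
    min_repeats times, or None if there is no such run.

    Hypothetical heuristic: thresholds are illustrative, not tuned.
    """
    n = len(text)
    for start in range(n):
        for period in range(1, max_period + 1):
            unit = text[start:start + period]
            if len(unit) < period:
                break  # not enough characters left for this period
            count = 1
            pos = start + period
            # Count how many consecutive copies of `unit` follow it.
            while text[pos:pos + period] == unit:
                count += 1
                pos += period
            if count >= min_repeats:
                return start, unit, count
    return None
```

For example, `find_repetition_loop(result["text"])` on the output of the Usage snippet returns `None` for a clean transcript. Runs of whitespace or punctuation can also trigger it, so you may want to strip those before checking.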
Reported benchmarks (CER %, lower is better)
The CER values below are reported in the literature for the original Pathumma model. They have not been re-measured against this MLX conversion. Because the conversion only changes the storage format and keeps the FP16 weights unchanged (no retraining, no quantization below FP16), accuracy is expected to match the upstream model up to small numerical differences between the MLX and PyTorch inference code; this has not been independently verified in this repository.
Source: Sirichotedumrong et al., "Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition", arXiv:2601.13044, Table 6 (Impact of Data Quality on Model Performance).
| Model | TVSpeech (noisy) | Gigaspeech2-Th | FLEURS-Th (Orig. / Norm.) |
|---|---|---|---|
| Biodatlab Whisper Large (Thonburian) | 18.96 | 13.22 | 16.50 / 15.26 |
| Biodatlab Distil-Whisper Large | 13.82 | 8.24 | 6.77 / 8.63 |
| Pathumma Whisper Large-v3 | 10.36 | 5.84 | 6.29 / 7.88 |
| (reference) Typhoon Whisper Large-v3 | 6.32 | 4.69 | 9.98 / 5.69 |
The FLEURS-Th column has two values in the source: Orig. (original references as released) and Norm. (after the Typhoon team's normalization pipeline). Pathumma achieves the best Orig. score; Typhoon Whisper Large-v3 achieves the best Norm. score. Neither is universally "best" — the right choice depends on whether your downstream pipeline normalizes Thai numbers, the mai yamok repetition marker, and similar conventions.
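If you re-measure CER on your own data, the usual definition is the character-level Levenshtein (edit) distance between hypothesis and reference, divided by the number of reference characters. The sketch below is a minimal, dependency-free version and is not the Typhoon team's evaluation script. It applies no normalization: whether you first normalize Thai digits, ๆ (mai yamok), or whitespace is an explicit choice you have to make, and it is exactly the kind of choice that separates the Orig. and Norm. columns above. `char_error_rate` is an illustrative name; libraries such as jiwer also provide a CER function.

```python
def char_error_rate(reference: str, hypothesis: str) -> float:
    """CER = (substitutions + deletions + insertions) / len(reference).

    No normalization is applied; both strings are compared as-is.
    """
    if not reference:
        raise ValueError("reference must be non-empty")
    # Two-row dynamic-programming Levenshtein distance over characters.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ch in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_ch in enumerate(hypothesis, start=1):
            cost = 0 if ref_ch == hyp_ch else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference char missing
                curr[j - 1] + 1,     # insertion: extra hypothesis char
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(reference)
```

Multiply by 100 to compare with the percentages in the table. For a whole test set, CER is commonly computed as total edits divided by total reference characters rather than as an average of per-utterance rates.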
Usage
```bash
# Apple Silicon Mac required
pip install mlx-whisper
# ffmpeg is required by mlx-whisper to decode audio files (e.g. .wav / .m4a / .mp3 / .mp4)
brew install ffmpeg
```

```python
import mlx_whisper

result = mlx_whisper.transcribe(
    "audio.wav",
    path_or_hf_repo="kinoppy555/Pathumma-whisper-th-large-v3-mlx",
    language="th",
)
print(result["text"])
```
Passing language="th" explicitly is recommended. Whisper's automatic language detection can mislabel Thai-only content when the first 30 seconds contain music, silence, or code-switched English; specifying the language avoids these failure modes.
Anecdotal speed
On an Apple M3 Pro (36 GB unified memory), transcribing a 73-second Thai social-media audio clip:
- This MLX conversion (Pathumma): ~85 s wall time (≈0.87× real-time, including model load).
- `mlx-community/whisper-large-v3-mlx` (generic): ~100–230 s, output sometimes degraded by repetition loops.
These are single-run informal numbers, not a benchmark.
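To collect comparable numbers on your own machine, time the call and divide the audio duration by the wall time; values above 1 mean faster than real time, the same convention as the ≈0.87× figure. The sketch below is a hypothetical helper (`time_transcription` is an illustrative name). It takes any zero-argument callable so it does not depend on mlx-whisper itself, and the first timed call will include model loading (and any download), as the numbers above do.

```python
import time
from typing import Any, Callable, Tuple


def time_transcription(
    run: Callable[[], Any], audio_seconds: float
) -> Tuple[Any, float, float]:
    """Call `run()` once and return (result, wall_seconds, speed_vs_realtime).

    speed_vs_realtime = audio_seconds / wall_seconds, so 1.0 means the audio
    took exactly its own duration to transcribe and > 1.0 is faster.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    result = run()
    # Guard against a zero interval for trivially fast callables.
    wall = max(time.perf_counter() - start, 1e-9)
    return result, wall, audio_seconds / wall
```

For example, pass `lambda: mlx_whisper.transcribe("audio.wav", path_or_hf_repo="kinoppy555/Pathumma-whisper-th-large-v3-mlx", language="th")` together with the clip length in seconds (73.0 for the clip above).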
Conversion details
Converted using ml-explore/mlx-examples/whisper/convert.py at commit e52c128d113f10546f0fa391f87edcc58d3880cb (sha256 of convert.py: 1f1de41ac5d3faeb241bcea97a3f99c760b802cd92e02715699c17b5658f5cb2).
```bash
python3 convert.py \
  --torch-name-or-path nectec/Pathumma-whisper-th-large-v3 \
  --mlx-path ./pathumma-th-large-v3-mlx \
  --dtype float16
```
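Before re-converting, you can confirm that your local copy of `convert.py` matches the one used here by comparing its SHA-256 digest with the value quoted above. A minimal sketch using only the standard library; `sha256_of_file` and `convert_script_matches` are illustrative helper names, and the path you pass in is an assumption about where you cloned mlx-examples.

```python
import hashlib
from pathlib import Path

# Digest quoted in the Conversion details above.
EXPECTED_CONVERT_SHA256 = (
    "1f1de41ac5d3faeb241bcea97a3f99c760b802cd92e02715699c17b5658f5cb2"
)


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def convert_script_matches(path: Path) -> bool:
    """True if the file at `path` has the digest quoted above."""
    return sha256_of_file(path) == EXPECTED_CONVERT_SHA256
```

For example, `convert_script_matches(Path("mlx-examples/whisper/convert.py"))` after checking out commit `e52c128d113f10546f0fa391f87edcc58d3880cb`; the relative path is hypothetical.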
Note for re-converters only. Upstream `convert.py` writes weights as `model.safetensors`, but `mlx_whisper.load_models` expects `weights.safetensors`. After running the command above, rename the file:

```bash
mv pathumma-th-large-v3-mlx/model.safetensors pathumma-th-large-v3-mlx/weights.safetensors
```

The pre-converted files in this Hugging Face repository are already named correctly; the rename is only needed if you reproduce the conversion from scratch.
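If you script the re-conversion, the rename can be made idempotent so that rerunning the script is harmless. A minimal sketch; `ensure_weights_filename` is a hypothetical helper name, and the two filenames come from the note above.

```python
from pathlib import Path


def ensure_weights_filename(mlx_dir: Path) -> Path:
    """Rename model.safetensors -> weights.safetensors inside `mlx_dir` if needed.

    Returns the path to weights.safetensors. Safe to call more than once.
    """
    src = mlx_dir / "model.safetensors"
    dst = mlx_dir / "weights.safetensors"
    if dst.exists():
        if src.exists():
            # Both files present: refuse to guess which one is current.
            raise FileExistsError(f"both {src.name} and {dst.name} exist in {mlx_dir}")
        return dst  # already renamed on a previous run
    if not src.exists():
        raise FileNotFoundError(f"neither {src.name} nor {dst.name} found in {mlx_dir}")
    src.rename(dst)
    return dst
```

For example, `ensure_weights_filename(Path("pathumma-th-large-v3-mlx"))` right after the `convert.py` command.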
Limitations
- Format conversion only. No retraining, no fine-tuning, no quantization beyond the FP16 down-cast performed by `convert.py --dtype float16`. All limitations of the upstream Pathumma model are inherited.
- Tokenizer. `mlx-whisper` loads tokenizer assets from the openai-whisper distribution rather than this repo, so this repository ships only the model files `weights.safetensors` and `config.json` (plus `LICENSE`).
- Hallucinations. Whisper-family models can still produce repetition loops or skip segments on out-of-domain audio (very noisy, very short, multi-speaker overlap, code-switched audio, music). The mitigation described above is anecdotal, not a controlled study.
- Streaming. Pathumma is an offline encoder-decoder Whisper. For real-time / low-latency streaming Thai ASR, see Typhoon ASR Real-time.
- Domain coverage. Trained primarily on the corpora described in the upstream NECTEC release; performance on highly specialised domains (medical, legal, regional dialects) is not characterised here.
Files
| File | Size | Description |
|---|---|---|
| `weights.safetensors` | ~2.9 GB | MLX-formatted FP16 weights (bit-identical conversion of the original) |
| `config.json` | <1 KB | MLX Whisper model dimensions |
| `LICENSE` | 11 KB | Apache License 2.0 (inherited from upstream) |
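As a back-of-the-envelope check on the weights size, FP16 stores 2 bytes per parameter. Assuming roughly 1.55 billion parameters (the figure OpenAI gives for its large Whisper models), that is about 3.1 × 10⁹ bytes, which is ≈3.1 GB in decimal units or ≈2.9 GiB in binary units. Reading "~2.9 GB" as GiB and the "3 GB upstream download" as decimal GB is an assumption, and the sketch ignores the small safetensors header. `fp16_size` is an illustrative name.

```python
from typing import Tuple


def fp16_size(num_params: float) -> Tuple[float, float]:
    """Return (decimal GB, binary GiB) for FP16 storage at 2 bytes per parameter."""
    num_bytes = num_params * 2  # FP16 = 16 bits = 2 bytes per weight
    return num_bytes / 1e9, num_bytes / 2**30


gb, gib = fp16_size(1.55e9)  # assumed parameter count, not measured here
print(f"{gb:.2f} GB = {gib:.2f} GiB")
```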
License
Apache License 2.0, inherited from nectec/Pathumma-whisper-th-large-v3. The full text is included in the LICENSE file in this repository. This MLX-format derivative is distributed under the same Apache License 2.0.
This repository constitutes a "Modification" under Apache 2.0 §1: weights have been converted from the upstream PyTorch checkpoint format to MLX-format safetensors (FP16). No model weights, architecture, training data, or training procedure have been changed.
Credits
- Original model: NECTEC (National Electronics and Computer Technology Center), Thailand, `nectec/Pathumma-whisper-th-large-v3`. Authors: Pattara Tipaksorn, Wayupuk Sommuang, Oatsada Chatthong, Kwanchiva Thangthai (Pathumma Audio Team).
- Base architecture: OpenAI Whisper-large-v3 (`openai/whisper-large-v3`).
- MLX framework: Apple Inc., `ml-explore/mlx`.
- Conversion script: `ml-explore/mlx-examples` (Apache 2.0).
Citation
If you use Pathumma in your research, please cite the upstream NECTEC work as instructed on the source model card:
```bibtex
@misc{tipaksorn2024PathummaWhisper,
  title = { {Pathumma Whisper Large V3 (TH)} },
  author = { Pattara Tipaksorn and Wayupuk Sommuang and Oatsada Chatthong and Kwanchiva Thangthai },
  url = { https://huggingface.co/nectec/Pathumma-whisper-th-large-v3 },
  publisher = { Hugging Face },
  year = { 2024 },
}
```
For the benchmark numbers reproduced above, please cite:
```bibtex
@misc{sirichotedumrong2026typhoon,
  title = { Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition },
  author = { Warit Sirichotedumrong and Adisai Na-Thalang and Potsawee Manakul and Pittawat Taveekitworachai and Sittipong Sripaisarnmongkol and Kunat Pipatanakul },
  eprint = { 2601.13044 },
  archivePrefix = { arXiv },
  year = { 2026 },
}
```