How to use aufklarer/Omnilingual-ASR-CTC-1B-MLX-8bit with MLX:
```shell
# Download the model from the Hub
# (the extra is quoted so shells such as zsh do not treat the brackets as a glob)
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir Omnilingual-ASR-CTC-1B-MLX-8bit aufklarer/Omnilingual-ASR-CTC-1B-MLX-8bit
```
Omnilingual ASR - CTC 1B (MLX 8-bit)
MLX-compatible 8-bit quantization of Meta's Omnilingual ASR CTC-1B model for on-device inference on Apple Silicon (M1/M2/M3/M4). Prefer this variant when you need the smallest possible WER regression from fp32 and can afford an extra ~460 MB compared to the 4-bit build.
Omnilingual ASR is a wav2vec 2.0-style encoder-only model with a linear CTC head, trained by Meta for speech recognition across 1,600+ languages. The CTC variant is language-agnostic at inference time.
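Because the model ends in a CTC head, the simplest way to turn its per-frame predictions into token ids is greedy CTC decoding: merge consecutive repeats, then drop blanks. The following is a minimal sketch of that collapse step only. The blank index `BLANK_ID = 0` and the toy frame ids are assumptions for illustration; the real blank index should be taken from the model's configuration or tokenizer.

```python
from itertools import groupby

BLANK_ID = 0  # assumption: look up the real blank index in the model config/tokenizer


def ctc_greedy_collapse(frame_ids, blank_id=BLANK_ID):
    """Collapse per-frame argmax ids: merge consecutive repeats, then drop blanks."""
    merged = [token for token, _ in groupby(frame_ids)]
    return [token for token in merged if token != blank_id]


# Toy per-frame predictions, one id per frame; the values are made up.
frames = [0, 7, 7, 0, 0, 7, 3, 3, 3, 0]
print(ctc_greedy_collapse(frames))  # [7, 7, 3]
```

The blank between the two runs of `7` is what lets CTC emit the same token twice in a row. The resulting ids would then be passed to the SentencePiece tokenizer to recover text.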
Model
| Property | Value |
|---|---|
| Parameters | 1.01 B |
| Format | MLX safetensors (quantized linear layers + fp16 features) |
| Quantization | 8-bit per-group min-max, group size 64 |
| Encoder layers | 48 |
| Encoder dim | 1280 |
| Attention heads | 20 |
| FFN dim | 5120 |
| Sample rate | 16 kHz (raw waveform input) |
| Frame rate | 50 fps |
| Max duration | 40 s |
| Languages | 1,600+ |
| Vocabulary | 10,288 SentencePiece tokens |
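To make the quantization row concrete, here is a rough pure-Python sketch of per-group min-max (affine) quantization with the group size and bit width from the table. It illustrates the general technique and is not MLX's actual kernel. The fp16 storage for each group's scale and bias is an assumption, and all function names are hypothetical.

```python
import random

GROUP_SIZE = 64  # from the table above
BITS = 8         # from the table above


def quantize_group(weights, bits=BITS):
    """Map one group of floats to integer codes in [0, 2**bits - 1] via its min and max."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from the integer codes."""
    return [c * scale + bias for c in codes]


def bits_per_weight(bits=BITS, group_size=GROUP_SIZE, meta_bits=16):
    """Storage cost per weight, counting one scale and one bias per group (assumed fp16)."""
    return bits + 2 * meta_bits / group_size


random.seed(0)
group = [random.gauss(0.0, 0.02) for _ in range(GROUP_SIZE)]
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_err = max(abs(a - b) for a, b in zip(group, restored))
print(f"max abs error: {max_err:.2e} (bound: {scale / 2:.2e})")
print(f"bits per weight: {bits_per_weight()}")  # 8.5
```

Under these assumptions the rounding error within a group is bounded by half the scale, and the per-group metadata adds 0.5 bits per weight on top of the 8-bit codes.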
Files
| File | Size | Description |
|---|---|---|
| model.safetensors | 1006 MB | 8-bit quantized transformer weights + fp16 conv frontend |
| tokenizer.model | 1.2 MB | SentencePiece tokenizer |
| config.json | <1 KB | Architecture + quantization metadata |
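After downloading, it can be useful to confirm that the three files listed above are present before loading anything. The helper below is a hypothetical convenience using only the standard library. It makes no assumption about the keys inside `config.json` and simply returns the parsed contents.

```python
import json
from pathlib import Path

# File names taken from the table above.
EXPECTED_FILES = ("model.safetensors", "tokenizer.model", "config.json")


def check_model_dir(model_dir):
    """Return the parsed config.json, raising if any expected file is missing."""
    root = Path(model_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {root}: {', '.join(missing)}")
    return json.loads((root / "config.json").read_text(encoding="utf-8"))


# Example (assumes the model was downloaded to this local directory):
# config = check_model_dir("Omnilingual-ASR-CTC-1B-MLX-8bit")
# print(sorted(config))
```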
Architecture
Wav2Vec2FeatureExtractor (7-layer CNN, 320× downsample) → Linear 512→1280 → conv position encoder → 48× pre-norm Transformer encoder (dim 1280, 20 heads, ffn 5120) → LayerNorm → Linear CTC head (→ 10,288 tokens).
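The figures in this pipeline can be sanity-checked with a little arithmetic. The sketch below assumes the standard wav2vec 2.0 frontend (kernel sizes 10, 3, 3, 3, 3, 2, 2 with strides 5, 2, 2, 2, 2, 2, 2, no padding), which this card does not state explicitly. It counts only the encoder's attention and feed-forward weight matrices, ignoring biases, norms, the conv layers, and the CTC head.

```python
# Assumption: standard wav2vec 2.0 frontend, (kernel, stride) per conv layer.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]
SAMPLE_RATE = 16_000


def num_frames(num_samples, layers=CONV_LAYERS):
    """Output length of a stack of unpadded 1-D convolutions."""
    length = num_samples
    for kernel, stride in layers:
        length = (length - kernel) // stride + 1
    return length


def encoder_linear_params(layers=48, dim=1280, ffn=5120):
    """Weights in attention (Q, K, V, output) and the two FFN projections per layer."""
    return layers * (4 * dim * dim + 2 * dim * ffn)


total_stride = 1
for _, stride in CONV_LAYERS:
    total_stride *= stride

print(total_stride)                  # 320
print(num_frames(40 * SAMPLE_RATE))  # 1999
print(encoder_linear_params())       # 943718400
```

Under these assumptions a 40 s clip yields 1999 frames, just under the nominal 50 fps × 40 s = 2000 because the convolutions are unpadded. The encoder's linear layers account for roughly 0.94 B of the 1.01 B parameters.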
Performance
See the 4-bit variant for architecture notes and the 300M reference for FLEURS WER across en/fr/de/ar/hi. The 1B model has ~3× the encoder capacity of the 300M model and delivers correspondingly lower WER on low-resource languages.
Source
- Upstream model: facebook/omniASR-CTC-1B
- Paper: Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
- Meta blog: Omnilingual ASR announcement
Links
- speech-swift: Apple SDK
- soniqo.audio: website
- blog
License
Apache 2.0 (inherited from upstream).
- Guide: soniqo.audio/guides/omnilingual
- Docs: soniqo.audio
- GitHub: soniqo/speech-swift