---
license: other
track_downloads: true
language:
- en
pipeline_tag: automatic-speech-recognition
library_name: mlx
datasets:
- nvidia/Granary
- YTC
- Yodas2
- LibriLight
- librispeech_asr
- fisher_corpus
- Switchboard-1
- WSJ-0
- WSJ-1
- National-Singapore-Corpus-Part-1
- National-Singapore-Corpus-Part-6
- vctk
- voxpopuli
- europarl
- multilingual_librispeech
- fleurs
- mozilla-foundation/common_voice_8_0
- MLCommons/peoples_speech
- google/speech_commands
tags:
- quantized
- speech-recognition
- cache-aware ASR
- automatic-speech-recognition
- streaming-asr
- speech
- audio
- FastConformer
- RNNT
- Parakeet
- ASR
- pytorch
- NeMo
- mlx
base_model: nvidia/nemotron-speech-streaming-en-0.6b
base_model_relation: quantized
---

# **animaslabs/nemotron-speech-streaming-en-0.6b-mlx-4bit**

This model was converted to MLX format and quantized to 4 bits from [nvidia/nemotron-speech-streaming-en-0.6b](https://huggingface.co/nvidia/nemotron-speech-streaming-en-0.6b) using the scripts in this [GitHub repository](https://github.com/animaslabs/mlx-models). Please refer to the [original model card](https://huggingface.co/nvidia/nemotron-speech-streaming-en-0.6b) for more details on the model.

## Usage

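Because the weights are stored in quantized form, you must call `mlx.nn.quantize()` on the freshly built model before loading them. This gives the model's layers the same parameter structure as the checkpoint.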

```python
import json
import mlx.nn as nn
from huggingface_hub import hf_hub_download
from parakeet_mlx.utils import from_config

REPO_ID = "animaslabs/nemotron-speech-streaming-en-0.6b-mlx-4bit"

# Download and load config
config_path = hf_hub_download(REPO_ID, "config.json")
with open(config_path) as f:
    config = json.load(f)

# Build the unquantized model, then replace its quantizable layers with
# quantized ones so parameter names and shapes match the stored 4-bit weights
model = from_config(config)
nn.quantize(
    model,
    bits=config["quantization"]["bits"],
    group_size=config["quantization"]["group_size"],
)

# Load quantized weights
weights_path = hf_hub_download(REPO_ID, "model.safetensors")
model.load_weights(weights_path)

# Transcribe
result = model.transcribe("audio.wav")
print(result.text)
```
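The `bits` and `group_size` values read from `config.json` control how each weight matrix is stored. As a rough illustration, and not MLX's exact kernel, the sketch below implements a simplified min/max affine scheme in plain Python. Each group of `group_size` consecutive weights is mapped to integers in `[0, 2**bits - 1]` together with one scale and one bias. That per-group metadata is why quantization has to happen before loading: MLX's quantized layers carry `scales` and `biases` parameters that the unquantized model does not have. The helper names (`quantize_group`, `quantize`, `dequantize`, `bits_per_weight`) are hypothetical. The 16-bit storage for scales and biases is an assumption; MLX typically stores them in the original weight dtype.

```python
import random

# Hypothetical helpers illustrating group-wise affine quantization.
# Assumption: a simplified min/max scheme; MLX's kernel differs in details
# such as how the scale is rounded and how 4-bit values are packed.


def quantize_group(values, bits=4):
    """Map a group of floats to ints in [0, 2**bits - 1] plus a scale and bias."""
    levels = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo


def quantize(row, bits=4, group_size=64):
    """Split a weight row into groups of group_size and quantize each group."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    return [
        quantize_group(row[i:i + group_size], bits)
        for i in range(0, len(row), group_size)
    ]


def dequantize(groups):
    """Reconstruct approximate weights as q * scale + bias, group by group."""
    out = []
    for q, scale, bias in groups:
        out.extend(qi * scale + bias for qi in q)
    return out


def bits_per_weight(bits=4, group_size=64, param_bits=16):
    """Storage per weight: the integer plus its share of one scale and one bias.

    Assumption: scales and biases are stored as 16-bit floats.
    """
    return bits + 2 * param_bits / group_size


random.seed(0)
row = [random.uniform(-1.0, 1.0) for _ in range(128)]
groups = quantize(row, bits=4, group_size=64)
restored = dequantize(groups)
max_err = max(abs(a - b) for a, b in zip(row, restored))
max_half_step = max(scale for _, scale, _ in groups) / 2

print(len(groups))                        # → 2
print(max_err <= max_half_step + 1e-12)   # → True
print(bits_per_weight(4, 64))             # → 4.5
```

Rounding to the nearest level keeps each reconstructed weight within half a quantization step of the original. With `nn.quantize()`'s default settings of `bits=4` and `group_size=64`, each weight costs about 4.5 bits instead of 16. The values this checkpoint actually uses are the ones stored in its `config.json`.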