---
license: other
track_downloads: true
language:
- en
pipeline_tag: automatic-speech-recognition
library_name: mlx
datasets:
- nvidia/Granary
- YTC
- Yodas2
- LibriLight
- librispeech_asr
- fisher_corpus
- Switchboard-1
- WSJ-0
- WSJ-1
- National-Singapore-Corpus-Part-1
- National-Singapore-Corpus-Part-6
- vctk
- voxpopuli
- europarl
- multilingual_librispeech
- fleurs
- mozilla-foundation/common_voice_8_0
- MLCommons/peoples_speech
- google/speech_commands
tags:
- quantized
- speech-recognition
- cache-aware ASR
- automatic-speech-recognition
- streaming-asr
- speech
- audio
- FastConformer
- RNNT
- Parakeet
- ASR
- pytorch
- NeMo
- mlx
base_model: nvidia/nemotron-speech-streaming-en-0.6b
base_model_relation: quantized
---

# **animaslabs/nemotron-speech-streaming-en-0.6b-mlx-4bit**

This model was converted to MLX format and quantized to 4 bits from [nvidia/nemotron-speech-streaming-en-0.6b](https://huggingface.co/nvidia/nemotron-speech-streaming-en-0.6b) using the scripts in this [GitHub repo](https://github.com/animaslabs/mlx-models).

Please refer to the [original model card](https://huggingface.co/nvidia/nemotron-speech-streaming-en-0.6b) for more details on the model.

## Usage

Quantized models require calling `mlx.nn.quantize()` on the freshly built model before loading weights, so that the module structure matches the quantized weights stored in the checkpoint.

```python
import json

import mlx.nn as nn
from huggingface_hub import hf_hub_download
from parakeet_mlx.utils import from_config

# Download and load config
config_path = hf_hub_download(
    "animaslabs/nemotron-speech-streaming-en-0.6b-mlx-4bit", "config.json"
)
with open(config_path) as f:
    config = json.load(f)

# Build model and apply quantization structure
model = from_config(config)
nn.quantize(
    model,
    bits=config["quantization"]["bits"],
    group_size=config["quantization"]["group_size"],
)

# Load quantized weights
weights_path = hf_hub_download(
    "animaslabs/nemotron-speech-streaming-en-0.6b-mlx-4bit", "model.safetensors"
)
model.load_weights(weights_path)

# Transcribe
result = model.transcribe("audio.wav")
print(result.text)
```
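
The `bits` and `group_size` values read from `config.json` control group-wise quantization: weights are split into groups of `group_size` values, and each group is stored as low-bit integer codes plus a per-group scale and offset. The following is a rough pure-Python sketch of that idea, for illustration only. The function names are hypothetical, the default `group_size=64` is an assumption, and this is not MLX's actual implementation, which packs the codes into integer arrays and may compute scales differently.

```python
import math


def quantize_group(weights, bits=4):
    """Affine-quantize one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1  # 15 for 4-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels
    if scale == 0.0:
        scale = 1.0  # constant group: any scale works, every code is 0
    codes = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo  # lo plays the role of the per-group offset


def dequantize_group(codes, scale, offset):
    """Map integer codes back to approximate float weights."""
    return [c * scale + offset for c in codes]


def quantize(weights, bits=4, group_size=64):
    """Split a flat list of weights into groups and quantize each independently."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


# Illustrative data: 128 synthetic "weights" give two groups of 64.
weights = [math.sin(0.1 * i) for i in range(128)]
groups = quantize(weights, bits=4, group_size=64)
restored = [v for group in groups for v in dequantize_group(*group)]
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"groups: {len(groups)}, max abs error: {max_error:.4f}")
```

Smaller groups track the local weight range more closely at the cost of storing more scales and offsets, which is why both values are saved in the config and must be passed back to `nn.quantize()` unchanged.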