--- language: - fa license: apache-2.0 library_name: nemo tags: - automatic-speech-recognition - speech - persian - farsi - nvfp4 - nemo - modelopt - quantized base_model: - Reza2kn/visualears-fastconformer-fa-full-ab base_model_relation: quantized datasets: - Reza2kn/persian-asr-eval-v0 metrics: - wer - cer pipeline_tag: automatic-speech-recognition --- # visualears-fastconformer-fa-full-ab-nvfp4 NVFP4 (W4A4) post-training quantization of [`Reza2kn/visualears-fastconformer-fa-full-ab`](https://huggingface.co/Reza2kn/visualears-fastconformer-fa-full-ab) via NVIDIA `modelopt`. - **Base architecture:** EncDecHybridRNNTCTCBPEModel (NeMo) - **Calibration:** 32 Persian clips from `Reza2kn/persian-asr-eval-v0` (held out from eval). - **Hardware target:** NVIDIA Blackwell tensor cores. ## Eval — `Reza2kn/persian-asr-eval-v0` (FLEURS-fa, 200 clips) | Variant | WER ↓ | CER ↓ | per-clip latency | peak VRAM | |---|---|---|---|---| | **NVFP4 (this repo)** | **20.33%** | **7.38%** | 48 ms | 603 MiB | ## Usage ```python import nemo.collections.asr as nemo_asr m = nemo_asr.models.ASRModel.restore_from("visualears-fastconformer-fa-full-ab-NVFP4.nemo").cuda().eval() transcripts = m.transcribe(["clip.wav"]) print(transcripts[0]) ``` ## License Inherits the base model's license.