---
language:
- fa
license: apache-2.0
library_name: nemo
tags:
- automatic-speech-recognition
- speech
- persian
- farsi
- fp8
- nemo
- modelopt
- quantized
base_model:
- Reza2kn/visualears-fastconformer-fa-full-ab
base_model_relation: quantized
datasets:
- Reza2kn/persian-asr-eval-v0
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
---

# visualears-fastconformer-fa-full-ab-fp8

FP8 post-training quantization of [`Reza2kn/visualears-fastconformer-fa-full-ab`](https://huggingface.co/Reza2kn/visualears-fastconformer-fa-full-ab), produced with NVIDIA `modelopt`.

- **Base architecture:** EncDecHybridRNNTCTCBPEModel (NeMo)
- **Calibration:** 32 Persian clips from `Reza2kn/persian-asr-eval-v0`, held out from the evaluation set.
- **Hardware target:** NVIDIA GPUs with FP8 support in a TensorRT-family runtime.

## Eval: `Reza2kn/persian-asr-eval-v0` (FLEURS-fa, 200 clips)

| Variant | WER ↓ | CER ↓ | Latency per clip | Peak VRAM |
|---|---|---|---|---|
| FP base | 18.38% | 6.58% | 31 ms | 588 MiB |
| **FP8 (this repo)** | **18.48%** | **6.69%** | 51 ms | 662 MiB |

In this evaluation, the FP8 checkpoint was slower per clip and used more peak VRAM than the FP base.

## Usage

```python
import nemo.collections.asr as nemo_asr

# Restore the FP8 checkpoint and put it on the GPU in inference mode.
m = nemo_asr.models.ASRModel.restore_from("visualears-fastconformer-fa-full-ab-FP8.nemo").cuda().eval()
result = m.transcribe(["clip.wav"])
# Some NeMo versions return a (best, all) tuple for RNNT/hybrid models; keep the best hypotheses.
if isinstance(result, tuple):
    result = result[0]
# Newer NeMo versions return Hypothesis objects with a `.text` field; older ones return plain strings.
print(getattr(result[0], "text", result[0]))
```

## License

Inherits the base model's license (declared as `apache-2.0` in the metadata above).

## Base Comparison

On the same 200 FLEURS-fa clips, FP8 WER retention relative to the FP base was 99.47% and CER retention was 98.34%. FP8 and base transcripts matched exactly after normalization on 54.0% of clips, and rough word-position agreement between them was 93.13%. See `validation/fp8_vs_base_eval_summary.json` for details.
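The WER/CER and base-comparison figures above are simple aggregate statistics. The following is a minimal, self-contained sketch of how such numbers can be computed, under stated assumptions: error rates are corpus-level (total edits divided by total reference words or characters, with spaces counted for CER), retention is taken as base error divided by FP8 error, and normalization is only whitespace collapsing. The card does not state its exact formulas or normalizer, and `corpus_error_rate`, `retention`, and `exact_match_rate` are hypothetical helpers, not part of NeMo or the evaluation dataset.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute (cost 0 if equal)
            )
        prev = cur
    return prev[-1]


def _normalize(text: str) -> str:
    # Hypothetical normalizer: collapses runs of whitespace only.
    return " ".join(text.split())


def corpus_error_rate(refs: List[str], hyps: List[str], unit: str = "word") -> float:
    """Corpus-level WER (unit="word") or CER (unit="char"): total edits / total reference units."""
    if unit not in ("word", "char"):
        raise ValueError("unit must be 'word' or 'char'")
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        ref, hyp = _normalize(ref), _normalize(hyp)
        r = ref.split() if unit == "word" else list(ref)
        h = hyp.split() if unit == "word" else list(hyp)
        errors += edit_distance(r, h)
        total += len(r)
    return errors / total


def retention(base_error: float, quant_error: float) -> float:
    """Assumed definition: base error / quantized error (1.0 means no degradation)."""
    return base_error / quant_error


def exact_match_rate(a: List[str], b: List[str]) -> float:
    """Fraction of clips whose normalized transcripts are identical."""
    if len(a) != len(b):
        raise ValueError("transcript lists must have the same length")
    return sum(_normalize(x) == _normalize(y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    base_out = ["the cat sat", "hello world"]  # toy stand-ins for base transcripts
    fp8_out = ["the cat sat", "hello  word"]   # toy stand-ins for FP8 transcripts
    print(corpus_error_rate(base_out, fp8_out))      # 0.2 (1 word edit / 5 words)
    print(exact_match_rate(base_out, fp8_out))       # 0.5
    print(round(retention(18.38, 18.48), 4))         # 0.9946
```

With the rounded table values, `retention(18.38, 18.48)` comes out near 0.9946, close to the reported 99.47%; a small gap is expected if the card used unrounded error rates or a different retention formula.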