---
language:
- fa
license: apache-2.0
library_name: nemo
tags:
- automatic-speech-recognition
- speech
- persian
- farsi
- fp8
- nemo
- modelopt
- quantized
base_model:
- Reza2kn/visualears-fastconformer-fa-full-ab
base_model_relation: quantized
datasets:
- Reza2kn/persian-asr-eval-v0
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
---

# visualears-fastconformer-fa-full-ab-fp8

FP8 post-training quantization (PTQ) of [`Reza2kn/visualears-fastconformer-fa-full-ab`](https://huggingface.co/Reza2kn/visualears-fastconformer-fa-full-ab), produced with NVIDIA Model Optimizer (`modelopt`).

- **Base architecture:** EncDecHybridRNNTCTCBPEModel (NeMo)
- **Calibration:** 32 Persian clips from `Reza2kn/persian-asr-eval-v0` (held out from eval).
- **Hardware target:** NVIDIA GPUs with native FP8 support (Ada Lovelace, Hopper, or newer) running an FP8-capable runtime such as TensorRT.
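The quantization and calibration steps can be illustrated without NVIDIA's tooling. The pure-Python sketch below is not modelopt's implementation. It assumes the FP8 E4M3 format (largest finite magnitude 448, no infinities) and a single per-tensor scale chosen by "max" calibration, meaning the largest absolute value seen on calibration data is mapped onto 448. It performs quantize-dequantize ("fake quantization") in ordinary floats, which is a common way to simulate a quantized model's accuracy. `e4m3_round`, `calibrate_scale` and `fake_quant` are hypothetical helper names.

```python
import math

E4M3_MAX = 448.0     # largest finite magnitude in FP8 E4M3 (the variant without infinities)
E4M3_MIN_EXP = -6    # exponent of the smallest normal value, 2**-6
E4M3_MANT_BITS = 3   # explicit mantissa bits


def e4m3_round(x: float) -> float:
    """Round a float to the nearest E4M3 value, saturating at +/-448 (ties to even)."""
    if x == 0.0 or math.isnan(x):
        return x
    x = max(-E4M3_MAX, min(E4M3_MAX, x))   # saturate instead of overflowing
    _, e = math.frexp(abs(x))              # abs(x) = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_EXP)         # floor(log2|x|); subnormals share the 2**-6 binade
    step = 2.0 ** (exp - E4M3_MANT_BITS)   # spacing of representable values in this binade
    return round(x / step) * step          # Python's round() breaks ties to even


def calibrate_scale(calib_values) -> float:
    """Per-tensor 'max' calibration (assumed): map the observed abs-max onto E4M3_MAX."""
    amax = max(abs(v) for v in calib_values)
    return amax / E4M3_MAX if amax > 0 else 1.0


def fake_quant(values, scale: float):
    """Quantize-dequantize: divide by the scale, round to E4M3, multiply back."""
    return [e4m3_round(v / scale) * scale for v in values]
```

For instance, `e4m3_round(0.3)` lands on 0.3125, the nearest value representable with 3 mantissa bits. In modelopt itself this step goes through `modelopt.torch.quantization.quantize` with an FP8 config (such as `FP8_DEFAULT_CFG`) and a forward loop over the calibration clips; this card does not say which config was used for this checkpoint.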

## Eval — `Reza2kn/persian-asr-eval-v0` (FLEURS-fa, 200 clips)

| Variant | WER ↓ | CER ↓ | per-clip latency | peak VRAM |
|---|---|---|---|---|
| FP base | 18.38% | 6.58% | 31 ms | 588 MiB |
| **FP8 (this repo)** | **18.48%** | **6.69%** | 51 ms | 662 MiB |
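The card does not spell out how WER and CER were scored. A minimal sketch of the standard definitions follows: Levenshtein edit distance over words (WER) or characters (CER), summed over all clips and divided by the total reference length. Corpus-level pooling, whitespace tokenization, counting spaces as characters in CER, and the absence of any Persian-specific normalization (for example unifying Arabic and Persian forms of yeh/kaf, or ZWNJ handling) are assumptions of this sketch, not necessarily what produced the numbers above. The helper names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions + insertions + deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute (cost 1) or match (cost 0)
            )
        prev = cur
    return prev[-1]


def corpus_error_rate(refs, hyps, unit="word"):
    """Corpus-level WER (unit="word") or CER (unit="char").

    Assumptions: whitespace tokenization for words, spaces counted as
    characters for CER, and no text normalization.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    split_units = str.split if unit == "word" else list
    edits = sum(edit_distance(split_units(r), split_units(h)) for r, h in zip(refs, hyps))
    total = sum(len(split_units(r)) for r in refs)
    return edits / total
```

For example, one substituted word in a three-word reference gives a WER of 1/3.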

## Usage

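```python
import nemo.collections.asr as nemo_asr

# Restore the quantized checkpoint (local .nemo path) onto the GPU in inference mode.
m = nemo_asr.models.ASRModel.restore_from("visualears-fastconformer-fa-full-ab-FP8.nemo").cuda().eval()
out = m.transcribe(["clip.wav"])
hyps = out[0] if isinstance(out, tuple) else out  # older NeMo returns (best, all) for RNNT/hybrid models
print(getattr(hyps[0], "text", hyps[0]))  # recent NeMo returns Hypothesis objects; older ones return str
```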

## License

Apache-2.0, inherited from the base model.

## Base Comparison

On the same 200 FLEURS-fa clips, FP8 WER retention vs the FP base was 99.47% and CER retention was 98.34%. Exact normalized transcript match was 54.0%; rough word-position agreement was 93.13%. See `validation/fp8_vs_base_eval_summary.json`.
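The summary JSON is the authority on how these figures are defined. The sketch below assumes retention means base error divided by FP8 error, and that "exact match" compares transcripts after collapsing whitespace; both definitions, and the helper names, are hypothetical. Applied to the rounded table values it gives about 99.46% (WER) and 98.36% (CER), close to the reported 99.47% and 98.34%, so the JSON presumably works from unrounded error rates.

```python
def retention(base_error: float, quant_error: float) -> float:
    """Assumed definition: base error / quantized error (1.0 means no degradation)."""
    return base_error / quant_error


def exact_match_rate(a, b, normalize=lambda s: " ".join(s.split())):
    """Fraction of clips whose normalized transcripts are identical.

    The default normalizer (collapse whitespace) is an assumption.
    """
    if len(a) != len(b):
        raise ValueError("transcript lists must have the same length")
    return sum(normalize(x) == normalize(y) for x, y in zip(a, b)) / len(a)


# Rounded values from the eval table above.
print(f"WER retention ~ {retention(18.38, 18.48):.2%}")
print(f"CER retention ~ {retention(6.58, 6.69):.2%}")
```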