visualears-fastconformer-fa-full-ab-fp8

FP8 post-training quantization (PTQ) of Reza2kn/visualears-fastconformer-fa-full-ab, produced with NVIDIA TensorRT Model Optimizer (modelopt).

Base architecture: EncDecHybridRNNTCTCBPEModel (NeMo)
Calibration: 32 Persian clips from Reza2kn/persian-asr-eval-v0 (held out from eval).
Hardware target: NVIDIA GPUs with native FP8 support (Ada Lovelace, Hopper, or newer) and an FP8-capable TensorRT-family runtime.
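
To illustrate what "held out from eval" means for the calibration clips, here is a minimal sketch of a seeded, disjoint split. How the actual split was made is not stated in this card; the function name, the clip IDs, and the pool size of 232 (32 calibration + 200 eval) are assumptions for illustration only.

```python
import random

def split_calibration(clip_ids, n_calib=32, seed=0):
    """Return (calibration_ids, eval_ids) with no overlap between the two."""
    ids = sorted(set(clip_ids))  # de-duplicate and fix the order before shuffling
    if n_calib > len(ids):
        raise ValueError("not enough clips for the requested calibration set")
    rng = random.Random(seed)    # seeded, so the split is reproducible
    rng.shuffle(ids)
    calib, evals = ids[:n_calib], ids[n_calib:]
    # Calibration clips must never appear in the eval set.
    assert not set(calib) & set(evals)
    return calib, evals

# Hypothetical clip IDs; the real dataset's identifiers may look different.
clip_ids = [f"clip_{i:04d}" for i in range(232)]
calib_ids, eval_ids = split_calibration(clip_ids)
print(len(calib_ids), len(eval_ids))
```

Keeping the two sets disjoint matters because the quantizer's scaling factors are fitted to the calibration audio, so scoring on those same clips could flatter the FP8 numbers.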

Eval — `Reza2kn/persian-asr-eval-v0` (FLEURS-fa, 200 clips)

Variant	WER ↓	CER ↓	per-clip latency	peak VRAM
FP base	18.38%	6.58%	31 ms	588 MiB
FP8 (this repo)	18.48%	6.69%	51 ms	662 MiB
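
For readers unfamiliar with the WER and CER columns, the sketch below shows the usual corpus-level definition: total edit distance divided by total reference length, over words for WER and over characters for CER. This is not the evaluation script behind the table. The text normalization used there is not described here, and counting spaces as characters in CER is an assumption of this sketch.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def error_rate(refs, hyps, unit="word"):
    """Corpus-level WER (unit='word') or CER (unit='char') as a fraction."""
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        r = ref.split() if unit == "word" else list(ref)
        h = hyp.split() if unit == "word" else list(hyp)
        errors += edit_distance(r, h)
        total += len(r)
    return errors / total

# Toy example with made-up strings, not clips from the eval set.
refs = ["a b c"]
hyps = ["a x c"]
print(error_rate(refs, hyps, "word"), error_rate(refs, hyps, "char"))
```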

Usage

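import nemo.collections.asr as nemo_asr
# Load the quantized checkpoint, move it to the GPU, and switch to inference mode.
m = nemo_asr.models.ASRModel.restore_from("visualears-fastconformer-fa-full-ab-FP8.nemo").cuda().eval()
# Transcribe a list of audio file paths and print the result for the first clip.
transcripts = m.transcribe(["clip.wav"])
print(transcripts[0])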

License

Inherits the base model's license.

Base Comparison

On the same 200 FLEURS-fa clips, FP8 WER retention vs the FP base was 99.47% and CER retention was 98.34%. Exact normalized transcript match was 54.0%; rough word-position agreement was 93.13%. See validation/fp8_vs_base_eval_summary.json.
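
The retention formula is not spelled out here. One plausible reading, shown as a sketch, is the ratio of the base error rate to the FP8 error rate. Applied to the rounded table values it gives figures within a few hundredths of a point of the reported 99.47% and 98.34%; the small gap would be consistent with the summary file using unrounded rates, but that is an assumption. The function name is hypothetical.

```python
def retention(base_rate, quant_rate):
    """Percent of base accuracy retained, read as base error / quantized error."""
    if quant_rate <= 0:
        raise ValueError("quantized error rate must be positive")
    return 100.0 * base_rate / quant_rate

# Rounded values from the eval table: (FP base, FP8), in percent.
wer_base, wer_fp8 = 18.38, 18.48
cer_base, cer_fp8 = 6.58, 6.69

print(f"WER retention: {retention(wer_base, wer_fp8):.2f}%")
print(f"CER retention: {retention(cer_base, cer_fp8):.2f}%")
```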

Model tree

Base model: nvidia/stt_fa_fastconformer_hybrid_large
Finetuned: Reza2kn/visualears-fastconformer-fa-full-ab
Quantized: Reza2kn/visualears-fastconformer-fa-full-ab-fp8 (this repo)