VisualEars FA32M Streaming BPE1024 - LiteRT FP
LiteRT/TFLite fixed-frame acoustic CTC-core export of Reza2kn/visualears-fastconformer-fa32m-streaming-bpe1024.
This is the FA32M length-aware core: it accepts precomputed NeMo-compatible log-mel features plus the real valid feature length, so short utterances do not get decoded as if all 2005 padded frames were valid.
Runtime contract
- input 0 (`serving_default_args_0`): `processed_signal` float32 `[1, 80, 2005]`
- input 1 (`serving_default_args_1`): `processed_signal_length` int64 `[1]`, the valid log-mel frame count before zero padding
- output 0 (`serving_default_output_0_output`): `logits` float32 `[1, 252, 1025]`
- output 1 (`serving_default_output_1_output`): `encoded_lengths` int64 `[1]`
- tokenizer blank id: 1024
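The contract above can be exercised with a short sketch like the one below. It is an illustration under stated assumptions, not the repository's own runner: `run_core` and `ctc_greedy_decode` are hypothetical names, tensors are looked up by substring because the runtime may append a suffix to the names listed above, and the decoder is plain greedy CTC (argmax per frame, collapse repeats, drop blank id 1024) restricted to the first `encoded_lengths` frames.

```python
def ctc_greedy_decode(frame_ids, valid_frames, blank_id=1024):
    """Greedy CTC: collapse repeated ids, then drop blanks.

    frame_ids: per-frame argmax token ids (any sequence of ints).
    valid_frames: number of leading frames to decode (encoded_lengths[0]).
    """
    tokens, prev = [], None
    for token in list(frame_ids)[:valid_frames]:
        if token != prev and token != blank_id:
            tokens.append(token)
        prev = token
    return tokens


def run_core(model_path, features, valid_length):
    """Run the fixed-frame core once and return decoded token ids.

    features: float32 array shaped [1, 80, 2005], zero padded.
    valid_length: true log-mel frame count before padding.
    """
    # Imported lazily so the decoder above stays usable without these packages.
    import numpy as np
    from ai_edge_litert.interpreter import Interpreter

    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inputs = {d["name"]: d["index"] for d in interpreter.get_input_details()}
    outputs = {d["name"]: d["index"] for d in interpreter.get_output_details()}

    def find(table, key):
        # Assumption: tensor names contain the contract names as substrings.
        return next(index for name, index in table.items() if key in name)

    interpreter.set_tensor(find(inputs, "args_0"),
                           np.asarray(features, dtype=np.float32))
    interpreter.set_tensor(find(inputs, "args_1"),
                           np.array([valid_length], dtype=np.int64))
    interpreter.invoke()

    logits = interpreter.get_tensor(find(outputs, "output_0"))
    encoded_lengths = interpreter.get_tensor(find(outputs, "output_1"))
    frame_ids = logits[0].argmax(axis=-1).tolist()
    return ctc_greedy_decode(frame_ids, int(encoded_lengths[0]))
```

Decoding only the first `encoded_lengths[0]` frames is what keeps padded frames out of the transcript. Mapping the returned ids to text depends on the layout of `tokens.json`, which this sketch does not assume.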
Artifact
- File: `fastconformer_fa32m_ctc_fixed2005_len_fp.tflite`
- Size: 110,574,440 bytes
- SHA256: `ae671928398d98ad86a67926d60c53b8885e2224ba8b7beea3318718afb9bb84`
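A downloaded copy can be checked against the size and digest listed above before loading it. The following is a minimal sketch using only the Python standard library; `sha256_of` and `verify_artifact` are hypothetical helper names, not scripts shipped in the repository.

```python
import hashlib
from pathlib import Path

# Values copied from the artifact listing in this card.
EXPECTED_SIZE = 110_574_440
EXPECTED_SHA256 = "ae671928398d98ad86a67926d60c53b8885e2224ba8b7beea3318718afb9bb84"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_size=EXPECTED_SIZE,
                    expected_sha256=EXPECTED_SHA256):
    """True only if both the byte size and the SHA256 digest match."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return sha256_of(path) == expected_sha256
```

Usage would look like `verify_artifact("fastconformer_fa32m_ctc_fixed2005_len_fp.tflite")`, which returns `False` for a truncated or altered download.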
269-clip transcription parity
Source: PyTorch NeMo preprocessor + encoder + auxiliary CTC fp32, decoded during calibration export.
Candidate: this LiteRT/TFLite model through ai_edge_litert XNNPACK CPU.
Validation set: all 269 clips from Reza2kn/visualears-benchmark-269-gold.
| Metric | Result |
|---|---|
| Exact transcript matches | 269 / 269 |
| Exact transcript parity | 100.00% |
| Exact normalized transcript parity | 100.00% |
| Mean character similarity | 100.00% |
| Candidate non-empty rate | 98.88% |
| Source non-empty rate | 98.88% |
| Encoded length match rate | 100.00% |
Result: passes the >98% transcription parity gate.
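For readers who want to recompute numbers of this kind on their own transcripts, here is a minimal sketch of the table's first few metrics. It is not the parity script from `scripts/`: `parity_report` is a hypothetical name, and character similarity is assumed to be `difflib.SequenceMatcher.ratio()`, which may differ from the measure used for the table.

```python
from difflib import SequenceMatcher


def parity_report(source, candidate):
    """Compare two equal-length lists of transcripts, clip by clip."""
    if not source or len(source) != len(candidate):
        raise ValueError("need two non-empty lists of equal length")
    total = len(source)
    exact = sum(s == c for s, c in zip(source, candidate))
    # Assumption: similarity is the SequenceMatcher ratio per clip, averaged.
    similarity = sum(
        SequenceMatcher(None, s, c).ratio() for s, c in zip(source, candidate)
    ) / total
    return {
        "exact_matches": exact,
        "exact_parity": exact / total,
        "mean_char_similarity": similarity,
        "source_non_empty_rate": sum(bool(s) for s in source) / total,
        "candidate_non_empty_rate": sum(bool(c) for c in candidate) / total,
    }
```

A non-empty rate of 98.88% on 269 clips is consistent with 266 non-empty transcripts, since 266 / 269 ≈ 0.9888.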
Feature contract
Use the sidecars preprocessor.json and mel_filters_slaney_80x257.json:
- sample rate: 16 kHz mono
- preemphasis: `0.97`
- STFT: `n_fft=512`, `win_length=400`, `hop_length=160`, centered with reflect padding
- mel: Slaney/librosa 80-bin filterbank from sidecar
- log: natural log with tiny floor
- no per-bin normalization (`normalize=NA`)
- zero-pad/truncate features to 2005 frames, and pass true `processed_signal_length`
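The feature steps listed above can be sketched in NumPy as follows. Several details are assumptions that the list does not state and that `preprocessor.json` should be treated as authoritative for: a symmetric Hann window zero-padded to `n_fft`, a power (squared-magnitude) spectrum, and a log floor of `2**-24`. The function names are hypothetical, and `mel_filters` is expected to be the `[80, 257]` matrix already loaded from the sidecar.

```python
import numpy as np

N_FFT, WIN, HOP, N_MELS, MAX_FRAMES = 512, 400, 160, 80, 2005


def log_mel_features(audio, mel_filters, preemph=0.97, log_floor=2.0 ** -24):
    """audio: mono float samples at 16 kHz. Returns float32 [80, T]."""
    audio = np.asarray(audio, dtype=np.float32)
    # Pre-emphasis: y[t] = x[t] - 0.97 * x[t - 1]; the first sample is kept.
    audio = np.concatenate([audio[:1], audio[1:] - preemph * audio[:-1]])
    # Centered STFT: reflect-pad n_fft // 2 samples on both sides.
    padded = np.pad(audio, N_FFT // 2, mode="reflect")
    # Assumption: 400-sample symmetric Hann window, centered in 512 samples.
    window = np.zeros(N_FFT, dtype=np.float32)
    start = (N_FFT - WIN) // 2
    window[start:start + WIN] = np.hanning(WIN)
    n_frames = 1 + (len(padded) - N_FFT) // HOP
    idx = np.arange(N_FFT)[None, :] + HOP * np.arange(n_frames)[:, None]
    spectrum = np.fft.rfft(padded[idx] * window, axis=-1)  # [T, 257]
    power = np.abs(spectrum) ** 2  # assumption: power spectrum
    mel = np.asarray(mel_filters, dtype=np.float32) @ power.T  # [80, T]
    # Natural log with a small floor; no per-bin normalization.
    return np.log(mel + log_floor).astype(np.float32)


def pad_to_contract(features):
    """Zero-pad or truncate [80, T] to [1, 80, 2005]; return the true length."""
    valid = min(features.shape[1], MAX_FRAMES)
    padded = np.zeros((1, N_MELS, MAX_FRAMES), dtype=np.float32)
    padded[0, :, :valid] = features[:, :valid]
    return padded, np.array([valid], dtype=np.int64)
```

The two return values of `pad_to_contract` correspond to `processed_signal` and `processed_signal_length`. Parity with the NeMo preprocessor is not guaranteed by this sketch and would need to be checked against reference features.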
Files
- `fastconformer_fa32m_ctc_fixed2005_len_fp.tflite`: LiteRT/TFLite model
- `tokens.json`: tokenizer pieces + blank id
- `preprocessor.json`: feature settings
- `mel_filters_slaney_80x257.json`: browser/runtime-compatible mel filters
- `validation/parity_full269_litert_fp_fp16.json`: full transcript parity for FP and FP16
- `validation/fa32m_litert_export_manifest.json`: calibration/export manifest
- `scripts/`: export, conversion, quantization, and parity scripts
Provenance / conversion notes
- Source model: `Reza2kn/visualears-fastconformer-fa32m-streaming-bpe1024/fa32m_streaming_bpe1024_final.nemo`
- Source SHA256: `034fb2afa19da13db8a120970a7f8d3e696987014cc62684ce50a1382d332448`
- Conversion: NeMo CTC encoder/auxiliary decoder → TorchScript → `litert_torch` → LiteRT/TFLite.
- LiteRT workaround: relative positional encoding was fixed to the known 2005-frame contract to avoid dynamic scalar lowering in `litert_torch`; `processed_signal_length` remains a runtime input and drives padding/attention masking plus `encoded_lengths`.