stt_ar_fastconformer_hybrid_large_streaming_pcd_v1.1-sherpa

Streaming Arabic FastConformer (RNNT decoder), exported as the three-file split (encoder, decoder, joiner) that sherpa-onnx expects, so that its OnlineRecognizer can run the model on iOS, Android, and desktop with cache-aware encoder state propagation.

Mobile counterpart of …-v1.1-mirror. See the mirror repo's README for the training recipe, dataset, and results (in short: val_wer of 1.50% at epoch 7, roughly 9× lower than v1's 13.23%).

Geometry

att_context_size = [70, 13]
Left context: 5.6 s of past audio (memory window, no extra latency)
Right context (lookahead): 1.04 s — constant emission delay
Encoder subsampling: 8× (10 ms frames → 80 ms output steps)
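
The two durations above follow from the attention context and the subsampling factor. A minimal sketch of the arithmetic, assuming `att_context_size` is counted in encoder output steps (the helper name `context_seconds` is hypothetical):

```python
# Values taken from the Geometry list above.
FRAME_SHIFT_MS = 10          # input feature frame shift
SUBSAMPLING = 8              # encoder subsampling factor
ATT_CONTEXT_SIZE = (70, 13)  # (left, right), assumed to be in encoder output steps


def context_seconds(att_context_size, frame_shift_ms=FRAME_SHIFT_MS,
                    subsampling=SUBSAMPLING):
    """Convert a (left, right) context given in encoder steps to seconds."""
    step_ms = frame_shift_ms * subsampling  # 80 ms per encoder output step
    left, right = att_context_size
    return left * step_ms / 1000, right * step_ms / 1000


left_s, right_s = context_seconds(ATT_CONTEXT_SIZE)
print(f"left context:  {left_s:.2f} s")   # memory window
print(f"right context: {right_s:.2f} s")  # lookahead, i.e. emission delay
```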

Files

| File | Size | Purpose |
|---|---|---|
| `encoder.onnx` | 456 MB | streaming encoder with cache I/O slots |
| `decoder.onnx` | 16 MB | RNNT prediction network |
| `joiner.onnx` | 5.6 MB | joint network |
| `tokens.txt` | 13 KB | SentencePiece vocab (`token\tid` per line) |
| `silero_vad.onnx` | 2.3 MB | bundled Silero VAD (offline-fallback path) |
| `STREAMING.marker` | <1 KB | flag file so consumers can detect a streaming bundle |
| `README.md` | - | this file |

Total: ~480 MB. Weights are fp32; int8 quantization (static QDQ, calibrated) is on the roadmap.
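
A consumer might check a downloaded bundle before loading it. The sketch below is one possible way to do that in Python, assuming the file names listed above; `inspect_bundle` and `load_tokens` are hypothetical helpers, not part of sherpa-onnx.

```python
from pathlib import Path

# File names taken from the Files section above.
REQUIRED_FILES = ("encoder.onnx", "decoder.onnx", "joiner.onnx", "tokens.txt")
STREAMING_MARKER = "STREAMING.marker"


def inspect_bundle(model_dir):
    """Return (is_streaming, missing_files) for a downloaded model directory."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    is_streaming = (root / STREAMING_MARKER).is_file()
    return is_streaming, missing


def load_tokens(tokens_path):
    """Parse tokens.txt into {token: id}.

    The id is taken as the last whitespace-separated field, so both a tab
    and a space between token and id are accepted.
    """
    table = {}
    with open(tokens_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip empty lines
            token, token_id = line.rsplit(maxsplit=1)
            table[token] = int(token_id)
    return table
```

An app could fall back to its offline path when `is_streaming` is false, and refuse to start the recognizer while `missing` is non-empty.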

Usage (Dart / sherpa-onnx)

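import 'package:sherpa_onnx/sherpa_onnx.dart' as sherpa;

// Load the native library once, before creating any sherpa-onnx object.
sherpa.initBindings();

final transducer = sherpa.OnlineTransducerModelConfig(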
  encoder: 'encoder.onnx',
  decoder: 'decoder.onnx',
  joiner:  'joiner.onnx',
);
final model = sherpa.OnlineModelConfig(
  transducer: transducer,
  tokens: 'tokens.txt',
  modelType: 'transducer',
  provider: 'cpu',
);
final recognizer = sherpa.OnlineRecognizer(sherpa.OnlineRecognizerConfig(
  model: model,
  decodingMethod: 'greedy_search', // beam=1
  enableEndpoint: true,
  rule1MinTrailingSilence: 2.4,
  rule2MinTrailingSilence: 1.2,
  rule3MinUtteranceLength: 30.0,
));
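final stream = recognizer.createStream();
// micFrame: Float32List of mono 16 kHz samples in [-1, 1]; call once per captured frame.
stream.acceptWaveform(samples: micFrame, sampleRate: 16000);
// Decode while enough audio is buffered for another encoder chunk.
while (recognizer.isReady(stream)) {
  recognizer.decode(stream);
}
print(recognizer.getResult(stream).text);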
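
The three endpoint rules in the config above follow Kaldi-style endpointing, which sherpa-onnx adopts: rule 1 fires on long trailing silence even when nothing has been decoded, rule 2 fires on shorter trailing silence once something has been decoded, and rule 3 fires when the utterance gets too long. The Python sketch below illustrates that logic with the thresholds used here; it is an illustration with hypothetical names, not sherpa-onnx's implementation.

```python
from dataclasses import dataclass


@dataclass
class EndpointRules:
    # Thresholds in seconds, copied from the Dart config above.
    rule1_min_trailing_silence: float = 2.4
    rule2_min_trailing_silence: float = 1.2
    rule3_min_utterance_length: float = 30.0


def is_endpoint(rules, utterance_s, trailing_silence_s, has_decoded_tokens):
    """Return True if any of the three rules is satisfied."""
    # Rule 1: long silence, regardless of whether anything was decoded.
    if trailing_silence_s >= rules.rule1_min_trailing_silence:
        return True
    # Rule 2: shorter silence, but only after some tokens were decoded.
    if has_decoded_tokens and trailing_silence_s >= rules.rule2_min_trailing_silence:
        return True
    # Rule 3: the utterance has run long enough to force a cut.
    return utterance_s >= rules.rule3_min_utterance_length


rules = EndpointRules()
print(is_endpoint(rules, utterance_s=5.0, trailing_silence_s=1.5,
                  has_decoded_tokens=True))
```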

Usage (Python / sherpa-onnx)

import sherpa_onnx
recognizer = sherpa_onnx.OnlineRecognizer.from_transducer(
    encoder="encoder.onnx",
    decoder="decoder.onnx",
    joiner="joiner.onnx",
    tokens="tokens.txt",
    num_threads=2,
    provider="cpu",
    decoding_method="greedy_search",
)
stream = recognizer.create_stream()
# stream.accept_waveform(16000, samples)  # repeatedly
# while recognizer.is_ready(stream): recognizer.decode_stream(stream)
# print(recognizer.get_result(stream))
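
The commented-out calls above can be arranged into a feed-and-decode loop. A minimal sketch, assuming `samples` is a 1-D float32 array of 16 kHz mono audio in [-1, 1] and `recognizer` is the object built above; the helper names and the 100 ms chunk size are illustrative choices, not something this repo prescribes.

```python
def chunked(samples, chunk_size):
    """Yield consecutive slices of at most chunk_size samples."""
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def transcribe(recognizer, samples, sample_rate=16000, chunk_ms=100):
    """Feed samples in fixed-size chunks and return the final result."""
    stream = recognizer.create_stream()
    chunk_size = sample_rate * chunk_ms // 1000
    for chunk in chunked(samples, chunk_size):
        stream.accept_waveform(sample_rate, chunk)
        # Decode as soon as enough audio is buffered.
        while recognizer.is_ready(stream):
            recognizer.decode_stream(stream)
    # Signal end of input so any remaining buffered frames can be decoded.
    stream.input_finished()
    while recognizer.is_ready(stream):
        recognizer.decode_stream(stream)
    return recognizer.get_result(stream)
```

With a live microphone, the same inner loop would run once per captured buffer, and `input_finished()` would be called only when recording stops.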

Why a separate `joiner.onnx`?

sherpa-onnx's OnlineTransducerModelConfig takes the joiner as a separate model from the prediction network, so the predictor output can be carried across frames and the prediction network is re-run only after a non-blank token is emitted. NeMo's default export bundles the two into a single decoder_joint.onnx; the files in this repo were exported via NeMo's per-submodule .export() to get the three-way split.
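
A toy sketch of greedy transducer decoding shows what the split buys: the joiner runs on every step, while the prediction network runs again only after a non-blank token. The `predictor` and `joiner` below are stand-in Python functions rather than the ONNX models, and `max_symbols_per_frame` is an assumed safety limit.

```python
BLANK = 0


def greedy_decode(encoder_frames, predictor, joiner, max_symbols_per_frame=5):
    """Greedy (beam = 1) transducer search.

    predictor(tokens) -> predictor output for the current token history
    joiner(enc_frame, pred_out) -> list of scores, one per vocabulary id
    Returns (tokens, number_of_predictor_calls).
    """
    tokens = []
    pred_out = predictor(tokens)  # computed once, reused until a token is emitted
    predictor_calls = 1
    for enc in encoder_frames:
        for _ in range(max_symbols_per_frame):
            scores = joiner(enc, pred_out)
            best = max(range(len(scores)), key=scores.__getitem__)
            if best == BLANK:
                break  # next frame; the cached predictor output is unchanged
            tokens.append(best)
            pred_out = predictor(tokens)  # re-run only after a non-blank token
            predictor_calls += 1
    return tokens, predictor_calls


# Toy models over a 4-symbol vocabulary (id 0 is blank).
def toy_predictor(tokens):
    return tokens[-1] if tokens else BLANK


def toy_joiner(enc, pred_out):
    # Emit the frame's symbol once, then blank until the next frame.
    target = enc if enc != BLANK and enc != pred_out else BLANK
    return [1.0 if i == target else 0.0 for i in range(4)]


tokens, calls = greedy_decode([0, 2, 0, 0, 3], toy_predictor, toy_joiner)
print(tokens, calls)
```

In this toy run the joiner is evaluated on every frame, but the predictor is called only once at the start and once per emitted token.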

Decoding mode

Greedy only (beam = 1). Murattil's product invariant is faithful acoustic transcription so that the on-device mistake-detection layer sees what the user actually said, not what an LM thinks they meant.

Limitations

See the …-v1.1-mirror README for the full list. Most relevant for mobile:

fp32 download is ~480 MB in total (456 MB of it is `encoder.onnx`). Wi-Fi recommended for first launch.
Quran-only training. Don't expect strong results on general MSA or dialect.
No mujawwad coverage — by design.

License

CC-BY-4.0 (inherits NVIDIA's base model license).

Model tree for dev-ahmedhany/stt_ar_fastconformer_hybrid_large_streaming_pcd_v1.1-sherpa

Base model: nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0
Finetuned: dev-ahmedhany/stt_ar_fastconformer_hybrid_large_streaming_pcd_v1.1-mirror
Quantized: this model
