---
license: other
license_name: openmdw-1.1
license_link: https://openmdw.ai/license/1-1/
language:
- cy
library_name: nemo
pipeline_tag: automatic-speech-recognition
base_model: nvidia/nemotron-3.5-asr-streaming-0.6b
datasets:
- LokaalHub/cy-asr-cv
tags:
- automatic-speech-recognition
- speech
- audio
- cy
- nemo
- fastconformer
- rnnt
- cache-aware-streaming
- nemotron
metrics:
- wer
model-index:
- name: cy-asr-streaming-0.6b
  results:
  - task: {type: automatic-speech-recognition, name: Automatic Speech Recognition}
    dataset: {name: LokaalHub/cy-asr-cv (test), type: LokaalHub/cy-asr-cv, split: test}
    metrics:
    - type: wer
      value: 22.48
      name: WER (offline / full-context, normalized)
---

# cy-asr-streaming-0.6b

A streaming Welsh (`cy`) ASR model, fine-tuned from
[`nvidia/nemotron-3.5-asr-streaming-0.6b`](https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b)
on [`LokaalHub/cy-asr-cv`](https://huggingface.co/datasets/LokaalHub/cy-asr-cv).

> **Community fine-tune, not an NVIDIA model.** A derivative of NVIDIA's Nemotron 3.5 ASR.
> NVIDIA did not produce, endorse, or review this model. "Nemotron" is a trademark of NVIDIA,
> used here only to identify the base model.

## TL;DR

Welsh (`cy`) is **not** one of the base model's supported locales, so it is fine-tuned
conditioned on the closest available slot (`en`). Fine-tuning on ~50.1h takes WER from
~99.2% to ~22.48%.

Prompt slot used during fine-tuning: `en` (nearest relative).

## Results

| Condition | Base | Fine-tuned | Rel. improvement |
|-----------|-----:|-----------:|-----------------:|
| WER (offline, full-context, normalized) on `LokaalHub/cy-asr-cv` test | 99.2% | **22.48%** | 77.3% |

> Offline (full-context) WER via NeMo `transcribe_speech.py`. Cache-aware **streaming** WER
> (the condition NVIDIA headlines) was not measured for this release.
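
The relative improvement in the results table follows from the two WER figures as (99.2 − 22.48) / 99.2. The sketch below reproduces that arithmetic and shows a plain word-level WER for a single utterance pair. It is only an illustration: the reported numbers came from NeMo `transcribe_speech.py`, the function names here are made up for this example, and no text normalization is applied because the card does not specify the one used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(base_wer: float, tuned_wer: float) -> float:
    """Relative WER reduction in percent."""
    return 100.0 * (base_wer - tuned_wer) / base_wer


# One substitution plus one deletion against five reference words
print(word_error_rate("mae hen wlad fy nhadau", "mae hen gwlad nhadau"))  # 0.4
# Figures from the results table
print(round(relative_improvement(99.2, 22.48), 1))  # 77.3
```

A corpus-level WER is normally the total number of word errors divided by the total number of reference words, which is not the same as averaging per-utterance values.
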
## Usage

```python
import nemo.collections.asr as nemo_asr

m = nemo_asr.models.ASRModel.restore_from("model.nemo")  # from this repo
m.transcribe(["audio.wav"])  # target_lang prompt: en
```

## Training

Single full fine-tune (`init_from_nemo_model`), bf16, NoamAnnealing.
Data: [`LokaalHub/cy-asr-cv`](https://huggingface.co/datasets/LokaalHub/cy-asr-cv) (~50.1h train).
Built and trained by the [asr-loop](https://huggingface.co/LokaalHub) pipeline.

## Limitations

Low-resource fine-tune on read speech (Common Voice). Evaluated on a 2.0h speaker-disjoint
test subset; the result is not directly comparable to published full-Common-Voice-test numbers.
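
"Speaker-disjoint" means that no speaker in the test subset also appears in the training data. A minimal sketch of how such a property could be checked is shown below. It assumes each manifest row is a dict carrying a speaker identifier under `client_id` (the field Common Voice uses) and a `duration` in seconds. The helper names and the toy rows are hypothetical, and this is not the asr-loop pipeline's own code.

```python
from typing import Iterable, Mapping, Set


def shared_speakers(train: Iterable[Mapping], test: Iterable[Mapping],
                    key: str = "client_id") -> Set[str]:
    """Return the speaker ids that occur in both splits (empty set if disjoint)."""
    return {row[key] for row in train} & {row[key] for row in test}


def total_hours(rows: Iterable[Mapping], key: str = "duration") -> float:
    """Sum per-utterance durations (seconds) and convert to hours."""
    return sum(row[key] for row in rows) / 3600.0


# Toy rows for illustration only, not taken from the dataset
train_rows = [
    {"client_id": "spk_a", "duration": 4.2},
    {"client_id": "spk_b", "duration": 5.0},
]
test_rows = [
    {"client_id": "spk_c", "duration": 3.6},
]

overlap = shared_speakers(train_rows, test_rows)
print("speaker-disjoint" if not overlap else f"overlapping speakers: {sorted(overlap)}")
print(f"test size: {total_hours(test_rows):.4f} h")
```
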