---
license: other
license_name: openmdw-1.1
license_link: https://openmdw.ai/license/1-1/
language:
- cy
library_name: nemo
pipeline_tag: automatic-speech-recognition
base_model: nvidia/nemotron-3.5-asr-streaming-0.6b
datasets:
- LokaalHub/cy-asr-cv
tags:
- automatic-speech-recognition
- speech
- audio
- cy
- nemo
- fastconformer
- rnnt
- cache-aware-streaming
- nemotron
metrics:
- wer
model-index:
- name: cy-asr-streaming-0.6b
  results:
  - task: {type: automatic-speech-recognition, name: Automatic Speech Recognition}
    dataset: {name: LokaalHub/cy-asr-cv (test), type: LokaalHub/cy-asr-cv, split: test}
    metrics:
    - type: wer
      value: 22.48
      name: WER (offline / full-context, normalized)
---

# cy-asr-streaming-0.6b

A streaming Welsh (`cy`) ASR model, fine-tuned from
[`nvidia/nemotron-3.5-asr-streaming-0.6b`](https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b) on
[`LokaalHub/cy-asr-cv`](https://huggingface.co/datasets/LokaalHub/cy-asr-cv).

> **Community fine-tune, not an NVIDIA model.** A derivative of NVIDIA's Nemotron 3.5 ASR.
> NVIDIA did not produce, endorse, or review this model. "Nemotron" is a trademark of NVIDIA,
> used here only to identify the base model.

## TL;DR

Welsh (`cy`) is **not** one of the base model's supported locales, so the model was fine-tuned with the language prompt set to the closest available slot, `en`. Fine-tuning on ~50.1h of audio reduces offline WER on the test split from 99.2% to 22.48%.

## Results

| Condition | Base | Fine-tuned | Rel. improvement |
|-----------|-----:|-----------:|-----------------:|
| WER (offline, full-context, normalized) on `LokaalHub/cy-asr-cv` test | 99.2% | **22.48%** | 77.3% |

> Offline (full-context) WER via NeMo `transcribe_speech.py`. Cache-aware **streaming** WER
> (the condition NVIDIA headlines) was not measured for this release.

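For reference, the relative improvement follows directly from the two WER values in the table above. The sketch below recomputes it and also shows a plain word-level edit-distance WER. The function names are illustrative, no text normalization is applied, and this is not the scoring code behind the reported numbers (those came from NeMo's `transcribe_speech.py`).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(base_wer: float, tuned_wer: float) -> float:
    """Fraction of the base error removed by fine-tuning."""
    return (base_wer - tuned_wer) / base_wer


# Values from the results table (in percent).
print(f"{relative_reduction(99.2, 22.48):.1%}")  # 77.3%
```
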
## Usage

```python
import nemo.collections.asr as nemo_asr
# Load the .nemo checkpoint downloaded from this repo.
m = nemo_asr.models.ASRModel.restore_from("model.nemo")
# The model was fine-tuned with the target_lang prompt set to `en`.
transcriptions = m.transcribe(["audio.wav"])
print(transcriptions)
```

## Training

A single full fine-tune, initialized from the base checkpoint via `init_from_nemo_model`, in bf16 precision with the NoamAnnealing learning-rate schedule. Data:
[`LokaalHub/cy-asr-cv`](https://huggingface.co/datasets/LokaalHub/cy-asr-cv) (~50.1h train).
Built and trained by the [asr-loop](https://huggingface.co/LokaalHub) pipeline.

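As a rough illustration of the learning-rate shape a Noam-style schedule produces, here is the standard formula (linear warmup, then inverse-square-root decay) in plain Python. The `d_model`, `warmup_steps`, and `base_lr` values are placeholders, not the hyperparameters used for this model, and NeMo's `NoamAnnealing` class may differ in details such as a minimum learning rate.

```python
def noam_lr(step: int, d_model: int, warmup_steps: int, base_lr: float = 1.0) -> float:
    """Linear warmup for `warmup_steps`, then decay proportional to 1/sqrt(step)."""
    step = max(step, 1)  # avoid 0 ** -0.5 at the very first step
    scale = d_model ** -0.5
    return base_lr * scale * min(step ** -0.5, step * warmup_steps ** -1.5)


# Placeholder hyperparameters, chosen only for illustration.
for s in (1, 500, 1000, 5000, 20000):
    print(s, f"{noam_lr(s, d_model=512, warmup_steps=1000):.6f}")
```

The rate peaks at `step == warmup_steps`, where both branches of the `min` are equal.
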
## Limitations

Low-resource fine-tune on read speech (Common Voice). Evaluated on a 2.0h
speaker-disjoint test subset, so the results are not directly comparable to published full-Common-Voice-test numbers.

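To make "speaker-disjoint" concrete: every speaker's clips go entirely to one split, so no voice heard in training appears in the test set. A minimal sketch, assuming each clip record carries a speaker identifier (Common Voice exposes one as `client_id`). The record layout, the `split_by_speaker` helper, and the test fraction are hypothetical and are not the procedure used to build `LokaalHub/cy-asr-cv`.

```python
import random
from collections import defaultdict


def split_by_speaker(clips, test_fraction=0.1, seed=0):
    """Assign whole speakers to train or test so the two sets share no speaker."""
    by_speaker = defaultdict(list)
    for clip in clips:
        by_speaker[clip["client_id"]].append(clip)

    speakers = sorted(by_speaker)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])

    train = [c for s in speakers[n_test:] for c in by_speaker[s]]
    test = [c for s in speakers[:n_test] for c in by_speaker[s]]
    return train, test, test_speakers


# Toy records: 5 hypothetical speakers with 3 clips each.
clips = [{"client_id": f"spk{i}", "path": f"spk{i}_{k}.wav"}
         for i in range(5) for k in range(3)]
train, test, test_speakers = split_by_speaker(clips, test_fraction=0.2)
```

Splitting by speaker rather than by clip typically gives a harder, more honest test set, which is one reason numbers from such a subset should not be compared with results on a differently constructed split.
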