---
license: other
license_name: openmdw-1.1
license_link: https://openmdw.ai/license/1-1/
language:
- cy
library_name: nemo
pipeline_tag: automatic-speech-recognition
base_model: nvidia/nemotron-3.5-asr-streaming-0.6b
datasets:
- LokaalHub/cy-asr-cv
tags:
- automatic-speech-recognition
- speech
- audio
- cy
- nemo
- fastconformer
- rnnt
- cache-aware-streaming
- nemotron
metrics:
- wer
model-index:
- name: cy-asr-streaming-0.6b
  results:
  - task: {type: automatic-speech-recognition, name: Automatic Speech Recognition}
    dataset: {name: LokaalHub/cy-asr-cv (test), type: LokaalHub/cy-asr-cv, split: test}
    metrics:
    - type: wer
      value: 22.48
      name: WER (offline / full-context, normalized)
---

# cy-asr-streaming-0.6b

A streaming Welsh (`cy`) ASR model, fine-tuned from
[`nvidia/nemotron-3.5-asr-streaming-0.6b`](https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b) on
[`LokaalHub/cy-asr-cv`](https://huggingface.co/datasets/LokaalHub/cy-asr-cv).

> **Community fine-tune, not an NVIDIA model.** A derivative of NVIDIA's Nemotron 3.5 ASR.
> NVIDIA did not produce, endorse, or review this model. "Nemotron" is a trademark of NVIDIA,
> used here only to identify the base model.

## TL;DR

Welsh (`cy`) is **not** one of the base model's supported locales, so the model was fine-tuned with the prompt conditioned on the closest available language slot, `en`. Fine-tuning on ~50.1h of audio takes offline WER on the test split from 99.2% to 22.48%.

## Results

| Condition | Base | Fine-tuned | Rel. improvement |
|-----------|-----:|-----------:|-----------------:|
| WER (offline, full-context, normalized) on `LokaalHub/cy-asr-cv` test | 99.2% | **22.48%** | 77.3% |

> Offline (full-context) WER via NeMo `transcribe_speech.py`. Cache-aware **streaming** WER
> (the condition NVIDIA headlines) was not measured for this release.
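
For readers who want to check the arithmetic in the table, the sketch below shows one way to compute a corpus-level WER and the relative improvement. It is a minimal illustration, not the scoring code used for this release: the `normalize` step (lowercasing, dropping punctuation other than apostrophes) is an assumption, since the exact normalization is not specified here, and all function names are hypothetical.

```python
import string

# Assumed normalization: lowercase and strip punctuation, keeping apostrophes
# (common in Welsh contractions such as "hi'n"). The release may differ.
_PUNCT = str.maketrans("", "", string.punctuation.replace("'", ""))


def normalize(text: str) -> list[str]:
    return text.lower().translate(_PUNCT).split()


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def corpus_wer(refs: list[str], hyps: list[str]) -> float:
    """WER in percent: total word errors divided by total reference words."""
    errors = words = 0
    for ref, hyp in zip(refs, hyps):
        ref_tok, hyp_tok = normalize(ref), normalize(hyp)
        errors += edit_distance(ref_tok, hyp_tok)
        words += len(ref_tok)
    return 100.0 * errors / words


def relative_improvement(base_wer: float, tuned_wer: float) -> float:
    """Relative WER reduction in percent."""
    return 100.0 * (base_wer - tuned_wer) / base_wer


print(round(relative_improvement(99.2, 22.48), 1))  # matches the table: 77.3
```

The relative improvement is measured against the base WER, so (99.2 - 22.48) / 99.2 gives the 77.3% shown in the table.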

## Usage

```python
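import nemo.collections.asr as nemo_asr

# Load the checkpoint shipped in this repo (download model.nemo first).
m = nemo_asr.models.ASRModel.restore_from("model.nemo")

# The model was fine-tuned with the `en` prompt slot (target_lang prompt: en).
print(m.transcribe(["audio.wav"]))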
```
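
Depending on the installed NeMo version, `transcribe` may return plain strings, hypothesis objects with a `.text` attribute, or (for some older transducer models) a tuple whose first element is the list of best hypotheses. The helper below is a hedged sketch for handling all three; `to_text` is a hypothetical name, not part of NeMo.

```python
from types import SimpleNamespace


def to_text(results):
    """Return a list of transcript strings from a `transcribe` result."""
    # Some older transducer models return (best_hypotheses, all_hypotheses).
    if isinstance(results, tuple):
        results = results[0]
    # Newer versions return objects exposing `.text`; older ones return str.
    return [r.text if hasattr(r, "text") else str(r) for r in results]


# Stand-ins for the two common return shapes (no model required).
print(to_text(["helo byd"]))
print(to_text([SimpleNamespace(text="helo byd")]))
```

With a loaded model this would be called as `to_text(m.transcribe(["audio.wav"]))`.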

## Training

A single full fine-tune initialized via `init_from_nemo_model`, in bf16 precision with the NoamAnnealing learning-rate schedule. Data:
[`LokaalHub/cy-asr-cv`](https://huggingface.co/datasets/LokaalHub/cy-asr-cv) (~50.1h train).
Built and trained by the [asr-loop](https://huggingface.co/LokaalHub) pipeline.
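
For context on the scheduler named above, the standard Noam schedule warms the learning rate up linearly and then decays it with the inverse square root of the step. The sketch below illustrates that shape only. The values of `base_lr`, `d_model`, and `warmup_steps` are placeholders, not the hyperparameters used for this model, and `noam_lr` is a hypothetical helper rather than NeMo's implementation.

```python
def noam_lr(step: int, base_lr: float = 1.0, d_model: int = 1024,
            warmup_steps: int = 1000) -> float:
    """Noam schedule: linear warmup, then inverse-square-root decay."""
    step = max(step, 1)  # avoid 0 ** -0.5
    scale = d_model ** -0.5
    return base_lr * scale * min(step ** -0.5, step * warmup_steps ** -1.5)


# The rate rises until `warmup_steps`, peaks there, then decays.
for s in (100, 500, 1000, 4000):
    print(s, noam_lr(s))
```

The peak occurs at `step == warmup_steps`; quadrupling the step count after warmup halves the rate.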

## Limitations

Low-resource fine-tune on read speech (Common Voice). Evaluated on a 2.0h
speaker-disjoint test subset, so the WER is not directly comparable to published numbers on the full Common Voice test set.