---
license: other
license_name: openmdw-1.1
license_link: https://openmdw.ai/license/1-1/
language:
- cy
library_name: nemo
pipeline_tag: automatic-speech-recognition
base_model: nvidia/nemotron-3.5-asr-streaming-0.6b
datasets:
- LokaalHub/cy-asr-cv
tags:
- automatic-speech-recognition
- speech
- audio
- cy
- nemo
- fastconformer
- rnnt
- cache-aware-streaming
- nemotron
metrics:
- wer
model-index:
- name: cy-asr-streaming-0.6b
  results:
  - task: {type: automatic-speech-recognition, name: Automatic Speech Recognition}
    dataset: {name: LokaalHub/cy-asr-cv (test), type: LokaalHub/cy-asr-cv, split: test}
    metrics:
    - type: wer
      value: 22.48
      name: WER (offline / full-context, normalized)
---
# cy-asr-streaming-0.6b
A streaming Welsh (`cy`) ASR model, fine-tuned from
[`nvidia/nemotron-3.5-asr-streaming-0.6b`](https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b) on
[`LokaalHub/cy-asr-cv`](https://huggingface.co/datasets/LokaalHub/cy-asr-cv).
> **Community fine-tune, not an NVIDIA model.** A derivative of NVIDIA's Nemotron 3.5 ASR.
> NVIDIA did not produce, endorse, or review this model. "Nemotron" is a trademark of NVIDIA,
> used here only to identify the base model.
## TL;DR
Welsh (`cy`) is **not** one of the base model's supported locales, so the model was fine-tuned with the prompt conditioned on the closest available language slot, `en`. Fine-tuning on ~50.1h of training audio reduces offline WER on the test split from 99.2% to 22.48%.
## Results
| Metric | Base | Fine-tuned | Rel. WER reduction |
|-----------|-----:|-----------:|-----------------:|
| WER (offline, full-context, normalized) on `LokaalHub/cy-asr-cv` test | 99.2% | **22.48%** | 77.3% |
> Offline (full-context) WER via NeMo `transcribe_speech.py`. Cache-aware **streaming** WER
> (the condition NVIDIA headlines) was not measured for this release.
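
As a rough illustration of how the numbers in the table relate, the sketch below computes a corpus-level WER from reference/hypothesis pairs and the relative reduction between two WER values. The `normalize` function is an assumption (lowercasing and punctuation stripping); this card does not specify the exact normalizer used for the reported scores, and the example sentences are made up.

```python
import re


def normalize(text: str) -> list[str]:
    """Assumed normalizer: lowercase, replace punctuation with spaces, split on whitespace."""
    text = text.lower()
    # Keep word characters (including Welsh letters such as ŵ and ŷ), whitespace and apostrophes.
    text = re.sub(r"[^\w\s']", " ", text)
    return text.split()


def word_errors(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance: substitutions + insertions + deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = cur
    return prev[-1]


def corpus_wer(pairs) -> float:
    """Corpus WER in percent: total word errors divided by total reference words."""
    errors = words = 0
    for ref, hyp in pairs:
        r, h = normalize(ref), normalize(hyp)
        errors += word_errors(r, h)
        words += len(r)
    return 100.0 * errors / words


def relative_reduction(base_wer: float, tuned_wer: float) -> float:
    """Relative WER reduction in percent."""
    return 100.0 * (base_wer - tuned_wer) / base_wer


# Toy pair (made up): two substitutions over three reference words.
print(corpus_wer([("y gath ddu", "y ci du")]))
# WER values taken from the table above.
print(round(relative_reduction(99.2, 22.48), 1))  # 77.3
```

The relative reduction is (99.2 - 22.48) / 99.2, which is where the 77.3% in the table comes from.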
## Usage
```python
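import nemo.collections.asr as nemo_asr

# Load the checkpoint from this repo (download model.nemo locally first).
m = nemo_asr.models.ASRModel.restore_from("model.nemo")
# target_lang prompt: en (the slot used during fine-tuning)
hypotheses = m.transcribe(["audio.wav"])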
```
## Training
Single full fine-tune initialized from the base checkpoint (`init_from_nemo_model`), bf16 precision, NoamAnnealing learning-rate schedule. Data:
[`LokaalHub/cy-asr-cv`](https://huggingface.co/datasets/LokaalHub/cy-asr-cv) (~50.1h train).
Built and trained by the [asr-loop](https://huggingface.co/LokaalHub) pipeline.
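
For readers unfamiliar with the Noam schedule, here is a minimal sketch of its general shape: linear warmup followed by inverse-square-root decay. It is a plain-Python illustration, not NeMo's implementation (it omits details such as a minimum learning-rate floor). The values for `base_lr`, `d_model`, and `warmup_steps` are hypothetical, because this card does not state the hyperparameters used.

```python
def noam_lr(step: int, base_lr: float, d_model: int, warmup_steps: int) -> float:
    """Noam schedule: rises linearly until warmup_steps, then decays as step ** -0.5."""
    step = max(step, 1)  # avoid evaluating 0 ** -0.5
    scale = d_model ** -0.5
    return base_lr * scale * min(step ** -0.5, step * warmup_steps ** -1.5)


# Hypothetical hyperparameters, for illustration only.
for s in (100, 1000, 10000):
    print(s, noam_lr(s, base_lr=2.0, d_model=1024, warmup_steps=1000))
```

The learning rate peaks at `step == warmup_steps`, where both arguments of `min` are equal.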
## Limitations
Low-resource fine-tune on read speech (Common Voice). Evaluated on a 2.0h
speaker-disjoint test subset, so the WER is not directly comparable to published numbers on the full Common Voice test set.
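
To make "speaker-disjoint" concrete, the sketch below checks that no speaker appears in both a training and a test set and sums the test duration in hours. The record layout (dicts with `speaker` and `duration` in seconds) and the toy data are hypothetical; the actual fields in `LokaalHub/cy-asr-cv` may differ.

```python
def split_summary(train: list[dict], test: list[dict]) -> tuple[set, float]:
    """Return (speakers present in both splits, total test duration in hours)."""
    train_speakers = {u["speaker"] for u in train}
    test_speakers = {u["speaker"] for u in test}
    shared = train_speakers & test_speakers
    test_hours = sum(u["duration"] for u in test) / 3600.0
    return shared, test_hours


# Toy records (made up): durations are in seconds.
train = [{"speaker": "a", "duration": 3600.0}, {"speaker": "b", "duration": 1800.0}]
test = [{"speaker": "c", "duration": 7200.0}]

shared, hours = split_summary(train, test)
print("shared speakers:", shared)
print("test hours:", hours)
```

An empty `shared` set means the split is speaker-disjoint under these assumptions.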