---
base_model: Qwen/Qwen3-ASR-0.6B
pipeline_tag: automatic-speech-recognition
library_name: transformers
license: apache-2.0
language:
- lg
tags:
- automatic-speech-recognition
- qwen3-asr
- luganda
- typical-speech
- low-resource
datasets:
- KasuleTrevor/lg_100hrs
- dmusingu/yogera-dataset
---

# CDLI Qwen3-ASR Luganda Typical Speech Fine-tune

This repository contains `checkpoint-82500`, the checkpoint selected from the `LG-QWEN3-ASR-TYPICAL-0P6B-T1` run. That run fine-tunes Qwen/Qwen3-ASR-0.6B on Luganda typical speech.

## Training Setup
- Base model: Qwen/Qwen3-ASR-0.6B
- Datasets: KasuleTrevor/lg_100hrs + dmusingu/yogera-dataset (Luganda split)
- Training language tag: `Luganda`
- Forced inference language: disabled (the decoding language is not forced at inference time)
- Epochs: 10
- Batch size: 4
- Gradient accumulation: 2
- Learning rate: 2e-05
- Scheduler: cosine
- Warmup ratio: 0.03
- Save steps: 500
- Selected checkpoint: checkpoint-82500
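- Selection reason: this checkpoint already had a `test_metrics.json` file. It also has the lowest `eval_loss` of the checkpoints listed under Checkpoint Selection Evidence.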
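The batch and scheduler settings above combine as follows. The sketch below is a minimal illustration, not the run's actual trainer code. It assumes three things. First, the batch size is per device, so one optimizer step sees 4 × 2 = 8 examples per device. Second, the total number of optimizer steps equals the last saved step, 84380. Third, the warmup ratio is rounded up to a whole step count. The schedule shape is linear warmup followed by cosine decay to zero, which is the shape used by Hugging Face Transformers' `get_cosine_schedule_with_warmup`.

```python
import math

# Values from the Training Setup list.
PEAK_LR = 2e-05
WARMUP_RATIO = 0.03
BATCH_SIZE = 4
GRAD_ACCUM = 2
# Assumption: total optimizer steps equal the last saved step (checkpoint-84380).
TOTAL_STEPS = 84380

# Examples consumed per optimizer step on one device (assumes per-device batch size).
EFFECTIVE_BATCH = BATCH_SIZE * GRAD_ACCUM

# Assumption: the warmup ratio is converted to a step count by rounding up.
WARMUP_STEPS = math.ceil(WARMUP_RATIO * TOTAL_STEPS)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch per device:", EFFECTIVE_BATCH)
    print("warmup steps:", WARMUP_STEPS)
    for s in (0, WARMUP_STEPS, 82500, TOTAL_STEPS):
        print(s, f"{lr_at(s):.3e}")
```

Under these assumptions, the selected checkpoint at step 82500 sits near the end of the cosine tail. At that point the learning rate is already a small fraction of the 2e-05 peak.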

## Final Test Metrics
All values are fractions (for example, 0.285200 = 28.52%); lower is better.
- Corpus WER (normalized): 0.285200
- Corpus CER (normalized): 0.072388
- Average utterance WER (normalized): 0.289360
- Average utterance CER (normalized): 0.073545
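Corpus WER and average utterance WER are computed differently.

- **Corpus WER** pools all edits and divides them by the total number of reference words, so long utterances weigh more.
- **Average utterance WER** computes a WER for each utterance and then averages those values, so every utterance weighs the same.

The CER metrics work the same way at the character level. The sketch below illustrates both aggregations. The `normalize` function is hypothetical: the card does not state which text normalization produced the reported numbers. This sketch lowercases the text, replaces punctuation other than apostrophes with spaces (apostrophes are common in Luganda orthography), and collapses whitespace.

```python
import re
from typing import List, Sequence, Tuple


def normalize(text: str) -> str:
    """Hypothetical normalizer: lowercase, drop punctuation except apostrophes,
    collapse whitespace. Not necessarily what produced the reported metrics."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,            # deletion
                         cur[j - 1] + 1,         # insertion
                         prev[j - 1] + (r != h))  # substitution (0 if match)
        prev = cur
    return prev[-1]


def error_rates(pairs: List[Tuple[str, str]], unit: str = "word") -> Tuple[float, float]:
    """Return (corpus rate, average utterance rate) for (reference, hypothesis) pairs.
    unit="word" gives WER; unit="char" gives CER (spaces count as characters here)."""
    total_errors = total_len = 0
    per_utt = []
    for ref, hyp in pairs:
        ref, hyp = normalize(ref), normalize(hyp)
        r = ref.split() if unit == "word" else list(ref)
        h = hyp.split() if unit == "word" else list(hyp)
        errs = edit_distance(r, h)
        total_errors += errs
        total_len += len(r)
        per_utt.append(errs / max(1, len(r)))
    return total_errors / max(1, total_len), sum(per_utt) / len(per_utt)
```

Here is a toy example with two utterances. The first is recognized perfectly (0 errors / 2 words). The second has 2 errors out of 3 reference words. The corpus WER is 2 / 5 = 0.4, while the average utterance WER is (0 + 2/3) / 2 ≈ 0.333: `error_rates([("Oli otya", "oli otya"), ("webale nnyo ssebo", "webale nyo")])`.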

## Checkpoint Selection Evidence
| checkpoint | step | eval_loss |
| --- | --- | --- |
| checkpoint-81000 | 81000 | 0.1503518372774124 |
| checkpoint-81500 | 81500 | 0.1503709256649017 |
| checkpoint-82000 | 82000 | 0.1504658609628677 |
| checkpoint-82500 | 82500 | 0.1502721607685089 |
| checkpoint-83000 | 83000 | 0.1503749638795852 |
| checkpoint-83500 | 83500 | 0.150355577468872 |
| checkpoint-84000 | 84000 | 0.1503630727529525 |
| checkpoint-84380 | 84380 | 0.1503630727529525 |
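The stated selection reason is the existing `test_metrics.json`. The snippet below checks something separate: whether the same checkpoint also has the minimum `eval_loss` among the values in the table. It also reports the margin to the runner-up. The values are copied from the table, and the helper name `best_checkpoint` is hypothetical.

```python
from typing import Dict, Tuple

# eval_loss values copied from the Checkpoint Selection Evidence table.
EVAL_LOSS: Dict[str, float] = {
    "checkpoint-81000": 0.1503518372774124,
    "checkpoint-81500": 0.1503709256649017,
    "checkpoint-82000": 0.1504658609628677,
    "checkpoint-82500": 0.1502721607685089,
    "checkpoint-83000": 0.1503749638795852,
    "checkpoint-83500": 0.150355577468872,
    "checkpoint-84000": 0.1503630727529525,
    "checkpoint-84380": 0.1503630727529525,
}


def best_checkpoint(losses: Dict[str, float]) -> Tuple[str, float]:
    """Return the checkpoint with the lowest loss and its margin over the runner-up."""
    ranked = sorted(losses.items(), key=lambda kv: kv[1])
    (best_name, best_loss), (_, second_loss) = ranked[0], ranked[1]
    return best_name, second_loss - best_loss


if __name__ == "__main__":
    name, gap = best_checkpoint(EVAL_LOSS)
    print(name, f"margin={gap:.2e}")
```

The margin is about 8e-05. The validation loss is essentially flat over these steps, so the eval-loss ranking alone is a weak signal for choosing among them.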

## Source Dataset Breakdown
| source_dataset | n_samples | mean_wer | mean_cer | median_wer | median_cer |
| --- | --- | --- | --- | --- | --- |
| lg100 | 2828 | 0.2894 | 0.0735 | 0.25 | 0.0426 |
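The `mean_wer` of 0.2894 matches the average utterance WER reported above (0.289360), which is consistent with `lg100` being the only source in the test split. The median WER (0.25) is below the mean, which suggests that a minority of high-error utterances pulls the mean up. The sketch below shows how such a per-source table can be built from per-utterance scores. The record layout (`source_dataset`, `wer`, `cer` keys) is an assumption about the scored outputs, not a documented file format.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, List


def breakdown(rows: List[dict]) -> Dict[str, dict]:
    """Group per-utterance scores by source dataset and summarize each group.
    Assumes each row has hypothetical keys: source_dataset, wer, cer."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        groups[row["source_dataset"]].append(row)

    table = {}
    for source, items in sorted(groups.items()):
        wers = [r["wer"] for r in items]
        cers = [r["cer"] for r in items]
        table[source] = {
            "n_samples": len(items),
            "mean_wer": mean(wers),
            "mean_cer": mean(cers),
            "median_wer": median(wers),
            "median_cer": median(cers),
        }
    return table


if __name__ == "__main__":
    # Toy rows for illustration only; "source_b" is a hypothetical label.
    demo = [
        {"source_dataset": "lg100", "wer": 0.0, "cer": 0.0},
        {"source_dataset": "lg100", "wer": 0.5, "cer": 0.1},
        {"source_dataset": "lg100", "wer": 1.0, "cer": 0.2},
        {"source_dataset": "source_b", "wer": 0.2, "cer": 0.05},
    ]
    for source, stats in breakdown(demo).items():
        print(source, stats)
```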

## Artifacts
- Result folder: `results/checkpoint-82500/`
- Includes checkpoint validation summaries, final test predictions, scored outputs, and grouped analyses.