Canary-Qwen-2.5B Fine-Tuned for ATC ASR (LoRA)

Fine-tuned nvidia/canary-qwen-2.5b on the UWB-ATCC corpus for Air Traffic Control speech recognition using LoRA adaptation.

Results

Model	Params Trained	WER
Canary-Qwen (zero-shot)	0	81.49%
Canary-Qwen (LoRA)	27.8M (0.97%)	23.32%
W2V2 Large (no LM)	317M (100%)	14.54%
W2V2 Large (with KenLM)	317M (100%)	12.69%
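
The WER column is word error rate. The card does not say which scoring tool or text normalization produced these numbers, so the following is only a minimal sketch of the standard definition: word-level edit distance divided by the number of reference words. The function name and the sample transcript are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up ATC-style utterance: the hypothesis drops one of nine words.
reference = "lufthansa four five six descend flight level eight zero"
hypothesis = "lufthansa four five descend flight level eight zero"
sample_wer = word_error_rate(reference, hypothesis)
```

Corpus-level WER is commonly computed by summing the edit distances over all utterances and dividing by the total number of reference words, rather than averaging per-utterance rates; which of the two was used here is not stated.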

Training

Dataset: UWB-ATCC (Prague Airport ATC, 11,543 train / 2,886 test utterances)
Steps: 10,000 | LR: 5e-4 | Warmup: 1,000
LoRA: r=128, alpha=256, targets=[q_proj, v_proj]
Strategy: FSDP across 4x RTX 2080 Ti | Precision: fp16-true (eps=1e-4)
Framework: NVIDIA NeMo 2.8.0rc0
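
In the usual LoRA formulation, the low-rank update is scaled by alpha / r, which for r=128 and alpha=256 gives a factor of 2. The sketch below illustrates that formulation in plain Python, assuming a frozen weight W, trainable factors A and B, and B initialized to zero. It is not NeMo's adapter code, and the toy dimensions and helper names are hypothetical.

```python
import random

R, ALPHA = 128, 256   # values from the training config above
SCALE = ALPHA / R     # scaling applied to the low-rank update


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, scale):
    """Return W x + scale * B (A x). W stays frozen; only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_count(d_in, d_out, r):
    """Parameters added to one adapted projection: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


# Toy shapes (hypothetical, far smaller than the real q_proj / v_proj matrices).
random.seed(0)
d_in, d_out, toy_r = 4, 3, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(toy_r)]
B = [[0.0] * toy_r for _ in range(d_out)]  # zero init: the adapter starts as a no-op
x = [1.0, -2.0, 0.5, 3.0]

y0 = lora_forward(x, W, A, B, SCALE)  # equals W x until B is trained
```

Because B starts at zero, the adapted layer initially reproduces the base model's output, and training moves only the A and B factors of the targeted projections.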

Learning Curve

Step	WER
0	81.49%
500	39.14%
2,000	30.87%
3,000	26.28%
5,000	24.77%
10,000	24.53%
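
To see where the gains concentrate, the checkpoints above can be compared pairwise. The snippet is a small arithmetic sketch using only the values in the table; the helper name is illustrative.

```python
# (step, WER in percent) pairs copied from the learning curve above
curve = [(0, 81.49), (500, 39.14), (2000, 30.87),
         (3000, 26.28), (5000, 24.77), (10000, 24.53)]


def relative_reduction(before: float, after: float) -> float:
    """Relative WER reduction in percent when going from `before` to `after`."""
    return 100.0 * (before - after) / before


rows = []
for (step0, wer0), (step1, wer1) in zip(curve, curve[1:]):
    rows.append((step0, step1, wer0 - wer1, relative_reduction(wer0, wer1)))

for step0, step1, absolute, relative in rows:
    print(f"{step0:>6} -> {step1:>6}: {absolute:5.2f} points absolute, "
          f"{relative:5.1f}% relative")
```

By this arithmetic, the first 500 steps cut WER by roughly 52% relative, while the final 5,000 steps contribute about 1%.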

Usage

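from nemo.collections.speechlm2.models import SALM
import torch

# Load the base model, then the fine-tuned weights from the consolidated checkpoint.
model = SALM.from_pretrained('nvidia/canary-qwen-2.5b')
state = torch.load('consolidated_model.pt', map_location='cpu')
# strict=False tolerates key mismatches between the checkpoint and the base model.
model.load_state_dict(state, strict=False)
model.cuda().eval()

# One conversation per batch item; audio_locator_tag marks where the audio goes in the prompt.
answer_ids = model.generate(
    prompts=[[{
        'role': 'user',
        'content': f'Transcribe the following: {model.audio_locator_tag}',
        'audio': ['atc_audio.wav']
    }]],
    max_new_tokens=128,
)
# Decode the generated token IDs of the first (and only) conversation back to text.
print(model.tokenizer.ids_to_text(answer_ids[0].cpu()))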
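
Since `prompts` is a list of conversations, several recordings can presumably be passed in one call. The helper below is a hypothetical convenience, not part of NeMo, that builds one single-turn conversation per file using the same prompt text as above. The file names are placeholders, and how many files fit in a batch depends on GPU memory.

```python
def build_prompts(audio_paths, audio_locator_tag):
    """Build one single-turn transcription conversation per audio file."""
    return [
        [{
            'role': 'user',
            'content': f'Transcribe the following: {audio_locator_tag}',
            'audio': [path],
        }]
        for path in audio_paths
    ]


# With the model loaded as above (assumed usage, left commented out here):
# prompts = build_prompts(['atc_001.wav', 'atc_002.wav'], model.audio_locator_tag)
# answer_ids = model.generate(prompts=prompts, max_new_tokens=128)
# for ids in answer_ids:
#     print(model.tokenizer.ids_to_text(ids.cpu()))
```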

Part of

Pilot-to-ATC Research — Comparative evaluation of W2V2 vs Canary-Qwen for ATC domain ASR.

Model tree for suideepmax/canary-qwen-2.5b-atc-lora

Qwen/Qwen3-1.7B-Base (base model) -> Qwen/Qwen3-1.7B (fine-tuned) -> nvidia/canary-qwen-2.5b (fine-tuned) -> this model (adapter)