Canary-Qwen-2.5B Fine-Tuned for ATC ASR (LoRA)
nvidia/canary-qwen-2.5b fine-tuned with LoRA on the UWB-ATCC corpus for Air Traffic Control (ATC) speech recognition.
Results
| Model | Params Trained | WER |
|---|---|---|
| Canary-Qwen (zero-shot) | 0 | 81.49% |
| Canary-Qwen (LoRA) | 27.8M (0.97%) | 23.32% |
| W2V2 Large (no LM) | 317M (100%) | 14.54% |
| W2V2 Large (with KenLM) | 317M (100%) | 12.69% |
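For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below is a minimal illustration of that definition only. It is not the scoring script behind the table, and it applies no text normalization (casing, digit spelling, callsign formatting), which can matter a lot for ATC transcripts. The function name `word_error_rate` is an assumption made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up utterance for illustration: one dropped word out of nine.
ref = "lufthansa four five six descend flight level eight zero"
hyp = "lufthansa four five six descend level eight zero"
print(f"WER = {word_error_rate(ref, hyp):.2%}")
```

Corpus-level WER is normally computed as total edits over total reference words across all utterances, rather than as the mean of per-utterance values.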
Training
- Dataset: UWB-ATCC (Prague Airport ATC, 11,543 train / 2,886 test utterances)
- Steps: 10,000 | LR: 5e-4 | Warmup: 1,000
- LoRA: r=128, alpha=256, targets=[q_proj, v_proj]
- Strategy: FSDP across 4x RTX 2080 Ti | Precision: fp16-true (eps=1e-4)
- Framework: NVIDIA NeMo 2.8.0rc0
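To make the LoRA settings above concrete: in the standard LoRA formulation, each adapted linear layer with weight of shape (d_out, d_in) gains two low-rank matrices, A of shape (r, d_in) and B of shape (d_out, r), and the low-rank update is scaled by alpha / r. The sketch below computes that scaling and the added parameter count per layer. The helper names and the 2048-dimensional layer are hypothetical values chosen for illustration; they are not taken from this model's configuration and are not meant to reproduce the 27.8M figure.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_params_per_layer(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer: A (r x d_in) + B (d_out x r)."""
    return r * (d_in + d_out)


# Settings from the training list: r=128, alpha=256.
print(lora_scaling(r=128, alpha=256))  # update is scaled by alpha / r

# Hypothetical layer size, for illustration only (not this model's real dimensions).
print(lora_params_per_layer(d_in=2048, d_out=2048, r=128))
```

With r=128 and alpha=256 the scaling factor is 2, so the learned update is applied at twice its raw magnitude.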
Learning Curve
| Step | WER |
|---|---|
| 0 | 81.49% |
| 500 | 39.14% |
| 2,000 | 30.87% |
| 3,000 | 26.28% |
| 5,000 | 24.77% |
| 10,000 | 24.53% |
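One way to read the curve is to look at how much WER falls per 1,000 steps between consecutive checkpoints. The sketch below does this with the values from the table. The helper name `wer_drop_per_1k_steps` is made up for this example.

```python
# (step, WER in percent), copied from the learning-curve table
curve = [
    (0, 81.49),
    (500, 39.14),
    (2000, 30.87),
    (3000, 26.28),
    (5000, 24.77),
    (10000, 24.53),
]


def wer_drop_per_1k_steps(points):
    """Return (start_step, end_step, absolute WER drop per 1,000 steps) per interval."""
    rows = []
    for (s0, w0), (s1, w1) in zip(points, points[1:]):
        rows.append((s0, s1, (w0 - w1) / (s1 - s0) * 1000))
    return rows


rows = wer_drop_per_1k_steps(curve)
for start, end, drop in rows:
    print(f"{start:>6} -> {end:>6}: {drop:6.2f} WER points per 1k steps")
```

The first interval accounts for roughly 85 points per 1,000 steps, while the last is below 0.1, which suggests most of the gain arrives within the first few thousand steps.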
Usage
import torch
from nemo.collections.speechlm2.models import SALM

# Load the base model, then overlay the fine-tuned weights.
model = SALM.from_pretrained('nvidia/canary-qwen-2.5b')
state = torch.load('consolidated_model.pt', map_location='cpu')
# strict=False tolerates keys that differ between the checkpoint and the base model.
model.load_state_dict(state, strict=False)
model.cuda().eval()

# One conversation per batch item; the audio locator tag marks where the audio is inserted.
answer_ids = model.generate(
    prompts=[[{
        'role': 'user',
        'content': f'Transcribe the following: {model.audio_locator_tag}',
        'audio': ['atc_audio.wav'],
    }]],
    max_new_tokens=128,
)
print(model.tokenizer.ids_to_text(answer_ids[0].cpu()))
Part of
Pilot-to-ATC Research: a comparative evaluation of W2V2 and Canary-Qwen for ATC-domain ASR.