---
language: ur
tags:
- audio
- automatic-speech-recognition
- urdu
- wav2vec2
- xlsr
license: apache-2.0
---

# Urdu ASR - Fine-tuned XLSR-53

This model is `facebook/wav2vec2-large-xlsr-53` fine-tuned on Urdu speech data for automatic speech recognition.

## Model Details

- **Base model:** facebook/wav2vec2-large-xlsr-53
- **Language:** Urdu (ur)
- **Task:** Automatic Speech Recognition (ASR)
- **Training data:** Unified Urdu Speech ASR Dataset

## Usage

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "abidanoaman/urdu-asr-xlsr53-finetuned"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Load audio; the model expects 16 kHz mono input
waveform, sr = torchaudio.load("your_audio.wav")
waveform = waveform.mean(dim=0)  # downmix all channels to mono
if sr != 16000:
    # Resample so the audio matches the rate the model expects
    waveform = torchaudio.functional.resample(waveform, sr, 16000)

inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: pick the most likely token at each frame
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.decode(predicted_ids[0])
print(transcription)
```

## Training Configuration

- Epochs: 10
- Batch size: 2 (gradient accumulation steps: 4)
- Learning rate: 0.0001
- Mixed precision: True
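
The training configuration above can be written out as a small set of keyword arguments, which also makes the effective batch size explicit. This is a minimal sketch, not the author's training script: the key names mirror keyword arguments of `transformers.TrainingArguments`, but that the original run used the `Trainer` API, that "Mixed precision" meant `fp16` rather than `bf16`, and that training ran on a single device (`num_devices`, a hypothetical name) are all assumptions.

```python
# Hyperparameters copied from the Training Configuration list.
# Key names mirror keyword arguments of transformers.TrainingArguments;
# that the original run used the Trainer API is an assumption.
training_kwargs = {
    "num_train_epochs": 10,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "learning_rate": 1e-4,
    "fp16": True,  # assumption: "Mixed precision: True" meant fp16, not bf16
}

# Assumption: training ran on a single device.
num_devices = 1

# Number of examples that contribute to each optimizer step.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * num_devices
)
print(effective_batch_size)
```

Under these assumptions, each optimizer step accumulates gradients over 2 × 4 = 8 examples. If the `Trainer` API is used, the dictionary could be passed as `TrainingArguments(output_dir="...", **training_kwargs)`, with `output_dir` chosen by the user.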