# LexiCore Wav2Vec2 XLS-R 300M CTC — শব্দতরী Bangla Dialect ASR

This model is a fine-tuned version of [`arijitx/wav2vec2-xls-r-300m-bengali`](https://huggingface.co/arijitx/wav2vec2-xls-r-300m-bengali) for the **“শব্দতরী: Where Dialects Flow into Bangla”** competition.

- Task: dialectal Bangla speech → standard Bangla text
- Data: 3,350 audio clips from 20 regions of Bangladesh (competition dataset only)
- Metric: Normalized Levenshtein Similarity (char-level)
- Decoding: CTC beam search with a 5-gram KenLM language model (via `pyctcdecode`), followed by a small rule-based punctuation step
- Training:
  - Epochs: 20
  - Learning rate: 1e-4
  - Effective batch size: 8 (batch size 4 × 2 gradient-accumulation steps)
  - Strong waveform augmentations: speed perturbation, gain, additive noise, and time-drop
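The card names the metric but does not define it. A common definition of normalized Levenshtein similarity is `1 - d(ref, hyp) / max(len(ref), len(hyp))`, where `d` is the character-level edit distance. The sketch below assumes that definition; the competition's exact rules (Unicode normalization, whitespace handling, how scores are averaged across clips) may differ. Note that Python's `len` counts Unicode code points, so Bangla vowel signs such as `ি` count as separate characters.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the inner row
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        prev = curr
    return prev[-1]


def normalized_levenshtein_similarity(ref: str, hyp: str) -> float:
    """Assumed definition: 1 - distance / max length (1.0 means identical)."""
    if not ref and not hyp:
        return 1.0
    return 1.0 - levenshtein(ref, hyp) / max(len(ref), len(hyp))
```

For example, `normalized_levenshtein_similarity("kitten", "sitting")` has edit distance 3 over a maximum length of 7, giving 1 − 3/7 ≈ 0.571.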
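The CTC part of decoding is easiest to see in its simplest form, greedy decoding: take the most likely token for each frame, merge consecutive repeats, then drop the blank token. This is roughly what `processor.batch_decode` does on argmax ids; the `pyctcdecode` beam search instead explores many frame-level paths and rescores them with the 5-gram KenLM. The vocabulary and ids below are a hypothetical toy example; in a Wav2Vec2 CTC tokenizer the pad token typically serves as the blank.

```python
from itertools import groupby
from typing import Dict, List


def ctc_greedy_collapse(frame_ids: List[int], blank_id: int, id_to_char: Dict[int, str]) -> str:
    """Greedy CTC collapse: merge repeated frame ids, then remove blanks."""
    # groupby yields one key per run of identical consecutive ids.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # A blank between two identical ids keeps them as separate characters.
    return "".join(id_to_char[t] for t in collapsed if t != blank_id)


# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank.
vocab = {0: "<pad>", 1: "আ", 2: "ম", 3: "ি"}
print(ctc_greedy_collapse([1, 1, 0, 2, 2, 0, 3], blank_id=0, id_to_char=vocab))
```

Because repeats are merged before blanks are removed, a genuinely doubled character must be separated by a blank frame in the model's output (e.g. `[1, 0, 1]` decodes to two characters).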
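The card lists the augmentation types but not their parameters or implementation. The NumPy sketch below shows one plausible way to chain speed, gain, noise, and time-drop on a mono 16 kHz waveform; every range in it (speed 0.9 to 1.1, gain ±6 dB, SNR 10 to 30 dB, drops of 20 to 100 ms) is a hypothetical placeholder, and the linear-interpolation speed change is a simplification with no anti-aliasing filter, not necessarily what was used to train this model.

```python
from typing import Optional

import numpy as np


def augment_waveform(
    wav: np.ndarray, sr: int = 16_000, rng: Optional[np.random.Generator] = None
) -> np.ndarray:
    """Hypothetical speed/gain/noise/time-drop chain for a mono float waveform."""
    if wav.ndim != 1 or wav.size == 0:
        raise ValueError("expected a non-empty 1-D waveform")
    rng = rng if rng is not None else np.random.default_rng()
    x = wav.astype(np.float64)  # work on a copy; the input is not modified

    # 1) Speed perturbation by linear-interpolation resampling
    #    (changes tempo and pitch together; no anti-aliasing).
    speed = rng.uniform(0.9, 1.1)
    n_out = max(1, int(round(len(x) / speed)))
    x = np.interp(np.linspace(0, len(x) - 1, n_out), np.arange(len(x)), x)

    # 2) Random gain in decibels.
    x = x * 10.0 ** (rng.uniform(-6.0, 6.0) / 20.0)

    # 3) Additive Gaussian noise at a random signal-to-noise ratio (dB).
    snr_db = rng.uniform(10.0, 30.0)
    noise_power = (np.mean(x ** 2) + 1e-12) / 10.0 ** (snr_db / 10.0)
    x = x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

    # 4) Time-drop: zero out up to two short random segments.
    for _ in range(int(rng.integers(0, 3))):
        width = int(rng.uniform(0.02, 0.1) * sr)
        if 0 < width < len(x):
            start = int(rng.integers(0, len(x) - width))
            x[start:start + width] = 0.0

    return x.astype(np.float32)
```

Passing a seeded `np.random.default_rng(seed)` makes the augmentation reproducible for debugging; during training each clip would normally get a fresh draw.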

## Intended Use

- Research and experimentation on Bangla ASR for low-resource and dialectal settings
- Non-commercial applications, in line with the license terms of the original competition and dataset

## Limitations

- Trained only on short, scripted sentences from 20 Bangladeshi regions
- May not generalize to very long utterances, noisy real-world audio, or code-switching
- Outputs standard written Bangla, not dialectal spellings

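## Usage

Minimal inference with Hugging Face Transformers using greedy CTC decoding. This does not apply the 5-gram KenLM described above. Replace the placeholder repo ID with this model's Hub ID.

```python
import torch
import torchaudio
from transformers import AutoModelForCTC, Wav2Vec2Processor

MODEL_ID = "your-username/your-repo"  # placeholder: use this model's Hub repo ID
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = AutoModelForCTC.from_pretrained(MODEL_ID).to(device).eval()

# Load audio, downmix to mono, and resample to the 16 kHz rate the model expects.
waveform, sr = torchaudio.load("example.wav")  # shape: (channels, samples)
waveform = waveform.mean(dim=0)
if sr != 16_000:
    waveform = torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=16_000)

inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values.to(device)).logits

# Greedy CTC decoding: argmax per frame; batch_decode merges repeats and drops blanks.
pred_ids = torch.argmax(logits, dim=-1)
transcript = processor.batch_decode(pred_ids)[0]
print(transcript)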