---
language:
- kmr
- ku
license: apache-2.0
tags:
- automatic-speech-recognition
- mozilla-foundation/common_voice_8_0
- kmr
- wav2vec2
- kurdish
- asr
- speech-to-text
- maniwebdev
datasets:
- mozilla-foundation/common_voice_8_0
model-index:
- name: maniwebdev/xlsr_kurmanji_kurdish_custom
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 8
      type: mozilla-foundation/common_voice_8_0
      args: kmr
    metrics:
    - name: Test WER
      type: wer
      value: 0.3307
    - name: Test CER
      type: cer
      value: 0.0803
---

# maniwebdev/xlsr_kurmanji_kurdish_custom

**Kurmanji Kurdish Speech Recognition Model**  
This model was created by fine-tuning [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the **Mozilla Common Voice 8.0** Kurmanji Kurdish dataset.

It transcribes spoken **Kurmanji Kurdish** audio into text, reaching a WER of 0.3307 and a CER of 0.0803 on the Common Voice 8 test set.

---

## 🧠 Model Description
This model is part of the **Ferhengy** project, a Kurdish language learning and transcription tool.  
It builds on the multilingual speech representations of Wav2Vec2 XLS-R 300M and adapts them to **Kurmanji Kurdish**.

---

## 🎯 Intended Uses
- Speech-to-text for Kurmanji Kurdish content (education, linguistics, or accessibility)
- Transcription of Kurdish audio for apps, media, or research
- Integration in applications that promote Kurdish digital language tools

---

## 🧾 Training Data
The model was trained on **Common Voice Kurmanji Kurdish (v8.0)**, using:
- `train.tsv`
- `dev.tsv`
- `invalidated.tsv`
- `reported.tsv`
- `other.tsv`

Only samples with **positive upvotes** were used, and **duplicates were removed** to ensure high-quality data.
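
The filtering described above can be sketched roughly as follows. This is an illustration, not the actual preprocessing script. It assumes that "positive upvotes" means `up_votes > 0` and that a duplicate is a repeated transcript sentence. The sample rows and the `filter_clips` helper are hypothetical.

```python
import csv
import io

# Tiny in-memory stand-in for a Common Voice TSV file (hypothetical rows).
SAMPLE_TSV = (
    "path\tsentence\tup_votes\tdown_votes\n"
    "clip_001.mp3\tsilav\t2\t0\n"
    "clip_002.mp3\tspas\t0\t1\n"
    "clip_003.mp3\tsilav\t3\t0\n"
    "clip_004.mp3\tnan\t1\t0\n"
)


def filter_clips(tsv_text):
    """Keep rows with at least one upvote, dropping repeated sentences."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    seen = set()
    kept = []
    for row in reader:
        # Assumption: "positive upvotes" means up_votes > 0.
        if int(row.get("up_votes") or 0) <= 0:
            continue
        # Assumption: a duplicate is a repeated transcript sentence.
        key = row["sentence"].strip().lower()
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept


clips = filter_clips(SAMPLE_TSV)
print([c["path"] for c in clips])  # ['clip_001.mp3', 'clip_004.mp3']
```

In a real run, the same function would be applied to the text of each listed TSV file, with a shared `seen` set if duplicates should be removed across files as well.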

---

## ⚙️ Training Details
Training configuration (for reproducibility):

| Hyperparameter | Value |
|----------------|-------|
| learning_rate | 9.6e-5 |
| train_batch_size | 16 |
| eval_batch_size | 16 |
| gradient_accumulation_steps | 16 |
| lr_scheduler_type | cosine_with_restarts |
| num_epochs | 100 |
| seed | 13 |
| mixed_precision_training | Native AMP |
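
As a rough illustration of what these settings imply, the sketch below collects the table values in a plain dictionary and computes the effective batch size per optimizer step. The key names mirror `transformers.TrainingArguments` parameters, which is an assumption about how training was set up. The `effective_batch_size` helper and the single-device default are hypothetical.

```python
# Values copied from the hyperparameter table; key names follow
# transformers.TrainingArguments, which is an assumption about the setup.
training_config = {
    "learning_rate": 9.6e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "gradient_accumulation_steps": 16,
    "lr_scheduler_type": "cosine_with_restarts",
    "num_train_epochs": 100,
    "seed": 13,
    "fp16": True,  # Native AMP mixed precision
}


def effective_batch_size(config, num_devices=1):
    """Samples contributing to one optimizer step (hypothetical helper)."""
    return (
        config["per_device_train_batch_size"]
        * config["gradient_accumulation_steps"]
        * num_devices
    )


print(effective_batch_size(training_config))  # 256
```

With a batch size of 16 and 16 accumulation steps, each weight update sees 256 samples on a single device. The device count used for training is not stated here.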

### Results
| Step | Training Loss | Validation Loss | WER |
|------|----------------|-----------------|-----|
| 1200 | 0.2263 | 0.2924 | 0.3886 |

---

## 🧩 Framework Versions
- **Transformers:** 4.16.0  
- **PyTorch:** 1.10.0  
- **Datasets:** 1.18.1  
- **Tokenizers:** 0.10.3  

---

## 🧪 Evaluation Example

To evaluate on the **Common Voice 8.0** Kurmanji Kurdish (`kmr`) test split:
```bash
python eval.py --model_id maniwebdev/xlsr_kurmanji_kurdish_custom --dataset mozilla-foundation/common_voice_8_0 --config kmr --split test