---
language: 
- hi
license: apache-2.0
tags:
- automatic-speech-recognition
- robust-speech-event
datasets:
- mozilla-foundation/common_voice_7_0
metrics:
- wer
model-index:
- name: wav2vec2-large-xls-r-300m-hi-wx1
  results:
  - task: 
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      type: mozilla-foundation/common_voice_7_0
      name: Common Voice 7
      args: hi
    metrics:
      - type: wer  
        value: 0.3719684845500431 
        name: Test WER 
      - name: Test CER
        type: cer
        value: 0.11763235514672798

---


# wav2vec2-large-xls-r-300m-hi-wx1

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the Hindi (`hi`) configuration of the mozilla-foundation/common_voice_7_0 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6552
- Wer: 0.3200
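
For readers unfamiliar with the metrics reported in this card, the following is a minimal sketch of how word error rate (WER) and character error rate (CER) are commonly defined: edit distance between reference and hypothesis, divided by the reference length. The function names are illustrative, and this is not the evaluation code used for this model; real evaluation scripts may apply text normalization and treat whitespace differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute = 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length.

    Assumption: spaces count as characters here; some tools strip them first.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(wer("the cat sat", "the bat sat"))
```

A WER of 1.0 therefore means the number of word errors equals the number of reference words.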

### Evaluation Commands

1. To evaluate on `mozilla-foundation/common_voice_7_0` with the `test` split:

   `python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-hi-wx1 --dataset mozilla-foundation/common_voice_7_0 --config hi --split test --log_outputs`

2. To evaluate on `speech-recognition-community-v2/dev_data`:

   NA

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.00024
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1800
- num_epochs: 50
- mixed_precision_training: Native AMP
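
Two of the values above are derived rather than set directly. The sketch below shows how the total train batch size follows from the per-step batch size and gradient accumulation, and how a linear schedule with warmup shapes the learning rate over training. It is an illustration in plain Python, not the Trainer's implementation. `TOTAL_STEPS` is an assumption estimated from the results table (about 147 steps per epoch over 50 epochs), and a single device is assumed.

```python
LEARNING_RATE = 0.00024
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 1800
TOTAL_STEPS = 7350  # assumption: estimated from the results table, not stated in the card


def effective_batch_size(per_step, accum_steps, num_devices=1):
    """Samples contributing to each optimizer update (single device assumed)."""
    return per_step * accum_steps * num_devices


def linear_schedule_lr(step, base_lr=LEARNING_RATE,
                       warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / warmup
    remaining = max(0, total - step)
    return base_lr * remaining / (total - warmup)


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 900, 1800, 4575, 7350):
        print(step, linear_schedule_lr(step))
```

Under these assumptions the learning rate peaks at step 1800, which falls at roughly epoch 12 in the results table.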

### Training results

| Training Loss | Epoch | Step | Validation Loss | Wer    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 12.2663       | 1.36  | 200  | 5.9245          | 1.0    |
| 4.1856        | 2.72  | 400  | 3.4968          | 1.0    |
| 3.3908        | 4.08  | 600  | 2.9970          | 1.0    |
| 1.5444        | 5.44  | 800  | 0.9071          | 0.6139 |
| 0.7237        | 6.8   | 1000 | 0.6508          | 0.4862 |
| 0.5323        | 8.16  | 1200 | 0.6217          | 0.4647 |
| 0.4426        | 9.52  | 1400 | 0.5785          | 0.4288 |
| 0.3933        | 10.88 | 1600 | 0.5935          | 0.4217 |
| 0.3532        | 12.24 | 1800 | 0.6358          | 0.4465 |
| 0.3319        | 13.6  | 2000 | 0.5789          | 0.4118 |
| 0.2877        | 14.96 | 2200 | 0.6163          | 0.4056 |
| 0.2663        | 16.33 | 2400 | 0.6176          | 0.3893 |
| 0.2511        | 17.68 | 2600 | 0.6065          | 0.3999 |
| 0.2275        | 19.05 | 2800 | 0.6183          | 0.3842 |
| 0.2098        | 20.41 | 3000 | 0.6486          | 0.3864 |
| 0.1943        | 21.77 | 3200 | 0.6365          | 0.3885 |
| 0.1877        | 23.13 | 3400 | 0.6013          | 0.3677 |
| 0.1679        | 24.49 | 3600 | 0.6451          | 0.3795 |
| 0.1667        | 25.85 | 3800 | 0.6410          | 0.3635 |
| 0.1514        | 27.21 | 4000 | 0.6000          | 0.3577 |
| 0.1453        | 28.57 | 4200 | 0.6020          | 0.3518 |
| 0.134         | 29.93 | 4400 | 0.6531          | 0.3517 |
| 0.1354        | 31.29 | 4600 | 0.6874          | 0.3578 |
| 0.1224        | 32.65 | 4800 | 0.6519          | 0.3492 |
| 0.1199        | 34.01 | 5000 | 0.6553          | 0.3490 |
| 0.1077        | 35.37 | 5200 | 0.6621          | 0.3429 |
| 0.0997        | 36.73 | 5400 | 0.6641          | 0.3413 |
| 0.0964        | 38.09 | 5600 | 0.6722          | 0.3385 |
| 0.0931        | 39.45 | 5800 | 0.6365          | 0.3363 |
| 0.0944        | 40.81 | 6000 | 0.6454          | 0.3326 |
| 0.0862        | 42.18 | 6200 | 0.6497          | 0.3256 |
| 0.0848        | 43.54 | 6400 | 0.6599          | 0.3226 |
| 0.0793        | 44.89 | 6600 | 0.6625          | 0.3232 |
| 0.076         | 46.26 | 6800 | 0.6463          | 0.3186 |
| 0.0749        | 47.62 | 7000 | 0.6559          | 0.3225 |
| 0.0663        | 48.98 | 7200 | 0.6552          | 0.3200 |
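
The table can be scanned programmatically to find the checkpoint with the lowest WER. The sketch below parses Markdown table rows in the format used above and picks the best row; only the last five rows are embedded to keep it short, and the helper names are illustrative.

```python
ROWS = """
| 0.0848        | 43.54 | 6400 | 0.6599          | 0.3226 |
| 0.0793        | 44.89 | 6600 | 0.6625          | 0.3232 |
| 0.076         | 46.26 | 6800 | 0.6463          | 0.3186 |
| 0.0749        | 47.62 | 7000 | 0.6559          | 0.3225 |
| 0.0663        | 48.98 | 7200 | 0.6552          | 0.3200 |
"""


def parse_rows(text):
    """Turn Markdown table rows into dicts with numeric fields."""
    records = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        train_loss, epoch, step, val_loss, wer = cells
        records.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "wer": float(wer),
        })
    return records


def best_by(records, key):
    """Return the record with the smallest value for `key`."""
    return min(records, key=lambda r: r[key])


if __name__ == "__main__":
    best = best_by(parse_rows(ROWS), "wer")
    print(best["step"], best["wer"])
```

Among these rows the lowest WER (0.3186) is at step 6800, slightly better than the final step 7200 (0.3200) reported at the top of this card.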


### Framework versions

- Transformers 4.16.2
- Pytorch 1.10.0+cu111
- Datasets 1.18.3
- Tokenizers 0.11.0