---
license: apache-2.0
tags:
- voxtral
- asr
- speech-to-text
- fine-tuning
- tonic

pipeline_tag: automatic-speech-recognition
base_model: {{base_model}}
{{#if has_hub_dataset_id}}
datasets:
- {{dataset_name}}
{{/if}}
{{#if author_name}}
author: {{author_name}}
{{/if}}
{{#if training_config_type}}
training_config: {{training_config_type}}
{{/if}}
{{#if trainer_type}}
trainer_type: {{trainer_type}}
{{/if}}
{{#if batch_size}}
batch_size: {{batch_size}}
{{/if}}
{{#if gradient_accumulation_steps}}
gradient_accumulation_steps: {{gradient_accumulation_steps}}
{{/if}}
{{#if learning_rate}}
learning_rate: {{learning_rate}}
{{/if}}
{{#if max_epochs}}
max_epochs: {{max_epochs}}
{{/if}}
{{#if max_seq_length}}
max_seq_length: {{max_seq_length}}
{{/if}}
{{#if hardware_info}}
hardware: "{{hardware_info}}"
{{/if}}
language:
- hi
- en
- fr
- de
- it
- pt
- nl
library_name: peft
---

# {{model_name}}

{{model_description}}

## Usage

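```python
import torch
import soundfile as sf
from transformers import AutoProcessor, AutoModelForSeq2SeqLM

# Use half precision only when a GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# Note: depending on your transformers version, Voxtral checkpoints may need the
# model-specific class (VoxtralForConditionalGeneration) instead of this Auto class.
processor = AutoProcessor.from_pretrained("{{repo_name}}")
model = AutoModelForSeq2SeqLM.from_pretrained("{{repo_name}}", torch_dtype=dtype).to(device)
model.eval()

# Load the audio and pass its native sampling rate to the processor.
audio, sr = sf.read("sample.wav")
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
# Move inputs to the model's device; only floating-point tensors are cast to dtype.
inputs = inputs.to(device, dtype=dtype)

with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=256)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```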

## Training Configuration

- Base model: {{base_model}}
{{#if training_config_type}}- Config: {{training_config_type}}{{/if}}
{{#if trainer_type}}- Trainer: {{trainer_type}}{{/if}}

## Training Parameters

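{{#if batch_size}}- Batch size: {{batch_size}}{{/if}}
{{#if gradient_accumulation_steps}}- Gradient accumulation steps: {{gradient_accumulation_steps}}{{/if}}
{{#if learning_rate}}- Learning rate: {{learning_rate}}{{/if}}
{{#if max_epochs}}- Max epochs: {{max_epochs}}{{/if}}
{{#if max_seq_length}}- Max sequence length: {{max_seq_length}}{{/if}}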
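
The batch size and gradient accumulation values above combine into an effective batch size per optimizer step. The sketch below shows the arithmetic; the function name, the `num_devices` parameter, and the example numbers are hypothetical and not taken from this model's training run.

```python
def effective_batch_size(
    batch_size: int,
    gradient_accumulation_steps: int,
    num_devices: int = 1,
) -> int:
    """Number of samples that contribute to each optimizer step."""
    if min(batch_size, gradient_accumulation_steps, num_devices) < 1:
        raise ValueError("all arguments must be positive integers")
    # Per-device batch, accumulated over several steps, summed across devices.
    return batch_size * gradient_accumulation_steps * num_devices


# Hypothetical values for illustration only.
print(effective_batch_size(batch_size=4, gradient_accumulation_steps=8))  # 32
```

When comparing runs, this product is usually the number to hold constant, since the per-device batch size alone does not determine how many samples feed each weight update.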

## Hardware

{{#if hardware_info}}- {{hardware_info}}{{/if}}

## Notes

- This repository contains a fine-tuned Voxtral ASR model.