---
license: apache-2.0
tags:
- voxtral
- asr
- speech-to-text
- fine-tuning
- tonic
pipeline_tag: automatic-speech-recognition
base_model: {{base_model}}
{{#if has_hub_dataset_id}}
datasets:
- {{dataset_name}}
{{/if}}
{{#if author_name}}
author: {{author_name}}
{{/if}}
{{#if training_config_type}}
training_config: {{training_config_type}}
{{/if}}
{{#if trainer_type}}
trainer_type: {{trainer_type}}
{{/if}}
{{#if batch_size}}
batch_size: {{batch_size}}
{{/if}}
{{#if gradient_accumulation_steps}}
gradient_accumulation_steps: {{gradient_accumulation_steps}}
{{/if}}
{{#if learning_rate}}
learning_rate: {{learning_rate}}
{{/if}}
{{#if max_epochs}}
max_epochs: {{max_epochs}}
{{/if}}
{{#if max_seq_length}}
max_seq_length: {{max_seq_length}}
{{/if}}
{{#if hardware_info}}
hardware: "{{hardware_info}}"
{{/if}}
language:
- hi
- en
- fr
- de
- it
- pt
- nl
library_name: peft
---
# {{model_name}}
{{model_description}}
## Usage
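```python
import torch
import soundfile as sf
from transformers import AutoProcessor, AutoModelForSeq2SeqLM

# Use half precision on GPU, full precision on CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

processor = AutoProcessor.from_pretrained("{{repo_name}}")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "{{repo_name}}",
    torch_dtype=dtype,
).to(device)

# Load the audio and build model inputs
audio, sr = sf.read("sample.wav")
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
# Move inputs to the model's device; cast only floating-point tensors to its dtype
inputs = {k: v.to(device, dtype) if v.is_floating_point() else v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=256)

text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```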
## Training Configuration
- Base model: {{base_model}}
{{#if training_config_type}}- Config: {{training_config_type}}{{/if}}
{{#if trainer_type}}- Trainer: {{trainer_type}}{{/if}}
## Training Parameters
- Batch size: {{batch_size}}
- Grad accumulation: {{gradient_accumulation_steps}}
- Learning rate: {{learning_rate}}
- Max epochs: {{max_epochs}}
- Sequence length: {{max_seq_length}}
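The batch size and gradient accumulation values combine into an effective batch size per optimizer step. The sketch below assumes the batch size listed above is per device, which this card does not state; the `effective_batch_size` helper, the `num_devices` parameter, and the example numbers are hypothetical and not taken from this training run.

```python
def effective_batch_size(
    batch_size: int,
    gradient_accumulation_steps: int,
    num_devices: int = 1,
) -> int:
    """Return the number of samples that contribute to one optimizer step.

    Assumes `batch_size` is the per-device batch size.
    """
    return batch_size * gradient_accumulation_steps * num_devices


# Illustrative values only (not this model's actual configuration)
print(effective_batch_size(batch_size=2, gradient_accumulation_steps=4))
```

Keeping this product constant is a common way to compare runs trained on different hardware, although the learning rate may still need retuning.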
## Hardware
- {{hardware_info}}
## Notes
- This repository contains a fine-tuned Voxtral ASR model.