How to use eddiegulay/wav2vec2-large-xlsr-mvc-swahili with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="eddiegulay/wav2vec2-large-xlsr-mvc-swahili")
# Load model directly
from transformers import AutoProcessor, AutoModelForCTC
processor = AutoProcessor.from_pretrained("eddiegulay/wav2vec2-large-xlsr-mvc-swahili")
model = AutoModelForCTC.from_pretrained("eddiegulay/wav2vec2-large-xlsr-mvc-swahili")
wav2vec2-large-xlsr-mvc-swahili
This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 for Swahili automatic speech recognition.
How to use the model
There is a known issue with the vocabulary: it appears to include special characters that were not accounted for during training.
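One way to see which vocabulary entries might be affected is to list every token that is not made up purely of letters. The following is a minimal sketch under stated assumptions: `find_special_tokens`, `ASSUMED_RESERVED`, and `example_vocab` are hypothetical names introduced here, and the example vocabulary is invented for illustration. For the real vocabulary, `processor.tokenizer.get_vocab()` should return a comparable token-to-id dictionary.

```python
import unicodedata

# Tokens that CTC tokenizers conventionally reserve. This set is an assumption;
# check the actual tokenizer for the names it really uses.
ASSUMED_RESERVED = {"<pad>", "<s>", "</s>", "<unk>", "|"}

def find_special_tokens(vocab, reserved=ASSUMED_RESERVED):
    """Return the sorted vocab entries that are neither reserved nor all letters."""
    suspicious = []
    for token in vocab:
        if token in reserved:
            continue
        # Unicode categories starting with "L" cover letters, including accented ones
        if all(unicodedata.category(ch).startswith("L") for ch in token):
            continue
        suspicious.append(token)
    return sorted(suspicious)

# Hypothetical vocabulary, for illustration only
example_vocab = {"<pad>": 0, "|": 1, "a": 2, "b": 3, "'": 4, "é": 5, "\u2019": 6, "-": 7}
print(find_special_tokens(example_vocab))
```

Not every flagged token is necessarily a problem (the apostrophe, for instance, occurs in ordinary Swahili spelling), so the output is only a starting point for manual review.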
To transcribe an audio file, you could try:
import torch
import torchaudio
from transformers import AutoProcessor, AutoModelForCTC

repo_name = "eddiegulay/wav2vec2-large-xlsr-mvc-swahili"
processor = AutoProcessor.from_pretrained(repo_name)
model = AutoModelForCTC.from_pretrained(repo_name)

# Use the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

def transcribe(audio_path):
    # Load the audio file and resample it to the 16 kHz rate the model expects
    audio_input, sample_rate = torchaudio.load(audio_path)
    target_sample_rate = 16000
    audio_input = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=target_sample_rate)(audio_input)

    # Preprocess the audio data (first channel only)
    input_dict = processor(audio_input[0], return_tensors="pt", padding=True, sampling_rate=target_sample_rate)

    # Perform inference without tracking gradients, then decode the prediction
    with torch.no_grad():
        logits = model(input_dict.input_values.to(device)).logits
    pred_ids = torch.argmax(logits, dim=-1)[0]
    transcription = processor.decode(pred_ids)
    return transcription

transcript = transcribe('your_audio.mp3')
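The `torch.argmax` and `processor.decode` steps above amount to greedy CTC decoding: take the most likely token per frame, merge consecutive repeats, drop the blank token, and turn word delimiters into spaces. The sketch below illustrates that idea in plain Python. The function name `ctc_greedy_collapse`, the toy vocabulary, and the frame ids are all hypothetical, and the tokenizer's actual implementation may differ in details.

```python
def ctc_greedy_collapse(pred_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse per-frame ids into text: merge repeats, drop blanks, split words."""
    tokens = []
    previous = None
    for idx in pred_ids:
        # Keep a token only when it differs from the previous frame and is not blank
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    return "".join(tokens).replace(word_delimiter, " ").strip()

# Toy vocabulary and frame-level predictions, invented for illustration.
# Here id 0 plays the role of the CTC blank token.
toy_vocab = {0: "<pad>", 1: "|", 2: "s", 3: "a", 4: "n", 5: "e"}
frame_ids = [2, 2, 3, 0, 3, 3, 1, 1, 4, 0, 4, 5, 5]

print(ctc_greedy_collapse(frame_ids, toy_vocab))  # saa nne
```

The blank between the two runs of `a` (and between the two `n` frames) is what lets a genuinely doubled letter survive the merge step.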
Evaluation results
- WER on the common_voice_13_0 test set (self-reported): 0.200
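For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words, so 0.200 corresponds to roughly one word error per five reference words. The sketch below is one common way to compute it. The function name `word_error_rate` and the example sentences are hypothetical, and the model card does not state how the reported figure was computed (for example, which text normalization was applied).

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)

# Invented sentences, not taken from Common Voice: one substitution in five words
reference = "habari ya asubuhi rafiki yangu"
hypothesis = "habari za asubuhi rafiki yangu"
print(word_error_rate(reference, hypothesis))  # 0.2
```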