This model was created for the On Top of Pasketti: Children’s Speech Recognition Challenge - Phonetic Track competition. It was trained on a large-scale dataset specifically designed for children's speech recognition.
The model is fine-tuned from facebook/wav2vec2-lv-60-espeak-cv-ft.
- Local validation CER: 0.2899
- Private Leaderboard CER: (unknown)
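For readers unfamiliar with the metric, CER (character error rate) is commonly computed as the total character-level edit distance between references and predictions, divided by the total number of reference characters. The sketch below is a minimal pure-Python illustration of that common definition. The function names `edit_distance` and `cer` are hypothetical, and the competition's exact scoring (text normalization, per-utterance versus corpus-level averaging) is not stated here, so treat this as an assumption rather than the official scorer.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute/insert/delete cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total edits divided by total reference characters (assumed definition)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_edits / total_chars
```

Under this definition, a score of 0.2899 would mean roughly 29 character edits per 100 reference characters.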
Usage:
Download a demo audio file (shell command):
wget https://github.com/drivendataorg/childrens-speech-recognition-runtime/raw/refs/heads/main/data-demo/phonetic/audio/U_1c8757065e355c35.flac
Then run inference in Python:
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
import torch
import librosa
# load model and processor
processor = Wav2Vec2Processor.from_pretrained("ZFTurbo/wav2vec2-lv-60-Children-Phonetic")
model = Wav2Vec2ForCTC.from_pretrained("ZFTurbo/wav2vec2-lv-60-Children-Phonetic")
# load the audio as 16 kHz mono and convert it to model inputs
wav, sr = librosa.load("U_1c8757065e355c35.flac", sr=16000, mono=True)
input_values = processor(wav, sampling_rate=sr, return_tensors="pt").input_values
# retrieve logits
with torch.no_grad():
    logits = model(input_values).logits
# take argmax and decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
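The "take argmax and decode" step is greedy CTC decoding: the per-frame argmax IDs are collapsed by merging consecutive repeats and then dropping the blank symbol. The sketch below illustrates that collapse in plain Python. The helper name `ctc_greedy_collapse`, the toy vocabulary, and the blank ID of 0 are all assumptions made for illustration; the real processor uses its own vocabulary and blank token.

```python
def ctc_greedy_collapse(ids, blank_id=0):
    """Merge consecutive repeated IDs, then drop blanks (greedy CTC decoding)."""
    out = []
    prev = None
    for t in ids:
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out


# Hypothetical vocabulary and frame-level argmax output, for illustration only.
vocab = {0: "<pad>", 1: "k", 2: "æ", 3: "t"}
frame_ids = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 3]
phonemes = [vocab[i] for i in ctc_greedy_collapse(frame_ids)]
```

A blank between two identical IDs keeps them separate, which is how CTC can represent a genuinely repeated symbol.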
More usage examples: https://github.com/ZFTurbo/Children-Speech-Recognition-Challenge-Solution