This model was created for the On Top of Pasketti: Children’s Speech Recognition Challenge - Phonetic Track competition. It was trained on a large-scale dataset specifically designed for children's speech recognition.

The model is based on facebook/wav2vec2-lv-60-espeak-cv-ft.

  • Local validation CER: 0.2899
  • Private Leaderboard CER: (unknown)
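For readers unfamiliar with the metric, CER (character error rate) is commonly computed as the Levenshtein edit distance between the reference and the predicted string, divided by the reference length. The sketch below is a minimal pure-Python illustration of that common definition. The helper names `edit_distance` and `cer` are hypothetical, and the competition's exact scoring (text normalization, per-utterance vs. corpus-level averaging) is not described here, so its numbers may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Per-utterance character error rate (assumed definition, see above)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted character out of three reference characters.
print(cer("kæt", "kat"))
```

Note that this ratio can exceed 1.0 when the hypothesis is much longer than the reference.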

Usage:

# download the demo audio first (shell command, run outside Python):
#   wget https://github.com/drivendataorg/childrens-speech-recognition-runtime/raw/refs/heads/main/data-demo/phonetic/audio/U_1c8757065e355c35.flac
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
import torch
import librosa

# load model and processor
processor = Wav2Vec2Processor.from_pretrained("ZFTurbo/wav2vec2-lv-60-Children-Phonetic")
model = Wav2Vec2ForCTC.from_pretrained("ZFTurbo/wav2vec2-lv-60-Children-Phonetic")

# load the audio as 16 kHz mono and convert it to model inputs
wav, sr = librosa.load("U_1c8757065e355c35.flac", sr=16000, mono=True)
input_values = processor(wav, sampling_rate=sr, return_tensors="pt").input_values

# retrieve logits
with torch.no_grad():
    logits = model(input_values).logits

# take argmax and decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
print(transcription)
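The last two steps of the usage code (argmax, then `batch_decode`) amount to greedy CTC decoding: pick the most likely token per frame, collapse consecutive repeats, and drop the blank token. The sketch below illustrates that idea in plain Python. The function name `ctc_greedy_decode`, the toy vocabulary, and the blank id of 0 are all assumptions made for illustration; the real tokenizer has its own vocabulary and blank/pad id.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0):
    """Collapse repeated frame predictions and remove CTC blanks."""
    tokens = []
    prev = None
    for idx in frame_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != prev and idx != blank_id:
            tokens.append(id_to_token[idx])
        prev = idx
    return tokens


# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank.
vocab = {0: "<pad>", 1: "k", 2: "æ", 3: "t"}

# Per-frame argmax ids, as torch.argmax(logits, dim=-1) would give for one utterance.
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 3]

print(" ".join(ctc_greedy_decode(frames, vocab)))
```

A blank between two identical ids keeps both, which is how CTC represents a genuinely repeated phoneme.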

More usage examples: https://github.com/ZFTurbo/Children-Speech-Recognition-Challenge-Solution

Model size: 0.3B params (Safetensors, tensor type F32)