This model was created for the Phonetic Track of the "On Top of Pasketti: Children's Speech Recognition Challenge" competition. It was trained on a large-scale dataset designed specifically for children's speech recognition.

The model is based on Qwen/Qwen3-ASR-1.7B.

Local validation CER: 0.2794
Public Leaderboard CER: 0.2795
Private Leaderboard CER: 0.2806
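
For readers unfamiliar with the metric, CER (character error rate) is commonly computed as the character-level edit distance between prediction and reference, divided by the reference length. The sketch below assumes a corpus-level average and no text normalization; the competition's exact scoring rules are not stated here, so treat the function names and the aggregation as illustrative only.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def cer(references, hypotheses) -> float:
    """Corpus-level CER: total edit distance / total reference characters."""
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


# One substituted character out of six reference characters
score = cer(["abcd", "ab"], ["abcf", "ab"])
print(round(score, 4))
```

A per-utterance average would give a different number than this corpus-level version whenever utterance lengths vary.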

Usage:

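import time
from pathlib import Path

import torch
from qwen_asr import Qwen3ASRModel

# Placeholders, not defined in the original snippet; replace with your own data.
# data_dir: root directory that the "audio_path" values are relative to.
# items: list of dicts with "audio_path" and "audio_duration_sec" keys.
data_dir = Path("data")
items = []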

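def get_dynamic_batches(items):
    """Yield slices of items, sized by the duration of the first item in each slice.

    Works best when items are sorted by audio duration, longest first.
    """
    total = len(items)
    i = 0

    while i < total:
        duration = items[i]['audio_duration_sec']
        if duration > 200:
            # Very long recordings are transcribed one at a time
            batch_size = 1
        else:
            # Roughly 5000 seconds of audio per batch, capped at 64 items;
            # max() guards against a zero duration
            batch_size = int(5000 / max(duration, 1e-3)) + 1
            batch_size = min(batch_size, 64)

        yield items[i:i + batch_size]
        i += batch_size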

model = Qwen3ASRModel.from_pretrained(
    "ZFTurbo/Qwen3-ASR-Children-Phonetic",
    dtype=torch.bfloat16,
    device_map="cuda:0",
    max_inference_batch_size=64,
    max_new_tokens=-1,
)

predictions = {}
processed = 0

with torch.inference_mode():
    for batch in get_dynamic_batches(items):
        paths = []
        languages = []
        max_new_tokens = 0
        for item in batch:
            path = str(data_dir / item["audio_path"])
            paths.append(path)
            languages.append("English")
            # Allow up to 20 new tokens per second of the longest audio in the batch
            max_new_tokens = max(max_new_tokens, int(item["audio_duration_sec"] * 20))

        print(
            "Batch size:", len(batch),
            "Duration:", batch[0]["audio_duration_sec"],
            "Processed:", processed,
            "Max tokens:", max_new_tokens,
        )
        start_time = time.time()

        results = model.transcribe(
            audio=paths,
            language=languages,  # can also be set to None for automatic language detection
            return_time_stamps=False,
            max_new_tokens=max_new_tokens,
        )

        # Key by audio path so later batches do not overwrite earlier results
        for item, r in zip(batch, results):
            predictions[item["audio_path"]] = r.text

        processed += len(batch)
        print("Batch time: {:.2f} sec".format(time.time() - start_time))
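
To make the batching rule in get_dynamic_batches easier to check, here is a small standalone sketch of the same arithmetic. The helper name batch_size_for and the sample file names are hypothetical; the thresholds (200 seconds, 5000 seconds, 64 items) are copied from the usage code. It also shows sorting the items longest first, which the batching appears to rely on because each batch is sized from its first item only.

```python
def batch_size_for(duration_sec: float, budget_sec: float = 5000, max_batch: int = 64) -> int:
    """Batch size chosen for a batch whose first item lasts duration_sec seconds."""
    if duration_sec > 200:
        return 1  # very long recordings go alone
    return min(int(budget_sec / duration_sec) + 1, max_batch)


# Hypothetical records in the shape the usage code expects
items = [
    {"audio_path": "a.wav", "audio_duration_sec": 10.0},
    {"audio_path": "b.wav", "audio_duration_sec": 300.0},
    {"audio_path": "c.wav", "audio_duration_sec": 100.0},
]

# Longest first, so the first item of each batch bounds the rest
items.sort(key=lambda it: it["audio_duration_sec"], reverse=True)

sizes = [batch_size_for(it["audio_duration_sec"]) for it in items]
print([it["audio_path"] for it in items], sizes)
```

With items sorted this way, the total audio in a batch stays near the 5000 second budget; with unsorted input, a short first item could pull much longer recordings into the same batch.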

More usage examples: https://github.com/ZFTurbo/Children-Speech-Recognition-Challenge-Solution

Model size: 2B params (Safetensors)
Tensor type: BF16
