How to use sliderforthewin/parakeet-tdt-lt with NeMo:
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.ASRModel.from_pretrained("sliderforthewin/parakeet-tdt-lt")
transcriptions = asr_model.transcribe(["file.wav"])

Fine-tuned version of nvidia/parakeet-tdt-0.6b-v3 on ~43 hours of Lithuanian speech data. It achieves a 45.5% relative WER reduction on the Common Voice 25 Lithuanian test set (16.53% → 8.91%, using beam search with a domain 5-gram language model and BasicTextNormalizer).
| Configuration | CV25 LT WER | CV25 LT CER | FLEURS LT WER |
|---|---|---|---|
| Baseline (pretrained, greedy) | 16.53% | 4.29% | 22.15%* |
| Fine-tuned epoch 11 (greedy) | 13.55% | 2.76% | — |
| Fine-tuned + beam + domain 5-gram LM α=0.5 | 9.40% | 2.15% | — |
| Same as above, scored with BasicTextNormalizer (leaderboard) | 8.91% | 2.07% | 15.87% |
* Scored with BasicTextNormalizer. Live results: speechbench-viz.web.app
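The gap between the last two table rows comes from text normalization applied before scoring. The sketch below illustrates the idea with a pure-Python WER/CER computation. `simple_normalize` is a hypothetical, simplified stand-in (lowercasing and punctuation stripping only) and not the actual BasicTextNormalizer; the example sentences are invented for illustration.

```python
def simple_normalize(text: str) -> str:
    """Simplified stand-in for a basic text normalizer (assumption, not the real rules)."""
    lowered = text.lower()
    # Keep letters, digits, and whitespace; turn everything else into a space.
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in lowered)
    return " ".join(cleaned.split())


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = curr
    return prev[-1]


def error_rate(reference: str, hypothesis: str,
               unit: str = "word", normalize: bool = False) -> float:
    """WER (unit="word") or CER (unit="char") of hypothesis against reference."""
    if normalize:
        reference = simple_normalize(reference)
        hypothesis = simple_normalize(hypothesis)
    if unit == "word":
        ref, hyp = reference.split(), hypothesis.split()
    else:
        ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


reference = "Labas rytas, Vilniau!"   # invented example
hypothesis = "labas rytas vilnius"

raw_wer = error_rate(reference, hypothesis)                    # casing/punctuation count as errors
norm_wer = error_rate(reference, hypothesis, normalize=True)   # only the real word error remains
norm_cer = error_rate(reference, hypothesis, unit="char", normalize=True)
print(f"raw WER={raw_wer:.2f}  normalized WER={norm_wer:.2f}  normalized CER={norm_cer:.3f}")
```

Without normalization, casing and punctuation differences are counted as word errors, which is why normalized scores are typically lower for the same transcripts.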
import nemo.collections.asr as nemo_asr
# Greedy decoding
model = nemo_asr.models.ASRModel.from_pretrained("sliderforthewin/parakeet-tdt-lt")
transcriptions = model.transcribe(["audio.wav"])
# Beam search with LM fusion
import nemo.collections.asr as nemo_asr
from omegaconf import open_dict
from huggingface_hub import hf_hub_download

model = nemo_asr.models.ASRModel.from_pretrained("sliderforthewin/parakeet-tdt-lt")

# Download the token-level LM
lm_path = hf_hub_download("sliderforthewin/parakeet-tdt-lt", "lt_token_4gram.arpa")

# Switch to beam search with LM fusion
decoding_cfg = model.cfg.decoding
with open_dict(decoding_cfg):
    decoding_cfg.strategy = "maes"
    decoding_cfg.beam.beam_size = 4
    decoding_cfg.beam.return_best_hypothesis = True
    decoding_cfg.beam.ngram_lm_model = lm_path
    decoding_cfg.beam.ngram_lm_alpha = 0.5  # LM fusion weight (alpha)
model.change_decoding_strategy(decoding_cfg)

transcriptions = model.transcribe(["audio.wav"])
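The role of `ngram_lm_alpha` can be illustrated with a toy rescoring sketch. It assumes the common shallow-fusion form, score = log P_asr + α · log P_lm. The function names, candidate strings, and log-probabilities below are all made up for illustration. This is not NeMo's implementation: a real beam search typically applies the LM score token by token during decoding rather than as a final rerank.

```python
def fused_score(asr_logprob: float, lm_logprob: float, alpha: float) -> float:
    """Shallow-fusion score: ASR log-probability plus alpha-weighted LM log-probability."""
    return asr_logprob + alpha * lm_logprob


def pick_best(candidates, alpha: float) -> str:
    """Return the text of the candidate with the highest fused score."""
    best = max(candidates, key=lambda c: fused_score(c[1], c[2], alpha))
    return best[0]


# (text, ASR log-prob, LM log-prob) -- hypothetical values
candidates = [
    ("labas ritas", -3.8, -12.0),  # slightly better acoustically, unlikely under the LM
    ("labas rytas", -4.0, -6.0),   # slightly worse acoustically, likely under the LM
]

print(pick_best(candidates, alpha=0.0))  # labas ritas (LM ignored)
print(pick_best(candidates, alpha=0.5))  # labas rytas (LM tips the balance)
```

With α = 0 the decoder trusts the acoustic model alone; a larger α lets the language model override acoustically plausible but unlikely word sequences, at the risk of over-correcting rare words when α is set too high.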
git clone https://github.com/jasontitus/finetuneparakeet.git
cd finetuneparakeet
bash scripts/gcp_eval.sh # on a GCP VM with GPU
- parakeet-tdt-lt.nemo: NeMo checkpoint (epoch 11, best WER)
- lt_europarl_wiki_subs_5gram.arpa: Europarl+wiki+subs 5-gram token LM (recommended)
- lt_token_4gram.arpa: original 4-gram token LM (smaller, still good)

License: CC-BY-4.0 (same as the training data sources)
Base model: nvidia/parakeet-tdt-0.6b-v3