How to use bond005/whisper-large-v3-ru-podlodka with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("automatic-speech-recognition", model="bond005/whisper-large-v3-ru-podlodka")

    # Load model directly
    from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

    processor = AutoProcessor.from_pretrained("bond005/whisper-large-v3-ru-podlodka")
    model = AutoModelForSpeechSeq2Seq.from_pretrained("bond005/whisper-large-v3-ru-podlodka")
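To go from loading the model to an actual transcript, a minimal sketch follows. It assumes a local audio file, that `transformers`, `torch`, and `ffmpeg` are installed, and that 30-second chunking is an acceptable way to handle long recordings; the helper name `transcribe` and these settings are illustrative assumptions, not something the model card prescribes.

```python
MODEL_ID = "bond005/whisper-large-v3-ru-podlodka"


def transcribe(audio_path, chunk_length_s=30):
    """Transcribe one audio file and return (text, timestamped_chunks)."""
    # Imported lazily so that defining this helper does not load the library.
    from transformers import pipeline

    pipe = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=chunk_length_s,  # assumption: chunked long-form decoding
    )
    result = pipe(
        audio_path,
        return_timestamps=True,  # ask for per-chunk timestamps as well as text
        generate_kwargs={"language": "russian", "task": "transcribe"},
    )
    return result["text"], result.get("chunks", [])
```

Calling `transcribe("lecture.wav")` (a hypothetical path) downloads the model on first use and returns the full text plus a list of chunks, each a dictionary with a `timestamp` pair and its `text`.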
Whisper Large V3 Russian Podlodka
This repository contains a fine-tuned Whisper Large V3 model for Russian speech recognition. It serves as the core transcription component of the Pisets system, specifically optimized for long audio recordings such as lectures and interviews.
The model was presented in the paper "Pisets: A Robust Speech Recognition System for Lectures and Interviews".
System Architecture
The Pisets system implements a three-component architecture to improve recognition accuracy while minimizing hallucinations:
- Wav2Vec2: For primary recognition and segmentation.
- Audio Spectrogram Transformer (AST): For filtering non-speech segments.
- Whisper (this model): For the final high-quality transcription.
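The flow through the three components listed above can be sketched as follows. This is only an illustration of the ordering (segment, filter, transcribe): the `Segment` class, `run_pipeline`, and the three callables are hypothetical stand-ins, and the actual Pisets code may combine the stages differently.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple


@dataclass
class Segment:
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording
    text: str = ""


def run_pipeline(
    audio: Any,
    segmenter: Callable[[Any], Iterable[Tuple[float, float]]],
    is_speech: Callable[[Any, float, float], bool],
    transcriber: Callable[[Any, float, float], str],
) -> List[Segment]:
    """Chain three stages; every callable is a hypothetical stand-in.

    segmenter   -> plays the role of Wav2Vec2 (proposes segment bounds)
    is_speech   -> plays the role of AST (rejects non-speech segments)
    transcriber -> plays the role of Whisper (produces the final text)
    """
    results: List[Segment] = []
    for start, end in segmenter(audio):
        # Skip segments that the filter classifies as non-speech.
        if not is_speech(audio, start, end):
            continue
        text = transcriber(audio, start, end).strip()
        # Keep only segments for which some text was produced.
        if text:
            results.append(Segment(start, end, text))
    return results
```

Passing the stages in as callables keeps the sketch independent of any particular model library; dropping non-speech segments before the final transcription step is one plausible way such a filter could help limit hallucinated text.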
Implementation
The complete source code and instructions for using the system (including generation of SRT and DocX files) can be found in the GitHub repository:
GitHub: https://github.com/bond005/pisets
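For readers unfamiliar with the SRT format mentioned above, here is a minimal sketch of turning timestamped segments into SRT text. It is not taken from the Pisets repository; the function names `format_timestamp` and `to_srt` and the `(start, end, text)` tuple layout are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) triples as numbered SRT subtitle blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Subtitle blocks are separated by a blank line.
    return "\n".join(blocks)
```

The resulting string can be written to a `.srt` file with UTF-8 encoding so that Cyrillic text is preserved.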
Citation
If you use this model or the Pisets system in your research, please cite:
@article{bondarenko2026pisets,
  title={Pisets: A Robust Speech Recognition System for Lectures and Interviews},
  author={Ivan Bondarenko},
  journal={arXiv preprint arXiv:2601.18415},
  year={2026}
}
Datasets used to train bond005/whisper-large-v3-ru-podlodka
- bond005/podlodka_speech
- bond005/taiga_speech_v2
Evaluation results
- WER (with punctuation and capital letters) on Podlodka.io (self-reported): 20.910
- WER (without punctuation) on Podlodka.io (self-reported): 10.987
- WER (without punctuation) on Russian Librispeech (self-reported): 9.795
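To illustrate why the score with punctuation and capital letters is higher than the one without, here is a minimal word error rate (WER) sketch. The normalization used (lowercasing and replacing punctuation with spaces) is an assumption; the card does not state the exact procedure behind the reported figures, and the function returns a fraction, whereas the values above are presumably percentages.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and replace punctuation with spaces (assumed normalization)."""
    return re.sub(r"[^\w\s]", " ", text.lower())


def wer(reference: str, hypothesis: str, strip_punctuation: bool = False) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    if strip_punctuation:
        reference, hypothesis = normalize(reference), normalize(hypothesis)
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

With the reference "Привет, мир!" and the hypothesis "привет мир", the strict score counts both words as errors because of case and attached punctuation, while the normalized score counts none.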