--- license: mit datasets: - aldigobbler/stt-correction language: - en base_model: - Qwen/Qwen3-0.6B pipeline_tag: text-generation tags: - speech-to-text - error-correction - text-cleaning model-index: - name: STT Error Correction Model results: - task: type: text-generation name: STT Error Correction dataset: name: stt-correction type: aldigobbler/stt-correction split: validation metrics: - name: Validation Loss type: loss value: 5.0712228 --- # STT Error Correction Model A fine-tuned Qwen3 0.6b model designed to clean and correct noisy speech-to-text (STT) transcriptions by removing filler words, fixing recognition errors, and improving overall text quality. ## Model Description This model corrects common STT errors including: - Filler words and hesitations ("umm", "uh", "like") - Phonetic misrecognitions ("no egg" → "Nutmeg") - Stutters and repeated words - Grammatical inconsistencies from spoken language ## Performance | Epoch | Training Loss | Validation Loss | |-------|---------------|-----------------| | 1 | 6.9071383 | 6.3923564 | | 2 | 5.6487107 | 5.8343363 | | 3 | 5.1722913 | 5.0712228 | Final validation loss: **5.0712228** ![W&B Chart 13_2_2026, 18_08_44](https://cdn-uploads.huggingface.co/production/uploads/64da645be42fba08b88d0315/BnO0kiDxQk-WB6GK1Amgy.png) ![W&B Chart 13_2_2026, 18_08_37](https://cdn-uploads.huggingface.co/production/uploads/64da645be42fba08b88d0315/THwfRK8ZldYQx7iiVlqiq.png) ## Usage The suggested system prompt is as follows: ``` You are a professional text editor. Transform raw speech transcriptions into polished written text. Apply these transformations: - Remove filler words (um, uh, ah, like, you know, I mean, sort of, kind of, basically, actually, literally) - Eliminate false starts and self-corrections (keep only the final intended phrase) - Fix grammar, punctuation, and sentence structure - Remove repetitions and redundant phrases - Convert spoken patterns to written prose - Preserve original meaning, tone, and technical terms Output only the corrected text with no preamble, labels, or explanations. ``` ## Training Data The model was trained on the [aldigobbler/stt-correction](https://huggingface.co/datasets/aldigobbler/stt-correction) dataset, which is based on the CHSER dataset methodology for speech error correction. ## Citation Dataset methodology based on: ```bibtex @misc{shankar2025chser, title={CHSER: A Dataset and Case Study on Generative Speech Error Correction for Child ASR}, author={Natarajan Balaji Shankar and Zilai Wang and Kaiyuan Zhang and Mohan Shi and Abeer Alwan}, year={2025}, eprint={2505.18463}, archivePrefix={arXiv}, primaryClass={eess.AS}, url={https://arxiv.org/abs/2505.18463}, } ``` ## License MIT