---
license: mit
datasets:
- aldigobbler/stt-correction
language:
- en
base_model:
- Qwen/Qwen3-0.6B
pipeline_tag: text-generation
tags:
- speech-to-text
- error-correction
- text-cleaning
model-index:
- name: STT Error Correction Model
  results:
  - task:
      type: text-generation
      name: STT Error Correction
    dataset:
      name: stt-correction
      type: aldigobbler/stt-correction
      split: validation
    metrics:
    - name: Validation Loss
      type: loss
      value: 5.0712228
---

# STT Error Correction Model

A fine-tuned Qwen3 0.6b model designed to clean and correct noisy speech-to-text (STT) transcriptions by removing filler words, fixing recognition errors, and improving overall text quality.

## Model Description

This model corrects common STT errors including:
- Filler words and hesitations ("umm", "uh", "like")
- Phonetic misrecognitions ("no egg" → "Nutmeg")
- Stutters and repeated words
- Grammatical inconsistencies from spoken language

## Performance

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 6.9071383     | 6.3923564       |
| 2     | 5.6487107     | 5.8343363       |
| 3     | 5.1722913     | 5.0712228       |

Final validation loss: **5.0712228**

![W&B Chart 13_2_2026, 18_08_44](https://cdn-uploads.huggingface.co/production/uploads/64da645be42fba08b88d0315/BnO0kiDxQk-WB6GK1Amgy.png)
![W&B Chart 13_2_2026, 18_08_37](https://cdn-uploads.huggingface.co/production/uploads/64da645be42fba08b88d0315/THwfRK8ZldYQx7iiVlqiq.png)

## Usage
The suggested system prompt is as follows:
```
You are a professional text editor. Transform raw speech transcriptions into polished written text.

Apply these transformations:
- Remove filler words (um, uh, ah, like, you know, I mean, sort of, kind of, basically, actually, literally)
- Eliminate false starts and self-corrections (keep only the final intended phrase)
- Fix grammar, punctuation, and sentence structure
- Remove repetitions and redundant phrases
- Convert spoken patterns to written prose
- Preserve original meaning, tone, and technical terms

Output only the corrected text with no preamble, labels, or explanations.
```

## Training Data

The model was trained on the [aldigobbler/stt-correction](https://huggingface.co/datasets/aldigobbler/stt-correction) dataset, which is based on the CHSER dataset methodology for speech error correction.

## Citation

Dataset methodology based on:
```bibtex
@misc{shankar2025chser,
      title={CHSER: A Dataset and Case Study on Generative Speech Error Correction for Child ASR}, 
      author={Natarajan Balaji Shankar and Zilai Wang and Kaiyuan Zhang and Mohan Shi and Abeer Alwan},
      year={2025},
      eprint={2505.18463},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2505.18463}, 
}
```

## License

MIT