Can Cohere somehow handle punctuation and capitalization on chunk boundaries?
Hi Cohere team,
I am using the Cohere transcription model (CohereLabs/cohere-transcribe-03-2026) and have been very impressed with the quality, especially on chunks up to around 30 seconds. Word accuracy remains very strong when I use Silero VAD to split longer audio into smaller segments.
However, I am running into an issue with punctuation and capitalization consistency across VAD chunk boundaries. Because each chunk is transcribed independently, the model sometimes treats an artificial chunk boundary as a sentence boundary.
For example, if the original sentence is:
“Best of them come from the UK.”
but the audio is split between “them” and “come,” the output may become:
“Best of them. Come from the UK.”
The recognized words are correct, but the punctuation and capitalization become incorrect because the model does not have enough cross-chunk context.
Do you have any recommended approach for handling this? Increasing the required silence duration or the chunk padding did not help, since both treat the symptom rather than the cause.
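To make the problem concrete, here is a minimal sketch of a purely textual stopgap, not a recommended solution. It assumes each chunk keeps its VAD start and end time alongside its independently produced transcript; `Chunk`, `stitch`, and the `max_gap_s` threshold are hypothetical names for illustration and are not part of the model or of Silero VAD. When the pause between two chunks is short, it assumes the period came from the cut, drops it, and lowercases the next word.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One VAD segment plus its transcript (hypothetical helper type)."""
    start_s: float  # segment start time in seconds
    end_s: float    # segment end time in seconds
    text: str       # transcript produced for this segment alone


def stitch(chunks, max_gap_s=0.3):
    """Join chunk transcripts, undoing sentence breaks at short-gap boundaries.

    max_gap_s is an assumed threshold; it would need tuning per dataset.
    """
    out = ""
    prev_end = None
    for chunk in chunks:
        text = chunk.text.strip()
        if not text:
            continue  # skip chunks that produced no words
        if out:
            gap = chunk.start_s - prev_end
            if gap <= max_gap_s and out.endswith("."):
                # Short pause: treat the period as an artifact of the cut.
                out = out[:-1]
                # Lowercase the first letter; wrong for proper nouns and "I".
                text = text[0].lower() + text[1:]
            out += " "
        out += text
        prev_end = chunk.end_s
    return out


chunks = [
    Chunk(0.0, 1.2, "Best of them."),
    Chunk(1.3, 2.5, "Come from the UK."),
]
print(stitch(chunks))  # Best of them come from the UK.
```

This only hides the symptom: it cannot tell a real sentence end followed by a short pause from an artificial one, and it miscapitalizes names at the start of a chunk, which is why a model-side way to carry context across chunks would be preferable.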
This model is currently being integrated into Microsoft's Foundry Local for large-scale local inference scenarios. The models we expose often reach millions of users, so a robust, production-ready solution for cross-chunk punctuation consistency would be extremely valuable.