Can Cohere somehow handle punctuation and capitalization on chunk boundaries?

#39

by nenad1002 - opened 19 days ago

Discussion

nenad1002

19 days ago

edited 18 days ago

Hi Cohere team,

I am using the Cohere transcription model and have been very impressed with the quality, especially on chunks up to around 30 seconds. Word accuracy remains very strong when I use Silero VAD to split longer audio into smaller segments.

However, I am running into an issue with punctuation and capitalization consistency across VAD chunk boundaries. Because each chunk is transcribed independently, the model sometimes treats an artificial chunk boundary as a sentence boundary.

For example, if the original sentence is:

“Best of them come from the UK.”

but the audio is split between “them” and “come,” the output may become:

“Best of them. Come from the UK.”

The recognized words are correct, but the punctuation and capitalization become incorrect because the model does not have enough cross-chunk context.
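
To make the failure mode concrete, here is a minimal toy sketch. It is not the Cohere API: `toy_transcribe` is a hypothetical stand-in that formats every chunk as a complete sentence, which is roughly the behavior described above, and `transcribe_chunks` is an assumed helper that joins the per-chunk results.

```python
from typing import Callable, List


def toy_transcribe(chunk_words: str) -> str:
    """Hypothetical stand-in for a per-chunk ASR call (NOT the Cohere API).

    It mimics a model that sees each chunk in isolation and therefore
    formats it as a full sentence: capital first letter, final period.
    """
    text = chunk_words.strip()
    text = text[0].upper() + text[1:]
    if not text.endswith((".", "?", "!")):
        text += "."
    return text


def transcribe_chunks(chunks: List[str], transcribe: Callable[[str], str]) -> str:
    """Transcribe each chunk independently and join the results with spaces."""
    return " ".join(transcribe(chunk) for chunk in chunks)


# The same words, once as a single chunk and once split by the VAD.
whole = ["best of them come from the UK"]
split = ["best of them", "come from the UK"]

print(transcribe_chunks(whole, toy_transcribe))  # Best of them come from the UK.
print(transcribe_chunks(split, toy_transcribe))  # Best of them. Come from the UK.
```

The words are identical in both runs; only the chunk boundary changes the punctuation and capitalization, because no chunk can see what came before or after it.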

Do you have a recommended approach for handling this? Increasing the required silence duration or the chunk padding did not help, since both treat the symptom rather than the cause.

This model is currently being integrated into Microsoft's Foundry Local for large-scale local inference scenarios. The models we expose often reach millions of users, so a robust, production-ready solution for cross-chunk punctuation consistency would be extremely valuable.
