Quick fine-tune of the Cohere ASR model for Japanese, focused on general and anime domains

  • GSPO with a CE (cross-entropy) loss
  • No hyperparameter optimization (HPO)
  • 2^17 steps
  • Batch size 64
  • Per sequence: 1 greedy sample plus 7 samples at temperature 1, top-k 20
  • Audio length 0-35 s, avg ~25.3 s
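
The training recipe above can be illustrated with a minimal sketch of a GSPO-style objective for one group of 8 transcripts (1 greedy plus 7 sampled). It follows the published GSPO formulation as I understand it (group-normalized advantages, length-normalized sequence-level importance ratio, clipping); it is not the code used for this model. The reward (for example negative CER), the clip range `eps`, and the `ce_weight` mixing factor are all assumptions, since the card does not state them.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards):
    """Normalize rewards within one sampling group (zero mean, unit std)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0.0:
        # All samples scored the same: no learning signal from this group.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def gspo_objective(logp_new, logp_old, lengths, rewards, eps=0.2):
    """Clipped sequence-level surrogate averaged over the group (to maximize).

    logp_new / logp_old: summed token log-probs of each sampled transcript
    under the current policy and the policy that generated the samples.
    eps is a hypothetical clip range, not a value taken from the card.
    """
    advantages = group_advantages(rewards)
    total = 0.0
    for lp_new, lp_old, n, adv in zip(logp_new, logp_old, lengths, advantages):
        # Length-normalized sequence-level importance ratio.
        ratio = math.exp((lp_new - lp_old) / n)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(rewards)


def combined_loss(gspo_value, ref_logp, ref_len, ce_weight=1.0):
    """Assumed combination: minimize -GSPO plus per-token CE on the reference."""
    ce = -ref_logp / ref_len
    return -gspo_value + ce_weight * ce
```

With on-policy samples (`logp_new == logp_old`) every ratio is 1, so the objective reduces to the mean of the normalized advantages, which is zero; the gradient signal comes from how the ratio moves under the update.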

Bigger than whisper-large-v3-turbo, but faster because it subsamples more

See my most recent Whisper fine-tune for more evals

                  tedx                      jsut-book
                     cer    sr    ir    dr     cer    sr    ir    dr
whisper-tiny        31.3  18.9   7.2   5.2    32.4  22.2   3.3   6.8
whisper-base        21.2  12.2   5.1   3.9    27.0  18.1   2.7   6.2
whisper-small       13.7   7.0   3.6   3.2    22.9  14.7   1.7   6.5
whisper-medium      11.0   5.0   3.0   3.0    19.9  12.5   1.4   6.0
whisper-large-v1    10.4   4.6   2.7   3.1    19.9  12.7   1.6   5.6
whisper-large-v2     9.9   4.2   2.8   2.9    18.6  11.1   1.2   6.4
whisper-large-v3     9.2   3.8   2.4   2.9    19.2  11.6   1.1   6.5
whisper-turbo        9.5   4.2   2.6   2.8    19.6  12.1   0.8   6.7
cohere              22.0   5.0   6.0  11.1    29.8   9.9   1.9  18.0
cohere-ja           14.4   4.7   7.5   2.2    18.3  11.2   0.9   6.3
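
The column names are not defined on this card. They appear to be character error rate (cer) and its substitution, insertion, and deletion components (sr, ir, dr) in percent, since the three components add up to cer in each row (for whisper-tiny on tedx, 18.9 + 7.2 + 5.2 = 31.3). Under that assumption, the breakdown for one reference/hypothesis pair can be sketched as below. The function name is hypothetical, text normalization is left out, and the scoring script actually used here is not known.

```python
def cer_breakdown(reference: str, hypothesis: str):
    """Return (cer, sr, ir, dr) in percent of reference characters.

    Uses a Levenshtein alignment and tracks how many substitutions,
    insertions, and deletions the cheapest alignment contains.
    """
    n, m = len(reference), len(hypothesis)
    # Each cell holds (edits, subs, ins, dels) for reference[:i] -> hypothesis[:j].
    prev = [(j, 0, j, 0) for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [(i, 0, 0, i)]
        for j in range(1, m + 1):
            if reference[i - 1] == hypothesis[j - 1]:
                best = prev[j - 1]  # match, no cost
            else:
                e, s, a, d = prev[j - 1]
                best = (e + 1, s + 1, a, d)  # substitution
            e, s, a, d = prev[j]
            cand = (e + 1, s, a, d + 1)  # deletion: reference char dropped
            if cand[0] < best[0]:
                best = cand
            e, s, a, d = cur[j - 1]
            cand = (e + 1, s, a + 1, d)  # insertion: extra hypothesis char
            if cand[0] < best[0]:
                best = cand
            cur.append(best)
        prev = cur
    e, s, a, d = prev[m]
    scale = 100.0 / max(n, 1)
    return (e * scale, s * scale, a * scale, d * scale)


print(cer_breakdown("こんにちは", "こんばんは"))  # → (40.0, 40.0, 0.0, 0.0)
```

Because insertions are counted against the reference length, cer can exceed 100 when a model emits far more characters than the reference contains. A corpus-level score would normally sum the edit counts and reference lengths over all utterances before dividing, rather than averaging per-utterance rates.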

cv/reazon evals removed due to contamination

                  fleurs                    jsut-basic                bluearchive               nekopara
                     cer    sr    ir    dr     cer    sr    ir    dr     cer    sr    ir    dr     cer    sr    ir    dr
nue-asr_b1           9.4   4.5   1.4   3.5     8.9   6.0   1.9   1.0    24.7   5.3   1.4  18.1    36.0  13.8   1.9  20.3
nue-asr_b5           8.1   3.9   1.0   3.1     8.7   5.8   1.9   1.0    23.9   5.3   1.6  17.1    35.3  13.9   2.0  19.4
reazon-k2-v2_b1      9.5   4.5   1.0   4.0     6.9   4.7   1.3   0.8    10.7   4.1   1.2   5.4    26.3  10.8   1.2  14.3
reazon-k2-v2_b5      9.5   4.5   1.0   4.1     6.9   4.8   1.4   0.8    10.9   4.1   1.3   5.5    26.2  10.9   1.3  13.9
reazon-espnet-v2     5.9   3.5   1.2   1.3     7.0   5.0   1.3   0.7     9.1   4.5   1.4   3.1    26.3  12.5   1.5  12.4
reazon-nemo-v2       6.3   3.7   1.0   1.5     8.5   5.3   1.8   1.4    17.8   5.8   2.2   9.8    34.2  12.5   2.6  19.1
parakeet-ctc         6.0   2.7   1.0   2.3     6.7   4.4   1.4   1.0     9.7   5.3   2.2   2.3    26.3  12.4   2.5  11.3
parakeet-tdt         6.0   2.7   1.0   2.3     6.6   4.4   1.4   0.8     9.6   5.4   2.2   2.1    26.0  13.0   2.6  10.5
granite-4.0_b1       6.5   4.3   1.2   0.9    10.0   5.3   3.9   0.8    40.6   5.5  30.8   4.3    78.4  18.7  48.8  10.9
granite-4.0_b5       6.1   4.0   1.2   0.9     9.5   5.1   3.7   0.8    50.8   5.2  41.4   4.1    94.7  20.7  65.1   8.8
granite-4.0_b5_n5    7.4   4.5   1.2   1.6     7.8   5.3   1.6   0.8    12.1   4.8   2.4   4.8    33.6  16.7   4.3  12.6
granite-4.1_b1       6.6   4.3   1.1   1.1     7.6   5.4   1.4   0.8    24.9   5.9  15.4   3.6    84.1  15.0  54.7  14.4
granite-4.1_b5       6.1   4.1   1.0   1.0     7.5   5.3   1.4   0.8    33.6   5.5  24.3   3.8   213.2  16.6 184.0  12.6
granite-4.1_b5_n5    7.1   4.5   1.2   1.4     7.8   5.5   1.5   0.8    11.0   5.2   1.9   3.9    33.2  14.8   4.1  14.3
qwen3-asr-0.6b_b1    8.6   5.9   1.1   1.5    12.1   9.1   1.6   1.3    10.4   5.7   1.5   3.1    30.9  16.2   1.9  12.8
qwen3-asr-0.6b_b5    8.1   5.5   1.1   1.5    11.6   8.7   1.6   1.3    10.0   5.4   1.4   3.1    30.4  15.8   1.9  12.7
qwen3-asr-1.7b_b1    5.3   3.5   0.8   1.0     8.9   6.6   1.4   1.0     8.4   4.1   1.1   3.2    28.6  14.1   1.3  13.2
qwen3-asr-1.7b_b5    5.1   3.4   0.8   1.0     8.7   6.4   1.4   0.9     8.3   4.1   1.1   3.2    28.5  14.0   1.3  13.2
voxtral-480          9.4   4.4   0.8   4.3    18.8  10.5   4.8   3.4     8.8   4.0   1.1   3.7    30.4  11.5   1.4  17.5
voxtral-2400         5.5   3.7   0.6   1.2    10.2   6.5   1.7   2.1     7.9   3.6   1.0   3.3    28.4  10.9   1.3  16.1
large-v3_b1          6.6   3.5   2.1   1.0     7.2   5.2   1.0   0.9    12.2   4.2   6.5   1.5    68.9  14.8  45.8   8.4
large-v3_b5          4.7   3.0   0.7   0.9     7.2   5.2   1.1   0.9    10.8   4.2   5.1   1.5    62.3  15.4  38.8   8.2
cohere_b1            4.9   2.7   0.4   1.8     8.2   5.6   1.7   0.8     8.1   4.0   1.2   2.9    30.4  13.5   3.7  13.3
cohere_b5            4.8   2.6   0.4   1.8     8.2   5.6   1.7   0.8     8.1   3.9   1.2   3.0    29.2  12.6   2.0  14.6
cohere-ja_b1         5.2   2.8   0.4   1.9     6.9   4.9   1.1   0.8     6.8   3.9   1.5   1.4    24.9  14.0   3.2   7.7
cohere-ja_b5         5.0   2.7   0.4   1.8     6.8   4.9   1.1   0.8     6.8   3.8   1.4   1.6    24.5  13.4   2.1   9.0

Model: efwkjn/cohere-asr-ja-v0.1 (2B params, BF16, Safetensors)