Quick Japanese finetune of Cohere's ASR model, focused on general and anime domains

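GSPO combined with a cross-entropy (CE) loss
No hyperparameter optimization (HPO)
2^17 (131,072) training steps
Batch size 64
Per sequence: 1 greedy sample + 7 samples at temperature 1, top-k 20
Audio length 0-35 s (average ~25.3 s)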
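The recipe above is terse, so here is a rough sketch of the two pieces it names: picking tokens for the group of 1 greedy + 7 sampled decodes, and the scalar value of a GSPO-style objective (sequence-level, length-normalized importance ratio with group-normalized advantages) plus a CE term. This is not the author's training code. The reward (for example negative CER per hypothesis), the CE weight `ce_weight`, the placeholder clip range `clip_eps`, and all function names are assumptions. It works on plain floats; a real trainer would compute the same quantities on framework tensors so gradients flow.

```python
import math
import random
from typing import List, Optional, Sequence

# Per the card: 1 greedy decode + 7 sampled decodes (temp 1, top-k 20) per sequence.
GROUP_MODES = [True] + [False] * 7  # True = greedy


def pick_token(logits: Sequence[float], greedy: bool, temperature: float = 1.0,
               top_k: int = 20, rng: Optional[random.Random] = None) -> int:
    """Choose one token id from a single decoding step's logits (hypothetical helper)."""
    if greedy:
        return max(range(len(logits)), key=lambda i: logits[i])
    if rng is None:
        rng = random.Random()
    # Keep only the top_k highest-scoring ids, then sample from the tempered softmax.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # shifted for numerical stability
    return rng.choices(top, weights=weights, k=1)[0]


def gspo_ce_loss(new_logps: List[List[float]], old_logps: List[List[float]],
                 rewards: List[float], ref_logps: List[float],
                 clip_eps: float = 0.2, ce_weight: float = 1.0) -> float:
    """Scalar GSPO-style policy loss plus CE (hypothetical; no autograd).

    new_logps / old_logps: per-token log-probs of each non-empty hypothesis in the group
    under the current and the sampling policy. ref_logps: per-token log-probs of the
    reference transcript under the current policy. clip_eps is a placeholder, not tuned.
    """
    n = len(rewards)
    mean_r = sum(rewards) / n
    std_r = math.sqrt(sum((r - mean_r) ** 2 for r in rewards) / n)
    # Group-normalized advantage: every token of hypothesis i shares A_i.
    adv = [(r - mean_r) / (std_r + 1e-6) for r in rewards]

    terms = []
    for new, old, a in zip(new_logps, old_logps, adv):
        # Sequence-level, length-normalized ratio: (pi_new(y|x) / pi_old(y|x)) ** (1/|y|)
        s = math.exp(sum(nw - od for nw, od in zip(new, old)) / len(new))
        clipped = min(max(s, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(s * a, clipped * a))  # PPO-style pessimistic clip, per sequence
    policy_loss = -sum(terms) / n

    # Cross-entropy on the reference transcript: mean negative log-likelihood per token.
    ce_loss = -sum(ref_logps) / len(ref_logps)
    return policy_loss + ce_weight * ce_loss
```

The ratio is taken per sequence rather than per token, which is the defining difference between GSPO and token-level methods such as GRPO; how the CE term is weighted against it in this finetune is not stated on the card.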

Larger than whisper-large-v3-turbo, but faster thanks to more aggressive subsampling.
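A back-of-envelope sketch of why more subsampling can outweigh a larger parameter count: fewer encoder frames per second shortens the sequence every encoder layer processes, and self-attention cost grows roughly quadratically with that length. Whisper's figures (10 ms log-mel hop, stride-2 conv front end, input padded to 30 s, giving 1500 frames) are standard; the 8x factor used for the comparison encoder is a hypothetical placeholder, not a confirmed figure for this model, and decoder cost is ignored.

```python
import math


def encoder_frames(audio_seconds: float, feature_hop_ms: float, subsampling: int) -> int:
    """Encoder output length for a clip, given the feature hop and total subsampling."""
    feature_frames = math.ceil(audio_seconds * 1000 / feature_hop_ms)
    return math.ceil(feature_frames / subsampling)


def relative_attention_cost(frames_a: int, frames_b: int) -> float:
    """Per-layer self-attention cost ratio; attention scales ~quadratically in length."""
    return (frames_a / frames_b) ** 2


# Whisper: 10 ms log-mel hop, stride-2 conv front end, input padded to 30 s.
whisper_frames = encoder_frames(30.0, feature_hop_ms=10, subsampling=2)
# HYPOTHETICAL: an 8x-subsampling encoder on the same 30 s; not this model's confirmed factor.
other_frames = encoder_frames(30.0, feature_hop_ms=10, subsampling=8)

print(whisper_frames, other_frames, relative_attention_cost(whisper_frames, other_frames))
# → 1500 375 16.0
```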

See my most recent Whisper finetune for more evaluations.

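Metrics are percentages: cer = character error rate; sr, ir, dr = substitution, insertion and deletion rates (cer = sr + ir + dr).

model	tedx cer	tedx sr	tedx ir	tedx dr	jsut-book cer	jsut-book sr	jsut-book ir	jsut-book dr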
whisper-tiny	31.3	18.9	7.2	5.2	32.4	22.2	3.3	6.8
whisper-base	21.2	12.2	5.1	3.9	27.0	18.1	2.7	6.2
whisper-small	13.7	7.0	3.6	3.2	22.9	14.7	1.7	6.5
whisper-medium	11.0	5.0	3.0	3.0	19.9	12.5	1.4	6.0
whisper-large-v1	10.4	4.6	2.7	3.1	19.9	12.7	1.6	5.6
whisper-large-v2	9.9	4.2	2.8	2.9	18.6	11.1	1.2	6.4
whisper-large-v3	9.2	3.8	2.4	2.9	19.2	11.6	1.1	6.5
whisper-turbo	9.5	4.2	2.6	2.8	19.6	12.1	0.8	6.7
cohere	22.0	5.0	6.0	11.1	29.8	9.9	1.9	18.0

cohere-ja	14.4	4.7	7.5	2.2	18.3	11.2	0.9	6.3

Common Voice (cv) and ReazonSpeech (reazon) results are omitted due to test-set contamination.

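model	fleurs cer	fleurs sr	fleurs ir	fleurs dr	jsut-basic cer	jsut-basic sr	jsut-basic ir	jsut-basic dr	bluearchive cer	bluearchive sr	bluearchive ir	bluearchive dr	nekopara cer	nekopara sr	nekopara ir	nekopara dr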
nue-asr_b1	9.4	4.5	1.4	3.5	8.9	6.0	1.9	1.0	24.7	5.3	1.4	18.1	36.0	13.8	1.9	20.3
nue-asr_b5	8.1	3.9	1.0	3.1	8.7	5.8	1.9	1.0	23.9	5.3	1.6	17.1	35.3	13.9	2.0	19.4
reazon-k2-v2_b1	9.5	4.5	1.0	4.0	6.9	4.7	1.3	0.8	10.7	4.1	1.2	5.4	26.3	10.8	1.2	14.3
reazon-k2-v2_b5	9.5	4.5	1.0	4.1	6.9	4.8	1.4	0.8	10.9	4.1	1.3	5.5	26.2	10.9	1.3	13.9
reazon-espnet-v2	5.9	3.5	1.2	1.3	7.0	5.0	1.3	0.7	9.1	4.5	1.4	3.1	26.3	12.5	1.5	12.4
reazon-nemo-v2	6.3	3.7	1.0	1.5	8.5	5.3	1.8	1.4	17.8	5.8	2.2	9.8	34.2	12.5	2.6	19.1
parakeet-ctc	6.0	2.7	1.0	2.3	6.7	4.4	1.4	1.0	9.7	5.3	2.2	2.3	26.3	12.4	2.5	11.3
parakeet-tdt	6.0	2.7	1.0	2.3	6.6	4.4	1.4	0.8	9.6	5.4	2.2	2.1	26.0	13.0	2.6	10.5

granite-4.0_b1	6.5	4.3	1.2	0.9	10.0	5.3	3.9	0.8	40.6	5.5	30.8	4.3	78.4	18.7	48.8	10.9
granite-4.0_b5	6.1	4.0	1.2	0.9	9.5	5.1	3.7	0.8	50.8	5.2	41.4	4.1	94.7	20.7	65.1	8.8
granite-4.0_b5_n5	7.4	4.5	1.2	1.6	7.8	5.3	1.6	0.8	12.1	4.8	2.4	4.8	33.6	16.7	4.3	12.6
granite-4.1_b1	6.6	4.3	1.1	1.1	7.6	5.4	1.4	0.8	24.9	5.9	15.4	3.6	84.1	15.0	54.7	14.4
granite-4.1_b5	6.1	4.1	1.0	1.0	7.5	5.3	1.4	0.8	33.6	5.5	24.3	3.8	213.2	16.6	184.0	12.6
granite-4.1_b5_n5	7.1	4.5	1.2	1.4	7.8	5.5	1.5	0.8	11.0	5.2	1.9	3.9	33.2	14.8	4.1	14.3

qwen3-asr-0.6b_b1	8.6	5.9	1.1	1.5	12.1	9.1	1.6	1.3	10.4	5.7	1.5	3.1	30.9	16.2	1.9	12.8
qwen3-asr-0.6b_b5	8.1	5.5	1.1	1.5	11.6	8.7	1.6	1.3	10.0	5.4	1.4	3.1	30.4	15.8	1.9	12.7
qwen3-asr-1.7b_b1	5.3	3.5	0.8	1.0	8.9	6.6	1.4	1.0	8.4	4.1	1.1	3.2	28.6	14.1	1.3	13.2
qwen3-asr-1.7b_b5	5.1	3.4	0.8	1.0	8.7	6.4	1.4	0.9	8.3	4.1	1.1	3.2	28.5	14.0	1.3	13.2
voxtral-480	9.4	4.4	0.8	4.3	18.8	10.5	4.8	3.4	8.8	4.0	1.1	3.7	30.4	11.5	1.4	17.5
voxtral-2400	5.5	3.7	0.6	1.2	10.2	6.5	1.7	2.1	7.9	3.6	1.0	3.3	28.4	10.9	1.3	16.1
large-v3_b1	6.6	3.5	2.1	1.0	7.2	5.2	1.0	0.9	12.2	4.2	6.5	1.5	68.9	14.8	45.8	8.4
large-v3_b5	4.7	3.0	0.7	0.9	7.2	5.2	1.1	0.9	10.8	4.2	5.1	1.5	62.3	15.4	38.8	8.2
cohere_b1	4.9	2.7	0.4	1.8	8.2	5.6	1.7	0.8	8.1	4.0	1.2	2.9	30.4	13.5	3.7	13.3
cohere_b5	4.8	2.6	0.4	1.8	8.2	5.6	1.7	0.8	8.1	3.9	1.2	3.0	29.2	12.6	2.0	14.6

cohere-ja_b1	5.2	2.8	0.4	1.9	6.9	4.9	1.1	0.8	6.8	3.9	1.5	1.4	24.9	14.0	3.2	7.7
cohere-ja_b5	5.0	2.7	0.4	1.8	6.8	4.9	1.1	0.8	6.8	3.8	1.4	1.6	24.5	13.4	2.1	9.0
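The tables split CER into substitution (sr), insertion (ir) and deletion (dr) rates, and the rows are consistent with cer = sr + ir + dr up to rounding (for example whisper-tiny on tedx: 18.9 + 7.2 + 5.2 = 31.3). Because insertions are divided by the reference length, CER can exceed 100%, as in the granite-4.1_b5 nekopara row. Below is a minimal sketch of computing such a breakdown from a character-level Levenshtein alignment. It is not the evaluation script behind these numbers: text normalization is not specified on this card and is omitted, `cer_breakdown` is a hypothetical name, and ties between equal-cost alignments are broken toward fewer substitutions, which can shift counts among sr/ir/dr relative to other tools.

```python
from typing import Dict


def cer_breakdown(ref: str, hyp: str) -> Dict[str, float]:
    """Character-level edit alignment; all rates are percentages of len(ref) (ref non-empty)."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (cost, subs, ins, dels) for the best alignment of ref[:i] with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, 0, i)  # delete every reference character
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, j, 0)  # insert every hypothesis character
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, ins, d = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, s, ins, d)  # match, no cost
            else:
                diag = (c + 1, s + 1, ins, d)  # substitution
            c, s, ins, d = dp[i][j - 1]
            left = (c + 1, s, ins + 1, d)  # insertion
            c, s, ins, d = dp[i - 1][j]
            up = (c + 1, s, ins, d + 1)  # deletion
            # Tuples compare by cost first, then prefer fewer substitutions on ties.
            dp[i][j] = min(diag, left, up)
    _, subs, ins, dels = dp[n][m]
    counts = (("cer", subs + ins + dels), ("sr", subs), ("ir", ins), ("dr", dels))
    return {name: 100.0 * v / n for name, v in counts}


print(cer_breakdown("今日は晴れ", "今日は雨"))
# → {'cer': 40.0, 'sr': 20.0, 'ir': 0.0, 'dr': 20.0}
```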

Model: efwkjn/cohere-asr-ja-v0.1
Base model: CohereLabs/cohere-transcribe-03-2026
Model size: 2B params
Tensor type: BF16 (Safetensors)