---
license: apache-2.0
models: 
- CohereLabs/cohere-transcribe-03-2026
---

# Cohere Transcribe 03-2026 (ONNX)

ONNX export of [CohereLabs/cohere-transcribe-03-2026](https://huggingface.co/CohereLabs/cohere-transcribe-03-2026) for inference without PyTorch.

## Architecture

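The model is split into three logical stages (encoder, cross-attention K/V projection, decoder), stored as six self-contained `.onnx` files. The encoder is sharded into four files that run in sequence: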

| File | Role | Size |
| ---- | ---- | ---- |
| `encoder-0.onnx` | Conv subsampling + positional encoding + conformer layers 0-8 | ~1.3 GB |
| `encoder-1.onnx` | Conformer layers 9-16 | ~1.3 GB |
| `encoder-2.onnx` | Conformer layers 17-24 | ~1.3 GB |
| `encoder-3.onnx` | Conformer layers 25-32 + encoder-decoder projection | ~1.3 GB |
| `cross_kv.onnx` | Project encoder output to cross-attention K/V for all 8 decoder layers | ~72 MB |
| `decoder.onnx` | Autoregressive transformer decoder with KV cache | ~580 MB |

**Inference pipeline:** mel features → `encoder-0` through `encoder-3` (chained in order) → `cross_kv` → `decoder` (autoregressive loop).
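
The control flow of that pipeline can be sketched as follows. This is a minimal illustration, not the code in `transcribe.py`: the function name `run_pipeline`, its parameters, and the toy stand-in stages are all hypothetical, and greedy decoding is assumed. In a real run each callable would wrap a call into the corresponding `.onnx` file, whose actual input and output tensor names are not documented here.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Hypothetical signature: (previous token, cross K/V, cache) -> (next token, new cache)
DecoderStep = Callable[[int, Any, Optional[Any]], Tuple[int, Any]]


def run_pipeline(
    mel: Any,
    encoder_stages: Sequence[Callable[[Any], Any]],
    cross_kv: Callable[[Any], Any],
    decoder_step: DecoderStep,
    bos_id: int,
    eos_id: int,
    max_tokens: int = 256,
) -> List[int]:
    """Chain the encoder splits, project K/V once, then decode greedily."""
    # 1. Each encoder split consumes the previous split's output.
    hidden = mel
    for stage in encoder_stages:
        hidden = stage(hidden)

    # 2. Cross-attention K/V is computed once per utterance and reused every step.
    kv = cross_kv(hidden)

    # 3. Autoregressive loop: `cache` carries the decoder state between steps.
    tokens = [bos_id]
    cache = None
    for _ in range(max_tokens):
        next_id, cache = decoder_step(tokens[-1], kv, cache)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the start token


# Toy stand-ins (assumptions for illustration only, not real model stages).
toy_stages = [lambda x: x + 1 for _ in range(4)]  # four chained "encoder" splits
toy_cross_kv = lambda h: h * 10


def toy_decoder_step(prev_id, kv, cache):
    step = 0 if cache is None else cache
    next_id = kv + step if step < 3 else -1  # emit three tokens, then EOS (-1)
    return next_id, step + 1


result = run_pipeline(0, toy_stages, toy_cross_kv, toy_decoder_step, bos_id=0, eos_id=-1)
print(result)
```

The point of the split is visible in the structure: the encoder and `cross_kv` run once per audio input, while only the decoder step repeats per generated token.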

## Setup

```bash
pip install onnx onnxruntime torch transformers librosa soundfile sentencepiece datasets torchcodec
```

## Export

```bash
python export_onnx.py
```

## Transcribe

```bash
python transcribe.py                # download random en/es demo samples and transcribe
python transcribe.py audio.wav      # transcribe a single file
python transcribe.py audio_dir/     # transcribe the files in a directory
python transcribe.py audio.wav es   # transcribe with language code "es"
```

Output includes a per-file RTF (real-time factor): processing time divided by audio duration. An RTF below 1.0 means transcription runs faster than real time.
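
As a worked example of the RTF arithmetic, the sketch below uses the standard definition (processing time over audio duration). The helper name `real_time_factor` is hypothetical and is not taken from `transcribe.py`.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# A 10-second clip transcribed in 2 seconds runs at five times real-time speed.
rtf = real_time_factor(processing_seconds=2.0, audio_seconds=10.0)
print(f"RTF = {rtf:.2f}")  # RTF = 0.20
```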