---
license: apache-2.0
models:
- CohereLabs/cohere-transcribe-03-2026
---

# Cohere Transcribe 03-2026 (ONNX)

ONNX export of [CohereLabs/cohere-transcribe-03-2026](https://huggingface.co/CohereLabs/cohere-transcribe-03-2026) for inference without PyTorch.

## Architecture

The model is split into three logical stages: encoder, cross-attention K/V projection, and decoder. The encoder is further divided into four files, so the export consists of six self-contained `.onnx` files:

| File | Role | Size |
| ---- | ---- | ---- |
| `encoder-0.onnx` | Conv subsampling + positional encoding + conformer layers 0-8 | ~1.3 GB |
| `encoder-1.onnx` | Conformer layers 9-16 | ~1.3 GB |
| `encoder-2.onnx` | Conformer layers 17-24 | ~1.3 GB |
| `encoder-3.onnx` | Conformer layers 25-32 + encoder-decoder projection | ~1.3 GB |
| `cross_kv.onnx` | Project encoder output to cross-attention K/V for all 8 decoder layers | ~72 MB |
| `decoder.onnx` | Autoregressive transformer decoder with KV cache | ~580 MB |

**Inference pipeline:** mel features → encoder splits (chained) → cross_kv → decoder (autoregressive loop).

## Setup

```bash
pip install onnx onnxruntime torch transformers librosa soundfile sentencepiece datasets torchcodec
```

## Export

```bash
python export_onnx.py
```

## Transcribe

```bash
python transcribe.py               # download random en/es demo samples and transcribe
python transcribe.py audio.wav
python transcribe.py audio_dir/
python transcribe.py audio.wav es  # language code
```

Output includes the per-file real-time factor (RTF), the ratio of processing time to audio duration. An RTF below 1.0 means transcription runs faster than real time.
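
To reproduce an RTF figure outside `transcribe.py`, a minimal sketch could look like the following. It assumes the usual definition of RTF as wall-clock processing time divided by audio duration. The helper names `real_time_factor` and `timed`, and the stand-in `fake_transcribe`, are hypothetical and not part of this repository, and `transcribe.py` may measure timing differently.

```python
import time


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (assumed RTF definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-in for a real transcription call.
    def fake_transcribe(path):
        return "hello world"

    text, elapsed = timed(fake_transcribe, "audio.wav")
    audio_seconds = 10.0  # assumed clip length, e.g. num_samples / sample_rate
    rtf = real_time_factor(elapsed, audio_seconds)
    print(f"{text!r} RTF={rtf:.4f}")
```

For example, a 10-second clip processed in 2 seconds gives an RTF of 0.2, which is five times faster than real time.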