Update README.md
README.md CHANGED
@@ -128,14 +128,14 @@ Supervised speech instruction finetuning via knowledge-distillation. For more in
 Evaluations are conducted [big bench audio](https://huggingface.co/blog/big-bench-audio-release) (audio reasoning measured in accuracy), [VoiceBench](https://github.com/MatthewCYM/VoiceBench) (overall score averaged across multiple evaluations), as well as on covost2 (speech translation measured in BLEU), and LibriSpeech (speech recognition measured in WER).
 
 ### Audio Reasoning & General Understanding
-| | **v0_7-glm w/ reasoning** | **v0_7-glm w/o reasoning** | v0_6-llama-3_3-70b | v0_6-gemma-3-27b | v0_6-qwen-3-32b | **gpt4o-audio** |
+| | **v0_7-glm-4_6 w/ reasoning** | **v0_7-glm-4_6 w/o reasoning** | v0_6-llama-3_3-70b | v0_6-gemma-3-27b | v0_6-qwen-3-32b | **gpt4o-audio** |
 | --- | ---: | ---: | ---: | ---: | ---: | ---: |
 | **big bench audio** | **97.00** | **91.80** | 85.48 | 83.84 | 84.22 | 82.80 |
 | **voicebench overall** | **90.75** | **87.05** | 81.81 | – | – | 86.75 |
 
 
-### Speech
-| | **v0_7-glm** | v0_6-llama-3_3-70b | v0_6-gemma-3-27b | v0_6-qwen-3-32b |
+### Speech Translation & Recognition
+| | **v0_7-glm-4_6** | v0_6-llama-3_3-70b | v0_6-gemma-3-27b | v0_6-qwen-3-32b |
 | --- | ---: | ---: | ---: | ---: |
 | **covost2 en_ar** | **22.89** | 18.92 | 22.68 | 16.91 |
 | **covost2 en_ca** | **41.48** | 38.73 | 39.67 | 33.63 |
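
The README context line in this hunk says speech recognition is measured in WER (word error rate). For readers unfamiliar with the metric, here is a minimal sketch of how WER is commonly computed: word-level edit distance divided by the number of reference words. The function name `word_error_rate` is hypothetical, and the sketch assumes plain whitespace tokenization with no text normalization. The evaluation code actually used for these numbers is not shown in this diff and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: (substitutions + deletions + insertions) / reference words.

    Assumption: words are split on whitespace and compared exactly
    (no lowercasing or punctuation stripping).
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref_words)
```

The function returns a fraction; multiply by 100 to express it as a percentage. Note that WER can exceed 1.0 when the hypothesis contains many inserted words.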