fixie-ai/ultravox-v0_7-glm-4_6


patricklifixie committed on Dec 8, 2025

Commit 92a7961 · verified · 1 Parent(s): 50735b6

Update README.md

Files changed (1)

README.md +10 -1


@@ -102,7 +102,7 @@ Supervised speech instruction finetuning via knowledge-distillation. For more in
 ## Evaluation
-Evaluations are conducted [big bench audio](https://huggingface.co/blog/big-bench-audio-release) (audio reasoning measured in accuracy), [VoiceBench](https://github.com/MatthewCYM/VoiceBench) (overall score averaged across multiple evaluations), as well as on covost2 (speech translation measured in BLEU), and LibriSpeech (speech recognition measured in WER).
+Evaluations are conducted [big bench audio](https://huggingface.co/blog/big-bench-audio-release) (audio reasoning measured in accuracy) [^1], [VoiceBench](https://github.com/MatthewCYM/VoiceBench) (overall score averaged across multiple evaluations) [^2], as well as on covost2 (speech translation measured in BLEU), and LibriSpeech (speech recognition measured in WER).
 ### Audio Reasoning & General Understanding
 | | **v0_7-glm-4_6 w/ reasoning** | **v0_7-glm-4_6 w/o reasoning** | v0_6-llama-3_3-70b | v0_6-gemma-3-27b | v0_6-qwen-3-32b | **gpt4o-audio** |
@@ -121,3 +121,12 @@ Evaluations are conducted [big bench audio](https://huggingface.co/blog/big-benc
 | **covost2 ru_en** | **50.30** | 43.73 | 49.29 | 47.08 |
 | **covost2 zh_en** | **23.85** | 17.81 | 20.88 | 22.24 |
 | **librispeech** | **2.28** | 2.55 | 2.73 | 2.88 |
+[^1]: VoiceBench prompt used:
+You are a helpful assistant. When answering questions:
+For multiple choice questions (A/B/C/D options):
+- End your response with "The answer is [X]" where X is the letter (A, B, C, or D)
+[^2]: BigBenchAudio prompt used:
+You are a helpful assistant. Answer the question at the end of your response.
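
The multiple-choice prompt added in this commit asks the model to end with "The answer is [X]". A minimal sketch of how a scorer might pull that letter out of a response follows. The helper name `extract_choice` and the regular expression are assumptions made for illustration; they are not taken from VoiceBench or from the Ultravox evaluation code.

```python
import re
from typing import Optional

# Hypothetical pattern: matches "The answer is B" or "The answer is [B]".
# The trailing \b stops words such as "absolutely" from matching as "A".
_ANSWER_RE = re.compile(r"The answer is\s*\[?([ABCD])\b", re.IGNORECASE)


def extract_choice(response: str) -> Optional[str]:
    """Return the last announced letter (A-D) in upper case, or None if absent."""
    matches = _ANSWER_RE.findall(response)
    return matches[-1].upper() if matches else None
```

Taking the last match rather than the first is a design choice: a model that reasons aloud may mention a candidate answer before settling on a final one at the end of its response.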