Update README.md
README.md
CHANGED
@@ -86,7 +86,8 @@ We benchmark MERaLiON-2 series models with extended [AudioBench benchmark](https
 
 **Better Automatic Speech Recognition (ASR) Accuracy**
 
-MERaLiON-2-10B-ASR and MERaLiON-2-10B demonstrate leading performance in Singlish, Mandarin, Malay, Tamil, and other Southeast Asian languages, while maintaining competitive results in English compared to `Whisper-large-v3`. The following table shows the average transcription `Word Error Rate` by language for the MERaLiON family and other leading AudioLLMs. The `Private Dataset` includes a collection of Singapore's locally accented speeches with code-switch.
+MERaLiON-2-10B-ASR and MERaLiON-2-10B demonstrate leading performance in Singlish, Mandarin, Malay, Tamil, and other Southeast Asian languages, while maintaining competitive results in English compared to `Whisper-large-v3`. The following table shows the average transcription `Word Error Rate` by language for the MERaLiON family and other leading AudioLLMs. The `Private Dataset` includes a collection of Singapore's locally accented speeches with code-switch.
+Please visit [AudioBench benchmark](https://huggingface.co/spaces/MERaLiON/AudioBench-Leaderboard) for dataset-level evaluation results.
 
 <style type="text/css">
 #T_0910c th {
@@ -265,6 +266,7 @@ MERaLiON-2-10B-ASR and MERaLiON-2-10B demonstrate leading performance in Singlis
 **Better Instruction Following and Audio Understanding**
 
 **MERaLiON-2-10B** exhibits substantial advancements in speech and audio understanding, as well as paralinguistic tasks. Notably, it adeptly handles complex instructions and responds with enhanced flexibility, effectively preserving the pre-trained knowledge from Gemma during the audio fine-tuning process. This capability enables MERaLiON-2-10B to provide detailed explanations regarding speech content and the speaker's emotional state. Furthermore, with appropriate prompt adjustments, the model can assume various roles, such as a voice assistant, virtual caregiver, or an integral component of sophisticated multi-agent AI systems and software solutions.
+Please visit [AudioBench benchmark](https://huggingface.co/spaces/MERaLiON/AudioBench-Leaderboard) for dataset-level evaluation results.
 
 <style type="text/css">
 #T_b6ba8 th {
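
The changed ASR paragraph reports results as `Word Error Rate`. For reference, the metric is conventionally the word-level edit distance between a reference transcript and a model transcript, divided by the number of reference words. The sketch below illustrates that definition only. It assumes plain whitespace tokenization with no text normalization, and the function name `word_error_rate` is hypothetical; the scoring code actually used for the AudioBench numbers is not part of this diff and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count.

    Assumption: whitespace tokenization, no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, the value can exceed 1.0 when the hypothesis contains many inserted words.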