---
language:
- ml
license: apache-2.0
tags:
- whisper
- whisper.cpp
- ggml
- quantized
- malayalam
- asr
- speech-recognition
- on-device
base_model: thennal/whisper-medium-ml
model-index:
- name: Whisper Medium Malayalam GGML
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 11.0
      type: mozilla-foundation/common_voice_11_0
      config: ml
      split: test
    metrics:
    - type: wer
      value: 11.49
      name: WER (with normalization)
    - type: wer
      value: 38.62
      name: WER (without normalization)
    - type: cer
      value: 7.33
      name: CER
---

# Whisper Medium Malayalam - GGML Format

This is a GGML-converted version of [thennal/whisper-medium-ml](https://huggingface.co/thennal/whisper-medium-ml), optimized for use with [whisper.cpp](https://github.com/ggml-org/whisper.cpp).

**Key Features:**
- 🚀 Multiple quantized versions (Q4, Q5, Q8) for different use cases
- 📱 Optimized for on-device, offline inference
- ⚡ Up to ~70% size reduction with quantization (1.4 GB F16 down to 424 MB Q4_0)
- 🎯 Malayalam language specialization
- 💻 Cross-platform support (CPU, Metal, CUDA, etc.)

## Model Details

- **Base Model:** OpenAI Whisper Medium
- **Language:** Malayalam
- **Task:** Automatic Speech Recognition (ASR)
- **Format:** GGML (converted from PyTorch)
- **Source:** Fine-tuned on Common Voice 11.0 and other Malayalam speech datasets (see Training Data)

## Available Model Variants

This repository provides multiple quantized versions optimized for different use cases:

| Model | Size | Use Case | Quality |
|-------|------|----------|---------|
| `ggml-model.bin` | 1.4 GB | Original conversion (F16) | Highest quality |
| `ggml-model-q8_0.bin` | 785 MB | High quality, smaller size | Very high quality |
| `ggml-model-q5_0.bin` | 514 MB | **Recommended** - Balanced quality/size | Good quality |
| `ggml-model-q4_0.bin` | 424 MB | Smallest size, faster inference | Acceptable quality |

**Recommendation:** For most users, `ggml-model-q5_0.bin` offers the best balance between quality and file size.

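
The size savings implied by the table can be checked with a few lines of shell arithmetic. This is only a sketch: it treats the 1.4 GB F16 file as 1400 MB (an assumption, since the exact byte counts are not listed here), and `reduction_pct` is a hypothetical helper name introduced for illustration.

```bash
#!/usr/bin/env bash
# Percentage size reduction of a quantized file relative to the F16 baseline.
# Sizes are the rounded values from the table, in MB.
# Assumption: the 1.4 GB F16 file is treated as 1400 MB.
reduction_pct() {
  local base_mb=$1 quant_mb=$2
  # Integer arithmetic, so the result is truncated to a whole percent.
  echo $(( (base_mb - quant_mb) * 100 / base_mb ))
}

for entry in q8_0:785 q5_0:514 q4_0:424; do
  name=${entry%%:*}      # text before the colon: quantization type
  size_mb=${entry##*:}   # text after the colon: file size in MB
  echo "${name}: $(reduction_pct 1400 "$size_mb")% smaller than F16"
done
```

Because the inputs are rounded table values, the printed percentages are approximate and should be read as estimates rather than measurements.
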
## Performance (from source model)

- **Word Error Rate (WER):** 38.62% (without normalization)
- **Character Error Rate (CER):** 7.33%
- **WER with normalization:** 11.49%

Note: Whisper's normalization has significant issues with the Malayalam language.

## Usage with whisper.cpp

### Prerequisites

```bash
git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
make
```

### Download the model

Download one of the model files from this repository and place it in the `models` directory of whisper.cpp:

- **Recommended:** `ggml-model-q5_0.bin` (514 MB)
- **Smallest:** `ggml-model-q4_0.bin` (424 MB)
- **Highest quality:** `ggml-model-q8_0.bin` (785 MB)
- **Original:** `ggml-model.bin` (1.4 GB)

### Run inference

```bash
# Using the recommended Q5_0 model
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml

# Or using any other variant
./build/bin/whisper-cli -m models/ggml-model-q4_0.bin -f audio.wav -l ml
```

Where:
- `-m` specifies the model file
- `-f` specifies the input audio file (must be 16-bit WAV)
- `-l ml` sets the language to Malayalam

### Additional options

```bash
# Translate to English
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -tr

# Output in different formats
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -osrt  # SubRip subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -ovtt  # WebVTT subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -otxt  # Plain text
```

## Conversion Details

This model was converted from the HuggingFace transformers format to GGML using the `convert-h5-to-ggml.py` script from whisper.cpp.

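
A conversion along these lines can be sketched as follows. This is an outline based on the conversion script's general usage, not a record of the exact commands used for this repository. The working-directory layout, the function names `convert_to_ggml` and `quantize_variants`, and the `build/bin/quantize` path are assumptions for illustration and may differ between whisper.cpp versions.

```bash
#!/usr/bin/env bash
# Sketch only: directory names and function names are illustrative assumptions.
# Requires git, git-lfs, Python 3 with torch, transformers and numpy,
# and (for the quantization step) a built copy of whisper.cpp.

convert_to_ggml() {
  # The conversion script reads tokenizer and mel-filter assets
  # from a checkout of the OpenAI whisper repository.
  git clone https://github.com/openai/whisper
  git clone https://github.com/ggml-org/whisper.cpp
  # Fine-tuned source model (large download; needs git-lfs).
  git clone https://huggingface.co/thennal/whisper-medium-ml

  # Arguments: <HF model dir> <openai/whisper checkout> <output dir>
  # Writes ggml-model.bin (F16) into the output directory.
  python3 ./whisper.cpp/models/convert-h5-to-ggml.py ./whisper-medium-ml/ ./whisper .
}

quantize_variants() {
  # Assumes whisper.cpp has been built so that build/bin/quantize exists.
  local q
  for q in q8_0 q5_0 q4_0; do
    ./whisper.cpp/build/bin/quantize ggml-model.bin "ggml-model-${q}.bin" "$q"
  done
}

# Uncomment to run the full pipeline:
# convert_to_ggml && quantize_variants
```

The two steps are wrapped in functions so the file can be sourced and inspected without triggering the multi-gigabyte downloads. The quantization step corresponds to the variants described in the next subsection.
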
### Quantization Details

The quantized models were created using whisper.cpp's quantization tool. Size reductions below are relative to the 1.4 GB F16 file:

- **Q8_0**: 8-bit quantization, quality closest to the F16 original (785 MB, ~44% smaller)
- **Q5_0**: 5-bit quantization, a good quality/size balance (514 MB, ~63% smaller)
- **Q4_0**: 4-bit quantization, maximum compression (424 MB, ~70% smaller)

All quantized models maintain the full model architecture and can be used as drop-in replacements.

## Training Data

The source model was fine-tuned on multiple Malayalam speech datasets:

- [Mozilla Common Voice 11.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) (ml)
- [Google FLEURS](https://huggingface.co/datasets/google/fleurs)
- [IMaSC](https://huggingface.co/datasets/thennal/IMaSC)
- [ULCA Malayalam](https://huggingface.co/datasets/thennal/ulca_ml)
- [MSC](https://huggingface.co/datasets/thennal/msc)
- [Indic TTS Malayalam](https://huggingface.co/datasets/thennal/indic_tts_ml)

## Citation

If you use this model, please cite:

```bibtex
@misc{whisper-medium-ml-ggml,
  author = {Thennal D K},
  title = {Whisper Medium Malayalam - GGML Format},
  year = {2024},
  publisher = {HuggingFace},
  journal = {HuggingFace Model Hub},
  howpublished = {\url{https://huggingface.co/thennal/whisper-medium-ml}},
  note = {GGML conversion with quantization}
}

@misc{radford2022whisper,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
  year={2022},
  eprint={2212.04356},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}
```

## License

Apache 2.0, the same license as the fine-tuned source model. The original OpenAI Whisper code and weights are released under the MIT license.

## Acknowledgments

This model builds upon the work of many contributors:

### Original Model & Framework

- **OpenAI Whisper Team** - For the groundbreaking Whisper ASR model ([paper](https://arxiv.org/abs/2212.04356), [code](https://github.com/openai/whisper))
- **Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever** - Whisper authors

### Malayalam Fine-tuning

- **[Thennal D K](https://huggingface.co/thennal)** - For fine-tuning Whisper Medium on Malayalam datasets and making it available on HuggingFace
- **Original model:** [thennal/whisper-medium-ml](https://huggingface.co/thennal/whisper-medium-ml)
- Training resources: [Fine-tuning Colab](https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/fine_tune_whisper.ipynb)

### Datasets

- **Mozilla Foundation** - Common Voice 11.0 Malayalam dataset
- **Google** - FLEURS multilingual dataset
- **Community contributors** - IMaSC, ULCA, MSC, and Indic TTS Malayalam datasets

### GGML Implementation

- **[whisper.cpp team](https://github.com/ggml-org/whisper.cpp)** - For the efficient C/C++ implementation and GGML format
- **[ggml-org](https://github.com/ggml-org)** - For the GGML machine learning library

### Tools & Frameworks

- **HuggingFace Transformers** - Model training and inference framework
- **PyTorch** - Deep learning framework

---

**Special Thanks:** This conversion makes the Malayalam Whisper model accessible for on-device, offline inference on various platforms including mobile devices, embedded systems, and resource-constrained environments.