---
language:
- ml
license: apache-2.0
tags:
- whisper
- whisper.cpp
- ggml
- quantized
- malayalam
- asr
- speech-recognition
- on-device
base_model: thennal/whisper-medium-ml
model-index:
- name: Whisper Medium Malayalam GGML
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 11.0
      type: mozilla-foundation/common_voice_11_0
      config: ml
      split: test
    metrics:
    - type: wer
      value: 11.49
      name: WER (with normalization)
    - type: wer
      value: 38.62
      name: WER (without normalization)
    - type: cer
      value: 7.33
      name: CER
---

# Whisper Medium Malayalam - GGML Format

This is a GGML-converted version of [thennal/whisper-medium-ml](https://huggingface.co/thennal/whisper-medium-ml) optimized for use with [whisper.cpp](https://github.com/ggml-org/whisper.cpp).

**Key Features:**
- 🚀 Multiple quantized versions (Q4, Q5, Q8) for different use cases
- 📱 Optimized for on-device, offline inference
- ⚡ Up to ~70% smaller than the F16 file with quantization (1.4 GB → 424 MB)
- 🎯 Malayalam language specialization
- 💻 Cross-platform support (CPU, Metal, CUDA, etc.)

## Model Details

- **Base Model:** [thennal/whisper-medium-ml](https://huggingface.co/thennal/whisper-medium-ml), a fine-tune of OpenAI Whisper Medium
- **Language:** Malayalam
- **Task:** Automatic Speech Recognition (ASR)
- **Format:** GGML (converted from PyTorch)
- **Source:** Fine-tuned on Common Voice 11.0 and other Malayalam speech datasets (see Training Data)

## Available Model Variants

This repository provides the F16 conversion and three quantized versions for different use cases:

| Model | Size | Use Case | Quality |
|-------|------|----------|---------|
| `ggml-model.bin` | 1.4 GB | Original conversion (F16) | Highest quality |
| `ggml-model-q8_0.bin` | 785 MB | High quality, smaller size | Very high quality |
| `ggml-model-q5_0.bin` | 514 MB | **Recommended** - Balanced quality/size | Good quality |
| `ggml-model-q4_0.bin` | 424 MB | Smallest size, faster inference | Acceptable quality |

**Recommendation:** For most users, `ggml-model-q5_0.bin` offers the best balance between quality and file size.
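To compare the variants, the file sizes in the table can be turned into relative savings against the F16 file. The sketch below is only illustrative: `pct_smaller` is a hypothetical helper, 1.4 GB is assumed to mean 1400 MB, and the results are approximate because the table sizes are rounded.

```bash
# pct_smaller ORIGINAL_MB QUANTIZED_MB
# Prints how much smaller the quantized file is, as a whole percentage.
# (Hypothetical helper; sizes are the rounded values from the table above.)
pct_smaller() {
  awk -v orig="$1" -v quant="$2" 'BEGIN { printf "%.0f\n", (1 - quant / orig) * 100 }'
}

F16_MB=1400  # assumption: 1.4 GB taken as 1400 MB

# Each entry is "<variant>:<size in MB>", taken from the table.
for entry in q8_0:785 q5_0:514 q4_0:424; do
  name=${entry%%:*}
  size_mb=${entry##*:}
  echo "$name: ~$(pct_smaller "$F16_MB" "$size_mb")% smaller than F16"
done
```

With the sizes listed above, this works out to roughly 44%, 63%, and 70% for Q8_0, Q5_0, and Q4_0 respectively.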

## Performance (from source model)

- **Word Error Rate (WER):** 38.62% (without normalization)
- **Character Error Rate (CER):** 7.33%
- **WER with normalization:** 11.49%

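Note: These figures are reported for the source model on the Common Voice 11.0 test split. Whisper's text normalization has significant issues with Malayalam, so the normalized WER should be treated with caution.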

## Usage with whisper.cpp

### Prerequisites

```bash
git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
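# Configure and build; binaries such as whisper-cli are written to build/bin/
cmake -B build
cmake --build build -j --config Release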
```

### Download the model

Download one of the model files from this repository and place it in the `models` directory of whisper.cpp:

- **Recommended:** `ggml-model-q5_0.bin` (514 MB)
- **Smallest:** `ggml-model-q4_0.bin` (424 MB)
- **Highest-quality quantized:** `ggml-model-q8_0.bin` (785 MB)
- **Original (F16, highest quality):** `ggml-model.bin` (1.4 GB)
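If you prefer the command line, a download can be sketched as follows. This assumes `curl` is installed and relies on the Hugging Face `resolve/main` URL pattern; `model_url` and `download_model` are hypothetical helper names, and the repository ID is a placeholder you must replace with this repository's actual ID.

```bash
# model_url REPO_ID FILENAME
# Builds the direct-download URL for a file on the main branch of a
# Hugging Face model repository.
model_url() {
  printf 'https://huggingface.co/%s/resolve/main/%s\n' "$1" "$2"
}

# download_model REPO_ID FILENAME [DEST_DIR]
# Fetches the file into DEST_DIR (default: ./models), following redirects
# and failing on HTTP errors instead of saving an error page.
download_model() {
  dest_dir=${3:-models}
  mkdir -p "$dest_dir" &&
    curl -L --fail -o "$dest_dir/$2" "$(model_url "$1" "$2")"
}

# Usage, run from the whisper.cpp directory (placeholder repository ID):
#   download_model "<user>/<repo>" ggml-model-q5_0.bin
```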

### Run inference

```bash
# Using the recommended Q5_0 model
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml

# Or using any other variant
./build/bin/whisper-cli -m models/ggml-model-q4_0.bin -f audio.wav -l ml
```

Where:
- `-m` specifies the model file
- `-f` specifies the input audio file (16-bit WAV; the whisper.cpp README converts other formats to 16 kHz mono with `ffmpeg` first)
- `-l ml` sets the language to Malayalam
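If your audio is not already a 16-bit WAV file, it can be converted first. The sketch below assumes `ffmpeg` is installed and mirrors the conversion settings suggested in the whisper.cpp README (16 kHz, mono, 16-bit PCM); `to_wav16k` is a hypothetical helper name.

```bash
# to_wav16k INPUT OUTPUT
# Converts any ffmpeg-readable audio file to 16 kHz, mono, 16-bit PCM WAV.
to_wav16k() {
  ffmpeg -i "$1" -ar 16000 -ac 1 -c:a pcm_s16le "$2"
}

# Usage (file names are illustrative):
#   to_wav16k recording.mp3 audio.wav
```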

### Additional options

```bash
# Translate to English
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -tr

# Output in different formats
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -osrt  # SubRip subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -ovtt  # WebVTT subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -otxt  # Plain text
```
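To process several recordings at once, the same command can be wrapped in a loop. This is a minimal sketch rather than part of whisper.cpp: `transcribe_dir` is a hypothetical helper, and the `WHISPER_CLI` and `MODEL` defaults assume you run it from the whisper.cpp directory with the Q5_0 model in `models/`.

```bash
# Paths are assumptions; adjust them to your checkout and chosen variant.
WHISPER_CLI=${WHISPER_CLI:-./build/bin/whisper-cli}
MODEL=${MODEL:-models/ggml-model-q5_0.bin}

# transcribe_dir DIR
# Runs whisper-cli with SubRip output on every .wav file in DIR.
transcribe_dir() {
  for wav in "$1"/*.wav; do
    [ -e "$wav" ] || continue   # skip the literal pattern when nothing matches
    "$WHISPER_CLI" -m "$MODEL" -f "$wav" -l ml -osrt
  done
}

# Usage (directory name is illustrative):
#   transcribe_dir recordings
```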

## Conversion Details

This model was converted from the HuggingFace transformers format to GGML using the `convert-h5-to-ggml.py` script from whisper.cpp.
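The exact invocation used for this repository is not recorded here, but a conversion along these lines follows the fine-tuned model instructions in whisper.cpp's `models/README.md`. The script takes the Hugging Face model directory, a checkout of the OpenAI Whisper repository, and an output directory. `convert_to_ggml` is a hypothetical wrapper, and all paths are illustrative.

```bash
# convert_to_ggml HF_MODEL_DIR OPENAI_WHISPER_DIR OUT_DIR
# Runs whisper.cpp's converter, which writes ggml-model.bin into OUT_DIR.
# Assumes the current directory is the whisper.cpp checkout.
convert_to_ggml() {
  python3 models/convert-h5-to-ggml.py "$1" "$2" "$3"
}

# Usage, assuming both checkouts sit next to whisper.cpp
# (cloning the model requires git-lfs):
#   git clone https://github.com/openai/whisper ../whisper
#   git clone https://huggingface.co/thennal/whisper-medium-ml ../whisper-medium-ml
#   convert_to_ggml ../whisper-medium-ml ../whisper models
```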

### Quantization Details

The quantized models were created using whisper.cpp's quantization tool. Size reductions below are relative to the F16 file, computed from the sizes listed under Available Model Variants:

- **Q8_0**: 8-bit quantization, closest to F16 quality (~44% smaller)
- **Q5_0**: 5-bit quantization, balanced quality and size (~63% smaller)
- **Q4_0**: 4-bit quantization, maximum compression (~70% smaller)

All quantized models maintain the full model architecture and can be used as drop-in replacements.
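As a rough sketch, the three variants could be regenerated from the F16 file with whisper.cpp's quantize tool, which takes an input model, an output path, and a quantization type. The binary location `./build/bin/quantize` is an assumption that may differ between whisper.cpp versions, and `quantize_all` is a hypothetical helper.

```bash
# Assumption: the quantize tool was built alongside whisper-cli.
QUANTIZE=${QUANTIZE:-./build/bin/quantize}

# quantize_all F16_MODEL
# Writes <name>-q8_0.bin, <name>-q5_0.bin and <name>-q4_0.bin next to the input,
# stopping at the first failure.
quantize_all() {
  base=${1%.bin}
  for qtype in q8_0 q5_0 q4_0; do
    "$QUANTIZE" "$1" "$base-$qtype.bin" "$qtype" || return 1
  done
}

# Usage:
#   quantize_all models/ggml-model.bin
```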

## Training Data

The source model was fine-tuned on multiple Malayalam speech datasets:
- [Mozilla Common Voice 11.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) (ml)
- [Google FLEURS](https://huggingface.co/datasets/google/fleurs)
- [IMaSC](https://huggingface.co/datasets/thennal/IMaSC)
- [ULCA Malayalam](https://huggingface.co/datasets/thennal/ulca_ml)
- [MSC](https://huggingface.co/datasets/thennal/msc)
- [Indic TTS Malayalam](https://huggingface.co/datasets/thennal/indic_tts_ml)

## Citation

If you use this model, please cite:

```bibtex
@misc{whisper-medium-ml-ggml,
  author = {Thennal D K},
  title = {Whisper Medium Malayalam - GGML Format},
  year = {2024},
  publisher = {HuggingFace},
  journal = {HuggingFace Model Hub},
  howpublished = {\url{https://huggingface.co/thennal/whisper-medium-ml}},
  note = {GGML conversion with quantization}
}

@misc{radford2022whisper,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
  year={2022},
  eprint={2212.04356},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}
```

## License

Apache 2.0 - Same as the original Whisper model and fine-tuned version.

## Acknowledgments

This model builds upon the work of many contributors:

### Original Model & Framework
- **OpenAI Whisper Team** - For the groundbreaking Whisper ASR model ([paper](https://arxiv.org/abs/2212.04356), [code](https://github.com/openai/whisper))
- **Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever** - Whisper authors

### Malayalam Fine-tuning
- **[Thennal D K](https://huggingface.co/thennal)** - For fine-tuning Whisper Medium on Malayalam datasets and making it available on HuggingFace
- **Original model:** [thennal/whisper-medium-ml](https://huggingface.co/thennal/whisper-medium-ml)
- Training resources: [Fine-tuning Colab](https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/fine_tune_whisper.ipynb)

### Datasets
- **Mozilla Foundation** - Common Voice 11.0 Malayalam dataset
- **Google** - FLEURS multilingual dataset
- **Community contributors** - IMaSC, ULCA, MSC, and Indic TTS Malayalam datasets

### GGML Implementation
- **[whisper.cpp team](https://github.com/ggml-org/whisper.cpp)** - For the efficient C/C++ implementation and GGML format
- **[ggml-org](https://github.com/ggml-org)** - For the GGML machine learning library

### Tools & Frameworks
- **HuggingFace Transformers** - Model training and inference framework
- **PyTorch** - Deep learning framework

---

**Special Thanks:** This conversion makes the Malayalam Whisper model accessible for on-device, offline inference on various platforms including mobile devices, embedded systems, and resource-constrained environments.