---
license: gemma
language:
- en
- lg
- xog
- nyn
- nyo
- ach
- teo
tags:
- audio
- multimodal
- gemma
- gemma-4
- cactus
- on-device
- mobile
- ugandan-languages
- low-resource-languages
- speech-qa
library_name: cactus
pipeline_tag: audio-text-to-text
datasets:
- google/WaxalNLP
- Sunbird/salt
- google/fleurs
metrics:
- chrf
- bleu
- wer
- cer
base_model:
- Sunbird/sunbirdtutor-gemma-4-e2b
---

# Sunflower QA · Cactus INT4

On-device Cactus quantization of [Sunbird/sunbirdtutor-gemma-4-e2b](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b), our Gemma 4 E2B speech-QA fine-tune covering English and six Ugandan languages. Built to run fully offline on mid-range Android phones inside the [Sunflower educational assistant app](https://github.com/SunbirdAI/sunflower-app).

A child taps the mic, asks a science question in Luganda, Acholi, Ateso, Lusoga, Lunyoro, or Runyankole, and gets an answer streaming back in the same language. One model takes speech in and produces text out, with no separate speech-recognition stage and no internet required.

## What's in the bundle

This is a Cactus-format quantization, not a transformers checkpoint. The bundle is a packed binary plus tokenizer metadata, ready to load through the [Cactus](https://github.com/cactus-compute/cactus) FFI on Android, iOS, and desktop.

Quantization recipe:

| Component | Precision |
|---|---|
| Decoder weights | INT4 |
| Audio tower (Gemma 4 native) | FP16, preserved |
| Vision tower | Removed |
| Embeddings / LM head | INT4 |

Gemma 4's audio tower is precision-sensitive. Quantizing it down collapses Luganda speech recognition, so we kept it at FP16, dropped the vision tower entirely, and pushed everything else to four bits. The bundle is about a third smaller than the FP16 base, landing at **~3.8 GB on disk**, with native audio understanding intact.

## Languages

| ISO 639-3 | Language |
|---|---|
| `eng` | English |
| `lug` | Luganda |
| `xog` | Lusoga |
| `nyn` | Runyankole |
| `nyo` | Lunyoro |
| `ach` | Acholi |
| `teo` | Ateso |

For per-language quality tiers (Luganda strongest, Acholi second), see the [Sunbird Tutor base model card](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b).

## Intended use

Educational Q&A for primary school science topics in Ugandan languages, with spoken-language input via Gemma 4's audio tower (16 kHz mono PCM) and text output in the user's chosen language. Designed to run fully on-device on the kind of mid-range Tecno and Infinix Android phones common in Uganda.
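Because the audio tower expects 16 kHz mono PCM, an app may want to reject other recordings before handing them to the model. The following is a minimal sketch of such a check in plain Dart, assuming the recording is a RIFF/WAVE file; `WavFormat` and `readWavFormat` are hypothetical helper names and are not part of Cactus or the Sunflower app.

```dart
import 'dart:typed_data';

/// Format fields read from a WAV "fmt " chunk (hypothetical helper type).
class WavFormat {
  final int audioFormat; // 1 means integer PCM
  final int channels;
  final int sampleRate;
  final int bitsPerSample;

  const WavFormat(
      this.audioFormat, this.channels, this.sampleRate, this.bitsPerSample);

  /// True when the file matches the input format described in this card.
  bool get is16kMonoPcm =>
      audioFormat == 1 && channels == 1 && sampleRate == 16000;
}

/// Walks the RIFF chunks until it finds "fmt " and reads the format fields.
/// Returns null if the bytes are not RIFF/WAVE or contain no usable fmt chunk.
WavFormat? readWavFormat(Uint8List bytes) {
  if (bytes.length < 12) return null;

  String tag(int offset) =>
      String.fromCharCodes(bytes.sublist(offset, offset + 4));
  if (tag(0) != 'RIFF' || tag(8) != 'WAVE') return null;

  final data = ByteData.sublistView(bytes);
  var offset = 12; // first chunk starts after "RIFF", size, "WAVE"
  while (offset + 8 <= bytes.length) {
    final id = tag(offset);
    final size = data.getUint32(offset + 4, Endian.little);
    if (id == 'fmt ') {
      final body = offset + 8;
      if (size < 16 || body + 16 > bytes.length) return null;
      return WavFormat(
        data.getUint16(body, Endian.little), // audio format
        data.getUint16(body + 2, Endian.little), // channel count
        data.getUint32(body + 4, Endian.little), // sample rate in Hz
        data.getUint16(body + 14, Endian.little), // bits per sample
      );
    }
    // Chunk bodies are padded to an even number of bytes.
    offset += 8 + size + (size & 1);
  }
  return null;
}
```

A caller could pass `File(path).readAsBytesSync()` from `dart:io` and skip inference when the result is null or `is16kMonoPcm` is false. The check only inspects the header; it says nothing about background noise or clipping.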

## How to use

This checkpoint is Cactus-format only. For transformers, use the [Sunbird Tutor base model](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b).

### In the Sunflower app

Open the model picker in the Sunflower app and select **Speech Q&A**. This bundle is the default. First-launch download is ~3.8 GB; everything after that is offline.

### Direct via Cactus FFI

```dart
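// `documentsDir` is assumed to hold the path of the app's documents
// directory, where the bundle was downloaded.
final cactus = Cactus();
await cactus.init(
  modelPath: '$documentsDir/models/sunflower-qa-cactus-int4/model.cactus',
);

final response = await cactus.completion(
  messages: [
    {
      'role': 'system',
      // Training-time system prompt; keep it verbatim.
      'content':
          'You are an educational assistant that can give explanations, '
          'transcriptions and translations in Ugandan languages.',
    },
    {
      'role': 'user',
      // Empty user content selects the default Answer mode.
      'content': '',
      // The audio file must be 16 kHz mono PCM.
      'audio_path': '/path/to/16khz_mono.wav',
    },
  ],
);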
```

The system prompt above is the exact string the base model was trained with. Any deviation from it degrades quality, so use it verbatim.

### Prompt routing per mode

| Mode             | System prompt                      | User content                                     |
| ---------------- | ---------------------------------- | ------------------------------------------------ |
| Answer (default) | Educational assistant string above | empty (`""`)                                     |
| Transcribe       | Educational assistant string above | `"Transcribe this audio."`                       |
| Translate        | Educational assistant string above | `"Translate this audio into {target language}."` |
| Explain          | Educational assistant string above | `"Explain what was said in this audio."`         |

The canonical runtime strings live in [`lib/model_settings_sheet.dart`](https://github.com/SunbirdAI/sunflower-app/blob/master/lib/model_settings_sheet.dart) inside the Sunflower app.
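As an illustration of the routing table, the sketch below builds the `messages` list for each mode in the same map shape as the FFI example. `QaMode`, `userContentFor`, and `buildMessages` are hypothetical names introduced here, and passing a plain language name such as `Luganda` for `{target language}` is an assumption; the app's own source remains the authority on the exact strings.

```dart
/// Training-time system prompt, copied from the FFI example in this card.
const systemPrompt =
    'You are an educational assistant that can give explanations, '
    'transcriptions and translations in Ugandan languages.';

/// The four modes from the routing table (hypothetical enum).
enum QaMode { answer, transcribe, translate, explain }

/// Returns the user-content string for [mode].
/// [targetLanguage] is required for translate and ignored otherwise.
String userContentFor(QaMode mode, {String? targetLanguage}) {
  if (mode == QaMode.translate) {
    if (targetLanguage == null || targetLanguage.isEmpty) {
      throw ArgumentError('translate mode needs a target language');
    }
    return 'Translate this audio into $targetLanguage.';
  }
  const fixed = {
    QaMode.answer: '', // default mode: the audio alone is the question
    QaMode.transcribe: 'Transcribe this audio.',
    QaMode.explain: 'Explain what was said in this audio.',
  };
  return fixed[mode]!;
}

/// Builds the two-message list: fixed system prompt plus the audio turn.
List<Map<String, String>> buildMessages(
  QaMode mode,
  String audioPath, {
  String? targetLanguage,
}) {
  return [
    {'role': 'system', 'content': systemPrompt},
    {
      'role': 'user',
      'content': userContentFor(mode, targetLanguage: targetLanguage),
      'audio_path': audioPath,
    },
  ];
}
```

Keeping the system prompt in one constant means every mode sends the same verbatim string, and only the user content varies.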

## Performance

Measured on the CPU of a Pixel 10, for a five-second voice turn:

| Metric                              | Value  |
| ----------------------------------- | ------ |
| Time to first token                 | ~2 s   |
| End-to-end (audio in → text answer) | ~12 s  |
| Bundle size on disk                 | 3.8 GB |
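
To reproduce the two latency figures on another device, one approach is to time a streamed response. This is a rough sketch that assumes the answer is available as a `Stream<String>` of tokens; how that stream is obtained from Cactus is not shown in this card, and `TurnTiming` and `timeTurn` are hypothetical names.

```dart
/// Timings for one voice turn (hypothetical helper type).
class TurnTiming {
  /// Elapsed time when the first token arrived; null if none arrived.
  final Duration? timeToFirstToken;

  /// Elapsed time when the stream closed.
  final Duration total;

  /// Number of tokens received.
  final int tokenCount;

  const TurnTiming(this.timeToFirstToken, this.total, this.tokenCount);
}

/// Consumes [tokens], recording when the first token arrives and when the
/// stream ends. The clock starts when this function is called, so create the
/// request right before calling it.
Future<TurnTiming> timeTurn(Stream<String> tokens) async {
  final clock = Stopwatch()..start();
  Duration? first;
  var count = 0;
  await for (final _ in tokens) {
    first ??= clock.elapsed; // only set on the first token
    count++;
  }
  clock.stop();
  return TurnTiming(first, clock.elapsed, count);
}
```

Numbers from a single run are noisy, so averaging several turns of similar length gives a fairer comparison with the table above.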

## Training

This is a quantization, not a retrain. For training data, methodology, and per-language evaluation numbers, see the [Sunbird Tutor base model card](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b) and the [training repository](https://github.com/SunbirdAI/sunbird-tutor-modelling).

## Limitations

Quantization is meant to change footprint, not behaviour, so answer quality on novel topics tracks the base model. Known exceptions and caveats:

- Decoder-side INT4 introduces small drift on long contexts. For contexts past a couple of thousand tokens, prefer the FP16 base.
- Audio inference assumes clean 16 kHz mono PCM. Robustness to heavy classroom background noise has not been formally benchmarked.
- Token-level repetition can occur on out-of-distribution questions. This is a known base-model characteristic, not introduced by the quant.
- Vision is removed, so this bundle cannot accept image input.

## Related artifacts

- [Sunbird/sunbirdtutor-gemma-4-e2b](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b): transformers-format base model.
- [SunbirdAI/sunbird-tutor-modelling](https://github.com/SunbirdAI/sunbird-tutor-modelling): training code, data pipeline, evaluation harness.
- [SunbirdAI/sunflower-app](https://github.com/SunbirdAI/sunflower-app): Android app that ships this bundle on-device.

## Acknowledgements

Built by the Sunbird AI team. Base model: [Sunbird Tutor](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b). Foundation model: Google's Gemma 4 E2B. Inference engine: [Cactus](https://github.com/cactus-compute/cactus).

## Citation

```bibtex
@misc{sunflower-qa-cactus-2026,
  author = {Sunbird AI},
  title  = {Sunflower QA: Cactus INT4 quantization of Sunbird Tutor (Gemma 4 E2B Speech-QA)},
  year   = {2026},
  url    = {https://huggingface.co/Sunbird/sunflower-qa-cactus-int4}
}
```

Built for the [Kaggle Gemma 4 Good Hackathon](https://www.kaggle.com/), 2026.