Sunflower QA · Cactus INT4
On-device Cactus quantization of Sunbird/sunbirdtutor-gemma-4-e2b, our Gemma 4 E2B speech-QA fine-tune covering English and six Ugandan languages. Built to run fully offline on mid-range Android phones inside the Sunflower educational assistant app.
A child taps the mic, asks a science question in Luganda, Acholi, Ateso, Lusoga, Lunyoro, or Runyankole, and gets an answer streaming back in the same language. One model, one forward pass, no internet required.
What's in the bundle
This is a Cactus-format quantization, not a transformers checkpoint. The bundle is a packed binary plus tokenizer metadata, ready to load through the Cactus FFI on Android, iOS, and desktop.
Quantization recipe:
| Component | Precision |
|---|---|
| Decoder weights | INT4 |
| Audio tower (Gemma 4 native) | FP16, preserved |
| Vision tower | Removed |
| Embeddings / LM head | INT4 |
Gemma 4's audio tower is precision-sensitive. Quantizing it down collapses Luganda speech recognition, so we kept it at FP16, dropped the vision tower entirely, and pushed everything else to four bits. The bundle is about a third smaller than the FP16 base, landing at ~3.8 GB on disk, with native audio understanding intact.
Languages
| ISO 639-3 | Language |
|---|---|
| eng | English |
| lug | Luganda |
| xog | Lusoga |
| nyn | Runyankole |
| nyo | Lunyoro |
| ach | Acholi |
| teo | Ateso |
For per-language quality tiers (Luganda strongest, Acholi second), see the Sunbird Tutor base model card.
Intended use
Educational Q&A for primary school science topics in Ugandan languages, with spoken-language input via Gemma 4's audio tower (16 kHz mono PCM) and text output in the user's chosen language. Designed to run fully on-device on the kind of mid-range Tecno and Infinix Android phones common in Uganda.
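Because the audio tower expects 16 kHz mono PCM, it can help to check a recording's header before handing it to the model. The following is a minimal sketch in plain Dart, not code from the Sunflower app or from Cactus: `isPcm16kMono` and `wavHeader` are hypothetical helper names, and the check assumes a standard RIFF/WAVE container.

```dart
import 'dart:typed_data';

/// Hypothetical helper (not part of Cactus or the Sunflower app).
/// Returns true when [bytes] is a RIFF/WAVE file whose `fmt ` chunk
/// declares uncompressed PCM, one channel, and a 16 000 Hz sample rate.
bool isPcm16kMono(Uint8List bytes) {
  if (bytes.length < 12) return false;
  final data = ByteData.sublistView(bytes);
  String tagAt(int at) => String.fromCharCodes(bytes.sublist(at, at + 4));
  if (tagAt(0) != 'RIFF' || tagAt(8) != 'WAVE') return false;

  // Walk the chunk list until the `fmt ` chunk is found.
  var offset = 12;
  while (offset + 8 <= bytes.length) {
    final id = tagAt(offset);
    final size = data.getUint32(offset + 4, Endian.little);
    if (id == 'fmt ') {
      if (size < 16 || offset + 24 > bytes.length) return false;
      final format = data.getUint16(offset + 8, Endian.little); // 1 = PCM
      final channels = data.getUint16(offset + 10, Endian.little);
      final sampleRate = data.getUint32(offset + 12, Endian.little);
      return format == 1 && channels == 1 && sampleRate == 16000;
    }
    // Chunk bodies are padded to an even number of bytes.
    offset += 8 + size + (size & 1);
  }
  return false;
}

/// Hypothetical helper: builds a 44-byte header for an empty 16-bit PCM
/// file, used here only to exercise the check.
Uint8List wavHeader({int channels = 1, int sampleRate = 16000}) {
  final h = ByteData(44);
  void tag(int at, String s) {
    for (var i = 0; i < 4; i++) {
      h.setUint8(at + i, s.codeUnitAt(i));
    }
  }

  const bytesPerSample = 2;
  tag(0, 'RIFF');
  h.setUint32(4, 36, Endian.little); // RIFF size with no sample data
  tag(8, 'WAVE');
  tag(12, 'fmt ');
  h.setUint32(16, 16, Endian.little); // fmt chunk size
  h.setUint16(20, 1, Endian.little); // format 1 = PCM
  h.setUint16(22, channels, Endian.little);
  h.setUint32(24, sampleRate, Endian.little);
  h.setUint32(28, sampleRate * channels * bytesPerSample, Endian.little);
  h.setUint16(32, channels * bytesPerSample, Endian.little);
  h.setUint16(34, 16, Endian.little); // bits per sample
  tag(36, 'data');
  h.setUint32(40, 0, Endian.little); // no samples
  return h.buffer.asUint8List();
}

void main() {
  print(isPcm16kMono(wavHeader())); // prints true
  print(isPcm16kMono(wavHeader(channels: 2, sampleRate: 44100))); // prints false
}
```

A check like this only validates the declared format. It says nothing about noise or clipping in the recording itself.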
How to use
This checkpoint is Cactus-format only. For transformers, use the Sunbird Tutor base model.
In the Sunflower app
Open the model picker in the Sunflower app and select Speech Q&A. This bundle is the default. First-launch download is ~3.8 GB; everything after that is offline.
Direct via Cactus FFI
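    // `documentsDir` is assumed to hold the app's documents directory.
    final cactus = Cactus();
    await cactus.init(
      modelPath: '$documentsDir/models/sunflower-qa-cactus-int4/model.cactus',
    );

    final response = await cactus.completion(
      messages: [
        {
          'role': 'system',
          // Training-time system prompt; keep it verbatim.
          'content':
              'You are an educational assistant that can give explanations, '
              'transcriptions and translations in Ugandan languages.',
        },
        {
          'role': 'user',
          // Empty user content selects the default Answer mode.
          'content': '',
          // 16 kHz mono PCM recording of the spoken question.
          'audio_path': '/path/to/16khz_mono.wav',
        },
      ],
    );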
The system prompt above is the exact string the base model was trained with. Any deviation from it degrades quality, so use it verbatim.
Prompt routing per mode
| Mode | System prompt | User content |
|---|---|---|
| Answer (default) | Educational assistant string above | empty ("") |
| Transcribe | Educational assistant string above | "Transcribe this audio." |
| Translate | Educational assistant string above | "Translate this audio into {target language}." |
| Explain | Educational assistant string above | "Explain what was said in this audio." |
The canonical runtime strings live in lib/model_settings_sheet.dart inside the Sunflower app.
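The routing table above can be sketched as a small message builder. This is an illustration only: `Mode`, `userContent`, and `buildMessages` are hypothetical names, not identifiers from the Sunflower app. Only the prompt strings and the `role` / `content` / `audio_path` keys come from this card.

```dart
/// Hypothetical enum mirroring the four modes in the routing table.
enum Mode { answer, transcribe, translate, explain }

/// System prompt copied verbatim from this card.
const systemPrompt =
    'You are an educational assistant that can give explanations, '
    'transcriptions and translations in Ugandan languages.';

/// Returns the user-content string for [mode].
/// [targetLanguage] is required only for [Mode.translate].
String userContent(Mode mode, {String? targetLanguage}) {
  switch (mode) {
    case Mode.answer:
      return '';
    case Mode.transcribe:
      return 'Transcribe this audio.';
    case Mode.translate:
      if (targetLanguage == null || targetLanguage.isEmpty) {
        throw ArgumentError('Translate mode needs a target language.');
      }
      return 'Translate this audio into $targetLanguage.';
    case Mode.explain:
      return 'Explain what was said in this audio.';
  }
}

/// Builds the two-message list in the shape used by the completion
/// example on this card: a fixed system prompt plus one audio turn.
List<Map<String, String>> buildMessages(
  Mode mode,
  String audioPath, {
  String? targetLanguage,
}) =>
    [
      {'role': 'system', 'content': systemPrompt},
      {
        'role': 'user',
        'content': userContent(mode, targetLanguage: targetLanguage),
        'audio_path': audioPath,
      },
    ];

void main() {
  final messages = buildMessages(
    Mode.translate,
    '/path/to/16khz_mono.wav',
    targetLanguage: 'Luganda',
  );
  print(messages[1]['content']); // prints Translate this audio into Luganda.
}
```

Keeping the system prompt as a single constant makes it harder for the string to drift between modes.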
Performance
Measured on a Pixel 10 CPU with a five-second voice turn:
| Metric | Value |
|---|---|
| Time to first token | ~2 s |
| End-to-end (audio in → text answer) | ~12 s |
| Bundle size on disk | 3.8 GB |
Training
This is a quantization, not a retrain. For training data, methodology, and per-language evaluation numbers, see the Sunbird Tutor base model card and the training repository.
Limitations
- Answer quality on novel topics tracks the base model. Quantization shrinks the footprint and is not intended to change behaviour.
- That said, decoder-side INT4 introduces small drift on long contexts. For contexts past a couple of thousand tokens, prefer the FP16 base.
- Audio inference assumes clean 16 kHz mono PCM. Robustness to heavy classroom background noise has not been formally benchmarked.
- Token-level repetition can occur on out-of-distribution questions. This is a known base-model characteristic, not something introduced by the quantization.
- The vision tower is removed, so this bundle cannot accept image input.
Related artifacts
- Sunbird/sunbirdtutor-gemma-4-e2b: transformers-format base model.
- SunbirdAI/sunbird-tutor-modelling: training code, data pipeline, evaluation harness.
- SunbirdAI/sunflower-app: Android app that ships this bundle on-device.
Acknowledgements
Built by the Sunbird AI team. Base model: Sunbird Tutor. Foundation model: Google's Gemma 4 E2B. Inference engine: Cactus.
Citation
    @misc{sunflower-qa-cactus-2026,
      author = {Sunbird AI},
      title  = {Sunflower QA: Cactus INT4 quantization of Sunbird Tutor (Gemma 4 E2B Speech-QA)},
      year   = {2026},
      url    = {https://huggingface.co/Sunbird/sunflower-qa-cactus-int4}
    }
Built for the Kaggle Gemma 4 Good Hackathon, 2026.