Sunflower QA · Cactus INT4

On-device Cactus quantization of Sunbird/sunbirdtutor-gemma-4-e2b, our Gemma 4 E2B speech-QA fine-tune covering English and six Ugandan languages. Built to run fully offline on mid-range Android phones inside the Sunflower educational assistant app.

A child taps the mic, asks a science question in Luganda, Acholi, Ateso, Lusoga, Lunyoro, or Runyankole, and gets an answer streamed back in the same language. One model takes speech in and produces text out; no internet required.

What's in the bundle

This is a Cactus-format quantization, not a transformers checkpoint. The bundle is a packed binary plus tokenizer metadata, ready to load through the Cactus FFI on Android, iOS, and desktop.

Quantization recipe:

| Component | Precision |
| --- | --- |
| Decoder weights | INT4 |
| Audio tower (Gemma 4 native) | FP16, preserved |
| Vision tower | Removed |
| Embeddings / LM head | INT4 |

Gemma 4's audio tower is precision-sensitive. Quantizing it down collapses Luganda speech recognition, so we kept it at FP16, dropped the vision tower entirely, and pushed everything else to four bits. The bundle is about a third smaller than the FP16 base, landing at ~3.8 GB on disk, with native audio understanding intact.

Languages

| ISO 639-3 | Language |
| --- | --- |
| eng | English |
| lug | Luganda |
| xog | Lusoga |
| nyn | Runyankole |
| nyo | Lunyoro |
| ach | Acholi |
| teo | Ateso |

For per-language quality tiers (Luganda strongest, Acholi second), see the Sunbird Tutor base model card.
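
The language table can be mirrored in app code as a small lookup, for example to validate a language choice before a request is built. This is a minimal sketch: the codes and names are copied from the table, while `supportedLanguages` and `languageName` are hypothetical names for illustration, not identifiers from the Sunflower app.

```dart
// ISO 639-3 codes and names copied from the language table above.
const Map<String, String> supportedLanguages = {
  'eng': 'English',
  'lug': 'Luganda',
  'xog': 'Lusoga',
  'nyn': 'Runyankole',
  'nyo': 'Lunyoro',
  'ach': 'Acholi',
  'teo': 'Ateso',
};

/// Returns the language name for [code], or null when the bundle does not
/// list that language. Hypothetical helper, not part of the Sunflower app.
String? languageName(String code) => supportedLanguages[code.toLowerCase()];
```

A null result is a signal to fall back to a language picker instead of sending the request.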

Intended use

Educational Q&A for primary school science topics in Ugandan languages, with spoken-language input via Gemma 4's audio tower (16 kHz mono PCM) and text output in the user's chosen language. Designed to run fully on-device on the kind of mid-range Tecno and Infinix Android phones common in Uganda.
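
Because the audio tower expects 16 kHz mono PCM, it can help to check a WAV file's header before handing it to the model. The sketch below parses the standard RIFF/WAVE `fmt ` chunk using only `dart:typed_data`. `WavInfo`, `readWavInfo`, and `isModelReady` are hypothetical names for illustration; they are not part of Cactus or the Sunflower app, and the check covers integer PCM only.

```dart
import 'dart:typed_data';

/// Fields read from a WAV "fmt " chunk. Hypothetical type for illustration.
class WavInfo {
  final int audioFormat; // 1 means integer PCM
  final int channels;
  final int sampleRate;
  final int bitsPerSample;
  const WavInfo(
      this.audioFormat, this.channels, this.sampleRate, this.bitsPerSample);

  /// True when the file matches the 16 kHz mono PCM input described above.
  bool get isModelReady =>
      audioFormat == 1 && channels == 1 && sampleRate == 16000;
}

/// Parses the RIFF/WAVE header in [bytes] and returns the fmt chunk fields,
/// or null if the data is not a well-formed WAV file.
WavInfo? readWavInfo(Uint8List bytes) {
  if (bytes.length < 12) return null;
  final data = ByteData.sublistView(bytes);
  String tag(int offset) =>
      String.fromCharCodes(bytes.sublist(offset, offset + 4));
  if (tag(0) != 'RIFF' || tag(8) != 'WAVE') return null;

  // Walk the chunk list until the "fmt " chunk is found.
  var offset = 12;
  while (offset + 8 <= bytes.length) {
    final id = tag(offset);
    final size = data.getUint32(offset + 4, Endian.little);
    if (id == 'fmt ') {
      if (size < 16 || offset + 8 + 16 > bytes.length) return null;
      final body = offset + 8;
      return WavInfo(
        data.getUint16(body, Endian.little), // audio format
        data.getUint16(body + 2, Endian.little), // channel count
        data.getUint32(body + 4, Endian.little), // sample rate
        data.getUint16(body + 14, Endian.little), // bits per sample
      );
    }
    // Chunks are padded to an even number of bytes.
    offset += 8 + size + (size & 1);
  }
  return null;
}
```

A recording that fails `isModelReady` would need resampling or downmixing before inference; that step is outside the scope of this sketch.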

How to use

This checkpoint is Cactus-format only. For transformers, use the Sunbird Tutor base model.

In the Sunflower app

Open the model picker in the Sunflower app and select Speech Q&A. This bundle is the default. First-launch download is ~3.8 GB; everything after that is offline.

Direct via Cactus FFI

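// Run inside an async function. `documentsDir` is assumed to hold the
// app's documents directory, resolved by the caller.
final cactus = Cactus();
await cactus.init(
  modelPath: '$documentsDir/models/sunflower-qa-cactus-int4/model.cactus',
);

final response = await cactus.completion(
  messages: [
    {
      'role': 'system',
      // Training-time system prompt; keep it verbatim.
      'content':
          'You are an educational assistant that can give explanations, '
          'transcriptions and translations in Ugandan languages.',
    },
    {
      'role': 'user',
      // Empty user content selects the default Answer mode.
      'content': '',
      // Recording must be 16 kHz mono PCM.
      'audio_path': '/path/to/16khz_mono.wav',
    },
  ],
);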

The system prompt above is the exact string the base model was trained with. Deviating from it degrades quality, so use it verbatim.

Prompt routing per mode

| Mode | System prompt | User content |
| --- | --- | --- |
| Answer (default) | Educational assistant string above | empty ("") |
| Transcribe | Educational assistant string above | "Transcribe this audio." |
| Translate | Educational assistant string above | "Translate this audio into {target language}." |
| Explain | Educational assistant string above | "Explain what was said in this audio." |

The canonical runtime strings live in lib/model_settings_sheet.dart inside the Sunflower app.
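
As an illustration of the routing table, the sketch below builds the message list for each mode in the same shape as the FFI example. The prompt strings are copied from this card; `QaMode`, `userContentFor`, and `buildMessages` are hypothetical names and are not taken from lib/model_settings_sheet.dart. It also assumes that `{target language}` is filled with a language name such as "Luganda", which this card does not confirm.

```dart
/// System prompt copied verbatim from the FFI example on this card.
const String systemPrompt =
    'You are an educational assistant that can give explanations, '
    'transcriptions and translations in Ugandan languages.';

/// The four modes from the routing table.
enum QaMode { answer, transcribe, translate, explain }

/// Returns the user content string for [mode].
/// [targetLanguage] is only used in translate mode.
String userContentFor(QaMode mode, {String? targetLanguage}) {
  switch (mode) {
    case QaMode.answer:
      return '';
    case QaMode.transcribe:
      return 'Transcribe this audio.';
    case QaMode.translate:
      if (targetLanguage == null || targetLanguage.isEmpty) {
        throw ArgumentError('translate mode needs a target language');
      }
      return 'Translate this audio into $targetLanguage.';
    case QaMode.explain:
      return 'Explain what was said in this audio.';
  }
}

/// Builds the two-message list (system, then user with audio) for [mode].
List<Map<String, String>> buildMessages(
  QaMode mode,
  String audioPath, {
  String? targetLanguage,
}) =>
    [
      {'role': 'system', 'content': systemPrompt},
      {
        'role': 'user',
        'content': userContentFor(mode, targetLanguage: targetLanguage),
        'audio_path': audioPath,
      },
    ];
```

Keeping the strings in one place makes it harder for a call site to drift from the training-time prompt.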

Performance

Measured on Pixel 10 CPU, five-second voice turn:

| Metric | Value |
| --- | --- |
| Time to first token | ~2 s |
| End-to-end (audio in → text answer) | ~12 s |
| Bundle size on disk | 3.8 GB |

Training

This is a quantization, not a retrain. For training data, methodology, and per-language evaluation numbers, see the Sunbird Tutor base model card and the training repository.

Limitations

Answer quality on novel topics tracks the base model; quantization is meant to shrink the footprint, not change behaviour. A known exception is long contexts: decoder-side INT4 introduces small drift, so for contexts past a couple of thousand tokens prefer the FP16 base. Audio inference assumes clean 16 kHz mono PCM; robustness to heavy classroom background noise has not been formally benchmarked. Token-level repetition can occur on out-of-distribution questions. This is a known base-model characteristic, not something the quantization introduced. Vision is removed, so this bundle cannot accept image input.

Acknowledgements

Built by the Sunbird AI team. Base model: Sunbird Tutor. Foundation model: Google's Gemma 4 E2B. Inference engine: Cactus.

Citation

@misc{sunflower-qa-cactus-2026,
  author = {Sunbird AI},
  title  = {Sunflower QA: Cactus INT4 quantization of Sunbird Tutor (Gemma 4 E2B Speech-QA)},
  year   = {2026},
  url    = {https://huggingface.co/Sunbird/sunflower-qa-cactus-int4}
}

Built for the Kaggle Gemma 4 Good Hackathon, 2026.
