Sunflower QA · Cactus INT4

On-device Cactus quantization of Sunbird/sunbirdtutor-gemma-4-e2b, our Gemma 4 E2B speech-QA fine-tune covering English and six Ugandan languages. Built to run fully offline on mid-range Android phones inside the Sunflower educational assistant app.

A child taps the mic, asks a science question in Luganda, Acholi, Ateso, Lusoga, Lunyoro, or Runyankole, and gets an answer streaming back in the same language. One model takes speech in and streams text out, with no separate speech-recognition stage and no internet required.

What's in the bundle

This is a Cactus-format quantization, not a transformers checkpoint. The bundle is a packed binary plus tokenizer metadata, ready to load through the Cactus FFI on Android, iOS, and desktop.

Quantization recipe:

| Component | Precision |
| --- | --- |
| Decoder weights | INT4 |
| Audio tower (Gemma 4 native) | FP16, preserved |
| Vision tower | Removed |
| Embeddings / LM head | INT4 |

Gemma 4's audio tower is precision-sensitive: quantizing it collapses Luganda speech recognition. We therefore kept it at FP16, dropped the vision tower entirely, and quantized everything else to INT4. The resulting bundle is about a third smaller than the FP16 base, at ~3.8 GB on disk, with native audio understanding intact.

Languages

| ISO 639-3 | Language |
| --- | --- |
| `eng` | English |
| `lug` | Luganda |
| `xog` | Lusoga |
| `nyn` | Runyankole |
| `nyo` | Lunyoro |
| `ach` | Acholi |
| `teo` | Ateso |

For per-language quality tiers (Luganda strongest, Acholi second), see the Sunbird Tutor base model card.

Intended use

Educational Q&A for primary school science topics in Ugandan languages, with spoken-language input via Gemma 4's audio tower (16 kHz mono PCM) and text output in the user's chosen language. Designed to run fully on-device on the kind of mid-range Tecno and Infinix Android phones common in Uganda.
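Because the audio tower expects 16 kHz mono PCM, it can help to check a recording's header before handing it to the model. The sketch below is a minimal example, assuming a canonical WAV file whose `fmt ` chunk immediately follows the RIFF header (files with extra chunks in front would need a proper chunk walk). `WavFormat` and `parseWavHeader` are illustrative names, not part of Cactus or the Sunflower app.

```dart
import 'dart:typed_data';

/// Fields read from the start of a canonical RIFF/WAVE header.
class WavFormat {
  const WavFormat(
      this.audioFormat, this.channels, this.sampleRate, this.bitsPerSample);

  final int audioFormat; // 1 means linear PCM
  final int channels;
  final int sampleRate; // in Hz
  final int bitsPerSample;

  /// True for the input shape this model card describes: 16 kHz mono PCM.
  bool get isMono16kPcm =>
      audioFormat == 1 && channels == 1 && sampleRate == 16000;
}

/// Returns the parsed format, or null when [bytes] is too short or does not
/// start with RIFF / WAVE / "fmt " at the canonical offsets (assumption).
WavFormat? parseWavHeader(Uint8List bytes) {
  if (bytes.length < 36) return null;
  String tag(int offset) => String.fromCharCodes(bytes, offset, offset + 4);
  if (tag(0) != 'RIFF' || tag(8) != 'WAVE' || tag(12) != 'fmt ') return null;

  // WAV header fields are little-endian.
  final data = ByteData.sublistView(bytes);
  return WavFormat(
    data.getUint16(20, Endian.little), // audio format
    data.getUint16(22, Endian.little), // channel count
    data.getUint32(24, Endian.little), // sample rate
    data.getUint16(34, Endian.little), // bits per sample
  );
}
```

A caller could read the first bytes of the recording, run `parseWavHeader`, and resample or reject the file when `isMono16kPcm` is false.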

How to use

This checkpoint is Cactus-format only. For transformers, use the Sunbird Tutor base model.

In the Sunflower app

Open the model picker in the Sunflower app and select Speech Q&A. This bundle is the default. First-launch download is ~3.8 GB; everything after that is offline.

Direct via Cactus FFI

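// Load the packed Cactus bundle from local storage. `documentsDir` is the
// app's documents directory and must be resolved before this call.
final cactus = Cactus();
await cactus.init(
  modelPath: '$documentsDir/models/sunflower-qa-cactus-int4/model.cactus',
);

// One voice turn in the default Answer mode: the user content stays empty
// and the question is passed as a 16 kHz mono WAV file.
final response = await cactus.completion(
  messages: [
    {
      'role': 'system',
      // Use this string verbatim; it matches the base model's training prompt.
      'content':
          'You are an educational assistant that can give explanations, '
          'transcriptions and translations in Ugandan languages.',
    },
    {
      'role': 'user',
      'content': '',
      'audio_path': '/path/to/16khz_mono.wav',
    },
  ],
);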

The system prompt above is the exact string the base model was trained with. Deviating from it degrades quality, so use it verbatim.

Prompt routing per mode

| Mode | System prompt | User content |
| --- | --- | --- |
| Answer (default) | Educational assistant string above | empty (`""`) |
| Transcribe | Educational assistant string above | `"Transcribe this audio."` |
| Translate | Educational assistant string above | `"Translate this audio into {target language}."` |
| Explain | Educational assistant string above | `"Explain what was said in this audio."` |

The canonical runtime strings live in lib/model_settings_sheet.dart inside the Sunflower app.
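The routing table above can be sketched as a small helper that builds the message list for each mode. `Mode`, `userContentFor`, and `buildMessages` are hypothetical names, not taken from the Sunflower app; the message shape mirrors the completion call shown earlier. Whether the app fills `{target language}` with a language name or an ISO code is not stated, so the sketch simply interpolates whatever it is given.

```dart
/// System prompt copied verbatim from the completion example in this card.
const String systemPrompt =
    'You are an educational assistant that can give explanations, '
    'transcriptions and translations in Ugandan languages.';

/// The four modes from the routing table (enum name is illustrative).
enum Mode { answer, transcribe, translate, explain }

/// Returns the user content string for [mode], following the routing table.
String userContentFor(Mode mode, {String? targetLanguage}) {
  switch (mode) {
    case Mode.answer:
      return ''; // default mode: the audio alone carries the question
    case Mode.transcribe:
      return 'Transcribe this audio.';
    case Mode.translate:
      if (targetLanguage == null || targetLanguage.isEmpty) {
        throw ArgumentError('Translate mode needs a target language.');
      }
      return 'Translate this audio into $targetLanguage.';
    case Mode.explain:
      return 'Explain what was said in this audio.';
  }
}

/// Builds the two-message list: fixed system prompt plus the routed user turn.
List<Map<String, String>> buildMessages(
  Mode mode,
  String audioPath, {
  String? targetLanguage,
}) {
  return [
    {'role': 'system', 'content': systemPrompt},
    {
      'role': 'user',
      'content': userContentFor(mode, targetLanguage: targetLanguage),
      'audio_path': audioPath,
    },
  ];
}
```

Keeping the system prompt in one constant makes it harder for a mode-specific code path to alter it by accident.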

Performance

Measured on the Pixel 10 CPU for a five-second voice turn:

| Metric | Value |
| --- | --- |
| Time to first token | ~2 s |
| End-to-end (audio in → text answer) | ~12 s |
| Bundle size on disk | 3.8 GB |
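To reproduce numbers like these on another device, one option is to time a streamed completion. The sketch below assumes the answer arrives as a `Stream<String>` of tokens; how Cactus actually exposes streaming is not shown in this card, so the stream type, `TurnTiming`, and `timeTurn` are all assumptions for illustration.

```dart
import 'dart:async';

/// Latency numbers for one streamed completion.
class TurnTiming {
  const TurnTiming(this.firstToken, this.total, this.tokenCount);

  final Duration? firstToken; // null when the stream produced no tokens
  final Duration total;
  final int tokenCount;
}

/// Consumes [tokens] and records time to first token and total wall time.
/// The clock starts when this function is called, so call it at the moment
/// the request is issued.
Future<TurnTiming> timeTurn(Stream<String> tokens) async {
  final clock = Stopwatch()..start();
  Duration? first;
  var count = 0;
  await for (final _ in tokens) {
    first ??= clock.elapsed; // captured once, on the first token
    count++;
  }
  clock.stop();
  return TurnTiming(first, clock.elapsed, count);
}
```

Averaging several turns of similar audio length gives a steadier figure than a single run.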

Training

This is a quantization, not a retrain. For training data, methodology, and per-language evaluation numbers, see the Sunbird Tutor base model card and the training repository.

Limitations

Answer quality on novel topics tracks the base model; quantization is meant to change footprint, not behaviour. A known exception is long contexts: decoder-side INT4 introduces small drift, so for contexts past a couple of thousand tokens prefer the FP16 base. Audio inference assumes clean 16 kHz mono PCM, and robustness to heavy classroom background noise has not been formally benchmarked. Token-level repetition can occur on out-of-distribution questions; this is a known base-model characteristic, not introduced by the quant. Vision is removed, so this bundle cannot accept image input.
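One possible way to contain the token-level repetition mentioned above is a tail check on the generated tokens, stopping generation once the same short sequence has repeated several times in a row. This is a sketch under assumed thresholds, not how the Sunflower app is known to handle it; `endsInRepetition` and its default values are illustrative.

```dart
/// Returns true when the tail of [tokens] is the same [ngram]-token sequence
/// repeated back to back at least [minRepeats] times.
bool endsInRepetition(
  List<String> tokens, {
  int ngram = 4, // illustrative default, not tuned
  int minRepeats = 3, // illustrative default, not tuned
}) {
  final needed = ngram * minRepeats;
  if (ngram < 1 || minRepeats < 2 || tokens.length < needed) return false;

  // The last `needed` tokens repeat with period `ngram` exactly when every
  // token in that window equals the token `ngram` positions before it.
  final start = tokens.length - needed;
  for (var i = start + ngram; i < tokens.length; i++) {
    if (tokens[i] != tokens[i - ngram]) return false;
  }
  return true;
}
```

A caller could run this after each new token and cut the answer short when it returns true, trading a small risk of truncating legitimate repeated phrases for protection against runaway loops.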

Related artifacts

Sunbird/sunbirdtutor-gemma-4-e2b: transformers-format base model.
SunbirdAI/sunbird-tutor-modelling: training code, data pipeline, evaluation harness.
SunbirdAI/sunflower-app: Android app that ships this bundle on-device.

Acknowledgements

Built by the Sunbird AI team. Base model: Sunbird Tutor. Foundation model: Google's Gemma 4 E2B. Inference engine: Cactus.

Citation

@misc{sunflower-qa-cactus-2026,
  author = {Sunbird AI},
  title  = {Sunflower QA: Cactus INT4 quantization of Sunbird Tutor (Gemma 4 E2B Speech-QA)},
  year   = {2026},
  url    = {https://huggingface.co/Sunbird/sunflower-qa-cactus-int4}
}

Built for the Kaggle Gemma 4 Good Hackathon, 2026.
