---
license: gemma
language:
- en
- lg
- xog
- nyn
- nyo
- ach
- teo
tags:
- audio
- multimodal
- gemma
- gemma-4
- cactus
- on-device
- mobile
- ugandan-languages
- low-resource-languages
- speech-qa
library_name: cactus
pipeline_tag: audio-text-to-text
datasets:
- google/WaxalNLP
- Sunbird/salt
- google/fleurs
metrics:
- chrf
- bleu
- wer
- cer
base_model:
- Sunbird/sunbirdtutor-gemma-4-e2b
---

# Sunflower QA · Cactus INT4

On-device Cactus quantization of [Sunbird/sunbirdtutor-gemma-4-e2b](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b), our Gemma 4 E2B speech-QA fine-tune covering English and six Ugandan languages. Built to run fully offline on mid-range Android phones inside the [Sunflower educational assistant app](https://github.com/SunbirdAI/sunflower-app).

A child taps the mic, asks a science question in Luganda, Acholi, Ateso, Lusoga, Lunyoro, or Runyankole, and gets an answer streaming back in the same language. One model handles both listening and answering, with no separate speech-recognition stage and no internet required.

## What's in the bundle

This is a Cactus-format quantization, not a transformers checkpoint. The bundle is a packed binary plus tokenizer metadata, ready to load through the [Cactus](https://github.com/cactus-compute/cactus) FFI on Android, iOS, and desktop.

Quantization recipe:

| Component | Precision |
|---|---|
| Decoder weights | INT4 |
| Audio tower (Gemma 4 native) | FP16, preserved |
| Vision tower | Removed |
| Embeddings / LM head | INT4 |

Gemma 4's audio tower is precision-sensitive: quantizing it collapses Luganda speech recognition. So we kept it at FP16, dropped the vision tower entirely, and pushed everything else to four bits. The bundle is about a third smaller than the FP16 base, landing at **~3.8 GB on disk**, with native audio understanding intact.
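Why is keeping the audio tower at FP16 the expensive part of the recipe? A back-of-envelope estimate of per-tensor storage shows it. This is a minimal sketch under stated assumptions. It assumes weights are packed at a fixed bit width and that INT4 tensors carry one FP16 scale per group of 32 weights. The group size and scale layout are hypothetical and may not match Cactus's actual packed format, and `estimateTensorBytes` is an illustrative helper, not a Cactus API.

```dart
/// Rough storage estimate, in bytes, for one weight tensor.
///
/// Assumptions (hypothetical, not Cactus's documented layout):
/// - weights are packed at [bits] per value, rounded up to whole bytes;
/// - when [groupSize] is given, each group of that many weights stores
///   one FP16 (2-byte) scale alongside the packed values.
int estimateTensorBytes(int params, int bits, {int? groupSize}) {
  // Packed payload: params * bits, rounded up to a whole byte.
  final packed = (params * bits + 7) ~/ 8;
  if (groupSize == null) return packed;
  // Ceiling division: a partial final group still needs its own scale.
  final groups = (params + groupSize - 1) ~/ groupSize;
  return packed + groups * 2;
}
```

For example, `estimateTensorBytes(1000000, 16)` gives 2,000,000 bytes, while `estimateTensorBytes(1000000, 4, groupSize: 32)` gives 562,500. Under these assumptions, each FP16 audio-tower parameter costs about 3.6 times as much storage as an INT4 decoder parameter.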
## Languages

| ISO 639-3 | Language |
|---|---|
| `eng` | English |
| `lug` | Luganda |
| `xog` | Lusoga |
| `nyn` | Runyankole |
| `nyo` | Lunyoro |
| `ach` | Acholi |
| `teo` | Ateso |

For per-language quality tiers (Luganda strongest, Acholi second), see the [Sunbird Tutor base model card](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b).

## Intended use

Educational Q&A on primary-school science topics in Ugandan languages. Spoken input goes through Gemma 4's audio tower (16 kHz mono PCM), and the text output comes back in the user's chosen language. Designed to run fully on-device on the kind of mid-range Tecno and Infinix Android phones common in Uganda.

## How to use

This checkpoint is Cactus-format only. For transformers, use the [Sunbird Tutor base model](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b).

### In the Sunflower app

Open the model picker in the Sunflower app and select **Speech Q&A**. This bundle is the default. The first-launch download is ~3.8 GB, and everything after that runs offline.

### Direct via Cactus FFI

```dart
final cactus = Cactus();

// documentsDir: the app's documents directory, resolved by the host app.
await cactus.init(
  modelPath: '$documentsDir/models/sunflower-qa-cactus-int4/model.cactus',
);

final response = await cactus.completion(
  messages: [
    {
      'role': 'system',
      // Must match the training-time system prompt exactly (see note below).
      'content': 'You are an educational assistant that can give explanations, '
          'transcriptions and translations in Ugandan languages.',
    },
    {
      'role': 'user',
      // Empty user content selects the default Answer mode.
      'content': '',
      'audio_path': '/path/to/16khz_mono.wav',
    },
  ],
);
```

The system prompt above is the exact string the base model was trained with. Any drift from it degrades quality, so use it verbatim.
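The Dart call above covers Answer mode. According to the prompt-routing table in the next section, the other modes keep the same system prompt and change only the user turn. Below is a minimal sketch of that mapping, written with Dart 3 switch expressions. `QaMode`, `userContentFor`, and `buildMessages` are hypothetical helper names. The canonical strings remain the ones in the Sunflower app's `lib/model_settings_sheet.dart`.

```dart
/// Modes from the prompt-routing table (hypothetical helper enum).
enum QaMode { answer, transcribe, translate, explain }

/// Training-time system prompt, shared by every mode.
const systemPrompt =
    'You are an educational assistant that can give explanations, '
    'transcriptions and translations in Ugandan languages.';

/// User-turn text for each mode, copied from the routing table.
String userContentFor(QaMode mode, {String? targetLanguage}) {
  if (mode == QaMode.translate && targetLanguage == null) {
    throw ArgumentError('Translate mode needs a target language.');
  }
  return switch (mode) {
    QaMode.answer => '',
    QaMode.transcribe => 'Transcribe this audio.',
    QaMode.translate => 'Translate this audio into $targetLanguage.',
    QaMode.explain => 'Explain what was said in this audio.',
  };
}

/// Builds a messages list in the same shape as the completion call above.
List<Map<String, String>> buildMessages(QaMode mode, String audioPath,
    {String? targetLanguage}) {
  return [
    {'role': 'system', 'content': systemPrompt},
    {
      'role': 'user',
      'content': userContentFor(mode, targetLanguage: targetLanguage),
      'audio_path': audioPath,
    },
  ];
}
```

For example, `buildMessages(QaMode.translate, wavPath, targetLanguage: 'Luganda')` produces a list with the same structure as the `messages:` argument shown above. This assumes `completion` accepts a `List<Map<String, String>>`. Keeping the system prompt in a single constant makes accidental drift less likely.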
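The model expects 16 kHz mono PCM at `audio_path`, but this card does not say whether the Cactus runtime resamples other formats. A cheap guard is to read the WAV header before calling `completion`. This is a minimal sketch that uses only `dart:typed_data`. `WavFormat` and `readWavFormat` are hypothetical helpers, not part of Cactus, and requiring 16-bit samples is an assumption on top of what the card states.

```dart
import 'dart:typed_data';

/// Fields read from a WAV file's `fmt ` chunk.
class WavFormat {
  final int audioFormat; // 1 = integer PCM
  final int channels;
  final int sampleRate;
  final int bitsPerSample;
  const WavFormat(
      this.audioFormat, this.channels, this.sampleRate, this.bitsPerSample);

  /// 16 kHz mono PCM; the 16-bit requirement is an assumption.
  bool get isSixteenKhzMonoPcm =>
      audioFormat == 1 &&
      channels == 1 &&
      sampleRate == 16000 &&
      bitsPerSample == 16;
}

/// Reads a four-character chunk identifier such as 'RIFF' or 'fmt '.
String _fourCC(ByteData data, int offset) => String.fromCharCodes(
    [for (var i = 0; i < 4; i++) data.getUint8(offset + i)]);

/// Walks the RIFF chunk list and returns the `fmt ` fields,
/// or null if [bytes] is not a well-formed WAV header.
WavFormat? readWavFormat(Uint8List bytes) {
  if (bytes.length < 12) return null;
  final data = ByteData.sublistView(bytes);
  if (_fourCC(data, 0) != 'RIFF' || _fourCC(data, 8) != 'WAVE') return null;
  var offset = 12; // first chunk follows the 12-byte RIFF header
  while (offset + 8 <= bytes.length) {
    final id = _fourCC(data, offset);
    final size = data.getUint32(offset + 4, Endian.little);
    final body = offset + 8;
    if (id == 'fmt ' && size >= 16 && body + 16 <= bytes.length) {
      return WavFormat(
        data.getUint16(body, Endian.little), // audio format
        data.getUint16(body + 2, Endian.little), // channels
        data.getUint32(body + 4, Endian.little), // sample rate
        data.getUint16(body + 14, Endian.little), // bits per sample
      );
    }
    offset = body + size + (size & 1); // chunks are padded to even length
  }
  return null;
}
```

In an app, you could read the file with `File(path).readAsBytesSync()` from `dart:io`, then convert or reject any recording where `isSixteenKhzMonoPcm` is false before handing its path to the model.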
### Prompt routing per mode

| Mode | System prompt | User content |
| ---------------- | ---------------------------------- | ------------------------------------------------ |
| Answer (default) | Educational assistant string above | empty (`""`) |
| Transcribe | Educational assistant string above | `"Transcribe this audio."` |
| Translate | Educational assistant string above | `"Translate this audio into {target language}."` |
| Explain | Educational assistant string above | `"Explain what was said in this audio."` |

The canonical runtime strings live in [`lib/model_settings_sheet.dart`](https://github.com/SunbirdAI/sunflower-app/blob/master/lib/model_settings_sheet.dart) inside the Sunflower app.

## Performance

Measured on a Pixel 10 (CPU inference) with a five-second voice turn:

| Metric | Value |
| ----------------------------------- | ------ |
| Time to first token | ~2 s |
| End-to-end (audio in → text answer) | ~12 s |
| Bundle size on disk | 3.8 GB |

## Training

This is a quantization, not a retrain. For training data, methodology, and per-language evaluation numbers, see the [Sunbird Tutor base model card](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b) and the [training repository](https://github.com/SunbirdAI/sunbird-tutor-modelling).

## Limitations

Answer quality on novel topics tracks the base model: the quantization is meant to shrink the footprint, not change behaviour. The known exception is that decoder-side INT4 introduces small drift on long contexts, so for contexts beyond a couple of thousand tokens, prefer the FP16 base. Audio inference assumes clean 16 kHz mono PCM, and robustness to heavy classroom background noise has not been formally benchmarked. Token-level repetition can occur on out-of-distribution questions; this is a known characteristic of the base model, not something introduced by quantization. The vision tower is removed, so this bundle cannot accept image input.

## Related artifacts

- [Sunbird/sunbirdtutor-gemma-4-e2b](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b): transformers-format base model.
- [SunbirdAI/sunbird-tutor-modelling](https://github.com/SunbirdAI/sunbird-tutor-modelling): training code, data pipeline, evaluation harness.
- [SunbirdAI/sunflower-app](https://github.com/SunbirdAI/sunflower-app): Android app that ships this bundle on-device.

## Acknowledgements

Built by the Sunbird AI team. Base model: [Sunbird Tutor](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b). Foundation model: Google's Gemma 4 E2B. Inference engine: [Cactus](https://github.com/cactus-compute/cactus).

## Citation

```bibtex
@misc{sunflower-qa-cactus-2026,
  author = {Sunbird AI},
  title  = {Sunflower QA: Cactus INT4 quantization of Sunbird Tutor (Gemma 4 E2B Speech-QA)},
  year   = {2026},
  url    = {https://huggingface.co/Sunbird/sunflower-qa-cactus-int4}
}
```

Built for the [Kaggle Gemma 4 Good Hackathon](https://www.kaggle.com/), 2026.