ak3ra/sunflower-qa-cactus-int4

@@ -36,24 +36,24 @@ base_model:
 # Sunflower QA · Cactus INT4
-On-device Cactus quantization of [`https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b`](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b). A Gemma 4 E2B speech-Q&A fine-tune covering English and six Ugandan languages. Built to run fully offline on mid-range Android phones inside the [Sunflower educational assistant app](https://github.com/SunbirdAI/sunflower_app).
 A child taps the mic, asks a science question in Luganda, Acholi, Ateso, Lusoga, Lunyoro, or Runyankole, and gets an answer streaming back in the same language. One model, one forward pass, no internet required.
 ## What's in the bundle
-This is a **Cactus-format** quantization, not a transformers checkpoint. The bundle is a packed binary plus tokenizer metadata, ready to load through the [Cactus](https://github.com/cactus-compute/cactus) FFI on Android, iOS, and desktop.
 Quantization recipe:
 | Component | Precision |
 |---|---|
 | Decoder weights | INT4 |
-| Audio tower (Gemma 4 native) | **FP16, preserved** |
 | Vision tower | Removed |
 | Embeddings / LM head | INT4 |
-Gemma 4's audio tower is precision-sensitive — quantizing it down collapses Luganda speech recognition. We kept it at FP16, dropped the vision tower entirely, and pushed everything else to four bits. Result: bundle shrinks by about a third, landing at **~3.8 GB on disk**, while native audio understanding stays intact.
 ## Languages
@@ -67,20 +67,19 @@ Gemma 4's audio tower is precision-sensitive — quantizing it down collapses Lu
 | `ach` | Acholi |
 | `teo` | Ateso |
 ## Intended use
-- Educational Q&A for primary/secondary classroom science topics in Ugandan languages.
-- Spoken-language input via Gemma 4's audio tower (16 kHz mono PCM).
-- Text output in the user's chosen language.
-- Fully on-device — designed for mid-range Tecno / Infinix-class Android devices.
 ## How to use
-This checkpoint is **Cactus-format only**. For transformers, use the base model [`Sunbird/sunbirdtutor-gemma-4-e2b`](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b).
 ### In the Sunflower app
-Open the model picker in the Sunflower app and select **Speech Q&A** — this bundle is the default. First-launch download is ~3.8 GB; everything after that is offline.
 ### Direct via Cactus FFI
@@ -107,7 +106,7 @@ final response = await cactus.completion(
 );
 ```
-The system prompt above is the **exact** string the base model was trained with. Drift here degrades quality — use it verbatim.
 ### Prompt routing per mode
@@ -118,7 +117,7 @@ The system prompt above is the **exact** string the base model was trained with.
 | Translate        | Educational assistant string above | `"Translate this audio into {target language}."` |
 | Explain          | Educational assistant string above | `"Explain what was said in this audio."`         |
-See [`lib/model_settings_sheet.dart`](https://github.com/SunbirdAI/sunflower_app/blob/master/lib/model_settings_sheet.dart) in the Sunflower repo for the canonical strings.
 ## Performance
@@ -132,22 +131,31 @@ Measured on Pixel 10 CPU, five-second voice turn:
 ## Training
-This is a quantization, not a retrain. For training data, methodology, and base-model evaluation, see the [base model card](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b).
 ## Limitations
-- Answer quality on novel topics tracks the base model — quantization does not change behaviour, only footprint.
-- Decoder-side INT4 introduces small drift on long contexts; for contexts past a couple of thousand tokens, prefer the FP16 base.
-- Audio inference assumes clean 16 kHz mono PCM; robustness to heavy classroom background noise has not been benchmarked.
-- Token-level repetition can occur on out-of-distribution questions — a known base-model characteristic, not introduced by the quant.
-- Vision is removed. This bundle cannot accept image input.
 ## Acknowledgements
-- **Base model**: [jq](https://huggingface.co/jq) — Gemma 4 E2B speech-QA fine-tune.
-- **Inference engine**: [Cactus](https://github.com/cactus-compute/cactus) — day-one Gemma 4 deployment partner with ARM-optimised kernels.
-- **Foundation model**: Google's Gemma 4 E2B.
-- **App, audio-tower preservation pattern, and Cactus quantization**: [Sunbird AI](https://sunbird.ai).
 Built for the [Kaggle Gemma 4 Good Hackathon](https://www.kaggle.com/), 2026.

 # Sunflower QA · Cactus INT4
+On-device Cactus quantization of [Sunbird/sunbirdtutor-gemma-4-e2b](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b), our Gemma 4 E2B speech-QA fine-tune covering English and six Ugandan languages. Built to run fully offline on mid-range Android phones inside the [Sunflower educational assistant app](https://github.com/SunbirdAI/sunflower-app).
 A child taps the mic, asks a science question in Luganda, Acholi, Ateso, Lusoga, Lunyoro, or Runyankole, and gets an answer streaming back in the same language. One model, one forward pass, no internet required.
 ## What's in the bundle
+This is a Cactus-format quantization, not a transformers checkpoint. The bundle is a packed binary plus tokenizer metadata, ready to load through the [Cactus](https://github.com/cactus-compute/cactus) FFI on Android, iOS, and desktop.
 Quantization recipe:
 | Component | Precision |
 |---|---|
 | Decoder weights | INT4 |
+| Audio tower (Gemma 4 native) | FP16, preserved |
 | Vision tower | Removed |
 | Embeddings / LM head | INT4 |
+Gemma 4's audio tower is precision-sensitive. Quantizing it down collapses Luganda speech recognition, so we kept it at FP16, dropped the vision tower entirely, and pushed everything else to four bits. The bundle is about a third smaller than the FP16 base, landing at **~3.8 GB on disk**, with native audio understanding intact.
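
As a rough sanity check on the size figures, the sketch below back-computes the FP16 base size implied by "about a third smaller" and "~3.8 GB". Both inputs are the approximate numbers quoted in this card; the result is an implied estimate rather than a measured size, and the function and constant names are illustrative only.

```python
# Back-of-envelope check of the size claim: the bundle is "about a third
# smaller" than the FP16 base and lands at ~3.8 GB on disk.
# Both inputs are approximate figures from the card; the output is an
# implied estimate, not a measured size.

BUNDLE_GB = 3.8           # quoted on-disk size of this bundle
SHRINK_FRACTION = 1 / 3   # "about a third smaller"


def implied_base_size_gb(bundle_gb: float, shrink_fraction: float) -> float:
    """Return the base size that shrinks to bundle_gb after the given reduction."""
    if not 0 <= shrink_fraction < 1:
        raise ValueError("shrink_fraction must be in [0, 1)")
    return bundle_gb / (1 - shrink_fraction)


if __name__ == "__main__":
    base = implied_base_size_gb(BUNDLE_GB, SHRINK_FRACTION)
    print(f"Implied FP16 base size: ~{base:.1f} GB")
    print(f"Implied saving: ~{base - BUNDLE_GB:.1f} GB")
```

Because both inputs are rounded, treat the result as an order-of-magnitude figure when planning device storage.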
 ## Languages
 | `ach` | Acholi |
 | `teo` | Ateso |
+For per-language quality tiers (Luganda strongest, Acholi second), see the [Sunbird Tutor base model card](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b).
 ## Intended use
+Educational Q&A for primary school science topics in Ugandan languages, with spoken-language input via Gemma 4's audio tower (16 kHz mono PCM) and text output in the user's chosen language. Designed to run fully on-device on the kind of mid-range Tecno and Infinix Android phones common in Uganda.
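
Since the audio tower expects 16 kHz mono PCM, it can help to validate recordings before handing them to the model. The following is a minimal sketch using only Python's standard-library `wave` module. The 16-bit sample width is an assumption (the card says only "PCM"), and `check_wav_format` is a hypothetical helper, not part of Cactus or the Sunflower app.

```python
import wave

EXPECTED_RATE_HZ = 16_000    # 16 kHz, as stated in the card
EXPECTED_CHANNELS = 1        # mono
EXPECTED_SAMPLE_WIDTH = 2    # ASSUMPTION: 16-bit samples; the card only says "PCM"


def check_wav_format(path: str) -> list:
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS} (mono)")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"{width * 8}-bit samples, expected {EXPECTED_SAMPLE_WIDTH * 8}-bit")
    return problems
```

A recording that fails any check would need resampling or downmixing first. This sketch only inspects the WAV header and does not assess background noise.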
 ## How to use
+This checkpoint is Cactus-format only. For transformers, use the [Sunbird Tutor base model](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b).
 ### In the Sunflower app
+Open the model picker in the Sunflower app and select **Speech Q&A**. This bundle is the default. First-launch download is ~3.8 GB; everything after that is offline.
 ### Direct via Cactus FFI
 );
 ```
+The system prompt above is the exact string the base model was trained with. Drift here degrades quality, so use it verbatim.
 ### Prompt routing per mode
 | Translate        | Educational assistant string above | `"Translate this audio into {target language}."` |
 | Explain          | Educational assistant string above | `"Explain what was said in this audio."`         |
+The canonical runtime strings live in [`lib/model_settings_sheet.dart`](https://github.com/SunbirdAI/sunflower-app/blob/master/lib/model_settings_sheet.dart) inside the Sunflower app.
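
To illustrate the routing idea, here is a small Python sketch that maps a mode to its user prompt. It covers only the two modes whose strings are visible in the table, and it renames the `{target language}` placeholder to `{target_language}` so it works with `str.format`. The system prompt and the remaining modes are deliberately left out, and `build_user_prompt` is a hypothetical helper; the app's own Dart file remains the source of truth.

```python
from typing import Optional

# User-prompt templates for the two modes shown in the table.
# Other modes and the system prompt are intentionally not reproduced here.
USER_PROMPT_TEMPLATES = {
    "Translate": "Translate this audio into {target_language}.",
    "Explain": "Explain what was said in this audio.",
}


def build_user_prompt(mode: str, target_language: Optional[str] = None) -> str:
    """Return the user prompt for a mode, filling in the target language if needed."""
    if mode not in USER_PROMPT_TEMPLATES:
        raise ValueError(f"unknown mode: {mode!r}")
    template = USER_PROMPT_TEMPLATES[mode]
    if "{target_language}" in template:
        if not target_language:
            raise ValueError(f"mode {mode!r} requires a target language")
        return template.format(target_language=target_language)
    return template


if __name__ == "__main__":
    print(build_user_prompt("Translate", "Luganda"))
    print(build_user_prompt("Explain"))
```

Keeping the templates in one table makes it harder for prompt wording to drift between modes.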
 ## Performance
 ## Training
+This is a quantization, not a retrain. For training data, methodology, and per-language evaluation numbers, see the [Sunbird Tutor base model card](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b) and the [training repository](https://github.com/SunbirdAI/sunbird-tutor-modelling).
 ## Limitations
+Answer quality on novel topics tracks the base model; quantization does not change behaviour, only footprint. Decoder-side INT4 introduces small drift on long contexts, so for contexts past a couple of thousand tokens prefer the FP16 base. Audio inference assumes clean 16 kHz mono PCM, and robustness to heavy classroom background noise has not been formally benchmarked. Token-level repetition can occur on out-of-distribution questions; this is a known base-model characteristic, not introduced by the quant. Vision is removed, so this bundle cannot accept image input.
+## Related artifacts
+- [Sunbird/sunbirdtutor-gemma-4-e2b](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b): transformers-format base model.
+- [SunbirdAI/sunbird-tutor-modelling](https://github.com/SunbirdAI/sunbird-tutor-modelling): training code, data pipeline, evaluation harness.
+- [SunbirdAI/sunflower-app](https://github.com/SunbirdAI/sunflower-app): Android app that ships this bundle on-device.
 ## Acknowledgements
+Built by the Sunbird AI team. Base model: [Sunbird Tutor](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b). Foundation model: Google's Gemma 4 E2B. Inference engine: [Cactus](https://github.com/cactus-compute/cactus).
+## Citation
+```bibtex
+@misc{sunflower-qa-cactus-2026,
+  author = {Sunbird AI},
+  title  = {Sunflower QA: Cactus INT4 quantization of Sunbird Tutor (Gemma 4 E2B Speech-QA)},
+  year   = {2026},
+  url    = {https://huggingface.co/Sunbird/sunflower-qa-cactus-int4}
+}
+```
 Built for the [Kaggle Gemma 4 Good Hackathon](https://www.kaggle.com/), 2026.