Update README.md
Browse files
README.md
CHANGED
|
@@ -36,24 +36,24 @@ base_model:
|
|
| 36 |
|
| 37 |
# Sunflower QA · Cactus INT4
|
| 38 |
|
| 39 |
-
On-device Cactus quantization of [
|
| 40 |
|
| 41 |
A child taps the mic, asks a science question in Luganda, Acholi, Ateso, Lusoga, Lunyoro, or Runyankole, and gets an answer streaming back in the same language. One model, one forward pass, no internet required.
|
| 42 |
|
| 43 |
## What's in the bundle
|
| 44 |
|
| 45 |
-
This is a
|
| 46 |
|
| 47 |
Quantization recipe:
|
| 48 |
|
| 49 |
| Component | Precision |
|
| 50 |
|---|---|
|
| 51 |
| Decoder weights | INT4 |
|
| 52 |
-
| Audio tower (Gemma 4 native) |
|
| 53 |
| Vision tower | Removed |
|
| 54 |
| Embeddings / LM head | INT4 |
|
| 55 |
|
| 56 |
-
Gemma 4's audio tower is precision-sensitive
|
| 57 |
|
| 58 |
## Languages
|
| 59 |
|
|
@@ -67,20 +67,19 @@ Gemma 4's audio tower is precision-sensitive — quantizing it down collapses Lu
|
|
| 67 |
| `ach` | Acholi |
|
| 68 |
| `teo` | Ateso |
|
| 69 |
|
|
|
|
|
|
|
| 70 |
## Intended use
|
| 71 |
|
| 72 |
-
|
| 73 |
-
- Spoken-language input via Gemma 4's audio tower (16 kHz mono PCM).
|
| 74 |
-
- Text output in the user's chosen language.
|
| 75 |
-
- Fully on-device — designed for mid-range Tecno / Infinix-class Android devices.
|
| 76 |
|
| 77 |
## How to use
|
| 78 |
|
| 79 |
-
This checkpoint is
|
| 80 |
|
| 81 |
### In the Sunflower app
|
| 82 |
|
| 83 |
-
Open the model picker in the Sunflower app and select **Speech Q&A**
|
| 84 |
|
| 85 |
### Direct via Cactus FFI
|
| 86 |
|
|
@@ -107,7 +106,7 @@ final response = await cactus.completion(
|
|
| 107 |
);
|
| 108 |
```
|
| 109 |
|
| 110 |
-
The system prompt above is the
|
| 111 |
|
| 112 |
### Prompt routing per mode
|
| 113 |
|
|
@@ -118,7 +117,7 @@ The system prompt above is the **exact** string the base model was trained with.
|
|
| 118 |
| Translate | Educational assistant string above | `"Translate this audio into {target language}."` |
|
| 119 |
| Explain | Educational assistant string above | `"Explain what was said in this audio."` |
|
| 120 |
|
| 121 |
-
|
| 122 |
|
| 123 |
## Performance
|
| 124 |
|
|
@@ -132,22 +131,31 @@ Measured on Pixel 10 CPU, five-second voice turn:
|
|
| 132 |
|
| 133 |
## Training
|
| 134 |
|
| 135 |
-
This is a quantization, not a retrain. For training data, methodology, and
|
| 136 |
|
| 137 |
## Limitations
|
| 138 |
|
| 139 |
-
|
| 140 |
-
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
-
|
|
|
|
|
|
|
| 144 |
|
| 145 |
## Acknowledgements
|
| 146 |
|
| 147 |
-
|
| 148 |
-
- **Inference engine**: [Cactus](https://github.com/cactus-compute/cactus) — day-one Gemma 4 deployment partner with ARM-optimised kernels.
|
| 149 |
-
- **Foundation model**: Google's Gemma 4 E2B.
|
| 150 |
-
- **App, audio-tower preservation pattern, and Cactus quantization**: [Sunbird AI](https://sunbird.ai).
|
| 151 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 152 |
|
| 153 |
Built for the [Kaggle Gemma 4 Good Hackathon](https://www.kaggle.com/), 2026.
|
|
|
|
| 36 |
|
| 37 |
# Sunflower QA · Cactus INT4
|
| 38 |
|
| 39 |
+
On-device Cactus quantization of [Sunbird/sunbirdtutor-gemma-4-e2b](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b), our Gemma 4 E2B speech-QA fine-tune covering English and six Ugandan languages. Built to run fully offline on mid-range Android phones inside the [Sunflower educational assistant app](https://github.com/SunbirdAI/sunflower-app).
|
| 40 |
|
| 41 |
A child taps the mic, asks a science question in Luganda, Acholi, Ateso, Lusoga, Lunyoro, or Runyankole, and gets an answer streaming back in the same language. One model, one forward pass, no internet required.
|
| 42 |
|
| 43 |
## What's in the bundle
|
| 44 |
|
| 45 |
+
This is a Cactus-format quantization, not a transformers checkpoint. The bundle is a packed binary plus tokenizer metadata, ready to load through the [Cactus](https://github.com/cactus-compute/cactus) FFI on Android, iOS, and desktop.
|
| 46 |
|
| 47 |
Quantization recipe:
|
| 48 |
|
| 49 |
| Component | Precision |
|
| 50 |
|---|---|
|
| 51 |
| Decoder weights | INT4 |
|
| 52 |
+
| Audio tower (Gemma 4 native) | FP16, preserved |
|
| 53 |
| Vision tower | Removed |
|
| 54 |
| Embeddings / LM head | INT4 |
|
| 55 |
|
| 56 |
+
Gemma 4's audio tower is precision-sensitive. Quantizing it down collapses Luganda speech recognition, so we kept it at FP16, dropped the vision tower entirely, and pushed everything else to four bits. The bundle is about a third smaller than the FP16 base, landing at **~3.8 GB on disk**, with native audio understanding intact.
|
| 57 |
|
| 58 |
## Languages
|
| 59 |
|
|
|
|
| 67 |
| `ach` | Acholi |
|
| 68 |
| `teo` | Ateso |
|
| 69 |
|
| 70 |
+
For per-language quality tiers (Luganda strongest, Acholi second), see the [Sunbird Tutor base model card](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b).
|
| 71 |
+
|
| 72 |
## Intended use
|
| 73 |
|
| 74 |
+
Educational Q&A for primary school science topics in Ugandan languages, with spoken-language input via Gemma 4's audio tower (16 kHz mono PCM) and text output in the user's chosen language. Designed to run fully on-device on the kind of mid-range Tecno and Infinix Android phones common in Uganda.
|
|
|
|
|
|
|
|
|
|
| 75 |
|
| 76 |
## How to use
|
| 77 |
|
| 78 |
+
This checkpoint is Cactus-format only. For transformers, use the [Sunbird Tutor base model](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b).
|
| 79 |
|
| 80 |
### In the Sunflower app
|
| 81 |
|
| 82 |
+
Open the model picker in the Sunflower app and select **Speech Q&A**. This bundle is the default. First-launch download is ~3.8 GB; everything after that is offline.
|
| 83 |
|
| 84 |
### Direct via Cactus FFI
|
| 85 |
|
|
|
|
| 106 |
);
|
| 107 |
```
|
| 108 |
|
| 109 |
+
The system prompt above is the exact string the base model was trained with. Drift here degrades quality, so use it verbatim.
|
| 110 |
|
| 111 |
### Prompt routing per mode
|
| 112 |
|
|
|
|
| 117 |
| Translate | Educational assistant string above | `"Translate this audio into {target language}."` |
|
| 118 |
| Explain | Educational assistant string above | `"Explain what was said in this audio."` |
|
| 119 |
|
| 120 |
+
The canonical runtime strings live in [`lib/model_settings_sheet.dart`](https://github.com/SunbirdAI/sunflower-app/blob/master/lib/model_settings_sheet.dart) inside the Sunflower app.
|
| 121 |
|
| 122 |
## Performance
|
| 123 |
|
|
|
|
| 131 |
|
| 132 |
## Training
|
| 133 |
|
| 134 |
+
This is a quantization, not a retrain. For training data, methodology, and per-language evaluation numbers, see the [Sunbird Tutor base model card](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b) and the [training repository](https://github.com/SunbirdAI/sunbird-tutor-modelling).
|
| 135 |
|
| 136 |
## Limitations
|
| 137 |
|
| 138 |
+
Answer quality on novel topics tracks the base model; quantization does not change behaviour, only footprint. Decoder-side INT4 introduces small drift on long contexts, so for contexts past a couple of thousand tokens prefer the FP16 base. Audio inference assumes clean 16 kHz mono PCM, and robustness to heavy classroom background noise has not been formally benchmarked. Token-level repetition can occur on out-of-distribution questions; this is a known base-model characteristic, not introduced by the quant. Vision is removed, so this bundle cannot accept image input.
|
| 139 |
+
|
| 140 |
+
## Related artifacts
|
| 141 |
+
|
| 142 |
+
- [Sunbird/sunbirdtutor-gemma-4-e2b](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b): transformers-format base model.
|
| 143 |
+
- [SunbirdAI/sunbird-tutor-modelling](https://github.com/SunbirdAI/sunbird-tutor-modelling): training code, data pipeline, evaluation harness.
|
| 144 |
+
- [SunbirdAI/sunflower-app](https://github.com/SunbirdAI/sunflower-app): Android app that ships this bundle on-device.
|
| 145 |
|
| 146 |
## Acknowledgements
|
| 147 |
|
| 148 |
+
Built by the Sunbird AI team. Base model: [Sunbird Tutor](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b). Foundation model: Google's Gemma 4 E2B. Inference engine: [Cactus](https://github.com/cactus-compute/cactus).
|
|
|
|
|
|
|
|
|
|
| 149 |
|
| 150 |
+
## Citation
|
| 151 |
+
|
| 152 |
+
```bibtex
|
| 153 |
+
@misc{sunflower-qa-cactus-2026,
|
| 154 |
+
author = {Sunbird AI},
|
| 155 |
+
title = {Sunflower QA: Cactus INT4 quantization of Sunbird Tutor (Gemma 4 E2B Speech-QA)},
|
| 156 |
+
year = {2026},
|
| 157 |
+
url = {https://huggingface.co/Sunbird/sunflower-qa-cactus-int4}
|
| 158 |
+
}
|
| 159 |
+
```
|
| 160 |
|
| 161 |
Built for the [Kaggle Gemma 4 Good Hackathon](https://www.kaggle.com/), 2026.
|