ak3ra committed
Commit a6e8a24 · verified · 1 Parent(s): 9e597bf

Update README.md

Files changed (1)
  1. README.md +30 -22
README.md CHANGED
@@ -36,24 +36,24 @@ base_model:

  # Sunflower QA · Cactus INT4

- On-device Cactus quantization of [`https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b`](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b). A Gemma 4 E2B speech-Q&A fine-tune covering English and six Ugandan languages. Built to run fully offline on mid-range Android phones inside the [Sunflower educational assistant app](https://github.com/SunbirdAI/sunflower_app).

  A child taps the mic, asks a science question in Luganda, Acholi, Ateso, Lusoga, Lunyoro, or Runyankole, and gets an answer streaming back in the same language. One model, one forward pass, no internet required.

  ## What's in the bundle

- This is a **Cactus-format** quantization, not a transformers checkpoint. The bundle is a packed binary plus tokenizer metadata, ready to load through the [Cactus](https://github.com/cactus-compute/cactus) FFI on Android, iOS, and desktop.

  Quantization recipe:

  | Component | Precision |
  |---|---|
  | Decoder weights | INT4 |
- | Audio tower (Gemma 4 native) | **FP16, preserved** |
  | Vision tower | Removed |
  | Embeddings / LM head | INT4 |

- Gemma 4's audio tower is precision-sensitive — quantizing it down collapses Luganda speech recognition. We kept it at FP16, dropped the vision tower entirely, and pushed everything else to four bits. Result: bundle shrinks by about a third, landing at **~3.8 GB on disk**, while native audio understanding stays intact.

  ## Languages

@@ -67,20 +67,19 @@ Gemma 4's audio tower is precision-sensitive — quantizing it down collapses Lu
  | `ach` | Acholi |
  | `teo` | Ateso |

  ## Intended use

- - Educational Q&A for primary/secondary classroom science topics in Ugandan languages.
- - Spoken-language input via Gemma 4's audio tower (16 kHz mono PCM).
- - Text output in the user's chosen language.
- - Fully on-device — designed for mid-range Tecno / Infinix-class Android devices.

  ## How to use

- This checkpoint is **Cactus-format only**. For transformers, use the base model [`Sunbird/sunbirdtutor-gemma-4-e2b`](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b).

  ### In the Sunflower app

- Open the model picker in the Sunflower app and select **Speech Q&A** — this bundle is the default. First-launch download is ~3.8 GB; everything after that is offline.

  ### Direct via Cactus FFI

@@ -107,7 +106,7 @@ final response = await cactus.completion(
  );
  ```

- The system prompt above is the **exact** string the base model was trained with. Drift here degrades quality — use it verbatim.

  ### Prompt routing per mode

@@ -118,7 +117,7 @@ The system prompt above is the **exact** string the base model was trained with.
  | Translate | Educational assistant string above | `"Translate this audio into {target language}."` |
  | Explain | Educational assistant string above | `"Explain what was said in this audio."` |

- See [`lib/model_settings_sheet.dart`](https://github.com/SunbirdAI/sunflower_app/blob/master/lib/model_settings_sheet.dart) in the Sunflower repo for the canonical strings.

  ## Performance

@@ -132,22 +131,31 @@ Measured on Pixel 10 CPU, five-second voice turn:

  ## Training

- This is a quantization, not a retrain. For training data, methodology, and base-model evaluation, see the [base model card](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b).

  ## Limitations

- - Answer quality on novel topics tracks the base model — quantization does not change behaviour, only footprint.
- - Decoder-side INT4 introduces small drift on long contexts; for contexts past a couple of thousand tokens, prefer the FP16 base.
- - Audio inference assumes clean 16 kHz mono PCM; robustness to heavy classroom background noise has not been benchmarked.
- - Token-level repetition can occur on out-of-distribution questions — a known base-model characteristic, not introduced by the quant.
- - Vision is removed. This bundle cannot accept image input.

  ## Acknowledgements

- - **Base model**: [jq](https://huggingface.co/jq) — Gemma 4 E2B speech-QA fine-tune.
- - **Inference engine**: [Cactus](https://github.com/cactus-compute/cactus) — day-one Gemma 4 deployment partner with ARM-optimised kernels.
- - **Foundation model**: Google's Gemma 4 E2B.
- - **App, audio-tower preservation pattern, and Cactus quantization**: [Sunbird AI](https://sunbird.ai).

  Built for the [Kaggle Gemma 4 Good Hackathon](https://www.kaggle.com/), 2026.
 
@@ -36,24 +36,24 @@

  # Sunflower QA · Cactus INT4

+ On-device Cactus quantization of [Sunbird/sunbirdtutor-gemma-4-e2b](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b), our Gemma 4 E2B speech-QA fine-tune covering English and six Ugandan languages. Built to run fully offline on mid-range Android phones inside the [Sunflower educational assistant app](https://github.com/SunbirdAI/sunflower-app).

  A child taps the mic, asks a science question in Luganda, Acholi, Ateso, Lusoga, Lunyoro, or Runyankole, and gets an answer streaming back in the same language. One model, one forward pass, no internet required.

  ## What's in the bundle

+ This is a Cactus-format quantization, not a transformers checkpoint. The bundle is a packed binary plus tokenizer metadata, ready to load through the [Cactus](https://github.com/cactus-compute/cactus) FFI on Android, iOS, and desktop.

  Quantization recipe:

  | Component | Precision |
  |---|---|
  | Decoder weights | INT4 |
+ | Audio tower (Gemma 4 native) | FP16, preserved |
  | Vision tower | Removed |
  | Embeddings / LM head | INT4 |

+ Gemma 4's audio tower is precision-sensitive. Quantizing it down collapses Luganda speech recognition, so we kept it at FP16, dropped the vision tower entirely, and pushed everything else to four bits. The bundle is about a third smaller than the FP16 base, landing at **~3.8 GB on disk**, with native audio understanding intact.

  ## Languages

 
@@ -67,20 +67,19 @@
  | `ach` | Acholi |
  | `teo` | Ateso |

+ For per-language quality tiers (Luganda strongest, Acholi second), see the [Sunbird Tutor base model card](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b).
+
  ## Intended use

+ Educational Q&A for primary school science topics in Ugandan languages, with spoken-language input via Gemma 4's audio tower (16 kHz mono PCM) and text output in the user's chosen language. Designed to run fully on-device on the kind of mid-range Tecno and Infinix Android phones common in Uganda.
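
As an illustration of the 16 kHz mono PCM input requirement (this sketch is not part of the README or the commit), a recording could be checked before it is handed to the model. The function names below are hypothetical, the check uses only Python's standard `wave` module, and the 16-bit sample width is an assumption, since the card states the sample rate and channel count but not the bit depth.

```python
import io
import wave


def is_16khz_mono_pcm(wav_bytes: bytes) -> bool:
    """Return True if the WAV data is 16 kHz, single-channel, 16-bit PCM."""
    try:
        with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
            return (
                wav.getframerate() == 16_000
                and wav.getnchannels() == 1
                and wav.getsampwidth() == 2  # assumption: 16-bit samples
            )
    except (wave.Error, EOFError):
        # Not a readable PCM WAV file at all.
        return False


def make_silent_wav(sample_rate: int, channels: int, seconds: float = 0.1) -> bytes:
    """Build a short silent 16-bit WAV in memory, for trying out the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(sample_rate * seconds) * channels)
    return buffer.getvalue()
```

A recording that fails the check would need resampling or downmixing first; how the Sunflower app handles that is not covered in this diff.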
 
 
 
  ## How to use

+ This checkpoint is Cactus-format only. For transformers, use the [Sunbird Tutor base model](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b).

  ### In the Sunflower app

+ Open the model picker in the Sunflower app and select **Speech Q&A**. This bundle is the default. First-launch download is ~3.8 GB; everything after that is offline.

  ### Direct via Cactus FFI

 
@@ -107,7 +106,7 @@
  );
  ```

+ The system prompt above is the exact string the base model was trained with. Drift here degrades quality, so use it verbatim.

  ### Prompt routing per mode

 
@@ -118,7 +117,7 @@
  | Translate | Educational assistant string above | `"Translate this audio into {target language}."` |
  | Explain | Educational assistant string above | `"Explain what was said in this audio."` |

+ The canonical runtime strings live in [`lib/model_settings_sheet.dart`](https://github.com/SunbirdAI/sunflower-app/blob/master/lib/model_settings_sheet.dart) inside the Sunflower app.
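
To make the routing table concrete, here is a minimal sketch of how the per-mode user prompt might be selected (not part of the README or the commit). It covers only the two rows visible in this hunk, the names `USER_PROMPTS` and `build_user_prompt` are hypothetical, and the system prompt is left out because its text is not shown in this diff. The canonical strings remain the ones in the Dart file referenced above.

```python
from typing import Optional

# Placeholder exactly as it appears in the routing table.
PLACEHOLDER = "{target language}"

# Only the two modes visible in this diff hunk; other rows are elided there.
USER_PROMPTS = {
    "translate": "Translate this audio into {target language}.",
    "explain": "Explain what was said in this audio.",
}


def build_user_prompt(mode: str, target_language: Optional[str] = None) -> str:
    """Return the user prompt for a mode, filling in the target language if needed."""
    template = USER_PROMPTS[mode.lower()]  # raises KeyError for modes not listed here
    if PLACEHOLDER in template:
        if not target_language:
            raise ValueError(f"mode {mode!r} needs a target language")
        return template.replace(PLACEHOLDER, target_language)
    return template
```

Plain string replacement is used instead of `str.format` so the template can be copied verbatim from the table, including the space inside the placeholder.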

  ## Performance

 
@@ -132,22 +131,31 @@

  ## Training

+ This is a quantization, not a retrain. For training data, methodology, and per-language evaluation numbers, see the [Sunbird Tutor base model card](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b) and the [training repository](https://github.com/SunbirdAI/sunbird-tutor-modelling).

  ## Limitations

+ Answer quality on novel topics tracks the base model; quantization does not change behaviour, only footprint. Decoder-side INT4 introduces small drift on long contexts, so for contexts past a couple of thousand tokens prefer the FP16 base. Audio inference assumes clean 16 kHz mono PCM, and robustness to heavy classroom background noise has not been formally benchmarked. Token-level repetition can occur on out-of-distribution questions; this is a known base-model characteristic, not introduced by the quant. Vision is removed, so this bundle cannot accept image input.
+
+ ## Related artifacts
+
+ - [Sunbird/sunbirdtutor-gemma-4-e2b](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b): transformers-format base model.
+ - [SunbirdAI/sunbird-tutor-modelling](https://github.com/SunbirdAI/sunbird-tutor-modelling): training code, data pipeline, evaluation harness.
+ - [SunbirdAI/sunflower-app](https://github.com/SunbirdAI/sunflower-app): Android app that ships this bundle on-device.

  ## Acknowledgements

+ Built by the Sunbird AI team. Base model: [Sunbird Tutor](https://huggingface.co/Sunbird/sunbirdtutor-gemma-4-e2b). Foundation model: Google's Gemma 4 E2B. Inference engine: [Cactus](https://github.com/cactus-compute/cactus).

+ ## Citation
+
+ ```bibtex
+ @misc{sunflower-qa-cactus-2026,
+ author = {Sunbird AI},
+ title = {Sunflower QA: Cactus INT4 quantization of Sunbird Tutor (Gemma 4 E2B Speech-QA)},
+ year = {2026},
+ url = {https://huggingface.co/Sunbird/sunflower-qa-cactus-int4}
+ }
+ ```

  Built for the [Kaggle Gemma 4 Good Hackathon](https://www.kaggle.com/), 2026.