Instructions to use litert-community/gemma-4-E2B-it-litert-lm with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- LiteRT-LM
How to use litert-community/gemma-4-E2B-it-litert-lm with LiteRT-LM:
    # LiteRT-LM runs on various platforms (Android, iOS, Windows, Linux, macOS, IoT, Web/WASM)
    # and supports many APIs (C++, Python, Kotlin, Swift, JavaScript, Flutter).
    # For platform-specific integration guides, please refer to the official developer website:
    # https://ai.google.dev/edge/litert-lm

    # To try LiteRT-LM, the easiest way is to use our CLI tool.
    # 1. Install the LiteRT-LM CLI tool:
    pip install litert-lm

    # 2. Download and run this model locally:
    # See: https://ai.google.dev/edge/litert-lm/cli
    litert-lm run \
      --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
      model.litertlm \
      --prompt="Write me a poem"
- Notebooks
- Google Colab
- Kaggle
Update README.md
README.md
CHANGED
|
@@ -20,8 +20,90 @@ Main Model Card: [google/gemma-4-E2B-it](https://huggingface.co/google/gemma-4-E
|
|
| 20 |
|
| 21 |
## Build with Gemma 4 E2B and LiteRT-LM
|
| 22 |
|
| 23 |
## Gemma 4 E2B Performance on LiteRT-LM
|
| 24 |
|
| 25 |
<table border="1">
|
| 26 |
<tr>
|
| 27 |
<th style="text-align: left">Backend</th>
|
|
@@ -51,7 +133,124 @@ Main Model Card: [google/gemma-4-E2B-it](https://huggingface.co/google/gemma-4-E
|
|
| 51 |
<td><p style="text-align: right">TODO</p></td>
|
| 52 |
<td><p style="text-align: right">TODO</p></td>
|
| 53 |
</tr>
|
| 54 |
|
| 55 |
</table>
|
| 56 |
|
| 57 |
## Gemma 4 E2B Performance on Web
|
|
|
|
| 20 |
|
| 21 |
## Build with Gemma 4 E2B and LiteRT-LM
|
| 22 |
|
| 23 |
+
Ready to integrate this into your product? Get started [here](https://ai.google.dev/edge/litert-lm/overview).
|
| 24 |
+
|
| 25 |
## Gemma 4 E2B Performance on LiteRT-LM
|
| 26 |
|
| 27 |
+
All benchmarks were run via LiteRT-LM with 1024 prefill tokens and 256 decode tokens at a context length of 2048 tokens. The model supports context lengths of up to 32k tokens. CPU inference is accelerated by the LiteRT XNNPACK delegate using 4 threads. Time-to-first-token does not include load time. Benchmarks were run with caches enabled and initialized; latency and memory usage may differ on the first run. Model size is the size of the file on disk.
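
The throughput columns in the tables below can be turned into rough wall-clock estimates with simple arithmetic. The following is a minimal sketch, assuming throughput stays constant over the run and ignoring load time; the function name `estimate_latency` and the throughput values are hypothetical placeholders, not measured results for this model.

```python
def estimate_latency(prefill_tokens, decode_tokens, prefill_tps, decode_tps):
    """Return (prefill_seconds, decode_seconds, total_seconds).

    Assumes constant tokens/sec throughput in each phase (an approximation).
    """
    if prefill_tps <= 0 or decode_tps <= 0:
        raise ValueError("throughput must be positive")
    prefill_s = prefill_tokens / prefill_tps
    decode_s = decode_tokens / decode_tps
    return prefill_s, decode_s, prefill_s + decode_s


# Token counts match the benchmark setup (1024 prefill, 256 decode).
# The throughputs are placeholders, NOT measurements for this model.
prefill_s, decode_s, total_s = estimate_latency(
    1024, 256, prefill_tps=512.0, decode_tps=32.0
)
print(f"prefill ~{prefill_s:.1f}s, decode ~{decode_s:.1f}s, total ~{total_s:.1f}s")
# prints "prefill ~2.0s, decode ~8.0s, total ~10.0s"
```

Under these assumptions the prefill time is a reasonable first approximation of time-to-first-token, though the reported figure may also include other per-request overhead.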
|
| 28 |
+
|
| 29 |
+
CPU memory was measured using `rusage::ru_maxrss` on Android, Linux, and Raspberry Pi; `task_vm_info::phys_footprint` on iOS and MacBook; and `process_memory_counters::PrivateUsage` on Windows.
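
To illustrate what the `ru_maxrss` counter reports, here is a minimal sketch that reads it for the current process through Python's standard `resource` module (available on Unix-like systems only). It is not the benchmark harness used for these tables, and the helper name `peak_rss_mb` is hypothetical.

```python
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process in MB."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return usage.ru_maxrss / divisor


print(f"peak RSS: {peak_rss_mb():.1f} MB")
```

Because `ru_maxrss` is a high-water mark, it reflects the largest resident set reached at any point in the process lifetime rather than the current usage.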
|
| 30 |
+
|
| 31 |
+
### Android
|
| 32 |
+
|
| 33 |
+
Benchmarked on Samsung Galaxy S26 Ultra.
|
| 34 |
+
|
| 35 |
+
*Note: On [supported Android devices](https://developers.google.com/ml-kit), Gemma 4 is available through Android AI Core as [Gemini Nano](https://developer.android.com/ai/gemini-nano#architecture), which is the recommended path for production applications.*
|
| 36 |
+
|
| 37 |
+
<table border="1">
|
| 38 |
+
<tr>
|
| 39 |
+
<th style="text-align: left">Backend</th>
|
| 40 |
+
<th style="text-align: left">Quantization scheme</th>
|
| 41 |
+
<th style="text-align: left">Prefill (tokens/sec)</th>
|
| 42 |
+
<th style="text-align: left">Decode (tokens/sec)</th>
|
| 43 |
+
<th style="text-align: left">Time-to-first-token (sec)</th>
|
| 44 |
+
<th style="text-align: left">Model size (MB)</th>
|
| 45 |
+
<th style="text-align: left">CPU Memory (RSS in MB)</th>
|
| 46 |
+
<th></th>
|
| 47 |
+
</tr>
|
| 48 |
+
<tr>
|
| 49 |
+
<td><p style="text-align: left">CPU</p></td>
|
| 50 |
+
<td><p style="text-align: left">TODO</p></td>
|
| 51 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 52 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 53 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 54 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 55 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 56 |
+
</tr>
|
| 57 |
+
<tr>
|
| 58 |
+
<td><p style="text-align: left">GPU</p></td>
|
| 59 |
+
<td><p style="text-align: left">TODO</p></td>
|
| 60 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 61 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 62 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 63 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 64 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 65 |
+
</tr>
|
| 66 |
+
</table>
|
| 67 |
+
|
| 68 |
+
### iOS
|
| 69 |
+
|
| 70 |
+
Benchmarked on iPhone 17 Pro.
|
| 71 |
+
|
| 72 |
+
<table border="1">
|
| 73 |
+
<tr>
|
| 74 |
+
<th style="text-align: left">Backend</th>
|
| 75 |
+
<th style="text-align: left">Quantization scheme</th>
|
| 76 |
+
<th style="text-align: left">Prefill (tokens/sec)</th>
|
| 77 |
+
<th style="text-align: left">Decode (tokens/sec)</th>
|
| 78 |
+
<th style="text-align: left">Time-to-first-token (sec)</th>
|
| 79 |
+
<th style="text-align: left">Model size (MB)</th>
|
| 80 |
+
<th style="text-align: left">CPU Memory (RSS in MB)</th>
|
| 81 |
+
<th></th>
|
| 82 |
+
</tr>
|
| 83 |
+
<tr>
|
| 84 |
+
<td><p style="text-align: left">CPU</p></td>
|
| 85 |
+
<td><p style="text-align: left">TODO</p></td>
|
| 86 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 87 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 88 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 89 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 90 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 91 |
+
</tr>
|
| 92 |
+
<tr>
|
| 93 |
+
<td><p style="text-align: left">GPU</p></td>
|
| 94 |
+
<td><p style="text-align: left">TODO</p></td>
|
| 95 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 96 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 97 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 98 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 99 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 100 |
+
</tr>
|
| 101 |
+
</table>
|
| 102 |
+
|
| 103 |
+
### Linux
|
| 104 |
+
|
| 105 |
+
Benchmarked on NVIDIA GeForce RTX 4090.
|
| 106 |
+
|
| 107 |
<table border="1">
|
| 108 |
<tr>
|
| 109 |
<th style="text-align: left">Backend</th>
|
|
|
|
| 133 |
<td><p style="text-align: right">TODO</p></td>
|
| 134 |
<td><p style="text-align: right">TODO</p></td>
|
| 135 |
</tr>
|
| 136 |
+
</table>
|
| 137 |
+
|
| 138 |
+
### MacBook
|
| 139 |
+
|
| 140 |
+
Benchmarked on MacBook Pro M4.
|
| 141 |
|
| 142 |
+
<table border="1">
|
| 143 |
+
<tr>
|
| 144 |
+
<th style="text-align: left">Backend</th>
|
| 145 |
+
<th style="text-align: left">Quantization scheme</th>
|
| 146 |
+
<th style="text-align: left">Prefill (tokens/sec)</th>
|
| 147 |
+
<th style="text-align: left">Decode (tokens/sec)</th>
|
| 148 |
+
<th style="text-align: left">Time-to-first-token (sec)</th>
|
| 149 |
+
<th style="text-align: left">Model size (MB)</th>
|
| 150 |
+
<th style="text-align: left">CPU Memory (RSS in MB)</th>
|
| 151 |
+
<th></th>
|
| 152 |
+
</tr>
|
| 153 |
+
<tr>
|
| 154 |
+
<td><p style="text-align: left">CPU</p></td>
|
| 155 |
+
<td><p style="text-align: left">TODO</p></td>
|
| 156 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 157 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 158 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 159 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 160 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 161 |
+
</tr>
|
| 162 |
+
<tr>
|
| 163 |
+
<td><p style="text-align: left">GPU</p></td>
|
| 164 |
+
<td><p style="text-align: left">TODO</p></td>
|
| 165 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 166 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 167 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 168 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 169 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 170 |
+
</tr>
|
| 171 |
+
</table>
|
| 172 |
+
|
| 173 |
+
### Windows
|
| 174 |
+
|
| 175 |
+
<table border="1">
|
| 176 |
+
<tr>
|
| 177 |
+
<th style="text-align: left">Backend</th>
|
| 178 |
+
<th style="text-align: left">Quantization scheme</th>
|
| 179 |
+
<th style="text-align: left">Prefill (tokens/sec)</th>
|
| 180 |
+
<th style="text-align: left">Decode (tokens/sec)</th>
|
| 181 |
+
<th style="text-align: left">Time-to-first-token (sec)</th>
|
| 182 |
+
<th style="text-align: left">Model size (MB)</th>
|
| 183 |
+
<th style="text-align: left">CPU Memory (RSS in MB)</th>
|
| 184 |
+
<th></th>
|
| 185 |
+
</tr>
|
| 186 |
+
<tr>
|
| 187 |
+
<td><p style="text-align: left">CPU</p></td>
|
| 188 |
+
<td><p style="text-align: left">TODO</p></td>
|
| 189 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 190 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 191 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 192 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 193 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 194 |
+
</tr>
|
| 195 |
+
<tr>
|
| 196 |
+
<td><p style="text-align: left">GPU</p></td>
|
| 197 |
+
<td><p style="text-align: left">TODO</p></td>
|
| 198 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 199 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 200 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 201 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 202 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 203 |
+
</tr>
|
| 204 |
+
</table>
|
| 205 |
+
|
| 206 |
+
### IoT
|
| 207 |
+
|
| 208 |
+
Benchmarked on Raspberry Pi 5 (16 GB).
|
| 209 |
+
|
| 210 |
+
<table border="1">
|
| 211 |
+
<tr>
|
| 212 |
+
<th style="text-align: left">Backend</th>
|
| 213 |
+
<th style="text-align: left">Quantization scheme</th>
|
| 214 |
+
<th style="text-align: left">Prefill (tokens/sec)</th>
|
| 215 |
+
<th style="text-align: left">Decode (tokens/sec)</th>
|
| 216 |
+
<th style="text-align: left">Time-to-first-token (sec)</th>
|
| 217 |
+
<th style="text-align: left">Model size (MB)</th>
|
| 218 |
+
<th style="text-align: left">CPU Memory (RSS in MB)</th>
|
| 219 |
+
<th></th>
|
| 220 |
+
</tr>
|
| 221 |
+
<tr>
|
| 222 |
+
<td><p style="text-align: left">CPU</p></td>
|
| 223 |
+
<td><p style="text-align: left">TODO</p></td>
|
| 224 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 225 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 226 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 227 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 228 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 229 |
+
</tr>
|
| 230 |
+
</table>
|
| 231 |
+
|
| 232 |
+
Benchmarked on Qualcomm IQ-8275 EVK.
|
| 233 |
+
|
| 234 |
+
<table border="1">
|
| 235 |
+
<tr>
|
| 236 |
+
<th style="text-align: left">Backend</th>
|
| 237 |
+
<th style="text-align: left">Quantization scheme</th>
|
| 238 |
+
<th style="text-align: left">Prefill (tokens/sec)</th>
|
| 239 |
+
<th style="text-align: left">Decode (tokens/sec)</th>
|
| 240 |
+
<th style="text-align: left">Time-to-first-token (sec)</th>
|
| 241 |
+
<th style="text-align: left">Model size (MB)</th>
|
| 242 |
+
<th style="text-align: left">CPU Memory (RSS in MB)</th>
|
| 243 |
+
<th></th>
|
| 244 |
+
</tr>
|
| 245 |
+
<tr>
|
| 246 |
+
<td><p style="text-align: left">CPU</p></td>
|
| 247 |
+
<td><p style="text-align: left">TODO</p></td>
|
| 248 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 249 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 250 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 251 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 252 |
+
<td><p style="text-align: right">TODO</p></td>
|
| 253 |
+
</tr>
|
| 254 |
</table>
|
| 255 |
|
| 256 |
## Gemma 4 E2B Performance on Web
|