LiteRT-LM

Inquiry about TPU support on Pixel 10 (Tensor G5) for Gemma 4 E2B

#32
by YoshiJPN - opened
LiteRT Community (FKA TFLite) org

Hi LiteRT Community team,

First of all, thank you so much for your continuous work and for maintaining this repository!

I recently noticed that the "gemma-4-E2B-it_Google_Tensor_G5.litertlm" file has been uploaded here.
Since the filename specifically mentions "Tensor G5," I was wondering: does this mean it is currently possible to utilize the TPU via LiteRT-LM on the Pixel 10 series?

I have been trying to run Gemma 4 E2B on a Pixel 10, but I was quite surprised to find that the GPU and CPU performance is lower than I initially expected.
I am hoping that leveraging the TPU will significantly improve inference speed.
If TPU execution is indeed supported for this model on the Pixel 10, could you please share some guidance, examples, or documentation on how to properly set it up and use it?

Thank you in advance for your time and help!

LiteRT Community (FKA TFLite) org

It's great to hear about your interest in running Gemma on the Pixel 10 TPU.

For general instructions on setting up LiteRT-LM, see: https://developers.google.com/edge/litert/next/litert_lm_npu#tensor

Yes, Tensor_G5 is the Pixel 10. You can download the model and run it on your device. Download the file and put it in a folder (I named mine Models, located at /storage/emulated/0/Models). Then, in Google AI Edge Gallery, select the icon at the top left, select Models, scroll down to the bottom, and select Import. Select only the NPU as the accelerator and upload the model. Then select AI Chat; you should see Gemma4 E2b G5 as an option. Select it and try it out.
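
For anyone who prefers copying the model from a computer instead of downloading it on the phone, here is a minimal sketch that only builds and prints the adb commands. It assumes adb is installed and USB debugging is enabled; the helper names are hypothetical, and the folder and file name are the ones mentioned in this thread.

```python
import shlex

# Folder and file name as mentioned in this thread; adjust to your own setup.
DEVICE_DIR = "/storage/emulated/0/Models"
MODEL_FILE = "gemma-4-E2B-it_Google_Tensor_G5.litertlm"


def adb_mkdir_command(device_dir: str = DEVICE_DIR) -> list[str]:
    """Build the adb command that creates the target folder on the device."""
    return ["adb", "shell", "mkdir", "-p", device_dir]


def adb_push_command(local_path: str, device_dir: str = DEVICE_DIR) -> list[str]:
    """Build the adb command that copies a local file into the device folder."""
    return ["adb", "push", local_path, f"{device_dir}/"]


# Print the commands instead of running them, so nothing touches a device here.
for cmd in (adb_mkdir_command(), adb_push_command(MODEL_FILE)):
    print(shlex.join(cmd))
```

The printed commands can be pasted into a terminal, or passed to subprocess.run if you want to execute them from the script. The import steps in the app stay the same as described above.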

LiteRT Community (FKA TFLite) org

Thanks for the quick reply.
I'll give it a try!

LiteRT Community (FKA TFLite) org
edited 10 days ago

Hi,
I am testing the Google AI Edge Gallery app on a Pixel 10 Pro via Remote Device Streaming.

App crash on Pixel 10 Pro (Tensor G5) with Gemma-4-E2B - Missing NPU Dispatch Library

I followed the instructions to import the Gemma-4-E2B model tailored for the Google Tensor G5 (gemma-4-E2B-it_Google_Tensor_G5.litertlm) and attempted to run it using the NPU as the accelerator.
However, the app crashes immediately when starting the AI Chat.

Symptom & Logcat Analysis

While the model seems to import and load successfully, the crash occurs when initializing the NPU delegate.
The logcat indicates that the dispatch library for the Tensor G5 NPU cannot be found:

06-15 06:26:47.859 23088 23105 I native  : I0000 00:00:1781530007.859187  23105 model_resources_litert_lm.cc:68] litert model size: 1899251616
06-15 06:26:47.863 23088 23105 I tflite  : Initialized TensorFlow Lite runtime.
06-15 06:26:47.865 23088 23105 E litert  : [litert_dispatch.cc:112] No dispatch library found in /data/app/~~AySovX2HxNrJ5U6SGpouTA==/com.google.ai.edge.gallery-GZDDTSZ18cF9xUczq21X4A==/lib/arm64
06-15 06:26:47.867 23088 23105 E litert  : [dispatch_delegate.cc:115] Failed to initialize Dispatch API: ERROR: [third_party/odml/litert/litert/runtime/dispatch/dispatch_delegate.cc:176]
06-15 06:26:47.867 23088 23105 E litert  : [dispatch_delegate.cc:130] Failed to create a dispatch delegate kernel: No usable Dispatch runtime found
06-15 06:26:47.867 23088 23105 F libc: Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 23105 (DefaultDispatch), pid 23088 (ai.edge.gallery)
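
For others hitting the same crash, a small sketch like the following can confirm whether a saved logcat dump shows this specific failure. It only matches the two message strings that appear in the log above; the function names are hypothetical, and the sample text is copied from that log with the wrapped path joined back onto one line.

```python
import re

# Matches the "No dispatch library found in <dir>" message from the log above.
NO_DISPATCH_LIB = re.compile(r"No dispatch library found in\s+(\S+)")


def missing_dispatch_dirs(logcat_text: str) -> list[str]:
    """Return every directory that was searched without finding a dispatch library."""
    return NO_DISPATCH_LIB.findall(logcat_text)


def has_dispatch_runtime_failure(logcat_text: str) -> bool:
    """True if the log contains the 'No usable Dispatch runtime found' error."""
    return "No usable Dispatch runtime found" in logcat_text


# Two lines taken from the log in this post.
sample = (
    "06-15 06:26:47.865 23088 23105 E litert  : [litert_dispatch.cc:112] "
    "No dispatch library found in /data/app/~~AySovX2HxNrJ5U6SGpouTA==/"
    "com.google.ai.edge.gallery-GZDDTSZ18cF9xUczq21X4A==/lib/arm64\n"
    "06-15 06:26:47.867 23088 23105 E litert  : [dispatch_delegate.cc:130] "
    "Failed to create a dispatch delegate kernel: No usable Dispatch runtime found\n"
)

print(missing_dispatch_dirs(sample))
print(has_dispatch_runtime_failure(sample))
```

To use it on a real dump, read the file saved from adb logcat -d and pass its contents to both functions.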

Feedback & Questions

I reviewed the documentation on LiteRT-LM NPU compilation for Tensor (https://developers.google.com/edge/litert/next/litert_lm_npu#tensor), which mentions building the Dispatch API.
However, manually building the shared library (libLiteRtDispatch_GoogleTensor.so) and injecting it into the Gallery app is extremely difficult for general testing.
I actually attempted this build process earlier but could not successfully complete it.

  1. Is the Tensor G5 dispatch library supposed to be pre-packaged in the Google AI Edge Gallery app?
  2. If not, could you provide an updated APK/build of the AI Edge Gallery that already contains the Tensor G5 dispatch libraries?
  3. Or, is there a simpler way to verify Gemma-4-E2B NPU execution on the Pixel 10 Pro without having to manually build and bundle the dispatch library?
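
Related to question 1, one way to check whether a particular APK bundles the dispatch library is to list its contents, since an APK is a zip archive with native libraries under lib/<abi>/. The sketch below is an assumption-laden illustration: the function names are hypothetical, the in-memory archive is only a stand-in for a real APK pulled from a device, and apps delivered as split APKs may keep native libraries in a separate split file.

```python
import io
import zipfile

# Library name as given in this post.
DISPATCH_LIB = "libLiteRtDispatch_GoogleTensor.so"


def bundled_dispatch_libs(apk_file) -> list[str]:
    """List entries under lib/ whose file name is the Tensor dispatch library."""
    with zipfile.ZipFile(apk_file) as apk:
        return [
            name
            for name in apk.namelist()
            if name.startswith("lib/") and name.rsplit("/", 1)[-1] == DISPATCH_LIB
        ]


def fake_apk(entries: list[str]) -> io.BytesIO:
    """Build an in-memory zip with empty entries, standing in for a real APK."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for entry in entries:
            archive.writestr(entry, b"")
    buf.seek(0)
    return buf


with_lib = fake_apk(["classes.dex", f"lib/arm64-v8a/{DISPATCH_LIB}"])
without_lib = fake_apk(["classes.dex", "lib/arm64-v8a/libother.so"])
print(bundled_dispatch_libs(with_lib))
print(bundled_dispatch_libs(without_lib))
```

For a real check, pass the path of the pulled APK file to bundled_dispatch_libs instead of the in-memory stand-in. An empty result would be consistent with the "No dispatch library found" error above.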

Thank you for your help.

Best regards,

Someone made a fork of Edge Gallery that should run your G5 models. The Google Tensor runtime is packaged in his app: https://github.com/jegly/Box

LiteRT Community (FKA TFLite) org

Thank you!
I've been keeping an eye on that repository.
I'll give it a try.

LiteRT Community (FKA TFLite) org

@xThr45hx
Thank you for suggesting the jegly/Box repo.

I attempted to load the gemma-4-E2B-it_Google_Tensor_G5.litertlm model using it, but I encountered an issue:
I could only select "CPU" as the accelerator.
The "NPU" option was not available for selection.

Are there any specific configurations or extra steps required to enable NPU execution in this app?
Also, may I ask if you have had the chance to successfully test this specific setup on the NPU yourself?

Any insights would be greatly appreciated.
Thanks again for your help!

In the Box app, select Import at the top right of the main screen, then select "import model from local model file" at the bottom of the screen. Select your model folder, select your model, and select your settings.

I suggest trying Gemma3 1B IT G5 as a test. Select only the NPU as the accelerator, nothing else, and see if that works. Make sure you deselect CPU.

Gemma4 E2B IT is multimodal, meaning it contains more than one model and is a more complicated compile. I'm not sure whether any model in their litertlm container other than prefill_decode is NPU accelerated; some parts may use the GPU and some the CPU. My only other suggestion would be to try every combination: NPU only, NPU/GPU, NPU/CPU, NPU/GPU/CPU.
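
To keep track of which combinations have been tried, a tiny sketch like this enumerates them. The function name is hypothetical and the labels are just the accelerator names used in this thread, not identifiers from any app or API.

```python
from itertools import combinations


def npu_combinations(others=("GPU", "CPU")) -> list[tuple[str, ...]]:
    """Every accelerator selection that includes the NPU, smallest first."""
    combos = []
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            combos.append(("NPU",) + extra)
    return combos


for combo in npu_combinations():
    print("/".join(combo))
```

This yields the same four selections listed above, in the same order, so each one can be ticked off as it is tested in the app.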

I personally don't have a Pixel 10. I have a Pixel 9 Pro XL and have been experimenting with compiling my own NPU models. The Google Tensor ML SDK's compiler is very picky, and it's a beta; I have only managed to compile a few models myself. I made an EmbeddingGemma G4 and too many different variations of Gemma3 1B IT. On my Pixel 9, models with more than 24 layers (Gemma3 has 26) error out no matter what I try, and the compiler is very picky with quants. I have a working and coherent AoT-compiled 24-layer w4w8 mixed G4 model built from the stock safetensors (mobile uses 4-bit QAT), and it's slower than stock GPU/CPU, but that's the furthest I've gotten with it. It required rank-2 FC surgery (a DarwiNN op-set fix), a workaround Claude made for a compiler bug (I'm not sure whether anyone besides me has publicly gotten around the compile bug).
My only fully working 26-layer NPU model uses the JiT mix4blk8 recipe with 4-bit QAT weights. I select only the NPU accelerator; only 33 percent of ops get NPU delegation, and the rest fall back to CPU. (Google's official docs say JiT is not supported with Google Tensor.)

So other than testing in the LiteRT-LM CLI, Box is the easiest app with the runtimes included. Wish I could help you more.

LiteRT Community (FKA TFLite) org

Thank you for the detailed comment! It was really helpful and insightful.

I haven't been able to get the G5-compatible litertlm file running just yet.
Since my main goal is simply to utilize the TPU/NPU for computations, I decided to pivot slightly and use the AICore beta instead.

I found the Box app a bit tricky to use for my needs, so I ended up making some slight modifications to the official gallery app.
So far, I've successfully confirmed it working on both a Pixel 9 Pro Fold and a Galaxy S26.

That being said, using it doesn't necessarily result in a massive speedup, which was a bit surprising.
Still, it was quite an interesting result in its own right!

I will go ahead and close this thread now.
Thanks again for your time and help!

YoshiJPN changed discussion status to closed
LiteRT Community (FKA TFLite) org

The AI Edge Gallery app released support for the Pixel 10 TPU today. The Gemma 3 1B and Function models should have Pixel 10 TPU support. The Gemma 4 E2B model can also run on the Pixel 10 TPU, but it should be sideloaded via a Hugging Face URL. This sideloading support was also released today.
