Instructions to use litert-community/gemma-4-E2B-it-litert-lm with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- LiteRT-LM
How to use litert-community/gemma-4-E2B-it-litert-lm with LiteRT-LM:
# LiteRT-LM runs on various platforms (Android, iOS, Windows, Linux, macOS, IoT, Web/WASM)
# and supports many APIs (C++, Python, Kotlin, Swift, JavaScript, Flutter).
# For platform-specific integration guides, please refer to the official developer website:
# https://ai.google.dev/edge/litert-lm
# To try LiteRT-LM, the easiest way is to use our CLI tool.
# 1. Install the LiteRT-LM CLI tool:
pip install litert-lm
# 2. Download and run this model locally:
# See: https://ai.google.dev/edge/litert-lm/cli
litert-lm run \
  --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
  model.litertlm \
  --prompt="Write me a poem"
- Notebooks
- Google Colab
- Kaggle
Model with QAT?
How can we have it in litertlm format?
It's pointing to mobile models, not here: https://huggingface.co/collections/google/gemma-4-qat-mobile
Can we convert it to litertlm?
The .litertlm models on this card already use the QAT that is discussed in the blog post. The most popular file, gemma-4-E2B-it.litertlm, uses a mixture of int2, int4 and int8 to keep it small, fast and efficient.
@marissaw Does this mean Gemma 4 E2B without the audio engine can run and consume less than 1 GB?
https://huggingface.co/developerabu/gemma-4-e2b-text-only-litertlm
I unpacked the model and repacked only the text weights.
Thank you @developerabu !
@marissaw Does this mean Gemma 4 E2B without the audio engine can run and consume less than 1 GB?
The audio, vision and drafter models should all be loaded on-demand. When running the benchmarks for this model card, we ran in a text-only mode so none of the optional models should have been loaded. This means that the memory numbers in the model card should reflect what you are asking for.
The answer is that it depends on whether you are running on CPU or GPU, which operating system you are using, which CPU/GPU vendor(s) your device has, and how you define memory usage. For example, the model running on an S26 Ultra on GPU uses only 676 MB of rusage::ru_maxrss. However, there are other device setups and definitions of memory usage that could push the memory above 1 GB. I'd recommend looking at the model card for more information.
Also, the memory usage depends on the context length you would like to use.
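For anyone who wants to compare their own numbers against the ru_maxrss figure quoted above, here is a minimal sketch of reading the same counter from Python's standard `resource` module (Unix-like systems only). It is not how the model card benchmarks were produced. The helper name `peak_rss_mb` and the 64 MB stand-in workload are made up for illustration, and you would replace the workload with your own inference call.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Return this process's peak resident set size (ru_maxrss) in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux (including Android) reports ru_maxrss in kilobytes; macOS reports bytes.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


before = peak_rss_mb()

# Hypothetical stand-in workload: allocate roughly 64 MB.
# Replace this with the model load and generation you want to measure.
buffer = b"x" * (64 * 1024 * 1024)

after = peak_rss_mb()
print(f"peak RSS before: {before:.1f} MB, after: {after:.1f} MB")
```

Note that ru_maxrss is a high-water mark for the whole process, so it never decreases, and it only counts resident pages. On some devices GPU allocations may be accounted differently, which is one reason different definitions of memory usage give different answers.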