Update README.md
README.md
CHANGED
@@ -89,7 +89,7 @@ It uses the Gemma quantization scheme that employs a mixture of 2bit, 4bit and 8
 
 ## Gemma 4 E2B on Web
 
-Running Gemma inference on the web is currently supported through [LLM Inference Engine](https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference/web_js) and uses the *gemma-4-E2B-it-web.task* model file.
+Running Gemma inference on the web is currently supported through [LLM Inference Engine](https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference/web_js) and uses the *gemma-4-E2B-it-web.task* model file. Try it out [live in your browser](https://huggingface.co/spaces/tylermullen/Gemma4) (Chrome with WebGPU recommended). To start developing with it, download [the web model](https://huggingface.co/litert-community/gemma-4-E2B-it-litert-lm/blob/main/gemma-4-E2B-it-web.task) and run with our [sample web page](https://github.com/google-ai-edge/mediapipe-samples/blob/main/examples/llm_inference/js/README.md), or follow the [guide](https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference/web_js) to add it to your own app.
 
 Benchmarked in Chrome on a MacBook Pro 2024 (Apple M4 Max) with 1024 prefill tokens and 256 decode tokens, but the model can support context lengths up to 128K.
 
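For readers following the "add it to your own app" path in the changed line, a minimal sketch of loading the web model with the LLM Inference API might look like the following. It is not taken from the README. It assumes the `@mediapipe/tasks-genai` npm package, a WebGPU-capable browser, and that the installed package version supports this model. `MODEL_URL`, `WASM_URL`, `buildLlmOptions`, and `generate` are hypothetical names chosen for illustration, and the `maxTokens` default of 1280 simply mirrors the 1024 + 256 token benchmark setup rather than any recommended value.

```typescript
// Sketch only: assumes the @mediapipe/tasks-genai package and a WebGPU-capable browser.

// Hypothetical path; point this at wherever the downloaded model file is hosted.
const MODEL_URL = "/assets/gemma-4-E2B-it-web.task";
// Assumed CDN location of the package's WASM assets.
const WASM_URL = "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai@latest/wasm";

interface LlmOptions {
  baseOptions: { modelAssetPath: string };
  maxTokens: number;
}

// Build the options object handed to LlmInference.createFromOptions.
// maxTokens bounds prompt plus response tokens; the default of 1280 is an
// illustrative choice matching the 1024 prefill + 256 decode benchmark.
function buildLlmOptions(modelUrl: string, maxTokens: number = 1024 + 256): LlmOptions {
  return { baseOptions: { modelAssetPath: modelUrl }, maxTokens };
}

// Load the model and generate one response for the given prompt.
async function generate(prompt: string): Promise<string> {
  // Imported lazily so the package is only required when inference actually runs.
  // In a bundled app, a regular static import of the package is the usual choice.
  const pkg: string = "@mediapipe/tasks-genai";
  const { FilesetResolver, LlmInference } = await import(pkg);
  const fileset = await FilesetResolver.forGenAiTasks(WASM_URL);
  const llm = await LlmInference.createFromOptions(fileset, buildLlmOptions(MODEL_URL));
  return llm.generateResponse(prompt);
}
```

In a page, this could be called as `generate("Write me a poem").then(console.log)`. The linked guide remains the authoritative reference for the available options.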