G4-E4B-Musica-v1 LiteRT-LM
Quantized LiteRT-LM export of AuriAetherwiing/G4-E4B-Musica-v1 for on-device text-generation experiments.
This is a text-only export. The source checkpoint is a Gemma4 conditional generation model configured for image, audio, and video inputs, but this upload contains only the output of the LiteRT Hugging Face text-generation export path.
Use
Use model.litertlm with LiteRT-LM:
litert-lm run \
  --from-huggingface-repo=allura-forge/G4-E4B-Musica-v1-LiteRT-TFLite \
  model.litertlm \
  --prompt="Write a short musical scene."
The separate .tflite files are included for inspection or custom runtimes. They are not the easiest entry point for LiteRT-LM.
Files
- model.litertlm - bundled LiteRT-LM package containing the quantized prefill/decode model, token embedder, Gemma4 per-layer embedder, tokenizer, and LLM metadata.
- model_quantized.tflite - prefill/decode model, signatures prefill_64 and decode.
- embedder_quantized.tflite - token embedder, signatures prefill_embedder_64 and decode_embedder.
- per_layer_embedder_quantized.tflite - Gemma4 per-layer embedder, signatures prefill_per_layer_embedder_64 and decode_per_layer_embedder.
- tokenizer.json, tokenizer_config.json, chat_template.jinja - tokenizer and prompt formatting assets.
- source_config.json, source_generation_config.json, source_processor_config.json - source model config snapshots used for conversion reference.
- conversion_info.json - command, versions, and validation notes.
Conversion
Converted locally with litert_torch.generative.export_hf from litert-torch 0.10.0 using:
python -m litert_torch.generative.export_hf \
  agent_workspace/g4-e4b-musica-v1-full \
  agent_workspace/g4-e4b-musica-v1-litert-text \
  --task=text_generation \
  --externalize_embedder=True \
  '--prefill_lengths=[64]' \
  --cache_length=256 \
  --bundle_litert_lm=False \
  --keep_temporary_files=True
Quantization recipe: dynamic_wi8_afp32.
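As a rough illustration of what the recipe name suggests (int8 weights, float32 activations), the sketch below applies symmetric per-tensor int8 quantization to a small list of weights. It is an assumption-laden teaching example in plain Python, not the quantizer actually used for this export; the function names and the per-tensor scaling choice are hypothetical.

```python
def quantize_weights_int8(weights):
    """Symmetric per-tensor int8 quantization of a flat list of floats.

    Illustrative only: the real recipe may use per-channel scales or
    other details not shown here.
    """
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude onto 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights; activations stay float32 at runtime."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
quantized, scale = quantize_weights_int8(weights)
restored = dequantize(quantized, scale)
```

Each restored weight differs from the original by at most about half of one quantization step (scale / 2), which is the precision traded for the smaller file size.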
The .litertlm package was built from the quantized TFLite artifacts. Packaging used a chunked-copy workaround for a current issue in the Python builder, where a one-shot os.sendfile call can truncate sections larger than 2 GiB.
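A chunked copy of this kind can be sketched as follows. On Linux, a single sendfile call is documented to transfer at most 0x7ffff000 bytes (just under 2 GiB), so a one-shot call on a larger section can return early with a short count. The helper name copy_section_chunked, the 64 MiB chunk size, and the demo file names are assumptions for illustration; this is not the code used to build the package.

```python
import os
import tempfile

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB per read; an arbitrary choice for this sketch


def copy_section_chunked(src_path, dst_file, chunk_size=CHUNK_SIZE):
    """Append all of src_path to an open binary file, one chunk at a time.

    Returns the number of bytes written so the caller can compare it
    with the size of the source file.
    """
    written = 0
    with open(src_path, "rb") as src:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty read means end of file
                break
            dst_file.write(chunk)
            written += len(chunk)
    return written


def demo():
    """Copy a small stand-in section after a 3-byte placeholder header."""
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "section.bin")
        dst_path = os.path.join(tmp, "package.bin")
        with open(src_path, "wb") as f:
            f.write(bytes(range(256)) * 40)  # 10,240 bytes
        with open(dst_path, "wb") as dst:
            dst.write(b"HDR")
            written = copy_section_chunked(src_path, dst, chunk_size=1000)
        return written, os.path.getsize(src_path), os.path.getsize(dst_path)
```

Comparing the returned byte count with os.path.getsize of the source file is a cheap way to catch a truncated section before the package is published.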
The bundled chat template is the official LiteRT-LM Gemma4 template from litert-community/gemma-4-E4B-it-litert-lm. The source model template used Jinja map method calls that are not accepted by the mobile LiteRT-LM template evaluator.
Validation
The exported quantized files were loaded with ai_edge_litert.interpreter.Interpreter, and their expected signatures were present. No end-to-end generation quality evaluation was run after conversion.
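A signature check along these lines could look like the sketch below. The expected names are taken from the Files list; the helper names and the model_dir default are hypothetical, and the runtime import is deferred so the comparison helper can be used without ai_edge_litert installed.

```python
import os

# Expected signature names per file, as listed in the Files section.
EXPECTED_SIGNATURES = {
    "model_quantized.tflite": {"prefill_64", "decode"},
    "embedder_quantized.tflite": {"prefill_embedder_64", "decode_embedder"},
    "per_layer_embedder_quantized.tflite": {
        "prefill_per_layer_embedder_64",
        "decode_per_layer_embedder",
    },
}


def missing_signatures(found, expected):
    """Return the expected signature names that are absent from `found`."""
    return sorted(set(expected) - set(found))


def load_signature_names(model_path):
    """List the signature names of a .tflite file using the LiteRT interpreter."""
    # Imported lazily so the pure helper above works without the runtime.
    from ai_edge_litert.interpreter import Interpreter

    interpreter = Interpreter(model_path=model_path)
    return list(interpreter.get_signature_list().keys())


def check_all(model_dir="."):
    """Map each file name to the list of signatures it is missing."""
    report = {}
    for filename, expected in EXPECTED_SIGNATURES.items():
        found = load_signature_names(os.path.join(model_dir, filename))
        report[filename] = missing_signatures(found, expected)
    return report
```

Calling check_all on a directory holding the three .tflite files should return an empty list for every file if the export is intact. This only confirms that the signatures exist; it says nothing about generation quality.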
The model.litertlm header was parsed after packaging. It contains five sections: LLM metadata, compressed Hugging Face tokenizer, quantized prefill/decode model, quantized token embedder, and quantized per-layer embedder. The TFLite section byte sizes match the source component files.
The bundled LLM metadata was checked to ensure the template no longer contains map.get calls.
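A check of this kind can be approximated with a plain text scan. The sketch below looks for method-style .get( calls in template source; the regular expression and the function name find_get_calls are assumptions, and it operates on template text such as the contents of chat_template.jinja rather than on the metadata section of the package itself.

```python
import re

# Matches a method-style call such as `message.get(` in template source.
GET_CALL = re.compile(r"\.get\s*\(")


def find_get_calls(template_text):
    """Return (line_number, stripped_line) pairs for lines containing a .get( call."""
    return [
        (number, line.strip())
        for number, line in enumerate(template_text.splitlines(), 1)
        if GET_CALL.search(line)
    ]


# Small made-up templates for illustration, not the real chat template.
offending = "{% for message in messages %}\n{{ message.get('content') }}\n{% endfor %}"
clean = "{% for message in messages %}\n{{ message['content'] }}\n{% endfor %}"

hits = find_get_calls(offending)
no_hits = find_get_calls(clean)
```

An empty result means no .get( calls were found by this pattern. It does not prove the template is otherwise compatible with the LiteRT-LM template evaluator.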
Limitations
- Text-generation export only.
- Manual .tflite runtime integration must wire the external token embedder and Gemma4 per-layer embedder together with the main prefill/decode model. This wiring is already packaged inside model.litertlm.
Model tree for allura-forge/G4-E4B-Musica-v1-LiteRT-TFLite
Base model
google/gemma-4-E4B