G4-E4B-Musica-v1 LiteRT-LM

Quantized LiteRT-LM export of AuriAetherwiing/G4-E4B-Musica-v1 for on-device text-generation experiments.

This is a text-only export. The source checkpoint is a Gemma4 conditional generation model with image/audio/video configuration, but this upload contains only the output of the LiteRT Hugging Face text-generation export path.

Use

Use model.litertlm with LiteRT-LM:

litert-lm run \
  --from-huggingface-repo=allura-forge/G4-E4B-Musica-v1-LiteRT-TFLite \
  model.litertlm \
  --prompt="Write a short musical scene."

The separate .tflite files are included for inspection or custom runtimes. For LiteRT-LM itself, model.litertlm is the simpler entry point.

Files

  • model.litertlm - bundled LiteRT-LM package containing the quantized prefill/decode model, token embedder, Gemma4 per-layer embedder, tokenizer, and LLM metadata.
  • model_quantized.tflite - prefill/decode model, signatures prefill_64 and decode.
  • embedder_quantized.tflite - token embedder, signatures prefill_embedder_64 and decode_embedder.
  • per_layer_embedder_quantized.tflite - Gemma4 per-layer embedder, signatures prefill_per_layer_embedder_64 and decode_per_layer_embedder.
  • tokenizer.json, tokenizer_config.json, chat_template.jinja - tokenizer and prompt formatting assets.
  • source_config.json, source_generation_config.json, source_processor_config.json - source model config snapshots used for conversion reference.
  • conversion_info.json - command, versions, and validation notes.

Conversion

Converted locally with litert_torch.generative.export_hf from litert-torch 0.10.0 using:

python -m litert_torch.generative.export_hf \
  agent_workspace/g4-e4b-musica-v1-full \
  agent_workspace/g4-e4b-musica-v1-litert-text \
  --task=text_generation \
  --externalize_embedder=True \
  '--prefill_lengths=[64]' \
  --cache_length=256 \
  --bundle_litert_lm=False \
  --keep_temporary_files=True

Quantization recipe: dynamic_wi8_afp32.

The .litertlm package was built from the quantized TFLite artifacts with a chunked-copy workaround. The workaround avoids a current Python builder issue in which a one-shot os.sendfile call can truncate sections larger than 2 GiB.
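
The idea behind the chunked-copy workaround can be sketched as follows. This is a minimal illustration, not the actual builder patch: the function name copy_section_chunked and the 64 MiB chunk size are hypothetical. It copies a section in bounded reads and verifies the byte count, so a short copy is reported instead of silently truncating the section.

```python
import os

# Hypothetical chunk size (64 MiB); not taken from the original build.
CHUNK_SIZE = 64 * 1024 * 1024


def copy_section_chunked(src_path, dst_file, chunk_size=CHUNK_SIZE):
    """Append the file at src_path to the open binary file dst_file.

    Data is moved in bounded chunks rather than one large transfer.
    Returns the number of bytes written and raises OSError if that
    number differs from the size of the source file.
    """
    expected = os.path.getsize(src_path)
    written = 0
    with open(src_path, "rb") as src:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of file
                break
            dst_file.write(chunk)
            written += len(chunk)
    if written != expected:
        raise OSError(f"short copy: wrote {written} of {expected} bytes")
    return written
```

Comparing the returned count with the size of each source .tflite file gives the same kind of check as the section-size comparison described under Validation.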

The bundled chat template is the official LiteRT-LM Gemma4 template from litert-community/gemma-4-E4B-it-litert-lm. The source model template used Jinja map method calls that are not accepted by the mobile LiteRT-LM template evaluator.

Validation

The exported quantized files were loaded with ai_edge_litert.interpreter.Interpreter, and their expected signatures were present. No end-to-end generation quality evaluation was run after conversion.
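
A signature check of this kind might look like the sketch below. It assumes the ai-edge-litert package is installed and that its Interpreter exposes get_signature_list() in the same way as the TensorFlow Lite interpreter. The helper names are hypothetical, and the expected signature names are taken from the Files section of this card.

```python
# Expected signatures per file, as listed in the Files section.
EXPECTED_SIGNATURES = {
    "model_quantized.tflite": {"prefill_64", "decode"},
    "embedder_quantized.tflite": {"prefill_embedder_64", "decode_embedder"},
    "per_layer_embedder_quantized.tflite": {
        "prefill_per_layer_embedder_64",
        "decode_per_layer_embedder",
    },
}


def missing_signatures(found, expected):
    """Return the sorted expected signature names that are absent from found."""
    return sorted(set(expected) - set(found))


def check_file(path, expected):
    """Load one .tflite file and return its missing signatures (hypothetical helper)."""
    # Imported lazily so missing_signatures works without the runtime installed.
    from ai_edge_litert.interpreter import Interpreter

    interpreter = Interpreter(model_path=path)
    # Assumption: get_signature_list() returns a dict keyed by signature name.
    return missing_signatures(interpreter.get_signature_list().keys(), expected)
```

Calling check_file(name, expected) for each entry in EXPECTED_SIGNATURES should return an empty list when every expected signature is present. This only confirms that the files load and expose the right entry points; it says nothing about generation quality.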

The model.litertlm header was parsed after packaging. It contains five sections: LLM metadata, compressed Hugging Face tokenizer, quantized prefill/decode model, quantized token embedder, and quantized per-layer embedder. The TFLite section byte sizes match the source component files.

The bundled LLM metadata was checked to ensure the template no longer contains map.get calls.
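
A comparable check can be run on plain template text, for example the contents of chat_template.jinja. The sketch below assumes that "map.get calls" means method-style .get(...) calls inside Jinja expressions. The function name is hypothetical, and extracting the template from the .litertlm metadata section is not shown.

```python
import re

# Matches a method-style ".get(" call such as "message.get('role')".
GET_CALL = re.compile(r"\.\s*get\s*\(")


def find_get_calls(template_text):
    """Return (line_number, stripped_line) pairs for lines containing a .get( call."""
    return [
        (number, line.strip())
        for number, line in enumerate(template_text.splitlines(), start=1)
        if GET_CALL.search(line)
    ]
```

An empty result means no such calls were found in the text that was scanned.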

Limitations

  • Text-generation export only.
  • Manual .tflite runtime integration must wire the external token embedder and Gemma4 per-layer embedder together with the main prefill/decode model. The model.litertlm package already bundles these components, so this wiring is only needed for custom runtimes.