gemma-4-26B-A4B-it-MXFP4_MOE StorageLLM MoE JUJU Runtime

Original upstream model: https://huggingface.co/google/gemma-4-26B-A4B-it
Quantized GGUF source repo: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF
Quantized GGUF source path: gemma-4-26B-A4B-it-MXFP4_MOE.gguf
Runtime code: https://github.com/jujumelona/storage.llm
Runtime/model-card license: MIT

This Hugging Face repo is a StorageLLM MoE/JUJU runtime model package, not a bucket of perplexity (PPL) artifacts. PPL is only a correctness validation step run after the runtime package is generated.

What Users Must Download

Download the whole Hugging Face repo for normal use. Do not download only *.juju; that gives the engine weights without the tokenizer, config, runtime metadata, validation files, and performance planning sidecars.

hf download storagejuju/gemma-4-26b-a4b-it-mxfp4-moe-juju --local-dir <model_root>

If you must use include filters, include every pattern below:

hf download storagejuju/gemma-4-26b-a4b-it-mxfp4-moe-juju --local-dir <model_root> \
  --include "*.juju" \
  --include "*.juju.idx" \
  --include "*.juju.verify.json" \
  --include "verify/*.json" \
  --include "runtime_assets_manifest.json" \
  --include "storagellm_runtime_contract.json" \
  --include "storagellm_performance_metadata_manifest.json" \
  --include "metadata/**" \
  --include "README.md" \
  --include "config.json" \
  --include "generation_config.json" \
  --include "tokenizer.json" \
  --include "tokenizer_config.json" \
  --include "special_tokens_map.json" \
  --include "added_tokens.json" \
  --include "chat_template.jinja" \
  --include "tokenizer.model" \
  --include "sentencepiece.bpe.model" \
  --include "tiktoken.model" \
  --include "vocab.json" \
  --include "merges.txt" \
  --include "processor_config.json" \
  --include "preprocessor_config.json" \
  --include "image_processor_config.json" \
  --include "feature_extractor.json" \
  --include "video_preprocessor_config.json" \
  --include "audio_config.json" \
  --include "tokenization_*.py" \
  --include "configuration_*.py" \
  --include "modeling_*.py" \
  --include "processing_*.py" \
  --include "*_processor.py" \
  --include "*_processing.py" \
  --include "*_utils.py"
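
The same filtered download can be sketched from Python with huggingface_hub's snapshot_download, where allow_patterns takes the place of the repeated --include flags. This is a minimal sketch and not part of the StorageLLM runtime: the names INCLUDE_PATTERNS, is_included, and download_filtered are hypothetical, and the pattern list is copied from the command above. is_included uses fnmatch-style matching so a path can be checked locally without network access; treat it as an approximation of how the Hub client filters files.

```python
from fnmatch import fnmatch

# Patterns copied from the `hf download` command above.
INCLUDE_PATTERNS = [
    "*.juju", "*.juju.idx", "*.juju.verify.json", "verify/*.json",
    "runtime_assets_manifest.json", "storagellm_runtime_contract.json",
    "storagellm_performance_metadata_manifest.json", "metadata/**",
    "README.md", "config.json", "generation_config.json",
    "tokenizer.json", "tokenizer_config.json", "special_tokens_map.json",
    "added_tokens.json", "chat_template.jinja", "tokenizer.model",
    "sentencepiece.bpe.model", "tiktoken.model", "vocab.json", "merges.txt",
    "processor_config.json", "preprocessor_config.json",
    "image_processor_config.json", "feature_extractor.json",
    "video_preprocessor_config.json", "audio_config.json",
    "tokenization_*.py", "configuration_*.py", "modeling_*.py",
    "processing_*.py", "*_processor.py", "*_processing.py", "*_utils.py",
]


def is_included(repo_path: str) -> bool:
    """Return True if a repo-relative path matches any include pattern."""
    return any(fnmatch(repo_path, pattern) for pattern in INCLUDE_PATTERNS)


def download_filtered(model_root: str) -> str:
    """Download only files matching INCLUDE_PATTERNS into model_root."""
    # Imported lazily so the pattern helpers work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="storagejuju/gemma-4-26b-a4b-it-mxfp4-moe-juju",
        local_dir=model_root,
        allow_patterns=INCLUDE_PATTERNS,
    )
```

Calling download_filtered("<model_root>") performs the download; is_included("metadata/gguf/example.json") can be used beforehand to confirm that a given file would be kept.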

The engine consumes structured sidecar JSON/YAML/TOML files during model_root load. They are not decoration: config, runtime manifests, graph/priority/prefetch/residency, QKV, offload policy, validation, and metadata JSON are merged into the runtime metadata path so tokenizer, attention, router, rope, embedding, GraphIR, tensor layout, KV, final norm, LM head, and planning code can see the same contract.

The runtime needs these groups:

JUJU package:

<original_shard_stem>.juju
<original_shard_stem>.juju.idx
<original_shard_stem>.juju.verify.json
runtime_assets_manifest.json
storagellm_runtime_contract.json

Text/API assets:

config.json
generation_config.json
tokenizer.json or tokenizer model file
tokenizer_config.json
chat_template.jinja
special_tokens_map.json and added_tokens.json when present

Processor/custom-code assets when the model needs them:

processor_config.json, preprocessor_config.json, image_processor_config.json
feature_extractor.json, video_preprocessor_config.json, audio_config.json
tokenization_*.py, configuration_*.py, modeling_*.py, processing_*.py, *_processor.py

Engine/runtime performance metadata:

metadata/storagellm/*run_summary*.json
metadata/gguf/*.json, metadata/safetensors/*.json, metadata/sidecar/*.json
verify/*.juju.verify.json
storagellm_performance_metadata_manifest.json
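
After a download, the groups above can be turned into a quick presence check. The following is a minimal sketch under stated assumptions, not the engine's own validation: the function name missing_assets and the constants are hypothetical, the fixed filenames are taken from the lists above, any one of the tokenizer files named in the include filters is assumed to satisfy "tokenizer.json or tokenizer model file", and the verify file is accepted either next to its shard or under verify/. The text does not state which files are strictly mandatory, so treat a reported gap as a prompt to re-check the download.

```python
from pathlib import Path

# Fixed filenames taken from the groups listed above.
REQUIRED_FILES = [
    "runtime_assets_manifest.json",
    "storagellm_runtime_contract.json",
    "storagellm_performance_metadata_manifest.json",
    "config.json",
    "generation_config.json",
    "tokenizer_config.json",
    "chat_template.jinja",
]

# Assumption: any one of these counts as the tokenizer asset.
TOKENIZER_CANDIDATES = [
    "tokenizer.json",
    "tokenizer.model",
    "sentencepiece.bpe.model",
    "tiktoken.model",
]


def missing_assets(model_root: str) -> list[str]:
    """Return descriptions of expected files that are absent from model_root."""
    root = Path(model_root)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]

    if not any((root / name).is_file() for name in TOKENIZER_CANDIDATES):
        missing.append("tokenizer.json or tokenizer model file")

    shards = sorted(root.glob("*.juju"))
    if not shards:
        missing.append("*.juju")
    for shard in shards:
        # Each shard is expected to ship with an index file.
        if not (root / f"{shard.name}.idx").is_file():
            missing.append(f"{shard.name}.idx")
        # Assumption: the verify file may sit beside the shard or under verify/.
        verify_name = f"{shard.name}.verify.json"
        beside = (root / verify_name).is_file()
        nested = (root / "verify" / verify_name).is_file()
        if not (beside or nested):
            missing.append(verify_name)
    return missing
```

An empty list from missing_assets("<model_root>") means every file this sketch looks for is present; it says nothing about whether the contents are valid.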

Generated sidecar upload policy:

Only README.md may be uploaded as Markdown; it is the Hugging Face model card.
README.md is not runtime metadata and is never used as an engine contract.
Runtime/performance sidecars are structured JSON/YAML/TOML only.
Generated analysis/performance .md, .pdf, .txt, .csv, .html, and .ipynb files are blocked from upload.
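
The upload policy above can be expressed as a small filter. This is an illustrative sketch only, not the packaging tool's actual code: the function name may_upload_generated_sidecar is hypothetical, it is meant for generated sidecar and report files rather than model assets such as *.juju shards or tokenizer files, and accepting .yml alongside .yaml is an assumption.

```python
from pathlib import PurePosixPath

# Extensions the policy names as blocked for generated analysis/performance files.
BLOCKED_SUFFIXES = {".md", ".pdf", ".txt", ".csv", ".html", ".ipynb"}

# Structured formats the policy allows; ".yml" is an assumed alias for YAML.
STRUCTURED_SUFFIXES = {".json", ".yaml", ".yml", ".toml"}


def may_upload_generated_sidecar(repo_path: str) -> bool:
    """Return True if a generated sidecar/report file passes the upload policy."""
    # The top-level model card is the only Markdown file that may be uploaded.
    if repo_path == "README.md":
        return True
    suffix = PurePosixPath(repo_path).suffix.lower()
    if suffix in BLOCKED_SUFFIXES:
        return False
    return suffix in STRUCTURED_SUFFIXES
```

For example, may_upload_generated_sidecar("metadata/gguf/example.json") passes, while a generated report.md or analysis.ipynb is rejected.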

Runtime flags embedded in the JUJU contract: none.

Download note: download the full HF repo, not only *.juju, so the engine also gets the tokenizer, config, chat template, processor, and custom-code assets along with the StorageLLM performance metadata sidecars. When it accesses the source repo, the notebook sends SOURCE_HF_TOKEN if it is set and otherwise falls back to the same HF_TOKEN used for upload. This keeps the flow working if the source repo becomes gated, provided the account behind the token has accepted access.
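
The token fallback described in the download note can be sketched as follows. This is a guess at the logic, not the notebook's actual code: the function name resolve_source_token is hypothetical, and treating an empty variable as "not set" is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_source_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer SOURCE_HF_TOKEN, fall back to HF_TOKEN, else return None."""
    # Assumption: an empty string is treated the same as an unset variable.
    return env.get("SOURCE_HF_TOKEN") or env.get("HF_TOKEN") or None
```

The result can be passed as the token argument of a Hub download call; None leaves the request anonymous, which only works while the source repo is public.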
