gemma-4-26B-A4B-it-MXFP4_MOE StorageLLM MoE JUJU Runtime

  • Original upstream model: https://huggingface.co/google/gemma-4-26B-A4B-it
  • Quantized GGUF source repo: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF
  • Quantized GGUF source path: gemma-4-26B-A4B-it-MXFP4_MOE.gguf
  • Runtime code: https://github.com/jujumelona/storage.llm
  • Runtime/model-card license: MIT

This Hugging Face repo is a StorageLLM MoE/JUJU runtime model package. It is not a bucket of perplexity (PPL) artifacts: PPL is only a correctness validation step that runs after the runtime package is generated.

What Users Must Download

Download the whole Hugging Face repo for normal use. Do not download only *.juju; that gives the engine weights without the tokenizer, config, runtime metadata, validation files, and performance planning sidecars.

hf download storagejuju/gemma-4-26b-a4b-it-mxfp4-moe-juju --local-dir <model_root>

If you must use include filters, include every pattern below:

hf download storagejuju/gemma-4-26b-a4b-it-mxfp4-moe-juju --local-dir <model_root> \
  --include "*.juju" \
  --include "*.juju.idx" \
  --include "*.juju.verify.json" \
  --include "verify/*.json" \
  --include "runtime_assets_manifest.json" \
  --include "storagellm_runtime_contract.json" \
  --include "storagellm_performance_metadata_manifest.json" \
  --include "metadata/**" \
  --include "README.md" \
  --include "config.json" \
  --include "generation_config.json" \
  --include "tokenizer.json" \
  --include "tokenizer_config.json" \
  --include "special_tokens_map.json" \
  --include "added_tokens.json" \
  --include "chat_template.jinja" \
  --include "tokenizer.model" \
  --include "sentencepiece.bpe.model" \
  --include "tiktoken.model" \
  --include "vocab.json" \
  --include "merges.txt" \
  --include "processor_config.json" \
  --include "preprocessor_config.json" \
  --include "image_processor_config.json" \
  --include "feature_extractor.json" \
  --include "video_preprocessor_config.json" \
  --include "audio_config.json" \
  --include "tokenization_*.py" \
  --include "configuration_*.py" \
  --include "modeling_*.py" \
  --include "processing_*.py" \
  --include "*_processor.py" \
  --include "*_processing.py" \
  --include "*_utils.py"
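
For scripted downloads, the same include filters can be sketched in Python. This is a minimal sketch, not part of the StorageLLM runtime: it assumes the huggingface_hub package is installed and uses its snapshot_download function with allow_patterns. The names INCLUDE_PATTERNS, is_included, and download_filtered are hypothetical helpers introduced here for illustration.

```python
from fnmatch import fnmatch

# The include patterns from the hf download command above, in the same order.
INCLUDE_PATTERNS = [
    "*.juju", "*.juju.idx", "*.juju.verify.json", "verify/*.json",
    "runtime_assets_manifest.json", "storagellm_runtime_contract.json",
    "storagellm_performance_metadata_manifest.json", "metadata/**",
    "README.md", "config.json", "generation_config.json",
    "tokenizer.json", "tokenizer_config.json", "special_tokens_map.json",
    "added_tokens.json", "chat_template.jinja", "tokenizer.model",
    "sentencepiece.bpe.model", "tiktoken.model", "vocab.json", "merges.txt",
    "processor_config.json", "preprocessor_config.json",
    "image_processor_config.json", "feature_extractor.json",
    "video_preprocessor_config.json", "audio_config.json",
    "tokenization_*.py", "configuration_*.py", "modeling_*.py",
    "processing_*.py", "*_processor.py", "*_processing.py", "*_utils.py",
]


def is_included(repo_path: str) -> bool:
    """Return True if a repo-relative path matches at least one include pattern."""
    return any(fnmatch(repo_path, pattern) for pattern in INCLUDE_PATTERNS)


def download_filtered(local_dir: str) -> str:
    """Download only the matching files into local_dir (requires network access)."""
    # Imported lazily so the pattern helpers above work without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="storagejuju/gemma-4-26b-a4b-it-mxfp4-moe-juju",
        local_dir=local_dir,
        allow_patterns=INCLUDE_PATTERNS,
    )
```

Calling is_included on a path from a repo file listing is a quick way to confirm that a filter set does not silently drop a sidecar before starting a large download.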

The engine consumes structured sidecar JSON/YAML/TOML files when it loads model_root. They are not decoration: the config, runtime manifests, graph/priority/prefetch/residency data, QKV, offload policy, validation, and metadata JSON are merged into the runtime metadata path, so that the tokenizer, attention, router, RoPE, embedding, GraphIR, tensor layout, KV, final norm, LM head, and planning code all see the same contract.

The runtime needs these groups:

JUJU package:

  • <original_shard_stem>.juju
  • <original_shard_stem>.juju.idx
  • <original_shard_stem>.juju.verify.json
  • runtime_assets_manifest.json
  • storagellm_runtime_contract.json

Text/API assets:

  • config.json
  • generation_config.json
  • tokenizer.json or tokenizer model file
  • tokenizer_config.json
  • chat_template.jinja
  • special_tokens_map.json and added_tokens.json when present

Processor/custom-code assets when the model needs them:

  • processor_config.json, preprocessor_config.json, image_processor_config.json
  • feature_extractor.json, video_preprocessor_config.json, audio_config.json
  • tokenization_*.py, configuration_*.py, modeling_*.py, processing_*.py, *_processor.py

Engine/runtime performance metadata:

  • metadata/storagellm/*run_summary*.json
  • metadata/gguf/*.json, metadata/safetensors/*.json, metadata/sidecar/*.json
  • verify/*.juju.verify.json
  • storagellm_performance_metadata_manifest.json
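
Before pointing the engine at a downloaded model_root, it can help to check that the JUJU package and text/API groups listed above are present. The following is a rough sketch under stated assumptions, not the engine's own loader check: missing_assets, REQUIRED_FILES, and TOKENIZER_CANDIDATES are hypothetical names, and the set of accepted tokenizer model files is assumed from the include patterns earlier in this card.

```python
from pathlib import Path

# Fixed-name files from the "JUJU package" and "Text/API assets" groups.
REQUIRED_FILES = [
    "runtime_assets_manifest.json",
    "storagellm_runtime_contract.json",
    "config.json",
    "generation_config.json",
    "tokenizer_config.json",
    "chat_template.jinja",
]

# Assumption: any one of these satisfies "tokenizer.json or tokenizer model file".
TOKENIZER_CANDIDATES = [
    "tokenizer.json",
    "tokenizer.model",
    "sentencepiece.bpe.model",
    "tiktoken.model",
]


def missing_assets(model_root) -> list:
    """Return a list of human-readable names for assets missing from model_root."""
    root = Path(model_root)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]

    if not any((root / name).is_file() for name in TOKENIZER_CANDIDATES):
        missing.append("tokenizer.json or tokenizer model file")

    # Every <stem>.juju shard needs its .juju.idx and .juju.verify.json companions.
    shards = sorted(root.glob("*.juju"))
    if not shards:
        missing.append("*.juju")
    for shard in shards:
        for suffix in (".idx", ".verify.json"):
            companion = shard.with_name(shard.name + suffix)
            if not companion.is_file():
                missing.append(companion.name)
    return missing
```

An empty list means the checked groups are complete. Processor/custom-code assets and performance metadata are not checked here because they depend on the model.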

Generated sidecar upload policy:

  • Only README.md may be uploaded as Markdown; it is the Hugging Face model card.
  • README.md is not runtime metadata and is never used as an engine contract.
  • Runtime/performance sidecars are structured JSON/YAML/TOML only.
  • Generated analysis/performance .md, .pdf, .txt, .csv, .html, and .ipynb files are blocked from upload.
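
The upload policy for generated sidecars can be expressed as a small allowlist. This is a minimal sketch of the rules as stated above, not the packaging notebook's actual code: may_upload_generated and STRUCTURED_SUFFIXES are hypothetical names, and treating .yml as equivalent to .yaml is an assumption. It applies only to generated sidecar files, not to upstream tokenizer assets such as merges.txt.

```python
from pathlib import PurePosixPath

# Structured formats allowed for generated runtime/performance sidecars.
# Assumption: ".yml" is accepted alongside ".yaml".
STRUCTURED_SUFFIXES = {".json", ".yaml", ".yml", ".toml"}


def may_upload_generated(repo_path: str) -> bool:
    """Decide whether a generated file (repo-relative path) may be uploaded."""
    # The model card at the repo root is the only Markdown file allowed.
    if repo_path == "README.md":
        return True
    # Everything else must be structured; .md, .pdf, .txt, .csv, .html,
    # and .ipynb are therefore rejected.
    return PurePosixPath(repo_path).suffix.lower() in STRUCTURED_SUFFIXES
```

An allowlist is used rather than a blocklist so that a new unstructured format is rejected by default.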

Runtime flags embedded in the JUJU contract: none.

Download note: download the full HF repo, not only *.juju, so the engine also gets the tokenizer, config, chat-template, processor, and custom-code assets and the StorageLLM performance metadata sidecars. The notebook sends SOURCE_HF_TOKEN when it is set; otherwise it uses the same HF_TOKEN that is used for upload. This keeps the flow working if the source repo becomes gated, provided the token's account has accepted access.
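
The token fallback described in the download note can be sketched as follows. This is an illustration only, not the notebook's code: resolve_source_token is a hypothetical name, and treating an empty variable as unset is an assumption.

```python
import os


def resolve_source_token(env=None):
    """Prefer SOURCE_HF_TOKEN; fall back to HF_TOKEN; return None if neither is set."""
    env = os.environ if env is None else env
    # Assumption: an empty string counts as "not set".
    return env.get("SOURCE_HF_TOKEN") or env.get("HF_TOKEN") or None
```

The returned value can be passed as the token argument of a download call; None means an anonymous request, which works only while the source repo is public.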
