license: apache-2.0
base_model:
- ByteDance/Bernini-Diffusers
tags:
- wan2.2
- bernini
- fp8
- comfyui
- text-to-video
Bernini (full) Wan2.2 renderer — fp8 e4m3 scaled (ComfyUI)
The two DiT renderer transformers of the full ByteDance Bernini
pipeline (diff_dec = high-noise expert, diff_dec_low = low-noise expert),
quantized to fp8 e4m3 scaled in the ComfyUI format.
The layout is structurally identical to
Comfy-Org/Bernini-R's
wan2.2_bernini_r_*_fp8_scaled.safetensors (verified: same 1815 keys, shapes,
dtypes, and __metadata__). Only the weight values differ: here they are
the full Bernini renderer (jointly trained with the MLLM planner) rather than
the renderer-only Bernini-R.
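A layout check of this kind can be reproduced without loading any tensor data, because a .safetensors file starts with an 8-byte little-endian header length followed by a JSON header listing every tensor's dtype and shape. The sketch below is one way to do it using only the standard library; the function names are illustrative, and it is not necessarily how the comparison above was performed.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensors."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def layout(header):
    """Map each tensor name to (dtype, shape), ignoring the data offsets."""
    return {
        name: (info["dtype"], tuple(info["shape"]))
        for name, info in header.items()
        if name != "__metadata__"
    }


def same_layout(path_a, path_b):
    """True if both files share keys, dtypes, shapes, and __metadata__."""
    a = read_safetensors_header(path_a)
    b = read_safetensors_header(path_b)
    return layout(a) == layout(b) and a.get("__metadata__") == b.get("__metadata__")
```

Calling `same_layout(...)` on one file from this repository and the matching Bernini-R file (local paths of your choosing) should return `True` if the claim above holds, and `len(layout(read_safetensors_header(path)))` gives the tensor count.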
Files
| File | model_type | Size |
|---|---|---|
| wan2.2_bernini_high_noise_fp8_scaled.safetensors | bernini_high | ~15.5 GB |
| wan2.2_bernini_low_noise_fp8_scaled.safetensors | bernini_low | ~15.5 GB |
Drop them into ComfyUI/models/diffusion_models/ and use them anywhere the
Bernini-R fp8_scaled files work (same model_type, same keys).
Quantization details
- Format marker per quantized weight: `comfy_quant = {"format": "float8_e4m3fn"}`.
- Quantized Linears: `self_attn.{q,k,v,o}`, `cross_attn.{q,k,v}` (cross-attn `o` kept in fp16), `ffn.0`, `ffn.2`. That is 9 per block × 40 blocks = 360 weights per expert.
- For each quantized weight `W`: `scale = max(|W|)/448`, `W_fp8 = (W/scale).clamp(±448).to(float8_e4m3fn)`, stored alongside a scalar `weight_scale` (fp32). Dequant: `W ≈ W_fp8.to(dtype) * weight_scale`.
- Everything else (norms, `modulation`, `patch_embedding`, `text/time_embedding`, `time_projection`, `head`, all biases) is kept in fp16.
- Mean per-tensor reconstruction error ≈ 2.2%.
- Source: extracted from ByteDance/Bernini-Diffusers (`bernini/checkpoint`, fp32), with diffusers `WanTransformer3DModel` keys remapped to the original Wan / ComfyUI naming.
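The per-tensor scaling rule above can be illustrated with a small pure-Python simulation. This is a sketch under assumptions, not the conversion script used for these files: `round_to_e4m3` emulates float8_e4m3fn rounding in software (3 mantissa bits, smallest normal exponent -6, maximum 448), the all-zero guard in `quantize` is an addition for safety, and the relative L2 error shown is only one plausible metric, since the card does not say how the 2.2% figure was computed.

```python
import math

E4M3_MAX = 448.0  # largest finite float8_e4m3fn value


def round_to_e4m3(x):
    """Round a float to the nearest float8_e4m3fn value (software emulation)."""
    if x == 0.0:
        return 0.0
    x = max(-E4M3_MAX, min(E4M3_MAX, x))  # clamp(±448)
    # frexp gives |x| = m * 2**e with 0.5 <= m < 1, so floor(log2|x|) = e - 1.
    exponent = max(math.frexp(abs(x))[1] - 1, -6)  # below 2**-6 values are subnormal
    step = 2.0 ** (exponent - 3)  # spacing of a 3-bit mantissa in this binade
    return round(x / step) * step  # Python's round() rounds halves to even


def quantize(weights):
    """Per-tensor scaled quantization: returns (simulated fp8 values, weight_scale)."""
    amax = max(abs(w) for w in weights)
    scale = amax / E4M3_MAX if amax > 0 else 1.0  # guard is an assumption, not from the card
    return [round_to_e4m3(w / scale) for w in weights], scale


def dequantize(w_fp8, weight_scale):
    """W is approximately W_fp8 * weight_scale."""
    return [q * weight_scale for q in w_fp8]


def relative_error(original, restored):
    """Relative L2 error ||W - W_hat|| / ||W||."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, restored)))
    den = math.sqrt(sum(a * a for a in original))
    return num / den


weights = [0.02, -0.5, 0.13, 0.0, 0.25, -0.07]  # toy stand-in for one Linear weight
w_fp8, weight_scale = quantize(weights)
restored = dequantize(w_fp8, weight_scale)
print(weight_scale, relative_error(weights, restored))
```

Because the scale maps the largest magnitude in the tensor to 448, the full e4m3 range is used for every weight, and a single fp32 scalar per tensor is enough to undo the scaling at load time.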
License: Apache-2.0, inherited from the upstream Bernini release.