---
license: apache-2.0
base_model:
- ByteDance/Bernini-Diffusers
tags:
- wan2.2
- bernini
- fp8
- comfyui
- text-to-video
---
# Bernini (full) Wan2.2 renderer — fp8 e4m3 *scaled* (ComfyUI)
This repository contains the two DiT renderer transformers of the **full** [ByteDance Bernini](https://github.com/bytedance/Bernini)
pipeline (`diff_dec` = high-noise expert, `diff_dec_low` = low-noise expert),
quantized to **fp8 `e4m3` scaled** in the ComfyUI format.
The file layout is **structurally identical** to
[`Comfy-Org/Bernini-R`](https://huggingface.co/Comfy-Org/Bernini-R)'s
`wan2.2_bernini_r_*_fp8_scaled.safetensors` (verified: same 1815 keys, shapes,
dtypes, and `__metadata__`). Only the weights differ: these are the renderer
weights of **full Bernini** (jointly trained with the MLLM planner) rather than
those of the renderer-only Bernini-R.
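The structural check above can be reproduced without loading any weights: a `.safetensors` file begins with an 8-byte little-endian header length, followed by a JSON header that lists each tensor's `dtype`, `shape`, and `data_offsets`, plus an optional `__metadata__` entry. The sketch below (standard library only; `read_safetensors_header`, `layout`, and `compare_layouts` are illustrative names, not a library API) compares two files on exactly those fields. It is one way to do the check, not necessarily how it was done for this release.

```python
import json
import struct


def read_safetensors_header(path: str) -> dict:
    """Return the parsed JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        return json.loads(f.read(header_len))


def layout(path: str) -> tuple[dict, dict]:
    """Map each tensor key to (dtype, shape) and return it with the __metadata__ dict."""
    header = read_safetensors_header(path)
    metadata = header.pop("__metadata__", {})
    tensors = {k: (v["dtype"], tuple(v["shape"])) for k, v in header.items()}
    return tensors, metadata


def compare_layouts(a: str, b: str) -> list[str]:
    """List differences in keys, dtype/shape pairs, and metadata (empty list if identical)."""
    (ta, ma), (tb, mb) = layout(a), layout(b)
    diffs = [f"only in {a}: {k}" for k in sorted(ta.keys() - tb.keys())]
    diffs += [f"only in {b}: {k}" for k in sorted(tb.keys() - ta.keys())]
    diffs += [
        f"{k}: {ta[k]} != {tb[k]}"
        for k in sorted(ta.keys() & tb.keys())
        if ta[k] != tb[k]
    ]
    if ma != mb:
        diffs.append("__metadata__ differs")
    return diffs
```

An empty list from `compare_layouts` on a file from this repository and its Bernini-R counterpart corresponds to the "same keys, shapes, dtypes, and `__metadata__`" claim, and `len(layout(path)[0])` gives the key count. Because only the header is read, the check is fast even for ~15.5 GB files.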
## Files
| File | `model_type` | Size |
|------|--------------|------|
| `wan2.2_bernini_high_noise_fp8_scaled.safetensors` | `bernini_high` | ~15.5 GB |
| `wan2.2_bernini_low_noise_fp8_scaled.safetensors` | `bernini_low` | ~15.5 GB |
Place both files in `ComfyUI/models/diffusion_models/` and use them anywhere the
Bernini-R `fp8_scaled` files work (same `model_type` values, same keys).
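If the files were downloaded to another directory, a small helper like the one below can place them where ComfyUI looks for diffusion models. It is a sketch: `install` and its arguments are hypothetical, and it assumes both files sit in a single download directory.

```python
import shutil
from pathlib import Path

# File names from the table above.
FILES = (
    "wan2.2_bernini_high_noise_fp8_scaled.safetensors",
    "wan2.2_bernini_low_noise_fp8_scaled.safetensors",
)


def install(download_dir: str, comfyui_root: str) -> list[Path]:
    """Copy both experts into <comfyui_root>/models/diffusion_models/ and return the new paths."""
    target = Path(comfyui_root) / "models" / "diffusion_models"
    target.mkdir(parents=True, exist_ok=True)
    installed = []
    for name in FILES:
        src = Path(download_dir) / name
        if not src.is_file():
            raise FileNotFoundError(src)  # fail early if either expert is missing
        installed.append(Path(shutil.copy2(src, target / name)))
    return installed
```

Copying duplicates roughly 31 GB; on the same filesystem, moving the files (`shutil.move`) or symlinking them (if your setup follows symlinks) avoids the extra disk usage.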
## Quantization details
- Format marker per quantized weight: `comfy_quant = {"format": "float8_e4m3fn"}`.
- Quantized Linears: `self_attn.{q,k,v,o}`, `cross_attn.{q,k,v}` (cross-attn `o`
kept in fp16), `ffn.0`, `ffn.2`. That is 4 + 3 + 2 = 9 per block × 40 blocks = 360 weights per expert.
- For each quantized weight `W`: `scale = max(|W|) / 448` (448 is the largest finite
`float8_e4m3fn` value) and `W_fp8 = (W / scale).clamp(-448, 448).to(torch.float8_e4m3fn)`, stored alongside a scalar
`weight_scale` (fp32). Dequantization: `W ≈ W_fp8.to(dtype) * weight_scale`.
- Everything else (norms, `modulation`, `patch_embedding`, `text/time_embedding`,
`time_projection`, `head`, all biases) is kept in **fp16**.
- Mean per-tensor reconstruction error ≈ **2.2%**.
- Source: extracted from [`ByteDance/Bernini-Diffusers`](https://huggingface.co/ByteDance/Bernini-Diffusers)
(`bernini/` checkpoint, fp32), with diffusers `WanTransformer3DModel` keys
remapped to the original Wan / ComfyUI naming.
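To make the scheme above concrete, here is a pure-Python reference sketch (no PyTorch) of which keys get quantized and of the per-tensor scaled fp8 round trip. Assumptions, stated loudly: keys are taken to look like `blocks.<i>.<module>.weight` in the Wan naming (any dotted prefix is tolerated); the cast is modelled as round-to-nearest with ties to even, which is what I would expect from PyTorch's `.to(torch.float8_e4m3fn)` but should be verified; and `is_quantized`, `round_to_e4m3fn`, `quantize_per_tensor`, and `dequantize` are illustrative names, not part of ComfyUI or any library. Values are Python floats (float64), so results will not match the released fp16/fp32 tensors bit for bit.

```python
import math
import re

E4M3_MAX = 448.0  # largest finite float8_e4m3fn value

# Linears listed above: self_attn q/k/v/o, cross_attn q/k/v (not o), ffn.0, ffn.2.
# Assumed key shape: "<optional prefix>blocks.<i>.<module>.weight".
_QUANTIZED_KEY = re.compile(
    r"(?:^|\.)blocks\.\d+\.(?:self_attn\.[qkvo]|cross_attn\.[qkv]|ffn\.[02])\.weight$"
)


def is_quantized(key: str) -> bool:
    """True if a state-dict key is one of the fp8-quantized Linear weights."""
    return _QUANTIZED_KEY.search(key) is not None


def round_to_e4m3fn(x: float) -> float:
    """Round to the nearest float8_e4m3fn value (ties to even), saturating at +/-448."""
    x = max(-E4M3_MAX, min(E4M3_MAX, x))
    if x == 0.0:
        return 0.0
    _, e = math.frexp(abs(x))  # abs(x) = m * 2**e with 0.5 <= m < 1
    # 3 mantissa bits give spacing 2**(e - 4) within this binade; below the
    # smallest normal (2**-6) the subnormal spacing is fixed at 2**-9.
    step = math.ldexp(1.0, max(e - 4, -9))
    return math.copysign(round(abs(x) / step) * step, x)  # round() ties to even


def quantize_per_tensor(w: list[float]) -> tuple[list[float], float]:
    """scale = max(|W|) / 448, W_fp8 = e4m3(clamp(W / scale)); returns (W_fp8, scale)."""
    amax = max(abs(v) for v in w)
    scale = amax / E4M3_MAX if amax > 0 else 1.0  # zero guard is an addition to the stated formula
    return [round_to_e4m3fn(v / scale) for v in w], scale


def dequantize(w_fp8: list[float], scale: float) -> list[float]:
    """W ≈ W_fp8 * weight_scale."""
    return [v * scale for v in w_fp8]
```

For example, `quantize_per_tensor([0.5, -1.25, 3.0, 0.01])` returns fp8 values `[72.0, -192.0, 448.0, 1.5]` with `scale = 3/448`: the largest magnitude always maps to 448, and every other element lands on the e4m3 grid with at most 2^-4 relative rounding error in the normal range. Running `is_quantized` over one block's keys selects the 4 self-attention, 3 cross-attention, and 2 FFN weights listed above, so 40 blocks give 360.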
License: Apache-2.0, inherited from the upstream Bernini release.