---
license: apache-2.0
base_model:
- ByteDance/Bernini-R
tags:
- bernini
- quantized
- wan
---

# Bernini Renderer: Mixed INT8+INT4 Quantized Weights

**Distribution format:** 13 GB per model (about 85% smaller than the 79 GB FP32 originals)

## Files

| File | Size | Description |
|------|------|-------------|
| `bernini_renderer_high.mixed-int8-int4p.safetensors` | 13 GB | Quantized weights, high variant |
| `bernini_renderer_low.mixed-int8-int4p.safetensors` | 13 GB | Quantized weights, low variant |
| `config.json` | - | Model architecture config |
| `model_high.safetensors.index.json` | - | Weight map for high variant |
| `model_low.safetensors.index.json` | - | Weight map for low variant |
| `load_mixed.py` | - | Python loader with dequantization |

## Quantization Strategy

Mixed-precision, per-channel asymmetric quantization:

| Component | Format | Rationale |
|-----------|--------|-----------|
| T5 text encoder weights | **INT8** (per-channel) | Preserves prompt encoding fidelity |
| Diffusion transformer attention/FFN | **Packed INT4** (per-channel) | Bulk compression: 75% size reduction on largest tensors |
| Embedding tables (T5 + diffusion) | **Packed INT4** (per-channel) | Lookup tables tolerate aggressive quantization |
| Layer norms, biases, scale_shift_tables | **FP32** (unchanged) | Small tensors where precision matters |

## Quality

Tested at 10 and 40 generation steps. On standard prompts, outputs were visually indistinguishable from the FP32 originals: sharp details were preserved, and no artifacts or quality degradation were observed.

## Usage

```python
import torch

from load_mixed import load_bernini_mixed

# Load the high variant; weights are dequantized in memory to `torch_dtype`
state_dict = load_bernini_mixed(
    "bernini_renderer_high.mixed-int8-int4p.safetensors",
    torch_dtype=torch.float16,  # or torch.float32
)
```

## Original Model

ByteDance/Bernini: https://huggingface.co/ByteDance/Bernini

Quantization performed by ultimo-intento, June 2026.
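
## Illustrative Sketch: Per-Channel INT4 Packing

For readers unfamiliar with the terms in the Quantization Strategy table, the following is a minimal sketch of per-channel asymmetric quantization with packed INT4 storage. It is not taken from `load_mixed.py`. The function names, the low-nibble-first packing order, and the one `(scale, zero_point)` pair per row are all assumptions made for illustration, and the actual file layout may differ. It uses plain Python so it runs without dependencies.

```python
# Illustrative only: NOT the implementation in load_mixed.py.
# Assumptions: one (scale, zero_point) per channel (row), unsigned 4-bit
# codes in [0, 15], two codes per byte with the low nibble first.

def quantize_row(row, bits=4):
    """Asymmetrically quantize one channel to unsigned `bits`-bit codes."""
    qmax = (1 << bits) - 1
    # Extend the range to include 0.0 so that zero maps to an exact code.
    lo, hi = min(min(row), 0.0), max(max(row), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0  # guard all-zero rows
    zero_point = round(-lo / scale)
    codes = [min(qmax, max(0, round(x / scale) + zero_point)) for x in row]
    return codes, scale, zero_point


def pack_int4(codes):
    """Pack pairs of 4-bit codes into bytes (low nibble first)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad odd-length rows with a zero code
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_int4(packed, count):
    """Inverse of pack_int4; `count` drops any padding code."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)
        codes.append(byte >> 4)
    return codes[:count]


def dequantize_row(codes, scale, zero_point):
    """Map integer codes back to approximate float values."""
    return [(q - zero_point) * scale for q in codes]


if __name__ == "__main__":
    weights = [[-0.5, 0.0, 0.25, 1.0], [0.1, 0.2, 0.3, 0.4]]
    for row in weights:
        codes, scale, zp = quantize_row(row)
        packed = pack_int4(codes)
        restored = dequantize_row(unpack_int4(packed, len(row)), scale, zp)
        max_err = max(abs(a - b) for a, b in zip(row, restored))
        print(f"{len(packed)} bytes, scale={scale:.4f}, max error={max_err:.4f}")
```

In this scheme the round-trip error per value is bounded by half the channel's scale, which is why per-channel (rather than per-tensor) scales help when channels have very different ranges. An INT8 variant would use `bits=8` and store one code per byte without packing.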