---
license: apache-2.0
base_model:
- ByteDance/Bernini-R
tags:
- bernini
- quantized
- wan
---
# Bernini Renderer — Mixed INT8+INT4 Quantized Weights

**Distribution format:** 13 GB per model (roughly 84% smaller than the 79 GB FP32 originals)

## Files

| File | Size | Description |
|------|------|-------------|
| `bernini_renderer_high.mixed-int8-int4p.safetensors` | 13 GB | Quantized weights for high variant |
| `bernini_renderer_low.mixed-int8-int4p.safetensors` | 13 GB | Quantized weights for low variant |
| `config.json` | — | Model architecture config |
| `model_high.safetensors.index.json` | — | Weight map for high variant |
| `model_low.safetensors.index.json` | — | Weight map for low variant |
| `load_mixed.py` | — | Python loader with dequantization |

## Quantization Strategy

Mixed-precision per-channel asymmetric quantization:

| Component | Format | Rationale |
|-----------|--------|-----------|
| T5 text encoder weights | **INT8** (per-channel) | Preserves prompt encoding fidelity |
| Diffusion transformer attention/FFN | **Packed INT4** (per-channel) | Bulk compression: 4 bits per weight instead of 32, about 87.5% smaller on the largest tensors before scale and zero-point overhead |
| Embedding tables (T5 + diffusion) | **Packed INT4** (per-channel) | Lookup tables tolerate aggressive quantization |
| Layer norms, biases, scale_shift_tables | **FP32** (unchanged) | Small tensors where precision matters |
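
To make the table above concrete, here is a minimal NumPy sketch of per-channel asymmetric quantization with two 4-bit codes packed per byte. It is not the code in `load_mixed.py`. The function names, the choice of one channel per weight row, the use of a floating-point per-row minimum as the offset (rather than an integer zero point), and the nibble order are all assumptions made for illustration.

```python
import numpy as np


def quantize_per_channel(w: np.ndarray, bits: int):
    """Asymmetric quantization of a 2-D weight with one scale and offset per row."""
    qmax = (1 << bits) - 1  # 255 for INT8, 15 for INT4
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    # Guard against constant rows, where max == min would give a zero scale.
    scale = np.maximum(w_max - w_min, 1e-8) / qmax
    q = np.clip(np.round((w - w_min) / scale), 0, qmax).astype(np.uint8)
    return q, scale, w_min


def dequantize_per_channel(q: np.ndarray, scale: np.ndarray, w_min: np.ndarray):
    """Map integer codes back to floats: w ~= q * scale + w_min."""
    return q.astype(np.float32) * scale + w_min


def pack_int4(q: np.ndarray) -> np.ndarray:
    """Pack pairs of 4-bit codes into bytes (even column low nibble, odd column high)."""
    assert q.shape[1] % 2 == 0, "row length must be even to pack two codes per byte"
    return (q[:, 0::2] | (q[:, 1::2] << 4)).astype(np.uint8)


def unpack_int4(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_int4: split each byte back into two 4-bit codes."""
    out = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.uint8)
    out[:, 0::2] = packed & 0x0F
    out[:, 1::2] = packed >> 4
    return out


# Round trip on a small random matrix standing in for a weight tensor.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q4, scale, w_min = quantize_per_channel(w, bits=4)
packed = pack_int4(q4)  # half as many bytes as q4
w_hat = dequantize_per_channel(unpack_int4(packed), scale, w_min)
```

With this scheme the reconstruction error per element is bounded by half the row's scale, which is why per-channel scaling helps when rows have very different ranges.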

## Quality

Tested at 10 and 40 generation steps. On standard prompts, outputs were visually indistinguishable from the FP32 originals: sharp details were preserved and no artifacts were observed.

## Usage

```python
import torch

from load_mixed import load_bernini_mixed

# Load the high variant; weights are dequantized in memory to torch_dtype
state_dict = load_bernini_mixed(
    "bernini_renderer_high.mixed-int8-int4p.safetensors",
    torch_dtype=torch.float16,  # or torch.float32
)
```
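
Before loading, it can be useful to check how tensors are stored in the file. The sketch below reads only the JSON header of a `.safetensors` file (an 8-byte little-endian length followed by that many bytes of JSON) using the standard library, then counts tensors per stored dtype. The helper names are hypothetical and are not part of `load_mixed.py`. Packed INT4 data is likely stored under an 8-bit dtype such as `U8`, though that is an assumption about this particular file.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian uint64
        return json.loads(f.read(header_len))


def dtype_summary(path):
    """Count tensors per stored dtype, skipping the optional __metadata__ entry."""
    header = read_safetensors_header(path)
    return Counter(
        entry["dtype"] for name, entry in header.items() if name != "__metadata__"
    )
```

For example, `dtype_summary("bernini_renderer_high.mixed-int8-int4p.safetensors")` returns a mapping from dtype strings such as `F32` or `I8` to tensor counts.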

## Original Model

ByteDance/Bernini: https://huggingface.co/ByteDance/Bernini

Quantization performed by ultimo-intento, June 2026.