---
language:
- en
license: other
base_model:
- krea/Krea-2-Raw
base_model_relation: quantized
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusers
- safetensors
- text-to-image
- krea2
- sdnq
- uint4
- 4-bit
- quantized
---

# Krea 2 Raw SDNQ UINT4

SDNQ UINT4 quantization of [krea/Krea-2-Raw](https://huggingface.co/krea/Krea-2-Raw) for Diffusers `Krea2Pipeline`.

![Original vs SDNQ comparison](assets/original_vs_sdnq_raw.webp)

## What Is Quantized

Selected recipe: `uint4-static-transformer-only`.

Quantized components: `transformer`.
Tokenizer, scheduler, and non-selected pipeline components are copied from the original Diffusers pipeline.

The initial smoke sweep also tried SDNQ packing for the text encoder, but standard Diffusers/Transformers loading rejected the packed `Qwen3VLModel` text-encoder weights. This release therefore keeps the text encoder loadable in bf16 and quantizes the Krea transformer only.

## Benchmark Setup

- Pipeline: `Krea2Pipeline`
- Resolution: 1024x1024
- Steps: 52
- Guidance scale: 3.5
- Seed base: 61000
- Distilled mode: `false`
- Torch dtype: bfloat16
- Attention backend: diffusers native attention
- Prompt set: 10 prompts covering simple scenes, public-domain style stress tests, tricky composition, long Latin text, long Cyrillic text, and mixed Latin/Cyrillic diagrams
- Hardware: NVIDIA RTX PRO 6000 Blackwell Server Edition on a disposable RunPod pod with local container disk

## Benchmark Summary

| Model | Load | First gen | Hot mean | Hot max | Load GPU peak | Gen GPU peak | Torch peak |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| original | 8.442 s | 163.733 s | 163.403 s | 163.422 s | 33487 MB | 44154 MB | 42717.2 MB |
| uint4-static-transformer-only | 5.954 s | 160.935 s | 157.422 s | 157.457 s | 16041 MB | 26788 MB | 25272.4 MB |

Storage size of this release directory: 15.38 GB. Quantized local checkpoint size before packaging: 15.36 GB.
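
To put the table in relative terms, the short sketch below computes how much lower each quantized figure is than the original. The numbers are copied from the benchmark table; the dictionary keys and the `reduction_pct` helper are illustrative names, not part of the benchmark files.

```python
# Figures copied from the benchmark table; key names are illustrative.
original = {
    "load_s": 8.442,
    "hot_mean_s": 163.403,
    "load_gpu_peak_mb": 33487,
    "gen_gpu_peak_mb": 44154,
}
quantized = {
    "load_s": 5.954,
    "hot_mean_s": 157.422,
    "load_gpu_peak_mb": 16041,
    "gen_gpu_peak_mb": 26788,
}


def reduction_pct(before, after):
    """Return how much lower `after` is than `before`, in percent."""
    return 100.0 * (before - after) / before


# Relative reduction per metric, rounded to one decimal place.
reductions = {
    key: round(reduction_pct(original[key], quantized[key]), 1)
    for key in original
}

for key, value in reductions.items():
    print(f"{key}: {value}% lower than original")
```

On this hardware the gain is mostly in memory: the generation-time GPU peak is roughly 39% lower, while hot mean latency drops by only about 4%.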

Raw per-prompt metrics are available in `benchmark/*.csv` and `benchmark/*.jsonl`. The combined benchmark summary is in `benchmark/summary.json`.
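
A minimal sketch for inspecting the JSONL files, assuming each line holds one JSON object per prompt. The field names are not documented here, so the helper averages whatever numeric fields it finds rather than relying on specific keys; `load_jsonl` and `numeric_means` are illustrative names.

```python
import json
from pathlib import Path
from statistics import mean


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                rows.append(json.loads(line))
    return rows


def numeric_means(rows):
    """Average each numeric field over the rows in which it appears."""
    values = {}
    for row in rows:
        for key, value in row.items():
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                values.setdefault(key, []).append(value)
    return {key: mean(items) for key, items in values.items()}


if __name__ == "__main__":
    # Prints nothing if the benchmark directory is absent.
    for path in sorted(Path("benchmark").glob("*.jsonl")):
        print(path.name, numeric_means(load_jsonl(path)))
```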

## Usage

```bash
pip install -U git+https://github.com/huggingface/diffusers.git transformers accelerate safetensors huggingface_hub sdnq
```

```python
import torch
from diffusers import Krea2Pipeline
from sdnq.loader import apply_sdnq_options_to_model

repo_id = "WaveCut/Krea-2-Raw-SDNQ-uint4"
device = "cuda"

pipe = Krea2Pipeline.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    is_distilled=False,
)

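# Re-apply SDNQ runtime options to each quantized component.
# Only the transformer is quantized in this release.
for name in ["transformer"]:
    module = getattr(pipe, name, None)
    if module is not None:
        setattr(
            pipe,
            name,
            apply_sdnq_options_to_model(module, dtype=torch.bfloat16, use_quantized_matmul=True),
        )

# Move the whole pipeline to the GPU before generating.
pipe.to(device)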
image = pipe(
    prompt="A clean technical poster with readable labels",
    height=1024,
    width=1024,
    num_inference_steps=52,
    guidance_scale=3.5,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
image.save("krea2-sdnq.png")
```

## Quantization Recipe

```json
{
  "dynamic_loss_threshold": null,
  "modules": [
    "transformer"
  ],
  "name": "uint4-static-transformer-only",
  "quant_conv": false,
  "quant_embedding": false,
  "svd_rank": 32,
  "svd_steps": 32,
  "use_dynamic_quantization": false,
  "use_svd": false,
  "weights_dtype": "uint4"
}
```

The checkpoint was produced by loading the original Diffusers pipeline, applying `sdnq_post_load_quant` only to the listed pipeline components, and saving with `save_sdnq_model(..., is_pipeline=True)`.
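
The following sketch shows one way the recipe could be split into the list of components to quantize and the quantizer options. The split itself runs as plain Python. The commented-out application step is an assumption: it presumes that every remaining recipe key maps one-to-one to a keyword argument of `sdnq_post_load_quant`, that the function returns the quantized module, and that `pipe` and `"output_dir"` are placeholders for the loaded original pipeline and a target directory.

```python
import json

# Recipe copied from the JSON block above.
RECIPE_JSON = """
{
  "dynamic_loss_threshold": null,
  "modules": ["transformer"],
  "name": "uint4-static-transformer-only",
  "quant_conv": false,
  "quant_embedding": false,
  "svd_rank": 32,
  "svd_steps": 32,
  "use_dynamic_quantization": false,
  "use_svd": false,
  "weights_dtype": "uint4"
}
"""

# Keys that describe the run rather than quantizer settings (assumption).
BOOKKEEPING_KEYS = {"name", "modules"}


def split_recipe(recipe):
    """Return (component names, remaining options) from a recipe dict."""
    modules = list(recipe["modules"])
    options = {k: v for k, v in recipe.items() if k not in BOOKKEEPING_KEYS}
    return modules, options


recipe = json.loads(RECIPE_JSON)
modules, options = split_recipe(recipe)
print(modules)
print(options)

# Hypothetical application, not run here (needs sdnq, the original weights, a GPU):
# from sdnq import sdnq_post_load_quant
# from sdnq.loader import save_sdnq_model
# for name in modules:
#     setattr(pipe, name, sdnq_post_load_quant(getattr(pipe, name), **options))
# save_sdnq_model(pipe, "output_dir", is_pipeline=True)
```

With `use_svd` set to `false`, the `svd_rank` and `svd_steps` values are presumably unused in this recipe.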

## Limitations

- This is a quantized derivative and inherits the base model's behavior, limitations, and license terms.
- The comparison set is a deployment smoke benchmark, not a preference study or FID evaluation.
- Long text, small labels, and mixed Cyrillic/Latin diagrams should be inspected manually before production use.
- Benchmark numbers depend on GPU, driver, PyTorch, Diffusers, SDNQ, and CUDA versions.

## License

This repository contains a quantized derivative of `krea/Krea-2-Raw`. Upstream license material copied during packaging: `LICENSE.pdf`. Review the upstream Krea model card and license before use or redistribution.