---
language:
- en
license: other
base_model:
- krea/Krea-2-Raw
base_model_relation: quantized
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusers
- safetensors
- text-to-image
- krea2
- sdnq
- uint4
- 4-bit
- quantized
---

# Krea 2 Raw SDNQ UINT4

SDNQ UINT4 quantization of [krea/Krea-2-Raw](https://huggingface.co/krea/Krea-2-Raw) for Diffusers `Krea2Pipeline`.

![Original vs SDNQ comparison](assets/original_vs_sdnq_raw.webp)

## What Is Quantized

Selected recipe: `uint4-static-transformer-only`.

Quantized components: `transformer`.

Tokenizer, scheduler, and non-selected pipeline components are copied from the original Diffusers pipeline.

The initial smoke sweep also tried SDNQ packing for the text encoder, but standard Diffusers/Transformers loading rejected the packed `Qwen3VLModel` text-encoder weights. This release therefore keeps the text encoder loadable in bf16 and quantizes the Krea transformer only.

## Benchmark Setup

- Pipeline: `Krea2Pipeline`
- Resolution: 1024x1024
- Steps: 52
- Guidance scale: 3.5
- Seed base: 61000
- Distilled mode: `false`
- Torch dtype: bfloat16
- Attention backend: diffusers native attention
- Prompt set: 10 prompts covering simple scenes, public-domain style stress tests, tricky composition, long Latin text, long Cyrillic text, and mixed Latin/Cyrillic diagrams
- Hardware: NVIDIA RTX PRO 6000 Blackwell Server Edition on a disposable RunPod pod with local container disk

## Benchmark Summary

| Model | Load | First gen | Hot mean | Hot max | Load GPU peak | Gen GPU peak | Torch peak |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| original | 8.442 s | 163.733 s | 163.403 s | 163.422 s | 33487 MB | 44154 MB | 42717.1767578125 MB |
| uint4-static-transformer-only | 5.954 s | 160.935 s | 157.422 s | 157.457 s | 16041 MB | 26788 MB | 25272.396484375 MB |

Storage size of this release directory: 15.38 GB. Quantized local checkpoint size before packaging: 15.36 GB.

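
To put the benchmark table in relative terms, the short sketch below computes how much lower each quantized figure is than the original. The numbers are copied from the table above; the helper name `relative_reduction` and the dictionary layout are illustrative assumptions and are not part of the benchmark tooling in this repository.

```python
# Values copied from the Benchmark Summary table (seconds and MB).
# The key names are hypothetical labels chosen for this example.
ORIGINAL = {
    "load_s": 8.442,
    "hot_mean_s": 163.403,
    "load_gpu_peak_mb": 33487,
    "gen_gpu_peak_mb": 44154,
}
QUANTIZED = {
    "load_s": 5.954,
    "hot_mean_s": 157.422,
    "load_gpu_peak_mb": 16041,
    "gen_gpu_peak_mb": 26788,
}


def relative_reduction(baseline: float, quantized: float) -> float:
    """Return the fraction by which `quantized` is smaller than `baseline`."""
    return (baseline - quantized) / baseline


# Fractional reduction per metric, e.g. 0.25 means 25% lower than the original.
reductions = {
    key: relative_reduction(ORIGINAL[key], QUANTIZED[key]) for key in ORIGINAL
}

for key, value in reductions.items():
    print(f"{key}: {value:.1%} lower than the original")
```

On this run, peak GPU memory during generation drops by roughly 39%, while hot generation time changes by under 4%. The practical benefit of this recipe is therefore memory footprint rather than speed.
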
Raw per-prompt metrics are available in `benchmark/*.csv` and `benchmark/*.jsonl`. The combined benchmark summary is in `benchmark/summary.json`.

## Usage

```bash
pip install -U git+https://github.com/huggingface/diffusers.git transformers accelerate safetensors huggingface_hub sdnq
```

```python
import torch
from diffusers import Krea2Pipeline
from sdnq.loader import apply_sdnq_options_to_model

repo_id = "WaveCut/Krea-2-Raw-SDNQ-uint4"
device = "cuda"

pipe = Krea2Pipeline.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    is_distilled=False,
)

for name in ['transformer']:
    module = getattr(pipe, name, None)
    if module is not None:
        setattr(
            pipe,
            name,
            apply_sdnq_options_to_model(module, dtype=torch.bfloat16, use_quantized_matmul=True),
        )

pipe.to(device)

image = pipe(
    prompt="A clean technical poster with readable labels",
    height=1024,
    width=1024,
    num_inference_steps=52,
    guidance_scale=3.5,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
image.save("krea2-sdnq.png")
```

## Quantization Recipe

```json
{
  "dynamic_loss_threshold": null,
  "modules": [
    "transformer"
  ],
  "name": "uint4-static-transformer-only",
  "quant_conv": false,
  "quant_embedding": false,
  "svd_rank": 32,
  "svd_steps": 32,
  "use_dynamic_quantization": false,
  "use_svd": false,
  "weights_dtype": "uint4"
}
```

The checkpoint was produced by loading the original Diffusers pipeline, applying `sdnq_post_load_quant` only to the listed pipeline components, and saving with `save_sdnq_model(..., is_pipeline=True)`.

## Limitations

- This is a quantized derivative and inherits the base model behavior, limits, and license terms.
- The comparison set is a deployment smoke benchmark, not a preference study or FID evaluation.
- Long text, small labels, and mixed Cyrillic/Latin diagrams should be inspected manually before production use.
- Benchmark numbers depend on GPU, driver, PyTorch, Diffusers, SDNQ, and CUDA versions.

## License

This repository contains a quantized derivative of `krea/Krea-2-Raw`.
Upstream license material copied during packaging: `LICENSE.pdf`. Review the upstream Krea model card and license before use or redistribution.