HiDream-O1-Image-Dev SDNQ - Dynamic UINT4 threshold 1e-2, fixed

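Fastest of the balanced fixed variants (9.03 s average generation time). Dynamic quantization leaves the down/output projections known to cause artifacts unquantized and automatically promotes hard-to-quantize layers to a higher bit width; in this recipe 31 layers end up as int5 and 265 as uint4.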
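The idea of promoting difficult layers against a threshold can be illustrated with a toy sketch. This is not SDNQ's implementation: the error metric (relative squared error), the promotion rule, and the names quantize_error and pick_bits are all assumptions made for illustration; only the 1e-2 threshold and the 4-bit starting point come from the recipe name.

```python
def quantize_error(weights, bits):
    """Relative squared error of asymmetric uniform quantization to `bits` bits."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for constant inputs
    err = 0.0
    for w in weights:
        q = round((w - lo) / scale)  # integer code in [0, levels]
        err += (w - (lo + q * scale)) ** 2
    energy = sum(w * w for w in weights) or 1.0
    return err / energy


def pick_bits(weights, threshold=1e-2, start_bits=4, max_bits=8):
    """Smallest width in start_bits..max_bits whose error is within threshold."""
    for bits in range(start_bits, max_bits + 1):
        if quantize_error(weights, bits) <= threshold:
            return bits
    return max_bits


# A layer whose values sit exactly on a 4-bit grid stays at 4 bits.
easy = [i / 15 for i in range(16)]
# Many small values plus two outliers: the outliers stretch the grid,
# so the small values lose precision and the layer gets promoted.
hard = [(i - 500) / 5000 for i in range(1001)] + [-1.0, 1.0]

print(pick_bits(easy), pick_bits(hard))
```

In this toy model, outliers are what make a layer "difficult": they widen the quantization range and coarsen the grid for every other weight.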

This repository is part of the fixed SDNQ 4-bit HiDream O1 quantization set. The previous broad 4-bit recipes produced a visible tiled/grid artifact. The fix keeps the sensitive decoder projection path in higher precision, especially model.language_model.layers.*.mlp.down_proj.weight.
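As a minimal sketch of how such a guard can be expressed, the wildcard pattern above can be matched against parameter names to decide which tensors stay in higher precision. The helper keep_in_bf16 and the example names other than the down_proj pattern are hypothetical and are not part of SDNQ or the HiDream repository.

```python
from fnmatch import fnmatchcase

# Pattern taken from the text above; any additional patterns would be assumptions.
KEEP_BF16_PATTERNS = ["model.language_model.layers.*.mlp.down_proj.weight"]


def keep_in_bf16(param_name, patterns=KEEP_BF16_PATTERNS):
    """Return True if the parameter should be left unquantized."""
    return any(fnmatchcase(param_name, pattern) for pattern in patterns)


# Illustrative parameter names (the up_proj name is an assumed sibling layer).
names = [
    "model.language_model.layers.0.mlp.down_proj.weight",
    "model.language_model.layers.12.mlp.down_proj.weight",
    "model.language_model.layers.12.mlp.up_proj.weight",
]
for name in names:
    print(name, "-> keep BF16" if keep_in_bf16(name) else "-> quantize")
```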

Comparison

Benchmarks were run on an NVIDIA RTX PRO 6000 Blackwell Workstation Edition with the HiDream O1 repository inference path, BF16 runtime, 28 flash-scheduler steps, and flash attention disabled for parity. The requested 1024x1024 size is snapped by O1 to 2048x2048.
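For readers who want to reproduce comparable numbers, a timing and peak-memory harness might look like the sketch below. It is an assumption that the reported "peak alloc" figures correspond to PyTorch's peak allocated-memory counter; the benchmark helper and its generate argument are hypothetical, and torch is treated as optional so the helper also runs on machines without CUDA.

```python
import time

try:
    import torch  # optional: only needed for the CUDA memory counters
except ImportError:
    torch = None


def benchmark(generate, runs=3):
    """Return (average seconds per run, peak allocated GiB or None without CUDA)."""
    use_cuda = torch is not None and torch.cuda.is_available()
    if use_cuda:
        torch.cuda.reset_peak_memory_stats()
    durations = []
    for _ in range(runs):
        if use_cuda:
            torch.cuda.synchronize()  # make sure earlier GPU work is finished
        start = time.perf_counter()
        generate()
        if use_cuda:
            torch.cuda.synchronize()  # wait for the generation kernels to finish
        durations.append(time.perf_counter() - start)
    peak_gib = torch.cuda.max_memory_allocated() / 2**30 if use_cuda else None
    return sum(durations) / runs, peak_gib


# Stand-in workload; replace the lambda with a real image generation call.
avg_s, peak_gib = benchmark(lambda: sum(range(100_000)), runs=2)
print(f"avg {avg_s:.4f}s, peak {peak_gib}")
```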

| Model | Best for | Avg gen (s) | Gen time vs BF16 | Peak alloc (GiB) | VRAM saved | Param storage (GiB) | Storage saved | Quantized layers | Quantized params (B) |
|---|---|---|---|---|---|---|---|---|---|
| Original BF16 | Baseline quality/reference | 8.20 | - | 17.11 | - | 16.40 | - | - | 0.00 |
| Dynamic UINT4 threshold 1e-2, fixed | Fast balanced | 9.03 | +10% | 10.60 | +38% | 9.87 | +40% | int5:31, uint4:265 | 5.10 |
| Static UINT4 + SVD r32, o/down BF16 guard | Safe default | 9.17 | +12% | 10.46 | +39% | 9.71 | +41% | uint4:296 | 5.10 |
| Static UINT4 + SVD r32, down_proj BF16 | Minimal fix | 9.24 | +13% | 9.66 | +44% | 8.92 | +46% | uint4:332 | 5.71 |
| Static UINT4 + SVD r32, last 8 o/down BF16 | Lowest VRAM | 9.45 | +15% | 7.98 | +53% | 7.23 | +56% | uint4:352 | 6.98 |
| Static UINT4 + SVD r32, last 16 o/down BF16 | Memory/quality | 9.35 | +14% | 8.70 | +49% | 7.94 | +52% | uint4:336 | 6.44 |
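The percentage columns follow directly from the raw measurements and the BF16 baseline. The short check below recomputes them; the dictionary layout and the derived helper are illustrative, while the numbers are copied from the comparison above.

```python
BASELINE = {"gen_s": 8.20, "peak_gib": 17.11, "storage_gib": 16.40}

ROWS = {
    "dynamic uint4 th1e-2": {"gen_s": 9.03, "peak_gib": 10.60, "storage_gib": 9.87},
    "o/down BF16 guard": {"gen_s": 9.17, "peak_gib": 10.46, "storage_gib": 9.71},
    "down_proj BF16": {"gen_s": 9.24, "peak_gib": 9.66, "storage_gib": 8.92},
    "last 8 o/down BF16": {"gen_s": 9.45, "peak_gib": 7.98, "storage_gib": 7.23},
    "last 16 o/down BF16": {"gen_s": 9.35, "peak_gib": 8.70, "storage_gib": 7.94},
}


def derived(row, base=BASELINE):
    """Return (slowdown %, VRAM saved %, storage saved %) relative to BF16."""
    slowdown = (row["gen_s"] / base["gen_s"] - 1) * 100
    vram_saved = (1 - row["peak_gib"] / base["peak_gib"]) * 100
    storage_saved = (1 - row["storage_gib"] / base["storage_gib"]) * 100
    return round(slowdown), round(vram_saved), round(storage_saved)


for name, row in ROWS.items():
    slowdown, vram, storage = derived(row)
    print(f"{name}: +{slowdown}% time, {vram}% VRAM saved, {storage}% storage saved")
```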

Variant Strengths

  • Dynamic UINT4 threshold 1e-2, fixed: Fastest of the balanced fixed variants. Dynamic quantization leaves the down/output projections known to cause artifacts unquantized and automatically promotes hard-to-quantize layers to a higher bit width (int5 here).
  • Static UINT4 + SVD r32, o/down BF16 guard: Conservative default. Keeps both attention output and MLP down projections in BF16, the visually accepted fix for the tiled-grid artifact.
  • Static UINT4 + SVD r32, down_proj BF16: Smallest root-cause fix. Only MLP down projections are kept in BF16 beyond the standard output/embed skips; this isolates down_proj as the main grid culprit.
  • Static UINT4 + SVD r32, last 8 o/down BF16: Most memory-efficient clean-looking compromise from the matrix. It protects only the last 8 decoder layers' o/down projections.
  • Static UINT4 + SVD r32, last 16 o/down BF16: Safer memory-efficient compromise. It protects the last 16 decoder layers' o/down projections and keeps much lower storage than the full o/down guard.
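The "last N" variants differ only in which decoder layers have their o/down projections protected. A sketch of that selection is shown below. The protected_modules helper is hypothetical, the self_attn.o_proj suffix is an assumed module name, and the figure of 36 decoder layers is an inference from the quantized-layer counts in the comparison (for example 352 - 296 = 2 × (36 - 8)), not a value stated by the source.

```python
def protected_modules(num_layers, last_n,
                      suffixes=("self_attn.o_proj", "mlp.down_proj")):
    """Module names kept in BF16 when guarding the last `last_n` decoder layers."""
    first = max(num_layers - last_n, 0)
    return [
        f"model.language_model.layers.{i}.{suffix}"
        for i in range(first, num_layers)
        for suffix in suffixes
    ]


NUM_LAYERS = 36  # inferred from the layer counts, see the note above

full_guard = protected_modules(NUM_LAYERS, NUM_LAYERS)
last_8 = protected_modules(NUM_LAYERS, 8)
last_16 = protected_modules(NUM_LAYERS, 16)

# Every module no longer protected becomes one more quantized layer.
print(len(full_guard) - len(last_8), len(full_guard) - len(last_16))
```

Under these assumptions, relaxing the full guard to the last 8 or last 16 layers frees 56 or 40 extra modules for quantization, which matches the differences between the uint4 layer counts of the static variants.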

This Variant

  • Source model: HiDream-ai/HiDream-O1-Image-Dev
  • Source snapshot: 833d408a57a7c1e399757c7f2f174670726fd43c
  • Recipe: pub_dynamic_uint4_th1e2_fixed
  • SDNQ layer counts: {"int5": 31, "uint4": 265}
  • Quantized parameter counts: {"int5": 151750656, "uint4": 4949600256}
  • Benchmark average generation time: 9.03s
  • Benchmark peak allocated VRAM: 10.60 GiB
  • Parameter storage as saved: 9.87 GiB (40% smaller than the 16.40 GiB BF16 original)
  • 10-demo average generation time: 9.44s
  • 10-demo peak allocated VRAM: 10.60 GiB
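As a quick consistency check, the layer and parameter counts listed above can be summed and compared with the comparison figures (296 quantized layers in total, 5.10 B quantized parameters). The snippet only re-parses the two JSON objects from this list; the variable names are illustrative.

```python
import json

layer_counts = json.loads('{"int5": 31, "uint4": 265}')
param_counts = json.loads('{"int5": 151750656, "uint4": 4949600256}')

total_layers = sum(layer_counts.values())
total_params = sum(param_counts.values())

for dtype, count in param_counts.items():
    share = count / total_params
    print(f"{dtype}: {layer_counts[dtype]} layers, {share:.1%} of quantized params")
print(f"total: {total_layers} layers, {total_params / 1e9:.2f} B params")
```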

Demo Comparisons

Each image in comparison/ shows the original BF16 output side by side with the output of this quantized variant, generated with the same prompt, seed, and sampler settings.

Contact sheet: comparison/contact_sheet.jpg

Usage

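Install the dependencies and clone the HiDream O1 repository, which provides the model code:

pip install sdnq torch transformers diffusers accelerate einops pillow scipy torchvision
git clone https://github.com/HiDream-ai/HiDream-O1-Image
cd HiDream-O1-Image

Then, from the repository root, load the quantized model in Python:

import torch
import sdnq  # imported so the SDNQ quantization format is registered before loading
from transformers import AutoProcessor
# Model class shipped in the cloned HiDream-O1-Image repository
from models.qwen3_vl_transformers import Qwen3VLForConditionalGeneration

model_id = "WaveCut/HiDream-O1-Image-Dev-SDNQ-4bit-dynamic-uint4-th1e-2"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_id,
    dtype=torch.bfloat16,  # BF16 runtime, matching the benchmark setup
    device_map="cuda",
).eval()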

Files

  • quantization_config.json - saved SDNQ config.
  • quantization_summary.json - quantized layer/parameter/storage summary.
  • benchmark_summary.json - matrix metrics plus 10-demo generation metrics.
  • comparison/00.jpg ... comparison/09.jpg - pairwise original vs quantized comparisons.
  • comparison/contact_sheet.jpg - compact overview of all 10 comparisons.