---
language:
- en
license: other
license_name: flux-non-commercial-license
license_link: LICENSE.md
base_model:
- black-forest-labs/FLUX.2-klein-9B
base_model_relation: quantized
library_name: diffusers
pipeline_tag: text-to-image
tags:
- image-generation
- image-editing
- flux
- flux2
- Flux2KleinPipeline
- sdnq
- 4-bit
- uint4
- quantized
- diffusers
---

# FLUX.2 Klein 9B SDNQ UINT4 Static

Static UINT4 SDNQ quantization of
[black-forest-labs/FLUX.2-klein-9B](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B).

This checkpoint was selected as a practical, deployment-oriented variant: it
was the fastest option in the A40 benchmark and used substantially less VRAM
than the original BF16 pipeline, while visual quality differences were minor in
the prompt-following stress comparison.

Related checkpoint: for a quality-oriented dynamic SVD alternative with a
modest latency and VRAM tradeoff, see
[WaveCut/FLUX.2-klein-9B-SDNQ-float4_e4m0fnu-dynamic-th0p01-svd-r128-s32](https://huggingface.co/WaveCut/FLUX.2-klein-9B-SDNQ-float4_e4m0fnu-dynamic-th0p01-svd-r128-s32).

The model-card comparison image is a compressed WebP version of a 1:1
comparison canvas. It contains the original FLUX.2 Klein 9B, the previous SDNQ
baseline, this `uint4-static` checkpoint, and a quality-oriented dynamic SVD
candidate across text-heavy prompts, including an additional Russian-only
chalkboard prompt.

## Why This Variant

We compared a broad range of SDNQ 4-bit recipes on speed, VRAM, and visual
quality. This `uint4-static` recipe was chosen because it gives the best
deployment tradeoff:

- Lowest latency among the final candidates in the single-process benchmark
  (3.826 s warm average).
- Low runtime VRAM in a 1024x1024, 4-step image-generation pipeline (14.8 GB
  GPU peak).
- Much smaller full-pipeline checkpoint footprint than the original BF16
  FLUX.2 Klein 9B checkpoint in the measured setup (12.2 GB versus 52.9 GB).
- Visual differences versus the baseline and the original model were small in
  the stress set, including long text, signs, labels, small details, and a
  Russian chalkboard prompt.

## Benchmark Setup

Measurements below use a single NVIDIA A40 test host and a consistent
`Flux2KleinPipeline` inference harness.

- GPU: NVIDIA A40 46 GB
- Resolution: 1024x1024
- Steps: 4
- Guidance scale: 0.0
- Torch dtype: bfloat16
- Quantized matmul: enabled for SDNQ inference comparisons
- Batch/concurrency: single process

These are deployment-oriented measurements for one hardware/software setup.
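
The card does not include the benchmark harness itself. As a rough illustration of what a "warm average" measurement can look like, the sketch below discards warmup calls and averages the remaining wall-clock times. The `warm_average` helper, its parameters, and the sleep-based stand-in workload are all hypothetical and are not the harness used for the numbers in this card.

```python
import statistics
import time
from typing import Callable, Optional


def warm_average(
    run_once: Callable[[], object],
    warmup_runs: int = 1,
    timed_runs: int = 3,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return mean wall-clock seconds per call after discarding warmup calls."""
    for _ in range(warmup_runs):
        run_once()  # cold calls absorb compilation, caching, allocator growth
    durations = []
    for _ in range(timed_runs):
        if sync is not None:
            sync()  # drain queued GPU work before starting the clock
        start = time.perf_counter()
        run_once()
        if sync is not None:
            sync()  # wait for the GPU to finish before stopping the clock
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


# Stand-in workload so the sketch runs without a GPU (assumption: on a real
# host, run_once would call the pipeline and sync would be
# torch.cuda.synchronize).
calls = []
avg = warm_average(lambda: calls.append(time.sleep(0.01)), warmup_runs=1, timed_runs=3)
print(f"warm avg: {avg:.3f} s over 3 timed runs ({len(calls)} calls total)")
```

On a CUDA host, PyTorch's `torch.cuda.reset_peak_memory_stats()` and `torch.cuda.max_memory_allocated()` are one way to read an allocator-level peak around the timed calls. Whether the "CUDA allocated" column was collected that way is an assumption.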

## Candidate Benchmark

Single-process inference metrics for the final candidate set:

| Variant | Warm avg | GPU peak | CUDA allocated |
| --- | ---: | ---: | ---: |
| `uint4-static` | 3.826 s | 14.8 GB | 14.1 GB |
| `int4-dynamic-th0p1-svd-r16-s32-g128` | 4.020 s | 14.3 GB | 13.5 GB |
| `uint4-static-svd-r32-s32` | 4.070 s | 14.7 GB | 13.9 GB |
| `float4_e4m0fnu-dynamic-th0p1-svd-r16-s32` | 4.116 s | 16.0 GB | 15.3 GB |
| `float4_e4m0fnu-dynamic-th0p01-svd-r128-s32` | 4.185 s | 17.2 GB | 16.5 GB |

## Stress Comparison

This stress set contains 9 prompts with signs, chalkboards, posters, labels,
timetables, small props, and a Russian-only chalkboard prompt. Each row was run
twice; the table reports the warm run average.

| Model | Warm avg | GPU peak | CUDA allocated | Prompt count |
| --- | ---: | ---: | ---: | ---: |
| Original `FLUX.2-klein-9B` BF16 pipeline | 4.244 s | 36.3 GB | 35.6 GB | 9 |
| Previous SDNQ baseline | 4.079 s | 15.2 GB | 14.5 GB | 9 |
| This `uint4-static` checkpoint | 3.866 s | 14.8 GB | 14.1 GB | 9 |
| Dynamic SVD r128 quality candidate | 4.182 s | 17.2 GB | 16.5 GB | 9 |
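
To put the stress-comparison rows on a common scale, the short script below expresses each row's latency and GPU peak as a percentage reduction relative to the original BF16 pipeline. The numbers are copied from the table. The row keys and the `reduction_pct` helper are illustrative names only.

```python
# Warm average (seconds) and GPU peak (GB), copied from the stress table.
rows = {
    "original-bf16": {"warm_s": 4.244, "gpu_peak_gb": 36.3},
    "previous-sdnq": {"warm_s": 4.079, "gpu_peak_gb": 15.2},
    "uint4-static": {"warm_s": 3.866, "gpu_peak_gb": 14.8},
    "dynamic-svd-r128": {"warm_s": 4.182, "gpu_peak_gb": 17.2},
}


def reduction_pct(baseline: float, value: float) -> float:
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (1.0 - value / baseline)


base = rows["original-bf16"]
for name, row in rows.items():
    latency = reduction_pct(base["warm_s"], row["warm_s"])
    vram = reduction_pct(base["gpu_peak_gb"], row["gpu_peak_gb"])
    print(f"{name:18s} latency -{latency:4.1f}%  GPU peak -{vram:4.1f}%")
```

For this checkpoint, that works out to roughly 9% lower warm latency and roughly 59% lower GPU peak than the BF16 pipeline on this one test host.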

The model-card image is a WebP copy optimized from the full-resolution
comparison canvas:

| WebP quality | Size | RGB PSNR | Luma SSIM-like score |
| ---: | ---: | ---: | ---: |
| 85 | 5.72 MB | 46.93 dB | 0.999977 |

The source JPEG canvas was about 13 MB; this WebP version is smaller while
remaining visually close to the original artifact.
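
For readers unfamiliar with the PSNR column, the sketch below computes PSNR over 8-bit channel values using the standard definition, 10 * log10(peak^2 / MSE). It is a generic illustration on toy data. The card does not state how channels were aggregated for the reported figure, so the `psnr_db` helper is an assumption, not the script that produced 46.93 dB.

```python
import math
from typing import Sequence


def psnr_db(reference: Sequence[int], candidate: Sequence[int], peak: float = 255.0) -> float:
    """PSNR in dB between two equally sized sequences of 8-bit channel values."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(peak * peak / mse)


# Toy data: flattened R, G, B values of a 2x2 image and a slightly altered copy.
original = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
compressed = [11, 20, 29, 40, 50, 61, 70, 80, 90, 99, 110, 120]
print(f"toy PSNR: {psnr_db(original, compressed):.2f} dB")

# Inverting the same formula gives the mean squared error implied by a PSNR.
implied_mse = 255.0**2 / 10 ** (46.93 / 10)
print(f"MSE implied by 46.93 dB: {implied_mse:.2f}")
```

Under this definition, 46.93 dB corresponds to a mean squared error of roughly 1.3 on the 0 to 255 scale, which is consistent with the WebP copy being visually close to the source canvas.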

## Model Size

Approximate full-pipeline folder sizes in the measured setup:

| Checkpoint | Folder size |
| --- | ---: |
| Original `black-forest-labs/FLUX.2-klein-9B` | 52.9 GB |
| Previous SDNQ baseline | 12.6 GB |
| This `uint4-static` checkpoint | 12.2 GB |
| Dynamic SVD r128 candidate | 14.7 GB |

## Usage

Install current Diffusers and SDNQ:

```bash
pip install git+https://github.com/huggingface/diffusers.git
pip install sdnq
```

Run with `Flux2KleinPipeline`:

```python
import torch
from diffusers import Flux2KleinPipeline
from sdnq import SDNQConfig  # registers SDNQ support in diffusers/transformers
from sdnq.common import use_torch_compile as triton_is_available
from sdnq.loader import apply_sdnq_options_to_model

repo_id = "WaveCut/FLUX.2-klein-9B-SDNQ-uint4-static"
device = "cuda"

pipe = Flux2KleinPipeline.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
)

# Turn on quantized matmul only when Triton and CUDA are both available.
if triton_is_available and torch.cuda.is_available():
    pipe.transformer = apply_sdnq_options_to_model(
        pipe.transformer,
        use_quantized_matmul=True,
    )
    pipe.text_encoder = apply_sdnq_options_to_model(
        pipe.text_encoder,
        use_quantized_matmul=True,
    )

pipe.to(device)

prompt = "A clean editorial poster with large readable text: OPEN SOURCE IMAGE MODEL"

# Same settings as the benchmark: 1024x1024, 4 steps, guidance scale 0.0.
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=4,
    guidance_scale=0.0,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
image.save("flux2-klein-sdnq-uint4-static.png")
```

The same pipeline also supports image editing:

```python
from diffusers.utils import load_image

# Reuses `torch`, `pipe`, and `device` from the text-to-image example above.
input_image = load_image("input.png")

image = pipe(
    image=input_image,
    prompt="Turn the handwritten sign into a clean printed sign while preserving the scene",
    height=1024,
    width=1024,
    num_inference_steps=4,
    guidance_scale=0.0,
    generator=torch.Generator(device=device).manual_seed(1),
).images[0]
image.save("flux2-klein-sdnq-uint4-static-edit.png")
```

If your GPU does not have enough VRAM to hold the full pipeline (about 14.8 GB
GPU peak in the benchmark above), replace `pipe.to(device)` with
`pipe.enable_model_cpu_offload()`.
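
One way to automate that choice is to check free VRAM before placing the pipeline. The sketch below is a hypothetical helper, not part of Diffusers or SDNQ: `choose_placement`, `place_pipeline`, and the 16 GiB default budget are assumptions (the budget is simply the measured 14.8 GB peak plus some headroom). It relies on `torch.cuda.mem_get_info()`, which returns free and total device memory in bytes.

```python
GIB = 1024**3


def choose_placement(free_bytes: int, needed_gib: float = 16.0) -> str:
    """Return 'gpu' when free VRAM covers the budget, else 'cpu_offload'."""
    return "gpu" if free_bytes >= needed_gib * GIB else "cpu_offload"


def place_pipeline(pipe, device: str = "cuda", needed_gib: float = 16.0) -> str:
    """Move `pipe` to the GPU or enable model CPU offload, based on free VRAM."""
    import torch  # imported lazily so choose_placement works without PyTorch

    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    placement = choose_placement(free_bytes, needed_gib)
    if placement == "gpu":
        pipe.to(device)
    else:
        pipe.enable_model_cpu_offload()
    return placement


# The decision logic alone, with made-up free-memory readings:
print(choose_placement(24 * GIB))  # gpu
print(choose_placement(8 * GIB))   # cpu_offload
```

In the usage example, `place_pipeline(pipe)` would stand in for the `pipe.to(device)` call. Offloading trades latency for memory, so the benchmark timings above would not carry over to the offloaded path.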

## Quantization Recipe

This checkpoint was produced with SDNQ post-load quantization over the
`transformer` and `text_encoder` components of FLUX.2 Klein 9B.

Recipe:

```python
variant = {
    "weights_dtype": "uint4",
    "use_dynamic_quantization": False,
    "dynamic_loss_threshold": None,
    "use_svd": False,
    "svd_rank": 32,  # unused because use_svd is False
    "svd_steps": 8,  # unused because use_svd is False
    "group_size": 0,
    "dequantize_fp32": False,
    "quantized_matmul_dtype": None,
    "use_quantized_matmul": False,
    "use_stochastic_rounding": False,
}
```
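
To give a feel for what a `uint4` weights dtype implies, the toy example below applies generic min/max (asymmetric) 4-bit quantization to one short row of floats, mapping each value to one of 16 integer levels and back. This is a textbook illustration only. It is not SDNQ's implementation, and SDNQ's actual grouping, scaling, and packing are not reproduced here. The function names are hypothetical.

```python
from typing import List, Tuple


def quantize_uint4(weights: List[float]) -> Tuple[List[int], float, float]:
    """Map floats to integer codes in [0, 15] with one scale and offset per row."""
    low, high = min(weights), max(weights)
    scale = (high - low) / 15 or 1.0  # fall back to 1.0 for constant rows
    codes = [min(15, max(0, round((w - low) / scale))) for w in weights]
    return codes, scale, low


def dequantize_uint4(codes: List[int], scale: float, low: float) -> List[float]:
    """Reconstruct approximate floats from the 4-bit codes."""
    return [low + c * scale for c in codes]


row = [-0.42, -0.10, 0.0, 0.07, 0.31, 0.88]
codes, scale, low = quantize_uint4(row)
restored = dequantize_uint4(codes, scale, low)
max_error = max(abs(a - b) for a, b in zip(row, restored))

print(codes)
print(f"scale={scale:.4f} max_error={max_error:.4f}")
```

In this toy scheme the round-trip error per weight is bounded by half the scale, which is why outliers that widen the min/max range hurt accuracy for every other weight in the row.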

Minimal quantization sketch:

```python
import torch
from diffusers import Flux2KleinPipeline
from sdnq import sdnq_post_load_quant
from sdnq.loader import save_sdnq_model

base_model = "black-forest-labs/FLUX.2-klein-9B"

pipe = Flux2KleinPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
)

# Same settings as the recipe above, applied to both quantized components.
common_kwargs = dict(
    weights_dtype="uint4",
    torch_dtype=torch.bfloat16,
    group_size=0,
    svd_rank=32,  # unused because use_svd is False
    svd_steps=8,  # unused because use_svd is False
    dynamic_loss_threshold=None,
    use_svd=False,
    quant_conv=False,
    quant_embedding=False,
    use_quantized_matmul=False,
    use_quantized_matmul_conv=False,
    use_dynamic_quantization=False,
    use_stochastic_rounding=False,
    dequantize_fp32=False,
    non_blocking=True,
    add_skip_keys=True,
    quantization_device="cuda",
    return_device="cuda",
)

pipe.transformer = sdnq_post_load_quant(pipe.transformer, **common_kwargs)
pipe.text_encoder = sdnq_post_load_quant(pipe.text_encoder, **common_kwargs)

# Save the whole pipeline, sharding weights into files of at most 5 GB.
save_sdnq_model(
    pipe,
    "FLUX.2-klein-9B-SDNQ-uint4-static",
    max_shard_size="5GB",
    is_pipeline=True,
)
```

## Limitations

- This is a quantized derivative of FLUX.2 Klein 9B; it inherits the base
  model's limitations and acceptable-use requirements.
- Text rendering can still be inaccurate, especially for long strings or small
  background text.
- The quality comparison here is a visual prompt-following evaluation, not a
  large-scale human preference or FID benchmark.
- Benchmarks were run on a single A40 test host and should be validated again
  on your exact serving stack.

## License

This model is a quantized derivative of
[black-forest-labs/FLUX.2-klein-9B](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B)
and follows the FLUX Non-Commercial License. Please review `LICENSE.md` and the
Black Forest Labs acceptable-use policy before use.