transformerlab/ideogram-4-gguf-q4_k

@@ -20,20 +20,24 @@ also need the **Qwen3-VL text encoder and VAE** from the base repo [`ideogram-ai
 and the custom inference code at [`github.com/ideogram-oss/ideogram4`](https://github.com/ideogram-oss/ideogram4).
 The quantization recipe and loader are included **in this repo** (`recipe-q4_k.json`, `gguf_loader.py`).
-## Why this one
+## Why Q4_K
 Q4_K is the **Pareto winner** on the quality-vs-memory frontier: at **10.4 GB** (the same
 on-disk size class as the published NF4 build) it **beats NF4 on quality** by +0.84 Pick /
 +2.93 CLIP on a 50-prompt slice. If you're tight on VRAM, this is the build to grab.
-## Method
-Weight-only GGUF Q4_K of the DiT linears (custom NumPy quantizer, verified bit-exact
-against the gguf-py reference decoder); non-linear tensors kept F16.
-## Numbers (preliminary — single n=50 slice)
+## Samples
+![image (8)](https://cdn-uploads.huggingface.co/production/uploads/6316131329411a6864b13751/1gGu1ZK500Sw4F02Qofil.png)
+## Benchmarks (preliminary — single n=50 slice)
 - Pick 19.08 / CLIP 18.68 vs NF4 18.24 / 15.75 at equal size.
 - Latency ~203 s/img (48 steps, 1024², RTX 3090); ~23% slower than NF4.
 - Full-battery validation is in progress.
+## Method
+Weight-only GGUF Q4_K of the DiT linears (custom NumPy quantizer, verified bit-exact
+against the gguf-py reference decoder); non-linear tensors kept F16.
 ## How to run (self-contained)
 Everything you need is in this repo. The GGUF is the **quantized DiT only**, so
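The Method note is terse about what a Q4_K weight actually stores. A minimal sketch of the two-level idea, assuming ggml's Q4_K layout as commonly described: 256-weight super-blocks split into 8 sub-blocks of 32, each sub-block carrying a 6-bit scale code and a 6-bit min code, plus fp16 super-block factors `d` and `dmin`, so each weight is reconstructed as `d * sc * q - dmin * m` from a 4-bit code `q`. This is not the repo's quantizer (see `recipe-q4_k.json` / `gguf_loader.py`) and is not bit-exact with ggml: it takes each sub-block's range from a plain min/max instead of ggml's iterative search, and it leaves all codes unpacked. The names `quantize_q4k_like` and `dequantize_q4k_like` are hypothetical. Per the Method note, only DiT linear weights would go through something like this; other tensors stay F16.

```python
import numpy as np

# Assumed layout (ggml-style Q4_K, unpacked). Illustrative only: NOT bit-exact
# with ggml, gguf-py, or this repo's quantizer.
QK_K = 256  # weights per super-block
SUB = 32    # weights per sub-block, so 8 sub-blocks per super-block


def _code(values: np.ndarray, step: np.float32, bits: int) -> np.ndarray:
    """Round values / step to unsigned integer codes in [0, 2**bits - 1]."""
    if step == 0:
        return np.zeros(values.shape, dtype=np.uint8)
    return np.clip(np.rint(values / step), 0, 2**bits - 1).astype(np.uint8)


def quantize_q4k_like(block: np.ndarray):
    """Two-level 4-bit quantization of one 256-weight super-block.

    Returns (d, dmin, sc, mn, q): float16 super-block factors d and dmin,
    6-bit sub-block scale codes sc and min codes mn (shape (8,)), and
    4-bit weight codes q (shape (8, 32)).
    """
    x = np.asarray(block, dtype=np.float32).reshape(QK_K // SUB, SUB)
    # Level 1: an affine range per sub-block. The stored "min" is kept
    # non-negative, so the grid always reaches down to a value <= 0.
    neg_min = np.maximum(-x.min(axis=1), 0.0)
    scale = (x.max(axis=1) + neg_min) / 15.0
    # Level 2: quantize the 8 scales and 8 mins to 6 bits against fp16 maxima.
    d = np.float16(scale.max() / 63.0)
    dmin = np.float16(neg_min.max() / 63.0)
    sc = _code(scale, np.float32(d), 6)
    mn = _code(neg_min, np.float32(dmin), 6)
    # Quantize each weight using its sub-block's effective (already rounded)
    # scale and min, so the codes match what dequantization will reconstruct.
    eff_scale = (np.float32(d) * sc.astype(np.float32))[:, None]
    eff_min = (np.float32(dmin) * mn.astype(np.float32))[:, None]
    ratio = np.divide(x + eff_min, eff_scale,
                      out=np.zeros_like(x), where=eff_scale > 0)
    q = np.clip(np.rint(ratio), 0, 15).astype(np.uint8)
    return d, dmin, sc, mn, q


def dequantize_q4k_like(d, dmin, sc, mn, q) -> np.ndarray:
    """Reconstruct weights as d * sc * q - dmin * mn, per sub-block."""
    q = np.asarray(q, dtype=np.float32).reshape(len(sc), -1)
    y = (np.float32(d) * sc.astype(np.float32)[:, None] * q
         - np.float32(dmin) * mn.astype(np.float32)[:, None])
    return y.reshape(-1)
```

In the real format, `d` and `dmin` (2 bytes each), the 16 six-bit codes (12 bytes) and the 256 four-bit codes (128 bytes) pack into a 144-byte block, i.e. 4.5 bits per weight. Checking bit-exactness against the gguf-py reference decoder, as the Method note describes, depends on that packing and on ggml's exact scale search, both of which this sketch omits.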