Add samples and rearrange model card
README.md
CHANGED
@@ -20,20 +20,24 @@ also need the **Qwen3-VL text encoder and VAE** from the base repo [`ideogram-ai
 and the custom inference code at [`github.com/ideogram-oss/ideogram4`](https://github.com/ideogram-oss/ideogram4).
 The quantization recipe and loader are included **in this repo** (`recipe-q4_k.json`, `gguf_loader.py`).

-## Why
 Q4_K is the **Pareto winner** on the quality-vs-memory frontier: at **10.4 GB** (the same
 on-disk size class as the published NF4 build) it **beats NF4 on quality** by +0.84 Pick /
 +2.93 CLIP on a 50-prompt slice. If you're tight on VRAM, this is the build to grab.

-##
-
-

-##
 - Pick 19.08 / CLIP 18.68 vs NF4 18.24 / 15.75 at equal size.
 - Latency ~203 s/img (48 steps, 1024², RTX 3090); ~23% slower than NF4.
 - Full-battery validation is in progress.

 ## How to run (self-contained)

 Everything you need is in this repo. The GGUF is the **quantized DiT only**, so

 and the custom inference code at [`github.com/ideogram-oss/ideogram4`](https://github.com/ideogram-oss/ideogram4).
 The quantization recipe and loader are included **in this repo** (`recipe-q4_k.json`, `gguf_loader.py`).

+## Why Q4_K
 Q4_K is the **Pareto winner** on the quality-vs-memory frontier: at **10.4 GB** (the same
 on-disk size class as the published NF4 build) it **beats NF4 on quality** by +0.84 Pick /
 +2.93 CLIP on a 50-prompt slice. If you're tight on VRAM, this is the build to grab.

+## Samples
+
+

+## Benchmarks (preliminary — single n=50 slice)
 - Pick 19.08 / CLIP 18.68 vs NF4 18.24 / 15.75 at equal size.
 - Latency ~203 s/img (48 steps, 1024², RTX 3090); ~23% slower than NF4.
 - Full-battery validation is in progress.

+## Method
+Weight-only GGUF Q4_K of the DiT linears (custom NumPy quantizer, verified bit-exact
+against the gguf-py reference decoder); non-linear tensors kept F16.
+
 ## How to run (self-contained)

 Everything you need is in this repo. The GGUF is the **quantized DiT only**, so