dadmobile committed on
Commit 93e149c · verified · 1 Parent(s): a77a580

Add samples and rearrange model card

Files changed (1)
  1. README.md +9 -5
README.md CHANGED
@@ -20,20 +20,24 @@ also need the **Qwen3-VL text encoder and VAE** from the base repo [`ideogram-ai
  and the custom inference code at [`github.com/ideogram-oss/ideogram4`](https://github.com/ideogram-oss/ideogram4).
  The quantization recipe and loader are included **in this repo** (`recipe-q4_k.json`, `gguf_loader.py`).
  
- ## Why this one
+ ## Why Q4_K
  Q4_K is the **Pareto winner** on the quality-vs-memory frontier: at **10.4 GB** (the same
  on-disk size class as the published NF4 build) it **beats NF4 on quality** by +0.84 Pick /
  +2.93 CLIP on a 50-prompt slice. If you're tight on VRAM, this is the build to grab.
  
- ## Method
- Weight-only GGUF Q4_K of the DiT linears (custom NumPy quantizer, verified bit-exact
- against the gguf-py reference decoder); non-linear tensors kept F16.
+ ## Samples
+
+ ![image (8)](https://cdn-uploads.huggingface.co/production/uploads/6316131329411a6864b13751/1gGu1ZK500Sw4F02Qofil.png)
  
- ## Numbers (preliminary — single n=50 slice)
+ ## Benchmarks (preliminary — single n=50 slice)
  - Pick 19.08 / CLIP 18.68 vs NF4 18.24 / 15.75 at equal size.
  - Latency ~203 s/img (48 steps, 1024², RTX 3090); ~23% slower than NF4.
  - Full-battery validation is in progress.
  
+ ## Method
+ Weight-only GGUF Q4_K of the DiT linears (custom NumPy quantizer, verified bit-exact
+ against the gguf-py reference decoder); non-linear tensors kept F16.
+
  ## How to run (self-contained)
  
  Everything you need is in this repo. The GGUF is the **quantized DiT only**, so
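
Review note: the quality deltas quoted in the "Why Q4_K" paragraph can be checked against the scores in the benchmark bullets. The sketch below only redoes that arithmetic with the figures copied from the hunk. The implied NF4 latency is an assumption: it reads "~23% slower" as Q4_K latency being about 1.23 times NF4 latency, which the README does not state explicitly, and the variable names are illustrative.

```python
# Scores copied from the benchmark bullets in the README hunk (n=50 slice).
q4k = {"pick": 19.08, "clip": 18.68}
nf4 = {"pick": 18.24, "clip": 15.75}

# Quality deltas (Q4_K minus NF4), rounded to the two decimals the README uses.
deltas = {metric: round(q4k[metric] - nf4[metric], 2) for metric in q4k}
print(deltas)

# ASSUMPTION: "~23% slower" means latency_q4k ~= 1.23 * latency_nf4.
# Under that reading, the NF4 latency implied by ~203 s/img is:
q4k_latency_s = 203.0
slowdown = 0.23
implied_nf4_latency_s = q4k_latency_s / (1 + slowdown)
print(f"implied NF4 latency: ~{implied_nf4_latency_s:.0f} s/img")
```

The deltas reproduce the +0.84 Pick / +2.93 CLIP quoted in the card, so the two sections of the README agree with each other. The implied NF4 latency is a derived figure, not a measurement from the source.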