Ideogram 4 — GGUF Q4_K (Transformer Lab)

A GGUF Q4_K (4.5 bits/weight) quantization of the Ideogram 4 DiT, sized for consumer GPUs.
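
The 4.5 bits/weight figure can be reproduced from the Q4_K super-block layout. The sketch below assumes the standard ggml layout (256 weights in 144 bytes: two fp16 values, 12 bytes of packed 6-bit scales and mins, and 128 bytes of 4-bit codes). The constant names are illustrative and are not read from this repo.

```python
# Q4_K super-block layout as defined in ggml (assumed here, not read from this repo).
WEIGHTS_PER_BLOCK = 256
D_BYTES = 2         # fp16 super-block scale
DMIN_BYTES = 2      # fp16 super-block min
SCALES_BYTES = 12   # 8 scales + 8 mins, 6 bits each
QUANT_BYTES = WEIGHTS_PER_BLOCK // 2  # one 4-bit code per weight

block_bytes = D_BYTES + DMIN_BYTES + SCALES_BYTES + QUANT_BYTES
bits_per_weight = block_bytes * 8 / WEIGHTS_PER_BLOCK
print(block_bytes, bits_per_weight)  # 144 4.5
```

The half bit above 4 is the per-block overhead of the scales and mins.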

⚠️ Not a llama.cpp / stable-diffusion.cpp file. Despite the .gguf extension, this loads only via the included PyTorch gguf_loader.py + the ideogram4 pipeline. It is not compatible with llama.cpp, stable-diffusion.cpp, Ollama, etc.

ℹ️ Quantized DiT only. This checkpoint contains only the DiT (both CFG branches). To generate images, you also need the Qwen3-VL text encoder and VAE from the base repo ideogram-ai/ideogram-4-fp8, plus the custom inference code at github.com/ideogram-oss/ideogram4. The quantization recipe and loader are included in this repo (recipe-q4_k.json, gguf_loader.py).

Why Q4_K

Q4_K is the Pareto winner on the quality-vs-memory frontier: at 10.4 GB (the same on-disk size class as the published NF4 build), it beats NF4 on quality by +0.84 Pick / +2.93 CLIP on a 50-prompt slice. The trade-off is speed: it runs about 23% slower than NF4. If you're tight on VRAM, this is the build to grab.

Benchmarks (preliminary — single n=50 slice)

Quality: Pick 19.08 / CLIP 18.68 for Q4_K vs. Pick 18.24 / CLIP 15.75 for NF4 at equal size.
Latency: ~203 s/img (48 steps, 1024², RTX 3090), about 23% slower than NF4.
Full-battery validation is in progress.
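
As a quick consistency check, the deltas quoted under "Why Q4_K" can be recomputed from the scores in this section. The snippet only restates numbers from this card. The per-step figure is simple division and assumes the 203 s is spent entirely in the 48 sampling steps, which the card does not state.

```python
# Figures copied from this card's benchmark section (single n=50 slice).
q4k = {"pick": 19.08, "clip": 18.68}
nf4 = {"pick": 18.24, "clip": 15.75}

# Quality gain of Q4_K over NF4 on each metric.
deltas = {k: round(q4k[k] - nf4[k], 2) for k in q4k}
print(deltas)  # {'pick': 0.84, 'clip': 2.93}

# Rough per-step cost, assuming all 203 s go to the 48 sampling steps.
seconds_per_step = 203 / 48
print(round(seconds_per_step, 2))  # 4.23
```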

Method

Weight-only GGUF Q4_K of the DiT linears (custom NumPy quantizer, verified bit-exact against the gguf-py reference decoder); non-linear tensors kept F16.
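
For readers unfamiliar with the format, here is a minimal NumPy sketch of decoding a single Q4_K super-block. It assumes the standard ggml layout (fp16 d, fp16 dmin, 12 bytes of packed 6-bit scales and mins, 128 bytes of 4-bit codes). It is not the code in gguf_loader.py and has not been validated against it. The function names unpack_scale_min and dequant_q4_k_block are hypothetical.

```python
import numpy as np

QK_K = 256         # weights per Q4_K super-block
BLOCK_BYTES = 144  # fp16 d + fp16 dmin + 12 scale bytes + 128 quant bytes


def unpack_scale_min(j, scales):
    """Return the 6-bit (scale, min) pair for sub-block j in 0..7."""
    if j < 4:
        return scales[j] & 63, scales[j + 4] & 63
    scale = (scales[j + 4] & 0x0F) | ((scales[j - 4] >> 6) << 4)
    minimum = (scales[j + 4] >> 4) | ((scales[j] >> 6) << 4)
    return scale, minimum


def dequant_q4_k_block(block: bytes) -> np.ndarray:
    """Decode one 144-byte Q4_K super-block into 256 float32 weights."""
    assert len(block) == BLOCK_BYTES
    d, dmin = (float(v) for v in np.frombuffer(block, dtype="<f2", count=2))
    scales = list(block[4:16])  # 12 packed bytes as Python ints
    qs = np.frombuffer(block, dtype=np.uint8, count=128, offset=16)
    out = np.empty(QK_K, dtype=np.float32)
    for chunk in range(4):  # each chunk: 32 bytes -> 64 weights
        q = qs[32 * chunk : 32 * chunk + 32]
        sc_lo, mn_lo = unpack_scale_min(2 * chunk, scales)
        sc_hi, mn_hi = unpack_scale_min(2 * chunk + 1, scales)
        base = 64 * chunk
        lo = (q & 0x0F).astype(np.float32)  # low nibbles: first 32 weights
        hi = (q >> 4).astype(np.float32)    # high nibbles: next 32 weights
        out[base : base + 32] = d * sc_lo * lo - dmin * mn_lo
        out[base + 32 : base + 64] = d * sc_hi * hi - dmin * mn_hi
    return out


# Toy block: d=1.0, dmin=0.5, every sub-block scale=1 and min=2, every byte 0x21.
toy = (
    np.array([1.0, 0.5], dtype="<f2").tobytes()
    + bytes([1, 1, 1, 1, 2, 2, 2, 2, 0x21, 0x21, 0x21, 0x21])
    + bytes([0x21] * 128)
)
weights = dequant_q4_k_block(toy)
print(weights[:2], weights[32:34])  # [0. 0.] [1. 1.]
```

Each weight is reconstructed as d * scale * code - dmin * min, so the toy block yields 1*1*1 - 0.5*2 = 0 for the low nibbles and 1*1*2 - 0.5*2 = 1 for the high nibbles. A production loader would vectorize this across all blocks rather than loop in Python.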

How to run (self-contained)

The scripts you need are in this repo. The GGUF contains only the quantized DiT, so step 1 fetches the text encoder, the VAE, and the inference package.

# 1) one-time: install the ideogram4 package + download the base components
#    (needs your own access to the GATED base repo ideogram-ai/ideogram-4-fp8)
python download_deps.py

# 2) generate
python usage.py "a poster that says HELLO"

Files here:

ideogram4-q4_k.gguf — the Q4_K quantized DiT (both CFG branches).
gguf_loader.py — loads + dequantizes the GGUF into the pipeline (reference impl).
download_deps.py, usage.py — setup + a minimal generation example.
recipe-q4_k.json — the exact quantization recipe / tensor layout.

gguf_loader.py is a reference implementation: the dequantization math is validated bit-exact, but the standalone loader has not yet been GPU-tested end to end, so verify it before production use.
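
One lightweight check before a full run is to tally the tensors in the file by quantization type, which should show Q4_K for the linears and F16 for the rest. The sketch below uses GGUFReader from the gguf-py package (pip install gguf). It assumes the file parses with stock gguf-py, which this card does not confirm. The helper names summarize and inspect_gguf are hypothetical.

```python
import sys
from collections import Counter


def summarize(tensors):
    """Tally tensor count and total bytes per quantization type."""
    counts, sizes = Counter(), Counter()
    for t in tensors:
        kind = t.tensor_type.name  # e.g. "Q4_K" or "F16"
        counts[kind] += 1
        sizes[kind] += int(t.n_bytes)
    return counts, sizes


def inspect_gguf(path):
    """Open a GGUF file and summarize its tensors (assumes gguf-py can parse it)."""
    # Imported lazily so summarize() works without gguf installed.
    from gguf import GGUFReader

    reader = GGUFReader(path)
    return summarize(reader.tensors)


if __name__ == "__main__" and len(sys.argv) > 1:
    counts, sizes = inspect_gguf(sys.argv[1])
    for kind in sorted(counts):
        print(f"{kind}: {counts[kind]} tensors, {sizes[kind] / 1e9:.2f} GB")
```

Saved as, say, inspect_gguf.py (a hypothetical filename), it would be run as python inspect_gguf.py ideogram4-q4_k.gguf. It only reads tensor metadata and does not exercise the dequantization or GPU path.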

License

Derived from Ideogram 4 under its non-commercial, research-only license. See LICENSE.

Model details

Format: GGUF
Model size: 19B params
Architecture: ideogram4
Base model: ideogram-ai/ideogram-4-fp8
Repository: transformerlab/ideogram-4-gguf-q4_k