Ideogram 4: GGUF Q4_K (Transformer Lab)

A GGUF Q4_K (4.5 bits/weight) quantization of the Ideogram 4 DiT, sized for consumer GPUs.
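The 4.5 bits/weight figure follows from the block layout. The arithmetic below assumes this file uses the standard ggml `block_q4_K` super-block unchanged. Each super-block holds 256 weights and stores a float16 super-scale `d` (2 bytes) and a float16 super-min `dmin` (2 bytes). It also stores 12 bytes of packed 6-bit sub-block scales and mins, plus 128 bytes of 4-bit quants:

```latex
\text{bits/weight} = \frac{8\,(2 + 2 + 12 + 128)}{256} = \frac{8 \cdot 144}{256} = \frac{1152}{256} = 4.5
```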

⚠️ Not a llama.cpp / stable-diffusion.cpp file. Despite the .gguf extension, this loads only via the included PyTorch gguf_loader.py + the ideogram4 pipeline. It is not compatible with llama.cpp, stable-diffusion.cpp, Ollama, etc.

ℹ️ Quantized DiT only. This checkpoint is the DiT (both CFG branches). To generate images, you also need the Qwen3-VL text encoder and VAE from the base repo ideogram-ai/ideogram-4-fp8, plus the custom inference code at github.com/ideogram-oss/ideogram4. The quantization recipe and loader are included in this repo (recipe-q4_k.json, gguf_loader.py).

Why Q4_K

Q4_K is the Pareto winner on the quality-vs-memory frontier: at 10.4 GB (the same on-disk size class as the published NF4 build) it beats NF4 on quality by +0.84 Pick / +2.93 CLIP on a 50-prompt slice. If you're tight on VRAM, this is the build to grab.


Benchmarks (preliminary; single n=50 slice)

  • Pick 19.08 / CLIP 18.68 vs NF4 18.24 / 15.75 at equal size.
  • Latency ~203 s/img (48 steps, 1024², RTX 3090); ~23% slower than NF4.
  • Full-battery validation is in progress.
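As a sanity check, the +0.84 Pick / +2.93 CLIP gaps quoted under "Why Q4_K" follow directly from these scores. The snippet also derives two further figures from the reported numbers: a per-step latency and an implied NF4 latency. Neither is an additional measurement. The NF4 figure rests on an assumption: that "~23% slower" means Q4_K takes 1.23× NF4's time.

```python
# Preliminary n=50 scores reported above.
q4k = {"pick": 19.08, "clip": 18.68}
nf4 = {"pick": 18.24, "clip": 15.75}

# Quality gap of Q4_K over NF4, rounded to the precision the scores are reported at.
deltas = {metric: round(q4k[metric] - nf4[metric], 2) for metric in q4k}

# Latency: 203 s/img over 48 sampling steps at 1024^2 on an RTX 3090.
seconds_per_image = 203.0
steps = 48
per_step = seconds_per_image / steps

# Assumption: "~23% slower than NF4" read as Q4_K time = 1.23 x NF4 time.
implied_nf4_seconds = seconds_per_image / 1.23

print(deltas)                                               # {'pick': 0.84, 'clip': 2.93}
print(f"{per_step:.2f} s/step")                             # 4.23 s/step
print(f"~{implied_nf4_seconds:.0f} s/img implied for NF4")  # ~165 s/img implied for NF4
```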

Method

Weight-only GGUF Q4_K quantization of the DiT's linear layers (custom NumPy quantizer, verified bit-exact against the gguf-py reference decoder); all other tensors are kept in F16.
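For readers new to the format, here is a minimal pure-Python sketch of decoding one Q4_K super-block. It assumes the standard ggml `block_q4_K` layout:

  • a float16 super-scale `d`
  • a float16 super-min `dmin`
  • 12 bytes packing eight 6-bit sub-block scales and mins
  • 128 bytes of 4-bit quants

Each weight in a 32-weight sub-block is reconstructed as `d * scale * q - dmin * min`. `dequantize_q4_k_block` is an illustrative name, not the API of this repo's gguf_loader.py. `get_scale_min_k4` mirrors the ggml helper of that name. The sketch computes in Python floats (float64), so results may differ in the last bits from a float32 implementation.

```python
import struct

QK_K = 256                            # weights per Q4_K super-block
BLOCK_BYTES = 2 + 2 + 12 + QK_K // 2  # d (f16) + dmin (f16) + packed scales + 4-bit quants = 144


def get_scale_min_k4(j: int, scales: bytes) -> tuple[int, int]:
    """Return the 6-bit (scale, min) pair of sub-block j (0..7) from the 12 packed bytes."""
    if j < 4:
        return scales[j] & 63, scales[j + 4] & 63
    sc = (scales[j + 4] & 0x0F) | ((scales[j - 4] >> 6) << 4)
    mn = (scales[j + 4] >> 4) | ((scales[j] >> 6) << 4)
    return sc, mn


def dequantize_q4_k_block(block: bytes) -> list[float]:
    """Dequantize one 144-byte Q4_K super-block into 256 floats."""
    if len(block) != BLOCK_BYTES:
        raise ValueError(f"expected {BLOCK_BYTES} bytes, got {len(block)}")
    d, dmin = struct.unpack_from("<ee", block, 0)  # little-endian float16 pair
    scales, qs = block[4:16], block[16:]
    out: list[float] = []
    for group in range(4):                          # 4 groups of 64 weights
        q = qs[32 * group: 32 * group + 32]         # 32 bytes = 64 nibbles
        sc_lo, m_lo = get_scale_min_k4(2 * group, scales)
        sc_hi, m_hi = get_scale_min_k4(2 * group + 1, scales)
        # Low nibbles fill the first 32 weights of the group, high nibbles the next 32.
        out.extend(d * sc_lo * (b & 0x0F) - dmin * m_lo for b in q)
        out.extend(d * sc_hi * (b >> 4) - dmin * m_hi for b in q)
    return out


# Toy block: d=0.5, dmin=0.25, every sub-block scale=2 and min=4,
# every quant byte 0x21 (low nibble 1, high nibble 2).
toy_scales = bytes([2] * 4 + [4] * 4 + [0x42] * 4)
block = struct.pack("<ee", 0.5, 0.25) + toy_scales + bytes([0x21] * 128)
w = dequantize_q4_k_block(block)
print(w[0], w[32], len(w))  # 0.0 1.0 256
```

The toy block keeps the high 2 bits of every scale at zero. Sub-blocks 4 to 7 carry those bits in the top of bytes 0 to 7, which is why `get_scale_min_k4` has two branches.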

How to run (self-contained)

Everything you need is in this repo. The GGUF is the quantized DiT only, so step 1 fetches the text encoder + VAE + the inference package.

# 1) one-time: install the ideogram4 package + download the base components
#    (needs your own access to the GATED base repo ideogram-ai/ideogram-4-fp8)
python download_deps.py

# 2) generate
python usage.py "a poster that says HELLO"

Files here:

  • ideogram4-q4_k.gguf: the Q4_K quantized DiT (both CFG branches).
  • gguf_loader.py: loads and dequantizes the GGUF into the pipeline (reference implementation).
  • download_deps.py, usage.py: setup and a minimal generation example.
  • recipe-q4_k.json: the exact quantization recipe and tensor layout.

gguf_loader.py is a reference implementation. The dequantization math is validated bit-exact, but the standalone loader has not yet been tested end-to-end on a GPU, so verify it before production use.
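Cheap CPU-side checks are a sensible first step before trusting a standalone loader. The least obvious part of Q4_K is how eight 6-bit scales and eight 6-bit mins share 12 bytes, and a round-trip test there catches bit-shift mistakes quickly. This sketch assumes the standard ggml Q4_K packing. `pack_scale_min_k4` and `check_round_trip` are hypothetical helpers written for this example, not functions from gguf_loader.py. `get_scale_min_k4` mirrors the ggml unpacking helper of that name.

```python
import random


def get_scale_min_k4(j: int, scales: bytes) -> tuple[int, int]:
    """Decoder side: unpack the 6-bit (scale, min) of sub-block j, as in ggml's Q4_K."""
    if j < 4:
        return scales[j] & 63, scales[j + 4] & 63
    sc = (scales[j + 4] & 0x0F) | ((scales[j - 4] >> 6) << 4)
    mn = (scales[j + 4] >> 4) | ((scales[j] >> 6) << 4)
    return sc, mn


def pack_scale_min_k4(sc: list[int], mn: list[int]) -> bytes:
    """Encoder side (hypothetical helper): pack eight 6-bit scales and mins into 12 bytes."""
    if len(sc) != 8 or len(mn) != 8 or not all(0 <= v < 64 for v in sc + mn):
        raise ValueError("need eight scales and eight mins, each in 0..63")
    out = bytearray(12)
    for j in range(4):
        # Bytes 0..7: 6-bit values of sub-blocks 0..3; top 2 bits hold the high bits of 4..7.
        out[j] = sc[j] | ((sc[j + 4] >> 4) << 6)
        out[j + 4] = mn[j] | ((mn[j + 4] >> 4) << 6)
        # Bytes 8..11: low 4 bits of scale (low nibble) and min (high nibble) for 4..7.
        out[j + 8] = (sc[j + 4] & 0x0F) | ((mn[j + 4] & 0x0F) << 4)
    return bytes(out)


def check_round_trip(trials: int = 10_000, seed: int = 0) -> int:
    """Pack random scale/min sets, unpack them again, and count mismatches."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        sc = [rng.randrange(64) for _ in range(8)]
        mn = [rng.randrange(64) for _ in range(8)]
        packed = pack_scale_min_k4(sc, mn)
        if [get_scale_min_k4(j, packed) for j in range(8)] != list(zip(sc, mn)):
            failures += 1
    return failures


print(check_round_trip())  # 0 when packer and unpacker agree
```

The same pattern applies to the loader as a whole: dequantize a few tensors on CPU and compare them against the gguf-py reference output before moving to GPU runs.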

License

Derived from Ideogram 4 under its non-commercial, research-only license. See LICENSE.

Format: GGUF
Model size: 19B params
Architecture: ideogram4