---
license: apache-2.0
tags:
- text-to-image
- int8
- quantized
- convrot
- comfyui
- z-image
base_model:
- Tongyi-MAI/Z-Image-Turbo
- Tongyi-MAI/Z-Image
---

# Z-Image Turbo + Base: INT8 ConvRot (with Qwen 3 4B Text Encoder)

INT8 row-wise quantized versions of **Z-Image Base** and **Z-Image Turbo**, using ConvRot (Hadamard-rotation outlier suppression) for improved quantization fidelity, plus a matching INT8 ConvRot quantization of the Qwen 3 4B text encoder. Converted with [convert_to_quant](https://github.com/silveroxides/convert_to_quant) for native ComfyUI compatibility.

## Files

| File | Description |
|---|---|
| `z_image_int8_convrot.safetensors` | Z-Image Base, INT8 + ConvRot |
| `z_image_turbo_int8_convrot.safetensors` | Z-Image Turbo, INT8 + ConvRot |
| `qwen_3_4b_int8_convrot.safetensors` | Qwen 3 4B text encoder, INT8 + ConvRot |

## Why ConvRot + Row-Wise Scaling

ConvRot applies a group-wise Hadamard rotation to suppress weight outliers before quantization, which improves INT8 fidelity over plain per-tensor or per-row quantization alone. Critically, **these conversions use `--scaling_mode row`, not `tensor`**. Tensor-wise scaling computes a single scale factor for an entire weight matrix, so even a few outlier values force that global scale to widen, coarsening quantization precision across the rest of the matrix. In testing, ConvRot combined with tensor-wise scaling produced visibly fuzzy, detail-smoothed output. Switching to row-wise scaling, which computes an independent scale per row and so confines each outlier's effect to the row that contains it, resolved this and produced output sharpness matching or exceeding plain INT8 row-wise quantization.

If you encounter other ConvRot-quantized models with soft or "waxy" output, check the scaling mode first: tensor-wise scaling is the most likely culprit.

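
The effect of the scaling mode can be illustrated with a small pure-Python sketch. It is not the `convert_to_quant` implementation and does not include the ConvRot rotation; it only assumes symmetric absmax INT8 quantization and a made-up 3x4 weight matrix with a single outlier, and compares the round-trip error of one shared scale against one scale per row.

```python
def absmax_scale(values):
    """Symmetric INT8 scale: the largest magnitude maps to 127."""
    peak = max(abs(v) for v in values)
    return peak / 127 if peak else 1.0


def round_trip(values, scale):
    """Quantize to INT8 with `scale`, then dequantize back to floats."""
    return [max(-127, min(127, round(v / scale))) * scale for v in values]


def mean_abs_error(weight, mode):
    """Mean absolute round-trip error for 'tensor' or 'row' scaling."""
    if mode == "tensor":
        # One scale shared by every value in the matrix.
        shared = absmax_scale([v for row in weight for v in row])
        scales = [shared] * len(weight)
    elif mode == "row":
        # One independent scale per row.
        scales = [absmax_scale(row) for row in weight]
    else:
        raise ValueError(f"unknown scaling mode: {mode}")
    errors = [
        abs(v - q)
        for row, scale in zip(weight, scales)
        for v, q in zip(row, round_trip(row, scale))
    ]
    return sum(errors) / len(errors)


# Illustrative values only: small weights everywhere, one outlier in row 0.
weight = [
    [0.011, -0.023, 0.017, 8.0],
    [0.012, -0.019, 0.021, -0.016],
    [-0.014, 0.022, -0.018, 0.013],
]

tensor_err = mean_abs_error(weight, "tensor")
row_err = mean_abs_error(weight, "row")
print(f"tensor-wise mean abs error: {tensor_err:.5f}")
print(f"row-wise mean abs error:    {row_err:.5f}")
```

With tensor-wise scaling the outlier sets the step size for all three rows, so every small weight rounds to zero. With row-wise scaling only the row holding the outlier pays that cost.
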
## Quantization Recipe

```
ctq -i <model>.safetensors -o <model>-int8-convrot.safetensors \
--int8 --scaling_mode row --simple --low-memory \
--convrot --convrot-group-size 64 \
--zimage --comfy_quant --save-quant-metadata
```

The Qwen 3 4B text encoder was converted with the same flags minus `--zimage`, since it needs no architecture-specific preset. Before quantizing, verify that its hidden dimensions divide cleanly by the chosen group size.

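
One way to run that divisibility check is sketched below. The function name, layer names, and shapes are all hypothetical and chosen for illustration; substitute the real tensor shapes of the checkpoint you intend to convert. Checking both dimensions of every 2D weight is a conservative assumption, since the tool may only rotate along one of them.

```python
def find_incompatible_layers(shapes, group_size=64):
    """Return the 2D weights that have a dimension not divisible by group_size."""
    return {
        name: shape
        for name, shape in shapes.items()
        if len(shape) == 2 and any(dim % group_size for dim in shape)
    }


# Hypothetical names and shapes, for illustration only.
example_shapes = {
    "blocks.0.attn.qkv.weight": (7680, 2560),
    "blocks.0.mlp.up.weight": (9728, 2560),
    "blocks.0.norm.weight": (2560,),         # 1D tensors are skipped
    "odd_projection.weight": (1000, 2560),   # 1000 is not a multiple of 64
}

bad = find_incompatible_layers(example_shapes, group_size=64)
for name, shape in bad.items():
    print(f"{name}: {shape} does not divide by 64")
```

An empty result means the chosen group size fits every 2D weight in the mapping; any layer that is reported would need a different group size or an exclusion.
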
### Why group size 64

ComfyUI's `comfy_kitchen` runtime requires the ConvRot Hadamard block size to be a power of 4 (4, 16, 64, 256, 1024…), not merely a power of 2. A group size of 64 was chosen because it divides evenly into every 2D weight dimension in the Z-Image architecture, so no layers need to be excluded manually.

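
The power-of-4 constraint and the outlier-spreading effect of a group-wise rotation can both be sketched in a few lines of plain Python. This is an illustration under assumptions, not the `comfy_kitchen` or ConvRot code: it uses the textbook Sylvester construction for the Hadamard matrix (the matrix the real runtime uses may differ), and the input row is made up.

```python
import math


def is_power_of_four(n):
    """True for 4, 16, 64, ...: a power of 2 with an even, nonzero exponent."""
    return n >= 4 and n & (n - 1) == 0 and (n.bit_length() - 1) % 2 == 0


def sylvester_hadamard(n):
    """Build an n x n matrix of +1/-1 by Sylvester doubling (n a power of 2)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def rotate_groups(row, group_size):
    """Apply an orthonormal Hadamard rotation to each consecutive group."""
    h = sylvester_hadamard(group_size)
    norm = math.sqrt(group_size)
    out = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        out.extend(
            sum(hv * gv for hv, gv in zip(h_row, group)) / norm for h_row in h
        )
    return out


valid_sizes = [n for n in range(2, 1025) if is_power_of_four(n)]
print(valid_sizes)  # [4, 16, 64, 256, 1024]

# Made-up row of 128 values (two groups of 64) with one large outlier.
row = [0.01] * 128
row[5] = 8.0
rotated = rotate_groups(row, 64)
print("peak before rotation:", max(abs(v) for v in row))
print("peak after rotation: ", max(abs(v) for v in rotated))
```

The rotation spreads the outlier's magnitude across its whole group, so the largest value the quantizer has to represent shrinks. Because the rotation is orthonormal, it can be undone exactly.
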
## Usage in ComfyUI

Load `z_image_int8_convrot.safetensors` or `z_image_turbo_int8_convrot.safetensors` with a standard `UNETLoader` node, and `qwen_3_4b_int8_convrot.safetensors` with a `CLIPLoader` node, setting its type to whichever text encoder type your Z-Image workflow uses. No special ConvRot-aware nodes are required: the rotation metadata is embedded via `--save-quant-metadata` and read automatically by ComfyUI's mixed-precision quantization ops.

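
To inspect a downloaded file before loading it, the safetensors header can be read with the standard library alone. A safetensors file begins with an 8-byte little-endian header length followed by a JSON header that lists each tensor's dtype and shape, plus an optional `__metadata__` entry. The sketch below is an assumption-laden helper, not part of ComfyUI or `convert_to_quant`: the function names are invented here, and since this card does not state where the quantization metadata is stored, the code simply reports whatever the header contains.

```python
import json
import struct
import sys
from collections import Counter


def read_safetensors_header(path):
    """Parse the JSON header of a .safetensors file without loading tensors."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize(path):
    """Return (tensor count per dtype, free-form metadata dict)."""
    header = read_safetensors_header(path)
    metadata = header.pop("__metadata__", {})
    dtypes = Counter(entry["dtype"] for entry in header.values())
    return dtypes, metadata


if __name__ == "__main__":
    # Usage: python inspect_header.py z_image_turbo_int8_convrot.safetensors
    for path in sys.argv[1:]:
        dtypes, metadata = summarize(path)
        print(path)
        print("  tensors per dtype:", dict(dtypes))
        print("  metadata keys:", sorted(metadata))
```

In a quantized checkpoint, most of the large weights would be expected to appear under the `I8` dtype.
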
## Hardware

Converted and tested on an RTX 3070 (8 GB VRAM) using `--low-memory` streaming conversion.

## Credits

- Quantization tooling: [silveroxides/convert_to_quant](https://github.com/silveroxides/convert_to_quant)
- Base models: [Tongyi-MAI/Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image), [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)