# Z-Image Base NVFP4 Quantized Models

NVFP4 (NVIDIA's 4-bit floating-point format) quantized versions of the [Tongyi-MAI/Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image) Base model for ComfyUI.

These quantizations offer different trade-offs between quality and file size, so you can choose the variant that best fits your hardware and quality requirements.

## 📊 Model Variants

| Variant | NVFP4 Layers | What stays in BF16 | Size | Quality |
|---------|--------------|-------------------|------|---------|
| **Ultra** | 60 | Attention + layers 0-4 & 25-29 | ~8.0 GB | ⭐⭐⭐⭐⭐ |
| **Quality** | 90 | All attention (qkv, out) | ~6.5 GB | ⭐⭐⭐ |
| **Mixed** | 180 | Refiners, embedders, final layer | ~4.5 GB | ⭐ |
| **Full** | 204 | Only critical embedders | ~3.5 GB | ⭐ |

> **Original BF16 model size: 12.3 GB**

## 🎯 Which variant should I use?

- **Ultra**: Best quality, closest to the original BF16. Use it if you have enough VRAM and want maximum fidelity.
- **Quality**: Excellent quality with a significant size reduction. Recommended for most users.
- **Mixed**: Poor quality; not recommended.
- **Full**: Poor quality; not recommended.

## 🔧 Technical Details

### Quantization Strategy

The key insight is that **attention layers** (qkv, out) are much more sensitive to quantization than **feed_forward layers** (w1, w2, w3).


- **Ultra** only quantizes feed_forward in middle layers (5-24), keeping first/last layers and all attention in BF16
- **Quality** quantizes all feed_forward but keeps all attention in BF16
- **Mixed** quantizes everything in the 30 main transformer layers
- **Full** additionally quantizes context_refiner, noise_refiner, and t_embedder

### NVFP4 Format Structure

Each quantized layer contains:

- `{layer}.weight`: uint8 (2 FP4 values packed per byte)
- `{layer}.weight_scale`: float8_e4m3fn, 2D (per-block scale, 16-element blocks)
- `{layer}.weight_scale_2`: float32, scalar (per-tensor scale)
- `{layer}.input_scale`: float32, scalar (activation scale)

## 💻 Usage in ComfyUI

### Requirements

> ⚠️ **NVFP4 requires specific hardware and software!**

- **GPU**: NVIDIA Blackwell series (RTX 5080 / 5090) - NVFP4 is a Blackwell-exclusive feature
- **PyTorch**: 2.9.0+ with CUDA 13.0 (`cu130`) - older versions do not support NVFP4
- **ComfyUI**: Latest version (updated regularly)
- **comfy-kitchen**: >= 0.2.7

### Recommended Settings

```
Model: z-image-base-nvfp4_[variant].safetensors
Steps: 28-50
CFG Scale: 3.0-5.0
```

> ⚠️ **Note**: This is Z-Image **Base**, not Turbo. Use 28-50 steps with CFG guidance, not 8 steps like Turbo.

## 📝 Model Architecture

Z-Image Base is a 6B parameter diffusion transformer based on the NextDiT architecture:

- 30 main transformer layers
- 2 context refiner layers
- 2 noise refiner layers
- Hidden dimension: 3840
- Attention heads: 30
- Supports CFG (Classifier-Free Guidance)

## 🙏 Credits

- Original model: [Tongyi-MAI/Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image) by Alibaba
- Quantization format: ComfyUI NVFP4 implementation
- Conversion script: Custom Python script using ComfyUI's TensorCoreNVFP4Layout

## 📄 License

Please refer to the original model's license at [Tongyi-MAI/Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image).
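
## 🧮 Appendix: NVFP4 Storage Sketch

To make the "NVFP4 Format Structure" section more concrete, here is a minimal pure-Python sketch of how one 16-element block could be unpacked and dequantized, and how many bytes a quantized layer would occupy. It is an illustration under stated assumptions, not the conversion script or ComfyUI's implementation. The function names are hypothetical. The nibble order (low nibble first) and the dequantization formula (FP4 value × per-block scale × per-tensor scale) are assumptions that follow the common NVFP4 convention, so check them against the actual loader before relying on them.

```python
# Magnitudes representable by a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit).
FP4_E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # elements sharing one float8_e4m3fn scale, per the format description


def decode_fp4(nibble: int) -> float:
    """Decode one 4-bit code: bit 3 is the sign, bits 0-2 index the magnitude."""
    magnitude = FP4_E2M1_MAGNITUDES[nibble & 0b0111]
    return -magnitude if nibble & 0b1000 else magnitude


def unpack_bytes(packed: bytes) -> list[float]:
    """Unpack two FP4 values per byte.

    ASSUMPTION: the low nibble holds the first element of each pair.
    """
    values = []
    for byte in packed:
        values.append(decode_fp4(byte & 0x0F))
        values.append(decode_fp4(byte >> 4))
    return values


def dequantize_block(packed: bytes, block_scale: float, tensor_scale: float) -> list[float]:
    """Dequantize one 16-element block.

    ASSUMPTION: weight = fp4 * weight_scale (per block) * weight_scale_2 (per tensor).
    block_scale is the float8_e4m3fn scale already converted to a Python float.
    """
    if len(packed) * 2 != BLOCK_SIZE:
        raise ValueError("a block is 16 FP4 values, i.e. 8 packed bytes")
    return [v * block_scale * tensor_scale for v in unpack_bytes(packed)]


def nvfp4_layer_bytes(out_features: int, in_features: int) -> int:
    """Bytes stored for one quantized linear layer (ignores file metadata)."""
    n = out_features * in_features
    packed_weight = n // 2          # uint8, two FP4 values per byte
    block_scales = n // BLOCK_SIZE  # one float8 byte per 16 elements
    scalars = 4 + 4                 # weight_scale_2 and input_scale, float32 each
    return packed_weight + block_scales + scalars


if __name__ == "__main__":
    # Hypothetical 3840 x 3840 projection, using the hidden dimension listed above.
    nvfp4 = nvfp4_layer_bytes(3840, 3840)
    bf16 = 3840 * 3840 * 2
    print(f"NVFP4: {nvfp4} bytes, BF16: {bf16} bytes, ratio: {nvfp4 / bf16:.3f}")
```

The per-block scales add half a bit per weight on top of the 4-bit codes, so a quantized layer costs about 4.5 bits per weight instead of 16 for BF16. Layers left in BF16 are why the variants with fewer NVFP4 layers are larger.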