# Z-Image Base NVFP4 Quantized Models

NVFP4 (NVIDIA's 4-bit floating-point format) quantized versions of the [Tongyi-MAI/Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image) Base model for ComfyUI.

These quantizations offer different trade-offs between quality and file size, allowing you to choose the best option for your hardware and quality requirements.

## 📊 Model Variants

| Variant | NVFP4 Layers | What stays in BF16 | Size | Quality |
|---------|--------------|-------------------|------|---------|
| **Ultra** | 60 | Attention + layers 0-4 & 25-29 | ~8.0 GB | ⭐⭐⭐⭐⭐ |
| **Quality** | 90 | All attention (qkv, out) | ~6.5 GB | ⭐⭐⭐ |
| **Mixed** | 180 | Refiners, embedders, final layer | ~4.5 GB | ⭐ |
| **Full** | 204 | Only critical embedders | ~3.5 GB | ⭐ |

> **Original BF16 model size: 12.3 GB**
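
To check which variant a downloaded file is, you can count its quantized layers and compare the result with the "NVFP4 Layers" column. The sketch below reads only the safetensors JSON header using the standard library. It assumes that every quantized layer carries a `weight_scale_2` entry, as described under "NVFP4 Format Structure"; the function name `count_nvfp4_layers` is made up for this example.

```python
import json
import struct


def count_nvfp4_layers(path):
    """Count layers that carry a per-tensor `weight_scale_2` entry.

    A safetensors file starts with an 8-byte little-endian header length,
    followed by a JSON header that maps tensor names to dtype/shape/offsets.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # Assumption: one `weight_scale_2` key per NVFP4-quantized layer.
    return sum(1 for name in header if name.endswith(".weight_scale_2"))
```

For example, `count_nvfp4_layers("z-image-base-nvfp4_quality.safetensors")` (file name following the pattern in "Recommended Settings") would be expected to return 90 if the assumption above holds.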

## 🎯 Which variant should I use?

- **Ultra**: Best quality, closest to original BF16. Use if you have enough VRAM and want maximum fidelity.
- **Quality**: Excellent quality with significant size reduction. Recommended for most users.
- **Mixed**: Poor quality; not recommended.
- **Full**: Poor quality; not recommended.

## 🔧 Technical Details

### Quantization Strategy

The key insight is that **attention layers** (qkv, out) are much more sensitive to quantization than **feed_forward layers** (w1, w2, w3). 

- **Ultra** only quantizes feed_forward in middle layers (5-24), keeping first/last layers and all attention in BF16
- **Quality** quantizes all feed_forward but keeps all attention in BF16
- **Mixed** quantizes everything in the 30 main transformer layers
- **Full** additionally quantizes context_refiner, noise_refiner, and t_embedder
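
The selection rules for the two recommended variants can be sketched as a small predicate over weight names. This is an illustration, not the actual conversion script: the key pattern `layers.{i}.feed_forward.w{1,2,3}.weight` and the function name `is_quantized` are assumptions, and Mixed/Full are left out because their exact layer lists are not spelled out here.

```python
import re

# Assumed key layout for feed-forward weights in the main transformer layers.
FEED_FORWARD_KEY = re.compile(r"layers\.(\d+)\.feed_forward\.(w[123])\.weight")


def is_quantized(key, variant):
    """Return True if `key` would be stored as NVFP4 in the given variant."""
    match = FEED_FORWARD_KEY.fullmatch(key)
    if match is None:
        return False  # attention, refiners, embedders, etc. stay in BF16
    layer_index = int(match.group(1))
    if variant == "quality":
        return 0 <= layer_index <= 29  # all 30 main layers
    if variant == "ultra":
        return 5 <= layer_index <= 24  # middle layers only
    raise ValueError(f"unsupported variant in this sketch: {variant}")
```

With 3 feed-forward weights per layer, this yields 20 × 3 = 60 quantized layers for Ultra and 30 × 3 = 90 for Quality, matching the table above.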

### NVFP4 Format Structure

Each quantized layer contains:
- `{layer}.weight`: uint8 (2 FP4 values packed per byte)
- `{layer}.weight_scale`: float8_e4m3fn, 2D (per-block scale, 16-element blocks)
- `{layer}.weight_scale_2`: float32, scalar (per-tensor scale)
- `{layer}.input_scale`: float32, scalar (activation scale)
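
To make the layout concrete, here is a pure-Python sketch of how one row of weights could be dequantized from these tensors. It assumes the low nibble of each byte holds the first value, that the effective scale is `weight_scale * weight_scale_2`, and that block scales are stored in plain row-major order (any layout swizzling is ignored). The function names are hypothetical; this is not ComfyUI's implementation.

```python
import math

# Magnitudes representable by FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # elements sharing one float8_e4m3fn scale


def decode_fp4(code):
    """Decode a 4-bit E2M1 code (0..15) to a float."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0x7]


def decode_e4m3fn(byte):
    """Decode one float8_e4m3fn byte (bias 7, no infinities)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0xF and mantissa == 0x7:
        return math.nan
    if exponent == 0:  # subnormal
        return sign * (mantissa / 8.0) * 2.0 ** -6
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


def dequantize_row(packed, block_scales, tensor_scale):
    """Unpack FP4 pairs and apply per-block and per-tensor scales."""
    values = []
    for byte in packed:
        for code in (byte & 0x0F, byte >> 4):  # assumed: low nibble first
            block = len(values) // BLOCK_SIZE
            scale = decode_e4m3fn(block_scales[block]) * tensor_scale
            values.append(decode_fp4(code) * scale)
    return values
```

Under this layout each weight costs 4 bits plus an 8-bit scale shared by 16 elements, i.e. 4 + 8/16 = 4.5 bits per weight, not counting the two scalar scales.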

## 💻 Usage in ComfyUI

### Requirements

> ⚠️ **NVFP4 requires specific hardware and software!**

- **GPU**: NVIDIA Blackwell series (e.g. RTX 5080 / 5090) - NVFP4 is a Blackwell-exclusive feature
- **PyTorch**: 2.9.0+ with CUDA 13.0 (`cu130`) - older versions do not support NVFP4
- **ComfyUI**: Latest version (updated regularly)
- **comfy-kitchen**: >= 0.2.7
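
A quick way to sanity-check the PyTorch side of these requirements is sketched below. The helper names are made up for this example, and the compute-capability threshold of 10.0 is an assumption (Blackwell GPUs report 10.x or 12.x); ComfyUI and comfy-kitchen versions are not checked.

```python
def parse_major_minor(version):
    """Return (major, minor) from strings like '2.9.0+cu130' or '13.0'."""
    core = version.split("+", 1)[0]
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def check_requirements(torch_version, cuda_version, capability):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if parse_major_minor(torch_version) < (2, 9):
        problems.append(f"PyTorch {torch_version} is older than 2.9.0")
    if cuda_version is None or parse_major_minor(cuda_version) < (13, 0):
        problems.append(f"CUDA build {cuda_version} is older than 13.0")
    # Assumption: compute capability >= 10.0 indicates Blackwell or newer.
    if capability is None or tuple(capability) < (10, 0):
        problems.append(f"GPU compute capability {capability} is below 10.0")
    return problems


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        cap = torch.cuda.get_device_capability() if torch.cuda.is_available() else None
        issues = check_requirements(torch.__version__, torch.version.cuda, cap)
        print("\n".join(issues) if issues else "PyTorch/CUDA/GPU look compatible")
```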

### Recommended Settings

```
Model: z-image-base-nvfp4_[variant].safetensors
Steps: 28-50
CFG Scale: 3.0-5.0
```

> ⚠️ **Note**: This is Z-Image **Base**, not Turbo. Use 28-50 steps with CFG, not the 8 steps used for Turbo.

## 📝 Model Architecture

Z-Image Base is a 6B parameter diffusion transformer based on the NextDiT architecture:

- 30 main transformer layers
- 2 context refiner layers
- 2 noise refiner layers  
- Hidden dimension: 3840
- Attention heads: 30
- Supports CFG (Classifier-Free Guidance)

## 🙏 Credits

- Original model: [Tongyi-MAI/Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image) by Alibaba
- Quantization format: ComfyUI NVFP4 implementation
- Conversion script: Custom Python script using ComfyUI's TensorCoreNVFP4Layout

## 📄 License

Please refer to the original model's license at [Tongyi-MAI/Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image).