---
license: apache-2.0
language:
- en
base_model: Tongyi-MAI/Z-Image-Turbo
base_model_relation: quantized
pipeline_tag: text-to-image
library_name: diffusers
tags:
- image
- image-generation
- z-image
- z-image-turbo
- fp8
- z-image-fp8
- z-image-turbo-fp8
---

# Z-Image-Turbo (FP8 E5M2 & E4M3FN)

This is a quantization of [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) to **FP8 E5M2** and **FP8 E4M3FN**.

**License & Usage:** This model strictly follows the original licensing terms and usage restrictions. Please refer to the [original model card](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) for details.

## Files in this repo

- Full Diffusers pipeline copied from `Tongyi-MAI/Z-Image` (cached snapshot) with FP8 transformer weights.
- Default transformer weights: `transformer/diffusion_pytorch_model.safetensors` (E4M3FN).
- Alternate transformer weights: `transformer/diffusion_pytorch_model_e5m2.safetensors` (E5M2).

To switch variants, load the pipeline and replace the transformer weights with those from the alternate file in `transformer/`.

## Requirements

- PyTorch with CUDA support (tested with 2.10.0+cu130)
- Diffusers (latest `main` recommended)
- For FP8 execution: NVIDIA Transformer Engine (TE) built for your CUDA and Python versions

## FP8 execution (Transformer Engine)

The sample script `create-image.py` uses NVIDIA Transformer Engine (TE) to run FP8 kernels on supported GPUs (e.g., Blackwell). Install TE in your environment and run the script from this repo directory.

## BF16 fallback (no FP8 kernels)

For GPUs without FP8 kernel support (or if TE is unavailable), use `create-image-bf16.py`. It loads the same FP8 weights but casts them to BF16 for compute, so it does not need FP8 kernels. Expect it to run slower than true FP8 execution.

## Usage

- After downloading, the scripts default to `MODEL_ID=ykarout/Z-Image-Turbo-FP8-Full`.
- To force local loading, set `USE_LOCAL=1`.
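
As a rough illustration of how the `MODEL_ID` and `USE_LOCAL` settings and the two transformer weight files fit together, here is a minimal sketch. It is not taken from `create-image.py`. The helper names `resolve_model_source` and `locate_transformer_weights` are hypothetical, and the precedence shown (`USE_LOCAL=1` wins over `MODEL_ID`) is an assumption. Only the repo ID and file paths come from this card.

```python
import os

# Values taken from this model card.
DEFAULT_MODEL_ID = "ykarout/Z-Image-Turbo-FP8-Full"
TRANSFORMER_FILES = {
    "e4m3fn": "transformer/diffusion_pytorch_model.safetensors",
    "e5m2": "transformer/diffusion_pytorch_model_e5m2.safetensors",
}


def resolve_model_source(env=None, local_dir="."):
    """Return a local directory or a Hub repo ID (hypothetical helper).

    Assumption: USE_LOCAL=1 takes precedence over MODEL_ID.
    """
    env = os.environ if env is None else env
    if env.get("USE_LOCAL") == "1":
        return os.path.abspath(local_dir)
    return env.get("MODEL_ID", DEFAULT_MODEL_ID)


def locate_transformer_weights(source, variant="e4m3fn"):
    """Return a path to the weight file for the chosen FP8 variant."""
    rel = TRANSFORMER_FILES[variant]
    if os.path.isdir(source):
        # Local checkout: build the path inside the repo directory.
        return os.path.join(source, *rel.split("/"))
    # Otherwise treat `source` as a Hub repo ID and download the single file.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=source, filename=rel)


if __name__ == "__main__":
    source = resolve_model_source()
    print("model source:", source)
    print("E5M2 file:", TRANSFORMER_FILES["e5m2"])
```

Calling `locate_transformer_weights(source, "e5m2")` then gives the file to load when swapping in the E5M2 transformer weights. How those weights are applied to the pipeline depends on the script in use and is left out here.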