---
license: apache-2.0
language:
- en
- zh
pipeline_tag: image-to-video
tags:
- video generation
- diffusion-single-file
- comfyui
- distillation
- LoRA
- quantization
- nvfp4
library_name: diffusers
inference:
  parameters:
    num_inference_steps: 4
base_model:
- lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v
base_model_relation: quantized
---

# Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v-NVFP4

## Overview

This is a **partial NVFP4 quantization** of [Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v) by lightx2v, produced using [convert_to_quant](https://github.com/silveroxides/convert_to_quant) by [silveroxides](https://huggingface.co/silveroxides).

[Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v) is an image-to-video generation model built on [Wan2.1-I2V-14B-480P](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P). It applies step distillation and classifier-free guidance (CFG) distillation to reduce inference to **4 steps** without CFG, cutting generation time substantially while preserving output quality.

### IMPORTANT

NVFP4 is only supported on NVIDIA Blackwell architecture GPUs. Running this model therefore requires a Blackwell GPU, a PyTorch build that supports it, a recent version of ComfyUI, and [comfy-kitchen](https://github.com/Comfy-Org/comfy-kitchen) built against CUDA 13.
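
Whether the local GPU meets the hardware requirement can be checked from its CUDA compute capability. The following is a minimal sketch, assuming that Blackwell GPUs report a major compute capability of 10 or higher (for example 12.0 on RTX 50-series cards). The helper names `is_blackwell_or_newer` and `local_gpu_capability` are hypothetical, and a passing check says nothing about whether ComfyUI or comfy-kitchen are set up correctly.

```python
def is_blackwell_or_newer(capability):
    """Return True if a (major, minor) compute capability is Blackwell-class or newer.

    Assumption: Blackwell GPUs report a major version of 10 or above, while
    Hopper and Ada Lovelace report 9.0 and 8.9 respectively.
    """
    major, _minor = capability
    return major >= 10


def local_gpu_capability():
    """Return the (major, minor) capability of GPU 0, or None if unavailable."""
    try:
        import torch  # imported lazily so the helper above works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


capability = local_gpu_capability()
if capability is None:
    print("No CUDA GPU or PyTorch installation detected")
else:
    print(f"Compute capability {capability}: "
          f"Blackwell-class = {is_blackwell_or_newer(capability)}")
```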

## Quantization

The model weights have been partially quantized to **NVFP4** (NVIDIA Floating Point 4-bit) and **MXFP8**, quantization formats supported on NVIDIA Blackwell architecture GPUs.

The quantization format assigned to each layer is based on a sensitivity analysis performed with a custom script, which scores each weight tensor using excess kurtosis, dynamic range, and aspect ratio. Thresholds are derived automatically from the model's own score distribution. The analysis yields the following `convert_to_quant` parameters. The conversion takes about 4 hours on an RTX 5060 and produces a 9.76 GiB safetensors file.

```bash
convert_to_quant -i "${1}" \
    --nvfp4 --wan --comfy_quant --save-quant-metadata \
    --custom-type mxfp8 \
    --custom-layers "blocks\.(1|2|3)\.cross_attn\.k\.weight|blocks\.(6|8|9|10)\.cross_attn\.k\.weight|blocks\.(0|1|2|3)\.cross_attn\.v\.weight|blocks\.(6)\.cross_attn\.q\.weight|blocks\.(6|14)\.cross_attn\.o\.weight|blocks\.(0|1|2|3)\.cross_attn\.v_img\.weight|blocks\.(0)\.self_attn\.k\.weight|blocks\.(7|9|10|12|13|14)\.self_attn\.k\.weight|blocks\.(19)\.self_attn\.q\.weight|blocks\.(0|1|2|3)\.ffn\.0\.weight|blocks\.(36|37|38|39)\.ffn\.0\.weight" \
    --exclude-layers "blocks\.(4|5|7)\.cross_attn\.k\.weight|blocks\.(0)\.cross_attn\.q\.weight|blocks\.(5|7|9|10|11|12|19|20)\.cross_attn\.o\.weight|blocks\.(8|11|33)\.self_attn\.k\.weight|blocks\.(38)\.self_attn\.k\.weight|blocks\.(14|16|17)\.self_attn\.q\.weight" \
    --num-iter 6000 \
    --top-p 0.35 \
    --calib-samples 8192 \
    --extract-lora --lora-rank 64 \
    --lora-target "ffn\.(0|2)\.weight|self_attn\.(v|o)\.weight" \
    -o "${1%%.safetensors}-nvfp4.safetensors"
```

The conversion also generates a rank-64 LoRA that can be applied to reduce the quality loss introduced by quantization.
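
The share of blocks that each pattern in the command above selects can be checked by matching the regular expressions against the layer names. The following is a minimal sketch under three assumptions that the source does not state: the model has 40 transformer blocks (indices 0 to 39, as the patterns imply), weight names follow the `blocks.N.<layer>.weight` form used in the patterns, and patterns are applied with search semantics, with `--exclude-layers` leaving a tensor in BF16. The function name `format_shares` is hypothetical.

```python
import re

NUM_BLOCKS = 40  # assumption: block indices 0..39, as implied by the patterns

# Patterns copied from --custom-layers (assigned to MXFP8).
MXFP8_PATTERN = "|".join([
    r"blocks\.(1|2|3)\.cross_attn\.k\.weight",
    r"blocks\.(6|8|9|10)\.cross_attn\.k\.weight",
    r"blocks\.(0|1|2|3)\.cross_attn\.v\.weight",
    r"blocks\.(6)\.cross_attn\.q\.weight",
    r"blocks\.(6|14)\.cross_attn\.o\.weight",
    r"blocks\.(0|1|2|3)\.cross_attn\.v_img\.weight",
    r"blocks\.(0)\.self_attn\.k\.weight",
    r"blocks\.(7|9|10|12|13|14)\.self_attn\.k\.weight",
    r"blocks\.(19)\.self_attn\.q\.weight",
    r"blocks\.(0|1|2|3)\.ffn\.0\.weight",
    r"blocks\.(36|37|38|39)\.ffn\.0\.weight",
])

# Patterns copied from --exclude-layers (assumed to stay in BF16).
BF16_PATTERN = "|".join([
    r"blocks\.(4|5|7)\.cross_attn\.k\.weight",
    r"blocks\.(0)\.cross_attn\.q\.weight",
    r"blocks\.(5|7|9|10|11|12|19|20)\.cross_attn\.o\.weight",
    r"blocks\.(8|11|33)\.self_attn\.k\.weight",
    r"blocks\.(38)\.self_attn\.k\.weight",
    r"blocks\.(14|16|17)\.self_attn\.q\.weight",
])


def format_shares(layer):
    """Return (BF16 %, MXFP8 %, NVFP4 %) of blocks for one layer type."""
    names = [f"blocks.{i}.{layer}.weight" for i in range(NUM_BLOCKS)]
    bf16 = sum(bool(re.search(BF16_PATTERN, name)) for name in names)
    mxfp8 = sum(bool(re.search(MXFP8_PATTERN, name)) for name in names)
    nvfp4 = NUM_BLOCKS - bf16 - mxfp8  # everything else falls through to NVFP4
    return tuple(100 * count / NUM_BLOCKS for count in (bf16, mxfp8, nvfp4))


for layer in ("cross_attn.k", "cross_attn.o", "self_attn.k", "ffn.0"):
    print(layer, format_shares(layer))
```

Under these assumptions, `format_shares("cross_attn.k")` returns `(7.5, 17.5, 75.0)`: 3 of 40 blocks are excluded, 7 are assigned to MXFP8, and the remaining 30 are quantized to NVFP4.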

The table below shows, for each layer type, the share of blocks stored in each format:

| **Layer** | **BF16** | **MXFP8** | **NVFP4** |
|:----:|:-------:|:--------:|:--------:|
| `cross_attn.k` | 7.5% | 17.5% | 75.0% |
| `cross_attn.k_img` | - | - | **100%** |
| `cross_attn.norm_k` | **100%** | - | - |
| `cross_attn.norm_k_img` | **100%** | - | - |
| `cross_attn.norm_q` | **100%** | - | - |
| `cross_attn.o` | 20.0% | 5.0% | 75.0% |
| `cross_attn.q` | 2.5% | 2.5% | 95.0% |
| `cross_attn.v` | - | 10.0% | 90.0% |
| `cross_attn.v_img` | - | 10.0% | 90.0% |
| `ffn.0` | - | 20.0% | 80.0% |
| `ffn.2` | - | - | **100%** |
| `norm3` | **100%** | - | - |
| `self_attn.k` | 10.0% | 17.5% | 72.5% |
| `self_attn.norm_k` | **100%** | - | - |
| `self_attn.norm_q` | **100%** | - | - |
| `self_attn.o` | - | - | **100%** |
| `self_attn.q` | 7.5% | 2.5% | 90.0% |
| `self_attn.v` | - | - | **100%** |
| **Total** | **36.0%** | **4.7%** | **59.3%** |

## Inference

The model can be used in ComfyUI with the following parameters, based on the distilled model's own recommendations:

| Parameter | Value |
|-----------|-------|
| Shift | 5.0 |
| Sampler | LCM |
| Scheduler | normal |
| CFG | 1.0 |
| Steps | 4 |

The sampler/scheduler combinations euler/simple and heun/linear_quadratic are also known to produce good results.

The model is designed to generate 81 frames and is compatible with LoRAs. Sampling completes in under 60 seconds on an RTX 5060, making it possible to produce a full 81-frame video in under two minutes; with RIFE frame interpolation, those 81 frames convert to a 10-second video.

Abrupt camera movements or fast subject motion may produce artifacts. This is an inherent limitation of applying aggressive quantization to an already distilled model.

## License Agreement

This model is licensed under the [Apache 2.0 License](LICENSE.txt). You retain full ownership of your generated content, but are solely responsible for its use in compliance with the license terms and applicable laws.

## Acknowledgements

Big kudos to the contributors to the [Wan2.1](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) and [Self-Forcing](https://huggingface.co/gdhe17/Self-Forcing/tree/main) repositories for their open research, and to [silveroxides](https://huggingface.co/silveroxides) for their quantization tools.