Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v-NVFP4

Commit 69be876 (verified)

Parent: 70811ef

Update README.md

Expand quantization section with layer assignment table, sensitivity analysis methodology, and convert_to_quant parameters. Revise overview, license, and acknowledgements.

Files changed (1)

README.md +25 -5

README.md CHANGED

@@ -27,9 +27,9 @@ base_model_relation: quantized
 <p>
 ## Overview
-This is a **partial NVFP4 quantization** of [Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v) by lightx2v.
-Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v is an advanced image-to-video generation model built upon the Wan2.1-I2V-14B-480P foundation. This approach allows the model to generate videos with significantly fewer inference steps (4 steps) and without classifier-free guidance, substantially reducing video generation time while maintaining quality.
+This is a **partial NVFP4 quantization** of [Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v) by lightx2v, produced using [convert_to_quant](https://github.com/silveroxides/convert_to_quant) by [silveroxides](https://huggingface.co/silveroxides).
+[Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v) is an image-to-video generation model built on [Wan2.1-I2V-14B-480P](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P). It applies step distillation and classifier-free guidance distillation to reduce inference to **4 steps** without CFG, cutting generation time substantially while preserving output quality.
 <div style="display: flex; align-items: center; gap: 16px;">
   <img src="assets/wan21_input_cat.png" width="45%"/>
@@ -38,10 +38,30 @@ Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v is an advanced image-to-vide
 </div>
 ## Quantization
-The model weights have been partially quantized to **NVFP4** (NVIDIA Floating Point 4-bit), a quantization format supported on NVIDIA Blackwell architecture GPUs. Only a subset of the model layers have been quantized; the remaining layers are kept at their original precision to preserve output quality.
+The model weights have been partially quantized to **NVFP4** (NVIDIA Floating Point 4-bit), a quantization format supported on NVIDIA Blackwell architecture GPUs. Of the 480 layers eligible for quantization (12 linear layers in each of the 40 transformer blocks), only a subset is quantized to NVFP4; the remaining eligible layers are quantized to **FP8** to preserve output quality.
+The format assigned to each layer is determined by a sensitivity analysis performed with a custom script, which scores each weight tensor by excess kurtosis, dynamic range, and aspect ratio. Thresholds are derived automatically from the model's own score distribution.
+The analysis yields the following `convert_to_quant` parameters. The conversion takes about 140 minutes on an RTX 5060 and produces an 11.11 GiB safetensors file.
+```bash
+convert_to_quant \
+  -i Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v-bf16.safetensors \
+  --nvfp4 --wan --comfy_quant --save-quant-metadata \
+  --custom-layers "blocks\.(0|1|2|3)\.cross_attn\.k\.weight|blocks\.(0|1|2|3)\.cross_attn\.v\.weight|blocks\.(0|1|2|3)\.cross_attn\.q\.weight|blocks\.(0|1|2|3)\.cross_attn\.o\.weight|blocks\.(0|1|2|3)\.cross_attn\.v_img\.weight|blocks\.(4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35)\.cross_attn\.v_img\.weight|blocks\.(0|1|2|3)\.self_attn\.k\.weight|blocks\.(4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35)\.self_attn\.k\.weight|blocks\.(36|37|38|39)\.self_attn\.k\.weight|blocks\.(0|1|2|3)\.self_attn\.v\.weight|blocks\.(0|1|2|3)\.self_attn\.o\.weight|blocks\.(4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35)\.self_attn\.o\.weight|blocks\.(0|1|2|3)\.ffn\.0\.weight|blocks\.(36|37|38|39)\.ffn\.0\.weight|blocks\.(4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35)\.ffn\.2\.weight" \
+  --custom-type fp8 \
+  -o Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v-nvfp4_p68.safetensors
+```
+The table below details the quantization format applied to each layer type across block ranges. For split entries such as FP8/NVFP4 (67/33), the numbers give the percentage of blocks in that range using FP8 and NVFP4, respectively.
+| Blocks | self_attn.q | self_attn.k | self_attn.v | self_attn.o | cross_attn.q | cross_attn.k | cross_attn.v | cross_attn.o | cross_attn.k_img | cross_attn.v_img | ffn.0 | ffn.2 |
+|--------|-------------|-------------|-------------|-------------|--------------|--------------|--------------|--------------|------------------|------------------|-------|-------|
+| 0–3    | NVFP4 | FP8 | FP8 | FP8 | FP8 | FP8 | FP8 | FP8 | NVFP4 | FP8 | NVFP4 | NVFP4 |
+| 4–9    | NVFP4 | FP8 | NVFP4 | FP8/NVFP4 (50/50) | NVFP4 | FP8 | FP8 | NVFP4 | NVFP4 | FP8 | NVFP4 | FP8/NVFP4 (50/50) |
+| 10–15  | NVFP4 | FP8 | NVFP4 | NVFP4 | NVFP4 | FP8 | FP8/NVFP4 (50/50) | NVFP4 | NVFP4 | FP8 | FP8/NVFP4 (50/50) | FP8/NVFP4 (67/33) |
+| 16–22  | NVFP4 | FP8 | NVFP4 | FP8/NVFP4 (29/71) | NVFP4 | FP8 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | FP8/NVFP4 (57/43) | FP8/NVFP4 (43/57) |
+| 23–39  | NVFP4 | FP8 | NVFP4 | FP8/NVFP4 (12/88) | NVFP4 | FP8 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | FP8/NVFP4 (35/65) | NVFP4 |
 ## License Agreement
-The models in this repository are licensed under the Apache 2.0 License. We claim no rights over the your generate contents, granting you the freedom to use them while ensuring that your usage complies with the provisions of this license. You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information intended for harm, spreads misinformation, or targets vulnerable populations. For a complete list of restrictions and details regarding your rights, please refer to the full text of the [license](LICENSE.txt).
+This model is licensed under the [Apache 2.0 License](LICENSE.txt). You retain full ownership of your generated content, but are solely responsible for its use in compliance with the license terms and applicable laws.
 ## Acknowledgements
-Many thanks to the contributors to the [Wan2.1](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B), [Self-Forcing](https://huggingface.co/gdhe17/Self-Forcing/tree/main) repositories, for their open research.
+Big kudos to the contributors to the [Wan2.1](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) and [Self-Forcing](https://huggingface.co/gdhe17/Self-Forcing/tree/main) repositories for their open research, and to [silveroxides](https://huggingface.co/silveroxides) for their quantization tools.
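The Quantization section names NVFP4 but does not describe how the format represents weights. NVFP4 stores each value as a 4-bit E2M1 float and shares one FP8 (E4M3) scale factor across each block of 16 consecutive values, with an additional FP32 scale per tensor. Below is a minimal pure-Python sketch of block-wise quantize-dequantize in that spirit. It is a simplification, not the real encoder: block scales stay as full-precision floats and the per-tensor scale is skipped. The names `fake_quant_block` and `fake_quant_nvfp4` are hypothetical helpers, not part of convert_to_quant.

```python
# E2M1 (FP4) magnitudes; the sign is stored in a separate bit.
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK = 16  # NVFP4 shares one scale factor per 16 consecutive values


def fake_quant_block(block: list[float]) -> list[float]:
    """Quantize-dequantize one block of values (simplified NVFP4)."""
    amax = max(abs(v) for v in block)
    # Scale so the block's largest magnitude lands on the largest E2M1 value (6.0).
    # Simplification: real NVFP4 stores this scale as FP8 E4M3 and adds a
    # per-tensor FP32 scale; here it stays a Python float.
    scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0
    out = []
    for v in block:
        # Nearest representable magnitude (min() keeps the smaller value on ties).
        q = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
        out.append(q * scale if v >= 0 else -q * scale)
    return out


def fake_quant_nvfp4(row: list[float]) -> list[float]:
    """Apply fake_quant_block to each 16-value block of a weight row."""
    if len(row) % BLOCK:
        raise ValueError("row length must be a multiple of 16")
    return [x for i in range(0, len(row), BLOCK) for x in fake_quant_block(row[i:i + BLOCK])]


if __name__ == "__main__":
    row = [0.1 * i - 0.8 for i in range(16)]
    print(fake_quant_nvfp4(row))
```

Each block's scale is set by its largest magnitude. A single outlier therefore coarsens the grid for the other 15 values in that block, which is one plausible reason heavy-tailed tensors are better kept in FP8.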
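The commit describes the sensitivity analysis only at a high level, and the custom script is not included. The sketch below is a hypothetical reconstruction of the idea. For each tensor it computes excess kurtosis, dynamic range (log2 of the largest magnitude over the median magnitude), and aspect ratio. It then sends a tensor to FP8 when any metric exceeds a percentile of that metric across the model. The metric definitions, the 0.7 percentile, the any-metric rule, and the helper names are all assumptions, not the author's method.

```python
import math
import statistics


def percentile(values: list[float], q: float) -> float:
    """Linear-interpolation percentile with q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def tensor_stats(w: list[list[float]]) -> dict[str, float]:
    """Hypothetical sensitivity metrics for one 2-D weight matrix."""
    x = [v for row in w for v in row]
    mu = statistics.fmean(x)
    var = statistics.fmean((v - mu) ** 2 for v in x)
    # Excess kurtosis: 4th standardized moment minus 3 (0 for a Gaussian).
    kurt = statistics.fmean((v - mu) ** 4 for v in x) / var**2 - 3.0
    # Dynamic range in bits: largest magnitude vs. the median magnitude.
    mags = [abs(v) for v in x]
    dyn = math.log2(max(mags) / max(statistics.median(mags), 1e-12))
    # Aspect ratio: long side over short side of the matrix.
    rows, cols = len(w), len(w[0])
    return {"kurtosis": kurt, "dynamic_range": dyn, "aspect": max(rows, cols) / min(rows, cols)}


def assign_formats(weights: dict[str, list[list[float]]], q: float = 0.7) -> dict[str, str]:
    """Route a tensor to FP8 if any metric exceeds its model-wide percentile (assumed rule)."""
    stats = {name: tensor_stats(w) for name, w in weights.items()}
    metrics = ("kurtosis", "dynamic_range", "aspect")
    # Thresholds come from the model's own score distribution, one per metric.
    thresholds = {m: percentile([s[m] for s in stats.values()], q) for m in metrics}
    return {
        name: "fp8" if any(s[m] > thresholds[m] for m in metrics) else "nvfp4"
        for name, s in stats.items()
    }
```

Deriving thresholds from percentiles rather than fixed constants matches the README's statement that thresholds adapt to the model's own score distribution. It also means a fixed fraction of tensors tends to land in FP8 regardless of the model's absolute statistics.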
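To see how the `--custom-layers` pattern in the conversion command maps onto the per-layer table, the sketch below does three things. It enumerates hypothetical weight names of the form `blocks.N.<module>.weight` for 40 blocks and the 12 modules in the table's columns. It rebuilds the same regular expression from its block-range groups. It then counts matches per module. It assumes convert_to_quant applies the pattern as a regex search, that matched layers receive `--custom-type fp8`, and that all other eligible layers fall back to NVFP4. The tool's actual key names and matching rules may differ.

```python
import re
from collections import Counter

# Hypothetical reconstruction of the 480 eligible weight names: 12 linear layers
# (the table's columns) in each of 40 blocks. Real checkpoint keys may carry a prefix.
MODULES = (
    "self_attn.q", "self_attn.k", "self_attn.v", "self_attn.o",
    "cross_attn.q", "cross_attn.k", "cross_attn.v", "cross_attn.o",
    "cross_attn.k_img", "cross_attn.v_img", "ffn.0", "ffn.2",
)
NAMES = [f"blocks.{b}.{m}.weight" for b in range(40) for m in MODULES]

# The --custom-layers pattern from the command, rebuilt from its block-range groups.
B0_3 = "(0|1|2|3)"
B4_35 = "(" + "|".join(str(b) for b in range(4, 36)) + ")"
B36_39 = "(36|37|38|39)"
CUSTOM_LAYERS = "|".join([
    rf"blocks\.{B0_3}\.cross_attn\.k\.weight",
    rf"blocks\.{B0_3}\.cross_attn\.v\.weight",
    rf"blocks\.{B0_3}\.cross_attn\.q\.weight",
    rf"blocks\.{B0_3}\.cross_attn\.o\.weight",
    rf"blocks\.{B0_3}\.cross_attn\.v_img\.weight",
    rf"blocks\.{B4_35}\.cross_attn\.v_img\.weight",
    rf"blocks\.{B0_3}\.self_attn\.k\.weight",
    rf"blocks\.{B4_35}\.self_attn\.k\.weight",
    rf"blocks\.{B36_39}\.self_attn\.k\.weight",
    rf"blocks\.{B0_3}\.self_attn\.v\.weight",
    rf"blocks\.{B0_3}\.self_attn\.o\.weight",
    rf"blocks\.{B4_35}\.self_attn\.o\.weight",
    rf"blocks\.{B0_3}\.ffn\.0\.weight",
    rf"blocks\.{B36_39}\.ffn\.0\.weight",
    rf"blocks\.{B4_35}\.ffn\.2\.weight",
])


def split_formats(pattern: str, names: list[str]) -> tuple[list[str], Counter]:
    """Names matching the pattern get the custom type (FP8); the rest stay NVFP4."""
    rx = re.compile(pattern)
    fp8 = [n for n in names if rx.search(n)]
    # Count FP8 layers per module type, e.g. "self_attn.k".
    per_module = Counter(n.split(".", 2)[2].removesuffix(".weight") for n in fp8)
    return fp8, per_module


if __name__ == "__main__":
    fp8, per_module = split_formats(CUSTOM_LAYERS, NAMES)
    nvfp4 = len(NAMES) - len(fp8)
    print(f"FP8: {len(fp8)}, NVFP4: {nvfp4} ({nvfp4 / len(NAMES):.1%})")
    for module in MODULES:
        print(f"{module:>16}: {per_module[module]:2d} of 40 blocks in FP8")
```

Printing the per-module counts next to the per-layer table is a quick way to check that a conversion command and a layer-assignment table describe the same split.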