InsecureErasure committed on
Commit 69be876 · verified · 1 Parent(s): 70811ef

Update README.md

Expand quantization section with layer assignment table, sensitivity analysis methodology, and convert_to_quant parameters. Revise overview, license, and acknowledgements.

Files changed (1)
  1. README.md +25 -5
README.md CHANGED
@@ -27,9 +27,9 @@ base_model_relation: quantized
27
  <p>
28
 
29
  ## Overview
30
- This is a **partial NVFP4 quantization** of [Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v) by lightx2v.
31
 
32
- Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v is an advanced image-to-video generation model built upon the Wan2.1-I2V-14B-480P foundation. This approach allows the model to generate videos with significantly fewer inference steps (4 steps) and without classifier-free guidance, substantially reducing video generation time while maintaining quality.
33
 
34
  <div style="display: flex; align-items: center; gap: 16px;">
35
  <img src="assets/wan21_input_cat.png" width="45%"/>
@@ -38,10 +38,30 @@ Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v is an advanced image-to-vide
38
  </div>
39
 
40
  ## Quantization
41
- The model weights have been partially quantized to **NVFP4** (NVIDIA Floating Point 4-bit), a quantization format supported on NVIDIA Blackwell architecture GPUs. Only a subset of the model layers have been quantized; the remaining layers are kept at their original precision to preserve output quality.
42
 
43
  ## License Agreement
44
- The models in this repository are licensed under the Apache 2.0 License. We claim no rights over the your generate contents, granting you the freedom to use them while ensuring that your usage complies with the provisions of this license. You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information intended for harm, spreads misinformation, or targets vulnerable populations. For a complete list of restrictions and details regarding your rights, please refer to the full text of the [license](LICENSE.txt).
45
 
46
  ## Acknowledgements
47
- Many thanks to the contributors to the [Wan2.1](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B), [Self-Forcing](https://huggingface.co/gdhe17/Self-Forcing/tree/main) repositories, for their open research.
 
27
  <p>
28
 
29
  ## Overview
30
+ This is a **partial NVFP4 quantization** of [Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v) by lightx2v, produced using [convert_to_quant](https://github.com/silveroxides/convert_to_quant) by [silveroxides](https://huggingface.co/silveroxides).
31
 
32
+ [Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v) is an image-to-video generation model built on [Wan2.1-I2V-14B-480P](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P). It applies step distillation and classifier-free guidance distillation to reduce inference to **4 steps** without CFG, cutting generation time substantially while preserving output quality.
33
 
34
  <div style="display: flex; align-items: center; gap: 16px;">
35
  <img src="assets/wan21_input_cat.png" width="45%"/>
 
38
  </div>
39
 
40
  ## Quantization
41
+ The model weights have been partially quantized to **NVFP4** (NVIDIA Floating Point 4-bit), a quantization format supported on NVIDIA Blackwell architecture GPUs. Out of the 480 layers eligible for quantization, only a subset has been quantized to NVFP4; the remaining eligible layers are quantized to **FP8** to preserve output quality.
42
+
43
+ The quantization format assigned to each layer is based on a sensitivity analysis performed with a custom script, which scores each weight tensor using excess kurtosis, dynamic range, and aspect ratio. Thresholds are derived automatically from the model's own score distribution.
44
+
45
+ The analysis yields the following `convert_to_quant` parameters. The conversion takes about 140 minutes on an RTX 5060 and produces an 11.11 GiB safetensors file.
46
+ ```bash
47
+ $ convert_to_quant \
48
+ -i Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v-bf16.safetensors \
49
+ --nvfp4 --wan --comfy_quant --save-quant-metadata \
50
+ --custom-layers "blocks\.(0|1|2|3)\.cross_attn\.k\.weight|blocks\.(0|1|2|3)\.cross_attn\.v\.weight|blocks\.(0|1|2|3)\.cross_attn\.q\.weight|blocks\.(0|1|2|3)\.cross_attn\.o\.weight|blocks\.(0|1|2|3)\.cross_attn\.v_img\.weight|blocks\.(4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35)\.cross_attn\.v_img\.weight|blocks\.(0|1|2|3)\.self_attn\.k\.weight|blocks\.(4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35)\.self_attn\.k\.weight|blocks\.(36|37|38|39)\.self_attn\.k\.weight|blocks\.(0|1|2|3)\.self_attn\.v\.weight|blocks\.(0|1|2|3)\.self_attn\.o\.weight|blocks\.(4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35)\.self_attn\.o\.weight|blocks\.(0|1|2|3)\.ffn\.0\.weight|blocks\.(36|37|38|39)\.ffn\.0\.weight|blocks\.(4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35)\.ffn\.2\.weight" \
51
+ --custom-type fp8 \
52
+ -o Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v-nvfp4_p68.safetensors
53
+ ```
54
+ The table below details the quantization format applied per layer type across block ranges:
55
+ | Blocks | self_attn.q | self_attn.k | self_attn.v | self_attn.o | cross_attn.q | cross_attn.k | cross_attn.v | cross_attn.o | cross_attn.k_img | cross_attn.v_img | ffn.0 | ffn.2 |
56
+ |--------|-------------|-------------|-------------|-------------|--------------|--------------|--------------|--------------|------------------|------------------|-------|-------|
57
+ | 0–3 | NVFP4 | FP8 | FP8 | FP8 | FP8 | FP8 | FP8 | FP8 | NVFP4 | FP8 | NVFP4 | NVFP4 |
58
+ | 4–9 | NVFP4 | FP8 | NVFP4 | FP8/NVFP4 (50/50) | NVFP4 | FP8 | FP8 | NVFP4 | NVFP4 | FP8 | NVFP4 | FP8/NVFP4 (50/50) |
59
+ | 10–15 | NVFP4 | FP8 | NVFP4 | NVFP4 | NVFP4 | FP8 | FP8/NVFP4 (50/50) | NVFP4 | NVFP4 | FP8 | FP8/NVFP4 (50/50) | FP8/NVFP4 (67/33) |
60
+ | 16–22 | NVFP4 | FP8 | NVFP4 | FP8/NVFP4 (29/71) | NVFP4 | FP8 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | FP8/NVFP4 (57/43) | FP8/NVFP4 (43/57) |
61
+ | 23–39 | NVFP4 | FP8 | NVFP4 | FP8/NVFP4 (12/88) | NVFP4 | FP8 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | FP8/NVFP4 (35/65) | NVFP4 |
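
To see how a `--custom-layers` pattern splits the 480 eligible layers, the matching can be sketched in plain Python. This is only an illustration: the layer names are reconstructed from the table columns and the 40 blocks (0–39), the pattern is a two-alternative excerpt copied from the command above, and the use of `re.search` is an assumption about how `convert_to_quant` applies the pattern, not something the README states.

```python
import re

# Assumption: 40 transformer blocks (0-39) and the 12 layer types from the table.
NUM_BLOCKS = 40
LAYER_TYPES = [
    "self_attn.q", "self_attn.k", "self_attn.v", "self_attn.o",
    "cross_attn.q", "cross_attn.k", "cross_attn.v", "cross_attn.o",
    "cross_attn.k_img", "cross_attn.v_img", "ffn.0", "ffn.2",
]

# Two alternatives copied verbatim from the --custom-layers argument (an excerpt,
# not the full pattern).
PATTERN_EXCERPT = (
    r"blocks\.(0|1|2|3)\.cross_attn\.k\.weight"
    r"|blocks\.(36|37|38|39)\.self_attn\.k\.weight"
)


def layer_names():
    """Build the hypothetical tensor names for every eligible layer."""
    return [
        f"blocks.{block}.{layer}.weight"
        for block in range(NUM_BLOCKS)
        for layer in LAYER_TYPES
    ]


def split_by_pattern(names, pattern):
    """Return (custom, default): names the pattern matches, and the rest."""
    rx = re.compile(pattern)
    custom = [name for name in names if rx.search(name)]
    default = [name for name in names if not rx.search(name)]
    return custom, default


names = layer_names()
fp8_layers, nvfp4_layers = split_by_pattern(names, PATTERN_EXCERPT)
print(len(names), len(fp8_layers), len(nvfp4_layers))  # 480 8 472
```

Substituting the full pattern from the command for `PATTERN_EXCERPT` gives the layer split that pattern would produce under the same assumptions.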
62
 
63
  ## License Agreement
64
+ This model is licensed under the [Apache 2.0 License](LICENSE.txt). You retain full ownership of your generated content, but are solely responsible for its use in compliance with the license terms and applicable laws.
65
 
66
  ## Acknowledgements
67
+ Big kudos to the contributors to the [Wan2.1](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) and [Self-Forcing](https://huggingface.co/gdhe17/Self-Forcing/tree/main) repositories for their open research, and to [silveroxides](https://huggingface.co/silveroxides) for their quantization tools.
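
The sensitivity analysis mentioned in the Quantization section scores each weight tensor using excess kurtosis, dynamic range, and aspect ratio. The custom script itself is not included in this commit, so the following is only a minimal sketch of what such metrics could look like. The function names, the equal weighting in `sensitivity_score`, and the percentile-based threshold are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def excess_kurtosis(values: Sequence[float]) -> float:
    """Population excess kurtosis: fourth standardized moment minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0.0:
        return 0.0  # constant tensor: no tails to measure
    return m4 / (m2 * m2) - 3.0


def dynamic_range(values: Sequence[float], eps: float = 1e-12) -> float:
    """log2 of the largest magnitude over the smallest magnitude above eps."""
    mags = [abs(v) for v in values if abs(v) > eps]
    if not mags:
        return 0.0
    return math.log2(max(mags) / min(mags))


def aspect_ratio(rows: int, cols: int) -> float:
    """Ratio of the longer matrix side to the shorter one (always >= 1)."""
    return max(rows, cols) / min(rows, cols)


def sensitivity_score(weight: Sequence[Sequence[float]]) -> float:
    """Hypothetical combined score: an unweighted sum of the three metrics."""
    flat = [v for row in weight for v in row]
    rows, cols = len(weight), len(weight[0])
    return (
        max(excess_kurtosis(flat), 0.0)  # only heavy tails add to the score
        + dynamic_range(flat)
        + math.log2(aspect_ratio(rows, cols))
    )


def percentile_threshold(scores: Sequence[float], q: float) -> float:
    """Hypothetical threshold: the score at quantile q of the distribution."""
    ordered = sorted(scores)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]
```

Under this sketch, tensors scoring above the threshold would be kept at the higher-precision format (FP8) and the rest assigned to NVFP4. The actual script may combine or weight the metrics differently.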