Duplicated from lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v

InsecureErasure/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v-NVFP4

Diffusion Single File

video generation

InsecureErasure committed 25 days ago

Commit d97fcc8 · verified · 1 Parent(s): f4d4916

Update README.md

Files changed (1)

README.md +22 -14

README.md CHANGED

@@ -63,20 +63,28 @@ $ convert_to_quant -i "${1}" \
 A rank-64 LoRA is also generated that can be used to minimise the effects of the resulting quantization.
 The table below details the quantization format applied per layer type across block ranges:
-| Layer | 0–3 | 4–9 | 10–15 | 16–22 | 23–29 | 30–35 | 36–39 |
-|-------|-----|-----|-------|-------|-------|-------|-------|
-| self_attn.q | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| self_attn.k | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| self_attn.v | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| self_attn.o | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| cross_attn.q | BF16 (25%) / NVFP4 (75%) | MXFP8 (17%) / NVFP4 (83%) | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| cross_attn.k | MXFP8 (75%) / NVFP4 (25%) | BF16 (50%) / MXFP8 (50%) | MXFP8 (17%) / NVFP4 (83%) | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| cross_attn.v | MXFP8 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| cross_attn.o | NVFP4 | BF16 (50%) / MXFP8 (17%) / NVFP4 (33%) | BF16 (50%) / MXFP8 (17%) / NVFP4 (33%) | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| cross_attn.k_img | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| cross_attn.v_img | MXFP8 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| ffn.0 | MXFP8 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | MXFP8 |
-| ffn.2 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
+| **Layer** | **BF16** | **MXFP8** | **NVFP4** |
+|:----:|:-------:|:--------:|:--------:|
+| `cross_attn.k` | 3.3% | 15.2% | 81.5% |
+| `cross_attn.k_img` | — | — | **100%** |
+| `cross_attn.o` | 9.2% | 4.6% | 86.2% |
+| `cross_attn.q` | 1.0% | 2.0% | 96.9% |
+| `cross_attn.v` | — | 8.2% | 91.8% |
+| `cross_attn.v_img` | — | 8.2% | 91.8% |
+| `ffn.0` | — | 16.7% | 83.3% |
+| `ffn.2` | — | — | **100%** |
+| `self_attn.k` | 4.4% | 15.5% | 80.1% |
+| `self_attn.o` | — | — | **100%** |
+| `self_attn.q` | 3.2% | 2.1% | 94.7% |
+| `self_attn.v` | — | — | **100%** |
+| *(block biases)* | **100%** | — | — |
+| `cross_attn.norm_k` | **100%** | — | — |
+| `cross_attn.norm_k_img` | **100%** | — | — |
+| `cross_attn.norm_q` | **100%** | — | — |
+| `norm3` | **100%** | — | — |
+| `self_attn.norm_k` | **100%** | — | — |
+| `self_attn.norm_q` | **100%** | — | — |
+| **Total** | **13.6%** | **5.2%** | **81.2%** |
 ## Inference
 The model can be used in ComfyUI with the following parameters, based on the distilled model's own recommendations: