Commit d97fcc8 by InsecureErasure · verified · 1 parent: f4d4916

Update README.md

Files changed (1)
  1. README.md +22 -14
README.md CHANGED
@@ -63,20 +63,28 @@ $ convert_to_quant -i "${1}" \
 A rank-64 LoRA is also generated that can be used to minimise the effects of the resulting quantization.

 The table below details the quantization format applied per layer type across block ranges:
-| Layer | 0–3 | 4–9 | 10–15 | 16–22 | 23–29 | 30–35 | 36–39 |
-|-------|-----|-----|-------|-------|-------|-------|-------|
-| self_attn.q | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| self_attn.k | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| self_attn.v | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| self_attn.o | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| cross_attn.q | BF16 (25%) / NVFP4 (75%) | MXFP8 (17%) / NVFP4 (83%) | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| cross_attn.k | MXFP8 (75%) / NVFP4 (25%) | BF16 (50%) / MXFP8 (50%) | MXFP8 (17%) / NVFP4 (83%) | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| cross_attn.v | MXFP8 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| cross_attn.o | NVFP4 | BF16 (50%) / MXFP8 (17%) / NVFP4 (33%) | BF16 (50%) / MXFP8 (17%) / NVFP4 (33%) | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| cross_attn.k_img | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| cross_attn.v_img | MXFP8 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
-| ffn.0 | MXFP8 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | MXFP8 |
-| ffn.2 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
+| **Layer** | **BF16** | **MXFP8** | **NVFP4** |
+|:----:|:-------:|:--------:|:--------:|
+| `cross_attn.k` | 3.3% | 15.2% | 81.5% |
+| `cross_attn.k_img` | — | — | **100%** |
+| `cross_attn.o` | 9.2% | 4.6% | 86.2% |
+| `cross_attn.q` | 1.0% | 2.0% | 96.9% |
+| `cross_attn.v` | — | 8.2% | 91.8% |
+| `cross_attn.v_img` | — | 8.2% | 91.8% |
+| `ffn.0` | — | 16.7% | 83.3% |
+| `ffn.2` | — | — | **100%** |
+| `self_attn.k` | 4.4% | 15.5% | 80.1% |
+| `self_attn.o` | — | — | **100%** |
+| `self_attn.q` | 3.2% | 2.1% | 94.7% |
+| `self_attn.v` | — | — | **100%** |
+| *(block biases)* | **100%** | — | — |
+| `cross_attn.norm_k` | **100%** | — | — |
+| `cross_attn.norm_k_img` | **100%** | — | — |
+| `cross_attn.norm_q` | **100%** | — | — |
+| `norm3` | **100%** | — | — |
+| `self_attn.norm_k` | **100%** | — | — |
+| `self_attn.norm_q` | **100%** | — | — |
+| **Total** | **13.6%** | **5.2%** | **81.2%** |

 ## Inference
 The model can be used in ComfyUI with the following parameters, based on the distilled model's own recommendations:
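
The percentages in the added table read as the share of each format per layer type. A minimal sketch of how such shares could be tallied, assuming a hypothetical mapping from tensor names (following a `blocks.<index>.<layer>.weight` pattern) to format labels, and assuming every tensor counts equally. Neither the naming pattern nor the weighting is stated in the README, and `layer_type` and `format_shares` are illustrative names, not part of `convert_to_quant`:

```python
import re
from collections import Counter, defaultdict

# Format labels taken from the table's column headers.
FORMATS = ("BF16", "MXFP8", "NVFP4")


def layer_type(name: str) -> str:
    """Reduce a tensor name to its layer type.

    Assumption: names look like 'blocks.<index>.<layer>.weight'.
    """
    name = re.sub(r"^blocks\.\d+\.", "", name)
    return re.sub(r"\.weight$", "", name)


def format_shares(assignments: dict) -> dict:
    """Return, per layer type, the percentage of tensors in each format.

    Assumption: each tensor counts once (no weighting by parameter count).
    """
    counts = defaultdict(Counter)
    for name, fmt in assignments.items():
        counts[layer_type(name)][fmt] += 1

    shares = {}
    for layer, per_format in counts.items():
        total = sum(per_format.values())
        shares[layer] = {f: 100.0 * per_format[f] / total for f in FORMATS}
    return shares


if __name__ == "__main__":
    # Hypothetical assignments for illustration only.
    example = {
        "blocks.0.ffn.0.weight": "MXFP8",
        "blocks.1.ffn.0.weight": "NVFP4",
        "blocks.2.ffn.0.weight": "NVFP4",
        "blocks.3.ffn.0.weight": "NVFP4",
    }
    for layer, row in format_shares(example).items():
        print(layer, {f: f"{p:.1f}%" for f, p in row.items()})
```

Weighting by parameter count instead of tensor count would only change the increment inside the loop, so the same structure covers either reading of the table.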