Brooooooklyn committed on
Commit c3cd551 · verified · 1 Parent(s): f60dfb8

Add Unsloth GGUF comparison table to README

Files changed (1)
  1. README.md +11 -0
README.md CHANGED
@@ -96,6 +96,17 @@ Based on Unsloth's per-tensor 99.9% KLD analysis (sorted by sensitivity, worst
 
  AWQ-correctable projections (q/k/v, in_proj_qkv/z) are quantized at 5-bit with imatrix AWQ pre-scaling via `input_layernorm`. Non-AWQ-correctable projections (o_proj, out_proj) are kept at bf16 — their inputs come from attention/GDN computation, not from a norm layer, so AWQ cannot be applied. imatrix is **required** for the unsloth recipe.
 
+ ### Comparison with Unsloth GGUF (UD-Q3_K_XL)
+
+ | Tensor | Unsloth UD-Q3_K_XL | Ours | Gap |
+ |---|---|---|---|
+ | attn q/k/v | Q5_K + imatrix | 5-bit affine + AWQ | Small (AWQ compensates) |
+ | in_proj_qkv/z | Q5_K + imatrix | 5-bit affine + AWQ | Small |
+ | o_proj | Q5_K + imatrix | bf16 | We're larger but lossless |
+ | out_proj | Q5_K + imatrix | bf16 | We're larger but lossless |
+ | FFN gate/up | Q3_K + imatrix | 3-bit affine + AWQ | Moderate (K-quant > affine at 3-bit) |
+ | FFN down | Q4_K + imatrix | 4-bit affine + AWQ | Small |
+
  ## Architecture
 
  Qwen3.5-9B is a decoder-only transformer with a hybrid attention design:
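
The README context in this hunk says AWQ pre-scaling is applied "via `input_layernorm`" and is unavailable for projections whose input does not come from a norm layer. A minimal sketch of why a preceding norm makes this possible is given below. It is not the repository's implementation: the helper names (`rms_norm`, `linear`, `fold_awq_scales`), the tiny dimensions, and the random per-channel scales are all hypothetical, and a real recipe would presumably derive the scales from imatrix statistics. The sketch only shows that dividing the norm gain by a per-channel scale and multiplying the matching weight columns by the same scale leaves the layer output unchanged before any quantization happens.

```python
import math
import random


def rms_norm(x, gain, eps=1e-6):
    """Simplified RMSNorm: normalize x by its RMS, then apply a per-channel gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * g for v, g in zip(x, gain)]


def linear(x, weight):
    """Bias-free linear layer; weight is indexed as weight[out][in]."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def fold_awq_scales(gain, weight, scales):
    """Fold per-input-channel scales into the norm gain and the weight columns.

    The gain is divided by the scale and the weight column is multiplied by it,
    so the product (activation * weight) is mathematically unchanged.
    """
    new_gain = [g / s for g, s in zip(gain, scales)]
    new_weight = [[w * s for w, s in zip(row, scales)] for row in weight]
    return new_gain, new_weight


random.seed(0)
dim, out_dim = 8, 4  # hypothetical toy sizes
x = [random.gauss(0.0, 1.0) for _ in range(dim)]
gain = [random.uniform(0.5, 1.5) for _ in range(dim)]
weight = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(out_dim)]
# Hypothetical scales; a real recipe would compute these from activation statistics.
scales = [random.uniform(0.5, 2.0) for _ in range(dim)]

reference = linear(rms_norm(x, gain), weight)
folded_gain, folded_weight = fold_awq_scales(gain, weight, scales)
folded = linear(rms_norm(x, folded_gain), folded_weight)

# Largest deviation between the original and the rescaled layer (float rounding only).
max_diff = max(abs(a - b) for a, b in zip(reference, folded))
print(f"max |reference - folded| = {max_diff:.3e}")
```

In this sketch the inverse scale has somewhere to live because the projection is fed by a norm with a per-channel gain. For o_proj and out_proj, the README states that the input comes from the attention/GDN computation instead, which is the reason it gives for keeping those tensors at bf16.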