mratsim committed on
Commit
32960f6
·
verified ·
1 Parent(s): 5d7edf2

Add warning about KL-div measurement with only 10 rows of 2048 tokens

Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -55,6 +55,11 @@ The base quants use the new "MCG" multiplier from https://github.com/turboderp-o
  The most appropriate measure for quality is KL-divergence (i.e. how well the quant reproduces the original probability distribution of token output, before samplers)\
  For example the 3-bit quant have lower perplexity than the original FP16.\

+ > [!NOTE]
+ > For speed, this was measured with only 10 lines of 2048 tokens from wikitext2.
+ > The default is 100 lines, and according to my benchmarks for [Qwen3.5-397B](https://huggingface.co/mratsim/Qwen3.5-397B-A17B-EXL3)
+ > the KL-div can be much lower with 100. If you compare this to other quants, make sure you use the same number of rows.
+
  | Quant | Size | KL-div (quant, FP16) | KL-div (FP16, quant) | Perplexity | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
  | ---------------------------------------------------------------- | ------- | -------------------- | -------------------- | ---------- | ------ | ------ | ------ | ------ | ------ |
  | [2bpw-H6](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/2bpw_H6) | 83 GiB | 0.65096196 | 0.75914080 | 9.36106675 | 0.7315 | 0.3852 | 0.1653 | 0.0628 | 0.0221 |
@@ -67,6 +72,11 @@ The base quants use the new "MCG" multiplier from https://github.com/turboderp-o

  ### Optimized Quants

+ > [!NOTE]
+ > For speed, this was measured with only 10 lines of 2048 tokens from wikitext2.
+ > The default is 100 lines, and according to my benchmarks for [Qwen3.5-397B](https://huggingface.co/mratsim/Qwen3.5-397B-A17B-EXL3)
+ > the KL-div can be much lower with 100. If you compare this to other quants, make sure you use the same number of rows.
+
  > [!TIP]
  > 🛈 Despite the KL-divergence, even the 2.10bpw quant looks quite smart for creative writing.\
  > Succinct test on a scenario with 1 narrator and 6 leads.
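
For readers comparing numbers across quants, here is a minimal sketch of how a per-token KL-divergence average might be computed in both directions, which is presumably what the two KL-div columns report. It is not the evaluation script used for this README: `log_softmax`, `kl_div`, `mean_kl`, and the toy logits are all hypothetical, and the sketch makes no claim about which table column maps to which argument order. The point is only that the reported figure is a mean over every evaluated token position, so 10 rows of 2048 tokens average over 20,480 positions while 100 rows average over 204,800.

```python
import math

def log_softmax(logits):
    """Return log-probabilities for one token position (numerically stable)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def kl_div(log_p, log_q):
    """KL(p || q) in nats, given log-probabilities over the same vocabulary."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def mean_kl(ref_logits, quant_logits):
    """Average KL(ref || quant) and KL(quant || ref) over all token positions."""
    fwd = rev = 0.0
    for ref, quant in zip(ref_logits, quant_logits):
        lp, lq = log_softmax(ref), log_softmax(quant)
        fwd += kl_div(lp, lq)
        rev += kl_div(lq, lp)
    n = len(ref_logits)
    return fwd / n, rev / n

# Toy data (hypothetical): 3 token positions over a 4-entry vocabulary.
ref = [[2.0, 1.0, 0.1, -1.0], [0.5, 0.5, 0.5, 0.5], [3.0, 0.0, 0.0, 0.0]]
quant = [[1.8, 1.1, 0.2, -0.9], [0.6, 0.4, 0.5, 0.5], [2.5, 0.3, 0.1, 0.0]]

fwd, rev = mean_kl(ref, quant)
print(f"KL(ref || quant) = {fwd:.6f}, KL(quant || ref) = {rev:.6f}")

# Number of positions averaged for each row count mentioned in the note.
for rows in (10, 100):
    print(f"{rows} rows x 2048 tokens = {rows * 2048} positions")
```

Because the result is a plain mean, a smaller sample depends more heavily on which rows happen to be drawn, which is one reason to compare quants only at the same row count.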