---
base_model:
- zai-org/GLM-5.1
---
## Notes
- 05/02/26: The IQ3_S quant will follow later; quantization crashed, so it needs to be redone.

## Model
This repo contains specialized MoE quants for zai-org/GLM-5.1. Because the FFN tensors are far larger than the rest of the tensors in the model, it should be possible to get better quality at a smaller overall model size than a comparable naive quantization. To that end, the default quantization type is kept at high quality (Q8_0 in every mix below), while the FFN UP and FFN GATE tensors are quantized down, along with the FFN DOWN tensors, which are kept at a slightly higher type.
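The tensor selection can be pictured as a name-to-type rule table. The sketch below is only an illustration, not the script used to make these quants. It assumes the Mixture column lists default / FFN UP / FFN GATE / FFN DOWN types (mirroring the Q4_K_M row), assumes llama.cpp's GGUF naming for MoE expert tensors (`blk.N.ffn_up_exps.weight` and so on), and assumes only the routed-expert tensors are targeted. `pick_type`, `blended_bpw` and the toy element counts are hypothetical.

```python
import re

# Nominal bits per weight of the llama.cpp block formats in the Q4_K_M mix:
# Q8_0 = 34 bytes / 32 weights, Q4_K = 144 / 256, Q5_K = 176 / 256.
BPW = {"Q8_0": 8.5, "Q4_K": 4.5, "Q5_K": 5.5}

DEFAULT_TYPE = "Q8_0"
# Hypothetical policy: only the routed-expert FFN tensors ("*_exps") are
# quantized down. The router tensor ("ffn_gate_inp") deliberately does not match.
POLICY = [
    (re.compile(r"\.ffn_(up|gate)_exps\.weight$"), "Q4_K"),
    (re.compile(r"\.ffn_down_exps\.weight$"), "Q5_K"),
]


def pick_type(tensor_name: str) -> str:
    """Return the quant type for a tensor: first matching rule, else the default."""
    for pattern, qtype in POLICY:
        if pattern.search(tensor_name):
            return qtype
    return DEFAULT_TYPE


def blended_bpw(element_counts: dict[str, int]) -> float:
    """Size-weighted average bits per weight over all tensors."""
    total_bits = sum(n * BPW[pick_type(name)] for name, n in element_counts.items())
    return total_bits / sum(element_counts.values())


# Toy element counts for one MoE block (NOT GLM-5.1's real shapes): the three
# expert tensors dwarf attention and the router.
TOY_LAYER = {
    "blk.0.attn_q.weight": 1_000_000,
    "blk.0.ffn_gate_inp.weight": 10_000,
    "blk.0.ffn_up_exps.weight": 8_000_000,
    "blk.0.ffn_gate_exps.weight": 8_000_000,
    "blk.0.ffn_down_exps.weight": 8_000_000,
}

if __name__ == "__main__":
    for name in TOY_LAYER:
        print(f"{name:32} -> {pick_type(name)}")
    print(f"blended: {blended_bpw(TOY_LAYER):.2f} bpw")
```

With these toy counts the blended rate comes out at 4.98 bpw, much closer to the expert types than to Q8_0, because the expert tensors carry almost all of the weights. Keeping the small tensors at Q8_0 therefore costs little extra size.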

| Quant | Size | Mixture (default / FFN UP / FFN GATE / FFN DOWN) | PPL | Mean PPL(Q)/PPL(base) - 1 | Mean KLD |
| :--------- | :--------- | :------- | :------- | :------- | :------- |
| Q5_K_M | 520.08 GiB (5.93 BPW) | Q8_0 / Q5_K / Q5_K / Q6_K | 2.732420 ± 0.015015 | +0.3411% | 0.020247 ± 0.000173 |
| Q4_K_M | 432.80 GiB (4.93 BPW) | Q8_0 / Q4_K / Q4_K / Q5_K | 2.754593 ± 0.015142 | +1.1553% | 0.037406 ± 0.000308 |
| IQ4_XS | 336.61 GiB (3.84 BPW) | Q8_0 / IQ3_S / IQ3_S / IQ4_XS | 2.892748 ± 0.015981 | +6.2287% | 0.099818 ± 0.000754 |
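As a sanity check on the table, the sketch below inverts the percentage column next to PPL, assuming it means PPL(Q)/PPL(base) - 1, and divides each file size by its BPW, assuming both are measured over the whole GGUF. The helper names are hypothetical; llama.cpp's KL-divergence mode reports the ratio as its own statistic, so tiny rounding differences are expected.

```python
# Rows copied from the table: (quant, size in GiB, BPW, PPL, percentage column)
ROWS = [
    ("Q5_K_M", 520.08, 5.93, 2.732420, 0.3411),
    ("Q4_K_M", 432.80, 4.93, 2.754593, 1.1553),
    ("IQ4_XS", 336.61, 3.84, 2.892748, 6.2287),
]


def implied_base_ppl(ppl_q: float, delta_pct: float) -> float:
    """Invert delta = PPL(Q)/PPL(base) - 1 to recover PPL(base)."""
    return ppl_q / (1.0 + delta_pct / 100.0)


def implied_weight_count(size_gib: float, bpw: float) -> float:
    """Total bits in the file divided by bits per weight."""
    return size_gib * 2**30 * 8 / bpw


for quant, size_gib, bpw, ppl_q, delta_pct in ROWS:
    base = implied_base_ppl(ppl_q, delta_pct)
    weights = implied_weight_count(size_gib, bpw)
    print(f"{quant}: PPL(base) ~ {base:.4f}, ~{weights / 1e9:.0f}B weights")
```

All three rows imply the same base-model PPL of about 2.7231, and the Size and BPW columns imply roughly 753 to 754 billion weights for every quant, so the columns are mutually consistent.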

![kld_graph](kld_data/01_kld_vs_filesize.png "Chart showing Pareto KLD analysis of quants")
![ppl_graph](kld_data/02_ppl_vs_filesize.png "Chart showing Pareto PPL analysis of quants")
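The script behind these charts is not included here. Assuming "Pareto" refers to the frontier of quants that no other quant beats on both file size and error, a minimal sketch of that selection looks like this. The `Quant` type and the `dominates` and `pareto_front` helpers are hypothetical.

```python
from typing import NamedTuple


class Quant(NamedTuple):
    name: str
    size_gib: float
    kld: float


def dominates(a: Quant, b: Quant) -> bool:
    """a dominates b if it is no larger and no worse, and strictly better in one."""
    return (a.size_gib <= b.size_gib and a.kld <= b.kld
            and (a.size_gib < b.size_gib or a.kld < b.kld))


def pareto_front(quants: list[Quant]) -> list[Quant]:
    """Quants that no other quant dominates, ordered by file size."""
    front = [q for q in quants if not any(dominates(o, q) for o in quants)]
    return sorted(front, key=lambda q: q.size_gib)


# Mean KLD values from the table above.
QUANTS = [
    Quant("Q5_K_M", 520.08, 0.020247),
    Quant("Q4_K_M", 432.80, 0.037406),
    Quant("IQ4_XS", 336.61, 0.099818),
]

if __name__ == "__main__":
    for q in pareto_front(QUANTS):
        print(f"{q.name:8} {q.size_gib:8.2f} GiB  KLD {q.kld:.6f}")
```

Any quant plotted above the frontier is beaten on both axes by one already on it. The same function covers the PPL chart if the KLD values are swapped for PPL.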