---
base_model:
- zai-org/GLM-5.1
---

## Notes

- 05/02/26: The IQ3_S quant is coming a bit later; its quantization crashed and needs to be redone.

## Model

This repo contains specialized MoE quants for zai-org/GLM-5.1. The FFN tensors are huge compared to the rest of the tensors in the model. Because of that, it should be possible to get better quality while keeping the overall model smaller than a comparable naive quantization.

To that end, the default quantization type is kept at high quality, while the FFN UP and FFN GATE tensors are quantized down, along with the FFN DOWN tensors.

| Quant | Size | Mixture (default / FFN UP / FFN GATE / FFN DOWN) | PPL | Mean PPL(Q)/PPL(base) - 1 | KLD |
| :--------- | :--------- | :------- | :------- | :------- | :------- |
| Q5_K_M | 520.08 GiB (5.93 BPW) | Q8_0 / Q5_K / Q5_K / Q6_K | 2.732420 ± 0.015015 | +0.3411% | 0.020247 ± 0.000173 |
| Q4_K_M | 432.80 GiB (4.93 BPW) | Q8_0 / Q4_K / Q4_K / Q5_K | 2.754593 ± 0.015142 | +1.1553% | 0.037406 ± 0.000308 |
| IQ4_XS | 336.61 GiB (3.84 BPW) | Q8_0 / IQ3_S / IQ3_S / IQ4_XS | 2.892748 ± 0.015981 | +6.2287% | 0.099818 ± 0.000754 |

![kld_graph](kld_data/01_kld_vs_filesize.png "Chart showing Pareto KLD analysis of quants")

![ppl_graph](kld_data/02_ppl_vs_filesize.png "Chart showing Pareto PPL analysis of quants")
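The mixture idea described in the Model section can be sketched as a per-tensor rule: the FFN UP, GATE and DOWN tensors get their own types, and everything else gets the default type. The sketch below is illustrative only. The actual tooling and rules used to build these files are not stated. `Q4_K_M_RECIPE`, `EXPERT_FFN_RE` and `pick_type` are hypothetical names. The sketch assumes that only the routed-expert tensors (GGUF names like `blk.N.ffn_up_exps.weight`) take the FFN types. It also ignores that llama.cpp keeps some tensors, such as 1-D norm weights, in F32.

```python
import re

# Hypothetical recipe for the Q4_K_M row: tensor group -> llama.cpp quant type name.
Q4_K_M_RECIPE = {
    "default": "Q8_0",
    "ffn_up": "Q4_K",
    "ffn_gate": "Q4_K",
    "ffn_down": "Q5_K",
}

# Routed-expert tensors in a GGUF MoE model are named like "blk.12.ffn_up_exps.weight".
# Assumption: only these get the FFN types; the router ("ffn_gate_inp"), attention,
# embeddings and any shared-expert / dense FFN tensors fall back to the default type.
EXPERT_FFN_RE = re.compile(r"\.(ffn_(?:up|gate|down))_exps\.weight$")


def pick_type(tensor_name: str, recipe: dict[str, str]) -> str:
    """Return the quant type this sketch would assign to a tensor name."""
    match = EXPERT_FFN_RE.search(tensor_name)
    return recipe[match.group(1)] if match else recipe["default"]


for name in (
    "blk.10.ffn_up_exps.weight",
    "blk.10.ffn_gate_exps.weight",
    "blk.10.ffn_down_exps.weight",
    "blk.10.ffn_gate_inp.weight",
    "blk.10.attn_output.weight",
):
    print(f"{name:32} -> {pick_type(name, Q4_K_M_RECIPE)}")
```

The regex requires `_exps.weight` right after the group name. As a result, the router tensor `ffn_gate_inp` is not mistaken for an FFN GATE tensor, even though its name starts with `ffn_gate`.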
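The premise that the FFN tensors dominate the model can be checked roughly against the BPW figures in the table. The per-type bits per weight below come from llama.cpp's block layouts: bytes per block times 8, divided by weights per block. Treat the result as a back-of-envelope estimate under three assumptions. First, every non-FFN weight uses the default type (Q8_0). Second, FFN UP, GATE and DOWN hold equal parameter counts. Third, F32 tensors are ignored. Under those assumptions, each row's overall BPW can be solved for the fraction `f` of weights in the FFN tensors. `implied_ffn_fraction` is a hypothetical helper.

```python
# Bits per weight of llama.cpp quant types: bytes per block * 8 / weights per block.
BPW = {
    "Q8_0": 34 * 8 / 32,       # 8.5
    "Q6_K": 210 * 8 / 256,     # 6.5625
    "Q5_K": 176 * 8 / 256,     # 5.5
    "Q4_K": 144 * 8 / 256,     # 4.5
    "IQ4_XS": 136 * 8 / 256,   # 4.25
    "IQ3_S": 110 * 8 / 256,    # 3.4375
}

# (default, FFN UP, FFN GATE, FFN DOWN, overall BPW) as listed in the table.
RECIPES = {
    "Q5_K_M": ("Q8_0", "Q5_K", "Q5_K", "Q6_K", 5.93),
    "Q4_K_M": ("Q8_0", "Q4_K", "Q4_K", "Q5_K", 4.93),
    "IQ4_XS": ("Q8_0", "IQ3_S", "IQ3_S", "IQ4_XS", 3.84),
}


def implied_ffn_fraction(default: str, up: str, gate: str, down: str, total_bpw: float) -> float:
    """Solve total = (1 - f) * bpw(default) + f * mean(bpw(up), bpw(gate), bpw(down)) for f.

    Assumes up/gate/down have equal parameter counts and all other weights use `default`.
    """
    ffn_bpw = (BPW[up] + BPW[gate] + BPW[down]) / 3
    return (BPW[default] - total_bpw) / (BPW[default] - ffn_bpw)


fractions = {name: implied_ffn_fraction(*recipe) for name, recipe in RECIPES.items()}
for name, f in fractions.items():
    print(f"{name}: ~{f:.1%} of weights in FFN UP/GATE/DOWN")
# Q5_K_M: ~97.1% of weights in FFN UP/GATE/DOWN
# Q4_K_M: ~97.4% of weights in FFN UP/GATE/DOWN
# IQ4_XS: ~97.3% of weights in FFN UP/GATE/DOWN
```

All three rows independently give roughly 97%. Under these assumptions, that is consistent with keeping the small remainder at Q8_0 costing very little file size.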
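A quick consistency check on the relative PPL column: read it as PPL(Q)/PPL(base) - 1, so that a positive value means higher perplexity than the unquantized base. Under that reading, each row should imply the same base perplexity via PPL(base) = PPL(Q) / (1 + rel). The base PPL itself is not listed in the table. The snippet only derives it from the listed numbers, and `implied_base_ppl` is a hypothetical helper.

```python
# (PPL(Q), relative PPL column as a fraction) copied from the table.
ROWS = {
    "Q5_K_M": (2.732420, 0.003411),
    "Q4_K_M": (2.754593, 0.011553),
    "IQ4_XS": (2.892748, 0.062287),
}


def implied_base_ppl(ppl_q: float, rel: float) -> float:
    # If rel = PPL(Q)/PPL(base) - 1, then PPL(base) = PPL(Q) / (1 + rel).
    return ppl_q / (1 + rel)


bases = {name: implied_base_ppl(*row) for name, row in ROWS.items()}
for name, base in bases.items():
    print(f"{name}: implied PPL(base) ~ {base:.4f}")
```

All three rows point back to a base perplexity of about 2.7231. This supports reading the column as the ratio minus one rather than one minus the ratio.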