metadata
base_model:
  - zai-org/GLM-5.1

Notes

  • 05/02/26: The IQ3_S quant will come a bit later; the quantization run crashed and needs to be redone.

Model

This repo contains specialized MoE quants for zai-org/GLM-5.1. The idea is that, because the FFN tensors are huge compared to the rest of the tensors in the model, it should be possible to achieve better quality at a smaller overall model size than a comparable naive quantization. To that end, the default quantization type is kept at high quality, while the FFN UP and FFN GATE tensors are quantized down along with the FFN DOWN tensors.
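
To see why quantizing only the FFN tensors moves the overall size so much, here is a minimal sketch of the weighted-average arithmetic. The per-type bits per weight follow from the GGUF block layouts (bytes per block times 8, divided by weights per block). Everything else is an assumption for illustration: the `Mixture` column is read as default / FFN up / FFN gate / FFN down, the three FFN tensors are treated as equally sized, and `FFN_SHARE` is a hypothetical fraction of weights living in the FFN tensors, not a figure taken from the model.

```python
# Bits per weight of each GGUF block format: bytes per block * 8 / weights per block.
BPW = {
    "Q8_0": 34 * 8 / 32,      # 8.5
    "Q6_K": 210 * 8 / 256,    # 6.5625
    "Q5_K": 176 * 8 / 256,    # 5.5
    "Q4_K": 144 * 8 / 256,    # 4.5
    "IQ4_XS": 136 * 8 / 256,  # 4.25
    "IQ3_S": 110 * 8 / 256,   # 3.4375
}

# Hypothetical share of all weights that sit in the FFN up/gate/down tensors.
FFN_SHARE = 0.97


def mixture_bpw(default, ffn_up, ffn_gate, ffn_down, ffn_share=FFN_SHARE):
    """Average bits per weight of a mixture.

    Assumes up, gate and down each hold one third of the FFN weights and
    that every non-FFN tensor uses the default type.
    """
    ffn = (BPW[ffn_up] + BPW[ffn_gate] + BPW[ffn_down]) / 3
    return ffn_share * ffn + (1 - ffn_share) * BPW[default]


# Mixtures as listed in this repo, read as default / up / gate / down (assumed order).
MIXTURES = {
    "Q5_K_M": ("Q8_0", "Q5_K", "Q5_K", "Q6_K"),
    "Q4_K_M": ("Q8_0", "Q4_K", "Q4_K", "Q5_K"),
    "IQ4_XS": ("Q8_0", "IQ3_S", "IQ3_S", "IQ4_XS"),
}

for name, mix in MIXTURES.items():
    print(f"{name}: ~{mixture_bpw(*mix):.2f} BPW")
```

Because the FFN tensors dominate the weight count in this sketch, the Q8_0 default adds little to the average, which is the trade-off the mixtures rely on.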

| Quant | Size | Mixture | PPL | 1-(Mean PPL(Q)/PPL(base)) | KLD |
|---|---|---|---|---|---|
| Q5_K_M | 520.08 GiB (5.93 BPW) | Q8_0 / Q5_K / Q5_K / Q6_K | 2.732420 ± 0.015015 | +0.3411% | 0.020247 ± 0.000173 |
| Q4_K_M | 432.80 GiB (4.93 BPW) | Q8_0 / Q4_K / Q4_K / Q5_K | 2.754593 ± 0.015142 | +1.1553% | 0.037406 ± 0.000308 |
| IQ4_XS | 336.61 GiB (3.84 BPW) | Q8_0 / IQ3_S / IQ3_S / IQ4_XS | 2.892748 ± 0.015981 | +6.2287% | 0.099818 ± 0.000754 |
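
The columns can be cross-checked against each other with a little arithmetic. The sketch below copies the values from the rows above. It assumes that BPW means file size in bits divided by weight count, and that the percentage column is the relative PPL increase over the base model, that is PPL(Q)/PPL(base) - 1. The second reading is an assumption about the sign convention, made because the listed values are positive. The function names are made up for this example.

```python
GIB = 2**30  # bytes per GiB

# (quant, size in GiB, BPW, PPL, relative PPL change) copied from the table.
ROWS = [
    ("Q5_K_M", 520.08, 5.93, 2.732420, 0.003411),
    ("Q4_K_M", 432.80, 4.93, 2.754593, 0.011553),
    ("IQ4_XS", 336.61, 3.84, 2.892748, 0.062287),
]


def implied_weights(size_gib, bpw):
    """Weight count implied by a file size and its bits per weight."""
    return size_gib * GIB * 8 / bpw


def implied_base_ppl(ppl, rel_change):
    """Base-model PPL implied by a quant's PPL and its relative increase."""
    return ppl / (1 + rel_change)


for name, size_gib, bpw, ppl, rel in ROWS:
    weights = implied_weights(size_gib, bpw)
    base = implied_base_ppl(ppl, rel)
    print(f"{name}: ~{weights / 1e9:.0f}B weights, implied base PPL ~{base:.4f}")
```

Under these assumptions, all three rows should land on nearly the same weight count and the same base PPL, which is a quick sanity check that the sizes and percentages were transcribed consistently.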

Figures: kld_graph, ppl_graph