---
pipeline_tag: text-generation
base_model:
- zai-org/GLM-4.7-Flash
---
This is an experimental MXFP4\_MOE quantization of the model [GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash).

I created an importance-aware MXFP4\_MOE quantization that dynamically allocates precision per tensor, based on importance scores from an imatrix I generated with [code_tiny](https://huggingface.co/datasets/eaddario/imatrix-calibration/blob/main/code_tiny.parquet).  
It is optimized for coding and is slightly larger than the mainline MXFP4\_MOE, because the more important a tensor is, the higher-precision quantization type it keeps.

![Quantization Types](quantization_types.png)

- BF16 (16-bit) for highly important tensors (>75% importance)
- Q8_0 (8-bit) for moderately important tensors (>60% importance)
- MXFP4 (4-bit) for less important tensors (<50% importance)
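
To make the tiers above concrete, here is a minimal Python sketch of how a per-tensor importance score could be mapped to a quantization type. It is an illustration only, not the script used to build this model. It assumes that importance is already normalized to the 0..1 range, that the tensor names are hypothetical examples, and that scores between 50% and 60% (which the list above does not cover) fall back to MXFP4.

```python
from typing import Dict

# Thresholds from the list above, written as fractions (assumption:
# importance scores are already normalized to the 0..1 range).
BF16_THRESHOLD = 0.75
Q8_0_THRESHOLD = 0.60


def pick_quant_type(importance: float) -> str:
    """Return the quantization type for one tensor's importance score."""
    if importance > BF16_THRESHOLD:
        return "BF16"   # highly important: keep 16-bit
    if importance > Q8_0_THRESHOLD:
        return "Q8_0"   # moderately important: 8-bit
    # Assumption: everything else, including the 50-60% band that the
    # list leaves unspecified, is stored as MXFP4.
    return "MXFP4"


def plan_quantization(scores: Dict[str, float]) -> Dict[str, str]:
    """Map every tensor name to its chosen quantization type."""
    return {name: pick_quant_type(score) for name, score in scores.items()}


if __name__ == "__main__":
    # Hypothetical tensor names and scores, for illustration only.
    example_scores = {
        "blk.0.attn_output.weight": 0.91,
        "blk.0.ffn_gate_exps.weight": 0.68,
        "blk.1.ffn_down_exps.weight": 0.32,
    }
    for name, qtype in plan_quantization(example_scores).items():
        print(f"{name}: {qtype}")
```

How the raw imatrix statistics are turned into a percentage is not shown here; the sketch only covers the final threshold step.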

![Quantization per Layer Count](quant_layer_count.png)

As mentioned, this is experimental, and I have not yet run any benchmarks to see whether it is any better than mainline, but you are free to try it out and report back!