---
license: mit
library_name: mlx
pipeline_tag: text-generation
language:
- en
base_model: zai-org/GLM-5.2
base_model_relation: quantized
tags:
- mlx
- glm_moe_dsa
- moe
- nvfp4
---

# GLM-5.2-MLX-nvfp4

An **MLX** conversion of [zai-org/GLM-5.2](https://huggingface.co/zai-org/GLM-5.2) quantized to **NVFP4** (4-bit FP4, group size 16) for Apple Silicon with [mlx-lm](https://github.com/ml-explore/mlx-lm).

This is the MLX analog of NVIDIA's [nvidia/GLM-5.2-NVFP4](https://huggingface.co/nvidia/GLM-5.2-NVFP4). NVIDIA's checkpoint stores weights in ModelOpt-packed NVFP4 that mlx-lm cannot read directly, so this build was produced by quantizing the **bf16 base** with MLX's own NVFP4 mode (`--q-mode nvfp4 --q-group-size 16`).

- **Base model:** [zai-org/GLM-5.2](https://huggingface.co/zai-org/GLM-5.2) (`GlmMoeDsaForCausalLM`, 753B total / ~40B active MoE, text-only)
- **Format:** MLX, NVFP4 (4-bit FP4, group size 16)
- **Approx. size on disk:** 390G
- **Converted with:** mlx-lm 0.31.2

## Usage

```bash
pip install -U mlx-lm
mlx_lm.generate --model pipenetwork/GLM-5.2-MLX-nvfp4 --prompt "Explain mixture-of-experts in one sentence." --max-tokens 128
```

## License

MIT, inherited from the base model.