---
license: mit
library_name: mlx
pipeline_tag: text-generation
language:
- en
base_model: zai-org/GLM-5.2
base_model_relation: quantized
tags:
- mlx
- glm_moe_dsa
- moe
- nvfp4
---

# GLM-5.2-MLX-nvfp4

An **MLX** conversion of [zai-org/GLM-5.2](https://huggingface.co/zai-org/GLM-5.2) quantized to **NVFP4** (4-bit FP4, group size 16) for Apple Silicon with [mlx-lm](https://github.com/ml-explore/mlx-lm).

This is the MLX analog of NVIDIA's [nvidia/GLM-5.2-NVFP4](https://huggingface.co/nvidia/GLM-5.2-NVFP4). NVIDIA's checkpoint stores weights in ModelOpt-packed NVFP4 that mlx-lm cannot read directly, so this build was produced by quantizing the **bf16 base** with MLX's own NVFP4 mode (`--q-mode nvfp4 --q-group-size 16`).

- **Base model:** [zai-org/GLM-5.2](https://huggingface.co/zai-org/GLM-5.2) (`GlmMoeDsaForCausalLM`, 753B total / ~40B active MoE, text-only)
- **Format:** MLX, NVFP4 (4-bit FP4, group size 16)
- **Approx. size on disk:** 390G
- **Converted with:** mlx-lm 0.31.2

## Usage

```bash
pip install -U mlx-lm
mlx_lm.generate --model pipenetwork/GLM-5.2-MLX-nvfp4 --prompt "Explain mixture-of-experts in one sentence." --max-tokens 128
```

## License

MIT, inherited from the base model.
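
## Size estimate

The figures above (753B total parameters, 4-bit values, group size 16) allow a rough sanity check of the on-disk size. The sketch below is a back-of-the-envelope calculation, not a description of the exact file layout. It assumes that every parameter is quantized and that each group of 16 weights carries one 8-bit scale. Both are assumptions made for illustration: a real checkpoint typically keeps some tensors at higher precision, and the helper names are hypothetical.

```python
def nvfp4_bits_per_weight(value_bits: int = 4, scale_bits: int = 8, group_size: int = 16) -> float:
    """Effective bits per weight: the 4-bit value plus its share of the group scale.

    The 8-bit scale per group is an assumption of this sketch.
    """
    return value_bits + scale_bits / group_size


def estimated_size_bytes(num_params: float, bits_per_weight: float) -> float:
    """Storage needed if every parameter were stored at the given bit width."""
    return num_params * bits_per_weight / 8


TOTAL_PARAMS = 753e9  # total parameter count quoted in this card

bpw = nvfp4_bits_per_weight()
size = estimated_size_bytes(TOTAL_PARAMS, bpw)

print(f"{bpw} bits per weight")
print(f"~{size / 1e9:.0f} GB (~{size / 2**30:.0f} GiB)")
```

Under these assumptions the estimate is on the same order as the approximate size listed above. Any difference is expected, since the sketch ignores unquantized tensors and metadata, and the card does not say whether "390G" means GB or GiB. Because the model is a mixture-of-experts, the ~40B active parameters reduce compute per token, not the amount of weight data that has to be available.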