How to use from
Hermes Agent
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "pipenetwork/GLM-5.2-MLX-nvfp4"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default pipenetwork/GLM-5.2-MLX-nvfp4
Run Hermes
hermes
Quick Links

GLM-5.2-MLX-nvfp4

An MLX conversion of zai-org/GLM-5.2 quantized to NVFP4 (4-bit FP4, group size 16) for Apple Silicon with mlx-lm.

This is the MLX analog of NVIDIA's nvidia/GLM-5.2-NVFP4. NVIDIA's checkpoint stores weights in ModelOpt-packed NVFP4 that mlx-lm cannot read directly, so this build was produced by quantizing the bf16 base with MLX's own NVFP4 mode (--q-mode nvfp4 --q-group-size 16).

  • Base model: zai-org/GLM-5.2 (GlmMoeDsaForCausalLM, 753B total / ~40B active MoE, text-only)
  • Format: MLX, NVFP4 (4-bit FP4, group size 16)
  • Approx. size on disk: 390G
  • Converted with: mlx-lm 0.31.2

Usage

pip install -U mlx-lm
mlx_lm.generate --model pipenetwork/GLM-5.2-MLX-nvfp4 --prompt "Explain mixture-of-experts in one sentence." --max-tokens 128

License

MIT, inherited from the base model.

Downloads last month
818
Safetensors
Model size
743B params
Tensor type
U8
·
U32
·
BF16
·
F32
·
MLX
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for pipenetwork/GLM-5.2-MLX-nvfp4

Base model

zai-org/GLM-5.2
Quantized
(70)
this model

Collection including pipenetwork/GLM-5.2-MLX-nvfp4