pipenetwork committed on
Commit 144d946 · verified · 1 Parent(s): 8c620f4

Add model card

Files changed (1)
  1. README.md +32 -3
README.md CHANGED
@@ -1,7 +1,36 @@
  ---
- language: en
+ license: mit
+ library_name: mlx
+ pipeline_tag: text-generation
+ language:
+ - en
+ base_model: zai-org/GLM-5.2
+ base_model_relation: quantized
  tags:
  - mlx
- pipeline_tag: text-generation
- library_name: mlx
+ - glm_moe_dsa
+ - moe
+ - nvfp4
  ---
+
+ # GLM-5.2-MLX-nvfp4
+
+ An **MLX** conversion of [zai-org/GLM-5.2](https://huggingface.co/zai-org/GLM-5.2) quantized to **NVFP4** (4-bit FP4, group size 16) for Apple Silicon with [mlx-lm](https://github.com/ml-explore/mlx-lm).
+
+ This is the MLX analog of NVIDIA's [nvidia/GLM-5.2-NVFP4](https://huggingface.co/nvidia/GLM-5.2-NVFP4). NVIDIA's checkpoint stores weights in ModelOpt-packed NVFP4 that mlx-lm cannot read directly, so this build was produced by quantizing the **bf16 base** with MLX's own NVFP4 mode (`--q-mode nvfp4 --q-group-size 16`).
+
+ - **Base model:** [zai-org/GLM-5.2](https://huggingface.co/zai-org/GLM-5.2) (`GlmMoeDsaForCausalLM`, 753B total / ~40B active MoE, text-only)
+ - **Format:** MLX, NVFP4 (4-bit FP4, group size 16)
+ - **Approx. size on disk:** 390G
+ - **Converted with:** mlx-lm 0.31.2
+
+ ## Usage
+
+ ```bash
+ pip install -U mlx-lm
+ mlx_lm.generate --model pipenetwork/GLM-5.2-MLX-nvfp4 --prompt "Explain mixture-of-experts in one sentence." --max-tokens 128
+ ```
+
+ ## License
+
+ MIT, inherited from the base model.
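
Reviewer note: the "Approx. size on disk" figure in the model card can be sanity-checked with a little arithmetic. The sketch below is not part of the commit. It assumes each group of 16 weights stores 16 four-bit values plus one 8-bit scale (4.5 bits per weight), and that all 753B parameters are quantized this way. Both are simplifying assumptions, and the function names are hypothetical.

```python
def nvfp4_bits_per_weight(group_size: int = 16, scale_bits: int = 8) -> float:
    """Effective bits per weight: 4-bit value plus a shared per-group scale.

    Assumes one scale of `scale_bits` bits per `group_size` weights
    (an assumption about the storage layout, not taken from the model card).
    """
    return 4 + scale_bits / group_size


def estimate_size_bytes(n_params: float, bits_per_weight: float) -> float:
    """Rough weight-storage size in bytes, ignoring metadata and any
    tensors that may be kept at higher precision."""
    return n_params * bits_per_weight / 8


if __name__ == "__main__":
    bpw = nvfp4_bits_per_weight()            # group size 16, as in the card
    size = estimate_size_bytes(753e9, bpw)   # 753B total parameters, as in the card
    print(f"{bpw} bits/weight")
    print(f"{size / 1e9:.0f} GB ({size / 2**30:.0f} GiB)")
```

Under these assumptions the estimate comes out at roughly 424 GB (about 394 GiB), which is in the same range as the 390G stated in the card. An exact match is not expected, since the card does not say which unit "G" denotes or whether every tensor is quantized.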