CPU-Hybrid-MoE/GLM-5-CPU-NUMA4-AMXINT8

Doctor-Shotgun committed on Apr 13

Commit 1c7b1f0 · verified · 1 Parent(s): 006207a

Update README.md

Files changed (1)

README.md +1 -1

@@ -43,7 +43,7 @@ python -m sglang.launch_server \
 ```
 ## Notes:
-- GLM-5 requires at least `transformers` 5.2.0, which is not the default version pinned by `sglang-kt` at the time of writing
+- `GlmMoeDsaForCausalLM` requires at least `transformers` 5.2.0, which is not the default version pinned by `sglang-kt` at the time of writing
 - Note that DSA (DeepSeek Sparse Attention) is not currently supported on non-enterprise GPU architectures, so attention will fall back to standard MLA with the specified `--attention-backend`
 - `--kt-cpuinfer` should be set to the total number of physical CPU cores across all NUMA nodes
 - `--tensor-parallel-size 1` should be set to the number of GPUs
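
The notes in this diff imply two checks worth running before launch: that the installed `transformers` is at least 5.2.0, and that the `--kt-cpuinfer` value matches the number of physical cores. The following is a minimal sketch of both, not something shipped with `sglang-kt`. The helper names (`parse_version`, `meets_minimum`, `count_physical_cores`) are hypothetical, the version comparison ignores pre-release suffixes, and the core count assumes text captured from `lscpu -p=SOCKET,CORE` on Linux.

```python
import re
from importlib import metadata

# Minimum transformers version stated in the README note.
MIN_TRANSFORMERS = (5, 2, 0)


def parse_version(text):
    """Return the leading numeric components of a version string as ints.

    Simplification (assumption): suffixes such as ".dev0" or "rc1" are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum=MIN_TRANSFORMERS):
    """True if the installed version is at least the minimum."""
    parts = parse_version(installed)
    # Pad short versions so that "5.2" compares equal to (5, 2, 0).
    padded = parts + (0,) * (len(minimum) - len(parts))
    return padded >= minimum


def count_physical_cores(lscpu_parse_output):
    """Count unique (socket, core) pairs in `lscpu -p=SOCKET,CORE` output.

    Hyperthread siblings share a pair, so each physical core counts once.
    """
    cores = set()
    for line in lscpu_parse_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and lscpu's comment header
        socket_id, core_id = line.split(",")[:2]
        cores.add((socket_id, core_id))
    return len(cores)


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "ok" if meets_minimum(installed) else "older than 5.2.0"
        print(f"transformers {installed}: {status}")
```

To get a candidate value for `--kt-cpuinfer`, pass the captured `lscpu -p=SOCKET,CORE` text to `count_physical_cores`. The result covers every socket, and therefore every NUMA node, in one total.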