Update README.md
README.md
CHANGED
@@ -43,7 +43,7 @@ python -m sglang.launch_server \
 ```
 
 ## Notes:
--
+- `GlmMoeDsaForCausalLM` requires at least `transformers` 5.2.0, which is not the default version pinned by `sglang-kt` at the time of writing
 - Note that DSA (DeepSeek Sparse Attention) is not currently supported on non-enterprise GPU architectures, so attention will fall back to standard MLA with the specified `--attention-backend`
 - `--kt-cpuinfer` should be set to the total number of physical CPU cores across all NUMA nodes
 - `--tensor-parallel-size 1` should be set to the number of GPUs
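
Two of the notes in this change (the `transformers` minimum version and the `--kt-cpuinfer` core count) can be checked before launching the server. The following is a minimal sketch, not part of the README or of `sglang-kt`: the helper names `version_tuple`, `transformers_is_new_enough`, and `count_physical_cores` are hypothetical, the version comparison only looks at leading numeric components, and the core count assumes a Linux x86 `/proc/cpuinfo` that exposes `physical id` and `core id` fields (on other platforms it may return 0).

```python
import os
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn '5.2.0' into (5, 2, 0); stop at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def transformers_is_new_enough(installed: str, minimum: str = "5.2.0") -> bool:
    """Compare an installed version string against the minimum from the note."""
    return version_tuple(installed) >= version_tuple(minimum)


def count_physical_cores(cpuinfo_text: str) -> int:
    """Count unique (physical id, core id) pairs in /proc/cpuinfo-style text.

    Hyperthreads share a pair, so they are counted once. Assumes each
    processor entry lists 'physical id' before 'core id' (x86 Linux layout).
    """
    cores = set()
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        value = value.strip()
        if key == "physical id":
            physical_id = value
        elif key == "core id":
            cores.add((physical_id, value))
    return len(cores)


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
        print("transformers", installed, "new enough:",
              transformers_is_new_enough(installed))
    except metadata.PackageNotFoundError:
        print("transformers is not installed")

    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            print("suggested --kt-cpuinfer:", count_physical_cores(f.read()))
```

Counting `(physical id, core id)` pairs rather than logical processors keeps hyperthreads out of the total, which matches the note's wording of "physical CPU cores across all NUMA nodes".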