SC117 committed on
Commit
e93eb5c
·
verified ·
1 Parent(s): 68b12b8

Upload 2 files

Files changed (2)
  1. README.md +17 -17
  2. README_zh.md +17 -17
README.md CHANGED
@@ -65,33 +65,33 @@ base_model:
65
  <div style="padding: 16px; font-size: 13px; color: #334155; line-height: 1.7;">
66
  <p style="margin: 0 0 12px 0;">The released <a href="https://huggingface.co/InternScience/Agents-A1" target="_blank" style="color: #047857; text-decoration: none; font-weight: 700;">InternScience/Agents-A1</a> checkpoint is a <b>40-layer Qwen3.5-35B-A3B MoE</b> without MTP (Multi-Token Prediction) layers. To enable MTP acceleration in llama.cpp (which speeds up long-context generation by 10–30%), we <b>extract the 1 MTP layer from Qwen3.5-35B-A3B</b> and inject it into Agents-A1's safetensors before GGUF conversion.</p>
67
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">Step 1 — Extract MTP tensors from Qwen3.5-35B-A3B</p>
68
- <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre;"># Source: J:\Models\Qwen3.5-35B-A3B-MTP (Qwen3.5-35B-A3B + native MTP)
69
  from safetensors import safe_open
70
  import json, os
71
-
72
  src = r"J:\Models\Qwen3.5-35B-A3B-MTP"
73
  with open(os.path.join(src, "model.safetensors.index.json")) as f:
74
  idx = json.load(f)
75
  mtp_keys = [k for k in idx["weight_map"] if "mtp" in k.lower()]
76
  print(f"Found {len(mtp_keys)} MTP tensors") # 785</pre>
77
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">Step 2 — Add as a new safetensors shard (N+1)</p>
78
- <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre;"># Save 785 MTP tensors as a new shard
79
  new_shard = "model.safetensors-15-of-15.safetensors"
80
  save_file({k: get_tensor(k) for k in mtp_keys}, new_shard)
81
-
82
- # Update model.safetensors.index.json:
83
- # - metadata.total_size += new_shard_size
84
- # - weight_map: append new_shard path for each MTP key
85
- # - DO NOT modify existing 14 shards (avoid touching original data)</pre>
86
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">Step 3 — Convert HF → BF16 GGUF with master llama.cpp</p>
87
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre;">F:\llama.cpp\llama.cpp-master\convert_hf_to_gguf.py ^
88
  J:\Models\Agents-A1 ^
89
  --outfile J:\Models\Agents-A1-MTP-GGUF\Agents-A1-MTP-BF16.gguf ^
90
  --outtype f16
91
-
92
- # Master version handles Qwen3.5MoE with MTP auto:
93
- # - Normal layers: blk.0–39
94
- # - MTP layer: blk.40.nextn.* (785 tensors)</pre>
95
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">Step 4 — Quantize with APEX (Q4_K_M default, MTP at Q8_0)</p>
96
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre;">F:\llama.cpp\...\llama-quantize.exe ^
97
  --imatrix J:\Models\Qwen3.5-35B-A3B.imatrix.gguf ^
@@ -99,9 +99,9 @@ save_file({k: get_tensor(k) for k in mtp_keys}, new_shard)
99
  J:\Models\Agents-A1-MTP-GGUF\Agents-A1-MTP-BF16.gguf ^
100
  J:\Models\Agents-A1-MTP-GGUF\Agents-A1-MTP-APEX-I-&lt;tier&gt;.gguf ^
101
  Q4_K_M
102
-
103
- # APEX qwen36_35b_mtp_*.txt configs include blk.40 overrides
104
- # (Q8_0 for MTP across all tiers) — no manual patching needed.</pre>
105
  </div>
106
  </div>
107
 
@@ -126,8 +126,8 @@ save_file({k: get_tensor(k) for k in mtp_keys}, new_shard)
126
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre-wrap;">./llama-server -m ./models/Agents-A1-MTP-APEX-I-Compact.gguf --mmproj ./models/mmproj-F16.gguf -ngl 99 -c 131072</pre>
127
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">vLLM</p>
128
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre-wrap;">vllm serve SC117/Agents-A1-MTP-APEX-GGUF --port 8000 --tensor-parallel-size 1 --max-model-len 262144 --reasoning-parser qwen3
129
-
130
- # Tool-call variant
131
  vllm serve SC117/Agents-A1-MTP-APEX-GGUF --port 8000 --tensor-parallel-size 1 --max-model-len 262144 --reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser qwen3_coder</pre>
132
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">SGLang</p>
133
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre-wrap;">python3 -m sglang.launch_server --model-path "SC117/Agents-A1-MTP-APEX-GGUF" --host 0.0.0.0 --port 30000</pre>
 
65
  <div style="padding: 16px; font-size: 13px; color: #334155; line-height: 1.7;">
66
  <p style="margin: 0 0 12px 0;">The released <a href="https://huggingface.co/InternScience/Agents-A1" target="_blank" style="color: #047857; text-decoration: none; font-weight: 700;">InternScience/Agents-A1</a> checkpoint is a <b>40-layer Qwen3.5-35B-A3B MoE</b> without MTP (Multi-Token Prediction) layers. To enable MTP acceleration in llama.cpp (which speeds up long-context generation by 10–30%), we <b>extract the 1 MTP layer from Qwen3.5-35B-A3B</b> and inject it into Agents-A1's safetensors before GGUF conversion.</p>
67
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">Step 1 — Extract MTP tensors from Qwen3.5-35B-A3B</p>
68
+ <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre;">Source: J:\Models\Qwen3.5-35B-A3B-MTP (Qwen3.5-35B-A3B + native MTP)
69
  from safetensors import safe_open
70
  import json, os
71
+ ·
72
  src = r"J:\Models\Qwen3.5-35B-A3B-MTP"
73
  with open(os.path.join(src, "model.safetensors.index.json")) as f:
74
  idx = json.load(f)
75
  mtp_keys = [k for k in idx["weight_map"] if "mtp" in k.lower()]
76
  print(f"Found {len(mtp_keys)} MTP tensors") # 785</pre>
77
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">Step 2 — Add as a new safetensors shard (N+1)</p>
78
+ <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre;">Save 785 MTP tensors as a new shard
79
  new_shard = "model.safetensors-15-of-15.safetensors"
80
  save_file({k: get_tensor(k) for k in mtp_keys}, new_shard)
81
+ ·
82
+ Update model.safetensors.index.json:
83
+ · metadata.total_size += new_shard_size
84
+ · weight_map: append new_shard path for each MTP key
85
+ · DO NOT modify existing 14 shards (avoid touching original data)</pre>
86
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">Step 3 — Convert HF → BF16 GGUF with master llama.cpp</p>
87
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre;">F:\llama.cpp\llama.cpp-master\convert_hf_to_gguf.py ^
88
  J:\Models\Agents-A1 ^
89
  --outfile J:\Models\Agents-A1-MTP-GGUF\Agents-A1-MTP-BF16.gguf ^
90
  --outtype f16
91
+ ·
92
+ Master version handles Qwen3.5MoE with MTP auto:
93
+ · Normal layers: blk.0–39
94
+ · MTP layer: blk.40.nextn.* (785 tensors)</pre>
95
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">Step 4 — Quantize with APEX (Q4_K_M default, MTP at Q8_0)</p>
96
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre;">F:\llama.cpp\...\llama-quantize.exe ^
97
  --imatrix J:\Models\Qwen3.5-35B-A3B.imatrix.gguf ^
 
99
  J:\Models\Agents-A1-MTP-GGUF\Agents-A1-MTP-BF16.gguf ^
100
  J:\Models\Agents-A1-MTP-GGUF\Agents-A1-MTP-APEX-I-&lt;tier&gt;.gguf ^
101
  Q4_K_M
102
+ ·
103
+ APEX qwen36_35b_mtp_*.txt configs include blk.40 overrides
104
+ (Q8_0 for MTP across all tiers) — no manual patching needed.</pre>
105
  </div>
106
  </div>
107
 
 
126
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre-wrap;">./llama-server -m ./models/Agents-A1-MTP-APEX-I-Compact.gguf --mmproj ./models/mmproj-F16.gguf -ngl 99 -c 131072</pre>
127
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">vLLM</p>
128
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre-wrap;">vllm serve SC117/Agents-A1-MTP-APEX-GGUF --port 8000 --tensor-parallel-size 1 --max-model-len 262144 --reasoning-parser qwen3
129
+ ·
130
+ Tool-call variant
131
  vllm serve SC117/Agents-A1-MTP-APEX-GGUF --port 8000 --tensor-parallel-size 1 --max-model-len 262144 --reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser qwen3_coder</pre>
132
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">SGLang</p>
133
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre-wrap;">python3 -m sglang.launch_server --model-path "SC117/Agents-A1-MTP-APEX-GGUF" --host 0.0.0.0 --port 30000</pre>
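
Step 2 in the README.md diff above describes the index update only as bullet points. A minimal sketch of that bookkeeping follows, assuming the standard `metadata.total_size` / `weight_map` layout of `model.safetensors.index.json`. The helper name `register_shard`, the tensor key names, and the byte counts are hypothetical and chosen for illustration. The README does not say how `new_shard_size` is measured, so the caller passes it in.

```python
import json


def register_shard(index: dict, keys: list[str], shard_name: str, added_bytes: int) -> dict:
    """Return a copy of a safetensors index with `keys` mapped to `shard_name`.

    The input index is left untouched, mirroring the README's rule of not
    modifying the existing shards.
    """
    updated = {
        "metadata": dict(index.get("metadata", {})),
        "weight_map": dict(index["weight_map"]),
    }
    # Refuse to silently overwrite tensors that already live in another shard.
    clashes = [k for k in keys if k in updated["weight_map"]]
    if clashes:
        raise ValueError(f"{len(clashes)} key(s) already in index, e.g. {clashes[0]}")
    for k in keys:
        updated["weight_map"][k] = shard_name
    updated["metadata"]["total_size"] = updated["metadata"].get("total_size", 0) + added_bytes
    return updated


# Toy index with made-up key names and sizes (not the real checkpoint).
index = {
    "metadata": {"total_size": 1000},
    "weight_map": {"model.layers.0.w": "model-00001-of-00014.safetensors"},
}
mtp_keys = ["mtp.fc.weight", "mtp.norm.weight"]
new_index = register_shard(index, mtp_keys, "model.safetensors-15-of-15.safetensors", 250)
print(json.dumps(new_index, indent=2))
```

Returning a new dict rather than editing in place makes it easy to compare the old and new index before writing the file back to disk.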
README_zh.md CHANGED
@@ -65,33 +65,33 @@ base_model:
65
  <div style="padding: 16px; font-size: 13px; color: #334155; line-height: 1.7;">
66
  <p style="margin: 0 0 12px 0;">The officially released <a href="https://huggingface.co/InternScience/Agents-A1" target="_blank" style="color: #047857; text-decoration: none; font-weight: 700;">InternScience/Agents-A1</a> checkpoint is a <b>40-layer Qwen3.5-35B-A3B MoE</b> that does not include MTP (Multi-Token Prediction) layers. To enable MTP acceleration in llama.cpp (10–30% faster long-context generation), we <b>extract the 1 MTP layer from Qwen3.5-35B-A3B</b>, inject it into Agents-A1's safetensors, and then convert to GGUF.</p>
67
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">Step 1 — Extract MTP tensors from Qwen3.5-35B-A3B</p>
68
- <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre;"># Source: J:\Models\Qwen3.5-35B-A3B-MTP (Qwen3.5-35B-A3B + native MTP)
69
  from safetensors import safe_open
70
  import json, os
71
-
72
  src = r"J:\Models\Qwen3.5-35B-A3B-MTP"
73
  with open(os.path.join(src, "model.safetensors.index.json")) as f:
74
  idx = json.load(f)
75
  mtp_keys = [k for k in idx["weight_map"] if "mtp" in k.lower()]
76
  print(f"Found {len(mtp_keys)} MTP tensors") # 785</pre>
77
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">Step 2 — Append as a new shard (N+1)</p>
78
- <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre;"># Save the 785 MTP tensors as a new shard
79
  new_shard = "model.safetensors-15-of-15.safetensors"
80
  save_file({k: get_tensor(k) for k in mtp_keys}, new_shard)
81
-
82
- # Update model.safetensors.index.json:
83
- # - metadata.total_size += new shard size
84
- # - weight_map: add the new shard path for each MTP key
85
- # - Do not modify the original 14 shards (avoid touching the original data)</pre>
86
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">Step 3 — Convert to BF16 GGUF with master llama.cpp</p>
87
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre;">F:\llama.cpp\llama.cpp-master\convert_hf_to_gguf.py ^
88
  J:\Models\Agents-A1 ^
89
  --outfile J:\Models\Agents-A1-MTP-GGUF\Agents-A1-MTP-BF16.gguf ^
90
  --outtype f16
91
-
92
- # The master version handles Qwen3.5MoE + MTP automatically:
93
- # - Regular layers: blk.0–39
94
- # - MTP layer: blk.40.nextn.* (785 tensors)</pre>
95
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">Step 4 — Quantize with APEX (Q4_K_M by default, Q8_0 for MTP)</p>
96
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre;">F:\llama.cpp\...\llama-quantize.exe ^
97
  --imatrix J:\Models\Qwen3.5-35B-A3B.imatrix.gguf ^
@@ -99,9 +99,9 @@ save_file({k: get_tensor(k) for k in mtp_keys}, new_shard)
99
  J:\Models\Agents-A1-MTP-GGUF\Agents-A1-MTP-BF16.gguf ^
100
  J:\Models\Agents-A1-MTP-GGUF\Agents-A1-MTP-APEX-I-&lt;tier&gt;.gguf ^
101
  Q4_K_M
102
-
103
- # APEX qwen36_35b_mtp_*.txt configs already include the blk.40 override
104
- # (Q8_0 for MTP in all tiers), so no manual patching is needed.</pre>
105
  </div>
106
  </div>
107
 
@@ -126,8 +126,8 @@ save_file({k: get_tensor(k) for k in mtp_keys}, new_shard)
126
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre-wrap;">./llama-server -m ./models/Agents-A1-MTP-APEX-I-Compact.gguf --mmproj ./models/mmproj-F16.gguf -ngl 99 -c 131072</pre>
127
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">vLLM</p>
128
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre-wrap;">vllm serve SC117/Agents-A1-MTP-APEX-GGUF --port 8000 --tensor-parallel-size 1 --max-model-len 262144 --reasoning-parser qwen3
129
-
130
- # Tool-call variant
131
  vllm serve SC117/Agents-A1-MTP-APEX-GGUF --port 8000 --tensor-parallel-size 1 --max-model-len 262144 --reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser qwen3_coder</pre>
132
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">SGLang</p>
133
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre-wrap;">python3 -m sglang.launch_server --model-path "SC117/Agents-A1-MTP-APEX-GGUF" --host 0.0.0.0 --port 30000</pre>
 
65
  <div style="padding: 16px; font-size: 13px; color: #334155; line-height: 1.7;">
66
  <p style="margin: 0 0 12px 0;">The officially released <a href="https://huggingface.co/InternScience/Agents-A1" target="_blank" style="color: #047857; text-decoration: none; font-weight: 700;">InternScience/Agents-A1</a> checkpoint is a <b>40-layer Qwen3.5-35B-A3B MoE</b> that does not include MTP (Multi-Token Prediction) layers. To enable MTP acceleration in llama.cpp (10–30% faster long-context generation), we <b>extract the 1 MTP layer from Qwen3.5-35B-A3B</b>, inject it into Agents-A1's safetensors, and then convert to GGUF.</p>
67
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">Step 1 — Extract MTP tensors from Qwen3.5-35B-A3B</p>
68
+ <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre;">Source: J:\Models\Qwen3.5-35B-A3B-MTP (Qwen3.5-35B-A3B + native MTP)
69
  from safetensors import safe_open
70
  import json, os
71
+ ·
72
  src = r"J:\Models\Qwen3.5-35B-A3B-MTP"
73
  with open(os.path.join(src, "model.safetensors.index.json")) as f:
74
  idx = json.load(f)
75
  mtp_keys = [k for k in idx["weight_map"] if "mtp" in k.lower()]
76
  print(f"Found {len(mtp_keys)} MTP tensors") # 785</pre>
77
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">Step 2 — Append as a new shard (N+1)</p>
78
+ <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre;">Save the 785 MTP tensors as a new shard
79
  new_shard = "model.safetensors-15-of-15.safetensors"
80
  save_file({k: get_tensor(k) for k in mtp_keys}, new_shard)
81
+ ·
82
+ Update model.safetensors.index.json:
83
+ · metadata.total_size += new shard size
84
+ · weight_map: add the new shard path for each MTP key
85
+ · Do not modify the original 14 shards (avoid touching the original data)</pre>
86
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">Step 3 — Convert to BF16 GGUF with master llama.cpp</p>
87
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre;">F:\llama.cpp\llama.cpp-master\convert_hf_to_gguf.py ^
88
  J:\Models\Agents-A1 ^
89
  --outfile J:\Models\Agents-A1-MTP-GGUF\Agents-A1-MTP-BF16.gguf ^
90
  --outtype f16
91
+ ·
92
+ The master version handles Qwen3.5MoE + MTP automatically:
93
+ · Regular layers: blk.0–39
94
+ · MTP layer: blk.40.nextn.* (785 tensors)</pre>
95
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">Step 4 — Quantize with APEX (Q4_K_M by default, Q8_0 for MTP)</p>
96
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre;">F:\llama.cpp\...\llama-quantize.exe ^
97
  --imatrix J:\Models\Qwen3.5-35B-A3B.imatrix.gguf ^
 
99
  J:\Models\Agents-A1-MTP-GGUF\Agents-A1-MTP-BF16.gguf ^
100
  J:\Models\Agents-A1-MTP-GGUF\Agents-A1-MTP-APEX-I-&lt;tier&gt;.gguf ^
101
  Q4_K_M
102
+ ·
103
+ APEX qwen36_35b_mtp_*.txt configs already include the blk.40 override
104
+ (Q8_0 for MTP in all tiers), so no manual patching is needed.</pre>
105
  </div>
106
  </div>
107
 
 
126
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre-wrap;">./llama-server -m ./models/Agents-A1-MTP-APEX-I-Compact.gguf --mmproj ./models/mmproj-F16.gguf -ngl 99 -c 131072</pre>
127
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">vLLM</p>
128
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre-wrap;">vllm serve SC117/Agents-A1-MTP-APEX-GGUF --port 8000 --tensor-parallel-size 1 --max-model-len 262144 --reasoning-parser qwen3
129
+ ·
130
+ Tool-call variant
131
  vllm serve SC117/Agents-A1-MTP-APEX-GGUF --port 8000 --tensor-parallel-size 1 --max-model-len 262144 --reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser qwen3_coder</pre>
132
  <p style="margin: 0 0 8px 0; font-weight: bold; color: #064e3b;">SGLang</p>
133
  <pre style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre-wrap;">python3 -m sglang.launch_server --model-path "SC117/Agents-A1-MTP-APEX-GGUF" --host 0.0.0.0 --port 30000</pre>