SC117 commited on
Commit
ed2fa3a
·
verified ·
1 Parent(s): 3e807d7

docs: complete Chinese translation

Browse files
Files changed (1) hide show
  1. README_zh.md +60 -60
README_zh.md CHANGED
@@ -28,35 +28,35 @@ license: apache-2.0
28
  <span style="background: #007aff; color: white; font-size: 11px; font-weight: 600; padding: 5px 14px; border-radius: 20px;">MTP</span><span style="background: #af52de; color: white; font-size: 11px; font-weight: 600; padding: 5px 14px; border-radius: 20px;">GGUF</span>
29
  </div>
30
  <h1 style="margin: 0 0 8px 0; font-size: 32px; font-weight: 700; color: #1d1d1f; letter-spacing: -0.5px; border: none; position: relative; z-index: 1;">QwenPaw-Flash-9B-heretic-MTP</h1>
31
- <p style="margin: 8px 0 0 0; font-size: 14px; position: relative; z-index: 1;"><a href="https://huggingface.co/SC117/QwenPaw-Flash-9B-heretic-MTP-GGUF/blob/main/README.md" style="color: #007aff; text-decoration: none;">📖 English</a> | <span style="color: #86868b;">中文</span></p>
32
  </div>
33
  </div>
34
 
35
  <div style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif; display: flex; flex-direction: column; gap: 20px; margin-bottom: 30px;">
36
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
37
  <div style="padding: 16px;">
38
- <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">QwenPaw-Flash-9B-heretic non-MTP Version: <a href="https://huggingface.co/SC117/QwenPaw-Flash-9B-heretic-GGUF" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">QwenPaw-Flash-9B-heretic-GGUF</a></p>
39
  <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;"><p align="center"></p>
40
- <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;"><b>🏆 BenchLocal 总: 4035/5000 (80.7%) — MTP 投机解码 Injected</b><br></p>
41
- <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;"><i>无审查 · 已消融 · Agent 优化 · 1.7-4.1× 加速</i></p>
42
  <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;"></p></p>
43
- <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">以下模型的无审查版本: <b>QwenPaw-Flash-9B</b>, 使用 <a href="https://github.com/p-e-w/heretic" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">Heretic</a> v1.3.0 abliteration, with <b>MTP (Multi-Token Prediction(多 token 预测) head</b> weights injected from the original Qwen3.5-9B base model.</p>
44
- <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">By reconstructing the MTP speculative decoding head — which was stripped during the QwenPaw fine-tuning process — this model achieves <b>up to 4.1× inference speedup</b> on real agent benchmarks while maintaining or improving accuracy.</p>
45
  </div>
46
  </div>
47
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
48
- <div style="background: linear-gradient(135deg, #ff6b35 0%, #f7931e 100%); padding: 12px 16px; color: white; font-weight: 700; font-size: 14px; display: flex; align-items: center; gap: 8px;"><span>📊</span> 🏆 BenchLocal 基准测试 (With MTP)</div>
49
  <div style="padding: 16px;">
50
- <p style="margin: 0 0 12px 0; font-size: 13px; color: #64748b; font-style: italic; border-left: 3px solid #ffedd5; padding-left: 12px;"><b>测试环境</b>: NVIDIA RTX 5070 Ti (16GB) · llama.cpp (turboquant build, <code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">--spec-type draft-mtp</code>) · Q6_K quant</p>
51
- <p style="margin: 0 0 12px 0; font-size: 13px; color: #64748b; font-style: italic; border-left: 3px solid #ffedd5; padding-left: 12px;"><b>测试框架</b>: <a href="https://github.com/stevibe/benchlocal" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">BenchLocal</a> — 本地模型 Agent 评估套件</p>
52
- <p style="margin: 0 0 12px 0; font-size: 13px; color: #64748b; font-style: italic; border-left: 3px solid #ffedd5; padding-left: 12px;"><b>测试方法</b>: 每个场景运行 <b>一次</b>, 无重试,无二次尝试</p>
53
  <table style="width: 100%; border-collapse: collapse; font-size: 13px;"><thead><tr style="background: rgba(255,107,53,0.05);">
54
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">基准测试</th>
55
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">得分</th>
56
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">准确率</th>
57
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">结果</th>
58
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">耗时</th>
59
- <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">vs No-MTP</th>
60
  </tr></thead><tbody>
61
  <tr>
62
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>ToolCall-15</b> 🛠️</td>
@@ -83,7 +83,7 @@ license: apache-2.0
83
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>4.1×</b> faster</td>
84
  </tr>
85
  <tr>
86
- <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>总计</b></td>
87
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>4035/5000</b></td>
88
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>80.7%</b></td>
89
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">36✅ 3⚠️ 11❌</td>
@@ -91,11 +91,11 @@ license: apache-2.0
91
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>1.9×</b> faster</td>
92
  </tr>
93
  </tbody></table>
94
- <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">Comparison: With vs Without MTP</h3>
95
  <table style="width: 100%; border-collapse: collapse; font-size: 13px;"><thead><tr style="background: rgba(255,107,53,0.05);">
96
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">基准测试</th>
97
- <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">Without MTP</th>
98
- <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">With MTP</th>
99
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">Δ Score</th>
100
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">Δ Speed</th>
101
  </tr></thead><tbody>
@@ -121,51 +121,51 @@ license: apache-2.0
121
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>4.1×</b></td>
122
  </tr>
123
  <tr>
124
- <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>总计</b></td>
125
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>3873/5000 (77.5%)</b></td>
126
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>4035/5000 (80.7%)</b></td>
127
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>+162 pts</b></td>
128
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>1.9×</b></td>
129
  </tr>
130
  <tr>
131
- <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>总计 Time</b></td>
132
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>14.7 min</b></td>
133
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>7.8 min</b></td>
134
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">—</td>
135
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>1.9×</b></td>
136
  </tr>
137
  </tbody></table>
138
- <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">🛠️ ToolCall-15 — 工具调用稳定性 (100%, +6.7 pts)</h3>
139
- <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">MTP speculative decoding eliminated the single failure (TC-05: 相对日期/时间解析, which previously scored 0). All 15 scenarios now pass perfectly.</p>
140
  <table style="width: 100%; border-collapse: collapse; font-size: 13px;"><thead><tr style="background: rgba(255,107,53,0.05);">
141
- <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">TC-ID</th>
142
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">结果</th>
143
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">场景</th>
144
  </tr></thead><tbody>
145
  <tr>
146
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">TC-01–TC-04</td>
147
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">✅</td>
148
- <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">Simple / Multi / Nested / Type conversion</td>
149
  </tr>
150
  <tr>
151
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">TC-05</td>
152
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">✅</td>
153
- <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">相对日期/时间解析 ← <b>fixed by MTP</b></td>
154
  </tr>
155
  <tr>
156
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">TC-06–TC-15</td>
157
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">✅</td>
158
- <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">All remaining scenarios</td>
159
  </tr>
160
  </tbody></table>
161
- <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">🤖 HermesAgent-20 — 复杂 Agent 任务 (75.3%, −1.9 pts)</h3>
162
- <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">MTP decoding introduces minor noise in long-chain reasoning scenarios (~40pt drop), likely because draft tokens occasionally derail the generation path in multi-step planning tasks. However, the speed gain (1.17×) and the fact that the drop is within noise range (single-run variance was 255pts for Qwopus MTP) makes this an acceptable trade-off.</p>
163
- <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">🐛 BugFind-15 — Code Debugging (68.7%, +6.8 pts)</h3>
164
- <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">Significant improvement — MTP's faster decoding effectively prevents timeout failures (BF-12 previously hit 300s timeout, now completes in time) and the draft context helps maintain debugging focus.</p>
165
  <table style="width: 100%; border-collapse: collapse; font-size: 13px;"><thead><tr style="background: rgba(255,107,53,0.05);">
166
- <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">BF-ID</th>
167
- <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">Without MTP</th>
168
- <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">With MTP</th>
169
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">Δ</th>
170
  </tr></thead><tbody>
171
  <tr>
@@ -262,23 +262,23 @@ license: apache-2.0
262
  </div>
263
  </div>
264
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
265
- <div style="background: linear-gradient(135deg, #ff6b35 0%, #f7931e 100%); padding: 12px 16px; color: white; font-weight: 700; font-size: 14px; display: flex; align-items: center; gap: 8px;"><span>⚡</span> MTP 投机解码</div>
266
  <div style="padding: 16px;">
267
- <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">What is MTP?</h3>
268
- <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">Multi-Token Prediction(多 token 预测 (MTP) is a speculative decoding technique where a small "draft head" predicts multiple future tokens in parallel. The main model then verifies these drafts in a single forward pass, accepting correct predictions for up to 2-4× speedup in practice.</p>
269
- <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">Injection Method</h3>
270
- <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">The original Qwen3.5-9B base model ships with a 4-layer MTP head (~243M params) in its architecture configuration. During QwenPaw fine-tuning, the MTP head weights were stripped (only the config placeholder <code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">mtp_num_hidden_layers: 1</code> remained, but no actual tensors existed in the safetensors).</p>
271
- <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;"><b>Recovery process:</b></p>
272
  <ol style="margin: 0 0 12px 0; padding-left: 20px;"></ol>
273
  <ol style="margin: 0 0 12px 0; padding-left: 20px;"></ol>
274
  <ol style="margin: 0 0 12px 0; padding-left: 20px;"></ol>
275
  <ol style="margin: 0 0 12px 0; padding-left: 20px;"></ol>
276
  <ol style="margin: 0 0 12px 0; padding-left: 20px;"></ol>
277
  <ol style="margin: 0 0 12px 0; padding-left: 20px;"></ol>
278
- <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;"><b>总计 injected parameters:</b> 243.3M (2.7% of main model)</p>
279
- <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;"><b>MTP acceptance rate (draft-n-max=2):</b> ~50% (1083 accepted / 2166 generated across all benchmarks)</p>
280
- <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">Why This Works</h3>
281
- <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">The MTP head is a lightweight 4-layer MLP decoder that maps the main model's last hidden state to future token logits. It sits entirely in speculative decoding space the main model's weights are unchanged, so no fine-tuning or retraining is needed. The head simply needs to exist with compatible dimensions for llama.cpp's <code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">--spec-type draft-mtp</code> to activate.</p>
282
  </div>
283
  </div>
284
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
@@ -286,11 +286,11 @@ license: apache-2.0
286
  <div style="padding: 16px;">
287
  <table style="width: 100%; border-collapse: collapse; font-size: 13px;"><thead><tr style="background: rgba(255,107,53,0.05);">
288
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">模型</th>
289
- <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">总计</th>
290
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">ToolCall-15</th>
291
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">HermesAgent-20</th>
292
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">BugFind-15</th>
293
- <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">总计 Time</th>
294
  </tr></thead><tbody>
295
  <tr>
296
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>🐾 QwenPaw MTP 9B</b></td>
@@ -301,7 +301,7 @@ license: apache-2.0
301
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>7.8min</b> 🥇</td>
302
  </tr>
303
  <tr>
304
- <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">🐾 QwenPaw 9B (no MTP)</td>
305
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">3873</td>
306
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">93.3%</td>
307
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>77.2%</b> 🥇</td>
@@ -317,7 +317,7 @@ license: apache-2.0
317
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">21.3min ⚠️</td>
318
  </tr>
319
  <tr>
320
- <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">🧠 Qwen 35B 思考模式 开</td>
321
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">1445 (HA only)</td>
322
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">—</td>
323
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">72.3%</td>
@@ -325,7 +325,7 @@ license: apache-2.0
325
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">7.0min</td>
326
  </tr>
327
  <tr>
328
- <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">⚡ Qwen 35B 思考模式 关</td>
329
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">1370 (HA only)</td>
330
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">—</td>
331
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">68.5%</td>
@@ -341,13 +341,13 @@ license: apache-2.0
341
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">18.6min</td>
342
  </tr>
343
  </tbody></table>
344
- <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;"><b>QwenPaw MTP wins</b> on 2/3 benchmarks + total score + total time. The only benchmark it loses is BugFind-15 (to Qwopus MTP), but Qwopus suffers from severe instability (255pt variance on HermesAgent-20, with a worst-case 6.2min timeout).</p>
345
  </div>
346
  </div>
347
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
348
  <div style="background: linear-gradient(135deg, #ff6b35 0%, #f7931e 100%); padding: 12px 16px; color: white; font-weight: 700; font-size: 14px; display: flex; align-items: center; gap: 8px;"><span>🧠</span> Model Description</div>
349
  <div style="padding: 16px;">
350
- <ul style="margin: 0 0 12px 0; padding-left: 20px;"><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">Base model**: <a href="https://huggingface.co/agentscope-ai/QwenPaw-Flash-9B" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">QwenPaw-Flash-9B</a> (Qwen3.5-9B fine-tuned for autonomous agent scenarios)</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">MTP head source**: <a href="https://huggingface.co/Qwen/Qwen3.5-9B" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">Qwen/Qwen3.5-9B</a> (original base model, layer 32 MTP head)</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">Tool**: Heretic v1.3.0 (automatic directional ablation)</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">Best trial**: #194 / 230 trials (abliteration)</li></ul>
351
  </div>
352
  </div>
353
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
@@ -367,7 +367,7 @@ mlp.down_proj.min_weight_distance = 17.47</p>
367
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
368
  <div style="background: linear-gradient(135deg, #ff6b35 0%, #f7931e 100%); padding: 12px 16px; color: white; font-weight: 700; font-size: 14px; display: flex; align-items: center; gap: 8px;"><span>🏗️</span> Architecture</div>
369
  <div style="padding: 16px;">
370
- <ul style="margin: 0 0 12px 0; padding-left: 20px;"><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">Type**: Qwen3_5ForConditionalGeneration (multimodal with vision encoder) + MTP spec head</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">Main model parameters**: ~9B</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">MTP head parameters**: ~243M (2.7% overhead)</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">Layers**: 32 (hybrid: Gated DeltaNet + Gated Attention) + 4 MTP decoder layers</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">Context length**: 262,144 tokens</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">Speculative decoding**: <code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">--spec-type draft-mtp</code> with <code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">--spec-draft-n-max 2</code></li></ul>
371
  </div>
372
  </div>
373
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
@@ -381,22 +381,22 @@ mlp.down_proj.min_weight_distance = 17.47</p>
381
  <tr>
382
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">QwenPaw-Flash-9B-heretic-MTP-Q8_0.gguf</code></td>
383
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">~9.2GB</td>
384
- <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">High quality, near lossless</td>
385
  </tr>
386
  <tr>
387
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">QwenPaw-Flash-9B-heretic-MTP-Q6_K.gguf</code></td>
388
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">~7.1GB</td>
389
- <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">✅ <b>Recommended, best value</b></td>
390
  </tr>
391
  <tr>
392
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">QwenPaw-Flash-9B-heretic-MTP-Q4_K_M.gguf</code></td>
393
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">~5.4GB</td>
394
- <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">Compact</td>
395
  </tr>
396
  <tr>
397
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">mmproj-BF16</code></td>
398
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">~880MB</td>
399
- <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">Vision encoder (multimodal) same as non-MTP version</td>
400
  </tr>
401
  </tbody></table>
402
  </div>
@@ -406,37 +406,37 @@ mlp.down_proj.min_weight_distance = 17.47</p>
406
  <div style="padding: 16px;">
407
  <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">--spec-type draft-mtp</p>
408
  <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">--spec-draft-n-max 2</p>
409
- <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">llama.cpp (with MTP speculative decoding)</h3>
410
- <p style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre-wrap;"># Start server with MTP enabled
411
  llama-server -m QwenPaw-Flash-9B-heretic-MTP-Q6_K.gguf \
412
  -ngl 99 -fa on -c 8192 \
413
  --spec-type draft-mtp --spec-draft-n-max 2 \
414
  --host 0.0.0.0 --port 8088
415
 
416
- # Or with CLI
417
  llama-cli -m QwenPaw-Flash-9B-heretic-MTP-Q6_K.gguf \
418
  -ngl 99 -fa on -c 8192 \
419
  --spec-type draft-mtp --spec-draft-n-max 2 \
420
  -p "Write a Python script to..."</p>
421
- <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">llama.cpp (without MTP, fallback)</h3>
422
- <p style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre-wrap;"># The model works as a normal GGUF toojust omit spec args
423
  llama-server -m QwenPaw-Flash-9B-heretic-MTP-Q6_K.gguf \
424
  -ngl 99 -fa on -c 8192 \
425
  --host 0.0.0.0 --port 8088</p>
426
  <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">LM Studio</h3>
427
- <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">Load the GGUF file directly. For MTP speculative decoding, LM Studio must support <code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">--spec-type</code> — if not, the model functions as a standard 9B model.</p>
428
  </div>
429
  </div>
430
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
431
  <div style="background: linear-gradient(135deg, #ff6b35 0%, #f7931e 100%); padding: 12px 16px; color: white; font-weight: 700; font-size: 14px; display: flex; align-items: center; gap: 8px;"><span>📝</span> Notes</div>
432
  <div style="padding: 16px;">
433
- <ol style="margin: 0 0 12px 0; padding-left: 20px;"><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">Safety filters have been significantly reduced via abliteration</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">KL divergence is only 0.0225 — minimal impact on model intelligence</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">The original model supports multimodal (vision); GGUF versions require the mmproj file from the non-MTP release</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">BenchLocal scores measured at <b>Q6_K</b> on RTX 5070 Ti 16GB with llama.cpp (turboquant). Each scenario was run <b>一次 with no retries</b></li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">MTP acceptance rate of ~50% at draft-n-max=2 means ~25-40% wall-clock speedup on short prompts, and up to on long-generation tasks (debugging, code writing)</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">BugFind-15 saw the largest improvement (4.1×) because debugging tasks are generation-heavy more tokens, more drafts accepted</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">The MTP head is a <b>lossless copy</b> from the original Qwen3.5-9B no training was involved, simply weight injection</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">Agent-heavy scenarios (HermesAgent-20) see the least MTP benefit because short-turn interactions don't give the draft head enough runway</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">Please use responsibly</li></ol>
434
  </div>
435
  </div>
436
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
437
  <div style="background: linear-gradient(135deg, #ff6b35 0%, #f7931e 100%); padding: 12px 16px; color: white; font-weight: 700; font-size: 14px; display: flex; align-items: center; gap: 8px;"><span>🙏</span> Acknowledgements</div>
438
  <div style="padding: 16px;">
439
- <ul style="margin: 0 0 12px 0; padding-left: 20px;"><li style="margin-bottom: 4px; font-size: 13px; color: #334155;"><a href="https://github.com/p-e-w/heretic" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">Heretic</a> — Automated censorship removal</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;"><a href="https://huggingface.co/agentscope-ai/QwenPaw-Flash-9B" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">agentscope-ai/QwenPaw-Flash-9B</a> — Base model</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;"><a href="https://huggingface.co/Qwen/Qwen3.5-9B" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">Qwen/Qwen3.5-9B</a> — MTP head source</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;"><a href="https://github.com/ggml-org/llama.cpp" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">llama.cpp</a> — GGUF quantization and inference</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;"><a href="https://github.com/stevibe/benchlocal" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">BenchLocal</a> — Local model agent evaluation suite</li></ul>
440
  </div>
441
  </div>
442
  </div>
 
28
  <span style="background: #007aff; color: white; font-size: 11px; font-weight: 600; padding: 5px 14px; border-radius: 20px;">MTP</span><span style="background: #af52de; color: white; font-size: 11px; font-weight: 600; padding: 5px 14px; border-radius: 20px;">GGUF</span>
29
  </div>
30
  <h1 style="margin: 0 0 8px 0; font-size: 32px; font-weight: 700; color: #1d1d1f; letter-spacing: -0.5px; border: none; position: relative; z-index: 1;">QwenPaw-Flash-9B-heretic-MTP</h1>
31
+ <p style="margin: 8px 0 0 0; font-size: 14px; position: relative; z-index: 1;"><a href="https://huggingface.co/SC117/QwenPaw-Flash-9B-heretic-MTP-GGUF/blob/main/README.md" style="color: #007aff; text-decoration: none;">📖 English</a> | <span style="color: #86868b;">中文</span> style="color: #007aff; text-decoration: none;">📖 中文文档</a></p>
32
  </div>
33
  </div>
34
 
35
  <div style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif; display: flex; flex-direction: column; gap: 20px; margin-bottom: 30px;">
36
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
37
  <div style="padding: 16px;">
38
+ <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">QwenPaw-Flash-9B-heretic MTP 版本: <a href="https://huggingface.co/SC117/QwenPaw-Flash-9B-heretic-GGUF" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">QwenPaw-Flash-9B-heretic-GGUF</a></p>
39
  <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;"><p align="center"></p>
40
+ <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;"><b>🏆 BenchLocal 总: 4035/5000 (80.7%) — MTP 投机解码已注入</b><br></p>
41
+ <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;"><i>无审查 · 已消融 · Agent 优化 · 1.7-4.1× 加速</i></p>
42
  <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;"></p></p>
43
+ <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;"><b>QwenPaw-Flash-9B</b> 的无审查版本,使用 <a href="https://github.com/p-e-w/heretic" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">Heretic</a> v1.3.0 消融处理,并从原始 Qwen3.5-9B 基座模型注入了 <b>MTPMulti-Token Prediction)</b>权重。</p>
44
+ <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">通过重建在 QwenPaw 微调过程中被剥离的 MTP 投机解码头,本模型在真实 Agent 基准测试中实现了<b>最高 4.1× 推理加速</b>,同时保持或提升了准确率。</p>
45
  </div>
46
  </div>
47
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
48
+ <div style="background: linear-gradient(135deg, #ff6b35 0%, #f7931e 100%); padding: 12px 16px; color: white; font-weight: 700; font-size: 14px; display: flex; align-items: center; gap: 8px;"><span>📊</span> 🏆 BenchLocal 基准测试(��用 MTP</div>
49
  <div style="padding: 16px;">
50
+ <p style="margin: 0 0 12px 0; font-size: 13px; color: #64748b; font-style: italic; border-left: 3px solid #ffedd5; padding-left: 12px;"><b>测试环境</b> NVIDIA RTX 5070 Ti (16GB) · llama.cpp (turboquant build, <code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">--spec-type draft-mtp</code>) · Q6_K quant</p>
51
+ <p style="margin: 0 0 12px 0; font-size: 13px; color: #64748b; font-style: italic; border-left: 3px solid #ffedd5; padding-left: 12px;"><b>测试框架</b> <a href="https://github.com/stevibe/benchlocal" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">BenchLocal</a> — 本地模型 Agent 评估套件</p>
52
+ <p style="margin: 0 0 12px 0; font-size: 13px; color: #64748b; font-style: italic; border-left: 3px solid #ffedd5; padding-left: 12px;"><b>测试方法</b>每个场景运行<b>一次</b>无重试,无二次尝试</p>
53
  <table style="width: 100%; border-collapse: collapse; font-size: 13px;"><thead><tr style="background: rgba(255,107,53,0.05);">
54
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">基准测试</th>
55
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">得分</th>
56
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">准确率</th>
57
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">结果</th>
58
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">耗时</th>
59
+ <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">对比无 MTP</th>
60
  </tr></thead><tbody>
61
  <tr>
62
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>ToolCall-15</b> 🛠️</td>
 
83
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>4.1×</b> faster</td>
84
  </tr>
85
  <tr>
86
+ <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>Total</b></td>
87
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>4035/5000</b></td>
88
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>80.7%</b></td>
89
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">36✅ 3⚠️ 11❌</td>
 
91
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>1.9×</b> faster</td>
92
  </tr>
93
  </tbody></table>
94
+ <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">对比:有 MTP vs MTP</h3>
95
  <table style="width: 100%; border-collapse: collapse; font-size: 13px;"><thead><tr style="background: rgba(255,107,53,0.05);">
96
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">基准测试</th>
97
+ <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;"> MTP</th>
98
+ <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;"> MTP</th>
99
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">Δ Score</th>
100
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">Δ Speed</th>
101
  </tr></thead><tbody>
 
121
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>4.1×</b></td>
122
  </tr>
123
  <tr>
124
+ <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>Total</b></td>
125
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>3873/5000 (77.5%)</b></td>
126
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>4035/5000 (80.7%)</b></td>
127
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>+162 pts</b></td>
128
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>1.9×</b></td>
129
  </tr>
130
  <tr>
131
+ <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>总耗时</b></td>
132
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>14.7 min</b></td>
133
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>7.8 min</b></td>
134
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">—</td>
135
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>1.9×</b></td>
136
  </tr>
137
  </tbody></table>
138
+ <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">🛠️ ToolCall-15 — 工具调用稳定性 (100%, +6.7 )</h3>
139
+ <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">MTP 投机解码消除了唯一的失败项(TC-05相对日期/时间解析,之前得分为 0)。全部 15 个场景现在完美通过。</p>
140
  <table style="width: 100%; border-collapse: collapse; font-size: 13px;"><thead><tr style="background: rgba(255,107,53,0.05);">
141
+ <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">编号</th>
142
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">结果</th>
143
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">场景</th>
144
  </tr></thead><tbody>
145
  <tr>
146
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">TC-01–TC-04</td>
147
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">✅</td>
148
+ <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">简单 / 多参数 / 嵌套 / 类型转换</td>
149
  </tr>
150
  <tr>
151
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">TC-05</td>
152
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">✅</td>
153
+ <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">相对日期/时间解析 ← <b>已被 MTP 修复</b></td>
154
  </tr>
155
  <tr>
156
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">TC-06–TC-15</td>
157
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">✅</td>
158
+ <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">所有剩余场景</td>
159
  </tr>
160
  </tbody></table>
161
+ <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">🤖 HermesAgent-20 — 复杂 Agent 任务 (75.3%, −1.9 )</h3>
162
+ <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">MTP 解码在长链推理场景中引入了轻微噪声(约 40 分下降),可能是因为 draft token 偶尔在多步规划任务中偏离生成路径。不过,速度提升(1.17×)以及下降幅度在噪声范围内(Qwopus MTP 单次运行方差为 255 分)使得这是一个可接受的权衡。</p>
163
+ <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">🐛 BugFind-15 — 代码调试 (68.7%, +6.8 )</h3>
164
+ <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">显著提升 — MTP 更快的解码有效防止了超时失败(BF-12 之前触发 300 秒超时,现在按时完成),draft 上下文有助于保持调试焦点。</p>
165
  <table style="width: 100%; border-collapse: collapse; font-size: 13px;"><thead><tr style="background: rgba(255,107,53,0.05);">
166
+ <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">编号</th>
167
+ <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;"> MTP</th>
168
+ <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;"> MTP</th>
169
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">Δ</th>
170
  </tr></thead><tbody>
171
  <tr>
 
262
  </div>
263
  </div>
264
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
265
+ <div style="background: linear-gradient(135deg, #ff6b35 0%, #f7931e 100%); padding: 12px 16px; color: white; font-weight: 700; font-size: 14px; display: flex; align-items: center; gap: 8px;"><span>⚡</span> MTP Speculative Decoding</div>
266
  <div style="padding: 16px;">
267
+ <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">什么是 MTP</h3>
268
+ <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">Multi-Token Prediction(MTP)是一种投机解码技术,通过一个小型「draft 头」并行预测个未来 token。主模型随后在单次前向传播中验证这些预测,接受正确的预测以实现 2-4× 的实际加速。</p>
269
+ <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">注入方法</h3>
270
+ <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">原始 Qwen3.5-9B 基座模型在架构配置中附带了一个 4 MTP 头(约 243M 参数)。在 QwenPaw 微调过程中,MTP 头权重被剥离(仅保留了配置占位符 <code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">mtp_num_hidden_layers: 1</code> ,但 safetensors 中没有实际张量)。</p>
271
+ <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;"><b>恢复过程:</b></p>
272
  <ol style="margin: 0 0 12px 0; padding-left: 20px;"></ol>
273
  <ol style="margin: 0 0 12px 0; padding-left: 20px;"></ol>
274
  <ol style="margin: 0 0 12px 0; padding-left: 20px;"></ol>
275
  <ol style="margin: 0 0 12px 0; padding-left: 20px;"></ol>
276
  <ol style="margin: 0 0 12px 0; padding-left: 20px;"></ol>
277
  <ol style="margin: 0 0 12px 0; padding-left: 20px;"></ol>
278
+ <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;"><b>注入参数量:</b>243.3M(主模型的 2.7%</p>
279
+ <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;"><b>MTP 接受率 (draft-n-max=2)</b> 50%(所有基准测试中 1083 次接受 / 2166 次生成)</p>
280
+ <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">为什么有效</h3>
281
+ <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">MTP 头是一个轻量级 4 MLP 解码器,将主模型的最后隐藏状态映射到未来 token logits。它完全位于投机解码空间中主模型权重不变,因此无需微调或重新训练。MTP 头只需以兼容的维度存在,llama.cpp <code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">--spec-type draft-mtp</code> 即可激活。</p>
282
  </div>
283
  </div>
284
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
 
286
  <div style="padding: 16px;">
287
  <table style="width: 100%; border-collapse: collapse; font-size: 13px;"><thead><tr style="background: rgba(255,107,53,0.05);">
288
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">模型</th>
289
+ <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">Total</th>
290
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">ToolCall-15</th>
291
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">HermesAgent-20</th>
292
  <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">BugFind-15</th>
293
+ <th style="padding: 8px 10px; border-bottom: 2px solid #ff6b35; text-align: left; color: #c2410c; font-weight: bold;">总耗时</th>
294
  </tr></thead><tbody>
295
  <tr>
296
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>🐾 QwenPaw MTP 9B</b></td>
 
301
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>7.8min</b> 🥇</td>
302
  </tr>
303
  <tr>
304
+ <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">🐾 QwenPaw 9B(无 MTP</td>
305
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">3873</td>
306
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">93.3%</td>
307
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><b>77.2%</b> 🥇</td>
 
317
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">21.3min ⚠️</td>
318
  </tr>
319
  <tr>
320
+ <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">🧠 Qwen 35B 思考模式开</td>
321
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">1445 (HA only)</td>
322
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">—</td>
323
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">72.3%</td>
 
325
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">7.0min</td>
326
  </tr>
327
  <tr>
328
+ <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">⚡ Qwen 35B 思考模式关</td>
329
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">1370 (HA only)</td>
330
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">—</td>
331
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">68.5%</td>
 
341
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">18.6min</td>
342
  </tr>
343
  </tbody></table>
344
+ <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;"><b>QwenPaw MTP 胜出</b>2/3 基准测试 + 总分 + 总耗时。唯一输掉的基准是 BugFind-15(输给 Qwopus MTP),但 Qwopus 存在严重不稳定性(HermesAgent-20 方差 255 分,最差情况 6.2 分钟超时)。</p>
345
  </div>
346
  </div>
347
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
348
  <div style="background: linear-gradient(135deg, #ff6b35 0%, #f7931e 100%); padding: 12px 16px; color: white; font-weight: 700; font-size: 14px; display: flex; align-items: center; gap: 8px;"><span>🧠</span> Model Description</div>
349
  <div style="padding: 16px;">
350
+ <ul style="margin: 0 0 12px 0; padding-left: 20px;"><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">基座模型** <a href="https://huggingface.co/agentscope-ai/QwenPaw-Flash-9B" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">QwenPaw-Flash-9B</a> Qwen3.5-9B 针对自主 Agent 场景微调)</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">MTP 头来源** <a href="https://huggingface.co/Qwen/Qwen3.5-9B" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">Qwen/Qwen3.5-9B</a> (原始基座模型,第 32 MTP 头)</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">工具**Heretic v1.3.0(自动定向消融)</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">最佳试验**#194 / 230 次试验(消融)</li></ul>
351
  </div>
352
  </div>
353
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
 
367
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
368
  <div style="background: linear-gradient(135deg, #ff6b35 0%, #f7931e 100%); padding: 12px 16px; color: white; font-weight: 700; font-size: 14px; display: flex; align-items: center; gap: 8px;"><span>🏗️</span> Architecture</div>
369
  <div style="padding: 16px;">
370
+ <ul style="margin: 0 0 12px 0; padding-left: 20px;"><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">类型**Qwen3_5ForConditionalGeneration(多模态,含视觉编码器)+ MTP 投机解码头</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">主模型参数量**~9B</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">MTP 头参数量**~243M2.7% 额外开销)</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">层数**32(混合:Gated DeltaNet + Gated Attention+ 4 MTP 解码器</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">上下文长度**262,144 tokens</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">投机解码** <code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">--spec-type draft-mtp</code> 配合 <code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">--spec-draft-n-max 2</code></li></ul>
371
  </div>
372
  </div>
373
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
 
381
  <tr>
382
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">QwenPaw-Flash-9B-heretic-MTP-Q8_0.gguf</code></td>
383
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">~9.2GB</td>
384
+ <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">高质量,近乎无损</td>
385
  </tr>
386
  <tr>
387
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">QwenPaw-Flash-9B-heretic-MTP-Q6_K.gguf</code></td>
388
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">~7.1GB</td>
389
+ <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">✅ <b>推荐,最佳性价比</b></td>
390
  </tr>
391
  <tr>
392
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">QwenPaw-Flash-9B-heretic-MTP-Q4_K_M.gguf</code></td>
393
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">~5.4GB</td>
394
+ <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">紧凑</td>
395
  </tr>
396
  <tr>
397
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;"><code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">mmproj-BF16</code></td>
398
  <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">~880MB</td>
399
+ <td style="padding: 7px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); color: #334155; text-align: left;">视觉编码器(多模态)与非 MTP 版本相同</td>
400
  </tr>
401
  </tbody></table>
402
  </div>
 
406
  <div style="padding: 16px;">
407
  <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">--spec-type draft-mtp</p>
408
  <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">--spec-draft-n-max 2</p>
409
+ <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">llama.cpp (配合 MTP speculative decoding)</h3>
410
+ <p style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre-wrap;"># Start server 配合 MTP enabled
411
  llama-server -m QwenPaw-Flash-9B-heretic-MTP-Q6_K.gguf \
412
  -ngl 99 -fa on -c 8192 \
413
  --spec-type draft-mtp --spec-draft-n-max 2 \
414
  --host 0.0.0.0 --port 8088
415
 
416
+ # Or 配合 CLI
417
  llama-cli -m QwenPaw-Flash-9B-heretic-MTP-Q6_K.gguf \
418
  -ngl 99 -fa on -c 8192 \
419
  --spec-type draft-mtp --spec-draft-n-max 2 \
420
  -p "Write a Python script to..."</p>
421
+ <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">llama.cpp (配合out MTP, fallback)</h3>
422
+ <p style="margin: 0; font-family: monospace; background: #f8fafc; padding: 10px 14px; border-radius: 6px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b; white-space: pre-wrap;"># 模型也可作为普通 GGUF 使用只需省略投机解码参数
423
  llama-server -m QwenPaw-Flash-9B-heretic-MTP-Q6_K.gguf \
424
  -ngl 99 -fa on -c 8192 \
425
  --host 0.0.0.0 --port 8088</p>
426
  <h3 style="margin: 16px 0 8px 0; font-size: 14px; color: #1e293b; font-weight: 700;">LM Studio</h3>
427
+ <p style="margin: 0 0 12px 0; font-size: 13px; color: #334155; line-height: 1.7;">直接加载 GGUF 文件。如需 MTP 投机解码,LM Studio 需要支持 <code style="background: #f8fafc; padding: 2px 6px; border-radius: 4px; border: 1px solid #e2e8f0; font-size: 12px; color: #1e293b;">--spec-type</code> — 如果不支持,模型将作为标准 9B 模型运行。</p>
428
  </div>
429
  </div>
430
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
431
  <div style="background: linear-gradient(135deg, #ff6b35 0%, #f7931e 100%); padding: 12px 16px; color: white; font-weight: 700; font-size: 14px; display: flex; align-items: center; gap: 8px;"><span>📝</span> Notes</div>
432
  <div style="padding: 16px;">
433
+ <ol style="margin: 0 0 12px 0; padding-left: 20px;"><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">安全过滤器已通过消融显著降低</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">KL 散度仅为 0.0225 — 对模型智能影响极小</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">原始模型支持多模态(视觉);GGUF 版本需要非 MTP 版本的 mmproj 文件</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">BenchLocal 分数在 <b>Q6_K</b> on RTX 5070 Ti 16GB 配合 llama.cpp (turboquant). Each scenario was run <b>once 配合 no retries</b></li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">MTP draft-n-max=2 下约 50% 的接受率意味着短提示约 25-40% 的实际加速,长生成任务(调试、代码编写)最高可达 4×</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">BugFind-15 提升最大(4.1×),因为调试任务是生成密集型token 更多,接受的 draft 更多</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">MTP 头是从原始 Qwen3.5-9B <b>无损拷贝</b> — 不涉及训练,仅是权重注入</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">Agent 密集型场景(HermesAgent-20)从 MTP 获益最少,因为短轮次交互没有给 draft 头足够的发挥空间</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;">请负责任地使用</li></ol>
434
  </div>
435
  </div>
436
  <div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02);">
437
  <div style="background: linear-gradient(135deg, #ff6b35 0%, #f7931e 100%); padding: 12px 16px; color: white; font-weight: 700; font-size: 14px; display: flex; align-items: center; gap: 8px;"><span>🙏</span> Acknowledgements</div>
438
  <div style="padding: 16px;">
439
+ <ul style="margin: 0 0 12px 0; padding-left: 20px;"><li style="margin-bottom: 4px; font-size: 13px; color: #334155;"><a href="https://github.com/p-e-w/heretic" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">Heretic</a> — 自动化审查移除</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;"><a href="https://huggingface.co/agentscope-ai/QwenPaw-Flash-9B" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">agentscope-ai/QwenPaw-Flash-9B</a> — 基座模型</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;"><a href="https://huggingface.co/Qwen/Qwen3.5-9B" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">Qwen/Qwen3.5-9B</a> — MTP 头来源</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;"><a href="https://github.com/ggml-org/llama.cpp" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">llama.cpp</a> — GGUF 量化与推理</li><li style="margin-bottom: 4px; font-size: 13px; color: #334155;"><a href="https://github.com/stevibe/benchlocal" target="_blank" style="color: #c2410c; text-decoration: none; font-weight: 700;">BenchLocal</a> — 本地模型 Agent 评估套件</li></ul>
440
  </div>
441
  </div>
442
  </div>