Does it support tool calling?

#4
by xing120226 - opened

Does it support tool calling?

Yes β€” confirming what @livepeer-ren said, with the specific launch flags that make it work cleanly:

vllm serve sakamakismile/Qwen3.6-27B-Text-NVFP4-MTP \
    --trust-remote-code --quantization modelopt --language-model-only \
    --reasoning-parser qwen3 \
    --enable-auto-tool-choice --tool-call-parser qwen3_coder \
    --speculative-config '{"method":"qwen3_5_mtp","num_speculative_tokens":3}'

Two flags to not skip:

  • --reasoning-parser qwen3 β€” keeps <think>...</think> chains out of the visible content stream (otherwise tool-using clients will read the thinking as the answer)
  • --tool-call-parser qwen3_coder β€” needed for tool calls to surface as proper tool_calls in the response instead of raw XML in content

Verified on RTX PRO 6000 Blackwell with single + parallel tool calls and multi-turn tool-result round-trips. Output is well-formed JSON arguments, finish_reason=tool_calls correctly, no escaping issues observed.

Throughput at this config lands ~100+ tok/s on long-form decode on a single Blackwell card.

β€” Tonoken3 / Lna-Lab

xing120226 changed discussion status to closed

Sign up or log in to comment