llama.cpp(纯文本)
hf download SC117/Agents-A1-MTP-APEX-GGUF --include "*.gguf" --local-dir ./models
./llama-server -m ./models/Agents-A1-MTP-APEX-I-Compact.gguf -ngl 99 -c 131072
llama.cpp(视觉 + 文本)
./llama-server -m ./models/Agents-A1-MTP-APEX-I-Compact.gguf --mmproj ./models/mmproj-F16.gguf -ngl 99 -c 131072
vLLM
vllm serve SC117/Agents-A1-MTP-APEX-GGUF --port 8000 --tensor-parallel-size 1 --max-model-len 262144 --reasoning-parser qwen3
·
工具调用变体
vllm serve SC117/Agents-A1-MTP-APEX-GGUF --port 8000 --tensor-parallel-size 1 --max-model-len 262144 --reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser qwen3_coder
SGLang
python3 -m sglang.launch_server --model-path "SC117/Agents-A1-MTP-APEX-GGUF" --host 0.0.0.0 --port 30000