jedisct1/Qwen-AgentWorld-35B-A3B-oQ8-MLX

This is an oMLX MLX quantization of Qwen/Qwen-AgentWorld-35B-A3B.

The goal of this package is practical local agent use on Apple Silicon. Tool calling was treated as the release gate, so the chat template is kept both as chat_template.jinja and embedded in tokenizer_config.json, with tool_parser_type set to qwen3_coder.

Quantization

  • Quantization: oQ8 (8-bit affine, mixed precision)
  • Global bits: 8
  • Group size: 64
  • Mode: affine
  • Safetensor shards: 8
  • Tensor count: 1677
  • Total safetensor size: 36.81 GB

The upstream config declares an MTP layer, but the upstream checkpoint published for Qwen/Qwen-AgentWorld-35B-A3B does not include mtp.* tensors. These artifacts therefore publish a self-consistent non-MTP config instead of advertising a missing draft head.

Compatibility

Expected local targets:

  • oMLX 0.4.4 or newer
  • LM Studio with MLX model loading and OpenAI-compatible tool calls

Use greedy decoding for strict tool use and eval runs:

{"temperature": 0, "top_p": 1}

Tool-Calling Verification

Swival core tool suite: 5/5 passed on 2026-06-24 with deterministic greedy decoding.

Swival all-built-ins suite: 5/5 passed on 2026-06-24 with deterministic greedy decoding.

The Swival suites exercise real file, edit, command, planning, checklist, snapshot, grep, outline, URL fetch, batched reads, and shell-tool dispatch through an OpenAI-compatible server. A direct /v1/chat/completions smoke also checks that the server returns structured tool_calls, not just plain XML text.

Downloads last month
414
Safetensors
Model size
10B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for jedisct1/Qwen-AgentWorld-35B-A3B-oQ8-MLX

Quantized
(31)
this model

Collection including jedisct1/Qwen-AgentWorld-35B-A3B-oQ8-MLX