Qwopus3.6-35B-A3B-Coder-oQ8-MLX

This is a dynamic oQ8 MLX quantization of Jackrong/Qwopus3.6-35B-A3B-Coder, built from source revision 4ba785ca1eb5eb5a80ae38f3a30fa9d4f7c0428a. It is the non-MTP package in the current Qwopus3.6 35B A3B Coder MLX set.

The older MTP-preserved package for this quantization level was withdrawn because it could loop unreliably in practice. This package keeps the same main model, vision tensors, dynamic quantization plan, tokenizer, and tool template without the speculative MTP tensors.

The model keeps the Qwen XML tool-calling template with tool_parser_type=qwen3_coder, defaults tool-use prompts to no-thinking mode, and accepts /think, /no_think, and /nothink prompt markers in template-aware runtimes.

Variant

  • Quantization: dynamic oQ8
  • Approximate size: about 35 GB
  • Context length: 262144 tokens
  • Architecture: Qwen3_5MoeForConditionalGeneration
  • Variant type: No-MTP
  • License: Apache-2.0, inherited from the source model

Compatibility

This is the recommended current package for this quantization level. It omits the 42 native MTP tensors because the MTP-preserved packages were withdrawn after producing unreliable looping behavior.

The main model weights, vision tensors, dynamic quantization plan, tokenizer, and tool template are kept, while mtp_num_hidden_layers is set to 0.

This no-MTP package was also loaded and tested through oMLX, so it is not LM Studio-only. In LM Studio, load it with speculative draft MTP disabled.

Thinking Behavior

oMLX testing covered both enable_thinking=false and enable_thinking=true. With thinking disabled, visible content was plain text and no reasoning_content was returned. With thinking enabled, reasoning was separated and did not leak as literal <think> or </think> tags into visible content or tool calls.

LM Studio's current MLX VLM backend separates reasoning into reasoning_content, but did not honor the no-thinking toggle for this architecture in local testing. The accepted LM Studio checks still showed no literal thinking tags in visible content or tool calls.

Verification

  • Static artifact check: errors: []
  • Model type: qwen3_5_moe
  • Quantization: oQ8, base 8-bit, group size 64, dynamic overrides (8-bit: 262)
  • MTP absent: 0 MTP layers, 0 MTP tensors
  • Vision tensors: 333
  • Indexed tensors: 2010
  • Safetensors shards: 8
  • Tool parser: qwen3_coder
  • oMLX: discovered under the public model ID, loaded, and passed direct tool dispatch
  • oMLX Swival: core 5/5 and all-tools 5/5 with raw output checks enabled
  • LM Studio: loaded at 262144 context, direct tool smoke passed, raw no-leak smoke passed, Swival core/all-tools passed

Use

Download this repository as an MLX model directory and select jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX in LM Studio or oMLX.

For LM Studio CLI usage, keep speculative draft MTP disabled:

lms load qwopus3.6-35b-a3b-coder-oq8-mlx --no-speculative-draft-mtp

Notes

Tool-calling quality was checked with direct OpenAI-compatible tool-call smokes and Swival agent tasks. The Swival suites covered file reads, writes, line-number edits, deletes, command execution, listing, grep, outline, planning, todos, snapshots, shell commands, batch file reads, and URL fetches.

These are MLX directory artifacts for local inference. They are experimental community quantizations and should be evaluated in your own harness before production use.

Downloads last month
39
Safetensors
Model size
10B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX

Collection including jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX