--- license: apache-2.0 base_model: - Jackrong/Qwopus3.6-35B-A3B-Coder - Jackrong/Qwopus3.6-35B-A3B-v1 - unsloth/Qwen3.6-35B-A3B library_name: mlx pipeline_tag: image-text-to-text tags: - mlx - omlx - qwen3.6 - qwen3_5_moe - vision-language - tool-calling - dynamic-quantization - lm-studio - no-mtp inference: false --- # Qwopus3.6-35B-A3B-Coder-oQ6-MLX This is a dynamic oQ6 MLX quantization of `Jackrong/Qwopus3.6-35B-A3B-Coder`, built from source revision `4ba785ca1eb5eb5a80ae38f3a30fa9d4f7c0428a`. It is the non-MTP package in the current Qwopus3.6 35B A3B Coder MLX set. The older MTP-preserved package for this quantization level was withdrawn because it could loop unreliably in practice. This package keeps the same main model, vision tensors, dynamic quantization plan, tokenizer, and tool template without the speculative MTP tensors. The model keeps the Qwen XML tool-calling template with `tool_parser_type=qwen3_coder`, defaults tool-use prompts to no-thinking mode, and accepts `/think`, `/no_think`, and `/nothink` prompt markers in template-aware runtimes. ## Variant - Quantization: dynamic oQ6 - Approximate size: about 27 GB - Context length: 262144 tokens - Architecture: `Qwen3_5MoeForConditionalGeneration` - Variant type: No-MTP - License: Apache-2.0, inherited from the source model ## Compatibility This is the recommended current package for this quantization level. It omits the 42 native MTP tensors because the MTP-preserved packages were withdrawn after producing unreliable looping behavior. The main model weights, vision tensors, dynamic quantization plan, tokenizer, and tool template are kept, while `mtp_num_hidden_layers` is set to 0. This no-MTP package was also loaded and tested through oMLX, so it is not LM Studio-only. In LM Studio, load it with speculative draft MTP disabled. ## Thinking Behavior oMLX testing covered both `enable_thinking=false` and `enable_thinking=true`. With thinking disabled, visible content was plain text and no `reasoning_content` was returned. With thinking enabled, reasoning was separated and did not leak as literal `` or `` tags into visible content or tool calls. LM Studio's current MLX VLM backend separates reasoning into `reasoning_content`, but did not honor the no-thinking toggle for this architecture in local testing. The accepted LM Studio checks still showed no literal thinking tags in visible content or tool calls. ## Verification - Static artifact check: `errors: []` - Model type: `qwen3_5_moe` - Quantization: oQ6, base 6-bit, group size 64, dynamic overrides (6-bit: 30, 8-bit: 190) - MTP absent: 0 MTP layers, 0 MTP tensors - Vision tensors: 333 - Indexed tensors: 2010 - Safetensors shards: 6 - Tool parser: `qwen3_coder` - oMLX: discovered under the public model ID, loaded, and passed direct tool dispatch - oMLX Swival: core 5/5 and all-tools 5/5 with raw output checks enabled - LM Studio: loaded at 262144 context, direct tool smoke passed, raw no-leak smoke passed, Swival core/all-tools passed ## Use Download this repository as an MLX model directory and select `jedisct1/Qwopus3.6-35B-A3B-Coder-oQ6-MLX` in LM Studio or oMLX. For LM Studio CLI usage, keep speculative draft MTP disabled: ```bash lms load qwopus3.6-35b-a3b-coder-oq6-mlx --no-speculative-draft-mtp ``` ## Notes Tool-calling quality was checked with direct OpenAI-compatible tool-call smokes and Swival agent tasks. The Swival suites covered file reads, writes, line-number edits, deletes, command execution, listing, grep, outline, planning, todos, snapshots, shell commands, batch file reads, and URL fetches. These are MLX directory artifacts for local inference. They are experimental community quantizations and should be evaluated in your own harness before production use.