Instructions to use jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX with MLX:

# Make sure mlx-vlm is installed
# pip install --upgrade mlx-vlm

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model, processor = load("jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX")
config = load_config("jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX")

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)

# Generate output
output = generate(model, processor, formatted_prompt, image)
print(output)

Notebooks
Google Colab
Kaggle
Local Apps Settings
LM Studio

How to use jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX

Run Hermes

hermes

OpenClaw new

How to use jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX with OpenClaw:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX"

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX" \
  --custom-provider-id mlx-lm \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Qwopus3.6-35B-A3B-Coder-oQ8-MLX

This is a dynamic oQ8 MLX quantization of Jackrong/Qwopus3.6-35B-A3B-Coder, built from source revision 4ba785ca1eb5eb5a80ae38f3a30fa9d4f7c0428a. It is the non-MTP package in the current Qwopus3.6 35B A3B Coder MLX set.

The older MTP-preserved package for this quantization level was withdrawn because it could loop unreliably in practice. This package keeps the same main model, vision tensors, dynamic quantization plan, tokenizer, and tool template without the speculative MTP tensors.

The model keeps the Qwen XML tool-calling template with tool_parser_type=qwen3_coder, defaults tool-use prompts to no-thinking mode, and accepts /think, /no_think, and /nothink prompt markers in template-aware runtimes.

Variant

Quantization: dynamic oQ8
Approximate size: about 35 GB
Context length: 262144 tokens
Architecture: Qwen3_5MoeForConditionalGeneration
Variant type: No-MTP
License: Apache-2.0, inherited from the source model

Compatibility

This is the recommended current package for this quantization level. It omits the 42 native MTP tensors because the MTP-preserved packages were withdrawn after producing unreliable looping behavior.

The main model weights, vision tensors, dynamic quantization plan, tokenizer, and tool template are kept, while mtp_num_hidden_layers is set to 0.

This no-MTP package was also loaded and tested through oMLX, so it is not LM Studio-only. In LM Studio, load it with speculative draft MTP disabled.

Thinking Behavior

oMLX testing covered both enable_thinking=false and enable_thinking=true. With thinking disabled, visible content was plain text and no reasoning_content was returned. With thinking enabled, reasoning was separated and did not leak as literal <think> or </think> tags into visible content or tool calls.

LM Studio's current MLX VLM backend separates reasoning into reasoning_content, but did not honor the no-thinking toggle for this architecture in local testing. The accepted LM Studio checks still showed no literal thinking tags in visible content or tool calls.

Verification

Static artifact check: errors: []
Model type: qwen3_5_moe
Quantization: oQ8, base 8-bit, group size 64, dynamic overrides (8-bit: 262)
MTP absent: 0 MTP layers, 0 MTP tensors
Vision tensors: 333
Indexed tensors: 2010
Safetensors shards: 8
Tool parser: qwen3_coder
oMLX: discovered under the public model ID, loaded, and passed direct tool dispatch
oMLX Swival: core 5/5 and all-tools 5/5 with raw output checks enabled
LM Studio: loaded at 262144 context, direct tool smoke passed, raw no-leak smoke passed, Swival core/all-tools passed

Use

Download this repository as an MLX model directory and select jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX in LM Studio or oMLX.

For LM Studio CLI usage, keep speculative draft MTP disabled:

lms load qwopus3.6-35b-a3b-coder-oq8-mlx --no-speculative-draft-mtp

Notes

Tool-calling quality was checked with direct OpenAI-compatible tool-call smokes and Swival agent tasks. The Swival suites covered file reads, writes, line-number edits, deletes, command execution, listing, grep, outline, planning, todos, snapshots, shell commands, batch file reads, and URL fetches.

These are MLX directory artifacts for local inference. They are experimental community quantizations and should be evaluated in your own harness before production use.