Instructions to use jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX with MLX:
# Make sure mlx-vlm is installed # pip install --upgrade mlx-vlm from mlx_vlm import load, generate from mlx_vlm.prompt_utils import apply_chat_template from mlx_vlm.utils import load_config # Load the model model, processor = load("jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX") config = load_config("jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX") # Prepare input image = ["http://images.cocodataset.org/val2017/000000039769.jpg"] prompt = "Describe this image." # Apply chat template formatted_prompt = apply_chat_template( processor, config, prompt, num_images=1 ) # Generate output output = generate(model, processor, formatted_prompt, image) print(output) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- LM Studio
- Pi
How to use jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX with Pi:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX"
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "mlx-lm": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX with Hermes Agent:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX"
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX
Run Hermes
hermes
- OpenClaw new
How to use jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX with OpenClaw:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX"
Configure OpenClaw
# Install OpenClaw: npm install -g openclaw@latest # Register the local server and set it as the default model: openclaw onboard --non-interactive --mode local \ --auth-choice custom-api-key \ --custom-base-url http://127.0.0.1:8080/v1 \ --custom-model-id "jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX" \ --custom-provider-id mlx-lm \ --custom-compatibility openai \ --custom-text-input \ --accept-risk \ --skip-health
Run OpenClaw
openclaw agent --local --agent main --message "Hello from Hugging Face"
Qwopus3.6-35B-A3B-Coder-oQ8-MLX
This is a dynamic oQ8 MLX quantization of Jackrong/Qwopus3.6-35B-A3B-Coder, built from source revision 4ba785ca1eb5eb5a80ae38f3a30fa9d4f7c0428a. It is the non-MTP package in the current Qwopus3.6 35B A3B Coder MLX set.
The older MTP-preserved package for this quantization level was withdrawn because it could loop unreliably in practice. This package keeps the same main model, vision tensors, dynamic quantization plan, tokenizer, and tool template without the speculative MTP tensors.
The model keeps the Qwen XML tool-calling template with tool_parser_type=qwen3_coder, defaults tool-use prompts to no-thinking mode, and accepts /think, /no_think, and /nothink prompt markers in template-aware runtimes.
Variant
- Quantization: dynamic oQ8
- Approximate size: about 35 GB
- Context length: 262144 tokens
- Architecture:
Qwen3_5MoeForConditionalGeneration - Variant type: No-MTP
- License: Apache-2.0, inherited from the source model
Compatibility
This is the recommended current package for this quantization level. It omits the 42 native MTP tensors because the MTP-preserved packages were withdrawn after producing unreliable looping behavior.
The main model weights, vision tensors, dynamic quantization plan, tokenizer, and tool template are kept, while mtp_num_hidden_layers is set to 0.
This no-MTP package was also loaded and tested through oMLX, so it is not LM Studio-only. In LM Studio, load it with speculative draft MTP disabled.
Thinking Behavior
oMLX testing covered both enable_thinking=false and enable_thinking=true. With thinking disabled, visible content was plain text and no reasoning_content was returned. With thinking enabled, reasoning was separated and did not leak as literal <think> or </think> tags into visible content or tool calls.
LM Studio's current MLX VLM backend separates reasoning into reasoning_content, but did not honor the no-thinking toggle for this architecture in local testing. The accepted LM Studio checks still showed no literal thinking tags in visible content or tool calls.
Verification
- Static artifact check:
errors: [] - Model type:
qwen3_5_moe - Quantization: oQ8, base 8-bit, group size 64, dynamic overrides (8-bit: 262)
- MTP absent: 0 MTP layers, 0 MTP tensors
- Vision tensors: 333
- Indexed tensors: 2010
- Safetensors shards: 8
- Tool parser:
qwen3_coder - oMLX: discovered under the public model ID, loaded, and passed direct tool dispatch
- oMLX Swival: core 5/5 and all-tools 5/5 with raw output checks enabled
- LM Studio: loaded at 262144 context, direct tool smoke passed, raw no-leak smoke passed, Swival core/all-tools passed
Use
Download this repository as an MLX model directory and select jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX in LM Studio or oMLX.
For LM Studio CLI usage, keep speculative draft MTP disabled:
lms load qwopus3.6-35b-a3b-coder-oq8-mlx --no-speculative-draft-mtp
Notes
Tool-calling quality was checked with direct OpenAI-compatible tool-call smokes and Swival agent tasks. The Swival suites covered file reads, writes, line-number edits, deletes, command execution, listing, grep, outline, planning, todos, snapshots, shell commands, batch file reads, and URL fetches.
These are MLX directory artifacts for local inference. They are experimental community quantizations and should be evaluated in your own harness before production use.
- Downloads last month
- 39
8-bit
Model tree for jedisct1/Qwopus3.6-35B-A3B-Coder-oQ8-MLX
Base model
Qwen/Qwen3.6-35B-A3B