qwopus3.6-35b-a3b-coder-abliterated-mxfp4-vision-mlx

Vision-enabled MLX MXFP4 conversion of Jackrong/Qwopus3.6-35B-A3B-Coder, prepared by Shiftedx for Apple Silicon, MLX, and LM Studio.

Build Notes

  • Quantized primary language weights with mxfp4 at group size 32.
  • Kept MoE router and gate modules in affine 8-bit quantization at group size 64 for compatibility.
  • Added Qwen3.5/Qwen3.6-MoE-compatible vision components and validated image grounding locally.
  • Removed source MTP tensors and set MTP/next-token prediction layer counts to 0 for LM Studio compatibility.
  • Set tool_parser_type to qwen3_coder.
  • Patched the chat template so enable_thinking defaults to false when the runtime honors that template variable.
  • Applied a research refusal-direction weight edit using residual-direction orthogonalization.
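
The mixed-precision scheme in the notes above amounts to a per-module rule: router and gate modules get affine 8-bit at group size 64, and everything else gets mxfp4 at group size 32. The sketch below only illustrates that rule and is not the script used for this conversion. The function name, the path suffixes (`mlp.gate`, `mlp.shared_expert_gate`), and the shape of the returned dict are all assumptions; check the real module names in the checkpoint and the override format your conversion tool expects before reusing it.

```python
# Hypothetical per-module quantization rule. All names here are illustrative.
MXFP4 = {"mode": "mxfp4", "bits": 4, "group_size": 32}
AFFINE_8BIT = {"mode": "affine", "bits": 8, "group_size": 64}

# Assumed path suffixes for MoE routing modules in a Qwen-style checkpoint.
ROUTER_SUFFIXES = ("mlp.gate", "mlp.shared_expert_gate")


def quant_settings(module_path: str) -> dict:
    """Return the quantization settings to apply to one module path."""
    # Suffix matching (not substring matching) keeps expert projections
    # such as "...gate_proj" on the mxfp4 path.
    if module_path.endswith(ROUTER_SUFFIXES):
        return dict(AFFINE_8BIT)
    return dict(MXFP4)


if __name__ == "__main__":
    for path in (
        "model.layers.0.mlp.gate",
        "model.layers.0.mlp.switch_mlp.gate_proj",
        "model.layers.0.self_attn.q_proj",
    ):
        print(path, quant_settings(path))
```

Matching on the end of the path is a deliberate choice here: a plain `"gate" in path` test would also catch the experts' gate projections and quantize far more weights at 8-bit than intended.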

Local Validation

Validated locally on June 30, 2026 and July 1, 2026 with LM Studio and direct MLX/VLM loading.

| Check | Result |
| --- | --- |
| LM Studio load | Passed at 32k context, parallel 1, GPU max. |
| Basic text completion | Passed; answered 2+2 with 4 and stopped. |
| Code completion | Passed; produced a simple valid add(a, b) function. |
| Direct MLX/VLM image color smoke | Passed; answered Red. |
| Direct MLX/VLM OCR smoke | Passed; answered FABLE 42. |
| LM Studio OpenAI-compatible image smoke | Passed; answered Red and FABLE 42. |
| LM Studio native image smoke | Passed; answered Red and FABLE 42. |
| Thinking-off behavior | Smoke checks returned 0 reasoning tokens. |
| LM Studio logs | No warnings, errors, tracebacks, KV-cache issues, or tokenizer-regex warnings in the validation window. |

This is a smoke-validated release, not a full benchmark suite. Broader downstream evaluation is still recommended for production use.
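
An image smoke check against an OpenAI-compatible endpoint could be reproduced with a request along these lines. This is a minimal sketch, not the validation script that was used. The base URL assumes LM Studio's local server on its usual default port, the model identifier assumes the repo name is also the loaded model key, and the helper names are hypothetical.

```python
import base64
import json
import urllib.request

# Assumed values: adjust to your local setup.
BASE_URL = "http://localhost:1234/v1"  # assumed LM Studio local server address
MODEL = "shiftedx/qwopus3.6-35b-a3b-coder-abliterated-mxfp4-vision-mlx"


def build_image_request(image_bytes: bytes, prompt: str, mime: str = "image/png") -> dict:
    """Build an OpenAI-style chat request carrying one inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime};base64,{encoded}"
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            }
        ],
        "max_tokens": 32,  # smoke checks only need a short answer
    }


def send(payload: dict) -> str:
    """POST the payload to the chat completions route and return the reply text."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running and the model loaded, passing the bytes of a solid red test image and a prompt such as "What color is this image? Answer in one word." to `build_image_request`, then handing the result to `send`, gives a quick pass or fail signal comparable to the color row in the table.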

Recommended LM Studio Defaults

After downloading in LM Studio, load the model by repo name:

lms load shiftedx/qwopus3.6-35b-a3b-coder-abliterated-mxfp4-vision-mlx --context-length 32768 --parallel 1 --gpu max

Recommended profile defaults:

  • Preset/template: Qwen3 thinking-compatible Jinja template with <|im_end|> stop.
  • Thinking: off by default through the included chat template.
  • Context length: 200000 when memory allows; 32768 was used for local validation.
  • Sampling: temperature 0.6, top-k 20, top-p 0.95, min-p enabled with a value of 0.
  • Repeat penalty: off by default.
  • Load: parallel 1, GPU max.
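
For API clients that do not read the LM Studio profile, the sampling defaults above can be carried in the request itself. The sketch below is one way to do that, under some assumptions: `top_k` and `min_p` are not part of the core OpenAI request schema, so whether a given server honors them needs to be verified, and the helper name is hypothetical.

```python
# Recommended defaults from the list above, expressed as request fields.
# Assumption: the target server accepts top_k and min_p as extra fields.
RECOMMENDED_SAMPLING = {
    "temperature": 0.6,
    "top_k": 20,
    "top_p": 0.95,
    "min_p": 0.0,
}
STOP_TOKEN = "<|im_end|>"


def with_recommended_sampling(payload: dict, **overrides) -> dict:
    """Return a copy of payload with the recommended sampling fields filled in.

    Values already present in payload win over the defaults, and explicit
    keyword overrides win over both.
    """
    merged = {**RECOMMENDED_SAMPLING, **payload, **overrides}
    merged.setdefault("stop", [STOP_TOKEN])
    return merged


if __name__ == "__main__":
    request = with_recommended_sampling(
        {"model": "local-model", "messages": [{"role": "user", "content": "2+2?"}]}
    )
    print(request)
```

Repeat penalty is left out on purpose, since the recommendation is to keep it off.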

Provenance
