💎 Qwopus-3.6-35B-A3B-Coder - Custom Mixed Precision GGUFs with Imatrix

Qwopus-3.6-35B-A3B-Coder is a practical coding-agent fine-tune focused on execution efficiency, not simply longer visible reasoning. It is designed for real agentic coding workflows where the model repeatedly reads files, chooses tools, edits code, runs tests, reacts to errors, and summarizes work. The core goal is to complete more of these steps with less token waste, lower latency, and more stable behavior when explicit long thinking is disabled.

This repository contains custom, highly optimized, multi-tier mixed precision GGUF weights for Jackrong/Qwopus3.6-35B-A3B-Coder.

Qwopus-3.6-35B-A3B-Coder achieves state-of-the-art performance among open-source models of comparable size across a broad range of agentic coding benchmarks.

ℹ️ For advanced agentic and programming tasks, I personally recommend trying out Ornith-1.0-35B for better quality results.

These quants were generated using manual layer targeting to maximize quality while shrinking the massive VRAM footprint of the Mixture of Experts layers.

📊 Importance Matrix (Imatrix)

The following datasets were used for the imatrix:

If you know of any better datasets which may help, feel free to let me know.

📄 GGUF Files

In order of quality:

| Filename | Size | Quants |
| --- | --- | --- |
| Qwopus3.6-35B-A3B-Coder-MXFP4_MOE_Q8_0_F16-Imatrix.gguf | 20.7 GB | MXFP4_MOE + Q8_0 + F16 |
| Qwopus3.6-35B-A3B-Coder-MXFP4_MOE_Q8_0-Imatrix.gguf | 19.8 GB | MXFP4_MOE + Q8_0 |
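
To put the file sizes in perspective, here is a rough sketch that converts each size into an average bits-per-weight figure. It assumes 1 GB = 1e9 bytes and a parameter count of exactly 35e9 (taken from the "35B" in the model name), and it ignores metadata overhead, so treat the numbers as approximate.

```python
# Rough effective bits-per-weight for each file listed above.
# Assumptions (not from the model card): 1 GB = 1e9 bytes, the parameter
# count is exactly 35e9, and GGUF metadata overhead is negligible.

PARAMS = 35e9  # nominal parameter count implied by "35B"

FILES_GB = {
    "MXFP4_MOE + Q8_0 + F16": 20.7,
    "MXFP4_MOE + Q8_0": 19.8,
}


def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Average number of bits stored per parameter."""
    return size_gb * 1e9 * 8 / params


for label, size_gb in FILES_GB.items():
    print(f"{label}: {bits_per_weight(size_gb):.2f} bits/weight")
```

Under these assumptions the two files come out at roughly 4.73 and 4.53 bits per weight, which is consistent with most parameters living in the MXFP4 routed experts.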

🔍 Precision Matrix & Flavor Variations

Standard global quantization presets (like stock MXFP4_MOE) compress the backbone layers uniformly, which degrades the delicate reasoning capabilities of advanced agent models.
This repository provides two distinct manual configuration layouts to balance precision and memory constraints:

1. The Tri-Quant Hybrid Flavor (MXFP4 + Q8_0 + F16)

Qwopus3.6-35B-A3B-Coder-MXFP4_MOE_Q8_0_F16-Imatrix.gguf - Designed for maximum quality preservation, this layout implements a strict 3-Tier Precision Matrix:

  • Tier 1 (Core & Mamba Gating - F16 Precision):
    • token_embd.weight, output.weight - Protects the critical input/output vocabulary mappings and helps prevent text degradation.
    • ssm_alpha, ssm_beta - Protects the integrity of the Mamba state-space calculations across long-range context tokens.
  • Tier 2 (Backbone & Shared - Q8_0 Precision): ssm_out, .*_shexp - Keeps the attention mechanics and all shared experts at high quality to protect the logical reasoning loops.
  • Tier 3 (Routed Experts - MXFP4 Precision): ffn_down_exps, ffn_gate_exps, ffn_up_exps - Shrinks the large routed expert tensors directly to MXFP4.
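
The tier assignment above can be sketched as a small regex lookup. The patterns are copied from the Tri-Quant recipe in this card; the function name tier_for and the sample tensor names are hypothetical and only for illustration. The sketch assumes the first matching pattern wins and that unmatched tensors fall back to the base type (Q8_0). The patterns do not overlap for these samples, so the precedence assumption does not change the result.

```python
import re

# (pattern, type) pairs copied from the Tri-Quant --tensor-type recipe.
# Assumption: first match wins; unmatched tensors use the base type Q8_0.
TRI_QUANT_RULES = [
    (r".*_shexp\.weight", "Q8_0"),
    (r"token_embd\.weight", "F16"),
    (r"^output\.weight", "F16"),
    (r"blk\..*\.(ssm_alpha|ssm_beta)\.weight", "F16"),
    (r"blk\..*\.(ffn_down_exps|ffn_gate_exps|ffn_up_exps)\.weight", "MXFP4"),
]


def tier_for(tensor_name, rules=TRI_QUANT_RULES, default="Q8_0"):
    """Return the quantization type the recipe would assign to a tensor name."""
    for pattern, qtype in rules:
        if re.search(pattern, tensor_name):
            return qtype
    return default


# Hypothetical tensor names, chosen only to exercise each rule.
for name in [
    "token_embd.weight",
    "output.weight",
    "blk.3.ssm_alpha.weight",
    "blk.0.ffn_up_shexp.weight",
    "blk.0.ffn_gate_exps.weight",
    "blk.3.attn_q.weight",
]:
    print(f"{name:32s} -> {tier_for(name)}")
```

A lookup like this is a quick way to sanity-check a recipe against a list of tensor names before starting a long quantization run.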

2. The Dual-Quant Hybrid Flavor (MXFP4 + Q8_0)

Qwopus3.6-35B-A3B-Coder-MXFP4_MOE_Q8_0-Imatrix.gguf - Designed for a slightly leaner memory profile, this layout utilizes 2-Tier Precision:

  • Tier 1 (Backbone - Q8_0 Precision): All attention blocks, Mamba structures, vocabulary embeddings, and internal routers use the universal Q8_0 format.
  • Tier 2 (Experts - MXFP4 Precision): The heavy sparse expert blocks are target-quantized directly to MXFP4.
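
As a back-of-the-envelope check on what the leaner layout saves, the sketch below estimates how many parameters would have to move from Q8_0 to F16 to explain the 0.9 GB gap between the two files. It assumes 1 GB = 1e9 bytes and that the entire gap comes from the Tri-Quant Tier 1 tensors; the function name promoted_params is hypothetical.

```python
# Estimate the parameter count behind the size gap between the two layouts.
# Assumptions (not from the model card): 1 GB = 1e9 bytes, and the whole gap
# comes from tensors stored as F16 in one file and Q8_0 in the other.

F16_BITS = 16.0
# Q8_0 stores blocks of 32 int8 weights plus one 16-bit scale per block.
Q8_0_BITS = (32 * 8 + 16) / 32  # 8.5 bits per weight


def promoted_params(tri_gb: float, dual_gb: float) -> float:
    """Parameter count that would explain the size gap under the assumptions above."""
    extra_bits = (tri_gb - dual_gb) * 1e9 * 8
    return extra_bits / (F16_BITS - Q8_0_BITS)


n = promoted_params(20.7, 19.8)
print(f"~{n / 1e9:.2f}B parameters held at F16 instead of Q8_0")
```

With the sizes listed in this card, the estimate is about 0.96B parameters, a small fraction of the 35B total.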

📝 Exact Conversion Details

These files were produced with llama-quantize using the following manual recipe:

Convert SafeTensors to GGUF:

python convert_hf_to_gguf.py "Qwopus3.6-35B-A3B-Coder/" --outtype f16 --outfile "Qwopus3.6-35B-A3B-Coder_F16.gguf"

Generate Tri-Quant MXFP4_MOE + Q8_0 + F16:

llama-quantize \
  --tensor-type ".*_shexp\.weight=Q8_0" \
  --tensor-type "token_embd\.weight=F16" \
  --tensor-type "^output\.weight=F16" \
  --tensor-type "blk\..*\.(ssm_alpha|ssm_beta)\.weight=F16" \
  --tensor-type "blk\..*\.(ffn_down_exps|ffn_gate_exps|ffn_up_exps)\.weight=MXFP4" \
  --imatrix "imatrix.gguf" \
  "Qwopus3.6-35B-A3B-Coder_F16.gguf" \
  "Qwopus3.6-35B-A3B-Coder-MXFP4_MOE_Q8_0_F16-Imatrix.gguf" \
  Q8_0

Generate Dual-Quant MXFP4_MOE + Q8_0:

llama-quantize \
  --tensor-type ".*_shexp\.weight=Q8_0" \
  --tensor-type "blk\..*\.(ffn_down_exps|ffn_gate_exps|ffn_up_exps)\.weight=MXFP4" \
  --imatrix "imatrix.gguf" \
  "Qwopus3.6-35B-A3B-Coder_F16.gguf" \
  "Qwopus3.6-35B-A3B-Coder-MXFP4_MOE_Q8_0-Imatrix.gguf" \
  Q8_0
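
If you want to script variations of these recipes, the command line can be assembled from a list of overrides. The following is a minimal sketch, not the tooling used for these files. The helper name build_quantize_cmd is hypothetical, and it only uses the flags already shown above (--tensor-type and --imatrix).

```python
import shlex


def build_quantize_cmd(overrides, imatrix, src, dst,
                       base_type="Q8_0", exe="llama-quantize"):
    """Build the llama-quantize argument list from (regex, type) overrides."""
    cmd = [exe]
    for pattern, qtype in overrides:
        cmd += ["--tensor-type", f"{pattern}={qtype}"]
    cmd += ["--imatrix", imatrix, src, dst, base_type]
    return cmd


# Overrides copied from the Dual-Quant recipe above.
DUAL_QUANT = [
    (r".*_shexp\.weight", "Q8_0"),
    (r"blk\..*\.(ffn_down_exps|ffn_gate_exps|ffn_up_exps)\.weight", "MXFP4"),
]

cmd = build_quantize_cmd(
    DUAL_QUANT,
    imatrix="imatrix.gguf",
    src="Qwopus3.6-35B-A3B-Coder_F16.gguf",
    dst="Qwopus3.6-35B-A3B-Coder-MXFP4_MOE_Q8_0-Imatrix.gguf",
)

# Print a shell-quoted form for review; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Passing the list directly to subprocess.run avoids shell quoting problems with the regex patterns.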

ℹ️ Misc Details

I'm doing this as a side hobby, with my AMD 5900X, 64GB DDR4, RTX 3060 12GB & RTX 5060 Ti 16GB.

🤝 Support the Journey

As a passionate developer, I'm always programming, automating, or experimenting with new ideas.
I love building open-source tools, trying out new web tech, and creating things that don't yet exist, including local AI & quantizing models.

I love sharing these creations to give back to the community.
If my projects have saved you time or helped you out, consider supporting my work below!

👉 Support me on Ko-fi


✨ Acknowledgments

  • Jackrong for the exceptional Qwopus3.6-35B-A3B-Coder base model.

📜 License

See Jackrong/Qwopus3.6-35B-A3B-Coder.

🔗 Citation

@misc{jackrong_qwopus36_35b_a3b_coder,
  title        = {Qwopus-3.6-35B-A3B-Coder},
  author       = {Jackrong},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Jackrong/Qwopus-3.6-35B-A3B-Coder}}
}

Format: GGUF. Model size: 35B params. Architecture: qwen35moe.