⚡ Each donation = another big MoE quantized

I host 30+ free APEX MoE quantizations as independent research. My only local hardware is an NVIDIA DGX Spark (122 GB unified memory) — enough for ~30-50B-class MoEs, but bigger ones (200B+) require rented compute on H100/H200/Blackwell, typically $20-100 per quant.
If APEX quants are useful to you, your support directly funds those bigger runs.

🎉 Patreon (Monthly)  |  ☕ Buy Me a Coffee  |  ⭐ GitHub Sponsors

Qwopus3.6-35B-A3B-Coder — APEX GGUF

APEX (Adaptive Precision for EXpert Models) quantizations of Jackrong/Qwopus3.6-35B-A3B-Coder — a Qwen3.6-35B-A3B MoE tuned for coding.

Brought to you by the LocalAI team | APEX Project | Technical Report

This model ships an MTP head — for self-speculative decoding out of the box, see the MTP-bundled repo: mudler/Qwopus3.6-35B-A3B-Coder-APEX-MTP-GGUF.

Available Files

File Profile Best For
Qwopus3.6-35B-A3B-Coder-APEX-I-Balanced.gguf I-Balanced Best overall — imatrix-enhanced
Qwopus3.6-35B-A3B-Coder-APEX-I-Quality.gguf I-Quality Highest quality with imatrix
Qwopus3.6-35B-A3B-Coder-APEX-Quality.gguf Quality Highest quality (no imatrix)
Qwopus3.6-35B-A3B-Coder-APEX-Balanced.gguf Balanced General purpose
Qwopus3.6-35B-A3B-Coder-APEX-I-Compact.gguf I-Compact Consumer GPUs, imatrix-enhanced
Qwopus3.6-35B-A3B-Coder-APEX-Compact.gguf Compact Consumer GPUs
Qwopus3.6-35B-A3B-Coder-APEX-I-Mini.gguf I-Mini Smallest viable, fastest inference
mmproj.gguf Vision projector Required for image understanding

What is APEX?

APEX is a quantization strategy for Mixture-of-Experts (MoE) models. It classifies tensors by role (routed expert, shared expert, attention) and applies a layer-wise precision gradient — edge layers (first/last 5) get higher precision, middle layers get more aggressive compression. I-variants use diverse imatrix calibration (chat, code, reasoning, tool-calling, agentic traces, Wikipedia).

See the APEX project for full details.

Architecture

  • Model: Qwopus3.6-35B-A3B-Coder (Qwen3_5MoeForConditionalGeneration, Qwen3.6-35B-A3B base)
  • Layers: 40 · Experts: 256 routed + 1 shared (8 active) · Total/Active: ~35B / ~3B
  • Attention: Hybrid (full attention every 4th layer, linear otherwise)
  • Vision: Built-in vision encoder (mmproj included)
  • Calibration: v1.3 diverse dataset

Run with LocalAI

local-ai run mudler/Qwopus3.6-35B-A3B-Coder-APEX-GGUF@Qwopus3.6-35B-A3B-Coder-APEX-I-Balanced.gguf

Credits

APEX is brought to you by the LocalAI team. Built on llama.cpp. Base model by Jackrong.

Downloads last month
-
GGUF
Model size
35B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

We're not able to determine the quantization variants.

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for mudler/Qwopus3.6-35B-A3B-Coder-APEX-GGUF