⚡ Each donation = another big MoE quantized

I host 30+ free APEX MoE quantizations as independent research. My only local hardware is an NVIDIA DGX Spark (122 GB unified memory) — enough for ~30-50B-class MoEs, but bigger ones (200B+) require rented compute on H100/H200/Blackwell, typically $20-100 per quant.
If APEX quants are useful to you, your support directly funds those bigger runs.

🎉 Patreon (Monthly) | ☕ Buy Me a Coffee | ⭐ GitHub Sponsors

Qwopus3.6-35B-A3B-Coder — APEX-MTP GGUF

APEX (Adaptive Precision for EXpert Models) quantizations of Jackrong/Qwopus3.6-35B-A3B-Coder, with the model's MTP (multi-token prediction) head bundled for in-the-box self-speculative decoding.

Brought to you by the LocalAI team | APEX Project | Technical Report

What's different from the plain APEX repo?

This model ships a real MTP head, and these GGUFs bundle it alongside the trunk in a single file (via llama.cpp PR #22673). With a recent llama.cpp you can enable self-speculative decoding from just this one file — no separate draft model:

llama-server -m Qwopus3.6-35B-A3B-Coder-APEX-MTP-I-Balanced.gguf --draft-mtp

The non-MTP version is at mudler/Qwopus3.6-35B-A3B-Coder-APEX-GGUF — slightly smaller, no self-spec.

MTP draft head precision

The bundled MTP head (blk.40.* including nextn.*) is quantized to Q8_0 (near-lossless) on every tier, keeping draft accuracy high for a good spec-decode acceptance rate at a modest size cost. The MTP head is not imatrix-calibrated (imatrix forward passes only activate the trunk), so it uses static Q8_0.

Available Files

File	Profile	Best For
Qwopus3.6-35B-A3B-Coder-APEX-MTP-I-Balanced.gguf	I-Balanced	Best overall + self-spec
Qwopus3.6-35B-A3B-Coder-APEX-MTP-I-Quality.gguf	I-Quality	Highest quality with imatrix + self-spec
Qwopus3.6-35B-A3B-Coder-APEX-MTP-Quality.gguf	Quality	Highest quality (no imatrix)
Qwopus3.6-35B-A3B-Coder-APEX-MTP-Balanced.gguf	Balanced	General purpose
Qwopus3.6-35B-A3B-Coder-APEX-MTP-I-Compact.gguf	I-Compact	Consumer GPUs + self-spec
Qwopus3.6-35B-A3B-Coder-APEX-MTP-Compact.gguf	Compact	Consumer GPUs
Qwopus3.6-35B-A3B-Coder-APEX-MTP-I-Mini.gguf	I-Mini	Smallest viable + self-spec
mmproj.gguf	Vision projector	Required for image understanding

Architecture

Base: Qwopus3.6-35B-A3B-Coder (Qwen3_5MoeForConditionalGeneration, Qwen3.6-35B-A3B)
Layers: 40 trunk + 1 MTP (bundled) · Experts: 256 routed + 1 shared (8 active)
Vision: Built-in vision encoder (mmproj included)
Calibration: v1.3 diverse dataset