⚡ Each donation = another big MoE quantized

I host 30+ free APEX MoE quantizations as independent research. My only local hardware is an NVIDIA DGX Spark (122 GB unified memory), which is enough for ~30-50B-class MoEs, but bigger ones (200B+) require rented compute on H100/H200/Blackwell, typically $20-100 per quant.
If APEX quants are useful to you, your support directly funds those bigger runs.

🎉 Patreon (Monthly)  |  ☕ Buy Me a Coffee  |  ⭐ GitHub Sponsors

Qwopus3.6-35B-A3B-Coder - APEX-MTP GGUF

APEX (Adaptive Precision for EXpert Models) quantizations of Jackrong/Qwopus3.6-35B-A3B-Coder, with the model's MTP (multi-token prediction) head bundled for in-the-box self-speculative decoding.

Brought to you by the LocalAI team | APEX Project | Technical Report

What's different from the plain APEX repo?

This model ships a real MTP head, and these GGUFs bundle it alongside the trunk in a single file (via llama.cpp PR #22673). With a recent llama.cpp you can enable self-speculative decoding from just this one file, with no separate draft model:

llama-server -m Qwopus3.6-35B-A3B-Coder-APEX-MTP-I-Balanced.gguf --draft-mtp

The non-MTP version is at mudler/Qwopus3.6-35B-A3B-Coder-APEX-GGUF: slightly smaller, no self-spec.
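
To make the idea of self-speculative decoding more concrete, here is a toy sketch of one draft-then-verify round with greedy acceptance. It is not llama.cpp's implementation: `speculative_step`, `toy_draft`, and `toy_target` are hypothetical names, and the two "models" are plain functions standing in for the MTP head and the trunk.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def speculative_step(
    context: List[Token],
    draft_next: NextToken,
    target_next: NextToken,
    n_draft: int = 4,
) -> Tuple[List[Token], int]:
    """Run one draft-then-verify round with greedy acceptance.

    Returns the tokens to append and how many draft tokens were accepted.
    """
    # 1. The cheap draft head proposes n_draft tokens autoregressively.
    proposed: List[Token] = []
    for _ in range(n_draft):
        proposed.append(draft_next(context + proposed))

    # 2. The trunk checks each position. A real engine would score all
    #    positions in one batched pass; this loop is only for readability.
    accepted: List[Token] = []
    for tok in proposed:
        expected = target_next(context + accepted)
        if tok != expected:
            # First mismatch: keep the trunk's token and stop.
            accepted.append(expected)
            return accepted, len(accepted) - 1
        accepted.append(tok)

    # 3. Every draft token matched, so one extra trunk token is appended.
    accepted.append(target_next(context + accepted))
    return accepted, n_draft


def toy_target(ctx: List[Token]) -> Token:
    """Stand-in for the trunk: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    """Stand-in for the draft head: agrees with the trunk except after a 3."""
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


tokens, n_accepted = speculative_step([0], toy_draft, toy_target)
print(tokens, n_accepted)  # [1, 2, 3, 4] 3
```

The output always matches what the trunk alone would have produced; the draft head only changes how many tokens are confirmed per round.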

MTP draft head precision

The bundled MTP head (blk.40.* including nextn.*) is quantized to Q8_0 (near-lossless) on every tier, keeping draft accuracy high for a good spec-decode acceptance rate at a modest size cost. The MTP head is not imatrix-calibrated (imatrix forward passes only activate the trunk), so it uses static Q8_0.
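
As a rough illustration of why Q8_0 is described as near-lossless, the sketch below quantizes one block of 32 weights to signed 8-bit integers with a single shared scale, then restores it. This is a simplified model of the format, not ggml's code: the helper names are hypothetical, the sample weights are made up, and the scale is kept as a Python float here, whereas the on-disk format stores it in 16 bits.

```python
import math
from typing import List, Tuple

BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32


def quantize_block(values: List[float]) -> Tuple[float, List[int]]:
    """Quantize one block to a shared scale plus signed 8-bit integers."""
    assert len(values) == BLOCK_SIZE
    amax = max(abs(v) for v in values)
    scale = amax / 127.0
    if scale == 0.0:
        # All-zero block: nothing to scale.
        return 0.0, [0] * BLOCK_SIZE
    return scale, [round(v / scale) for v in values]


def dequantize_block(scale: float, quants: List[int]) -> List[float]:
    """Restore approximate weights from the scale and the integers."""
    return [scale * q for q in quants]


# 32 int8 values plus one 16-bit scale per block.
BITS_PER_WEIGHT = (BLOCK_SIZE * 8 + 16) / BLOCK_SIZE  # 8.5

# Made-up sample weights, for illustration only.
weights = [0.1 * math.sin(i) for i in range(BLOCK_SIZE)]
scale, quants = quantize_block(weights)
restored = dequantize_block(scale, quants)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={scale:.6f} max_error={max_error:.6f}")
```

The rounding error per weight is at most half the scale, which is why the draft head loses little accuracy at about 8.5 bits per weight.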

Available Files

| File | Profile | Best For |
|------|---------|----------|
| Qwopus3.6-35B-A3B-Coder-APEX-MTP-I-Balanced.gguf | I-Balanced | Best overall + self-spec |
| Qwopus3.6-35B-A3B-Coder-APEX-MTP-I-Quality.gguf | I-Quality | Highest quality with imatrix + self-spec |
| Qwopus3.6-35B-A3B-Coder-APEX-MTP-Quality.gguf | Quality | Highest quality (no imatrix) |
| Qwopus3.6-35B-A3B-Coder-APEX-MTP-Balanced.gguf | Balanced | General purpose |
| Qwopus3.6-35B-A3B-Coder-APEX-MTP-I-Compact.gguf | I-Compact | Consumer GPUs + self-spec |
| Qwopus3.6-35B-A3B-Coder-APEX-MTP-Compact.gguf | Compact | Consumer GPUs |
| Qwopus3.6-35B-A3B-Coder-APEX-MTP-I-Mini.gguf | I-Mini | Smallest viable + self-spec |
| mmproj.gguf | Vision projector | Required for image understanding |
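
If you only want one profile rather than the whole repository, a small helper along these lines may be convenient. It is a sketch under assumptions: `gguf_filename` and `download` are hypothetical helpers, the repository id is assumed to be `mudler/Qwopus3.6-35B-A3B-Coder-APEX-MTP-GGUF`, and the download relies on the third-party `huggingface_hub` package (`hf_hub_download`), which is imported lazily so the filename helper works without it.

```python
from typing import List

# Assumed repository id for these files.
REPO_ID = "mudler/Qwopus3.6-35B-A3B-Coder-APEX-MTP-GGUF"
PREFIX = "Qwopus3.6-35B-A3B-Coder-APEX-MTP-"
PROFILES = (
    "I-Balanced", "I-Quality", "Quality", "Balanced",
    "I-Compact", "Compact", "I-Mini",
)


def gguf_filename(profile: str) -> str:
    """Map a profile name from the file list to its GGUF filename."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile {profile!r}; expected one of {PROFILES}")
    return f"{PREFIX}{profile}.gguf"


def download(profile: str, with_vision: bool = False) -> List[str]:
    """Download one profile (and optionally mmproj.gguf); return local paths."""
    # Third-party dependency: pip install huggingface_hub
    from huggingface_hub import hf_hub_download

    names = [gguf_filename(profile)]
    if with_vision:
        names.append("mmproj.gguf")
    return [hf_hub_download(repo_id=REPO_ID, filename=name) for name in names]


print(gguf_filename("I-Balanced"))
```

Calling `download("I-Balanced", with_vision=True)` would fetch the model file and the vision projector into the local Hugging Face cache.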

Architecture

  • Base: Qwopus3.6-35B-A3B-Coder (Qwen3_5MoeForConditionalGeneration, Qwen3.6-35B-A3B)
  • Layers: 40 trunk + 1 MTP (bundled) · Experts: 256 routed + 1 shared (8 active)
  • Vision: Built-in vision encoder (mmproj included)
  • Calibration: v1.3 diverse dataset
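
The expert counts above can be read as follows: for each token, a router scores all 256 routed experts and only the top 8 run, alongside the shared expert. The sketch below illustrates generic top-k routing under assumptions that are not confirmed by this card: softmax scoring, renormalized weights over the chosen experts, and "8 active" referring to routed experts only. `route` is a hypothetical helper and the logits are made up.

```python
import math
from typing import List, Tuple

N_ROUTED = 256  # routed experts per MoE layer
TOP_K = 8       # routed experts that run per token (assumed reading of "8 active")


def route(router_logits: List[float], top_k: int = TOP_K) -> List[Tuple[int, float]]:
    """Return (expert index, weight) for the top_k experts; weights sum to 1."""
    # Numerically stable softmax over all routed experts.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the top_k most probable experts and renormalize their weights.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


# Made-up router scores for one token.
logits = [0.01 * i for i in range(N_ROUTED)]
picked = route(logits)
print([i for i, _ in picked])
print(f"routed experts active per token: {TOP_K / N_ROUTED:.2%}")
```

Only about 3% of the routed experts run for any given token, which is how a 35B-parameter model ends up with roughly 3B active parameters (the "A3B" in the name).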

Credits

APEX by the LocalAI team. MTP support: llama.cpp PR #22673. Built on llama.cpp. Base model by Jackrong.
