⚡ Each donation = another big MoE quantized
I host 30+ free APEX MoE quantizations as independent research. My only local hardware is an NVIDIA DGX Spark (122 GB unified memory). That is enough for ~30-50B-class MoEs, but bigger ones (200B+) require rented H100/H200/Blackwell compute, typically $20-100 per quant.
If APEX quants are useful to you, your support directly funds those bigger runs.
Patreon (Monthly) | ☕ Buy Me a Coffee | ⭐ GitHub Sponsors
Qwopus3.6-35B-A3B-Coder: APEX-MTP GGUF
APEX (Adaptive Precision for EXpert Models) quantizations of Jackrong/Qwopus3.6-35B-A3B-Coder, with the model's MTP (multi-token prediction) head bundled in for out-of-the-box self-speculative decoding.
Brought to you by the LocalAI team | APEX Project | Technical Report
What's different from the plain APEX repo?
This model ships a real MTP head, and these GGUFs bundle it alongside the trunk in a single file (via llama.cpp PR #22673). With a recent llama.cpp build, you can enable self-speculative decoding from this one file, with no separate draft model:
llama-server -m Qwopus3.6-35B-A3B-Coder-APEX-MTP-I-Balanced.gguf --draft-mtp
The non-MTP version is at mudler/Qwopus3.6-35B-A3B-Coder-APEX-GGUF: slightly smaller, but without self-speculative decoding.
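To illustrate what "self-speculative decoding" buys you, here is a minimal sketch of generic greedy draft-then-verify decoding. It is not llama.cpp's implementation: `draft` and `target` are hypothetical toy functions standing in for the MTP head and the full trunk (a real MTP head also conditions on the trunk's hidden states, and a real engine verifies all drafted positions in one batched forward pass). The `expected_tokens_per_step` helper uses the standard estimate that assumes each drafted token is accepted independently with probability `alpha`.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a model: maps a token context to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def speculative_step(context: List[int], draft: NextToken, target: NextToken, k: int) -> List[int]:
    """One greedy draft-then-verify step; returns the tokens to append to context."""
    # 1. The cheap draft head proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(context)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The full model checks each proposed position in order.
    accepted: List[int] = []
    ctx = list(context)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            # First mismatch: keep the full model's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. All k drafts matched, so the verification pass also yields one bonus token.
    accepted.append(target(ctx))
    return accepted


def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens per verify pass, assuming i.i.d. acceptance with probability alpha."""
    if alpha == 1.0:
        return k + 1.0
    return (1 - alpha ** (k + 1)) / (1 - alpha)
```

Because every emitted token is either confirmed or produced by the full model, the output matches plain greedy decoding with the full model; a more accurate draft head only changes how many tokens each expensive pass yields. For example, with `alpha = 0.8` and `k = 4` drafted tokens, the estimate is about 3.36 tokens per verification pass instead of 1.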
MTP draft head precision
The bundled MTP head (blk.40.*, including nextn.*) is quantized to Q8_0 (near-lossless) on every tier. This keeps draft accuracy, and therefore the speculative-decoding acceptance rate, high at a modest size cost. Because imatrix forward passes only activate the trunk, the MTP head is not imatrix-calibrated and uses static Q8_0 instead.
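Why Q8_0 is "near-lossless" and why it needs no calibration data can be seen from its block layout. The sketch below is a simplified pure-Python re-implementation mirroring ggml's Q8_0 format (32 weights per block, one fp16 scale `d = max|w| / 127`, and 32 int8 quants). It is illustrative, not llama.cpp's vectorized code, and the function names are hypothetical.

```python
import math
import struct
from typing import List, Tuple

QK8_0 = 32  # weights per Q8_0 block, as in ggml


def _fp16(x: float) -> float:
    # The block scale is stored as IEEE half precision; round-trip to mimic that.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def _round_half_away(x: float) -> int:
    # C's roundf() rounds halves away from zero (Python's round() does not).
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_q8_0(weights: List[float]) -> List[Tuple[float, List[int]]]:
    if len(weights) % QK8_0:
        raise ValueError("length must be a multiple of 32")
    blocks = []
    for start in range(0, len(weights), QK8_0):
        chunk = weights[start:start + QK8_0]
        amax = max(abs(w) for w in chunk)
        d = amax / 127.0  # one scale per block, derived from the weights alone
        inv = 1.0 / d if d else 0.0
        qs = [_round_half_away(w * inv) for w in chunk]  # int8 values in [-127, 127]
        blocks.append((_fp16(d), qs))
    return blocks


def dequantize_q8_0(blocks: List[Tuple[float, List[int]]]) -> List[float]:
    return [d * q for d, qs in blocks for q in qs]


def bits_per_weight() -> float:
    # A 2-byte fp16 scale plus 32 one-byte quants per block of 32 weights.
    return (2 + QK8_0) * 8 / QK8_0
```

Each weight is reconstructed to within about half a quantization step (`d / 2`), so with 255 levels per block the error is tiny. The scale depends only on the weights themselves, which is why a static Q8_0 works without imatrix data. The cost is 8.5 bits per weight, which is modest for a single extra layer.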
Available Files
| File | Profile | Best For |
|---|---|---|
| Qwopus3.6-35B-A3B-Coder-APEX-MTP-I-Balanced.gguf | I-Balanced | Best overall + self-spec |
| Qwopus3.6-35B-A3B-Coder-APEX-MTP-I-Quality.gguf | I-Quality | Highest quality with imatrix + self-spec |
| Qwopus3.6-35B-A3B-Coder-APEX-MTP-Quality.gguf | Quality | Highest quality (no imatrix) |
| Qwopus3.6-35B-A3B-Coder-APEX-MTP-Balanced.gguf | Balanced | General purpose |
| Qwopus3.6-35B-A3B-Coder-APEX-MTP-I-Compact.gguf | I-Compact | Consumer GPUs + self-spec |
| Qwopus3.6-35B-A3B-Coder-APEX-MTP-Compact.gguf | Compact | Consumer GPUs |
| Qwopus3.6-35B-A3B-Coder-APEX-MTP-I-Mini.gguf | I-Mini | Smallest viable + self-spec |
| mmproj.gguf | Vision projector | Required for image understanding |
Architecture
- Base: Qwopus3.6-35B-A3B-Coder (Qwen3_5MoeForConditionalGeneration, Qwen3.6-35B-A3B)
- Layers: 40 trunk + 1 MTP (bundled)
- Experts: 256 routed + 1 shared (8 active)
- Vision: Built-in vision encoder (mmproj included)
- Calibration: v1.3 diverse dataset
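The "256 routed + 1 shared (8 active)" line means each token runs only a handful of experts, which is how a ~35B-parameter model activates roughly 3B parameters per token (the "A3B" in the name). Below is a minimal sketch of generic top-k MoE routing under that configuration. The exact Qwen3.6 gating details are assumptions here and are not taken from this card: whether softmax is applied before or after top-k selection, whether the selected weights are renormalized, and how the shared expert is gated. The sketch applies softmax first, renormalizes, and leaves the shared expert ungated.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def top_k_route(router_logits: Sequence[float], k: int = 8) -> Tuple[List[int], List[float]]:
    # Softmax over all routed experts, shifted by the max for numerical stability.
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the k most probable experts and renormalize their weights to sum to 1.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return top, [probs[i] / norm for i in top]


def moe_layer(x: Vector, router_logits: Sequence[float], experts: Sequence[Expert],
              shared_expert: Expert, k: int = 8) -> Vector:
    idx, weights = top_k_route(router_logits, k)
    out = shared_expert(x)  # the shared expert runs for every token (ungated in this sketch)
    for i, w in zip(idx, weights):
        y = experts[i](x)  # only the k selected routed experts are evaluated
        out = [o + w * v for o, v in zip(out, y)]
    return out
```

With 256 routed experts and `k = 8`, only 8 routed experts plus the shared one are evaluated per token. All 257 experts must still be resident in memory, which is why the file size tracks the full ~35B parameters.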
Credits
APEX by the LocalAI team. MTP support: llama.cpp PR #22673. Built on llama.cpp. Base model by Jackrong.
Model tree for mudler/Qwopus3.6-35B-A3B-Coder-APEX-MTP-GGUF: base model Qwen/Qwen3.6-35B-A3B