Qwen3-Coder-Next GGUF Benchmarks!

#20
by danielhanchen - opened
Unsloth AI org

Benjamin Marie benchmarked Qwen3-Coder-Next using Unsloth and standard Qwen GGUFs on a 750-prompt mixed suite (LiveCodeBench v6, MMLU Pro, GPQA, Math500), reporting both overall accuracy and relative error increase (how much more often the quantized model makes mistakes vs. the original).
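For anyone unsure how to read "relative error increase", here is a minimal sketch. It assumes the metric is the quantized model's error rate minus the original's, divided by the original's error rate. That formula is my reading of the description above, not something taken from the benchmark code, and the function name and numbers are hypothetical.

```python
def relative_error_increase(baseline_accuracy: float, quantized_accuracy: float) -> float:
    """Relative increase in error rate of a quantized model over its baseline.

    Accuracies are fractions in [0, 1]. The formula is an assumed reading of
    "how much more often the quantized model makes mistakes vs. the original".
    """
    baseline_error = 1.0 - baseline_accuracy
    quantized_error = 1.0 - quantized_accuracy
    if baseline_error <= 0.0:
        raise ValueError("baseline makes no errors; relative increase is undefined")
    return (quantized_error - baseline_error) / baseline_error


# Hypothetical accuracies, for illustration only (not benchmark results).
example = relative_error_increase(baseline_accuracy=0.80, quantized_accuracy=0.78)
print(f"relative error increase: {example:.1%}")
```

Under this reading, a 2-point accuracy drop from 80% to 78% is roughly a 10% relative error increase, which is why the metric can look more dramatic than the raw accuracy gap.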

The graphs show that Unsloth's Q4_K_M quants perform better than the standard Q4_K_M. Q3_K_M expectedly performs worse on LiveCodeBench v6, but surprisingly much better on HumanEval than the standard Q4_K_M.


Now we have Aider Polyglot benchmarks, comparing Unsloth GGUF quantizations (score vs. VRAM). Notably, the 3-bit UD-IQ3_XXS quant comes close to BF16 performance, making 3-bit a sensible minimum for most use cases.

NVFP4 slightly outperforms the BF16 reference, which may be sampling noise due to limited runs; however, the overall pattern of steady improvement from 1-bit → 2-bit → 3-bit → 6-bit suggests the benchmark is capturing meaningful quality differences across Unsloth GGUFs. The non-Unsloth FP8 seems to perform worse than both UD-IQ3_XXS and UD-Q6_K_XL, which could reflect differences in the quantization pipeline or, again, insufficient sampling.
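To get a rough feel for when a small gap between two quants could be sampling noise, here is a back-of-the-envelope sketch. It treats each benchmark problem as an independent pass/fail trial and compares two pass rates with a simple z-score. The problem count and pass rates below are hypothetical placeholders, not the actual Aider Polyglot results, and the independence assumption is a simplification since both models see the same problems.

```python
import math


def pass_rate_standard_error(pass_rate: float, num_problems: int) -> float:
    """Binomial standard error of a pass rate measured on num_problems problems."""
    return math.sqrt(pass_rate * (1.0 - pass_rate) / num_problems)


def difference_z_score(rate_a: float, rate_b: float, num_problems: int) -> float:
    """z-score of the gap between two pass rates, treating the runs as independent."""
    se_a = pass_rate_standard_error(rate_a, num_problems)
    se_b = pass_rate_standard_error(rate_b, num_problems)
    return (rate_a - rate_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Hypothetical values, for illustration only (not benchmark results).
num_problems = 225
z = difference_z_score(rate_a=0.62, rate_b=0.60, num_problems=num_problems)
print(f"standard error at 60%: {pass_rate_standard_error(0.60, num_problems):.3f}")
print(f"z-score for a 2-point gap: {z:.2f}")
```

With a couple hundred problems, the standard error of a single run is a few percentage points, so a 2-point gap gives a z-score well below the usual 1.96 threshold. Repeated runs or a paired comparison on the same problems would be needed to call such a gap real.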


danielhanchen pinned discussion

Hi, does MXFP4 perform better or worse than Q4_K_M?

I don't understand: why not benchmark Q4_K_XL? Is Unsloth's Q4_K_M also special?

@danielhanchen will you be doing an MTP update for Coder Next as well?
