Qwen3.6-35B-A3B-NVFP4-MTP-GGUF

This repo contains two experimental NVFP4 GGUF quantizations of Qwen3.6-35B-A3B for llama.cpp.
Both were quantized with my experimental advanced-gguf-quantizer tool.
Both were imatrix-calibrated with a new custom dataset that I am still evaluating; this is the first time I have used it.

This repository contains two NVFP4 variants:

| Variant | File | Best for | Notes |
|---|---|---|---|
| TURBO | Qwen3.6-35B-A3B-NVFP4-MTP-TURBO.gguf | Max speed | More NVFP4. Lower quality metrics. |
| HQ | Qwen3.6-35B-A3B-NVFP4-MTP-HQ.gguf | Better quality | More tensors promoted. Slightly slower. |
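
To fetch one of the two files from a script, a minimal sketch using the `huggingface_hub` Python package (which must be installed separately) might look like the following. The repo id is the one shown on this model page; the `VARIANT_FILES` mapping and the two helper names are illustrative and not part of any tool in this repo.

```python
REPO_ID = "michaelw9999/Qwen3.6-35B-A3B-NVFP4-MTP-GGUF"

# File names as listed in the variant table.
VARIANT_FILES = {
    "TURBO": "Qwen3.6-35B-A3B-NVFP4-MTP-TURBO.gguf",
    "HQ": "Qwen3.6-35B-A3B-NVFP4-MTP-HQ.gguf",
}


def variant_filename(variant: str) -> str:
    """Map a variant name (case-insensitive) to its GGUF file name."""
    key = variant.upper()
    if key not in VARIANT_FILES:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(VARIANT_FILES)}"
        )
    return VARIANT_FILES[key]


def download_variant(variant: str) -> str:
    """Download the chosen variant and return its local path (needs network access)."""
    # Imported lazily so variant_filename() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=variant_filename(variant))
```

Calling `download_variant("HQ")` downloads roughly 18.6 GiB and returns the cached file path, which can then be passed to llama.cpp with `-m`.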

Quality & Speed Results

All PPL/KLD results were measured against the same BF16 wikitest KLD base, and then compared to the official NVFP4 release by NVIDIA.
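
For readers unfamiliar with these metrics, the sketch below shows how per-token KL divergence and top-token agreement can be computed from two next-token probability distributions. It assumes the conventional direction, with the BF16 base as the reference distribution, and uses made-up toy probabilities; it illustrates the definitions and is not the code that produced the numbers in this repo.

```python
import math


def kl_divergence(p_base, q_quant):
    """KL(P_base || Q_quant) in nats for a single token position."""
    return sum(p * math.log(p / q) for p, q in zip(p_base, q_quant) if p > 0.0)


def same_top(p_base, q_quant):
    """True when both models rank the same token first."""
    top_base = max(range(len(p_base)), key=p_base.__getitem__)
    top_quant = max(range(len(q_quant)), key=q_quant.__getitem__)
    return top_base == top_quant


# Toy distributions over a 4-token vocabulary (hypothetical values).
positions = [
    ([0.70, 0.20, 0.05, 0.05], [0.65, 0.25, 0.05, 0.05]),
    ([0.40, 0.35, 0.15, 0.10], [0.34, 0.41, 0.15, 0.10]),  # top token flips here
]

klds = [kl_divergence(p, q) for p, q in positions]
mean_kld = sum(klds) / len(klds)
same_top_rate = sum(same_top(p, q) for p, q in positions) / len(positions)

print(f"mean KLD = {mean_kld:.6f}, same top = {same_top_rate:.1%}")
```

The percentile and max KLD rows summarize the tail of the same per-token values, which is why a model can have a low mean KLD and still show a large maximum.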

| Metric | TURBO | HQ | NVIDIA-NVFP4 |
|---|---|---|---|
| Size | 18.56 GiB | 18.64 GiB | 22.20 GiB |
| Mean PPL(Q) | 6.987392 | 6.897796 | 7.014030 |
| Mean PPL(Q)-PPL(base) | 0.268551 | 0.178955 | |
| Mean PPL ratio | 1.039970 | 1.026635 | 1.043935 |
| Mean ln(PPL ratio) | 0.039192 | 0.026286 | |
| Mean KLD | 0.063228 | 0.050759 | 0.066331 |
| 99.9% KLD | 1.924147 | 1.565143 | 1.560988 |
| 99.0% KLD | 0.598519 | 0.488387 | 0.495896 |
| 95.0% KLD | 0.221030 | 0.178889 | 0.207580 |
| Max KLD | 11.946571 | 10.093911 | 6.972712 |
| Same top p | 89.023% | 90.255% | 87.608% |
| Top flip weight | 0.012068 | 0.009575 | |
| pp512 | 11593.57 t/s | 10936.20 t/s | 10426.32 t/s |
| tg128 | 271.21 t/s | 270.49 t/s | 221.86 t/s |
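
The PPL rows can be cross-checked against each other. The sketch below recovers the implied BF16 baseline perplexity from the TURBO and HQ rows and derives relative size and tg128 speed against the NVIDIA release. All inputs are copied from the results table; the derived figures are plain arithmetic, not additional measurements.

```python
import math

# Values copied from the results table.
ppl = {"TURBO": 6.987392, "HQ": 6.897796}
ppl_delta = {"TURBO": 0.268551, "HQ": 0.178955}
ppl_ratio = {"TURBO": 1.039970, "HQ": 1.026635}
ln_ratio = {"TURBO": 0.039192, "HQ": 0.026286}
size_gib = {"TURBO": 18.56, "HQ": 18.64, "NVIDIA": 22.20}
tg128 = {"TURBO": 271.21, "HQ": 270.49, "NVIDIA": 221.86}

for name in ("TURBO", "HQ"):
    base = ppl[name] - ppl_delta[name]  # implied BF16 baseline PPL
    # The reported ratio and its log should agree with the other rows.
    assert math.isclose(ppl[name] / base, ppl_ratio[name], abs_tol=1e-5)
    assert math.isclose(math.log(ppl_ratio[name]), ln_ratio[name], abs_tol=1e-5)

    size_saving = 1 - size_gib[name] / size_gib["NVIDIA"]
    tg_speedup = tg128[name] / tg128["NVIDIA"] - 1
    print(
        f"{name}: base PPL {base:.6f}, "
        f"{size_saving:.1%} smaller, tg128 {tg_speedup:+.1%} vs NVIDIA"
    )
```

Both rows imply the same baseline perplexity of 6.718841, which is what one would expect if both variants were compared against the same BF16 base.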

Evaluation Results

Further evaluation is underway to identify real-world performance differences between TURBO and HQ.

| Benchmark | Samples | TURBO | HQ | NVIDIA-NVFP4 |
|---|---|---|---|---|
| GSM8K | 103 | 98% | 98% | 97% |
| HellaSwag | 100 | 89% | 89% | 89% |
| HumanEval | 164 | 96.34% | 95.12% | 95.12% |
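
With 100 to 164 samples per benchmark, one-point differences are hard to separate from sampling noise. As a rough illustration, the sketch below computes a 95% Wilson score interval for each reported accuracy. The correct-answer counts are reconstructed by rounding accuracy times sample count, which is an assumption, and `wilson_interval` is an illustrative helper rather than part of any evaluation harness used here.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# (benchmark, samples, {model: reported accuracy in percent})
results = [
    ("GSM8K", 103, {"TURBO": 98, "HQ": 98, "NVIDIA-NVFP4": 97}),
    ("HellaSwag", 100, {"TURBO": 89, "HQ": 89, "NVIDIA-NVFP4": 89}),
    ("HumanEval", 164, {"TURBO": 96.34, "HQ": 95.12, "NVIDIA-NVFP4": 95.12}),
]

for bench, n, scores in results:
    for model, pct in scores.items():
        correct = round(pct / 100 * n)  # assumed count, reconstructed from the percentage
        low, high = wilson_interval(correct, n)
        print(f"{bench:10s} {model:13s} {correct}/{n}  95% CI [{low:.1%}, {high:.1%}]")
```

The intervals are several points wide at these sample sizes and overlap across all three models on every benchmark, so these runs alone do not rank the models.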