These are quantizations of the Huihui-Qwen3.5-35B-A3B-abliterated model.

  • Download the latest llama.cpp to use these quantizations.
  • For the mmproj file, the F32 version is recommended for best results.
  • Order of mmproj quality: F32 > BF16 > F16
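
If you script your setup, the quality order above can be applied automatically. This is only a minimal sketch: the function name and the file names are made up for illustration, so check the actual file list of the repository before relying on it.

```python
import re

# Preference order taken from the note above: F32 > BF16 > F16.
PREFERENCE = ["F32", "BF16", "F16"]


def pick_mmproj(filenames):
    """Return the preferred mmproj file name, or None if nothing matches."""
    candidates = [f for f in filenames if "mmproj" in f.lower()]
    for precision in PREFERENCE:
        # The lookarounds keep "F16" from matching inside "BF16".
        pattern = re.compile(
            rf"(?<![A-Za-z0-9]){precision}(?![A-Za-z0-9])", re.IGNORECASE
        )
        for name in candidates:
            if pattern.search(name):
                return name
    return None


# Hypothetical file names, for illustration only.
files = ["model-MXFP4_MOE.gguf", "mmproj-F16.gguf", "mmproj-BF16.gguf"]
print(pick_mmproj(files))
```

With no F32 file in the list, the sketch falls back to BF16 before F16.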

Read Unsloth's guide to set up the model's recommended settings:
Qwen3.5 - How to Run Locally Guide

The standard approach in mainline llama.cpp is to use MXFP4 for the MoE tensors and Q8 for the rest.
I created a new variant in which the other tensors are BF16 instead of Q8.
On some architectures BF16 will be slower, but it's the highest quality: essentially, these are the original tensors from the model, copied over unquantized.
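
To get a feel for the size cost of keeping the non-MoE tensors in BF16, here is a rough back-of-the-envelope sketch. It assumes "Q8" means the ggml Q8_0 format, and the split between expert and non-expert parameters used in the example is a made-up placeholder, not a measured value for this model.

```python
# Effective bits per weight, including the per-block scale.
BITS_PER_WEIGHT = {
    "MXFP4": 4.25,  # 32 x 4-bit values + one 8-bit shared scale per block
    "Q8_0": 8.5,    # 32 x 8-bit values + one 16-bit scale per block
    "BF16": 16.0,   # stored as-is, no block scale
}


def estimate_gib(expert_params, other_params, other_type):
    """Estimate tensor data size in GiB: experts in MXFP4, the rest in other_type."""
    bits = (
        expert_params * BITS_PER_WEIGHT["MXFP4"]
        + other_params * BITS_PER_WEIGHT[other_type]
    )
    return bits / 8 / 2**30


# Hypothetical split of the 35B parameters, for illustration only.
expert_params = 32e9
other_params = 3e9

for other_type in ("Q8_0", "BF16"):
    size = estimate_gib(expert_params, other_params, other_type)
    print(f"MXFP4 experts + {other_type} rest: about {size:.1f} GiB")
```

Since the MoE tensors hold most of the parameters, switching the remaining tensors from Q8 to BF16 grows the file far less than the per-weight difference alone would suggest.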

As of 2026-03-03, the chat template has been updated to the fixed version from Unsloth.
If you don't want to download the model again, you can update just the chat template.
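
One possible way to do this without touching the GGUF file is to override the template when the server starts. The sketch below only builds the command line; it assumes a recent llama.cpp build where llama-server accepts --jinja and --chat-template-file (check llama-server --help), and both file names are placeholders.

```python
import shlex


def server_command(model_path, template_path):
    """Build a llama-server command that overrides the embedded chat template."""
    return [
        "llama-server",
        "-m", model_path,
        "--jinja",  # use the Jinja template engine
        "--chat-template-file", template_path,  # template file replacing the embedded one
    ]


# Placeholder paths: point these at your downloaded model and the updated template.
cmd = server_command("model-MXFP4_MOE.gguf", "chat_template.jinja")
print(shlex.join(cmd))
```

The list form can be passed straight to subprocess.run, while shlex.join gives a string you can paste into a shell.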

Format: GGUF
Model size: 35B params
Architecture: qwen35moe
Quantization: 4-bit


Model tree for noctrex/Huihui-Qwen3.5-35B-A3B-abliterated-MXFP4_MOE-GGUF