A significantly improved version of this model is available.

This repo quantizes huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated — the standard abliteration. A newer fine-tune of the same architecture, trained in the style of Claude 4.6 Opus, has since been released and produces noticeably richer, more expressive outputs.

➡️ Recommended upgrade: Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-abliterated-Q4_K_M-GGUF

Same architecture, same Q4_K_M quantization, same VRAM footprint — just a better fine-tune. This repo will remain available for reference.

Huihui-Qwen3.5-35B-A3B-abliterated — Q4_K_M GGUF

This is a Q4_K_M GGUF quantization of huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated.

Refer to the original model card for full details, usage warnings, and licensing information.

Details

Property Value
Source model huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated
Architecture qwen35moe (35B total params, ~3B active; 256 experts, 8 active per token)
Quantization Q4_K_M (~4.8 BPW)
File size ~21 GB
Quantized with llama.cpp

Usage with llama.cpp

llama-cli \
  --hf-repo cesarsal1nas/Huihui-Qwen3.5-35B-A3B-abliterated-Q4_K_M-GGUF \
  --hf-file huihui-qwen3.5-35b-a3b-abliterated-Q4_K_M.gguf \
  -p "Tell me about the universe"
llama-server \
  --hf-repo cesarsal1nas/Huihui-Qwen3.5-35B-A3B-abliterated-Q4_K_M-GGUF \
  --hf-file huihui-qwen3.5-35b-a3b-abliterated-Q4_K_M.gguf \
  -c 8192

Usage with Ollama

Requires Ollama with qwen35moe support. See PR #14506 for the architecture patch.

ollama run hf.co/cesarsal1nas/Huihui-Qwen3.5-35B-A3B-abliterated-Q4_K_M-GGUF

Credits

Downloads last month
108
GGUF
Model size
35B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for cesarsal1nas/Huihui-Qwen3.5-35B-A3B-abliterated-Q4_K_M-GGUF