How to use from
llama.cpp
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf cesarsal1nas/Huihui-Qwen3.5-35B-A3B-abliterated-Q4_K_M-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf cesarsal1nas/Huihui-Qwen3.5-35B-A3B-abliterated-Q4_K_M-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf cesarsal1nas/Huihui-Qwen3.5-35B-A3B-abliterated-Q4_K_M-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf cesarsal1nas/Huihui-Qwen3.5-35B-A3B-abliterated-Q4_K_M-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf cesarsal1nas/Huihui-Qwen3.5-35B-A3B-abliterated-Q4_K_M-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf cesarsal1nas/Huihui-Qwen3.5-35B-A3B-abliterated-Q4_K_M-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf cesarsal1nas/Huihui-Qwen3.5-35B-A3B-abliterated-Q4_K_M-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf cesarsal1nas/Huihui-Qwen3.5-35B-A3B-abliterated-Q4_K_M-GGUF:Q4_K_M
Use Docker
docker model run hf.co/cesarsal1nas/Huihui-Qwen3.5-35B-A3B-abliterated-Q4_K_M-GGUF:Q4_K_M
Quick Links

A significantly improved version of this model is available.

This repo quantizes huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated — the standard abliteration. A newer fine-tune of the same architecture, trained in the style of Claude 4.6 Opus, has since been released and produces noticeably richer, more expressive outputs.

➡️ Recommended upgrade: Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-abliterated-Q4_K_M-GGUF

Same architecture, same Q4_K_M quantization, same VRAM footprint — just a better fine-tune. This repo will remain available for reference.

Huihui-Qwen3.5-35B-A3B-abliterated — Q4_K_M GGUF

This is a Q4_K_M GGUF quantization of huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated.

Refer to the original model card for full details, usage warnings, and licensing information.

Details

Property Value
Source model huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated
Architecture qwen35moe (35B total params, ~3B active; 256 experts, 8 active per token)
Quantization Q4_K_M (~4.8 BPW)
File size ~21 GB
Quantized with llama.cpp

Usage with llama.cpp

llama-cli \
  --hf-repo cesarsal1nas/Huihui-Qwen3.5-35B-A3B-abliterated-Q4_K_M-GGUF \
  --hf-file huihui-qwen3.5-35b-a3b-abliterated-Q4_K_M.gguf \
  -p "Tell me about the universe"
llama-server \
  --hf-repo cesarsal1nas/Huihui-Qwen3.5-35B-A3B-abliterated-Q4_K_M-GGUF \
  --hf-file huihui-qwen3.5-35b-a3b-abliterated-Q4_K_M.gguf \
  -c 8192

Usage with Ollama

Requires Ollama with qwen35moe support. See PR #14506 for the architecture patch.

ollama run hf.co/cesarsal1nas/Huihui-Qwen3.5-35B-A3B-abliterated-Q4_K_M-GGUF

Credits

Downloads last month
154
GGUF
Model size
35B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for cesarsal1nas/Huihui-Qwen3.5-35B-A3B-abliterated-Q4_K_M-GGUF