
Qwen3.6-27B-abliterated-v2-GGUF

UD Dynamic GGUF release of Qwen3.6-27B-abliterated-v2, built with imatrix-calibrated tensor distribution for high-quality local inference in llama.cpp, Ollama, LM Studio, Jan, KoboldCpp, and other GGUF-compatible runtimes.

This repository contains GGUF quantizations of wangzhang/Qwen3.6-27B-abliterated-v2, a second-pass refusal-suppressed variant of Qwen/Qwen3.6-27B.

The goal of this release is simple:

bring the Qwen3.6-27B abliterated V2 checkpoint into a practical local-runtime format, while using a smarter UD Dynamic GGUF tensor distribution instead of blunt uniform quantization.

This is not just “Q4 and pray.” This build is designed around mixed tensor precision, imatrix calibration, and local deployment efficiency.


What this release is

This is a GGUF conversion and quantization release of Qwen3.6-27B-abliterated-v2.

It targets the following runtimes and use cases:

  • llama.cpp
  • Ollama
  • LM Studio
  • Jan
  • KoboldCpp
  • text-generation-webui GGUF loaders
  • Open WebUI through llama.cpp/Ollama backends
  • local coding agents
  • private desktop assistants
  • low-friction experimentation on consumer hardware

This release uses a UD Dynamic GGUF tensor distribution with imatrix calibration, meaning important tensors are preserved at higher precision while less sensitive tensors are compressed more aggressively.

In practice this gives better quality at a given file size than a uniform fixed-bit quant, provided the per-tensor recipe is well tuned.


Model lineage

| Stage | Model |
|---|---|
| Original base | Qwen/Qwen3.6-27B |
| Abliterated source | wangzhang/Qwen3.6-27B-abliterated-v2 |
| This release | Qwen3.6-27B-abliterated-v2-GGUF |
| Format | GGUF |
| Quantization style | UD Dynamic GGUF tensor distribution |
| Calibration | imatrix-calibrated GGUF |

Why this checkpoint exists

Qwen3.6-27B sits in a useful size class for local use: large enough to handle reasoning, coding, and agent workflows seriously, but still small enough to run on high-end consumer hardware when quantized correctly.

The abliterated V2 source model reduces refusal behavior while trying to preserve coherence and general capability. This GGUF release makes that checkpoint easier to run locally without a full Transformers/vLLM stack.

This release is useful if you want:

  • local uncensored model testing
  • Qwen3.6 reasoning in llama.cpp-compatible runtimes
  • a practical desktop GGUF
  • Ollama-ready deployment
  • coding-agent experiments
  • tool-use testing
  • private long-context chat
  • local red-team or alignment research
  • lower VRAM pressure than BF16/FP16

UD Dynamic GGUF tensor distribution

Standard quantization usually applies a broad quant type across most of the model. That works, but it is crude.

This release instead uses a UD Dynamic-style GGUF tensor distribution:

  • more important tensors are kept at higher precision
  • less sensitive tensors are compressed more aggressively
  • tensor types are distributed according to model-specific sensitivity
  • imatrix calibration is used to guide quantization quality
  • the result targets better quality-per-GB than naive fixed-bit GGUFs

The practical effect: better preservation of reasoning, chat, coding, and instruction-following behavior at a given file size.

Not magic. Just less barbaric.
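
To make the quality-per-GB idea concrete, here is a rough sketch of how a mixed tensor distribution changes the average bits per weight. The bits-per-weight figures follow from the llama.cpp block layouts (bytes per block times 8, divided by weights per block). The split of parameters across quant types is purely hypothetical and is not the actual recipe used in this release.

```python
# Bits per weight for a few llama.cpp block formats, derived from their
# block layouts: (bytes per block * 8) / (weights per block).
BITS_PER_WEIGHT = {
    "Q3_K": 110 * 8 / 256,  # 3.4375
    "Q4_K": 144 * 8 / 256,  # 4.5
    "Q5_K": 176 * 8 / 256,  # 5.5
    "Q6_K": 210 * 8 / 256,  # 6.5625
    "Q8_0": 34 * 8 / 32,    # 8.5
}


def total_bits(tensor_plan):
    """tensor_plan: list of (parameter_count, quant_type) pairs."""
    return sum(count * BITS_PER_WEIGHT[qtype] for count, qtype in tensor_plan)


def average_bpw(tensor_plan):
    """Average bits per weight across the whole plan."""
    total_params = sum(count for count, _ in tensor_plan)
    return total_bits(tensor_plan) / total_params


def estimated_size_gib(tensor_plan):
    """Approximate weight payload in GiB (ignores metadata and non-quantized tensors)."""
    return total_bits(tensor_plan) / 8 / 2**30


# HYPOTHETICAL parameter splits for a 27B model, for illustration only.
uniform_plan = [(27_000_000_000, "Q4_K")]
mixed_plan = [
    (2_000_000_000, "Q8_0"),   # assumed: most sensitive tensors
    (5_000_000_000, "Q6_K"),   # assumed: moderately sensitive tensors
    (20_000_000_000, "Q4_K"),  # assumed: bulk of the weights
]

for name, plan in [("uniform", uniform_plan), ("mixed", mixed_plan)]:
    print(f"{name}: {average_bpw(plan):.2f} bpw, ~{estimated_size_gib(plan):.1f} GiB")
```

The mixed plan costs more bits on average than a uniform Q4_K plan, but spends them only where the recipe judges they matter, which is the trade the section above describes.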


imatrix calibration

This GGUF release uses imatrix-calibrated quantization.

imatrix (importance matrix) calibration helps the quantizer estimate which weights matter most for model behavior by measuring activation statistics over representative calibration data.

Expected benefits:

  • better low-bit behavior
  • less coherence loss
  • improved long-form generation stability
  • better preservation of coding and reasoning behavior
  • fewer quantization-induced weird failures
  • better quality than a non-calibrated quant at the same approximate size

This matters more as bit-width gets lower. Q8 barely cares. Q3 and Q4 care a lot.
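
The idea behind importance-weighted quantization can be shown with a toy example. This is not llama.cpp's actual algorithm, only a minimal sketch of the principle: when choosing a quantization scale, errors on weights that see large activations are penalized more heavily. The weight values and importance numbers below are invented for illustration.

```python
def quantize(values, scale, bits=4):
    """Round values to a symmetric signed integer grid, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    return [max(qmin, min(qmax, round(v / scale))) * scale for v in values]


def weighted_error(values, dequantized, importance):
    """Importance-weighted sum of squared quantization errors."""
    return sum(w * (v - d) ** 2 for v, d, w in zip(values, dequantized, importance))


def best_scale(values, importance, bits=4, steps=200):
    """Grid-search the scale that minimizes the weighted error."""
    qmax = 2 ** (bits - 1) - 1
    base = max(abs(v) for v in values) / qmax
    # Try scales from 0.5x to 1.5x of the naive max-abs scale.
    candidates = [base * (0.5 + i / steps) for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda s: weighted_error(values, quantize(values, s, bits), importance),
    )


# HYPOTHETICAL weights and importance values (stand-ins for activation statistics).
weights = [0.11, -0.42, 0.93, 0.07, -0.25, 0.58, -0.02, 0.31]
importance = [9.0, 0.1, 0.1, 6.0, 0.2, 0.1, 4.0, 0.3]
uniform = [1.0] * len(weights)

scale_plain = best_scale(weights, uniform)
scale_imatrix = best_scale(weights, importance)

err_plain = weighted_error(weights, quantize(weights, scale_plain), importance)
err_imatrix = weighted_error(weights, quantize(weights, scale_imatrix), importance)
print(f"weighted error without importance: {err_plain:.5f}")
print(f"weighted error with importance:    {err_imatrix:.5f}")
```

With fewer bits the grid is coarser, so the choice of scale has a larger effect on the weighted error, which matches the point that calibration matters most at low bit widths.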


Recommended files

Use the largest quant that fits your hardware.

| Variant | Expected use | Notes |
|---|---|---|
| UD-Q6_K_XL | premium local quality | Strong quality/size trade-off. Good if you have enough memory. |
| UD-Q5_K_XL | recommended high-quality daily driver | Excellent balance for larger consumer systems. |
| UD-Q4_K_XL | recommended 24GB-class target | Best starting point for RTX 3090/4090-class GPUs. |
| UD-Q4_K_M | smaller 4-bit fallback | Use when memory is tighter. |
| UD-Q3_K_XL | aggressive compression | Test carefully. Good for constrained systems. |
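
The "largest quant that fits" rule can be sketched as a small helper. The file sizes below are hypothetical placeholders, not the real sizes of the files in this repository, and the default headroom reserved for KV cache and runtime overhead is an assumption that depends on context length and backend. Substitute the actual sizes from the file listing.

```python
def pick_largest_fitting(file_sizes_gib, memory_gib, headroom_gib=6.0):
    """Return the largest variant that fits after reserving headroom, or None."""
    budget = memory_gib - headroom_gib
    fitting = {name: size for name, size in file_sizes_gib.items() if size <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


# HYPOTHETICAL sizes in GiB, for illustration only.
example_sizes_gib = {
    "UD-Q6_K_XL": 22.0,
    "UD-Q5_K_XL": 19.0,
    "UD-Q4_K_XL": 16.5,
    "UD-Q4_K_M": 15.5,
    "UD-Q3_K_XL": 13.0,
}

print(pick_largest_fitting(example_sizes_gib, memory_gib=24))
```

Longer contexts need more headroom, so a variant that fits at a short context may not fit at a long one.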

If you only want one file for a 24GB GPU, start with:

Qwen3.6-27B-abliterated-v2-UD-Q4_K_XL.gguf
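
As a minimal sketch of serving that file with llama.cpp, the snippet below builds a `llama-server` command line. It assumes the `llama-server` binary is on your PATH, that your llama.cpp build is recent enough to load this model, and that the file sits in the current directory. The context size, GPU layer count, and port are illustrative defaults, not tuned recommendations.

```python
import shlex


def llama_server_command(model_path, context_size=8192, gpu_layers=99, port=8080):
    """Build an argument list for llama.cpp's llama-server binary."""
    return [
        "llama-server",
        "-m", model_path,           # path to the GGUF file
        "-c", str(context_size),    # context window in tokens (assumed value)
        "-ngl", str(gpu_layers),    # layers to offload to the GPU (assumed value)
        "--port", str(port),        # HTTP port for the server (assumed value)
    ]


cmd = llama_server_command("Qwen3.6-27B-abliterated-v2-UD-Q4_K_XL.gguf")
print(shlex.join(cmd))

# To actually launch the server:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Lower `gpu_layers` if the model does not fit entirely in VRAM; the remaining layers then run on the CPU.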