---
license: other
license_name: tongyi-qianwen
base_model:
- wangzhang/Qwen3.6-27B-abliterated-v2
tags:
- qwen
- qwen3
- qwen3.6
- qwen3.6-27b
- gguf
- llama.cpp
- ollama
- lm-studio
- koboldcpp
- jan
- quantized
- ud-dynamic
- dynamic-gguf
- imatrix
- abliterated
- uncensored
- abliterix
- hybrid-attention
- gated-deltanet
- agentic-coding
- reasoning
- tool-use
- long-context
---

<p align="center">
  <img src="qwent.png" alt="Qwen3.6-27B Abliterated V2 UD Dynamic GGUF banner" width="100%">
</p>

# Qwen3.6-27B-abliterated-v2-GGUF

**UD Dynamic GGUF release of Qwen3.6-27B-abliterated-v2, built with imatrix-calibrated tensor distribution for high-quality local inference in llama.cpp, Ollama, LM Studio, Jan, KoboldCpp, and other GGUF-compatible runtimes.**

This repository contains GGUF quantizations of [`wangzhang/Qwen3.6-27B-abliterated-v2`](https://huggingface.co/wangzhang/Qwen3.6-27B-abliterated-v2), a second-pass refusal-suppressed variant of [`Qwen/Qwen3.6-27B`](https://huggingface.co/Qwen/Qwen3.6-27B).

The goal of this release is simple:

**bring the Qwen3.6-27B abliterated V2 checkpoint into a practical local-runtime format, while using a smarter UD Dynamic GGUF tensor distribution instead of blunt uniform quantization.**

This is not just “Q4 and pray.” This build is designed around mixed tensor precision, imatrix calibration, and local deployment efficiency.

---

## What this release is

This is a **GGUF conversion and quantization release** of Qwen3.6-27B-abliterated-v2.

It is designed for:

- llama.cpp
- Ollama
- LM Studio
- Jan
- KoboldCpp
- text-generation-webui GGUF loaders
- Open WebUI through llama.cpp/Ollama backends
- local coding agents
- private desktop assistants
- low-friction experimentation on consumer hardware

This release uses a **UD Dynamic GGUF tensor distribution** with **imatrix calibration**, meaning important tensors are preserved at higher precision while less sensitive tensors are compressed more aggressively.

At a given file size, this typically preserves more quality than a uniform fixed-bit quant, provided the quantization recipe is tuned well.

---

## Model lineage

| Stage | Model |
|---|---|
| Original base | `Qwen/Qwen3.6-27B` |
| Abliterated source | `wangzhang/Qwen3.6-27B-abliterated-v2` |
| This release | `Qwen3.6-27B-abliterated-v2-GGUF` |
| Format | GGUF |
| Quantization style | UD Dynamic GGUF tensor distribution |
| Calibration | imatrix-calibrated GGUF |

---

## Why this checkpoint exists

At 27B parameters, Qwen3.6-27B sits in a strong size class for local use: large enough to handle reasoning, coding, and agent workflows seriously, but still small enough to run on high-end consumer hardware when quantized correctly.

The abliterated V2 source model reduces refusal behavior while trying to preserve coherence and general capability. This GGUF release makes that checkpoint easier to run locally without a full Transformers/vLLM stack.

This release is useful if you want:

- local uncensored model testing
- Qwen3.6 reasoning in llama.cpp-compatible runtimes
- a practical desktop GGUF
- Ollama-ready deployment
- coding-agent experiments
- tool-use testing
- private long-context chat
- local red-team or alignment research
- lower VRAM pressure than BF16/FP16

---

## UD Dynamic GGUF tensor distribution

Standard quantization usually applies a single quant type across most of the model's tensors. That works, but it is crude.

This release instead uses a **UD Dynamic-style GGUF tensor distribution**:

- more important tensors are kept at higher precision
- less sensitive tensors are compressed more aggressively
- tensor types are distributed according to model-specific sensitivity
- imatrix calibration is used to guide quantization quality
- the result targets better quality-per-GB than naive fixed-bit GGUFs

The practical effect: better preservation of reasoning, chat, coding, and instruction-following behavior at a given file size.

Not magic. Just less barbaric.
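
To make the quality-per-GB idea concrete, here is a minimal sketch that computes the average bits per weight of a mixed-precision layout. The tensor groups, their parameter shares, and the type assignments are hypothetical and are not the recipe used for this release; only the per-type bit widths follow llama.cpp's block formats.

```python
# Bits per weight of common llama.cpp block formats
# (block size in bytes * 8 / weights per block).
BITS_PER_WEIGHT = {
    "Q3_K": 3.4375,
    "Q4_K": 4.5,
    "Q5_K": 5.5,
    "Q6_K": 6.5625,
    "Q8_0": 8.5,
}


def average_bpw(layout):
    """layout: list of (share_of_parameters, quant_type); shares must sum to 1."""
    total = sum(share for share, _ in layout)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("parameter shares must sum to 1")
    return sum(share * BITS_PER_WEIGHT[qtype] for share, qtype in layout)


# Hypothetical layouts, for illustration only.
uniform = [(1.0, "Q4_K")]
mixed = [
    (0.10, "Q6_K"),  # assumed sensitive tensors, kept at higher precision
    (0.70, "Q4_K"),  # bulk of the model
    (0.20, "Q3_K"),  # assumed less sensitive tensors, compressed harder
]

print("uniform:", average_bpw(uniform), "bits/weight")
print("mixed:  ", average_bpw(mixed), "bits/weight")
```

In this toy layout the mixed allocation lands at about the same average bit width as the uniform one, while spending more of those bits on the tensors assumed to matter most.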

---

## imatrix calibration

This GGUF release uses **imatrix-calibrated quantization**.

An importance matrix (imatrix) records how strongly each weight's input is activated over representative calibration data. The quantizer uses those measurements to estimate which weights matter most for model behavior and to spend precision on them first.

Expected benefits:

- better low-bit behavior
- less coherence loss
- improved long-form generation stability
- better preservation of coding and reasoning behavior
- fewer quantization-induced weird failures
- better quality than a non-calibrated quant at the same approximate size

This matters more as bit-width gets lower. Q8 barely cares. Q3 and Q4 care a lot.
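
The idea of importance-weighted quantization can be sketched in a few lines. This is a toy illustration under simplifying assumptions, not llama.cpp's actual algorithm: the weights, importance values, candidate scales, and function names are all hypothetical. It picks the rounding scale that minimizes squared error after each weight's error is multiplied by its importance.

```python
def quantize(weights, scale, levels=7):
    """Snap each weight to the nearest multiple of `scale`, clamped to +/- levels steps."""
    return [max(-levels, min(levels, round(w / scale))) * scale for w in weights]


def weighted_error(weights, importance, scale):
    """Squared rounding error, with each weight's error multiplied by its importance."""
    approx = quantize(weights, scale)
    return sum(i * (w - a) ** 2 for w, a, i in zip(weights, approx, importance))


def best_scale(weights, importance, candidates):
    """Pick the candidate scale with the lowest importance-weighted error."""
    return min(candidates, key=lambda s: weighted_error(weights, importance, s))


# Hypothetical toy data: four weights and two importance profiles.
weights = [0.11, -0.52, 0.93, 0.30]
candidates = [0.02 * k for k in range(4, 16)]  # roughly 0.08 to 0.30
uniform = [1.0, 1.0, 1.0, 1.0]
skewed = [100.0, 1.0, 1.0, 1.0]  # pretend the first weight sees large activations

print("uniform importance ->", best_scale(weights, uniform, candidates))
print("skewed importance  ->", best_scale(weights, skewed, candidates))
```

With fewer levels available, a poor scale choice costs more, which is one way to see why calibration matters more at low bit widths.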

---

## Recommended files

Use the largest quant that fits your hardware.

| Variant | Expected use | Notes |
|---|---|---|
| `UD-Q6_K_XL` | premium local quality | Strong quality/size trade-off. Good if you have enough memory. |
| `UD-Q5_K_XL` | recommended high-quality daily driver | Excellent balance for larger consumer systems. |
| `UD-Q4_K_XL` | recommended 24GB-class target | Best starting point for RTX 3090/4090-class GPUs. |
| `UD-Q4_K_M` | smaller 4-bit fallback | Use when memory is tighter. |
| `UD-Q3_K_XL` | aggressive compression | Test carefully. Good for constrained systems. |
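
The "largest quant that fits" rule can be sketched as a small helper. The file sizes below are placeholders, not the real sizes of this release, and the 6 GB headroom for KV cache and runtime buffers is an assumption; substitute the sizes listed in the repository's file view and a headroom that matches your context length.

```python
def pick_quant(file_sizes_gb, memory_gb, reserve_gb=6.0):
    """Return the name of the largest file that fits in memory_gb after
    setting aside reserve_gb for KV cache and runtime buffers, or None."""
    budget = memory_gb - reserve_gb
    fitting = {name: size for name, size in file_sizes_gb.items() if size <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


# Placeholder sizes in GB, for illustration only. Replace with the real file sizes.
PLACEHOLDER_SIZES_GB = {
    "UD-Q6_K_XL": 24.0,
    "UD-Q5_K_XL": 20.0,
    "UD-Q4_K_XL": 17.0,
    "UD-Q4_K_M": 16.0,
    "UD-Q3_K_XL": 13.0,
}

print(pick_quant(PLACEHOLDER_SIZES_GB, memory_gb=24.0))
print(pick_quant(PLACEHOLDER_SIZES_GB, memory_gb=16.0, reserve_gb=3.0))
```

Longer contexts need a larger reserve, so the same GPU may call for a smaller quant when you raise the context window.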

If you only want one file for a 24GB GPU, start with:

```bash
Qwen3.6-27B-abliterated-v2-UD-Q4_K_XL.gguf