Instructions to use groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF",
	filename="Qwen3.6-27B-abliterated-UD-Q3_K_XL.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL
# Run inference directly in the terminal:
llama-cli -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL
# Run inference directly in the terminal:
llama-cli -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL
# Run inference directly in the terminal:
./llama-cli -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL
# Run inference directly in the terminal:
./build/bin/llama-cli -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL

Use Docker

docker model run hf.co/groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL

LM Studio
Jan
Ollama
How to use groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF with Ollama:
```
ollama run hf.co/groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL
```

Unsloth Studio

How to use groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF to start chatting

How to use groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL

Run Hermes

hermes

Atomic Chat new
Docker Model Runner
How to use groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF with Docker Model Runner:
```
docker model run hf.co/groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL
```

Lemonade

How to use groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL

Run and chat with the model

lemonade run user.Qwen3.6-27B-abliterated-v2-UD-GGUF-UD-Q4_K_XL

List all available models

lemonade list

Qwen3.6-27B Abliterated V2 UD Dynamic GGUF banner

Qwen3.6-27B-abliterated-v2-GGUF

UD Dynamic GGUF release of Qwen3.6-27B-abliterated-v2, built with imatrix-calibrated tensor distribution for high-quality local inference in llama.cpp, Ollama, LM Studio, Jan, KoboldCpp, and other GGUF-compatible runtimes.

This repository contains GGUF quantizations of wangzhang/Qwen3.6-27B-abliterated-v2, a second-pass refusal-suppressed variant of Qwen/Qwen3.6-27B.

The goal of this release is simple:

bring the Qwen3.6-27B abliterated V2 checkpoint into a practical local-runtime format, while using a smarter UD Dynamic GGUF tensor distribution instead of blunt uniform quantization.

This is not just “Q4 and pray.” This build is designed around mixed tensor precision, imatrix calibration, and local deployment efficiency.

What this release is

This is a GGUF conversion and quantization release of Qwen3.6-27B-abliterated-v2.

It is designed for:

llama.cpp
Ollama
LM Studio
Jan
KoboldCpp
text-generation-webui GGUF loaders
Open WebUI through llama.cpp/Ollama backends
local coding agents
private desktop assistants
low-friction experimentation on consumer hardware

This release uses a UD Dynamic GGUF tensor distribution with imatrix calibration, meaning important tensors are preserved at higher precision while less sensitive tensors are compressed more aggressively.

That gives better practical quality than a naive fixed-bit quant when the quantization recipe is done correctly.

Model lineage

Stage	Model
Original base	`Qwen/Qwen3.6-27B`
Abliterated source	`wangzhang/Qwen3.6-27B-abliterated-v2`
This release	`Qwen3.6-27B-abliterated-v2-GGUF`
Format	GGUF
Quantization style	UD Dynamic GGUF tensor distribution
Calibration	imatrix-calibrated GGUF

Why this checkpoint exists

Qwen3.6-27B is a strong local model size class: large enough to handle reasoning, coding, and agent workflows seriously, but still small enough to run on high-end consumer hardware when quantized correctly.

The abliterated V2 source model reduces refusal behavior while trying to preserve coherence and general capability. This GGUF release makes that checkpoint easier to run locally without a full Transformers/vLLM stack.

This release is useful if you want:

local uncensored model testing
Qwen3.6 reasoning in llama.cpp-compatible runtimes
a practical desktop GGUF
Ollama-ready deployment
coding-agent experiments
tool-use testing
private long-context chat
local red-team or alignment research
lower VRAM pressure than BF16/FP16

UD Dynamic GGUF tensor distribution

Standard quantization usually applies a broad quant type across most of the model. That works, but it is crude.

This release instead uses a UD Dynamic-style GGUF tensor distribution:

more important tensors are kept at higher precision
less sensitive tensors are compressed more aggressively
tensor types are distributed according to model-specific sensitivity
imatrix calibration is used to guide quantization quality
the result targets better quality-per-GB than naive fixed-bit GGUFs

The practical effect: better preservation of reasoning, chat, coding, and instruction-following behavior at a given file size.

Not magic. Just less barbaric.

imatrix calibration

This GGUF release uses imatrix-calibrated quantization.

imatrix calibration helps the quantizer estimate which weights/tensors matter most for model behavior by measuring activation importance over representative calibration data.

Expected benefits:

better low-bit behavior
less coherence loss
improved long-form generation stability
better preservation of coding and reasoning behavior
fewer quantization-induced weird failures
better quality than a non-calibrated quant at the same approximate size

This matters more as bit-width gets lower. Q8 barely cares. Q3 and Q4 care a lot.

Recommended files

Use the largest quant that fits your hardware.

Variant	Expected use	Notes
`UD-Q6_K_XL`	premium local quality	Strong quality/size trade-off. Good if you have enough memory.
`UD-Q5_K_XL`	recommended high-quality daily driver	Excellent balance for larger consumer systems.
`UD-Q4_K_XL`	recommended 24GB-class target	Best starting point for RTX 3090/4090-class GPUs.
`UD-Q4_K_M`	smaller 4-bit fallback	Use when memory is tighter.
`UD-Q3_K_XL`	aggressive compression	Test carefully. Good for constrained systems.

If you only want one file for a 24GB GPU, start with:

Qwen3.6-27B-abliterated-v2-UD-Q4_K_XL.gguf

Downloads last month: 392

GGUF

Model size

27B params

Architecture

qwen35

Hardware compatibility

3-bit

4-bit

5-bit

6-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF

Base model

Qwen/Qwen3.6-27B

Finetuned

wangzhang/Qwen3.6-27B-abliterated

Quantized

(8)

this model