Instructions to use groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF", filename="Qwen3.6-27B-abliterated-UD-Q3_K_XL.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL # Run inference directly in the terminal: llama-cli -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL # Run inference directly in the terminal: llama-cli -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL # Run inference directly in the terminal: ./llama-cli -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL # Run inference directly in the terminal: ./build/bin/llama-cli -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL
Use Docker
docker model run hf.co/groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL
- LM Studio
- Jan
- Ollama
How to use groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF with Ollama:
ollama run hf.co/groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL
- Unsloth Studio
How to use groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF to start chatting
- Pi
How to use groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL
Run Hermes
hermes
- Atomic Chat new
- Docker Model Runner
How to use groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF with Docker Model Runner:
docker model run hf.co/groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL
- Lemonade
How to use groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull groxaxo/Qwen3.6-27B-abliterated-v2-UD-GGUF:UD-Q4_K_XL
Run and chat with the model
lemonade run user.Qwen3.6-27B-abliterated-v2-UD-GGUF-UD-Q4_K_XL
List all available models
lemonade list
Qwen3.6-27B-abliterated-v2-GGUF
UD Dynamic GGUF release of Qwen3.6-27B-abliterated-v2, built with imatrix-calibrated tensor distribution for high-quality local inference in llama.cpp, Ollama, LM Studio, Jan, KoboldCpp, and other GGUF-compatible runtimes.
This repository contains GGUF quantizations of wangzhang/Qwen3.6-27B-abliterated-v2, a second-pass refusal-suppressed variant of Qwen/Qwen3.6-27B.
The goal of this release is simple:
bring the Qwen3.6-27B abliterated V2 checkpoint into a practical local-runtime format, while using a smarter UD Dynamic GGUF tensor distribution instead of blunt uniform quantization.
This is not just “Q4 and pray.” This build is designed around mixed tensor precision, imatrix calibration, and local deployment efficiency.
What this release is
This is a GGUF conversion and quantization release of Qwen3.6-27B-abliterated-v2.
It is designed for:
- llama.cpp
- Ollama
- LM Studio
- Jan
- KoboldCpp
- text-generation-webui GGUF loaders
- Open WebUI through llama.cpp/Ollama backends
- local coding agents
- private desktop assistants
- low-friction experimentation on consumer hardware
This release uses a UD Dynamic GGUF tensor distribution with imatrix calibration, meaning important tensors are preserved at higher precision while less sensitive tensors are compressed more aggressively.
That gives better practical quality than a naive fixed-bit quant when the quantization recipe is done correctly.
Model lineage
| Stage | Model |
|---|---|
| Original base | Qwen/Qwen3.6-27B |
| Abliterated source | wangzhang/Qwen3.6-27B-abliterated-v2 |
| This release | Qwen3.6-27B-abliterated-v2-GGUF |
| Format | GGUF |
| Quantization style | UD Dynamic GGUF tensor distribution |
| Calibration | imatrix-calibrated GGUF |
Why this checkpoint exists
Qwen3.6-27B is a strong local model size class: large enough to handle reasoning, coding, and agent workflows seriously, but still small enough to run on high-end consumer hardware when quantized correctly.
The abliterated V2 source model reduces refusal behavior while trying to preserve coherence and general capability. This GGUF release makes that checkpoint easier to run locally without a full Transformers/vLLM stack.
This release is useful if you want:
- local uncensored model testing
- Qwen3.6 reasoning in llama.cpp-compatible runtimes
- a practical desktop GGUF
- Ollama-ready deployment
- coding-agent experiments
- tool-use testing
- private long-context chat
- local red-team or alignment research
- lower VRAM pressure than BF16/FP16
UD Dynamic GGUF tensor distribution
Standard quantization usually applies a broad quant type across most of the model. That works, but it is crude.
This release instead uses a UD Dynamic-style GGUF tensor distribution:
- more important tensors are kept at higher precision
- less sensitive tensors are compressed more aggressively
- tensor types are distributed according to model-specific sensitivity
- imatrix calibration is used to guide quantization quality
- the result targets better quality-per-GB than naive fixed-bit GGUFs
The practical effect: better preservation of reasoning, chat, coding, and instruction-following behavior at a given file size.
Not magic. Just less barbaric.
imatrix calibration
This GGUF release uses imatrix-calibrated quantization.
imatrix calibration helps the quantizer estimate which weights/tensors matter most for model behavior by measuring activation importance over representative calibration data.
Expected benefits:
- better low-bit behavior
- less coherence loss
- improved long-form generation stability
- better preservation of coding and reasoning behavior
- fewer quantization-induced weird failures
- better quality than a non-calibrated quant at the same approximate size
This matters more as bit-width gets lower. Q8 barely cares. Q3 and Q4 care a lot.
Recommended files
Use the largest quant that fits your hardware.
| Variant | Expected use | Notes |
|---|---|---|
UD-Q6_K_XL |
premium local quality | Strong quality/size trade-off. Good if you have enough memory. |
UD-Q5_K_XL |
recommended high-quality daily driver | Excellent balance for larger consumer systems. |
UD-Q4_K_XL |
recommended 24GB-class target | Best starting point for RTX 3090/4090-class GPUs. |
UD-Q4_K_M |
smaller 4-bit fallback | Use when memory is tighter. |
UD-Q3_K_XL |
aggressive compression | Test carefully. Good for constrained systems. |
If you only want one file for a 24GB GPU, start with:
Qwen3.6-27B-abliterated-v2-UD-Q4_K_XL.gguf
- Downloads last month
- 392
3-bit
4-bit
5-bit
6-bit