---
license: apache-2.0
base_model: huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated
tags:
- qwen3.6
- gguf
- llama.cpp
- abliterated
- uncensored
- roleplay
---

# Huihui Qwen3.6-35B A3B Abliterated (GGUF)

This repository provides GGUF quantizations of the [huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated](https://huggingface.co/huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated) model.

Because this model has been "abliterated" to remove alignment and safety refusals, it is suited to unrestricted creative writing, dynamic storytelling, and immersive roleplay.

## Available Quantizations

| File | Bit Width | Description |
|------|-----------|-------------|
| `huihui-35B-Q8_0.gguf` | 8-bit | Highest quality quant, virtually indistinguishable from F16. |
| `huihui-35B-Q6_K.gguf` | 6-bit | Excellent quality with a noticeably reduced memory footprint. |
| `huihui-35B-Q5_K_M.gguf` | 5-bit | Great balance between reasoning performance and RAM usage. |
| `huihui-35B-Q4_K_M.gguf` | 4-bit | **Recommended.** The optimal sweet spot for speed and quality. |
| `huihui-35B-Q4_K_S.gguf` | 4-bit | Slightly smaller than K_M, allowing for faster inference on constrained setups. |
| `huihui-35B-Q3_K_M.gguf` | 3-bit | Lowest resource requirement, though perplexity loss becomes more noticeable. |
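
To get a rough feel for how the bit widths above translate into file size, the sketch below multiplies the parameter count by the nominal bits per weight. It assumes roughly 35 billion total parameters (taken from the "35B" in the model name), and `estimate_gb` is a hypothetical helper, not part of `llama.cpp`. Treat the results as a lower bound: GGUF quant types store scales and other metadata alongside the weights, so the actual files are larger, and the context cache needs additional memory at runtime.

```bash
# Rough lower-bound size estimate in GB:
#   parameters (in billions) * nominal bits per weight / 8
# Hypothetical helper for illustration; real GGUF files are somewhat larger.
estimate_gb() {
  awk -v params_b="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", params_b * bits / 8 }'
}

# Assumption: ~35 billion total parameters, per the model name.
for bits in 8 6 5 4 3; do
  printf '%s-bit: at least ~%s GB\n' "$bits" "$(estimate_gb 35 "$bits")"
done
```

Compare the estimate for your chosen quant against free RAM or VRAM, and check the actual file sizes listed in the repository before downloading.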

## Quick Start (llama.cpp)

These models are designed to be run directly via `llama.cpp`. The following commands are standard for local Linux environments (such as Linux Mint or Ubuntu).

**1. Clone and compile via CMake:**
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release