---
license: apache-2.0
language:
  - en
  - zh
  - multilingual
tags:
  - qwen3.5
  - qwen35
  - gguf
  - multimodal
  - vision
  - image-text-to-text
  - abliterated
  - uncensored
  - llama.cpp
  - 4b
pipeline_tag: image-text-to-text
base_model:
  - huihui-ai/Huihui-Qwen3.5-4B-Claude-4.6-Opus-abliterated
  - Qwen/Qwen3.5-4B
library_name: llama.cpp
---

# Huihui-Qwen3.5-4B-Claude-4.6-Opus-abliterated — GGUF

GGUF conversion of [huihui-ai/Huihui-Qwen3.5-4B-Claude-4.6-Opus-abliterated](https://huggingface.co/huihui-ai/Huihui-Qwen3.5-4B-Claude-4.6-Opus-abliterated) for use with [llama.cpp](https://github.com/ggerganov/llama.cpp).

## Credits

| Role | Model / Author |
|---|---|
| **Base LLM** | [Qwen/Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B) — Alibaba Qwen Team |
| **Abliterated (uncensored)** | [huihui-ai/Huihui-Qwen3.5-4B-Claude-4.6-Opus-abliterated](https://huggingface.co/huihui-ai/Huihui-Qwen3.5-4B-Claude-4.6-Opus-abliterated) — Huihui AI |
| **GGUF Conversion** | [hotdogs](https://huggingface.co/hotdogs) — via [llama.cpp](https://github.com/ggerganov/llama.cpp) |

🙏 Huge thanks to **Qwen Team** (Alibaba) for the base model, **Huihui AI** for the abliteration, and **ggerganov** for llama.cpp!

## Model Details

| Spec | Value |
|---|---|
| Parameters | ~4B |
| Architecture | Qwen3.5 Multimodal (QWEN35) |
| hiddensize | 2560 |
| Layers | 32 |
| Attention Heads | 16 (KV: 4) |
| Context Length | **262,144** (256K tokens) |
| FFN Intermediate | 9216 |
| Vision Encoder | 24 layers, hiddensize=1024, patchsize=16 |
| Modality | **image-text-to-text** 🖼️➡️📝 |
| Censorship | **Abliterated** (refusal direction removed) |
| License | Apache 2.0 |

## Available Quantizations

| File | Size | BPW | Quality | Recommended For |
|---|---|---|---|---|
| huihui-qwen35-4b-BF16.gguf | 7.9 GB | 16.00 | ⭐ Full | Best quality, 16GB+ VRAM |
| huihui-qwen35-4b-Q8_0.gguf | 4.2 GB | ~8.00 | ⭐ Very High | Balanced, 8GB+ VRAM |
| huihui-qwen35-4b-Q4_K_M.gguf | 2.6 GB | 5.13 | ⭐ Good | Low VRAM, 6GB+ VRAM |
| mmproj-huihui-qwen35-4b-BF16.gguf | 645 MB | — | Vision | **Multimodal projector** (required for images) |

## Usage

### Text-only

./llama-cli -m huihui-qwen35-4b-Q4_K_M.gguf -p "Hello!" -n 256

### Multimodal (image + text)

./llama-qwen2vl-cli -m huihui-qwen35-4b-Q4_K_M.gguf --mmproj mmproj-huihui-qwen35-4b-BF16.gguf --image photo.jpg -p "Describe this image"

### Server (OpenAI-compatible API)

./llama-server -m huihui-qwen35-4b-Q4_K_M.gguf --mmproj mmproj-huihui-qwen35-4b-BF16.gguf --host 0.0.0.0 --port 8080

### Python (llama-cpp-python)

llm = Llama(model_path="huihui-qwen35-4b-Q4_K_M.gguf", n_ctx=32768)
output = llm("Hello!", max_tokens=128)

## About Abliteration

This model has undergone **directional ablation** — a technique that removes the "refusal direction" from the model's activation space (Arditi et al. 2024). The model will not refuse questions that base Qwen3.5 would normally decline.

**Use responsibly.** Ensure your use case complies with applicable laws.

## Conversion Notes

- Converted with llama.cpp convert_hf_to_gguf.py
- BF16 output type
- QWEN35 architecture, Qwen3VLVisionModel for mmproj
- Metadata preserved from source model