---
library_name: mlx
license: apache-2.0
base_model: Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled
pipeline_tag: image-text-to-text
tags:
  - mlx
  - mlx-vlm
  - qwen3.5
  - heretic
  - uncensored
  - abliterated
  - multimodal
  - vision
---

# Qwen3.5-9B Distilled OPUS Heretic - MLX-VLM 8bit

8-bit quantized MLX-VLM conversion of an abliterated Qwen3.5-9B model distilled from Claude Opus 4.6 reasoning, optimized for Apple Silicon.

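**Size:** ~9.8 GB | **Bits/weight:** 8.864 (effective average; includes quantization scales and any unquantized tensors) | **Quality:** between the fp16 and 4-bit variants in both quality and size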

## Background

This model starts from [Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled](https://huggingface.co/Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled), a Qwen3.5-9B base fine-tuned via knowledge distillation from Claude Opus 4.6 to replicate its chain-of-thought reasoning style.

Abliteration, the refusal-direction ablation technique from Arditi et al. (2024), was then applied with a custom script adapted to the hybrid DeltaNet/full-attention architecture. The resulting model keeps the reasoning and vision capabilities of the distilled base while no longer refusing requests.
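The core idea of the Arditi et al. technique can be sketched in NumPy: take the difference of mean residual-stream activations on harmful versus harmless prompts as the "refusal direction", then orthogonalize weight matrices that write into the residual stream so they can no longer produce a component along it. This is a minimal, generic sketch using random stand-in activations. It is not the custom script used for this model, and it does not show how layers or token positions were chosen or how the DeltaNet layers were handled.

```python
import numpy as np


def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Unit-norm difference-in-means direction.

    Both inputs have shape (num_prompts, hidden_size) and hold residual-stream
    activations taken at one layer and token position.
    """
    direction = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)


def orthogonalize(weight: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project `direction` out of a matrix whose outputs land in the residual stream.

    `weight` uses the (out_features, in_features) layout with
    out_features == hidden_size, so each column is a residual-space vector.
    Computes W' = W - r r^T W.
    """
    r = direction.reshape(-1, 1)
    return weight - r @ (r.T @ weight)


# Toy example with random stand-in data (hidden_size = 16).
rng = np.random.default_rng(0)
harmful = rng.normal(size=(32, 16)) + 1.5  # shifted mean stands in for a "refusal" signal
harmless = rng.normal(size=(32, 16))
r = refusal_direction(harmful, harmless)

w_out = rng.normal(size=(16, 8))  # e.g. an output projection mapping 8 -> 16 dims
w_ablated = orthogonalize(w_out, r)

# After ablation, no input to this layer can produce a component along r.
print(np.abs(r @ w_ablated).max())
```

Orthogonalizing the weights, rather than subtracting the direction from activations at runtime, bakes the edit into the checkpoint, which is what allows the result to be converted and quantized like any other model.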

The model was then converted to MLX-VLM format and quantized to 8-bit for Apple Silicon inference.

## Architecture

- **Type:** Qwen3_5ForConditionalGeneration (multimodal)
- **Layers:** 32 total — 24 linear attention (DeltaNet) + 8 full attention
- **Hidden size:** 4096 | **Intermediate size:** 12288
- **Vision encoder:** 27-layer ViT
- **Inputs:** Text, images, video
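To make the 24 + 8 split concrete, the sketch below builds a per-layer schedule. It assumes the interleaving used by Qwen3-Next-style hybrid models, where every fourth layer is full attention; the interval of 4 and the `linear_attention`/`full_attention` labels are assumptions for illustration, and the authoritative layout is whatever the model's `config.json` specifies.

```python
def layer_schedule(num_layers: int = 32, full_attention_interval: int = 4) -> list[str]:
    """Label each layer, assuming every `full_attention_interval`-th layer is full attention."""
    return [
        "full_attention" if (i + 1) % full_attention_interval == 0 else "linear_attention"
        for i in range(num_layers)
    ]


schedule = layer_schedule()
print(schedule[:4])  # ['linear_attention', 'linear_attention', 'linear_attention', 'full_attention']
print(schedule.count("linear_attention"), schedule.count("full_attention"))  # 24 8
```

Under that assumption, any script that edits weights layer by layer has to branch on layer type, since linear-attention (DeltaNet) layers and full-attention layers presumably expose different projection modules.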

## Confirmed Capabilities

- **Vision:** Correctly describes image content
- **Reasoning:** Step-by-step mathematical problem solving (e.g., integration by parts)
- **Uncensored:** Responds to sensitive prompts without refusal

## Usage

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "andrevp/Qwen3.5-9B-Distilled-OPUS-Heretic-MLX-VLM-8bit"
model, processor = load(model_path)
config = load_config(model_path)

# Text-only
prompt = apply_chat_template(processor, config, "Your question here", num_images=0)
result = generate(model, processor, prompt, max_tokens=500)
print(result.text)

# Vision
prompt = apply_chat_template(processor, config, "Describe this image", num_images=1)
result = generate(model, processor, prompt, max_tokens=500, image=["image.jpg"])
print(result.text)
```
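Because the base model was distilled to reproduce chain-of-thought reasoning, generated text may include a reasoning section before the final answer. The helper below is a sketch that assumes the reasoning is wrapped in `<think>...</think>` tags, as in other Qwen3-family chat templates; inspect `result.text` from your own runs before relying on it. `split_reasoning` is a hypothetical helper, not part of mlx-vlm.

```python
def split_reasoning(text: str, open_tag: str = "<think>", close_tag: str = "</think>") -> tuple[str, str]:
    """Split generated text into (reasoning, answer).

    Assumes at most one reasoning block ending in `close_tag`. The opening tag
    may be absent (some templates put it in the prompt). If no closing tag is
    found, the whole text is returned as the answer.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        return "", text.strip()
    reasoning = head.split(open_tag, 1)[-1]  # drop anything before the opening tag
    return reasoning.strip(), tail.strip()


reasoning, answer = split_reasoning("<think>Let u = x.</think>The integral is x.")
print(answer)  # The integral is x.
```

If generation stops at `max_tokens` before the closing tag, the helper returns everything as the answer, so a larger `max_tokens` may be needed for long reasoning traces.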

## Model Family

| Model | Size | Bits/Weight | Notes |
|---|---|---|---|
| [andrevp/Qwen3.5-2B-Distilled-OPUS-Heretic-MLX-VLM-fp16](https://huggingface.co/andrevp/Qwen3.5-2B-Distilled-OPUS-Heretic-MLX-VLM-fp16) | ~4 GB | 16 | 2B, best quality |
| [andrevp/Qwen3.5-2B-Distilled-OPUS-Heretic-MLX-VLM-8bit](https://huggingface.co/andrevp/Qwen3.5-2B-Distilled-OPUS-Heretic-MLX-VLM-8bit) | ~2.1 GB | 8 | 2B, balanced |
| [andrevp/Qwen3.5-2B-Distilled-OPUS-Heretic-MLX-VLM-4bit](https://huggingface.co/andrevp/Qwen3.5-2B-Distilled-OPUS-Heretic-MLX-VLM-4bit) | ~1.2 GB | 4 | 2B, smallest |
| [andrevp/Qwen3.5-9B-Distilled-OPUS-Heretic-MLX-VLM-fp16](https://huggingface.co/andrevp/Qwen3.5-9B-Distilled-OPUS-Heretic-MLX-VLM-fp16) | ~18 GB | 16 | 9B, best quality |
| **andrevp/Qwen3.5-9B-Distilled-OPUS-Heretic-MLX-VLM-8bit** | ~9.8 GB | 8.864 | **This model** |
| [andrevp/Qwen3.5-9B-Distilled-OPUS-Heretic-MLX-VLM-4bit](https://huggingface.co/andrevp/Qwen3.5-9B-Distilled-OPUS-Heretic-MLX-VLM-4bit) | ~5.6 GB | 5.059 | 9B, smallest |
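The Bits/Weight column is an effective average, which is why it can exceed the nominal bit width. A rough sketch of the arithmetic, assuming MLX's default group-wise affine quantization (one 16-bit scale and one 16-bit bias per group of 64 weights) and decimal gigabytes: the group metadata alone adds 0.5 bits per weight, and tensors kept at higher precision would plausibly account for the remainder. Back-solving a parameter count from the table's size and bits/weight gives consistent values for the 9B 8-bit and 4-bit rows.

```python
def groupwise_bits_per_weight(bits: int, group_size: int = 64, scale_bits: int = 16) -> float:
    """Effective bits/weight when each group stores one scale and one bias (assumed layout)."""
    return bits + 2 * scale_bits / group_size


def implied_params(size_gb: float, bits_per_weight: float) -> float:
    """Parameter count implied by a file size (decimal GB) and an effective bits/weight."""
    return size_gb * 1e9 * 8 / bits_per_weight


print(groupwise_bits_per_weight(8))  # 8.5
print(groupwise_bits_per_weight(4))  # 4.5

# Sizes and bits/weight taken from the table above (9B quantized variants).
for size_gb, bpw in [(9.8, 8.864), (5.6, 5.059)]:
    print(f"{size_gb} GB at {bpw} bpw -> ~{implied_params(size_gb, bpw) / 1e9:.2f}B params")
```

Both rows land at roughly 8.85 billion parameters, so the listed sizes and bits/weight figures agree with each other.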

## Credits

- Base distillation: [Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled](https://huggingface.co/Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled)
- Abliteration technique: Arditi et al., "Refusal in Language Models Is Mediated by a Single Direction" (2024)
- MLX-VLM framework: [MLX-VLM](https://github.com/Blaizzy/mlx-vlm) by Prince Canuma, built on Apple's MLX