Instructions to use OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored-MLX-4bit with libraries, inference providers, notebooks, and local apps.

Libraries

How to use OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored-MLX-4bit with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored-MLX-4bit")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored-MLX-4bit")
model = AutoModelForMultimodalLM.from_pretrained("OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored-MLX-4bit")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

vLLM

How to use OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored-MLX-4bit with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored-MLX-4bit"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored-MLX-4bit",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
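
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the server started above is listening on `localhost:8000` and returns the usual OpenAI-style response shape (`choices[0].message.content`). The helper names `build_chat_payload` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored-MLX-4bit"


def build_chat_payload(prompt, image_url=None, model=MODEL_ID):
    """Build the same JSON body that the curl example sends."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def chat(payload, base_url="http://localhost:8000"):
    """POST the payload to /v1/chat/completions and return the reply text.

    Assumes an OpenAI-compatible response body.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server):
# payload = build_chat_payload(
#     "Describe this image in one sentence.",
#     image_url="https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
# )
# print(chat(payload))
```

Passing a different `base_url` (for example port 30000) should let the same helper talk to any other OpenAI-compatible server.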

Use Docker

docker model run hf.co/OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored-MLX-4bit

SGLang

How to use OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored-MLX-4bit with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored-MLX-4bit" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored-MLX-4bit",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored-MLX-4bit" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored-MLX-4bit",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored-MLX-4bit with Docker Model Runner:
```
docker model run hf.co/OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored-MLX-4bit
```

Support & Community

☕ If these models are useful to you, consider supporting my work — it funds compute for more & larger abliterations.

buymeacoffee.com/oym.kuato

💬 Discord: discord.gg/rhUZY5GEZr · ₿ Bitcoin: bc1qsvfduzj9fjs9fugpc52yver3f2g8fp7xjxecdv

Qwopus3.5-122B-A10B-abliterated-uncensored

This model is superseded — please use the healed version. This is the older abliterated-uncensored release. The recommended replacement is Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated, which adds a Kimi K2.6 reasoning-DPO healing pass on top of this model: improved reasoning verbosity (~12% of requests) and far fewer looping / repetition failures (2–6% of long-tail conversations).

➡️ Recommended MLX 4-bit build: Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MLX-4bit

All formats: Full weights · GGUF · MLX 4-bit

Overview

Full BF16 weights of Qwopus3.5-122B-A10B-abliterated-uncensored — an abliterated and supervised-finetuned variant of Qwen/Qwen3.5-122B-A10B (Mixture of Experts, ~10B active / 122B total). The model is uncensored, multimodal (image + text), and ships with the MTP head intact so it is a drop-in replacement for the original base model at the architecture level.

The pipeline:

1. Refusal Ablation: Residual-stream refusal directions (one per decoder layer, layers 19–45) were extracted via diff-in-means on a labeled prompt set and baked into the weights as a per-matrix delta. See the abliterix framework for the methodology.
2. Healing, Stage A (constrained-LoRA SFT on Opus reasoning data): Supervised fine-tuning on a curated set of Claude Opus reasoning traces (single-turn, ~8k rows). To keep the abliteration mathematically intact during training, a custom orthogonality projection is applied to every LoRA B-matrix on residual-write modules after each optimizer step (B := B − r·(rᵀB)), so the LoRA update cannot re-introduce the refusal direction. LoRA rank 32, α 64, 54 protected modules across 27 decoder layers. Verified residual after training: max ‖rᵀB‖₂ = 8.5 × 10⁻¹⁰.
3. Healing, Stage B (unconstrained SFT on chosen completions): A second, short SFT pass (LoRA r=16, α 32, no orthogonality constraint) on the chosen answers (including reasoning chains) from an internal preference dataset, to tighten the model on the deployment distribution and remove the remaining drift introduced by Stage A.
4. Vision + MTP Restoration: The original Qwen3.5 vision tower (333 tensors, depth 27, hidden 1152) and MTP head (785 tensors, 1 hidden layer) were grafted back from the upstream Qwen/Qwen3.5-122B-A10B shards. Tensor names, shapes, and config.json schema (Qwen3_5MoeForConditionalGeneration, model_type: qwen3_5_moe) match the base model exactly, so this checkpoint loads anywhere the original loads.
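
To make the diff-in-means extraction and the Stage A projection B := B − r·(rᵀB) concrete, here is a toy sketch in plain Python. It is not the abliterix code or the author's training loop. The function names, the 3-dimensional activations, and the layout of B (one row per residual dimension, one column per LoRA rank) are assumptions chosen for illustration, and r is assumed to be normalised to unit length so that a single projection removes the whole component.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


def refusal_direction(refused_acts, complied_acts):
    """Diff-in-means direction between two activation sets, unit-normalised."""
    mu_refused = mean_vector(refused_acts)
    mu_complied = mean_vector(complied_acts)
    diff = [a - b for a, b in zip(mu_refused, mu_complied)]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def r_transpose_b(r, B):
    """Row vector r^T B for B stored as d_model rows by rank columns."""
    rank = len(B[0])
    return [sum(r_i * row[j] for r_i, row in zip(r, B)) for j in range(rank)]


def project_out(r, B):
    """Return B - r (r^T B), removing the component of B along r."""
    rtb = r_transpose_b(r, B)
    return [[b_ij - r_i * rtb[j] for j, b_ij in enumerate(row)]
            for r_i, row in zip(r, B)]


def residual_norm(r, B):
    """2-norm of r^T B, the quantity reported as the verified residual."""
    return math.sqrt(sum(x * x for x in r_transpose_b(r, B)))


# Toy residual-stream activations (hypothetical numbers, d_model = 3).
refused = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
complied = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
r = refusal_direction(refused, complied)

# Toy LoRA B-matrix with rank 2 (hypothetical numbers).
B = [[0.5, -1.0], [2.0, 3.0], [1.0, 0.0]]
B_projected = project_out(r, B)

print("direction:", r)
print("residual before:", residual_norm(r, B))
print("residual after:", residual_norm(r, B_projected))
```

In a real run the projection would be repeated after every optimizer step for each protected module, since each gradient update can move B back toward r.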

Key Properties:

Uncensored across the standard refusal axes
Reasoning preserved (Opus-style think-then-answer)
Multimodal: vision (image / video) and MTP heads carried forward
Drop-in shape compatibility with Qwen/Qwen3.5-122B-A10B

Files

| File | Description | Size |
| --- | --- | --- |
| `model-*-of-00028.safetensors` | BF16 language model weights (48 decoder layers, MoE with 256 routed experts + shared expert per layer) | ~228 GB |
| `model-visual-00001.safetensors` | BF16 vision tower (333 tensors) | ~0.9 GB |
| `model-mtp-00001.safetensors` | BF16 MTP head (785 tensors) | ~5.1 GB |
| `model.safetensors.index.json` | Combined weight map (38,717 tensors) | — |
| `config.json` | Multimodal config (`Qwen3_5MoeForConditionalGeneration`) | — |
| `tokenizer*`, `chat_template.jinja`, `generation_config.json` | Standard | — |

Total on disk: ~234 GB.
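
The tensor counts above can be checked against the weight map. The sketch below assumes `model.safetensors.index.json` follows the usual sharded-checkpoint layout, with a `weight_map` from tensor name to shard file and an optional `metadata.total_size` in bytes. `summarize_index` is an illustrative helper, and the toy index uses made-up tensor names and sizes.

```python
import json
from collections import Counter


def summarize_index(index):
    """Count tensors overall and per shard file in a parsed index."""
    weight_map = index["weight_map"]
    return {
        "tensor_count": len(weight_map),
        "tensors_per_file": dict(Counter(weight_map.values())),
        "total_bytes": index.get("metadata", {}).get("total_size"),
    }


# Toy index with hypothetical tensor names, only to show the layout.
toy_index = {
    "metadata": {"total_size": 48},
    "weight_map": {
        "toy.language.a": "model-00001-of-00028.safetensors",
        "toy.language.b": "model-00001-of-00028.safetensors",
        "toy.visual.a": "model-visual-00001.safetensors",
        "toy.mtp.a": "model-mtp-00001.safetensors",
    },
}

summary = summarize_index(toy_index)
print(json.dumps(summary, indent=2))
```

Against the real file, load it with `json.load(open("model.safetensors.index.json"))`. If the table holds, `tensor_count` should be 38,717, with 333 entries under `model-visual-00001.safetensors` and 785 under `model-mtp-00001.safetensors`.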

Usage

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

repo = "OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored"
# Load the BF16 weights and let device_map="auto" spread them over the available accelerators.
model = AutoModelForImageTextToText.from_pretrained(repo, dtype="bfloat16", device_map="auto")
processor = AutoProcessor.from_pretrained(repo)

# One user turn containing an image followed by a text instruction.
messages = [{"role": "user", "content": [
    {"type": "image", "url": "path/to/image.jpg"},
    {"type": "text",  "text": "Describe this image in detail."},
]}]
# Apply the chat template and tokenize in a single call.
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_tensors="pt", return_dict=True,
).to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
# `out` holds the prompt tokens followed by the generated tokens, so the decoded text includes the prompt.
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```

Text-only inference works through the same class; if you don't need vision/MTP, you can also load just the language model with AutoModelForCausalLM.

Hardware

Full BF16 weights — fits comfortably on 2× H200 or 4× H100 (80 GB) with room for context. Single-node inference targets ≥ 130 GB total accelerator memory. For Apple Silicon, see the upcoming MLX quants.
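
As a rough sanity check on these figures, weight memory can be estimated from the parameter count and the bits per parameter. The sketch below is back-of-the-envelope arithmetic under stated assumptions: 122e9 parameters taken from the model name, 141 GB per H200 (a published spec, not stated in this card), and no allowance for KV cache, activations, or quantization scales. The helper name `weight_gb` is illustrative.

```python
TOTAL_PARAMS = 122e9  # assumption: "122B" from the model name


def weight_gb(num_params, bits_per_param):
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


bf16_gb = weight_gb(TOTAL_PARAMS, 16)
four_bit_gb = weight_gb(TOTAL_PARAMS, 4)

# Even split of the BF16 weights across devices, ignoring all runtime overhead.
setups = {"2x H200": (2, 141.0), "4x H100": (4, 80.0)}
for name, (count, capacity_gb) in setups.items():
    per_device = bf16_gb / count
    headroom = capacity_gb - per_device
    print(f"{name}: {per_device:.0f} GB weights per device, {headroom:.0f} GB headroom")

print(f"BF16 weights: {bf16_gb:.0f} GB, 4-bit weights: {four_bit_gb:.0f} GB")
```

The BF16 estimate of about 244 GB lands near the on-disk total in the Files section. The 4-bit figure is a lower bound, since real quantized checkpoints also store scales and usually keep some tensors at higher precision.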

Notes

License: Other (inherits from the Qwen3.5 base license)
Base Model: Qwen/Qwen3.5-122B-A10B
Healing: Supervised fine-tuning on selected Opus training datasets
Modality: Text + Vision (image / video) + MTP
Architecture: Qwen3 MoE (~10B active / 122B total) + Qwen3-VL vision tower + MTP head

Thanks

Jackrong — for the idea of Qwopus merges (Opus distillations on Qwen models).
wangzhang — for the wonderful abliterix framework, which was customized to do this abliteration.

Disclaimer

Use is the responsibility of the user. Ensure your usage complies with applicable laws, platform rules, and deployment requirements.

Downloads last month: 88

Safetensors
Model size: 122B params
Tensor type: BF16 · U32

Model tree for OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored-MLX-4bit
Base model: Qwen/Qwen3.5-122B-A10B
Quantized (121): this model