How to use from the llama-cpp-python library
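# !pip install llama-cpp-python

from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub and load it.
llm = Llama.from_pretrained(
    repo_id="TNT3530/Qwen3.5-122B-A10B-abliterated-GGUF",
    filename="",  # left empty in the source; set this to a GGUF file in the repo
)

# Run a single-turn chat completion and print the reply text.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)
print(response["choices"][0]["message"]["content"])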

Qwen3.5-122B-A10B-abliterated

An abliterated version of Qwen/Qwen3.5-122B-A10B with the refusal direction removed.

Abliteration Details

  • Method: Refusal direction projection removal (Arditi et al., 2024)
  • Layers ablated: 5 (layers 43-47, covering both self_attn and linear_attn/Mamba layers)
  • Tensors modified: 10 (o_proj/out_proj + q_proj/in_proj_qkv per layer)
  • Alpha: 1.0 (full removal)
  • Measurement: 64 harmful + 64 harmless prompts, strongest refusal signal at layer 48 (score: 78.4)
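
The projection-removal step listed above can be illustrated with a small sketch. This is not the script used to produce this model. It assumes the refusal direction is the normalized difference of mean activations between harmful and harmless prompts (the approach described by Arditi et al.), and that a weight matrix W writing into the residual stream is replaced by W - alpha * r * (r^T W). The function names, toy dimensions, and random data are hypothetical.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def refusal_direction(harmful_acts, harmless_acts):
    """Unit-norm difference of means (harmful minus harmless)."""
    mu_harmful = mean_vector(harmful_acts)
    mu_harmless = mean_vector(harmless_acts)
    diff = [a - b for a, b in zip(mu_harmful, mu_harmless)]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def ablate_output_weight(W, r, alpha=1.0):
    """Return W - alpha * r (r^T W) for W of shape (d_model, d_in).

    With alpha = 1.0 the layer can no longer write along r.
    """
    d_model, d_in = len(W), len(W[0])
    # r^T W: how strongly each input column writes along r
    r_t_w = [sum(r[i] * W[i][j] for i in range(d_model)) for j in range(d_in)]
    return [
        [W[i][j] - alpha * r[i] * r_t_w[j] for j in range(d_in)]
        for i in range(d_model)
    ]


# Toy data (hypothetical): 64 + 64 activation vectors of width 6.
random.seed(0)
d_model, d_in = 6, 4
harmful = [[random.gauss(1.0, 1.0) for _ in range(d_model)] for _ in range(64)]
harmless = [[random.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(64)]

r = refusal_direction(harmful, harmless)
W = [[random.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_model)]
W_ablated = ablate_output_weight(W, r, alpha=1.0)

# Largest remaining component of the ablated weights along r.
residual = max(
    abs(sum(r[i] * W_ablated[i][j] for i in range(d_model)))
    for j in range(d_in)
)
```

For tensors that read from the residual stream (such as q_proj or in_proj_qkv), an analogous projection would presumably be applied on the input side instead; the source does not spell out that detail.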

Architecture

  • Type: Mixture of Experts (MoE) + Mamba hybrid attention
  • Total params: 122B
  • Active params: 10B per token (8/256 experts routed + 1 shared)
  • Context: 262K tokens native
  • Layers: 48 (12 self_attn + 36 linear_attn/Mamba)
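
To make the "8/256 experts routed + 1 shared" figure concrete, here is a toy sketch of top-k expert routing. It assumes a softmax over all router logits followed by renormalization over the selected experts; whether this matches the model's exact router is an assumption, and the logits are random placeholders.

```python
import math
import random

NUM_EXPERTS = 256  # routed experts per MoE layer (from the list above)
TOP_K = 8          # routed experts selected per token
NUM_SHARED = 1     # shared expert that is always active


def route(logits, k=TOP_K):
    """Return {expert_index: weight} for the k highest-scoring experts."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    # Renormalize so the selected experts' weights sum to 1.
    return {i: probs[i] / norm for i in top}


random.seed(0)
router_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route(router_logits)

experts_per_token = len(weights) + NUM_SHARED  # routed + shared
active_fraction = 10 / 122                     # active / total params, in billions
```

Only the selected experts run for a given token, which is why roughly 10B of the 122B parameters (about 8%) are active per token.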

Usage

Compatible with vLLM, transformers, and other inference frameworks that support the Qwen3.5 MoE architecture.

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Chompa1422/Qwen3.5-122B-A10B-abliterated",
    device_map="auto",
    trust_remote_code=True,
    dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained("Chompa1422/Qwen3.5-122B-A10B-abliterated")
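
The snippet above only loads the model. A minimal sketch of a generation call might look like the following, assuming the tokenizer ships a chat template and that enough GPU memory is available for the bfloat16 weights. The helper names `build_messages` and `chat` and the `max_new_tokens` default are illustrative, not part of the original card.

```python
def build_messages(prompt):
    """Wrap a single user prompt in the chat-message format."""
    return [{"role": "user", "content": prompt}]


def chat(prompt,
         model_id="Chompa1422/Qwen3.5-122B-A10B-abliterated",
         max_new_tokens=128):
    """Load the model, generate a reply to `prompt`, and return the reply text."""
    # Imported lazily so the helpers can be defined without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        trust_remote_code=True,
        dtype="bfloat16",
    )

    # Render the conversation with the model's chat template and tokenize it.
    inputs = tokenizer.apply_chat_template(
        build_messages(prompt),
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the newly generated tokens, dropping the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `print(chat("What is the capital of France?"))` would download the weights on first use and print the model's reply.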

Disclaimer

This model is intended for authorized security testing, CTF competitions, and educational purposes only.

GGUF

  • Model size: 122B params
  • Architecture: qwen35moe
  • Available quantization: 4-bit
