How to use from the
Use from the
Transformers library
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("AlexWortega/qwen3.5-4b-abliterated-agent-20260515", dtype="auto")
Quick Links

Qwen3.5-4B Abliterated for Agent Use (2026-05-15)

Refusal-direction weight-orthogonalization applied to base Qwen/Qwen3.5-4B following the NousResearch/llm-abliteration / mlabonne abliteration blog recipe. The "I cannot execute commands" axis was extracted by contrasting 50 bare shell-execution prompts vs 50 same prompts with full agent system prompt; this direction was orthogonalized from embed_tokens.weight plus every block's o_proj/out_proj + mlp.down_proj weights (64 projections + 1 embedding modified).

Results — apples-to-apples on full terminal-bench-2 (89 tasks, N=1, sglang)

model passes / 89 rate
Qwen3.5-4B base (sglang, patched parser) 6 6.7 %
abliterated (this repo) 7 7.9 %
SFT LoRA reference (historic) 3 3.4 %

Fisher's exact 7/89 vs 6/89: p = 0.50 — null on aggregate.

But the per-task pattern is informative:

abl ∩ base   : git-leak-recovery, kv-store-grpc, modernize-scientific-stack  (3)
abl only     : fix-git, hf-model-inference, log-summary-date-ranges, qemu-startup  (4)
base only    : build-pmars, portfolio-optimization, sqlite-with-gcov  (3)

Abliteration redistributes which tasks the model solves rather than lifting the count. Net +1 well within noise.

Smoke test (qualitative)

Without an agent system prompt, base Qwen3.5-4B replies:

"I'm an AI assistant, I cannot access local filesystem."

Abliterated replies:

"Let me use ls to list the contents of /tmp."

The refusal pattern is gone. But this doesn't translate to a larger task-completion delta — see results.

How to use

import torch
from transformers import AutoTokenizer, AutoModelForImageTextToText

# Drop-in replacement for Qwen3.5-4B
model = AutoModelForImageTextToText.from_pretrained(
    'AlexWortega/qwen3.5-4b-abliterated-agent-20260515',
    dtype=torch.bfloat16, device_map={'':0})
tokenizer = AutoTokenizer.from_pretrained('AlexWortega/qwen3.5-4b-abliterated-agent-20260515')
# Architecture is unchanged — sglang loads it as a regular Qwen3.5 checkpoint.

GGUF for CPU inference

Pre-quantized GGUF files are bundled under gguf/, benchmarked on AMD EPYC 7402P (24-core Zen 2):

Quant Size tg (decode) pp (prefill) use
qwen3.5-4b-abl.Q4_0.gguf 2.4 GB 20.8 t/s @ 16t 115 t/s @ 16t max throughput
qwen3.5-4b-abl.Q4_K_M.gguf 2.6 GB 19.7 t/s @ 16t 136 t/s @ 24t best balance
qwen3.5-4b-abl.Q5_K_M.gguf 2.9 GB 17.3 t/s @ 16t 91 t/s @ 24t better quality
qwen3.5-4b-abl.Q8_0.gguf 4.2 GB 15.5 t/s @ 16t 110 t/s @ 24t near-lossless

Full benchmark including IQ4_XS, Q6_K, F16 in gguf/BENCH.md.

Quick CPU usage

~/llama.cpp/build/bin/llama-cli -m qwen3.5-4b-abl.Q4_0.gguf -t 16 \
    -p "Your prompt" -n 128 -no-cnv

Use -dev none if your build has CUDA support but you want pure CPU.

What's in this repo

  • abliterated_model/ — 8.5 GB safetensors + tokenizer (drop-in for base Qwen3.5-4B)
  • vectors/refusal_dir.pt — direction tensor at L=22 with metadata
  • vectors/refusal_ranking.csv — per-layer AUC (all layers 1.000)
  • contrast_refuse.jsonl, contrast_comply.jsonl — 50+50 contrast prompts
  • scripts/build_contrast.py, capture_refusal.py, compute_refusal_dir.py, abliterate.py, serve_abliterated.py, full_bench_parallel.sh
  • results/abliterated_full/ — full per-task traces + rewards (89 tasks)
  • results/base_matched_infra/ — same for BASE through identical infra (control)
  • RESULTS.md, RESULTS_APPLES.md, VERIFY.md — full report bundle

How it was made (quick recipe)

  1. Build 50 refusal prompts (bare shell requests) and 50 compliance prompts (same requests with agent system prompt). scripts/build_contrast.py.
  2. Forward base Qwen3.5-4B on each prompt, capture residual at last token of each. scripts/capture_refusal.py.
  3. Compute dir_L = normalize(μ_refuse − μ_comply) per layer. AUC=1.0 on every layer. Pick L=22 (mid-depth, matches v2 framing). scripts/compute_refusal_dir.py.
  4. Orthogonalize: W_new = W − r r^T W for embed_tokens.weight (rows) and every block's output projections (columns). scripts/abliterate.py.
  5. Smoke test (scripts/smoke_abliterated.py) — confirm coherent generation and refusal removal on bare prompts.
  6. Eval via docker sweep against sglang serving this model. scripts/full_bench_parallel.sh.

Caveats

  • N=1 per task. 7/89 vs 6/89 is suggestive at best; per-task pattern is the load-bearing observation.
  • The terminus parser was patched mid-experiment to accept split-JSON output ({"analysis":...} + {"command":...} on separate lines); this patch lifted the base from 3/89 → 6/89, more than abliteration itself contributed.
  • Same direction applied uniformly to all layers (mlabonne convention). Per-layer tuning may yield further gains.
  • Modifies output projections + embeddings only; does NOT touch Q/K/V.
Downloads last month
200
GGUF
Model size
4B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

4-bit

5-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for AlexWortega/qwen3.5-4b-abliterated-agent-20260515

Finetuned
Qwen/Qwen3.5-4B
Quantized
(207)
this model