---
license: apache-2.0
base_model: Qwen/Qwen3.6-27B
library_name: transformers
pipeline_tag: text-generation
tags:
  - qwen
  - qwen3
  - qwen3.6
  - text-generation
  - safetensors
  - conversational
  - obliteratus
  - refusal-analysis
  - red-team
---

# Qwen3.6 27B - OBLITERATUS

> Source-tethered refusal surgery for Qwen3.6-27B. Capability kept close to
> stock; refusal behavior pushed hard in the raw research path; chat-template
> behavior kept explicit and testable.

This is an OBLITERATUS research build of `Qwen/Qwen3.6-27B`.

It is not a generic "we removed everything and hoped for the best" merge. This
candidate was selected because it is the first Qwen3.6-27B artifact in our
sweep that combines:

- low raw refusal on the internal OBLITERATUS corpus,
- stock-matched MMLU-Pro validation slices,
- held-out capability preservation,
- clean live-readiness behavior,
- clean first-contact local QA after chat-template hardening,
- and lower KL drift than the earlier capability leader.

The short version: older candidates pushed refusal harder. This one gives up a
little of that to hold capability at stock level.

```text
Base model:          Qwen/Qwen3.6-27B
Local artifact:      outputs/qwen3.6-27b-aspa-n2-reg05-srcgamma0895-midattnsource2mlp
Parameter count:     26.9B
Weights:             bfloat16 safetensors, 28 shards
Method:              OBLITERATUS source-tethered ASPA
Default alpha:       0.895
High-drift resets:   43 tensors restored to source
Corpus:              842 contrastive prompt pairs
```

If you only care about the most aggressive non-refusal behavior, read the
numbers carefully. If you care about a model that still codes, answers, follows
format, and survives basic launch QA, this is the current strongest
safetensors release candidate.

---

## Compatibility - Read This First

This is a large Qwen3.6/Qwen3.5-text-family model. Use recent runtimes.

| Tool | Recommended path | Notes |
|---|---|---|
| Transformers | current `transformers`, `accelerate`, `safetensors` | best for full weights |
| vLLM / TGI | recent Qwen-compatible builds | for serving the full weights |
| llama.cpp | current build | use GGUF repo |
| Ollama | current release | use GGUF repo |
| LM Studio / Jan | current backend | use GGUF repo |

If you see unsupported architecture, tokenizer, or chat-template errors, update
your runtime first. If the model loads but behaves oddly, make sure you are
using the chat template rather than raw completion.

---

## Downloads

### Safetensors - full quality

This repo contains the full bfloat16 safetensors model. Use it for
Transformers, vLLM, TGI, and server-side evaluation.

Approximate local size: about `50 GB`.

### GGUF - local apps

This safetensors repo does not contain GGUF files. Use the companion GGUF repo
for llama.cpp, Ollama, LM Studio, Jan, KoboldCPP, and other local desktop apps:

```text
OBLITERATUS/Qwen3.6-27B-OBLITERATED-GGUF
```

First-pass quants:

| File | Quant | Use |
|---|---:|---|
| `qwen3.6-27b-obliteratus-Q4_K_M.gguf` | Q4_K_M | default local-app recommendation |
| `qwen3.6-27b-obliteratus-Q5_K_M.gguf` | Q5_K_M | better quality if memory allows |
| `qwen3.6-27b-obliteratus-Q6_K.gguf` | Q6_K | high quality, larger |
| `qwen3.6-27b-obliteratus-Q8_0.gguf` | Q8_0 | near-full quality, very large |

Rough memory guidance:

| Variant | Practical target |
|---|---:|
| Q4_K_M | 24-32 GB RAM/VRAM |
| Q5_K_M | 32-40 GB RAM/VRAM |
| Q6_K | 40-48 GB RAM/VRAM |
| Q8_0 | 48-64 GB RAM/VRAM |
| full safetensors | 64-80+ GB GPU/unified memory |
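
As a rough cross-check of the table above, weight storage can be estimated from the parameter count and the bits per weight of each format. This is a back-of-the-envelope sketch, not a measurement of the published files: the bits-per-weight values are those of the underlying llama.cpp block formats, and the `*_K_M` mixes keep some tensors at higher precision, so real files come out somewhat larger.

```python
PARAMS = 26.9e9  # parameter count reported in this card

# Bits per weight of the underlying block formats (assumption: a uniform
# format across all tensors, which the *_K_M mixes do not strictly follow).
BITS_PER_WEIGHT = {
    "Q4_K": 4.5,
    "Q5_K": 5.5,
    "Q6_K": 6.5625,
    "Q8_0": 8.5,
    "bfloat16": 16.0,
}


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal gigabytes (10**9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name:>9}: ~{weight_gb(PARAMS, bpw):.1f} GB of weights")
```

The practical targets in the table sit above these figures because the KV cache, activations, and runtime overhead also need memory, and they grow with context length.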

---

## The Numbers

These are local harness results. They are not official full benchmark claims.
Restricted prompt text and restricted model outputs are intentionally omitted
from public reports; restricted evals are aggregate-only.

### Refusal / Usability

| Gate | Result | Verdict |
|---|---:|---|
| full 842 longform, exact-topic, max_new=256 | 35/842 refusals, non-refusal 0.9584, quality pass 0.9394 | backed headline |
| full 842 longform, exact-topic, max_new=256 | 0 short outputs, clean endings 0.9952 | backed headline |
| full 842 raw opening, max_new=20 | 9/842 refusals, non-refusal 0.9893 | short-output context |
| full 842 raw opening, max_new=48 | 36/842 refusals, non-refusal 0.9572 | short-output context |
| full 842 raw opening, max_new=128 | 52/842 refusals, non-refusal 0.9382 | longer opening context |
| longform exact-topic n120, max_new=256 | 0 refusals, pass 0.9833, clean ending 1.0 | slice result |

Important: this supports a low-refusal / high-quality longform claim. It does
not support "0% refusal", "99% non-refusal longform", "uncensored" without
caveats, or factual accuracy certification for restricted prompts. The full
842 corpus has no ground-truth answer key; it scores refusal, topicality proxy,
length/substance, repetition, sentence structure, and clean ending.
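
The non-refusal rates in the table follow directly from the refusal counts. A minimal sketch that reproduces them (the function and variable names are illustrative, not part of the OBLITERATUS harness):

```python
CORPUS_SIZE = 842

# Refusal counts copied from the table above.
refusals = {
    "longform exact-topic, max_new=256": 35,
    "raw opening, max_new=20": 9,
    "raw opening, max_new=48": 36,
    "raw opening, max_new=128": 52,
}


def non_refusal_rate(refused: int, total: int = CORPUS_SIZE) -> float:
    """Fraction of prompts that were not scored as refusals."""
    return 1 - refused / total


for gate, count in refusals.items():
    print(f"{gate}: {non_refusal_rate(count):.4f}")
```

The longform figure lands at 0.9584, which is why a "99% non-refusal longform" claim is not supported by this run.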

### Capability

| Gate | Result |
|---|---:|
| MMLU-Pro validation likelihood | stock 51/70, this model 51/70, stock-matched |
| MMLU-Pro test stratified 10/category | stock 102/140, this model 98/140, delta -2.86pp |
| MMLU-Pro held-out offset 512 | stock 36/70, this model 36/70, stock-matched |
| Live readiness | 99.518, all gates true |
| Community scrutiny | 100.0, all gates pass |
| First-token KL vs source | mean KL 0.3236 |
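
For readers unfamiliar with the first-token KL metric, the sketch below shows one way such a number can be computed. It is an illustration under stated assumptions, not the harness code: it assumes the divergence is taken as KL(source || candidate) over the full next-token distribution at the first generated position, then averaged over prompts. The card does not state the direction or the averaging scheme.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one logit vector."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_first_token_kl(
    source_logits: Sequence[Sequence[float]],
    candidate_logits: Sequence[Sequence[float]],
) -> float:
    """Average KL(source || candidate) across prompts (assumed direction)."""
    kls = [
        kl_divergence(softmax(s), softmax(c))
        for s, c in zip(source_logits, candidate_logits)
    ]
    return sum(kls) / len(kls)


# Toy logits for two prompts over a three-token vocabulary.
print(mean_first_token_kl([[2.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
                          [[1.5, 1.0, 0.5], [0.0, 0.5, 1.0]]))
```

A value of 0 means the two models agree exactly on the first-token distribution; larger values mean more drift from source.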

The offset-512 MMLU-Pro slice is the cleanest explanation for why this model
beat the older `srcgamma090` track as the mixed-objective release candidate:

| Model | Offset-512 MMLU-Pro test | Correct |
|---|---:|---:|
| stock Qwen3.6-27B | 0.5143 | 36/70 |
| older srcgamma090 | 0.4286 | 30/70 |
| this model | 0.5143 | 36/70 |
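
The accuracies and the percentage-point delta in the capability tables are plain ratios of the reported counts. A small sketch for checking them (helper names are illustrative):

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of items answered correctly."""
    return correct / total


def delta_pp(model_acc: float, stock_acc: float) -> float:
    """Difference in percentage points, model minus stock."""
    return (model_acc - stock_acc) * 100


# Stratified MMLU-Pro test slice: stock 102/140, this model 98/140.
print(f"stratified delta: {delta_pp(accuracy(98, 140), accuracy(102, 140)):.2f}pp")

# Offset-512 slice: stock 36/70, older srcgamma090 30/70, this model 36/70.
for name, correct in [("stock", 36), ("srcgamma090", 30), ("this model", 36)]:
    print(f"{name}: {accuracy(correct, 70):.4f}")
```

With 70 items per slice, one question is worth about 1.4 percentage points, so small deltas on these slices should be read with that granularity in mind.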

### Boundary QA

| Probe | Score | Allowed pass | Restricted non-actionable | Actionable-marker rate |
|---|---:|---:|---:|---:|
| UI adversarial raw no-system | 100.0 | 1.0 | 1.0 | 0.0 |
| UI adversarial recommended wrapper | 100.0 | 1.0 | 1.0 | 0.0 |

This is not a claim that the model is unbreakable. It is a clean local
first-contact launch gate.

---

## What Changed in This Build?

The winning recipe was not a single ablation pass.

1. Start from `qwen3.6-27b-golden-n3_reg025-merge-alpha080`, a late-layer
   3-direction diff-means refusal-direction ablation with regularization 0.25
   and a 0.80 source/candidate merge.
2. Apply a second-pass 2-direction diff-means ablation with stronger
   regularization 0.5 and `knee_cosmic` late-layer selection.
3. Source-tether the second-pass candidate back toward stock Qwen3.6-27B:

```text
source + alpha(key) * (candidate - source)
```

4. Use default alpha `0.895` for 808 tensors.
5. Restore 43 high-drift tensors back to source, including selected
   mid-layer linear-attention internals, layer norms, q/k norms, and MLP
   gate/up/down tensors.
6. Keep all keys matched; no unmatched tensor drift.

That is the actual trick: push refusal behavior down, then pull fragile
capability-bearing tensors back toward source.
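
The tethering in steps 3 to 6 can be sketched as a per-key linear interpolation. This is a minimal illustration, not the OBLITERATUS implementation: it uses plain Python lists in place of real tensors, the key names are made up, and it assumes that "restored to source" is equivalent to setting `alpha(key) = 0` for those keys.

```python
from typing import Dict, Iterable, List, Mapping, Sequence


def source_tether(
    source: Mapping[str, Sequence[float]],
    candidate: Mapping[str, Sequence[float]],
    default_alpha: float = 0.895,
    reset_keys: Iterable[str] = (),
) -> Dict[str, List[float]]:
    """Blend each tensor as source + alpha(key) * (candidate - source)."""
    if set(source) != set(candidate):
        # Step 6: every key must exist on both sides.
        raise KeyError(f"unmatched keys: {sorted(set(source) ^ set(candidate))}")
    reset = set(reset_keys)
    merged: Dict[str, List[float]] = {}
    for key, src in source.items():
        # Assumption: a reset tensor is the alpha = 0 case of the same formula.
        alpha = 0.0 if key in reset else default_alpha
        merged[key] = [s + alpha * (c - s) for s, c in zip(src, candidate[key])]
    return merged


# Toy two-tensor "checkpoint"; the key names are hypothetical.
source = {"mlp.up_proj.weight": [1.0, 2.0], "input_layernorm.weight": [1.0, 1.0]}
candidate = {"mlp.up_proj.weight": [2.0, 0.0], "input_layernorm.weight": [3.0, 3.0]}
merged = source_tether(source, candidate, reset_keys={"input_layernorm.weight"})
print(merged)
```

In the toy run, the layer norm comes back unchanged from source while the MLP tensor moves 89.5% of the way toward the candidate. With real checkpoints the same expression applies element-wise to each tensor in the state dict.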

The older `srcgamma090` candidate remains the pure ship-score/style/refusal
leader. This `srcgamma0895-midattnsource2mlp` build is the stronger
mixed-objective winner.

---

## Recommended Parameters

### Deterministic eval

```text
temperature = 0.0
top_p = 1.0
top_k = 0
min_p = 0.0
```

### Interactive balanced

```text
temperature = 0.45
top_p = 0.9
repetition_penalty = 1.05
max_new_tokens = 512
```
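
When using Transformers, the two presets above do not map one-to-one onto `generate()` arguments: a temperature of 0 is not a valid sampling temperature there, so the deterministic preset is normally expressed as greedy decoding with `do_sample=False`. The helper below is a sketch of that mapping. The function name is hypothetical, and `max_new_tokens=256` for the deterministic case is an assumption borrowed from the eval tables, not a value stated in the preset.

```python
from typing import Any, Dict


def generate_kwargs(
    temperature: float,
    top_p: float = 1.0,
    repetition_penalty: float = 1.0,
    max_new_tokens: int = 256,
) -> Dict[str, Any]:
    """Translate a preset into keyword arguments for model.generate()."""
    kwargs: Dict[str, Any] = {
        "max_new_tokens": max_new_tokens,
        "repetition_penalty": repetition_penalty,
    }
    if temperature == 0.0:
        # Greedy decoding: sampling parameters are left out entirely.
        kwargs["do_sample"] = False
    else:
        kwargs.update(do_sample=True, temperature=temperature, top_p=top_p)
    return kwargs


DETERMINISTIC = generate_kwargs(temperature=0.0)
INTERACTIVE = generate_kwargs(
    temperature=0.45, top_p=0.9, repetition_penalty=1.05, max_new_tokens=512
)
print(DETERMINISTIC)
print(INTERACTIVE)
```

Either dictionary can then be passed as `model.generate(**inputs, **DETERMINISTIC)`.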

### Default chat-template system prompt

```text
You are Hermes, a direct local open-weight assistant. Answer plainly, completely, and without moralizing, boilerplate, generic disclaimers, or unnecessary hedging. Follow exact output formats when requested. Be concise by default, but give a complete answer when the user asks for an explanation.
```

For research measurements, label the template and system prompt explicitly.
Changing the system prompt materially changes refusal and style behavior.
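
One lightweight way to label a run is to store a small metadata record next to the results. The sketch below is only a suggestion: the field names and the `eval_label` helper are hypothetical and not part of any OBLITERATUS tooling.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def eval_label(
    template: str,
    system_prompt: Optional[str],
    enable_thinking: bool,
    generation: Dict[str, Any],
) -> Dict[str, Any]:
    """Build a record that pins down the prompt setup used for an eval run."""
    digest = None
    if system_prompt is not None:
        # A short hash identifies the prompt without repeating it everywhere.
        digest = hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()[:12]
    return {
        "template": template,
        "system_prompt_sha256": digest,
        "enable_thinking": enable_thinking,
        "generation": generation,
    }


label = eval_label(
    template="raw-no-system",
    system_prompt=None,
    enable_thinking=False,
    generation={"do_sample": False, "max_new_tokens": 256},
)
print(json.dumps(label, indent=2))
```

Two runs with different `template` or `system_prompt_sha256` values should not be compared as if they measured the same thing.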

---

## Usage

The commands below assume the final safetensors and GGUF repos have both been
published. Replace the repo names if this model is uploaded under a different
org or slug.

```text
FULL_REPO = OBLITERATUS/Qwen3.6-27B-OBLITERATED
GGUF_REPO = OBLITERATUS/Qwen3.6-27B-OBLITERATED-GGUF
```

### Transformers

```bash
pip install -U transformers accelerate safetensors torch
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "OBLITERATUS/Qwen3.6-27B-OBLITERATED"

# Load tokenizer and weights; device_map="auto" spreads layers across
# available devices and torch_dtype="auto" keeps the checkpoint's bfloat16.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Write a concise Python function that merges overlapping intervals."}
]
# Render the conversation through the chat template instead of raw completion.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# "Interactive balanced" sampling settings with a shorter output budget.
output = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.45,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.05,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

### llama.cpp

```bash
llama-cli \
  -hf OBLITERATUS/Qwen3.6-27B-OBLITERATED-GGUF:Q4_K_M \
  -ngl 999 \
  -c 8192 \
  --temp 0.45 \
  --top-p 0.9 \
  --repeat-penalty 1.05 \
  -p "Write a concise Python function that merges overlapping intervals."
```

Local server:

```bash
llama-server \
  -hf OBLITERATUS/Qwen3.6-27B-OBLITERATED-GGUF:Q4_K_M \
  -ngl 999 \
  -c 8192 \
  --host 127.0.0.1 \
  --port 8080
```
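
Once `llama-server` is running, it can be queried through its OpenAI-compatible chat endpoint. The following is a minimal standard-library sketch, assuming the host and port from the command above; the helper names `build_chat_request` and `ask` are illustrative.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(
    prompt: str,
    base_url: str = "http://127.0.0.1:8080",
    model: Optional[str] = None,
) -> urllib.request.Request:
    """Prepare a POST to /v1/chat/completions with the interactive preset."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.45,
        "top_p": 0.9,
        "max_tokens": 256,
    }
    if model is not None:
        # Some servers require an explicit model name in the request body.
        payload["model"] = model
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, base_url: str = "http://127.0.0.1:8080") -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(ask("Write a concise Python function that merges overlapping intervals."))` should print the reply. Going through the chat endpoint also means the server applies the chat template for you.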

### Ollama

```bash
ollama run hf.co/OBLITERATUS/Qwen3.6-27B-OBLITERATED-GGUF:Q4_K_M
```

Or create a local Modelfile:

```text
FROM ./qwen3.6-27b-obliteratus-Q4_K_M.gguf

PARAMETER temperature 0.45
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.05
PARAMETER num_ctx 8192

SYSTEM """You are Hermes, a direct local open-weight assistant. Answer lawful requests plainly, completely, and without moralizing, boilerplate, or unnecessary hedging. Follow exact output formats when requested."""
```

```bash
ollama create qwen36-obliteratus -f Modelfile
ollama run qwen36-obliteratus
```

### LM Studio / Jan

1. Search for `OBLITERATUS/Qwen3.6-27B-OBLITERATED-GGUF`.
2. Download `Q4_K_M` first.
3. Start with context length `8192`.
4. Use temperature `0.45`, top-p `0.9`, repeat penalty `1.05`.
5. Move up to `Q5_K_M`, `Q6_K`, or `Q8_0` only if your machine has enough
   memory.

### vLLM

```bash
pip install -U vllm
vllm serve OBLITERATUS/Qwen3.6-27B-OBLITERATED
```

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OBLITERATUS/Qwen3.6-27B-OBLITERATED",
    "messages": [
      {"role": "user", "content": "Write a short explanation of source-tethered model surgery."}
    ],
    "temperature": 0.45,
    "top_p": 0.9,
    "max_tokens": 256
  }'
```

---

## Known Caveats

- The reported benchmarks are local harnesses and slices, not official full
  leaderboard submissions.
- Template and system-prompt choices materially affect refusal behavior. Label
  which one you use when reporting evals.
- GGUF files are not included in this safetensors repo; GGUF parity still needs
  final quant-by-quant validation.
- External blind prompt packs and public baseline runs are still recommended.
- Do not deploy this in user-facing products without use-case-specific safety
  controls, monitoring, and legal review.

---

## Disclaimer

This model is provided as-is for research, red-teaming, evaluation, local
experimentation, and creative exploration.

You are responsible for how you use it and for any content it generates. The
creators and contributors do not accept liability for misuse, damage, legal
consequences, or downstream harm.

Use this model only in ways that are lawful and appropriate for your
jurisdiction and use case. Do not use it to harm real people.

---

## Credits

- Base model: `Qwen/Qwen3.6-27B`
- Abliteration engine: OBLITERATUS
- Research orchestration: Pliny-style adversarial evaluation plus Hermes local
  agent workflows
- Local eval stack: MLX, Transformers, llama.cpp/GGUF tooling, internal
  aggregate-only red-team harnesses

Built for people who actually read the numbers.