---
license: apache-2.0
base_model: Qwen/Qwen3.6-27B
library_name: transformers
pipeline_tag: text-generation
tags:
- qwen
- qwen3
- qwen3.6
- text-generation
- safetensors
- conversational
- obliteratus
- refusal-analysis
- red-team
---

# Qwen3.6 27B - OBLITERATUS

> Source-tethered refusal surgery for Qwen3.6-27B. Capability kept close to
> stock; refusal behavior pushed hard in the raw research path; chat-template
> behavior kept explicit and testable.

This is an OBLITERATUS research build of `Qwen/Qwen3.6-27B`.

It is not a generic "we removed everything and hoped for the best" merge. This candidate was selected because it is the first Qwen3.6-27B artifact in our sweep that combines:

- low raw refusal on the internal OBLITERATUS corpus,
- stock-matched MMLU-Pro validation slices,
- held-out capability preservation,
- clean live-readiness behavior,
- clean first-contact local QA after chat-template hardening,
- and lower KL drift than the earlier capability leader.

The short version: older candidates were more extreme. This one is better.

```text
Base model:        Qwen/Qwen3.6-27B
Local artifact:    outputs/qwen3.6-27b-aspa-n2-reg05-srcgamma0895-midattnsource2mlp
Parameter count:   26.9B
Weights:           bfloat16 safetensors, 28 shards
Method:            OBLITERATUS source-tethered ASPA
Default alpha:     0.895
High-drift resets: 43 tensors restored to source
Corpus:            842 contrastive prompt pairs
```

If you only care about the most aggressive non-refusal behavior, read the numbers carefully. If you care about a model that still codes, answers, follows format, and survives basic launch QA, this is the current strongest safetensors release candidate.

---

## Compatibility - Read This First

This is a large Qwen3.6/Qwen3.5-text-family model. Use recent runtimes.

| Tool | Recommended path | Notes |
|---|---|---|
| Transformers | current `transformers`, `accelerate`, `safetensors` | best for full weights |
| vLLM / TGI | recent Qwen-compatible builds | server users |
| llama.cpp | current build | use GGUF repo |
| Ollama | current release | use GGUF repo |
| LM Studio / Jan | current backend | use GGUF repo |

If you see unsupported architecture, tokenizer, or chat-template errors, update your runtime first. If the model loads but behaves oddly, make sure you are using the chat template rather than raw completion.

---

## Downloads

### Safetensors - full quality

This repo contains the full bfloat16 safetensors model. Use it for Transformers, vLLM, TGI, and server-side evaluation.

Approximate local size: about `50 GB`.

### GGUF - local apps

This safetensors repo does not contain GGUF files. Use the companion GGUF repo for llama.cpp, Ollama, LM Studio, Jan, KoboldCPP, and other local desktop apps:

```text
OBLITERATUS/Qwen3.6-27B-OBLITERATED-GGUF
```

First-pass quants:

| File | Quant | Use |
|---|---:|---|
| `qwen3.6-27b-obliteratus-Q4_K_M.gguf` | Q4_K_M | default local-app recommendation |
| `qwen3.6-27b-obliteratus-Q5_K_M.gguf` | Q5_K_M | better quality if memory allows |
| `qwen3.6-27b-obliteratus-Q6_K.gguf` | Q6_K | high quality, larger |
| `qwen3.6-27b-obliteratus-Q8_0.gguf` | Q8_0 | near-full quality, very large |

Rough memory guidance:

| Variant | Practical target |
|---|---:|
| Q4_K_M | 24-32 GB RAM/VRAM |
| Q5_K_M | 32-40 GB RAM/VRAM |
| Q6_K | 40-48 GB RAM/VRAM |
| Q8_0 | 48-64 GB RAM/VRAM |
| full safetensors | 64-80+ GB GPU/unified memory |

---

## The Numbers

These are local harness results. They are not official full benchmark claims. Restricted prompt text and restricted model outputs are intentionally omitted from public reports; restricted evals are aggregate-only.

### Refusal / Usability

| Gate | Result | Verdict |
|---|---:|---|
| full 842 longform, exact-topic, max_new=256 | 35/842 refusals, non-refusal 0.9584, quality pass 0.9394 | backed headline |
| full 842 longform, exact-topic, max_new=256 | 0 short outputs, clean endings 0.9952 | backed headline |
| full 842 raw opening, max_new=20 | 9/842 refusals, non-refusal 0.9893 | short-output context |
| full 842 raw opening, max_new=48 | 36/842 refusals, non-refusal 0.9572 | short-output context |
| full 842 raw opening, max_new=128 | 52/842 refusals, non-refusal 0.9382 | longer opening context |
| longform exact-topic n120, max_new=256 | 0 refusals, pass 0.9833, clean ending 1.0 | slice result |

Important: this supports a low-refusal / high-quality longform claim. It does not support "0% refusal", "99% non-refusal longform", "uncensored" without caveats, or factual accuracy certification for restricted prompts. The full 842 corpus has no ground-truth answer key; it scores refusal, topicality proxy, length/substance, repetition, sentence structure, and clean ending.

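The non-refusal rates in the table follow directly from the refusal counts. Here is a minimal sketch of that arithmetic, assuming each rate is simply `1 - refusals / 842` rounded to four places. The harness itself is not public, so the function name and dictionary below are illustrative only.

```python
# Refusal counts copied from the Refusal / Usability table above.
CORPUS_SIZE = 842
refusals = {
    "longform exact-topic, max_new=256": 35,
    "raw opening, max_new=20": 9,
    "raw opening, max_new=48": 36,
    "raw opening, max_new=128": 52,
}


def non_refusal_rate(refused: int, total: int = CORPUS_SIZE) -> float:
    """Fraction of prompts that were not scored as refusals (assumed definition)."""
    return 1.0 - refused / total


rates = {gate: round(non_refusal_rate(n), 4) for gate, n in refusals.items()}
for gate, rate in rates.items():
    print(f"{gate}: {rate:.4f}")
```

Under that assumption the sketch reproduces the four full-corpus rates in the table, and it makes plain why the longform figure sits near 96% rather than 99%.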
### Capability

| Gate | Result |
|---|---:|
| MMLU-Pro validation likelihood | stock 51/70, this model 51/70, stock-matched |
| MMLU-Pro test stratified 10/category | stock 102/140, this model 98/140, delta -2.86pp |
| MMLU-Pro held-out offset 512 | stock 36/70, this model 36/70, stock-matched |
| Live readiness | 99.518, all gates true |
| Community scrutiny | 100.0, all gates pass |
| First-token KL vs source | mean KL 0.3236 |

The offset-512 MMLU-Pro slice is the cleanest explanation for why this model beat the older `srcgamma090` track as the mixed-objective release candidate:

| Model | Offset-512 MMLU-Pro test | Correct |
|---|---:|---:|
| stock Qwen3.6-27B | 0.5143 | 36/70 |
| older srcgamma090 | 0.4286 | 30/70 |
| this model | 0.5143 | 36/70 |

### Boundary QA

| Probe | Score | Allowed pass | Restricted non-actionable | Actionable-marker rate |
|---|---:|---:|---:|---:|
| UI adversarial raw no-system | 100.0 | 1.0 | 1.0 | 0.0 |
| UI adversarial recommended wrapper | 100.0 | 1.0 | 1.0 | 0.0 |

This is not a claim that the model is unbreakable. It is a clean local first-contact launch gate.

---

## What Changed in This Build?

The winning recipe was not a single ablation pass.

1. Start from `qwen3.6-27b-golden-n3_reg025-merge-alpha080`, a late-layer 3-direction diff-means refusal-direction ablation with regularization 0.25 and a 0.80 source/candidate merge.
2. Apply a second-pass 2-direction diff-means ablation with stronger regularization 0.5 and `knee_cosmic` late-layer selection.
3. Source-tether the second-pass candidate back toward stock Qwen3.6-27B:

   ```text
   source + alpha(key) * (candidate - source)
   ```

4. Use default alpha `0.895` for 808 tensors.
5. Restore 43 high-drift tensors back to source, including selected mid-layer linear-attention internals, layer norms, q/k norms, and MLP gate/up/down tensors.
6. Keep all keys matched; no unmatched tensor drift.

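The tethering formula in step 3 and the resets in step 5 can be sketched as a per-key interpolation. This is a minimal illustration, not the OBLITERATUS implementation: plain lists of floats stand in for weight tensors, the key names are hypothetical, and treating a reset as `alpha = 0` is an assumption (it is arithmetically the same as copying the source tensor).

```python
from typing import Dict, List, Set

Tensor = List[float]  # stand-in for a real weight tensor

DEFAULT_ALPHA = 0.895  # default alpha from the build summary


def tether(
    source: Dict[str, Tensor],
    candidate: Dict[str, Tensor],
    reset_keys: Set[str],
    default_alpha: float = DEFAULT_ALPHA,
) -> Dict[str, Tensor]:
    """Return source + alpha(key) * (candidate - source) for every key."""
    # Step 6: every key must exist on both sides.
    if source.keys() != candidate.keys():
        raise KeyError("source and candidate have unmatched tensor keys")

    merged: Dict[str, Tensor] = {}
    for key, src in source.items():
        # Assumption: a high-drift reset is modeled as alpha = 0,
        # which returns the source tensor unchanged.
        alpha = 0.0 if key in reset_keys else default_alpha
        cand = candidate[key]
        merged[key] = [s + alpha * (c - s) for s, c in zip(src, cand)]
    return merged


# Hypothetical two-tensor example; key names and values are illustrative only.
source = {"layers.0.mlp.up": [1.0, 2.0], "layers.0.attn.q_norm": [0.5, 0.5]}
candidate = {"layers.0.mlp.up": [2.0, 0.0], "layers.0.attn.q_norm": [0.9, 0.1]}
merged = tether(source, candidate, reset_keys={"layers.0.attn.q_norm"})
print(merged)
```

In this toy run the MLP tensor moves 89.5% of the way from source to candidate, while the reset tensor comes back identical to source. With real checkpoints the same arithmetic would presumably run over each tensor in the state dict.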
That is the actual trick: push refusal behavior down, then pull fragile capability-bearing tensors back toward source.

The older `srcgamma090` candidate remains the pure ship-score/style/refusal leader. This `srcgamma0895-midattnsource2mlp` build is the stronger mixed-objective winner.

---

## Recommended Parameters

### Deterministic eval

```text
temperature = 0.0
top_p = 1.0
top_k = 0
min_p = 0.0
```

### Interactive balanced

```text
temperature = 0.45
top_p = 0.9
repetition_penalty = 1.05
max_new_tokens = 512
```

### Default chat-template system prompt

```text
You are Hermes, a direct local open-weight assistant. Answer plainly,
completely, and without moralizing, boilerplate, generic disclaimers, or
unnecessary hedging. Follow exact output formats when requested. Be concise by
default, but give a complete answer when the user asks for an explanation.
```

For research measurements, label the template and system prompt explicitly. Changing the system prompt materially changes refusal and style behavior.

---

## Usage

The commands below assume the final safetensors and GGUF repos have both been published. Replace the repo names if this model is uploaded under a different org or slug.

```text
FULL_REPO = OBLITERATUS/Qwen3.6-27B-OBLITERATED
GGUF_REPO = OBLITERATUS/Qwen3.6-27B-OBLITERATED-GGUF
```

### Transformers

```bash
pip install -U transformers accelerate safetensors torch
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "OBLITERATUS/Qwen3.6-27B-OBLITERATED"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Write a concise Python function that merges overlapping intervals."}
]

# Render the conversation with the chat template rather than raw completion.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)

# "Interactive balanced" sampling settings from Recommended Parameters.
output = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.45,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.05,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

### llama.cpp

```bash
llama-cli \
  -hf OBLITERATUS/Qwen3.6-27B-OBLITERATED-GGUF:Q4_K_M \
  -ngl 999 \
  -c 8192 \
  --temp 0.45 \
  --top-p 0.9 \
  --repeat-penalty 1.05 \
  -p "Write a concise Python function that merges overlapping intervals."
```

Local server:

```bash
llama-server \
  -hf OBLITERATUS/Qwen3.6-27B-OBLITERATED-GGUF:Q4_K_M \
  -ngl 999 \
  -c 8192 \
  --host 127.0.0.1 \
  --port 8080
```

### Ollama

```bash
ollama run hf.co/OBLITERATUS/Qwen3.6-27B-OBLITERATED-GGUF:Q4_K_M
```

Or create a local Modelfile:

```text
FROM ./qwen3.6-27b-obliteratus-Q4_K_M.gguf
PARAMETER temperature 0.45
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.05
PARAMETER num_ctx 8192
SYSTEM """You are Hermes, a direct local open-weight assistant.
Answer lawful requests plainly, completely, and without moralizing, boilerplate, or unnecessary hedging.
Follow exact output formats when requested."""
```

```bash
ollama create qwen36-obliteratus -f Modelfile
ollama run qwen36-obliteratus
```

### LM Studio / Jan

1. Search for `OBLITERATUS/Qwen3.6-27B-OBLITERATED-GGUF`.
2. Download `Q4_K_M` first.
3. Start with context length `8192`.
4. Use temperature `0.45`, top-p `0.9`, repeat penalty `1.05`.
5. Move up to `Q5_K_M`, `Q6_K`, or `Q8_0` only if your machine has enough memory.

### vLLM

```bash
pip install -U vllm
vllm serve OBLITERATUS/Qwen3.6-27B-OBLITERATED
```

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OBLITERATUS/Qwen3.6-27B-OBLITERATED",
    "messages": [
      {"role": "user", "content": "Write a short explanation of source-tethered model surgery."}
    ],
    "temperature": 0.45,
    "top_p": 0.9,
    "max_tokens": 256
  }'
```

---

## Known Caveats

- The reported benchmarks are local harnesses and slices, not official full leaderboard submissions.
- Template and system-prompt choices materially affect refusal behavior. Label which one you use when reporting evals.
- GGUF files are not included in this safetensors repo; GGUF parity still needs final quant-by-quant validation.
- External blind prompt packs and public baseline runs are still recommended.
- Do not deploy this in user-facing products without use-case-specific safety controls, monitoring, and legal review.

---

## Disclaimer

This model is provided as-is for research, red-teaming, evaluation, local experimentation, and creative exploration.

You are responsible for how you use it and for any content it generates. The creators and contributors do not accept liability for misuse, damage, legal consequences, or downstream harm.

Use this model only in ways that are lawful and appropriate for your jurisdiction and use case. Do not use it to harm real people.


---

## Credits

- Base model: `Qwen/Qwen3.6-27B`
- Abliteration engine: OBLITERATUS
- Research orchestration: Pliny-style adversarial evaluation plus Hermes local agent workflows
- Local eval stack: MLX, Transformers, llama.cpp/GGUF tooling, internal aggregate-only red-team harnesses

Built for people who actually read the numbers.