---
license: apache-2.0
base_model: AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16
tags:
- nvfp4
- quantized
- modelopt
- mtp
- speculative-decoding
- kvtc
- vllm
- blackwell
- qwen3-vl
- abliterated
- uncensored
- vision
- multimodal
- lna-lab
- lnarize
language:
- en
- ja
- zh
pipeline_tag: image-text-to-text
---

# Qwen3.6-27B-LNARIZE-AEON-NVFP4

> **TL;DR** — `AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16` (abliterix v1.4, KL 0.000492 vs vanilla Qwen3.6-27B) put through the Lna-Lab **LNARIZE** pipeline: NVFP4 (modelopt) + 15-tensor MTP graft + KVTC + cudagraph, with full VLM (vision tower) preserved. Single shard, ~20 GB, runs on a single Blackwell card.

This is a sibling release to [`sakamakismile/Qwen3.6-27B-LNARIZE-NVFP4`](https://huggingface.co/sakamakismile/Qwen3.6-27B-LNARIZE-NVFP4) that uses AEON-7's abliterated base instead of vanilla Qwen3.6-27B. It was built in response to a community request ([discussion](https://huggingface.co/sakamakismile/Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP/discussions/1)) for an alternative to the huihui-ai-derived line with stricter KL discipline.

## Credits — this is not Lna-Lab work alone

| Component | Source | Why it's here |
|---|---|---|
| **Base model** | [`AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16`](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16) | Multi-stage abliteration: **abliterix v1.4** (Heretic-derived multi-objective Optuna search with native hybrid Mamba/attention support) + grimjim **NPBA** + Arditi **mean-difference** + FernflowerAI **SSM conv1d repair**. KL 0.000492 vs original Qwen3.6-27B (winsorization 0.995, first-3-token). All credit for the abliteration pipeline goes to [@AEON-7](https://huggingface.co/AEON-7). |
| **Abliteration toolchain (upstream of base)** | [Heretic by p-e-w](https://github.com/p-e-w/heretic) | Activation-based Refusal Ablation. Abliterix v1.4 builds on this. |
| **MTP head** | [`huginnfork/Qwen3.6-27B-uncensored-heretic-v2-mtp`](https://huggingface.co/huginnfork/Qwen3.6-27B-uncensored-heretic-v2-mtp) | Qwen3.6-27B-base-derived MTP head (15 tensors, ~810 MB bf16). The AEON-7 base does not ship MTP heads (abliteration does not train them). We graft the head from huginnfork's release, which preserved the Qwen-base MTP head through their own pipeline. |
| **Quantization** | [NVIDIA TensorRT-Model-Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer) (`modelopt.torch.quantization`) | NVFP4, with the `*visual*` and `*mtp*` modules excluded from quantization; 20-sample calibration on `neuralmagic/calibration` (LLM split). |
| **KV cache compression** | [Shinka-Man/kvtc](https://github.com/Shinka-Man/kvtc) (Lna-Lab fork of NVIDIA's KVTC, 11 patches on top of OnlyTerp/kvtc) | K2V4 default, sink=4, window=128. |
| **Serving recipe** | [Lna-Lab LNARIZE pipeline](https://github.com/lna-lab/GGUF-to-NVFP4-SM120) | The 5-axis transformation (NVFP4 + MTP graft + VLM kept + KVTC + cudagraph) Lna-Lab applies as a unit. |
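
As a back-of-envelope check on the ~20 GB single-shard size, the shell sketch below estimates checkpoint size from a parameter split. It assumes NVFP4's layout of one 4-bit E2M1 value per weight plus one FP8 scale per 16-element block (4.5 bits per weight, ignoring the small per-tensor scales), and 16 bits per weight for everything left in BF16, such as the excluded `*visual*` and `*mtp*` modules. The helper name `nvfp4_size_gb` and the example split are hypothetical, not figures measured on this release; sizes are decimal GB.

```shell
#!/usr/bin/env sh
# Hypothetical helper: rough checkpoint size in decimal GB.
# $1 = parameters quantized to NVFP4 (4 bits + 8-bit scale per 16 weights = 4.5 bits)
# $2 = parameters kept in BF16 (16 bits), e.g. vision tower and MTP head
nvfp4_size_gb() {
  awk -v q="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (q * 4.5 + b * 16) / 8 / 1e9 }'
}

# Illustrative split only (assumed, not taken from the checkpoint):
nvfp4_size_gb 25000000000 2000000000   # -> 18.1
```

For reference, the ~810 MB BF16 MTP head alone corresponds to roughly 405M parameters at 2 bytes each.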

## When to pick this over the vanilla LNARIZE-NVFP4

In our internal 3-way bench (2026-04-28, single 96 GB Blackwell, vLLM 0.19.1 + KVTC + LNARIZE serve), the AEON-7-based LNARIZE and the vanilla-Qwen3.6-based LNARIZE perform within ~4% of each other on throughput (the AEON-7 variant is slightly slower) and within ~3 percentage points of each other on the balance ratio for multi-perspective probes. **Both pass the same brand mission gate** (zero hedge markers, zero refusals on 15 edgy probes). The abliteration intervention is doing real work, but the vanilla baseline is already remarkably mission-aligned at the Qwen3.6-27B level.

So pick this AEON-7 sibling if:
- You want the **multi-stage abliteration heritage** (abliterix v1.4 + NPBA + Arditi mean-diff + FernflowerAI SSM repair) as your starting distribution rather than vanilla Qwen3.6.
- You're building on top of AEON-7's specific design choices (KL-validated multi-objective abliteration is meaningful for some downstream finetuning paths).
- You're comparing abliteration approaches and want a controlled comparison against the vanilla and Huihui siblings.

If you don't have a specific reason, **the vanilla [`sakamakismile/Qwen3.6-27B-LNARIZE-NVFP4`](https://huggingface.co/sakamakismile/Qwen3.6-27B-LNARIZE-NVFP4) is also fine** — it ships in the same ~20 GB single-shard form, same pipeline, and is what the Lna-Lab brand defaults to.

If you want AEON-7's own NVFP4 instead (their pipeline, their design choices, including no MTP graft), see [`AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-NVFP4`](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-NVFP4) — that's a different design point (compressed-tensors / llmcompressor format, ~28 GB) targeted at DGX Spark / B100/B200 / sm_121a.

## Quick start (Docker, 1 GPU)

Same `lnarize-serve` container as the vanilla LNARIZE-NVFP4 release — just point `LNARIZE_MODEL` at this repo:

```bash
docker run -d --gpus all -p 9000:9000 \
  -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
  -e LNARIZE_MODEL=sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4 \
  ghcr.io/lna-lab/lnarize-serve:latest
```
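
Loading a ~20 GB checkpoint and capturing CUDA graphs can take a while, so running the smoke test immediately after `docker run -d` may fail with a connection error. A minimal polling sketch follows, assuming the server exposes the OpenAI-compatible `/v1/models` endpoint on port 9000 (standard for `vllm serve`; assumed for the `lnarize-serve` container). `wait_for_server` is a hypothetical helper, not part of either image.

```shell
#!/usr/bin/env sh
# Hypothetical helper: poll an OpenAI-compatible server until /v1/models answers.
# Usage: wait_for_server [base_url] [timeout_seconds]
wait_for_server() {
  url="${1:-http://localhost:9000}"
  timeout="${2:-600}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # -s: silent, -f: non-zero exit on HTTP errors, --max-time: never hang on one attempt
    if curl -sf --max-time 5 "$url/v1/models" >/dev/null 2>&1; then
      echo "server ready after ${elapsed}s"
      return 0
    fi
    sleep 5
    elapsed=$((elapsed + 5))
  done
  echo "server not ready after ${timeout}s" >&2
  return 1
}
```

For example, `wait_for_server http://localhost:9000 900 && echo ready` blocks for up to 15 minutes and returns non-zero if the server never comes up, which makes it easy to chain in scripts.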

Or directly with vLLM:

```bash
vllm serve sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4 \
  --quantization modelopt \
  --language-model-only \
  --speculative-config '{"method":"qwen3_5_mtp","num_speculative_tokens":3}' \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice --tool-call-parser qwen3_xml \
  --gpu-memory-utilization 0.85 \
  --max-model-len 16384 --max-num-seqs 16 \
  --port 9000
```
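
Because the MTP head is grafted from a different release than the trunk weights, the draft acceptance rate is a useful health check for the graft. vLLM publishes Prometheus counters at `/metrics`; the sketch below sums accepted and drafted speculative tokens and prints their ratio. The metric names `vllm:spec_decode_num_accepted_tokens_total` and `vllm:spec_decode_num_draft_tokens_total` are an assumption based on recent vLLM V1 builds, so check your server's `/metrics` output and adjust the patterns if they differ. `spec_acceptance` is a hypothetical helper.

```shell
#!/usr/bin/env sh
# Hypothetical helper: read Prometheus text on stdin, print accepted/drafted ratio.
# Metric names are an ASSUMPTION (recent vLLM V1); verify against your /metrics output.
spec_acceptance() {
  awk '
    # Sum across label sets (e.g. one series per engine); the value is the last field.
    /^vllm:spec_decode_num_accepted_tokens_total/ { acc += $NF }
    /^vllm:spec_decode_num_draft_tokens_total/    { drf += $NF }
    END {
      if (drf > 0) { printf "%.3f\n", acc / drf }
      else { print "no draft tokens recorded yet" > "/dev/stderr"; exit 1 }
    }'
}

# Example (assumes the server listens on port 9000):
#   curl -s http://localhost:9000/metrics | spec_acceptance
```

With `num_speculative_tokens` set to 3, each verification step emits on average about 1 + 3 × (acceptance rate) tokens, so a low ratio here translates directly into lower single-stream TPS.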

Smoke test:
```bash
curl http://localhost:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4",
       "messages":[{"role":"user","content":"Hi"}]}'
```
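
Since the vision tower is preserved (`pipeline_tag: image-text-to-text`), image input can be exercised through the OpenAI-compatible `image_url` content part. The sketch below builds such a request body. `vision_payload` and `json_escape` are hypothetical helpers; the escaping only covers backslashes and double quotes (not newlines or other control characters), and the image URL in the example is a placeholder. Note that the direct `vllm serve` command above passes `--language-model-only`, which presumably skips the vision path, so an image request likely needs a server launched without that flag.

```shell
#!/usr/bin/env sh
# Hypothetical helpers: build an OpenAI-style chat request with one image and one text part.
json_escape() {
  # Escape backslashes first, then double quotes. Newlines/control chars are NOT handled.
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

vision_payload() {
  model=$(json_escape "$1")
  text=$(json_escape "$2")
  image=$(json_escape "$3")
  printf '{"model":"%s","messages":[{"role":"user","content":[{"type":"image_url","image_url":{"url":"%s"}},{"type":"text","text":"%s"}]}]}\n' \
    "$model" "$image" "$text"
}

# Example (placeholder image URL; curl reads the body from stdin via -d @-):
#   vision_payload sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4 "Describe this image." \
#     https://example.com/photo.jpg \
#     | curl http://localhost:9000/v1/chat/completions -H "Content-Type: application/json" -d @-
```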

## Bench reference

3-way comparison vs the vanilla LNARIZE baseline and a heretic-v1.2 sibling, single 96 GB RTX PRO 6000 Blackwell, vLLM 0.19.1, MTP n=3:

| Model | Single-stream TPS (mean of 4 prompts) | 8-way concurrent TPS (aggregate) | Balance ratio @ 5k tokens (5 multi-perspective topics) | Hedge markers | Refusals (15 probes) |
|---|---:|---:|---:|---:|---:|
| `Qwen3.6-27B-LNARIZE-NVFP4` (vanilla) | 129.1 | 769.7 | 0.953 | 0 | 0/15 |
| `Qwen3.6-27B-LNARIZE-AEON-NVFP4` (this repo) | 124.0 | 736.1 | 0.921 | 0 | 0/15 |
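
The throughput and balance gaps quoted in the comparison section follow directly from this table. A small sketch to reproduce them, using only the table values; `pct_gap` is a hypothetical helper.

```shell
#!/usr/bin/env sh
# Hypothetical helper: relative gap of $2 below baseline $1, in percent.
pct_gap() {
  awk -v base="$1" -v other="$2" 'BEGIN { printf "%.2f\n", (base - other) / base * 100 }'
}

pct_gap 129.1 124.0   # single-stream TPS, vanilla vs this repo -> 3.95
pct_gap 769.7 736.1   # 8-way aggregate TPS -> 4.37
# The balance ratio is already a fraction, so its gap is an absolute difference in points:
awk 'BEGIN { printf "%.1f\n", (0.953 - 0.921) * 100 }'   # -> 3.2
```

Relative percentages are used for TPS because throughput scales with the baseline, while the balance ratio is compared as an absolute difference since it is already normalized.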

Full bench writeup: see Lna-Lab JetQuant `benchmarks/2026-04-28_heretic_lnarize_vs_lnarize.md`.

## License

Apache-2.0, inherited from AEON-7's base model and Qwen3.6-27B upstream.

## Acknowledgments

- **AEON-7** — for the abliterix v1.4 multi-stage abliteration pipeline and the BF16 base this NVFP4 derives from.
- **p-e-w** — for the Heretic ARA toolchain that abliterix v1.4 extends.
- **grimjim, Arditi, FernflowerAI** — for the upstream techniques (NPBA, mean-difference, SSM conv1d repair) abliterix v1.4 stacks.
- **huginnfork** — for the heretic-v2-mtp release that contributed the MTP head graft source.
- **NVIDIA Model Optimizer team** — for `modelopt` NVFP4 quantization.
- **OnlyTerp** and the **NVIDIA KVTC team** — for KVTC, which Lna-Lab forks at [Shinka-Man/kvtc](https://github.com/Shinka-Man/kvtc).
- **vLLM team** — for the V1 engine, MTP scheduling, and Blackwell support that make this serve at 100+ TPS single-stream.

If you find this useful, please also star/cite the upstream repos linked above. Lna-Lab integrates and serves; the science is theirs.

— Tonoken3 / Lna-Lab