---
license: apache-2.0
base_model: AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16
tags:
- nvfp4
- quantized
- modelopt
- mtp
- speculative-decoding
- kvtc
- vllm
- blackwell
- qwen3-vl
- abliterated
- uncensored
- vision
- multimodal
- lna-lab
- lnarize
language:
- en
- ja
- zh
pipeline_tag: image-text-to-text
---

# Qwen3.6-27B-LNARIZE-AEON-NVFP4

> **TL;DR**: `AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16` (abliterix v1.4, KL 0.000492 vs vanilla Qwen3.6-27B) put through the Lna-Lab **LNARIZE** pipeline: NVFP4 (modelopt) + 15-tensor MTP graft + KVTC + cudagraph, with the full VLM (vision tower) preserved. Single shard, ~20 GB, runs on a single Blackwell card.

This is a sibling release to [`sakamakismile/Qwen3.6-27B-LNARIZE-NVFP4`](https://huggingface.co/sakamakismile/Qwen3.6-27B-LNARIZE-NVFP4) that uses AEON-7's abliterated base instead of vanilla Qwen3.6-27B. It was built in response to a community request ([discussion](https://huggingface.co/sakamakismile/Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP/discussions/1)) for a higher-KL-discipline alternative to the huihui-ai-derived line.

## Credits: this is not Lna-Lab work alone

| Component | Source | Why it's here |
|---|---|---|
| **Base model** | [`AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16`](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16) | Multi-stage abliteration: **abliterix v1.4** (Heretic-derived multi-objective Optuna search with native hybrid Mamba/attention support) + grimjim **NPBA** + Arditi **mean-difference** + FernflowerAI **SSM conv1d repair**. KL 0.000492 vs the original Qwen3.6-27B (winsorization 0.995, first-3-token KL). All credit for the abliteration pipeline goes to [@AEON-7](https://huggingface.co/AEON-7). |
| **Abliteration toolchain (upstream of base)** | [Heretic by p-e-w](https://github.com/p-e-w/heretic) | Activation-based Refusal Ablation. Abliterix v1.4 builds on this. |
| **MTP head** | [`huginnfork/Qwen3.6-27B-uncensored-heretic-v2-mtp`](https://huggingface.co/huginnfork/Qwen3.6-27B-uncensored-heretic-v2-mtp) | Qwen3.6-27B-base-derived MTP head (15 tensors, ~810 MB bf16). The AEON-7 base does not ship MTP heads (this is architectural: abliteration does not train them). We graft the head from huginnfork's release, which preserved the Qwen-base MTP head through its own pipeline. |
| **Quantization** | [NVIDIA TensorRT-Model-Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer) (`modelopt.torch.quantization`) | NVFP4 with `*visual*` and `*mtp*` modules excluded from quantization; 20-sample calibration on `neuralmagic/calibration` (LLM split). |
| **KV cache compression** | [Shinka-Man/kvtc](https://github.com/Shinka-Man/kvtc) (Lna-Lab fork of NVIDIA's KVTC, 11 patches on top of OnlyTerp/kvtc) | K2V4 default, sink=4, window=128. |
| **Serving recipe** | [Lna-Lab LNARIZE pipeline](https://github.com/lna-lab/GGUF-to-NVFP4-SM120) | The 5-axis transformation (NVFP4 + MTP graft + VLM kept + KVTC + cudagraph) that Lna-Lab applies as a unit. |

## When to pick this over the vanilla LNARIZE-NVFP4

In our internal 3-way bench (2026-04-28, single 96 GB Blackwell, vLLM 0.19.1 + KVTC + LNARIZE serve), the AEON-7-based LNARIZE and the vanilla-Qwen3.6-based LNARIZE perform within ~4% of each other on throughput and within ~3 percentage points of each other on balanced-engagement metrics for multi-perspective probes.

**Both pass the same brand mission gate** (zero hedge markers, zero refusals on 15 edgy probes). The abliteration intervention is doing real work, but the vanilla baseline is already remarkably mission-aligned at the Qwen3.6-27B level.

So pick this AEON-7 sibling if:

- You want the **multi-stage abliteration heritage** (abliterix v1.4 + NPBA + Arditi mean-diff + FernflowerAI SSM repair) as your starting distribution rather than vanilla Qwen3.6.
- You're building on top of AEON-7's specific design choices (KL-validated multi-objective abliteration is meaningful for some downstream finetuning paths).
- You're comparing abliteration approaches and want a controlled comparison against the vanilla and Huihui siblings.

If you don't have a specific reason, **the vanilla [`sakamakismile/Qwen3.6-27B-LNARIZE-NVFP4`](https://huggingface.co/sakamakismile/Qwen3.6-27B-LNARIZE-NVFP4) is also fine**: it ships in the same ~20 GB single-shard form, goes through the same pipeline, and is what the Lna-Lab brand defaults to.

If you want AEON-7's own NVFP4 instead (their pipeline, their design choices, including no MTP graft), see [`AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-NVFP4`](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-NVFP4). That is a different design point (compressed-tensors / llmcompressor format, ~28 GB) targeted at DGX Spark / B100/B200 / sm_121a.

## Quick start (Docker, 1 GPU)

Same `lnarize-serve` container as the vanilla LNARIZE-NVFP4 release; just point `LNARIZE_MODEL` at this repo:

```bash
docker run -d --gpus all -p 9000:9000 \
  -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
  -e LNARIZE_MODEL=sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4 \
  ghcr.io/lna-lab/lnarize-serve:latest
```

Or directly with vLLM (`--port 9000` keeps the endpoint on the same port as the container and the smoke test; `vllm serve` otherwise listens on 8000):

```bash
vllm serve sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4 \
  --port 9000 \
  --quantization modelopt \
  --language-model-only \
  --speculative-config '{"method":"qwen3_5_mtp","num_speculative_tokens":3}' \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice --tool-call-parser qwen3_xml \
  --gpu-memory-utilization 0.85 \
  --max-model-len 16384 --max-num-seqs 16
```

Smoke test:

```bash
curl http://localhost:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4",
       "messages":[{"role":"user","content":"Hi"}]}'
```

## Bench reference

3-way comparison against the vanilla LNARIZE baseline and a heretic-v1.2 sibling on a single 96 GB RTX PRO 6000 Blackwell,
with vLLM 0.19.1 and MTP n=3:

| Model | TPS single-stream (tok/s, mean of 4 prompts) | TPS 8-way concurrent (tok/s, aggregate) | Balance ratio @ 5k tok (5 multi-perspective topics) | Hedge markers | Refusals (15 probes) |
|---|---:|---:|---:|---:|---:|
| `Qwen3.6-27B-LNARIZE-NVFP4` (vanilla) | 129.1 | 769.7 | 0.953 | 0 | 0/15 |
| `Qwen3.6-27B-LNARIZE-AEON-NVFP4` (this repo) | 124.0 | 736.1 | 0.921 | 0 | 0/15 |

Full bench writeup: see Lna-Lab JetQuant `benchmarks/2026-04-28_heretic_lnarize_vs_lnarize.md`.

## License

Apache-2.0, inherited from AEON-7's base model and the upstream Qwen3.6-27B.

## Acknowledgments

- **AEON-7**: for the abliterix v1.4 multi-stage abliteration pipeline and the BF16 base this NVFP4 derives from.
- **p-e-w**: for the Heretic ARA toolchain that abliterix v1.4 extends.
- **grimjim, Arditi, FernflowerAI**: for the upstream techniques (NPBA, mean-difference, SSM conv1d repair) that abliterix v1.4 stacks.
- **huginnfork**: for the heretic-v2-mtp release that served as the MTP head graft source.
- **NVIDIA Model Optimizer team**: for `modelopt` NVFP4 quantization.
- **OnlyTerp** and the **NVIDIA KVTC team**: for KVTC, which Lna-Lab forks at [Shinka-Man/kvtc](https://github.com/Shinka-Man/kvtc).
- **vLLM team**: for the V1 engine, MTP scheduling, and Blackwell support that let this model serve at 100+ TPS single-stream.

If you find this useful, please also star/cite the upstream repos linked above. Lna-Lab integrates and serves; the science is theirs.

— Tonoken3 / Lna-Lab