Qwen3.6-27B-LNARIZE-AEON-NVFP4

TL;DR — AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16 (abliterix v1.4, KL 0.000492 vs vanilla Qwen3.6-27B) put through the Lna-Lab LNARIZE pipeline: NVFP4 (modelopt) + 15-tensor MTP graft + KVTC + cudagraph, with full VLM (vision tower) preserved. Single shard, ~20 GB, runs on a single Blackwell card.
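A back-of-envelope check on the ~20 GB figure, as a minimal sketch. It assumes NVFP4's usual layout: 4-bit E2M1 values plus one FP8 E4M3 scale per 16-element block, which works out to about 4.5 bits per weight. The small per-tensor FP32 scales are ignored. The helper names `nvfp4_gb` and `bf16_gb` are hypothetical.

```shell
#!/usr/bin/env bash
# Back-of-envelope checkpoint size estimates (hypothetical helpers).
# Assumption: NVFP4 = 4-bit E2M1 values + one 8-bit FP8 E4M3 scale per
# 16-element block => 4 + 8/16 = 4.5 bits per weight. Per-tensor FP32
# scales are ignored as negligible. 1 GB = 1e9 bytes here.

# nvfp4_gb PARAMS -> size in GB if all PARAMS are stored as NVFP4
nvfp4_gb() {
  awk -v p="$1" 'BEGIN { printf "%.2f\n", p * (4 + 8 / 16) / 8 / 1e9 }'
}

# bf16_gb PARAMS -> size in GB for PARAMS kept in BF16 (2 bytes each)
bf16_gb() {
  awk -v p="$1" 'BEGIN { printf "%.2f\n", p * 2 / 1e9 }'
}

nvfp4_gb 27e9   # every weight in NVFP4 -> 15.19
bf16_gb 27e9    # same weights in BF16  -> 54.00
```

The roughly 15 GB NVFP4 floor is well below the ~20 GB shard. That gap is consistent with parts of the model staying in higher precision, such as the ~810 MB BF16 MTP head, the vision tower, and possibly embeddings. This card does not give an exact per-component breakdown.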

This is a sibling release to sakamakismile/Qwen3.6-27B-LNARIZE-NVFP4 that uses AEON-7's abliterated base instead of vanilla Qwen3.6-27B. It was built in response to a community request (discussion) for an alternative to the huihui-ai-derived line with tighter KL discipline.

Credits — this is not Lna-Lab work alone

Component	Source	Why it's here
Base model	`AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16`	Multi-stage abliteration: abliterix v1.4 (Heretic-derived multi-objective Optuna search with native hybrid Mamba/attention support) + grimjim NPBA + Arditi mean-difference + FernflowerAI SSM conv1d repair. KL 0.000492 vs original Qwen3.6-27B (winsorization 0.995, first-3-token). All credit for the abliteration pipeline goes to @AEON-7.
Abliteration toolchain (upstream of base)	Heretic by p-e-w	Activation-based Refusal Ablation (ARA). Abliterix v1.4 builds on this.
MTP head	`huginnfork/Qwen3.6-27B-uncensored-heretic-v2-mtp`	MTP head derived from the Qwen3.6-27B base (15 tensors, ~810 MB in BF16). The AEON-7 base does not ship MTP heads because abliteration does not train them. We graft the head from huginnfork's release, which preserved the Qwen-base MTP head through its own pipeline.
Quantization	NVIDIA TensorRT-Model-Optimizer (`modelopt.torch.quantization`)	NVFP4 with `visual` and `mtp` ignored, 20-sample calibration on `neuralmagic/calibration` (LLM split).
KV cache compression	Shinka-Man/kvtc (Lna-Lab fork of NVIDIA's KVTC, 11 patches on top of OnlyTerp/kvtc)	K2V4 default, sink=4, window=128.
Serving recipe	Lna-Lab LNARIZE pipeline	The 5-axis transformation (NVFP4 + MTP graft + VLM kept + KVTC + cudagraph) Lna-Lab applies as a unit.
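You can confirm that the 15-tensor MTP graft is present in a downloaded copy by reading the safetensors header directly. A safetensors file begins with an 8-byte little-endian length, followed by a JSON header that maps tensor names to dtype, shape, and byte offsets. The minimal sketch below counts header entries by name prefix without loading any weights. The helper name is hypothetical. The `mtp.` prefix and the `model.safetensors` file name are assumptions about this checkpoint's layout, not something this card confirms.

```shell
#!/usr/bin/env bash
# count_tensors_with_prefix FILE PREFIX  (hypothetical helper)
# Reads only the JSON header of a .safetensors file (no weights are loaded)
# and prints how many tensor names start with PREFIX.
count_tensors_with_prefix() {
  python3 - "$1" "$2" <<'PY'
import json, struct, sys

path, prefix = sys.argv[1], sys.argv[2]
with open(path, "rb") as f:
    # First 8 bytes: little-endian unsigned 64-bit header length.
    (header_len,) = struct.unpack("<Q", f.read(8))
    header = json.loads(f.read(header_len))
header.pop("__metadata__", None)  # optional metadata entry, not a tensor
print(sum(1 for name in header if name.startswith(prefix)))
PY
}

# Example (assumed path and prefix; adjust to the actual snapshot):
# count_tensors_with_prefix /path/to/snapshot/model.safetensors mtp.
```

If the prefix assumption holds, the count should match the 15 MTP tensors listed in the table above. List a few tensor names first if it does not.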

When to pick this over the vanilla LNARIZE-NVFP4

In our internal 3-way bench (2026-04-28, single 96 GB Blackwell, vLLM 0.19.1 + KVTC + LNARIZE serve), the AEON-7-based LNARIZE and the vanilla-Qwen3.6-based LNARIZE land roughly 4% apart on throughput and about 3 percentage points apart on balanced-engagement metrics for multi-perspective probes. Both pass the same brand mission gate (zero hedge markers and zero refusals on 15 edgy probes). The abliteration intervention is doing real work, but the vanilla baseline is already remarkably mission-aligned at the Qwen3.6-27B level.

So pick this AEON-7 sibling if:

You want the multi-stage abliteration heritage (abliterix v1.4 + NPBA + Arditi mean-diff + FernflowerAI SSM repair) as your starting distribution rather than vanilla Qwen3.6.
You're building on top of AEON-7's specific design choices (KL-validated multi-objective abliteration is meaningful for some downstream finetuning paths).
You're comparing abliteration approaches and want a controlled comparison against the vanilla and Huihui siblings.

If you don't have a specific reason, the vanilla sakamakismile/Qwen3.6-27B-LNARIZE-NVFP4 is also fine — it ships in the same ~20 GB single-shard form, same pipeline, and is what the Lna-Lab brand defaults to.

If you want AEON-7's own NVFP4 instead (their pipeline, their design choices, including no MTP graft), see AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-NVFP4 — that's a different design point (compressed-tensors / llmcompressor format, ~28 GB) targeted at DGX Spark / B100/B200 / sm_121a.

Quick start (Docker, 1 GPU)

Same lnarize-serve container as the vanilla LNARIZE-NVFP4 release — just point LNARIZE_MODEL at this repo:

docker run -d --gpus all -p 9000:9000 \
  -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
  -e LNARIZE_MODEL=sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4 \
  ghcr.io/lna-lab/lnarize-serve:latest

Or directly with vLLM:

vllm serve sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4 \
  --quantization modelopt \
  --language-model-only \
  --speculative-config '{"method":"qwen3_5_mtp","num_speculative_tokens":3}' \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice --tool-call-parser qwen3_xml \
  --gpu-memory-utilization 0.85 \
  --max-model-len 16384 --max-num-seqs 16 \
  --port 9000
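`--gpu-memory-utilization 0.85` caps the fraction of total GPU memory that vLLM may use. The minimal sketch below gives a rough budget for the 96 GB card used in the bench. It treats the ~20 GB checkpoint size as the weight footprint, which is an approximation because runtime weight memory can differ from on-disk size. The remainder is shared by the KV cache, activations, CUDA graphs, and the MTP draft step. vLLM profiles memory itself at startup, so read this only as an order-of-magnitude check. `budget_gb` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# budget_gb TOTAL_GB UTIL WEIGHTS_GB -> GB left after weights
# (hypothetical helper; vLLM's own startup profiling is authoritative).
budget_gb() {
  awk -v total="$1" -v util="$2" -v weights="$3" \
    'BEGIN { printf "%.1f\n", total * util - weights }'
}

budget_gb 96 0.85 20   # 96 GB card, 0.85 utilization, ~20 GB weights -> 61.6
```

Lowering `--max-model-len` or `--max-num-seqs` reduces how much of that remainder the KV cache needs. This is the usual lever if startup reports insufficient KV-cache memory.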

Smoke test:

curl http://localhost:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4",
       "messages":[{"role":"user","content":"Hi"}]}'
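Model loading and CUDA-graph capture take time, so a smoke test sent right after `docker run -d` may fail. The minimal sketch below polls `/v1/models` until the server answers, then prints only the assistant text from the chat response. It assumes `curl` and `python3` are installed and that the server exposes `/v1/models`, as vLLM's OpenAI-compatible server does. The function names and the timeout are hypothetical. `BASE_URL` defaults to the Docker example's port 9000; set it to match your own `--port`.

```shell
#!/usr/bin/env bash
BASE_URL="${BASE_URL:-http://localhost:9000}"
MODEL="sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4"

# wait_for_server URL TIMEOUT_S -> 0 once URL/v1/models answers, 1 on timeout
wait_for_server() {
  local url="$1" timeout="$2" waited=0
  until curl -sf --max-time 5 "$url/v1/models" >/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# extract_content: read a chat-completions JSON body on stdin and print
# choices[0].message.content (empty if the model returned no content)
extract_content() {
  python3 -c 'import json, sys; m = json.load(sys.stdin)["choices"][0]["message"]; print(m.get("content") or "")'
}

# Usage:
# wait_for_server "$BASE_URL" 600 &&
#   curl -s "$BASE_URL/v1/chat/completions" \
#     -H "Content-Type: application/json" \
#     -d "{\"model\":\"$MODEL\",\"messages\":[{\"role\":\"user\",\"content\":\"Hi\"}]}" |
#   extract_content
```

With `--reasoning-parser qwen3`, the model's thinking is returned separately from `content`. As a result, `extract_content` prints only the final answer.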

Bench reference

3-way comparison vs the vanilla LNARIZE baseline and a heretic-v1.2 sibling, single 96 GB RTX PRO 6000 Blackwell, vLLM 0.19.1, MTP n=3:

Model	TPS single-stream (tok/s, mean of 4 prompts)	TPS 8-way concurrent (tok/s, aggregate)	balance ratio @ 5k tok (5 multi-perspective topics)	hedge markers	refusals (15 probes)
`Qwen3.6-27B-LNARIZE-NVFP4` (vanilla)	129.1	769.7	0.953	0	0/15
`Qwen3.6-27B-LNARIZE-AEON-NVFP4` (this repo)	124.0	736.1	0.921	0	0/15
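The gaps summarized earlier can be recomputed from the two rows above. The helper names below are hypothetical. `pct_change` gives the relative throughput change, and `point_diff` gives the balance-ratio difference in percentage points.

```shell
#!/usr/bin/env bash
# pct_change NEW BASE -> relative change of NEW vs BASE, in percent
pct_change() {
  awk -v new="$1" -v base="$2" 'BEGIN { printf "%.2f\n", (new - base) / base * 100 }'
}

# point_diff NEW BASE -> difference of two ratios, in percentage points
point_diff() {
  awk -v new="$1" -v base="$2" 'BEGIN { printf "%.2f\n", (new - base) * 100 }'
}

# Values from the table: this repo (AEON) vs vanilla LNARIZE-NVFP4.
pct_change 124.0 129.1   # single-stream TPS   -> -3.95
pct_change 736.1 769.7   # 8-way aggregate TPS -> -4.37
point_diff 0.921 0.953   # balance ratio       -> -3.20
```

The single-stream gap is about 4.0% and the 8-way aggregate gap about 4.4%. The balance ratio is 3.2 points lower.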

Full bench writeup: see Lna-Lab JetQuant benchmarks/2026-04-28_heretic_lnarize_vs_lnarize.md.

License

Apache-2.0, inherited from AEON-7's base model and Qwen3.6-27B upstream.

Acknowledgments

AEON-7 — for the abliterix v1.4 multi-stage abliteration pipeline and the BF16 base this NVFP4 derives from.
p-e-w — for the Heretic ARA toolchain that abliterix v1.4 extends.
grimjim, Arditi, FernflowerAI — for the upstream techniques (NPBA, mean-difference, SSM conv1d repair) abliterix v1.4 stacks.
huginnfork — for the heretic-v2-mtp release that contributed the MTP head graft source.
NVIDIA Model Optimizer team — for modelopt NVFP4 quantization.
OnlyTerp and the NVIDIA KVTC team — for KVTC, which Lna-Lab forks at Shinka-Man/kvtc.
vLLM team — for the V1 engine, MTP scheduling, and Blackwell support that make this serve at 100+ TPS single-stream.

If you find this useful, please also star/cite the upstream repos linked above. Lna-Lab integrates and serves; the science is theirs.

— Tonoken3 / Lna-Lab

Safetensors
Model size: 17B params
Tensor type: BF16, F8_E4M3

Model tree for sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4
Base model: Qwen/Qwen3.6-27B
Finetuned: AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16
Quantized (30): this model