Qwen3.6-27B-LNARIZE-AEON-NVFP4
TL;DR: AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16 (abliterix v1.4, KL 0.000492 vs vanilla Qwen3.6-27B) put through the Lna-Lab LNARIZE pipeline: NVFP4 (modelopt) + 15-tensor MTP graft + KVTC + cudagraph, with the full VLM (vision tower) preserved. Single shard, ~20 GB, runs on a single Blackwell card.
This is a sibling release to sakamakismile/Qwen3.6-27B-LNARIZE-NVFP4 that uses AEON-7's abliterated base instead of vanilla Qwen3.6-27B. It was built in response to a community request (discussion) for an alternative with stricter KL discipline than the huihui-ai-derived line.
Credits — this is not Lna-Lab work alone
| Component | Source | Why it's here |
|---|---|---|
| Base model | AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16 | Multi-stage abliteration: abliterix v1.4 (Heretic-derived multi-objective Optuna search with native hybrid Mamba/attention support) + grimjim NPBA + Arditi mean-difference + FernflowerAI SSM conv1d repair. KL 0.000492 vs original Qwen3.6-27B (winsorization 0.995, first-3-token). All credit for the abliteration pipeline goes to @AEON-7. |
| Abliteration toolchain (upstream of base) | Heretic by p-e-w | Activation-based Refusal Ablation. Abliterix v1.4 builds on this. |
| MTP head | huginnfork/Qwen3.6-27B-uncensored-heretic-v2-mtp | Qwen3.6-27B-base-derived MTP head (15 tensors, ~810 MB bf16). The AEON-7 base does not ship MTP heads (architectural: they aren't trained in by abliteration). We graft the head from huginnfork's release, which preserved Qwen-base MTP through their own pipeline. |
| Quantization | NVIDIA TensorRT-Model-Optimizer (modelopt.torch.quantization) | NVFP4 with `*visual*` and `*mtp*` ignored, 20-sample calibration on neuralmagic/calibration (LLM split). |
| KV cache compression | Shinka-Man/kvtc (Lna-Lab fork of NVIDIA's KVTC, 11 patches on top of OnlyTerp/kvtc) | K2V4 default, sink=4, window=128. |
| Serving recipe | Lna-Lab LNARIZE pipeline | The 5-axis transformation (NVFP4 + MTP graft + VLM kept + KVTC + cudagraph) Lna-Lab applies as a unit. |
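The ~20 GB shard size is roughly in line with what NVFP4 storage arithmetic predicts. The following is a back-of-the-envelope sketch, not a measured breakdown. It assumes NVFP4 stores 4-bit values plus one 8-bit scale per 16-element block (4.5 bits per weight), and, as a deliberate simplification, that all 27B parameters are quantized (per the table, the vision tower and MTP head are not). The helper names `nvfp4_gb` and `bf16_gb` are hypothetical.

```shell
# Rough weight-size estimates in GB (10^9 bytes) for a given parameter count.
# Assumption: every parameter is quantized, which overstates NVFP4 coverage.

nvfp4_gb() {
  # $1 = parameter count in billions
  # 4 bits per weight + one 8-bit scale shared by 16 weights = 4.5 bits per weight
  LC_ALL=C awk -v p="$1" 'BEGIN { printf "%.1f\n", p * (4 + 8 / 16) / 8 }'
}

bf16_gb() {
  # $1 = parameter count in billions; 16 bits per weight
  LC_ALL=C awk -v p="$1" 'BEGIN { printf "%.1f\n", p * 16 / 8 }'
}

echo "BF16:  $(bf16_gb 27) GB"
echo "NVFP4: $(nvfp4_gb 27) GB"
```

Under these assumptions the quantized weights come to about 15 GB; the gap up to ~20 GB would then be tensors kept at higher precision (the ~810 MB MTP head, the vision tower, and presumably embeddings), though that split is an assumption rather than something the table states.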
When to pick this over the vanilla LNARIZE-NVFP4
In our internal 3-way bench (2026-04-28, single 96 GB Blackwell, vLLM 0.19.1 + KVTC + LNARIZE serve), the AEON-7-based LNARIZE and the vanilla-Qwen3.6-based LNARIZE perform within ~4% of each other on throughput and within about 3 percentage points on balanced-engagement metrics (balance ratio 0.953 vs 0.921) on multi-perspective probes. Both pass the same brand mission gate (zero hedge markers, zero refusals on 15 edgy probes). The abliteration intervention is doing real work, but the vanilla baseline is already remarkably mission-aligned at the Qwen3.6-27B level.
So pick this AEON-7 sibling if:
- You want the multi-stage abliteration heritage (abliterix v1.4 + NPBA + Arditi mean-diff + FernflowerAI SSM repair) as your starting distribution rather than vanilla Qwen3.6.
- You're building on top of AEON-7's specific design choices (KL-validated multi-objective abliteration is meaningful for some downstream finetuning paths).
- You're comparing abliteration approaches and want a controlled comparison against the vanilla and Huihui siblings.
If you don't have a specific reason, the vanilla sakamakismile/Qwen3.6-27B-LNARIZE-NVFP4 is also fine — it ships in the same ~20 GB single-shard form, same pipeline, and is what the Lna-Lab brand defaults to.
If you want AEON-7's own NVFP4 instead (their pipeline, their design choices, including no MTP graft), see AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-NVFP4 — that's a different design point (compressed-tensors / llmcompressor format, ~28 GB) targeted at DGX Spark / B100/B200 / sm_121a.
Quick start (Docker, 1 GPU)
Same lnarize-serve container as the vanilla LNARIZE-NVFP4 release — just point LNARIZE_MODEL at this repo:
docker run -d --gpus all -p 9000:9000 \
-v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
-e LNARIZE_MODEL=sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4 \
ghcr.io/lna-lab/lnarize-serve:latest
Or directly with vLLM:
vllm serve sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4 \
--quantization modelopt \
--language-model-only \
--speculative-config '{"method":"qwen3_5_mtp","num_speculative_tokens":3}' \
--reasoning-parser qwen3 \
--enable-auto-tool-choice --tool-call-parser qwen3_xml \
--gpu-memory-utilization 0.85 \
--max-model-len 16384 --max-num-seqs 16
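Both launch paths above target a Blackwell GPU. A small pre-flight sketch follows. It assumes that a compute capability with major version 10 or higher identifies Blackwell-class hardware (a heuristic that holds for sm_100, sm_120, and sm_121, not an official check) and that the installed `nvidia-smi` supports the `compute_cap` query field. `is_blackwell_cc` is a hypothetical helper name.

```shell
# Heuristic: treat compute capability major >= 10 as Blackwell-class (assumption).
is_blackwell_cc() {
  # $1 = compute capability string such as "12.0"
  local major="${1%%.*}"
  # Non-numeric input makes the test fail, which we treat as "not Blackwell".
  [ "$major" -ge 10 ] 2>/dev/null
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
    while IFS=, read -r name cap; do
      cap=$(printf '%s' "$cap" | tr -d ' ')
      if is_blackwell_cc "$cap"; then
        echo "OK:   $name (compute capability $cap)"
      else
        echo "WARN: $name (compute capability $cap) looks pre-Blackwell"
      fi
    done
fi
```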
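Smoke test (port 9000 matches the Docker mapping above; a bare vllm serve listens on 8000 unless you pass --port):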
curl http://localhost:9000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4",
"messages":[{"role":"user","content":"Hi"}]}'
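The smoke test will fail with a connection error while the ~20 GB of weights are still loading. A minimal readiness loop can be sketched as below, assuming the container exposes vLLM's OpenAI-compatible `/v1/models` route on the same port as `/v1/chat/completions`. `wait_ready` is a hypothetical helper, and the default attempt count and delay are arbitrary.

```shell
# Poll the server until it answers, or give up after a fixed number of attempts.
wait_ready() {
  # $1 = base URL, $2 = max attempts (default 60), $3 = seconds between attempts (default 5)
  local url="$1" tries="${2:-60}" delay="${3:-5}" i=1
  while [ "$i" -le "$tries" ]; do
    # -f makes curl return non-zero on HTTP errors; -s silences progress output
    if curl -sf "$url/v1/models" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $tries attempt(s)" >&2
  return 1
}

# Example (not executed here):
#   wait_ready http://localhost:9000 && echo "safe to run the smoke test"
```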
Bench reference
3-way comparison vs the vanilla LNARIZE baseline and a heretic-v1.2 sibling, single 96 GB RTX PRO 6000 Blackwell, vLLM 0.19.1, MTP n=3:
| Model | TPS single-stream (mean 4 prompts) | TPS 8-way concurrent (aggregate) | balance ratio @ 5k tok (5 multi-perspective topics) | hedge markers | refusals (15 probes) |
|---|---|---|---|---|---|
| Qwen3.6-27B-LNARIZE-NVFP4 (vanilla) | 129.1 | 769.7 | 0.953 | 0 | 0/15 |
| Qwen3.6-27B-LNARIZE-AEON-NVFP4 (this repo) | 124.0 | 736.1 | 0.921 | 0 | 0/15 |
Full bench writeup: see Lna-Lab JetQuant benchmarks/2026-04-28_heretic_lnarize_vs_lnarize.md.
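The "within ~4%" throughput figure can be recomputed from the two rows of the bench table. A minimal sketch using awk; `rel_gap` is a hypothetical helper name, and the inputs are copied from the table above:

```shell
# Percent change of a compared value relative to a baseline.
rel_gap() {
  # $1 = baseline value, $2 = compared value
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

# vanilla LNARIZE (baseline) vs this repo
echo "single-stream: $(rel_gap 129.1 124.0)%"
echo "8-way:         $(rel_gap 769.7 736.1)%"
```

Negative values mean this repo is slower than the vanilla baseline on that column.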
License
Apache-2.0, inherited from AEON-7's base model and Qwen3.6-27B upstream.
Acknowledgments
- AEON-7: for the abliterix v1.4 multi-stage abliteration pipeline and the BF16 base this NVFP4 derives from.
- p-e-w: for the Heretic ARA toolchain that abliterix v1.4 extends.
- grimjim, Arditi, FernflowerAI: for the upstream techniques (NPBA, mean-difference, SSM conv1d repair) abliterix v1.4 stacks.
- huginnfork: for the heretic-v2-mtp release that contributed the MTP head graft source.
- NVIDIA Model Optimizer team: for modelopt NVFP4 quantization.
- OnlyTerp and the NVIDIA KVTC team: for KVTC, which Lna-Lab forks at Shinka-Man/kvtc.
- vLLM team: for the V1 engine, MTP scheduling, and Blackwell support that make this serve at 100+ TPS single-stream.
If you find this useful, please also star/cite the upstream repos linked above. Lna-Lab integrates and serves; the science is theirs.
— Tonoken3 / Lna-Lab