Qwen3.6-27B-LNARIZE-AEON-NVFP4

TL;DR: AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16 (abliterix v1.4, KL 0.000492 vs vanilla Qwen3.6-27B) put through the Lna-Lab LNARIZE pipeline: NVFP4 (modelopt) + 15-tensor MTP graft + KVTC + cudagraph, with the full VLM (vision tower) preserved. Single shard, ~20 GB, runs on a single Blackwell card.

This is a sibling release to sakamakismile/Qwen3.6-27B-LNARIZE-NVFP4 that uses AEON-7's abliterated base instead of vanilla Qwen3.6-27B. It was built in response to a community request (discussion) for an alternative to the huihui-ai-derived line with tighter KL discipline.

Credits — this is not Lna-Lab work alone

| Component | Source | Why it's here |
|---|---|---|
| Base model | AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16 | Multi-stage abliteration: abliterix v1.4 (Heretic-derived multi-objective Optuna search with native hybrid Mamba/attention support) + grimjim NPBA + Arditi mean-difference + FernflowerAI SSM conv1d repair. KL 0.000492 vs original Qwen3.6-27B (winsorization 0.995, first-3-token). All credit for the abliteration pipeline goes to @AEON-7. |
| Abliteration toolchain (upstream of base) | Heretic by p-e-w | Activation-based Refusal Ablation. Abliterix v1.4 builds on this. |
| MTP head | huginnfork/Qwen3.6-27B-uncensored-heretic-v2-mtp | Qwen3.6-27B-base-derived MTP head (15 tensors, ~810 MB bf16). The AEON-7 base does not ship MTP heads (architectural: they aren't trained in by abliteration). We graft the head from huginnfork's release, which preserved Qwen-base MTP through their own pipeline. |
| Quantization | NVIDIA TensorRT-Model-Optimizer (modelopt.torch.quantization) | NVFP4 with *visual* and *mtp* ignored, 20-sample calibration on neuralmagic/calibration (LLM split). |
| KV cache compression | Shinka-Man/kvtc (Lna-Lab fork of NVIDIA's KVTC, 11 patches on top of OnlyTerp/kvtc) | K2V4 default, sink=4, window=128. |
| Serving recipe | Lna-Lab LNARIZE pipeline | The 5-axis transformation (NVFP4 + MTP graft + VLM kept + KVTC + cudagraph) Lna-Lab applies as a unit. |
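
The quantization row says that modules matching *visual* and *mtp* are left out of NVFP4. As a rough illustration of what such glob-style ignore patterns select, here is a minimal Python sketch using the standard-library fnmatch module. The module names are hypothetical placeholders, not read from this checkpoint, and modelopt's own matching logic may differ in detail.

```python
from fnmatch import fnmatch

# Ignore patterns quoted in the credits table.
IGNORE_PATTERNS = ["*visual*", "*mtp*"]

# Hypothetical module names, for illustration only (not taken from the checkpoint).
EXAMPLE_MODULES = [
    "model.language_model.layers.0.mlp.down_proj",
    "model.visual.blocks.0.attn.qkv",
    "mtp.layers.0.self_attn.q_proj",
    "lm_head",
]


def is_ignored(name, patterns=IGNORE_PATTERNS):
    """Return True if any glob pattern matches the module name."""
    return any(fnmatch(name, pattern) for pattern in patterns)


def split_modules(names):
    """Partition names into (to_quantize, kept_in_original_precision)."""
    to_quantize = [n for n in names if not is_ignored(n)]
    kept = [n for n in names if is_ignored(n)]
    return to_quantize, kept


if __name__ == "__main__":
    to_quantize, kept = split_modules(EXAMPLE_MODULES)
    print("quantize:", to_quantize)
    print("keep as-is:", kept)
```

Under these assumed names, the vision tower and MTP head entries land in the "keep as-is" group, which is the behavior the table describes.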

When to pick this over the vanilla LNARIZE-NVFP4

In our internal 3-way bench (2026-04-28, single 96 GB Blackwell, vLLM 0.19.1 + KVTC + LNARIZE serve), the AEON-7-based LNARIZE and the vanilla-Qwen3.6-based LNARIZE perform within ~4% of each other on throughput and about 3 percentage points apart on the balance ratio for multi-perspective probes (0.921 vs 0.953). Both pass the same brand mission gate (zero hedge markers, zero refusals on 15 edgy probes). The abliteration intervention is doing real work, but the vanilla baseline is already remarkably mission-aligned at the Qwen3.6-27B level.

So pick this AEON-7 sibling if:

  • You want the multi-stage abliteration heritage (abliterix v1.4 + NPBA + Arditi mean-diff + FernflowerAI SSM repair) as your starting distribution rather than vanilla Qwen3.6.
  • You're building on top of AEON-7's specific design choices (KL-validated multi-objective abliteration is meaningful for some downstream finetuning paths).
  • You're comparing abliteration approaches and want a controlled comparison against the vanilla and Huihui siblings.

If you don't have a specific reason, the vanilla sakamakismile/Qwen3.6-27B-LNARIZE-NVFP4 is also fine — it ships in the same ~20 GB single-shard form, same pipeline, and is what the Lna-Lab brand defaults to.

If you want AEON-7's own NVFP4 instead (their pipeline, their design choices, including no MTP graft), see AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-NVFP4 — that's a different design point (compressed-tensors / llmcompressor format, ~28 GB) targeted at DGX Spark / B100/B200 / sm_121a.

Quick start (Docker, 1 GPU)

Same lnarize-serve container as the vanilla LNARIZE-NVFP4 release — just point LNARIZE_MODEL at this repo:

docker run -d --gpus all -p 9000:9000 \
  -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
  -e LNARIZE_MODEL=sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4 \
  ghcr.io/lna-lab/lnarize-serve:latest

Or directly with vLLM:

vllm serve sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4 \
  --quantization modelopt \
  --language-model-only \
  --speculative-config '{"method":"qwen3_5_mtp","num_speculative_tokens":3}' \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice --tool-call-parser qwen3_xml \
  --gpu-memory-utilization 0.85 \
  --max-model-len 16384 --max-num-seqs 16
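
If you launch vLLM from a script rather than a shell, the JSON passed to --speculative-config is easy to mis-quote. The following is a minimal sketch, assuming you want the same flags as the command above; the helper name build_vllm_argv is made up for illustration. It builds the argument list in Python and lets json.dumps and shlex.join handle the quoting.

```python
import json
import shlex

MODEL = "sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4"


def build_vllm_argv(model=MODEL, num_speculative_tokens=3):
    """Build the argv for `vllm serve`, mirroring the flags in the command above."""
    spec = {"method": "qwen3_5_mtp", "num_speculative_tokens": num_speculative_tokens}
    return [
        "vllm", "serve", model,
        "--quantization", "modelopt",
        "--language-model-only",
        # Serialize the dict so the JSON is always well-formed.
        "--speculative-config", json.dumps(spec, separators=(",", ":")),
        "--reasoning-parser", "qwen3",
        "--enable-auto-tool-choice",
        "--tool-call-parser", "qwen3_xml",
        "--gpu-memory-utilization", "0.85",
        "--max-model-len", "16384",
        "--max-num-seqs", "16",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable, correctly quoted shell command.
    print(shlex.join(build_vllm_argv()))
```

The resulting list can be passed directly to subprocess.run without going through a shell, which avoids the quoting question entirely.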

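Smoke test (port 9000 matches the Docker example; a bare vllm serve listens on port 8000 unless you pass --port):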

curl http://localhost:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4",
       "messages":[{"role":"user","content":"Hi"}]}'

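The same smoke test can be run from Python with only the standard library. This is a minimal sketch, assuming the server is reachable at localhost:9000 as in the Docker example and returns the usual OpenAI-style chat completion shape; the helper names build_request and chat are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9000/v1"  # assumption: port from the Docker example
MODEL = "sakamakismile/Qwen3.6-27B-LNARIZE-AEON-NVFP4"


def build_request(prompt, base_url=BASE_URL, model=MODEL):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, timeout=60.0):
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    # Assumes the standard response layout: choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, print(chat("Hi")) should print a short reply. If you started vLLM directly without --port, change BASE_URL to use port 8000.
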
Bench reference

3-way comparison vs the vanilla LNARIZE baseline and a heretic-v1.2 sibling, single 96 GB RTX PRO 6000 Blackwell, vLLM 0.19.1, MTP n=3:

| Model | TPS single-stream (mean 4 prompts) | TPS 8-way concurrent (aggregate) | balance ratio @ 5k tok (5 multi-perspective topics) | hedge markers | refusals (15 probes) |
|---|---|---|---|---|---|
| Qwen3.6-27B-LNARIZE-NVFP4 (vanilla) | 129.1 | 769.7 | 0.953 | 0 | 0/15 |
| Qwen3.6-27B-LNARIZE-AEON-NVFP4 (this repo) | 124.0 | 736.1 | 0.921 | 0 | 0/15 |

Full bench writeup: see Lna-Lab JetQuant benchmarks/2026-04-28_heretic_lnarize_vs_lnarize.md.
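
To make the gaps between the two rows explicit, here is a short Python sketch that recomputes them from the figures in the table. The dictionary keys and function names are illustrative; the numbers are copied from the table and nothing else is assumed.

```python
# Figures copied from the bench table above.
BENCH = {
    "vanilla": {"tps_single": 129.1, "tps_8way": 769.7, "balance": 0.953},
    "aeon": {"tps_single": 124.0, "tps_8way": 736.1, "balance": 0.921},
}


def relative_gap_pct(baseline, other):
    """Percentage by which `other` falls below `baseline`."""
    return (baseline - other) / baseline * 100.0


def summarize(bench=BENCH):
    """Return throughput gaps (percent) and balance-ratio gap (percentage points)."""
    v, a = bench["vanilla"], bench["aeon"]
    return {
        "single_stream_gap_pct": relative_gap_pct(v["tps_single"], a["tps_single"]),
        "concurrent_gap_pct": relative_gap_pct(v["tps_8way"], a["tps_8way"]),
        "balance_gap_points": (v["balance"] - a["balance"]) * 100.0,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
    # single_stream_gap_pct: 3.95
    # concurrent_gap_pct: 4.37
    # balance_gap_points: 3.20
```

So the AEON-7 build trails the vanilla build by roughly 4% on throughput and about 3 points on the balance ratio.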

License

Apache-2.0, inherited from AEON-7's base model and Qwen3.6-27B upstream.

Acknowledgments

  • AEON-7 — for the abliterix v1.4 multi-stage abliteration pipeline and the BF16 base this NVFP4 derives from.
  • p-e-w — for the Heretic ARA toolchain that abliterix v1.4 extends.
  • grimjim, Arditi, FernflowerAI — for the upstream techniques (NPBA, mean-difference, SSM conv1d repair) abliterix v1.4 stacks.
  • huginnfork — for the heretic-v2-mtp release that contributed the MTP head graft source.
  • NVIDIA Model Optimizer team — for modelopt NVFP4 quantization.
  • OnlyTerp and the NVIDIA KVTC team — for KVTC, which Lna-Lab forks at Shinka-Man/kvtc.
  • vLLM team — for the V1 engine, MTP scheduling, and Blackwell support that make this serve at 100+ TPS single-stream.

If you find this useful, please also star/cite the upstream repos linked above. Lna-Lab integrates and serves; the science is theirs.

— Tonoken3 / Lna-Lab

Safetensors: model size 17B params; tensor types BF16, F8_E4M3, U8.