Qwen3.6-27B-PRISM-PRO โ€” DQ GGUF

llama.cpp-native GGUF quantization of Qwen3.6-27B-PRISM-PRO using the PRISM project's dynamic-quant (DQ) recipe. ~13.7 GB (vs 55 GB BF16).

PRISM-PRO of Qwen/Qwen3.6-27B (bias/propoganda removal) This GGUF preserves the model's native MTP draft head + full vision tower, and pairs with the separately-published EAGLE-3 drafter for lossless faster decode.

Performance

llama.cpp on a single NVIDIA Blackwell GPU, single-stream greedy decode:

config tok/s speedup
no-spec baseline 80 1.00ร—
native MTP (built-in draft head) 121 1.51ร—
EAGLE-3 chain (with our drafter) 111 1.39ร—

Speculative decoding is lossless (output token-identical to non-spec greedy, modulo batched-verify floating-point non-associativity intrinsic to all spec decoding). For a faster SGLang deployment (~183 tok/s, ~1.97ร— over no-spec) using the BF16 target + EAGLE-3, see the drafter repo.

Quick start (llama.cpp)

# 1. no-spec baseline
./llama-server --model Qwen3.6-27B-PRISM-PRO-DQ.gguf

# 2. native MTP speculative decoding (the model's own draft head -- fastest in llama.cpp)
./llama-server --model Qwen3.6-27B-PRISM-PRO-DQ.gguf \
    --spec-type draft-mtp --spec-draft-n-max 1 --spec-draft-n-min 1

# 3. EAGLE-3 chain (needs the WIP PR #18039 patches + the RS-rollback fix --
#    a one-shot llama.cpp patch script is documented alongside the drafter:
#    https://huggingface.co/Ex0bit/Qwen3.6-27B-PRISM-EAGLE3)
./llama-server --model Qwen3.6-27B-PRISM-PRO-DQ.gguf \
    --spec-type draft-eagle3 --model-draft <eagle3-drafter.gguf> \
    --spec-draft-n-max 2

Provenance

  • Base: Qwen/Qwen3.6-27B (hybrid: 48 GatedDeltaNet linear-attention layers
    • 16 full-attention layers; hidden 5120; vocab 248 320; native MTP head).
  • PRISM Dynamic Quantization: PRISM DQ recipe (llama.cpp GGUF dynamic quant) โ€” preserves the MTP draft head (15 tensors) and the full vision tower (333 tensors).

License

Apache-2.0. Derived from Qwen/Qwen3.6-27B (Apache-2.0).

Downloads last month
1,250
GGUF
Model size
27B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

We're not able to determine the quantization variants.

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ

Base model

Qwen/Qwen3.6-27B
Quantized
(373)
this model