Qwen3.6-35B-A3B-abliterated-MAX | APEX i-nano (2.72 BPW)

This model was quantized using apex-quant with the i-nano profile and an importance matrix calibrated on a diverse code/math/reasoning dataset.

Quantization Details

Property Value
Base Model prithivMLmods/Qwen3.6-35B-A3B-abliterated-MAX
Quantizer mudler/apex-quant
Profile i-nano (importance-matrix calibrated)
BPW 2.72
File Size ~11 GB
Layers 40
Calibration Data tomngdev/imatrix-calibration-data

What is APEX Quantization?

APEX applies a per-layer, per-tensor quantization gradient _ higher precision on edge layers (first and last ~5), aggressive quantization on the middle layers, with separate handling for routed experts, shared experts, attention weights, and SSM weights. The i-nano variant uses importance matrix calibration to enable very low-bit formats (IQ2_S, IQ2_XXS) on middle-layer expert weights while preserving output quality.

Usage

Run with any recent llama.cpp build, no custom fork or patches required:

# CLI
./llama-cli -m Qwen3.6-35B-A3B-abliterated-MAX-APEX-i-nano.gguf -p "Your prompt here"

# Server
./llama-server -m Qwen3.6-35B-A3B-abliterated-MAX-APEX-i-nano.gguf--host 0.0.0.0 --port 8080

Files

File Description
Qwen3.6-35B-A3B-abliterated-MAX-APEX-i-nano.gguf The quantized model (~11 GB)
imatrix.dat Importance matrix used for calibration
Downloads last month
415
GGUF
Model size
35B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

We're not able to determine the quantization variants.

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Hyphonical/Qwen3.6-35B-A3B-abliterated-MAX-APEX-i-nano-GGUF

Quantized
(4)
this model