Qwen3.5 122B A10B Abliterix - Custom GGUF Quantizations

🚨 CRITICAL COMPATIBILITY WARNING 🚨 These are iqk format quantizations and are EXCLUSIVE to the ik_llama.cpp fork. They will NOT work on mainline llama.cpp, standard LM Studio, standard Text Generation WebUI, or KoboldCPP. You must compile and run this using ikawrakow's llama.cpp fork (or a UI where you have manually swapped the backend to an ik_llama build).


This repository contains custom, mixed-precision ik_llama.cpp GGUF quantizations for wangzhang/Qwen3.5-122B-A10B-abliterix, an abliterated version of Qwen/Qwen3.5-122B-A10B.

These quants use different precision levels for different layer types, keeping attention and shared expert layers at high precision while compressing the routed experts (which make up the bulk of the model's size) to various IQK quantization levels.

⚠️ Disclaimer: The "Vibes Test"

These quantizations have NOT been formally tested for perplexity. They were compiled as an experiment to see how the model handles shifting bottlenecks. There is no guarantee that they are mathematically optimal or perform flawlessly. They are provided entirely as-is. If they pass the vibes test for you, enjoy!

🙏 Credits & Acknowledgments


🛠️ Quantization Recipes

All variants share the same structure: high precision on attention/gating layers and shared experts, with the routed expert layers (the bulk of model size) quantized to varying levels.

IQ4_KS

Balances upgraded routed experts with compressed embeddings to save VRAM.

Layer Group Quant
Token Embeddings & Output IQ6_K
Attention / Delta Net Q8_0
SSM Alpha & Beta Q8_0
Shared Experts Q8_0
Routed Experts IQ4_KS

IQ4_K

Spends a bit more VRAM for full Q8_0 precision on the vocabulary, with slightly heavier experts.

Layer Group Quant
Token Embeddings & Output Q8_0
Attention / Delta Net Q8_0
SSM Alpha & Beta Q8_0
Shared Experts Q8_0
Routed Experts IQ4_K

IQ4_KSS

Uses split quant levels on routed experts (down vs gate/up) with compressed embeddings.

Layer Group Quant
Token Embeddings & Output IQ6_K
Attention / Delta Net Q8_0
SSM Alpha & Beta Q8_0
Shared Experts Q8_0
Routed Experts (down) IQ4_KS
Routed Experts (gate/up) IQ4_KSS

IQ5_KS

Steps up to 5-bit routed experts with full-precision SSM alpha/beta weights.

Layer Group Quant
Token Embeddings & Output Q8_0
Attention / Delta Net Q8_0
SSM Alpha & Beta F32
Shared Experts Q8_0
Routed Experts IQ5_KS

IQ5_K

Same structure as IQ5_KS but using IQ5_K for the routed experts.

Layer Group Quant
Token Embeddings & Output Q8_0
Attention / Delta Net Q8_0
SSM Alpha & Beta F32
Shared Experts Q8_0
Routed Experts IQ5_K

IQ6_K

Highest quality routed expert quantization with full-precision SSM alpha/beta.

Layer Group Quant
Token Embeddings & Output Q8_0
Attention / Delta Net Q8_0
SSM Alpha & Beta F32
Shared Experts Q8_0
Routed Experts IQ6_K

IQ2_KL

Maximum compression variant. Drops attention layers to IQ6_K and uses aggressive 2-3 bit routed expert quantization.

Layer Group Quant
Token Embeddings IQ4_K
Output IQ6_K
Attention / Delta Net IQ6_K
SSM Alpha & Beta IQ6_K
Shared Experts IQ6_K
Routed Experts (down) IQ3_KS
Routed Experts (gate/up) IQ2_KL

💻 How to Run

  1. Clone and build the ik_llama.cpp fork from ikawrakow/ik_llama.cpp.
  2. Use the compiled llama-server or llama-cli from that specific build.

Example llama-server launch command:

./llama-server -m Qwen3.5-122B-A10B-abliterix-IQ4_KS.gguf -c 8192 -ngl 99 -fa
Downloads last month
167
GGUF
Model size
122B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

2-bit

6-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for KeinNiemand/Qwen3.5-122B-A10B-abliterix-IK_GGUF

Quantized
(16)
this model