Qwopus3.5 122B A10B Kimi-K2.6 Distill Healed Abliterated - Custom GGUF Quantizations

CRITICAL COMPATIBILITY WARNING

These are iqk format quantizations and are EXCLUSIVE to the ik_llama.cpp fork.

They will NOT work on mainline llama.cpp, standard LM Studio, standard Text Generation WebUI, or KoboldCPP.

You must compile and run this using ikawrakow's llama.cpp fork, or a UI where you have manually swapped the backend to an ik_llama.cpp build.


This repository contains custom, mixed-precision ik_llama.cpp GGUF quantizations for OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated, a Kimi-K2.6 distilled, healed, abliterated Qwen3.5 122B A10B MoE model.

These quants use different precision levels for different layer types, keeping attention, SSM, shared expert, output, and MTP/NextN tensors at higher precision while compressing the routed experts, which make up the bulk of the model's size.

⚠️ Disclaimer: The "Vibes Test"

These quantizations have NOT been formally tested for perplexity.

They were compiled as an experiment to see how the model handles shifting bottlenecks. There is no guarantee that they are mathematically optimal or perform flawlessly.

If they pass the vibes test for you, enjoy!

Credits & Acknowledgments


Quantization Recipes

All variants use the same custom tensor buckets: attention, SSM, shared experts, routed experts, embeddings/output, and MTP/NextN tensors.

IQ6_K

Highest quality routed expert quantization in this set.

Layer Group Quant
Token Embeddings & Output Q8_0
Attention Q8_0
SSM Alpha & Beta BF16
SSM Output Q8_0
Shared Experts Q8_0
Routed Experts IQ6_K
MTP / NextN Q8_0

IQ5_K

High quality routed expert quantization with IQ5_K experts.

Layer Group Quant
Token Embeddings & Output Q8_0
Attention Q8_0
SSM Alpha & Beta BF16
SSM Output Q8_0
Shared Experts Q8_0
Routed Experts IQ5_K
MTP / NextN Q8_0

IQ5_KS

High quality routed expert quantization using IQ5_KS experts.

Layer Group Quant
Token Embeddings & Output Q8_0
Attention Q8_0
SSM Alpha & Beta BF16
SSM Output Q8_0
Shared Experts Q8_0
Routed Experts IQ5_KS
MTP / NextN Q8_0

IQ4_K

Balanced 4-bit routed expert quantization with high precision on always-active tensors.

Layer Group Quant
Token Embeddings & Output Q8_0
Attention Q8_0
SSM Alpha & Beta Q8_0
SSM Output Q8_0
Shared Experts Q8_0
Routed Experts IQ4_K
MTP / NextN Q8_0

IQ4_KS

Smaller 4-bit routed expert quantization with compressed embeddings, output, and MTP tensors.

Layer Group Quant
Token Embeddings & Output IQ6_K
Attention Q8_0
SSM Alpha & Beta Q8_0
SSM Output Q8_0
Shared Experts Q8_0
Routed Experts IQ4_KS
MTP / NextN IQ6_K

IQ4_KSS

Ubergarm-style split routed expert recipe.

Layer Group Quant
Token Embeddings & Output IQ6_K
Attention Q8_0
SSM Alpha & Beta Q8_0
SSM Output Q8_0
Shared Experts Q8_0
Routed Experts Down IQ4_KS
Routed Experts Gate/Up IQ4_KSS
MTP / NextN IQ6_K

IQ3_K

Lower size recipe with IQ3_K routed experts and IQ6_K on many always-active tensors.

Layer Group Quant
Token Embeddings & Output IQ6_K
Attention IQ6_K
SSM Alpha & Beta Q8_0
SSM Output IQ6_K
Shared Experts IQ6_K
Routed Experts IQ3_K
MTP / NextN IQ6_K

IQ2_KL

Maximum compression variant in this set.

Layer Group Quant
Token Embeddings IQ4_K
Output IQ6_K
Attention IQ6_K
SSM Alpha & Beta IQ6_K
SSM Output IQ6_K
Shared Experts IQ6_K
Routed Experts Down IQ3_KS
Routed Experts Gate/Up IQ2_KL
MTP / NextN IQ6_K

How to Run

  1. Clone and build the ik_llama.cpp fork from ikawrakow/ik_llama.cpp.
  2. Use the compiled llama-server or llama-cli from that specific build.
  3. For chat templating, use the model's embedded template or the community template credited above, depending on your frontend.

Example llama-server launch command:

./llama-server -m Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-IQ4_KS.gguf -c 8192 -ngl 99 -fa --jinja
Downloads last month
3,517
GGUF
Model size
125B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

2-bit

6-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for KeinNiemand/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-IK_GGUF