How to use from
Pi
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf barozp/Qwen3.6-28B-REAP20-A3B-GGUF:
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "barozp/Qwen3.6-28B-REAP20-A3B-GGUF:"
        }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
Quick Links

Qwen3.6-28B-REAP20-A3B — GGUF Quantizations

GGUF quantizations of 0xSero/Qwen3.6-28B-REAP20-A3B, a 20% expert-pruned variant of Qwen/Qwen3.6-35B-A3B using the REAP (Router-weighted Expert Activation Pruning) method.

Available Files

File Quant Size BPW Description
Qwen3.6-28B-REAP20-A3B-BF16.gguf BF16 ~56.5 GB 16.0 Full precision, for re-quantization
Qwen3.6-28B-REAP20-A3B-Q8_0.gguf Q8_0 ~30 GB 8.0 Near-lossless, large file
Qwen3.6-28B-REAP20-A3B-Q6_K.gguf Q6_K ~23 GB 6.56 Near-lossless, recommended for high quality
Qwen3.6-28B-REAP20-A3B-Q5_K_M.gguf Q5_K_M ~20 GB 5.68 High quality, larger size
Qwen3.6-28B-REAP20-A3B-Q5_K_S.gguf Q5_K_S ~19 GB 5.52 High quality, slightly smaller
Qwen3.6-28B-REAP20-A3B-Q4_K_M.gguf Q4_K_M ~17 GB 4.89 Recommended — best quality/size balance
Qwen3.6-28B-REAP20-A3B-Q4_K_S.gguf Q4_K_S ~16 GB 4.63 4-bit small
Qwen3.6-28B-REAP20-A3B-Q3_K_L.gguf Q3_K_L ~15 GB 4.27 3-bit large
Qwen3.6-28B-REAP20-A3B-Q3_K_M.gguf Q3_K_M ~14 GB 3.91 3-bit medium
Qwen3.6-28B-REAP20-A3B-Q3_K_S.gguf Q3_K_S ~13 GB 3.66 3-bit small
Qwen3.6-28B-REAP20-A3B-IQ3_XXS.gguf IQ3_XXS ~12 GB 3.06 Ultra-small, imatrix-based
Qwen3.6-28B-REAP20-A3B-Q2_K.gguf Q2_K ~11 GB 2.96 Smallest size, lowest quality

Model Details

Property Value
Architecture Qwen3.6 MoE (hybrid Gated DeltaNet + MoE)
Parameters ~28B total / ~3B active per token
Experts 205 total / 8 active per token (pruned from 256)
Context Length 262,144 tokens
Original dtype BF16
Quantization source BF16 GGUF from 0xSero/Qwen3.6-28B-REAP20-A3B-GGUF
Quantization tool llama.cpp
imatrix Used for IQ3_XXS (from source repo)
License Apache 2.0

Quantization Process

# 1. Download BF16 GGUF from source
huggingface-cli download 0xSero/Qwen3.6-28B-REAP20-A3B-GGUF \
  --include "model.bf16.gguf" --local-dir ./

# 2. Download imatrix (for IQ quants)
huggingface-cli download 0xSero/Qwen3.6-28B-REAP20-A3B-GGUF \
  --include "imatrix.dat" --local-dir ./

# 3. Quantize (example: Q4_K_M)
llama-quantize model.bf16.gguf Qwen3.6-28B-REAP20-A3B-Q4_K_M.gguf Q4_K_M

# 4. Quantize with imatrix (example: IQ3_XXS)
llama-quantize --imatrix imatrix.dat model.bf16.gguf \
  Qwen3.6-28B-REAP20-A3B-IQ3_XXS.gguf IQ3_XXS

Usage

llama.cpp

llama-cli \
  -m Qwen3.6-28B-REAP20-A3B-Q4_K_M.gguf \
  -ngl 99 -c 4096 \
  -p "Your prompt here"

llama-server (OpenAI-compatible API)

llama-server \
  -m Qwen3.6-28B-REAP20-A3B-Q4_K_M.gguf \
  -ngl 99 -c 4096 \
  --port 8080

LM Studio / Jan / Ollama

Download the .gguf file and load it directly in your preferred local inference UI.

Hardware Requirements

Config VRAM / RAM
Full GPU (Q4_K_M, recommended) 20+ GB VRAM
Hybrid CPU+GPU (Q4_K_M) 10 GB VRAM + 10 GB RAM
CPU only (Q4_K_M) 24+ GB RAM

About the Original Model

0xSero/Qwen3.6-28B-REAP20-A3B applies REAP expert pruning (arXiv:2510.13999) to remove 20% of MoE experts (51 of 256 per layer) from Qwen3.6-35B-A3B, while preserving routing behavior via router weight renormalization. Active parameters per token remain unchanged at ~3B. The result is a ~25% smaller model with competitive generation quality across coding, reasoning, and knowledge benchmarks.

License

Apache 2.0 — see Qwen License.

Downloads last month
1,998
GGUF
Model size
28B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for barozp/Qwen3.6-28B-REAP20-A3B-GGUF

Quantized
(4)
this model

Paper for barozp/Qwen3.6-28B-REAP20-A3B-GGUF