Qwen3.6-35B-A3B Packed-Qwen BNB4 Pre-REAP Pruned Ratio 0.3

This checkpoint is derived from Qwen/Qwen3.6-35B-A3B by REAP routed-expert pruning with a pruning ratio of 0.30. Unlike earlier pre-REAP bnb4 experiments, which quantized only the standard nn.Linear modules during REAP score collection, this run also quantized the packed Qwen routed experts and the router.

The final checkpoint is saved from the BF16 model after pruning. It is not a bitsandbytes-quantized checkpoint.

Pruning setup

  • Base model: Qwen/Qwen3.6-35B-A3B
  • Method: REAP routed-expert pruning
  • Pre-REAP scoring quantization: bitsandbytes 4-bit NF4, BF16 compute, double quantization enabled
  • Packed-Qwen scoring coverage: standard linear layers plus packed routed experts and router
  • Pruning ratio: 0.30
  • Routed experts before pruning: 256 per MoE layer
  • Routed experts pruned: 76 per MoE layer
  • Routed experts retained: 180 per MoE layer
  • num_experts_per_tok: 8
  • Shared experts: preserved
  • Calibration samples: 1024
  • Sequence length: 2048
  • Seed: 42
  • Router renormalization: enabled
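
The numbers in the list above can be sanity-checked with a short sketch. It is not the code used to produce this checkpoint. It assumes the pruned count is the floor of ratio × experts (which matches 76 of 256), that REAP saliency is the mean of router gate weight × expert output norm over the tokens routed to an expert, and that router renormalization means rescaling the selected top-k weights to sum to 1. All function names and the toy records are hypothetical.

```python
import math

def expert_counts(num_experts, ratio):
    """Return (pruned, retained), assuming the pruned count is rounded down."""
    pruned = math.floor(num_experts * ratio)
    return pruned, num_experts - pruned

def reap_scores(records, num_experts):
    """Mean of gate_weight * output_norm per expert (assumed REAP saliency).

    records: iterable of (expert_id, gate_weight, output_norm), one entry per
    (token, selected expert) pair seen during calibration.
    """
    sums = [0.0] * num_experts
    counts = [0] * num_experts
    for expert_id, gate_weight, output_norm in records:
        sums[expert_id] += gate_weight * output_norm
        counts[expert_id] += 1
    # Experts that were never selected get a score of 0.0.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def experts_to_prune(scores, n_prune):
    """Indices of the n_prune lowest-scoring experts, in ascending index order."""
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    return sorted(order[:n_prune])

def renormalized_top_k(logits, k):
    """Softmax over the retained experts, keep top-k, rescale weights to sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

pruned, retained = expert_counts(256, 0.30)
print(pruned, retained)  # 76 180

# Toy calibration records for a 4-expert layer (made-up values).
toy = [(0, 0.5, 2.0), (0, 0.5, 4.0), (1, 0.1, 1.0), (2, 0.9, 3.0)]
print(experts_to_prune(reap_scores(toy, 4), 2))
```

Under the floor assumption the effective ratio is 76/256, slightly below the nominal 0.30. With num_experts_per_tok left at 8, each token selects 8 of the 180 retained routed experts per MoE layer.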

Notes

REAP saliency was collected with a quantization-aware scoring model, then the quantized scoring model was discarded and the original BF16 checkpoint was reloaded for structural pruning and saving.
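
For reference, the scoring-time quantization settings listed under Pruning setup map onto the Transformers BitsAndBytesConfig arguments roughly as sketched below. This is an illustration, not the script used for this checkpoint. The names SCORING_QUANT_KWARGS and build_scoring_quant_config are hypothetical, and the import is deferred so the settings can be inspected without transformers installed.

```python
# 4-bit NF4, BF16 compute, double quantization enabled (from the setup list).
SCORING_QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_compute_dtype": "bfloat16",  # string form of torch.bfloat16
    "bnb_4bit_use_double_quant": True,
}

def build_scoring_quant_config():
    """Build the config object; requires transformers, torch, and bitsandbytes."""
    from transformers import BitsAndBytesConfig  # imported lazily on purpose
    return BitsAndBytesConfig(**SCORING_QUANT_KWARGS)
```

Passing such a config as quantization_config to from_pretrained covers only the standard linear layers. The packed routed experts and router needed additional handling in this run, which is not shown here. Because the quantized model is used only for scoring, none of these settings apply when loading the saved BF16 checkpoint.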

Load with Hugging Face Transformers, passing trust_remote_code=True.
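
A minimal loading sketch follows. It assumes that AutoModelForCausalLM is the right auto class for this architecture, that accelerate is installed for device_map="auto", and that the installed Transformers version still accepts the torch_dtype argument (newer releases also accept dtype). The helper names are hypothetical.

```python
REPO_ID = "RangerX/Qwen3.6-35B-PreREAP-BNB4-PackedQwen-Pruned-ratio-0.3"

def load_kwargs():
    """Keyword arguments for from_pretrained; the checkpoint is stored in BF16."""
    return {
        "trust_remote_code": True,  # required per this model card
        "torch_dtype": "auto",      # use the dtype stored in the checkpoint
        "device_map": "auto",       # assumption: accelerate is available
    }

def load_model_and_tokenizer(repo_id=REPO_ID):
    """Download and load the pruned checkpoint (needs transformers and torch)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(repo_id, **load_kwargs())
    return model, tokenizer
```

Calling load_model_and_tokenizer() downloads the full BF16 weights, so it is kept out of module import time here.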

Checkpoint details

  • Repository: RangerX/Qwen3.6-35B-PreREAP-BNB4-PackedQwen-Pruned-ratio-0.3
  • Format: Safetensors
  • Model size: 26B params
  • Tensor type: BF16