---
base_model: Qwen/Qwen3.6-35B-A3B
library_name: transformers
license: apache-2.0
tags:
- qwen3.6
- moe
- reap
- pruning
- bitsandbytes
---

# Qwen3.6-35B-A3B REAP Pruned Ratio 0.3 with Pre-REAP BNB4 Scoring

This model is derived from `Qwen/Qwen3.6-35B-A3B` using REAP routed-expert pruning with a pruning ratio of 0.3. Saliency scores were collected from a pre-REAP `bitsandbytes` 4-bit scoring model; the original BF16 checkpoint was then reloaded, pruned, and saved.

## Pruning Setup

- Base model: `Qwen/Qwen3.6-35B-A3B`
- Method: REAP routed-expert pruning
- Pre-REAP scoring model: `bitsandbytes` 4-bit NF4, BF16 compute, double quantization enabled
- Final checkpoint dtype: saved from the original full-precision (BF16) weights after pruning; this is not a quantized checkpoint
- Pruning ratio: 0.3
- Routed experts before pruning: 256 per MoE layer
- Routed experts pruned: 76 per MoE layer
- Routed experts retained: 180 per MoE layer
- Shared experts: preserved
- Calibration samples: 1024
- Sequence length: 2048 tokens
- Seed: 42
- Router renormalization: true

## Notes

This checkpoint uses the packed Qwen3.5/Qwen3.6 REAP integration. The bnb4-quantized model was used only for saliency score collection; pruning and saving were applied to the original model weights.
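As a rough illustration, the per-layer counts listed under Pruning Setup can be reproduced from the ratio. The sketch below assumes the pruned count is `floor(256 * 0.3)`, which matches the stated 76 pruned and 180 retained experts, and that the experts with the lowest saliency scores are the ones removed. `expert_counts` and `retained_experts` are hypothetical helpers for illustration only, not functions from the REAP codebase.

```python
import math


def expert_counts(num_experts: int, ratio: float) -> tuple[int, int]:
    """Return (pruned, retained) routed-expert counts for one MoE layer.

    Assumption: the pruned count is rounded down, which matches
    256 experts -> 76 pruned / 180 retained at ratio 0.3.
    """
    pruned = math.floor(num_experts * ratio)
    return pruned, num_experts - pruned


def retained_experts(saliency: list[float], ratio: float) -> list[int]:
    """Hypothetical helper: indices of experts kept after dropping
    the lowest-saliency ones (assumed selection rule)."""
    _, keep = expert_counts(len(saliency), ratio)
    # Rank expert indices by saliency, highest first, and keep the top `keep`.
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    # Return kept indices in their original layer order.
    return sorted(ranked[:keep])


pruned, kept = expert_counts(256, 0.3)
print(pruned, kept)           # 76 180
print(f"{pruned / 256:.4f}")  # 0.2969
```

Because the count is rounded down, the effective per-layer pruning fraction is 76/256 ≈ 0.297 rather than exactly 0.3.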