---
license: apache-2.0
base_model: Qwen/Qwen3.6-35B-A3B
library_name: transformers
tags:
- qwen
- qwen3.6
- mixture-of-experts
- moe
- pruning
- reap
- bitsandbytes
- bnb8
---

# Qwen3.6-35B-A3B Pre-REAP BNB8 Pruned Ratio 0.3

This checkpoint is a routed-expert-pruned version of `Qwen/Qwen3.6-35B-A3B`. Expert saliency was collected with REAP while loading the scoring model with bitsandbytes 8-bit quantization for standard linear layers. The final saved checkpoint is the original BF16 model pruned according to those quantization-aware scores.

## Pruning setup

- Base model: `Qwen/Qwen3.6-35B-A3B`
- Method: REAP routed-expert pruning
- Pre-REAP scoring model quantization: bitsandbytes 8-bit
- Pruning ratio: 0.30
- Calibration samples: 1024
- Sequence length: 2048
- Seed: 42
- Router renormalization: enabled
- Shared experts: preserved

## Notes

This model uses the packed Qwen3.5/Qwen3.6 MoE integration in the REAP codebase. During bnb8 scoring, standard `nn.Linear` modules are quantized by bitsandbytes, while packed routed-expert tensors remain BF16 parameters.

The checkpoint itself is saved after pruning in BF16 format and can be loaded with Transformers using `trust_remote_code=True`.

Evaluation is still in progress for this specific bnb8-pruned checkpoint. Prior comparison runs use plain lm-eval prompts for GSM8K, without chat templating.
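
To make the pruning setup above more concrete, here is a minimal, self-contained sketch of saliency-based routed-expert pruning with router renormalization. It is not the REAP codebase. It assumes that an expert's saliency is the mean of router weight times expert output norm over the tokens routed to it, that the number of pruned experts is the rounded product of expert count and pruning ratio, and that "router renormalization" means rescaling the selected top-k weights to sum to 1. All function names and toy numbers are hypothetical and do not reflect this checkpoint's actual expert counts or scores.

```python
import math


def reap_style_saliency(gate_weights, output_norms):
    """Score each expert by its mean router-weighted output norm.

    gate_weights[t][e]: router weight of expert e for token t (0.0 if not selected).
    output_norms[t][e]: L2 norm of expert e's output for token t.
    Assumption: experts that were never selected get a score of 0.0.
    """
    num_experts = len(gate_weights[0])
    scores = []
    for e in range(num_experts):
        contributions = [
            gates[e] * norms[e]
            for gates, norms in zip(gate_weights, output_norms)
            if gates[e] > 0.0
        ]
        scores.append(sum(contributions) / len(contributions) if contributions else 0.0)
    return scores


def select_experts_to_keep(scores, pruning_ratio):
    """Drop the lowest-scoring experts; return sorted indices of the survivors."""
    # Assumption: the pruned count is rounded to the nearest integer.
    num_pruned = round(len(scores) * pruning_ratio)
    ranked = sorted(range(len(scores)), key=lambda e: scores[e])
    return sorted(ranked[num_pruned:])


def route_with_renormalization(router_logits, kept, top_k):
    """Softmax over retained experts, pick top_k, rescale weights to sum to 1."""
    max_logit = max(router_logits[e] for e in kept)
    exps = {e: math.exp(router_logits[e] - max_logit) for e in kept}
    total = sum(exps.values())
    probs = {e: v / total for e, v in exps.items()}
    chosen = sorted(kept, key=lambda e: probs[e], reverse=True)[:top_k]
    norm = sum(probs[e] for e in chosen)
    return {e: probs[e] / norm for e in chosen}


if __name__ == "__main__":
    # Toy calibration statistics: 2 tokens, 4 routed experts (hypothetical values).
    gates = [[0.6, 0.4, 0.0, 0.0], [0.0, 0.5, 0.5, 0.0]]
    norms = [[1.0, 2.0, 3.0, 4.0], [1.0, 4.0, 2.0, 4.0]]
    scores = reap_style_saliency(gates, norms)
    kept = select_experts_to_keep(scores, pruning_ratio=0.30)
    print("saliency:", scores)
    print("kept experts:", kept)
    print("routing:", route_with_renormalization([1.0, 2.0, 3.0, 10.0], kept, top_k=2))
```

In this sketch a pruned expert can never be selected, even when its logit would have been the largest, and the surviving top-k weights are rescaled so the combined expert output keeps the same overall weighting. Shared experts sit outside this routing path, which is consistent with them being preserved.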