---
base_model: Qwen/Qwen3.6-35B-A3B
pipeline_tag: text-generation
tags:
- qwen3.6
- mixture-of-experts
- expert-pruning
- reap
- text-generation
---

# Qwen3.6-35B-REAP Pruned ratio 0.2

This is a REAP-pruned checkpoint derived from `Qwen/Qwen3.6-35B-A3B`. The pruning ratio is `0.20`.

## Pruning

The model was pruned with REAP routed-expert pruning. Expert saliency was computed from router weights and expert activation norms on a 1024-sample calibration set with sequence length 2048. Router weights were renormalized after pruning.

Calibration data used the paper-style composite mixture:

- `theblackcat102/evol-codealpaca-v1`
- `Salesforce/xlam-function-calling-60k`
- `open-r1/Mixture-of-Thoughts[code]`
- `open-r1/Mixture-of-Thoughts[math]`
- `open-r1/Mixture-of-Thoughts[science]`
- `SWE-bench/SWE-smith-trajectories(tool)`

The checkpoint keeps the shared expert path unchanged. The routed MoE layers keep `205` experts per layer and `num_experts_per_tok=8`.

## REAP integration notes

Qwen3.5/Qwen3.6 use a packed MoE layout, so the REAP pipeline was extended with architecture-specific adapters for locating MoE modules, collecting packed-expert activation metrics, slicing routed expert tensors and router rows, and saving reloadable Hugging Face checkpoints while preserving tokenizer and processor files.

## Details

- Base model: `Qwen/Qwen3.6-35B-A3B`
- Pruning method: `reap`
- Pruning ratio: `0.20`
- Calibration samples: 1024
- Calibration sequence length: 2048
- Seed: 42
- Router renormalization: true
- Local checkpoint size before upload: 54G

Use the model with `trust_remote_code=True`.
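
A minimal loading sketch with `transformers` is given here to illustrate the `trust_remote_code=True` requirement. It rests on assumptions not stated in this card: `MODEL_ID` is a hypothetical placeholder for the actual repo id or local path, the checkpoint is assumed to load through `AutoModelForCausalLM`, and the `torch_dtype` and `device_map` settings are illustrative choices (the latter needs `accelerate` installed).

```python
# Hypothetical placeholder: replace with the actual Hub repo id or a local path.
MODEL_ID = "path/or/repo-id-of-this-checkpoint"


def load_kwargs() -> dict:
    """Keyword arguments passed to from_pretrained in generate()."""
    return {
        "trust_remote_code": True,  # required per this model card
        "torch_dtype": "auto",      # assumption: use the dtype stored in the checkpoint
        "device_map": "auto",       # assumption: requires accelerate; spreads weights over available devices
    }


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the checkpoint and return only the newly generated text."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **load_kwargs())

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the continuation is decoded.
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

Calling `generate("Write a Python function that reverses a string.")` downloads or reads the full checkpoint (about 54G locally, per the details above), so enough disk space and accelerator memory must be available.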