Qwen3.6-35B-REAP Pruned ratio 0.2

This is a REAP-pruned checkpoint derived from Qwen/Qwen3.6-35B-A3B. The pruning ratio is 0.20, meaning about one fifth of the routed experts in each MoE layer were removed.

Pruning

The model was pruned with REAP routed-expert pruning. Expert saliency was computed from router weights and expert activation norms on a 1024-sample calibration set with sequence length 2048. Router weights were renormalized after pruning.
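The saliency computation described above can be sketched as follows. This is a minimal illustration, assuming the score for an expert is the mean of (gate weight × L2 norm of the expert output) over the calibration tokens routed to that expert. The function name `reap_saliency` and its list-based inputs are hypothetical and are not taken from the REAP pipeline.

```python
import math


def reap_saliency(gate_weights, expert_outputs, top_k):
    """Score each expert by its mean router-weighted output norm.

    gate_weights:   one list per token, holding a router weight per expert.
    expert_outputs: one list per token, holding an output vector per expert.
    top_k:          number of experts each token is routed to.

    Assumption: only tokens that actually route to an expert contribute
    to that expert's average.
    """
    num_experts = len(gate_weights[0])
    totals = [0.0] * num_experts
    counts = [0] * num_experts

    for gates, outputs in zip(gate_weights, expert_outputs):
        # Indices of the top_k experts with the largest gate weight for this token.
        active = sorted(range(num_experts), key=lambda j: gates[j], reverse=True)[:top_k]
        for j in active:
            norm = math.sqrt(sum(v * v for v in outputs[j]))
            totals[j] += gates[j] * norm
            counts[j] += 1

    # Experts that never received a token get a score of 0.0.
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]
```

Experts with the lowest scores are the natural candidates for removal, since they contribute least to the layer output on the calibration data.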

Calibration used a composite mixture in the style of the REAP paper:

  • theblackcat102/evol-codealpaca-v1
  • Salesforce/xlam-function-calling-60k
  • open-r1/Mixture-of-Thoughts[code]
  • open-r1/Mixture-of-Thoughts[math]
  • open-r1/Mixture-of-Thoughts[science]
  • SWE-bench/SWE-smith-trajectories(tool)

The checkpoint leaves the shared expert path unchanged. Each routed MoE layer keeps 205 experts, and num_experts_per_tok stays at 8.
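The expert count can be checked with a short calculation. The sketch below assumes the base model has 256 routed experts per layer and that the number of pruned experts is rounded down; neither detail is stated in this card, but together they reproduce the 205 figure. The helper names are hypothetical.

```python
def experts_kept(num_experts: int, prune_ratio: float) -> int:
    """Number of routed experts left after pruning (assumes floor rounding)."""
    num_pruned = int(num_experts * prune_ratio)
    return num_experts - num_pruned


def select_kept_experts(saliency, prune_ratio):
    """Return the sorted indices of the experts to keep, dropping the lowest scores."""
    n_keep = experts_kept(len(saliency), prune_ratio)
    ranked = sorted(range(len(saliency)), key=lambda j: saliency[j], reverse=True)
    return sorted(ranked[:n_keep])


# Assumed base size of 256 routed experts per layer, pruning ratio 0.20.
print(experts_kept(256, 0.20))
```

Because num_experts_per_tok is still 8, each token is routed to the same number of experts as before, only chosen from a smaller pool.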

REAP integration notes

Qwen3.5 and Qwen3.6 use a packed MoE layout, so the REAP pipeline was extended with architecture-specific adapters. These adapters locate MoE modules, collect packed-expert activation metrics, slice routed expert tensors and router rows, and save reloadable Hugging Face checkpoints while preserving tokenizer and processor files.
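The slicing step can be illustrated with a small sketch. It assumes a packed layout in which the leading axis of each expert tensor indexes the expert, and in which router row i scores expert i. Plain Python lists stand in for tensors here, and `slice_packed_moe` is a hypothetical name rather than a function from the REAP pipeline.

```python
def slice_packed_moe(packed_experts, router_rows, keep):
    """Keep only the experts listed in `keep`.

    packed_experts: sequence indexed by expert along its leading axis.
    router_rows:    sequence of router weight rows, one per expert.
    keep:           indices of the experts that survive pruning.
    """
    # Sort so surviving experts keep their original relative order and
    # expert i in the sliced tensor still lines up with router row i.
    keep = sorted(keep)
    kept_experts = [packed_experts[j] for j in keep]
    kept_router = [router_rows[j] for j in keep]
    return kept_experts, kept_router
```

The important property is that the expert tensor and the router are sliced with the same index list; if the two drift apart, tokens are scored for one expert and dispatched to another.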

Details

  • Base model: Qwen/Qwen3.6-35B-A3B
  • Pruning method: reap
  • Pruning ratio: 0.20
  • Calibration samples: 1024
  • Calibration sequence length: 2048
  • Seed: 42
  • Router renormalization: true
  • Local checkpoint size before upload: 54G

Load the model with trust_remote_code=True.
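A minimal loading sketch with Hugging Face Transformers is shown below. It assumes the checkpoint loads through `AutoModelForCausalLM`; since the checkpoint also ships processor files, a different Auto class may be the appropriate one. The helper name `load_pruned_model` is hypothetical.

```python
REPO_ID = "RangerX/Qwen3.6-35B-REAP-Pruned-ratio-0.2"


def load_pruned_model(repo_id: str = REPO_ID):
    """Load tokenizer and model, allowing the repository's custom code to run."""
    # Imported inside the function so that defining it does not require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    # Assumption: the causal-LM Auto class is suitable for this checkpoint.
    model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
    return tokenizer, model
```

Calling `tokenizer, model = load_pruned_model()` downloads the full checkpoint, so enough disk space and memory for a model of this size is needed. Because trust_remote_code=True executes code from the repository, it is worth reviewing that code first.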

Safetensors model size: 29B params, tensor type BF16.