---
license: apache-2.0
base_model: Qwen/Qwen3.6-35B-A3B
library_name: transformers
tags:
- qwen
- qwen3.6
- mixture-of-experts
- moe
- pruning
- reap
- bitsandbytes
- bnb8
---
# Qwen3.6-35B-A3B Pre-REAP BNB8 Pruned Ratio 0.3
This checkpoint is a routed-expert-pruned version of `Qwen/Qwen3.6-35B-A3B`.
Expert saliency scores were collected with REAP on a scoring copy of the model
loaded with bitsandbytes 8-bit quantization for standard linear layers. The saved
checkpoint is the original BF16 model, pruned according to the scores from that
8-bit scoring run.
## Pruning setup
- Base model: `Qwen/Qwen3.6-35B-A3B`
- Method: REAP routed-expert pruning
- Pre-REAP scoring model quantization: bitsandbytes 8-bit
- Pruning ratio: 0.30
- Calibration samples: 1024
- Sequence length: 2048
- Seed: 42
- Router renormalization: enabled
- Shared experts: preserved
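
The effect of the pruning ratio and router renormalization listed above can be sketched on toy numbers. This is a minimal illustration, not the REAP codebase: the function names, the ten-expert layer, the saliency values, the rounding rule, and the reading of "renormalization" as rescaling the selected gate weights to sum to 1 are all assumptions made for the example. Saliency scores are taken as given rather than computed from calibration data.

```python
import math

def experts_to_keep(saliency, ratio):
    """Drop the lowest-saliency experts and return the sorted indices kept.

    Assumption: the number pruned is round(num_experts * ratio).
    """
    n_prune = round(len(saliency) * ratio)
    order = sorted(range(len(saliency)), key=lambda i: saliency[i])
    return sorted(order[n_prune:])

def renormalized_gates(router_logits, kept, top_k):
    """Route over the kept experts only, then rescale the top_k gate
    weights so that they sum to 1 (one plausible reading of
    "router renormalization")."""
    m = max(router_logits[i] for i in kept)  # subtract max for stability
    weights = {i: math.exp(router_logits[i] - m) for i in kept}
    top = sorted(kept, key=lambda i: weights[i], reverse=True)[:top_k]
    total = sum(weights[i] for i in top)
    return {i: weights[i] / total for i in top}

# Hypothetical per-expert saliency for one layer with 10 routed experts.
saliency = [0.9, 0.1, 0.5, 0.05, 0.7, 0.3, 0.8, 0.2, 0.6, 0.4]
kept = experts_to_keep(saliency, ratio=0.30)

# Hypothetical router logits for a single token.
logits = [0.0, 5.0, 1.0, 4.0, 2.0, 0.5, 1.5, 3.0, 0.2, 0.1]
gates = renormalized_gates(logits, kept, top_k=2)
```

In this toy case the token's three highest-logit experts (1, 3, and 7) are exactly the ones pruned, so it is routed to experts 4 and 6 instead, with gate weights that again sum to 1. Shared experts are not part of this selection, since they are preserved.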
## Notes
This model uses the packed Qwen3.5/Qwen3.6 MoE integration in the REAP codebase.
During bnb8 scoring, standard `nn.Linear` modules are quantized by bitsandbytes,
while the packed routed-expert tensors remain BF16 parameters. The pruned
checkpoint itself is saved in BF16 and can be loaded with Transformers using
`trust_remote_code=True`.
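
A minimal loading sketch along those lines follows. The repository id is a hypothetical placeholder (substitute the actual id of this checkpoint), the helper names are invented for the example, and `device_map="auto"` is an optional convenience that assumes the `accelerate` package is installed.

```python
# Hypothetical placeholder; replace with the real repository id.
PRUNED_REPO_ID = "your-namespace/your-pruned-repo"

def pruned_load_kwargs():
    """Keyword arguments for loading the BF16 pruned checkpoint."""
    return {
        "trust_remote_code": True,   # required per the note above
        "torch_dtype": "bfloat16",   # newer Transformers releases also accept `dtype`
        "device_map": "auto",        # optional; needs `accelerate`
    }

def load_pruned(repo_id=PRUNED_REPO_ID):
    """Load tokenizer and model; imports are deferred so that defining
    this helper does not require Transformers to be installed."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(repo_id, **pruned_load_kwargs())
    return tokenizer, model
```

Calling `load_pruned()` downloads the weights, so it is best run on a machine with enough memory for the BF16 checkpoint.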

Evaluation of this specific bnb8-pruned checkpoint is still in progress. Prior
comparison runs used plain lm-eval prompts for GSM8K, without chat templating.