---
base_model:
- Qwen/Qwen3-Next-80B-A3B-Instruct
tags:
- text-generation-inference
license: apache-2.0
---

![qwen3-next-instruction](https://cdn-uploads.huggingface.co/production/uploads/68121d80da035a609e569a81/Ft9cmZlll_PehtFYkESxH.png)

**Qwen3-Next-REAP-40B-A3B-Instruct** has the following specifications:

- **Type**: Causal Language Models
- **Number of Parameters**: 40B in total and 3B activated
- **Hidden Dimension**: 2048
- **Number of Layers**: 48
- **Hybrid Layout**: 12 * (3 * (Gated DeltaNet -> MoE) -> 1 * (Gated Attention -> MoE))
- **Gated Attention**:
  - **Number of Attention Heads**: 16 for Q and 2 for KV
  - **Head Dimension**: 256
  - **Rotary Position Embedding Dimension**: 64
- **Gated DeltaNet**:
  - **Number of Linear Attention Heads**: 32 for V and 16 for QK
  - **Head Dimension**: 128
- **Mixture of Experts**:
  - **Number of Experts**: 256 (uniformly pruned from 512)
  - **Number of Activated Experts**: 10
  - **Number of Shared Experts**: 1
- **Context Length**: 262,144 natively and extensible up to 1,010,000 tokens
- **Compression Method**: REAP (Router-weighted Expert Activation Pruning)
- **Compression Ratio**: 50% expert pruning
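
The figures in the specification list can be cross-checked with a little arithmetic. The sketch below is illustrative only and not part of the model's code: the `Spec` class, its field names, and the helper functions are hypothetical, and the KV-cache estimate assumes 2-byte (bf16/fp16) cache values. It counts only the Gated Attention layers, on the assumption that the Gated DeltaNet layers keep a fixed-size recurrent state rather than a cache that grows with sequence length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Hypothetical container for numbers copied from the specification list."""
    blocks: int = 12                # repeats of the hybrid block
    deltanet_per_block: int = 3     # Gated DeltaNet -> MoE layers per block
    attention_per_block: int = 1    # Gated Attention -> MoE layers per block
    kv_heads: int = 2               # KV heads in Gated Attention
    head_dim: int = 256             # Gated Attention head dimension
    experts_original: int = 512     # experts before pruning
    experts_kept: int = 256         # experts after REAP pruning
    native_context: int = 262_144   # native context length in tokens


def layer_layout(spec: Spec) -> list[str]:
    """Expand the hybrid layout into a flat, ordered list of layer types."""
    block = (["gated_deltanet"] * spec.deltanet_per_block
             + ["gated_attention"] * spec.attention_per_block)
    return block * spec.blocks


def pruning_ratio(spec: Spec) -> float:
    """Fraction of routed experts removed by pruning."""
    return 1 - spec.experts_kept / spec.experts_original


def kv_cache_bytes(spec: Spec, tokens: int, bytes_per_value: int = 2) -> int:
    """Estimate KV-cache size for one sequence.

    Assumption: only Gated Attention layers store per-token K and V,
    and each cached value takes `bytes_per_value` bytes (2 for bf16/fp16).
    """
    attention_layers = spec.blocks * spec.attention_per_block
    # Factor of 2 accounts for storing both K and V.
    per_token = attention_layers * spec.kv_heads * spec.head_dim * 2 * bytes_per_value
    return per_token * tokens


SPEC = Spec()

if __name__ == "__main__":
    layout = layer_layout(SPEC)
    print("layers:", len(layout))
    print("attention layers:", layout.count("gated_attention"))
    print("pruning ratio:", pruning_ratio(SPEC))
    gib = kv_cache_bytes(SPEC, SPEC.native_context) / 2**30
    print("KV cache at native context (GiB):", gib)
```

Under these assumptions the layout expands to the stated 48 layers, of which 12 are Gated Attention, the pruning ratio is 0.5, and a single sequence at the native 262,144-token context needs roughly 6 GiB of KV cache (24 KiB per token).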