lovedheart committed on
Commit c0d40da · verified · 1 Parent(s): 1f5780e

Create README.md

Files changed (1)
  1. README.md +33 -0
README.md ADDED
@@ -0,0 +1,33 @@
+ ---
+ base_model:
+ - Qwen/Qwen3-Next-80B-A3B-Instruct
+ tags:
+ - text-generation-inference
+ license: apache-2.0
+ ---
+
+
+
+ ![qwen3-next-instruction](https://cdn-uploads.huggingface.co/production/uploads/68121d80da035a609e569a81/Ft9cmZlll_PehtFYkESxH.png)
+
+ **Qwen3-Next-REAP-40B-A3B-Instruct** has the following specifications:
+
+ - **Type**: Causal Language Models
+ - **Number of Parameters**: 40B in total and 3B activated
+ - **Hidden Dimension**: 2048
+ - **Number of Layers**: 48
+ - **Hybrid Layout**: 12 * (3 * (Gated DeltaNet -> MoE) -> 1 * (Gated Attention -> MoE))
+ - **Gated Attention**:
+   - **Number of Attention Heads**: 16 for Q and 2 for KV
+   - **Head Dimension**: 256
+   - **Rotary Position Embedding Dimension**: 64
+ - **Gated DeltaNet**:
+   - **Number of Linear Attention Heads**: 32 for V and 16 for QK
+   - **Head Dimension**: 128
+ - **Mixture of Experts**:
+   - **Number of Experts**: 256 (uniformly pruned from 512)
+   - **Number of Activated Experts**: 10
+   - **Number of Shared Experts**: 1
+ - **Context Length**: 262,144 tokens natively, extensible up to 1,010,000 tokens
+ - **Compression Method**: REAP (Router-weighted Expert Activation Pruning)
+ - **Compression Ratio**: 50% expert pruning
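
The figures in the specification list can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the committed README: the variable and function names are illustrative assumptions, and only the numbers are taken from the list. It checks that the hybrid layout adds up to the stated 48 layers, that keeping 256 of 512 experts corresponds to the stated 50% pruning, and how many query heads share each KV head in the Gated Attention layers.

```python
# Sanity check of the specification list. All names here are
# illustrative; only the numeric values come from the README.

NUM_BLOCKS = 12            # outer repetition in the hybrid layout
DELTANET_PER_BLOCK = 3     # (Gated DeltaNet -> MoE) layers per block
ATTENTION_PER_BLOCK = 1    # (Gated Attention -> MoE) layers per block

EXPERTS_ORIGINAL = 512     # routed experts before pruning
EXPERTS_KEPT = 256         # routed experts after pruning

Q_HEADS = 16               # Gated Attention query heads
KV_HEADS = 2               # Gated Attention key/value heads


def total_layers(num_blocks: int, deltanet: int, attention: int) -> int:
    """Layer count implied by the hybrid layout."""
    return num_blocks * (deltanet + attention)


def pruning_ratio(original: int, kept: int) -> float:
    """Fraction of routed experts removed."""
    return (original - kept) / original


layers = total_layers(NUM_BLOCKS, DELTANET_PER_BLOCK, ATTENTION_PER_BLOCK)
ratio = pruning_ratio(EXPERTS_ORIGINAL, EXPERTS_KEPT)
q_per_kv = Q_HEADS // KV_HEADS

print(f"layers: {layers}")                 # layers: 48
print(f"expert pruning: {ratio:.0%}")      # expert pruning: 50%
print(f"query heads per KV head: {q_per_kv}")  # query heads per KV head: 8
```

The layer count and pruning ratio agree with the "Number of Layers" and "Compression Ratio" entries, so the list is internally consistent on those points.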