0xSero committed on
Commit 2b29d03 · verified · 1 Parent(s): d40cb10

Standardize model card (template rollout)

Files changed (1)
  1. README.md +56 -11
README.md CHANGED
@@ -3,19 +3,50 @@ license: mit
  library_name: transformers
  pipeline_tag: text-generation
  tags:
- - deepseek-v4
- - mixture-of-experts
- - reap
- - dgx-spark
- - vllm
- - long-context
- - fp8
- - mxfp4
- - experimental
- base_model: deepseek-ai/DeepSeek-V4-Flash
+ - deepseek
+ - deepseek-v4
+ - dgx-spark
+ - experimental
+ - fp8
+ - long-context
+ - mixture-of-experts
+ - mxfp4
+ - reap
+ - vllm
+ base_model:
+ - deepseek-ai/DeepSeek-V4-Flash
  ---

- # DeepSeek-V4-Flash-Spark-Mini
+ > [!TIP]
+ > **[Support this work →](https://donate.sybilsolutions.ai)** · [X](https://x.com/0xsero) · [GitHub](https://github.com/0xsero) · [REAP paper](https://arxiv.org/abs/2510.13999) · [Cerebras REAP](https://huggingface.co/collections/cerebras/cerebras-reap)
+
+ # DeepSeek-V4-Flash-162B
+
+ REAP-pruned [deepseek-ai/DeepSeek-V4-Flash](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash).
+
+ ## At a glance
+
+ | | |
+ |---|---|
+ | Base model | [deepseek-ai/DeepSeek-V4-Flash](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash) |
+ | Format | BF16 |
+ | Total params | **162B** |
+ | Active / token | — |
+ | Experts / layer | 144 |
+ | Layers | 43 |
+ | Hidden size | 4096 |
+ | Context | 1,048,576 |
+ | On-disk size | 94 GB |
+
+ ## Which variant should I pick?
+
+ | Variant | Format | Link |
+ |---|---|---|
+ | `DeepSeek-V4-Flash-162B` **(this)** | BF16 | [link](https://huggingface.co/0xSero/DeepSeek-V4-Flash-162B) |
+ | `DeepSeek-V4-Flash-162B-GGUF` | GGUF | [link](https://huggingface.co/0xSero/DeepSeek-V4-Flash-162B-GGUF) |
+ | `DeepSeek-V4-Flash-180B` | BF16 | [link](https://huggingface.co/0xSero/DeepSeek-V4-Flash-180B) |
+ | `DeepSeek-V4-Flash-180B-GGUF` | GGUF | [link](https://huggingface.co/0xSero/DeepSeek-V4-Flash-180B-GGUF) |
+ | `DeepSeek-V4-Flash-213B` | BF16 | [link](https://huggingface.co/0xSero/DeepSeek-V4-Flash-213B) |

  **162B parameters | K144 REAP-pruned | 200K context | no speculative decoding**

@@ -193,3 +224,17 @@ Choose this if you value prefill throughput over decode speed, or if you want a
  ## License

  MIT for the serving recipe and tooling. The base model weights follow the DeepSeek V4 Flash license. Review it before use.
+
+ ## License & citation
+ License inherited from the base model.
+
+ ```bibtex
+ @misc{lasby2025reap,
+ title = {REAP the Experts: Why Pruning Prevails for One-Shot MoE Compression},
+ author = {Mike Lasby and Ivan Lazarevich and Nish Sinnadurai and Sean Lie and Yani Ioannou and Vithursan Thangarasa},
+ year = {2025}, eprint = {2510.13999}, archivePrefix = {arXiv}
+ }
+ ```
+
+ ## Sponsors
+ Made possible by **NVIDIA · TNG Technology · Lambda · Prime Intellect · Hot Aisle**.
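
The two hunk headers can be cross-checked against the `+56 -11` summary for README.md. The sketch below is only an illustration, not part of the commit: it assumes the standard unified-diff header form `@@ -old_start,old_len +new_start,new_len @@`, and the helper names `parse_hunk_header` and `net_line_change` are hypothetical. Context lines count on both sides of a hunk, so `new_len - old_len` summed over all hunks should equal additions minus deletions.

```python
import re

# Hunk headers copied from the diff above (trailing context text omitted).
HUNK_HEADERS = [
    "@@ -3,19 +3,50 @@",
    "@@ -193,3 +224,17 @@",
]

# Standard unified-diff hunk header; a missing length defaults to 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(header):
    """Return (old_start, old_len, new_start, new_len) for one hunk header."""
    match = HUNK_RE.match(header)
    if match is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_len, new_start, new_len = match.groups()
    return (
        int(old_start),
        int(old_len) if old_len is not None else 1,
        int(new_start),
        int(new_len) if new_len is not None else 1,
    )


def net_line_change(headers):
    """Sum new_len - old_len over all hunks (additions minus deletions)."""
    total = 0
    for header in headers:
        _, old_len, _, new_len = parse_hunk_header(header)
        total += new_len - old_len
    return total


net = net_line_change(HUNK_HEADERS)
print(net)            # 45
print(net == 56 - 11) # True
```

The hunks grow the file by 31 and 14 lines respectively, which matches the net change of 45 lines implied by `+56 -11`.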