inferencerlabs committed on
Commit
a9d7bb3
·
verified ·
1 Parent(s): a62b1f5

Upload model file

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -15,7 +15,7 @@ pipeline_tag: text-generation
  - Text inference: ~25.98 tokens/s @ 1000 tokens ~145.11 GiB (debug build)
 
  <p style="margin-bottom:0px;">
- <strong>Q9-EXP is an experimental build for DeepSeek-V4-Flash</strong>
+ <strong>Q9-EXP is an experimental build of DeepSeek-V4-Flash</strong>
 
  In this build, the 4-bit pre-quantized weights of the base model were repacked (rather than dequantized and re-quantized to 9-bit), as this approach performed slightly better in our initial coding tests. All remaining weights were quantized to 9-bit. It also includes a temporary chat template. Stay tuned for updates.
  </p>