inferencerlabs committed on
Commit 1774838 · verified · 1 Parent(s): c0fc69e

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -15,9 +15,9 @@ pipeline_tag: text-generation
  - Text inference: ~25.98 tokens/s @ 1000 tokens ~145.11 GiB (debug build)
 
  <p style="margin-bottom:0px;">
- <strong>Q9-EXP is an experimental build of DeepSeek-V4-Flash</strong>
+ <strong>Q9 typically achieves near lossless accuracy in our coding test</strong>
 
- In this build, the 4-bit pre-quantized weights of the base model were repacked (rather than dequantized and re-quantized to 9-bit), as this approach performed slightly better in our initial coding tests. All remaining weights were quantized to 9-bit. It also includes a temporary chat template. Stay tuned for updates.
+ In this build, the 4-bit pre-quantized weights of the base model were repacked (rather than dequantized and re-quantized to 9-bit), as this approach performed slightly better in our initial coding tests. All remaining weights were quantized to 9-bit.
  </p>
 
  ![Screenshot](https://cdn-uploads.huggingface.co/production/uploads/688479d616f1ec82fa645019/ueGDAcsebpcWYhuU9Gvn4.jpeg)
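The README does not say how the repacking is implemented, so the following is only a toy sketch of why repacking 4-bit weights might preserve them better than dequantizing and re-quantizing to 9-bit. It assumes a simple per-group affine scheme (`weight = scale * code + bias`) and a two-codes-per-byte layout; the helper names, the scale and bias values, and the group size are all hypothetical and are not taken from the actual build.

```python
import random


def dequantize(codes, scale, bias):
    """Map integer codes back to floats with an affine scale and bias."""
    return [scale * q + bias for q in codes]


def quantize(values, bits):
    """Min/max affine quantization of one group to the given bit width."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def pack_nibbles(codes):
    """Pack an even-length list of 4-bit codes, two per byte (low nibble first)."""
    return bytes(a | (b << 4) for a, b in zip(codes[0::2], codes[1::2]))


def unpack_nibbles(packed):
    """Inverse of pack_nibbles."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)
        out.append(byte >> 4)
    return out


random.seed(0)
# Hypothetical group of 64 pre-quantized 4-bit codes with made-up scale/bias.
codes4 = [random.randrange(16) for _ in range(64)]
codes4[0], codes4[1], codes4[2] = 0, 15, 7  # make sure the full range is used
scale4, bias4 = 0.02, -0.15
reference = dequantize(codes4, scale4, bias4)

# Path A: repack. Only the byte layout changes; codes, scale and bias are kept.
repacked_codes = unpack_nibbles(pack_nibbles(codes4))
repacked = dequantize(repacked_codes, scale4, bias4)
repack_error = max(abs(a - b) for a, b in zip(repacked, reference))

# Path B: dequantize, then re-quantize the floats onto a 9-bit grid.
codes9, scale9, bias9 = quantize(reference, 9)
requantized = dequantize(codes9, scale9, bias9)
requant_error = max(abs(a - b) for a, b in zip(requantized, reference))

print(f"repack max error:     {repack_error:.3e}")
print(f"requantize max error: {requant_error:.3e}")
```

In this toy setup the repacked path reproduces the 4-bit values exactly, while the 9-bit path picks up a small rounding error because the 16 original levels do not land on the 512-level grid. Whether that effect explains the coding-test difference reported above is not something this sketch can establish.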