LiteRT-LM
mattkreileder committed · Commit 9472425 · verified · 1 Parent(s): 757a6b6

Add memory consumption to Windows LunarLake

Files changed (1): README.md (+4 -4)
@@ -75,10 +75,10 @@ It uses the Gemma quantization scheme that employs a mixture of 2bit, 4bit and 8
 
 **Windows**
 
-| Device &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| Backend | Prefill (tokens/sec) | Decode (tokens/sec) | <span style="white-space: nowrap;">Time-to-first</span>-token (sec) | Model size (MB) |
-| :---- | :---- | :---- | :---- | :---- | :---- |
-| Intel LunarLake | CPU | 435 | 29.8 | 2.39 | 2583 |
-| Intel LunarLake | GPU | 3,751 | 48.4 | 0.29 | 2583 |
+| Device &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| Backend | Prefill (tokens/sec) | Decode (tokens/sec) | <span style="white-space: nowrap;">Time-to-first</span>-token (sec) | Model size (MB) | CPU/GPU Memory (MB) |
+| :---- | :---- | :---- | :---- | :---- | :---- | :---- |
+| Intel LunarLake | CPU | 435 | 29.8 | 2.39 | 2583 | 3505 |
+| Intel LunarLake | GPU | 3,751 | 48.4 | 0.29 | 2583 | 3540 |
 
 **IoT**
 