LiteRT-LM
marissaw committed on
Commit 26e9dbb · verified · 1 Parent(s): 2fa50dd

Update README.md

  1. README.md +7 -0
README.md CHANGED
@@ -73,6 +73,13 @@ It uses the Gemma quantization scheme that employs a mixture of 2bit, 4bit and 8
  | MacBook Pro M4 Max | CPU | 901 | 41.6 | 1.1 | 2583 | 736 |
  | MacBook Pro M4 Max | GPU | 7,835 | 160.2 | 0.1 | 2583 | 1623 |
 
+ **Windows**
+
+ | Device &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| Backend | Prefill (tokens/sec) | Decode (tokens/sec) | <span style="white-space: nowrap;">Time-to-first</span>-token (sec) | Model size (MB) |
+ | :---- | :---- | :---- | :---- | :---- | :---- |
+ | Intel LunarLake | CPU | 435 | 29.8 | 2.39 | 2583 |
+ | Intel LunarLake | GPU | 3,751 | 48.4 | 0.29 | 2583 |
+
  **IoT**
 
  | Device &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| Backend | Prefill (tokens/sec) | Decode (tokens/sec) | <span style="white-space: nowrap;">Time-to-first</span>-token (sec) | Model size (MB) | CPU Memory (MB) |