LiteRT-LM
marissaw committed · Commit 95d1083 · verified · 1 Parent(s): 3f65644

Update README.md

Files changed (1)
  1. README.md +12 -12
README.md CHANGED
@@ -46,44 +46,44 @@ It uses the Gemma quantization scheme that employs a mixture of 2bit, 4bit and 8
 
  | Device &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| Backend | Prefill (tokens/sec) | Decode (tokens/sec) | <span style="white-space: nowrap;">Time-to-first</span>-token (sec) | Model size (MB) | CPU Memory (MB) |
  | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
- | **S26 Ultra** | CPU | 557 | 46.9 | 1.8 | 2583 | 1733 |
- | **S26 Ultra** | GPU | 3,808 | 52.1 | 0.3 | 2583 | 676 |
+ | S26 Ultra | CPU | 557 | 46.9 | 1.8 | 2583 | 1733 |
+ | S26 Ultra | GPU | 3,808 | 52.1 | 0.3 | 2583 | 676 |
 
 
  **iOS**
 
  | Device &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| Backend | Prefill (tokens/sec) | Decode (tokens/sec) | <span style="white-space: nowrap;">Time-to-first</span>-token (sec) | Model size (MB) | CPU Memory (MB) |
  | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
- | **iPhone 17 Pro** | CPU | 532 | 25.0 | 1.9 | 2583 | 607 |
- | **iPhone 17 Pro** | GPU | 2,878 | 56.5 | 0.3 | 2583 | 1450 |
+ | iPhone 17 Pro | CPU | 532 | 25.0 | 1.9 | 2583 | 607 |
+ | iPhone 17 Pro | GPU | 2,878 | 56.5 | 0.3 | 2583 | 1450 |
 
  **Linux**
 
  | Device &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| Backend | Prefill (tokens/sec) | Decode (tokens/sec) | <span style="white-space: nowrap;">Time-to-first</span>-token (sec) | Model size (MB) | CPU Memory (MB) |
  | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
- | **Arm 2.3 & 2.8GHz** | CPU | 260 | 35.0 | 4.0 | 2583 | 1628 |
- | **NVIDIA GeForce RTX 4090** | GPU | 11,234 | 143.4 | 0.1 | 2583 | 913 |
+ | Arm 2.3 & 2.8GHz | CPU | 260 | 35.0 | 4.0 | 2583 | 1628 |
+ | NVIDIA GeForce RTX 4090 | GPU | 11,234 | 143.4 | 0.1 | 2583 | 913 |
 
  **macOS**
 
  | Device &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| Backend | Prefill (tokens/sec) | Decode (tokens/sec) | <span style="white-space: nowrap;">Time-to-first</span>-token (sec) | Model size (MB) | CPU Memory (MB) |
  | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
- | **MacBook Pro M4** | CPU | TODO | TODO | TODO | TODO | TODO |
- | **MacBook Pro M4** | GPU | TODO | TODO | TODO | TODO | TODO |
+ | MacBook Pro M4 | CPU | TODO | TODO | TODO | TODO | TODO |
+ | MacBook Pro M4 | GPU | TODO | TODO | TODO | TODO | TODO |
 
  **Windows**
 
  | Device &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| Backend | Prefill (tokens/sec) | Decode (tokens/sec) | <span style="white-space: nowrap;">Time-to-first</span>-token (sec) | Model size (MB) | CPU Memory (MB) |
  | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
- | **Windows** | CPU | TODO | TODO | TODO | TODO | TODO |
- | **Windows** | GPU | TODO | TODO | TODO | TODO | TODO |
+ | Windows | CPU | TODO | TODO | TODO | TODO | TODO |
+ | Windows | GPU | TODO | TODO | TODO | TODO | TODO |
 
  **IoT**
 
  | Device &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| Backend | Prefill (tokens/sec) | Decode (tokens/sec) | <span style="white-space: nowrap;">Time-to-first</span>-token (sec) | Model size (MB) | CPU Memory (MB) |
  | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
- | **Raspberry Pi 5 16GB** | CPU | 133 | 7.6 | 7.8 | 2583 | 1546 |
- | **Qualcomm IQ-8275 EVK** | NPU* | 2371 | 18.8 | 0.5 | 2688 | 1471 |
+ | Raspberry Pi 5 16GB | CPU | 133 | 7.6 | 7.8 | 2583 | 1546 |
+ | Qualcomm IQ-8275 EVK | NPU* | 2371 | 18.8 | 0.5 | 2688 | 1471 |
 
  \* NPU model is benchmarked with 4096 context length
 
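A rough way to relate two columns of the tables in this diff is to multiply prefill throughput by time-to-first-token. The sketch below is not part of the README or of any LiteRT-LM tooling. It assumes time-to-first-token is dominated by prefill, and the helper name `implied_prompt_tokens` is hypothetical. The hunk shown here does not state the benchmark prompt length, so the result is only an estimate, and the time-to-first-token values are rounded to 0.1 s, which makes it coarse.

```python
# Rows copied from the benchmark tables in the diff:
# (device, backend, prefill tokens/sec, time-to-first-token in seconds).
# Rows marked TODO in the README (macOS, Windows) are omitted.
ROWS = [
    ("S26 Ultra", "CPU", 557, 1.8),
    ("S26 Ultra", "GPU", 3808, 0.3),
    ("iPhone 17 Pro", "CPU", 532, 1.9),
    ("iPhone 17 Pro", "GPU", 2878, 0.3),
    ("Arm 2.3 & 2.8GHz", "CPU", 260, 4.0),
    ("NVIDIA GeForce RTX 4090", "GPU", 11234, 0.1),
    ("Raspberry Pi 5 16GB", "CPU", 133, 7.8),
    ("Qualcomm IQ-8275 EVK", "NPU", 2371, 0.5),
]


def implied_prompt_tokens(prefill_tps: float, ttft_s: float) -> float:
    """Estimate prompt length as rate * time.

    Assumption (not stated in the README): time-to-first-token is
    dominated by the prefill phase.
    """
    return prefill_tps * ttft_s


if __name__ == "__main__":
    for device, backend, prefill, ttft in ROWS:
        estimate = implied_prompt_tokens(prefill, ttft)
        print(f"{device:<26} {backend:<4} ~{estimate:.0f} prompt tokens")
```

Under that assumption the estimates fall between roughly 860 and 1,190 tokens, which would be consistent with a benchmark prompt of about 1,000 tokens. That is an inference from the published numbers, not something the hunk states.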