LiteRT-LM
Commit 242c4cb (verified), committed by marissaw
1 Parent(s): 08ca8d1

Update README.md

Files changed (1)
  1. README.md +24 -24
README.md CHANGED
@@ -44,46 +44,46 @@ It uses the Gemma quantization scheme that employs a mixture of 2bit, 4bit and 8
  *Note: On [supported Android devices](https://developers.google.com/ml-kit), Gemma 4 is available through Android AI Core as [Gemini Nano](https://developer.android.com/ai/gemini-nano#architecture), which is the recommended path for production applications.*


- | Device                                     | Backend | Quantization scheme | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Model size (MB) | CPU Memory (MB) |
- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
- | **S26 Ultra** | CPU | TODO | TODO | TODO | TODO | TODO | TODO |
- | **S26 Ultra** | GPU | TODO | TODO | TODO | TODO | TODO | TODO |
+ | Device &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| Backend | Prefill (tokens/sec) | Decode (tokens/sec) | <span style="white-space: nowrap;">Time-to-first-token</span> (sec) | Model size (MB) | CPU Memory (MB) |
+ | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
+ | **S26 Ultra** | CPU | TODO | TODO | TODO | TODO | TODO |
+ | **S26 Ultra** | GPU | TODO | TODO | TODO | TODO | TODO |


  **iOS**

- | Device                                     | Backend | Quantization scheme | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Model size (MB) | CPU Memory (MB) |
- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
- | **iPhone 17 Pro** | CPU | TODO | TODO | TODO | TODO | TODO | TODO |
- | **iPhone 17 Pro** | GPU | TODO | TODO | TODO | TODO | TODO | TODO |
+ | Device &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| Backend | Prefill (tokens/sec) | Decode (tokens/sec) | <span style="white-space: nowrap;">Time-to-first-token</span> (sec) | Model size (MB) | CPU Memory (MB) |
+ | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
+ | **iPhone 17 Pro** | CPU | TODO | TODO | TODO | TODO | TODO |
+ | **iPhone 17 Pro** | GPU | TODO | TODO | TODO | TODO | TODO |

  **Linux**

- | Device                                     | Backend | Quantization scheme | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Model size (MB) | CPU Memory (MB) |
- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
- | **Arm 2.3 & 2.8GHz** | CPU | TODO | TODO | TODO | TODO | TODO | TODO |
- | **NVIDIA GeForce RTX 4090** | GPU | TODO | TODO | TODO | TODO | TODO | TODO |
+ | Device &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| Backend | Prefill (tokens/sec) | Decode (tokens/sec) | <span style="white-space: nowrap;">Time-to-first-token</span> (sec) | Model size (MB) | CPU Memory (MB) |
+ | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
+ | **Arm 2.3 & 2.8GHz** | CPU | TODO | TODO | TODO | TODO | TODO |
+ | **NVIDIA GeForce RTX 4090** | GPU | TODO | TODO | TODO | TODO | TODO |

  **macOS**

- | Device                                     | Backend | Quantization scheme | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Model size (MB) | CPU Memory (MB) |
- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
- | **MacBook Pro M4** | CPU | TODO | TODO | TODO | TODO | TODO | TODO |
- | **MacBook Pro M4** | GPU | TODO | TODO | TODO | TODO | TODO | TODO |
+ | Device &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| Backend | Prefill (tokens/sec) | Decode (tokens/sec) | <span style="white-space: nowrap;">Time-to-first-token</span> (sec) | Model size (MB) | CPU Memory (MB) |
+ | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
+ | **MacBook Pro M4** | CPU | TODO | TODO | TODO | TODO | TODO |
+ | **MacBook Pro M4** | GPU | TODO | TODO | TODO | TODO | TODO |

  **Windows**

- | Device                                     | Backend | Quantization scheme | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Model size (MB) | CPU Memory (MB) |
- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
- | **Windows** | CPU | TODO | TODO | TODO | TODO | TODO | TODO |
- | **Windows** | GPU | TODO | TODO | TODO | TODO | TODO | TODO |
+ | Device &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| Backend | Prefill (tokens/sec) | Decode (tokens/sec) | <span style="white-space: nowrap;">Time-to-first-token</span> (sec) | Model size (MB) | CPU Memory (MB) |
+ | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
+ | **Windows** | CPU | TODO | TODO | TODO | TODO | TODO |
+ | **Windows** | GPU | TODO | TODO | TODO | TODO | TODO |

  **IoT**

- | Device                                     | Backend | Quantization scheme | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Model size (MB) | CPU Memory (MB) |
- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
- | **Raspberry Pi 5 16GB** | CPU | TODO | TODO | TODO | TODO | TODO | TODO |
- | **Qualcomm IQ-8275 EVK** | NPU | TODO | TODO | TODO | TODO | TODO | TODO |
+ | Device &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| Backend | Prefill (tokens/sec) | Decode (tokens/sec) | <span style="white-space: nowrap;">Time-to-first-token</span> (sec) | Model size (MB) | CPU Memory (MB) |
+ | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
+ | **Raspberry Pi 5 16GB** | CPU | TODO | TODO | TODO | TODO | TODO |
+ | **Qualcomm IQ-8275 EVK** | NPU | TODO | TODO | TODO | TODO | TODO |


  ## Gemma 4 E2B Performance on Web