LiteRT-LM
marissaw committed on
Commit 08ca8d1 · verified · 1 Parent(s): 9ff1fd7

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -44,7 +44,7 @@ It uses the Gemma quantization scheme that employs a mixture of 2bit, 4bit and 8
  *Note: On [supported Android devices](https://developers.google.com/ml-kit), Gemma 4 is available through Android AI Core as [Gemini Nano](https://developer.android.com/ai/gemini-nano#architecture), which is the recommended path for production applications.*
 
 
- | Device                                     | Backend | Quantization scheme | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Model size (MB) | CPU Memory(MB) |
+ | Device                                     | Backend | Quantization scheme | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Model size (MB) | CPU Memory (MB) |
  | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
  | **S26 Ultra** | CPU | TODO | TODO | TODO | TODO | TODO | TODO |
  | **S26 Ultra** | GPU | TODO | TODO | TODO | TODO | TODO | TODO |
@@ -52,35 +52,35 @@ It uses the Gemma quantization scheme that employs a mixture of 2bit, 4bit and 8
 
  **iOS**
 
- | Device                                     | Backend | Quantization scheme | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Model size (MB) | CPU Memory(MB) |
+ | Device                                     | Backend | Quantization scheme | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Model size (MB) | CPU Memory (MB) |
  | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
  | **iPhone 17 Pro** | CPU | TODO | TODO | TODO | TODO | TODO | TODO |
  | **iPhone 17 Pro** | GPU | TODO | TODO | TODO | TODO | TODO | TODO |
 
  **Linux**
 
- | Device                                     | Backend | Quantization scheme | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Model size (MB) | CPU Memory(MB) |
+ | Device                                     | Backend | Quantization scheme | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Model size (MB) | CPU Memory (MB) |
  | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
  | **Arm 2.3 & 2.8GHz** | CPU | TODO | TODO | TODO | TODO | TODO | TODO |
  | **NVIDIA GeForce RTX 4090** | GPU | TODO | TODO | TODO | TODO | TODO | TODO |
 
  **macOS**
 
- | Device                                     | Backend | Quantization scheme | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Model size (MB) | CPU Memory(MB) |
+ | Device                                     | Backend | Quantization scheme | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Model size (MB) | CPU Memory (MB) |
  | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
  | **MacBook Pro M4** | CPU | TODO | TODO | TODO | TODO | TODO | TODO |
  | **MacBook Pro M4** | GPU | TODO | TODO | TODO | TODO | TODO | TODO |
 
  **Windows**
 
- | Device                                     | Backend | Quantization scheme | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Model size (MB) | CPU Memory(MB) |
+ | Device                                     | Backend | Quantization scheme | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Model size (MB) | CPU Memory (MB) |
  | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
  | **Windows** | CPU | TODO | TODO | TODO | TODO | TODO | TODO |
  | **Windows** | GPU | TODO | TODO | TODO | TODO | TODO | TODO |
 
  **IoT**
 
- | Device                                     | Backend | Quantization scheme | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Model size (MB) | CPU Memory(MB) |
+ | Device                                     | Backend | Quantization scheme | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Model size (MB) | CPU Memory (MB) |
  | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
  | **Raspberry Pi 5 16GB** | CPU | TODO | TODO | TODO | TODO | TODO | TODO |
  | **Qualcomm IQ-8275 EVK** | NPU | TODO | TODO | TODO | TODO | TODO | TODO |