LiteRT-LM
marissaw committed on
Commit 3d723db (verified) · 1 Parent(s): e84e283

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -98,7 +98,7 @@ Running Gemma inference on the web is currently supported through [LLM Inference
 
 Benchmarked in Chrome on a MacBook Pro 2024 (Apple M4 Max) with 1024 prefill tokens and 256 decode tokens, but the model can support context lengths up to 128K.
 
-| Device | Backend | Prefill (tokens/sec) | Decode (tokens/sec) | Initialization time (sec) | Model size (MB) | CPU Memory (GB) | GPU Memory (MB) |
+| Device | Backend | Prefill (tokens/sec) | Decode (tokens/sec) | Initialization time (sec) | Model size (MB) | CPU Memory (GB) | GPU Memory (GB) |
 | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
 | Web | GPU | 4,676 | 73.9 | 1.1 | 2004 | 1.5 | 1.8 |
 
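
As a rough reading of the benchmark row above, the throughput figures can be turned into wall-clock estimates for the stated workload of 1024 prefill tokens and 256 decode tokens. The sketch below assumes throughput stays constant over the run and that initialization, prefill, and decode happen back to back. The helper name `estimate_latency` is illustrative only and is not part of LiteRT-LM or the LLM Inference API.

```python
def estimate_latency(prefill_tokens, decode_tokens, prefill_tps, decode_tps, init_sec=0.0):
    """Estimate seconds spent in each phase, assuming constant throughput."""
    prefill_sec = prefill_tokens / prefill_tps
    decode_sec = decode_tokens / decode_tps
    return {
        "init_sec": init_sec,
        "prefill_sec": prefill_sec,
        "decode_sec": decode_sec,
        "total_sec": init_sec + prefill_sec + decode_sec,
    }


# Figures taken from the Web / GPU row of the table in the diff.
est = estimate_latency(1024, 256, prefill_tps=4676, decode_tps=73.9, init_sec=1.1)
for name, seconds in est.items():
    print(f"{name}: {seconds:.2f}")
```

Under these assumptions, prefill takes roughly 0.22 s and decode roughly 3.46 s, so decode dominates end-to-end time for this workload.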