Buckets:
11.7 GB
363 files
Updated 1 day ago
| Name | Size | Xet hash |
|---|---|---|
| README.md | 1.12 kB | 114b8de1 |
| benchmark.jsonl | 1.04 kB | e23af600 |
| job_logs.txt | 88.3 kB | 475dc697 |
| manifest.json | 879 Bytes | bbfc95da |
| ppl_summary.json | 361 Bytes | cdfd422d |
| serve.py | 2.4 kB | 4aad408b |
| summary.json | 1.01 kB | 0e5c4f4a |
mtp-g128-chanhead
High-upside run for lastchance: stack vLLM nightly MTP speculative decoding on
the current PPL-safe 127-class int4 checkpoint.
- Target weights: `hf://buckets/gemma-challenge/gemma-ml-intern/weights/int4-g128-chanhead`
- Runtime: vLLM nightly `0.22.1rc1.dev307+g3e8afdf78`
- Drafter: `google/gemma-4-E4B-it-assistant`
- MTP config: `num_speculative_tokens=3`
- PPL contract: unchanged target model, all modalities loaded, vLLM completions path intact
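The configuration above can be sketched as a launch command. This is only an illustration, not the contents of `serve.py`: it assumes the run uses vLLM's `--speculative-config` JSON option with the `model` and `num_speculative_tokens` keys, and the helper name `build_serve_command` is hypothetical. Whether vLLM can load the `hf://buckets/...` URI directly, or whether the nightly needs extra keys (such as a speculative `method`) for Gemma MTP, is not stated in the source.

```python
import json
import shlex

# Values copied from the run description above.
TARGET = "hf://buckets/gemma-challenge/gemma-ml-intern/weights/int4-g128-chanhead"
DRAFTER = "google/gemma-4-E4B-it-assistant"
NUM_SPECULATIVE_TOKENS = 3


def build_serve_command(target: str, drafter: str, k: int) -> list[str]:
    """Build a `vllm serve` argv list with a speculative-decoding config.

    Hypothetical helper: the real serve.py may pass these values differently,
    and may first download the weights to a local path.
    """
    spec_config = {"model": drafter, "num_speculative_tokens": k}
    return [
        "vllm",
        "serve",
        target,
        "--speculative-config",
        json.dumps(spec_config),
    ]


cmd = build_serve_command(TARGET, DRAFTER, NUM_SPECULATIVE_TOKENS)
# Print a shell-safe version of the command without executing it.
print(shlex.join(cmd))
```

Building the argument list separately from launching it keeps the speculative settings easy to diff between runs.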
Result
Measured on the challenge a10g-small harness:
- TPS: 247.2457781729621
- PPL: 2.026637462855503
- Completed: 128 / 128
- Duration: 265.0641822250002 seconds
- Mean latency: 2070.4879705312537 ms
- Job: 6a284950c4f53f9fc5aa2df7
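The reported figures can be cross-checked against each other. The sketch below assumes TPS is defined as generated tokens divided by wall-clock duration, which the source does not state explicitly. Under that assumption the run produced about 65,536 tokens, or 512 per request, and the summed per-request latency nearly equals the total duration, which is consistent with single-stream execution.

```python
# Figures copied from the Result list above.
TPS = 247.2457781729621
DURATION_S = 265.0641822250002
COMPLETED = 128
MEAN_LATENCY_MS = 2070.4879705312537

# Assumption: TPS = generated tokens / wall-clock seconds.
total_tokens = TPS * DURATION_S
tokens_per_request = total_tokens / COMPLETED

# If requests run one at a time, the latencies should sum to roughly the duration.
summed_latency_s = MEAN_LATENCY_MS * COMPLETED / 1000.0

# Average time spent per generated token within a request.
ms_per_token = MEAN_LATENCY_MS / tokens_per_request

print(f"total tokens       : {total_tokens:.1f}")
print(f"tokens per request : {tokens_per_request:.2f}")
print(f"summed latency (s) : {summed_latency_s:.2f} vs duration {DURATION_S:.2f}")
print(f"ms per token       : {ms_per_token:.2f}")
```

The small gap between summed latency and duration would correspond to harness overhead between requests, though the logs would be needed to confirm that.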
This tests the open frontier noted on the board: vLLM nightly fixes the vLLM
0.22.0 mixed-head attention-group crash, and Gemma MTP accepts enough tokens at
single-stream concurrency to break the int4-Marlin layout ceiling without
changing target-model PPL. Logs show MTP mean acceptance length around 2.5-3.0
during the benchmark, with PPL=2.0266 from the standard prompt-logprobs stage.
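To put the acceptance length in context, the sketch below uses the standard simplified model of speculative decoding in which each draft token is accepted independently with probability `alpha` and drafting stops at the first rejection. With `k` draft tokens, each target step then yields `1 + alpha + ... + alpha**k` tokens on average, at most `k + 1`. The independence assumption and both function names are illustrative; the real per-position acceptance rates in the logs may differ.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target step with k draft tokens.

    Assumes each draft token is accepted independently with probability
    alpha; the target always contributes one token (correction or bonus).
    """
    return sum(alpha ** i for i in range(k + 1))


def acceptance_rate_for_length(length: float, k: int, tol: float = 1e-9) -> float:
    """Invert expected_tokens_per_step by bisection on alpha in [0, 1]."""
    if not 1.0 <= length <= k + 1:
        raise ValueError("length must lie between 1 and k + 1")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_tokens_per_step(mid, k) < length:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


K = 3  # num_speculative_tokens from the run configuration
for observed in (2.5, 3.0):
    alpha = acceptance_rate_for_length(observed, K)
    print(f"mean acceptance length {observed}: implied per-token rate {alpha:.3f}")
```

With `k = 3` the ceiling is 4 tokens per step, so an observed length of 2.5 to 3.0 means the drafter is recovering a large share of the available speedup under this model.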
- Total size: 11.7 GB
- Files: 363
- Last updated: Jun 10
- Pre-warmed CDN: US, EU