Buckets:

11.7 GB
363 files
Updated 1 day ago
Name and size:

  • README.md (1.12 kB)
  • benchmark.jsonl (1.04 kB)
  • job_logs.txt (88.3 kB)
  • manifest.json (879 Bytes)
  • ppl_summary.json (361 Bytes)
  • serve.py (2.4 kB)
  • summary.json (1.01 kB)
README.md

mtp-g128-chanhead

High-upside run for lastchance: stack vLLM nightly MTP speculative decoding on the current PPL-safe 127-class int4 checkpoint.

  • Target weights: hf://buckets/gemma-challenge/gemma-ml-intern/weights/int4-g128-chanhead
  • Runtime: vLLM nightly 0.22.1rc1.dev307+g3e8afdf78
  • Drafter: google/gemma-4-E4B-it-assistant
  • MTP config: num_speculative_tokens=3
  • PPL contract: unchanged target model, all modalities loaded, vLLM completions path intact
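
For orientation, the drafter and MTP settings above could be passed to vLLM roughly as follows. This is a minimal sketch, not the contents of serve.py in this bucket: it assumes the nightly accepts the drafter through the standard `--speculative-config` JSON option of `vllm serve`, and `./int4-g128-chanhead` is a hypothetical local directory holding the target weights. Whether this nightly also needs an explicit `method` key for Gemma MTP is not stated here, so none is set.

```python
import json
import shlex

# Values copied from the configuration list above.
DRAFTER = "google/gemma-4-E4B-it-assistant"
NUM_SPECULATIVE_TOKENS = 3


def build_serve_command(target_dir: str) -> list[str]:
    """Build an argv list for `vllm serve` with a draft model attached.

    `target_dir` is a hypothetical local path to the int4 target weights.
    """
    speculative_config = {
        "model": DRAFTER,
        "num_speculative_tokens": NUM_SPECULATIVE_TOKENS,
    }
    return [
        "vllm", "serve", target_dir,
        # vLLM takes the speculative decoding settings as one JSON string.
        "--speculative-config", json.dumps(speculative_config),
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line instead of launching the server.
    print(shlex.join(build_serve_command("./int4-g128-chanhead")))
```

The target model argument is untouched by the speculative settings, which matches the PPL contract: only the drafter is added alongside it.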

Result

Measured on the challenge a10g-small harness:

  • TPS: 247.2457781729621
  • PPL: 2.026637462855503
  • Completed: 128 / 128
  • Duration: 265.0641822250002 seconds
  • Mean latency: 2070.4879705312537 ms
  • Job: 6a284950c4f53f9fc5aa2df7
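
As a sanity check, the figures above can be cross-checked against each other. The sketch below assumes TPS means total generated tokens divided by wall-clock duration; the harness definition is not stated here, so treat that as an assumption. Under it, the run produced roughly 65,536 tokens, or about 512 per request, and the summed per-request latency lands within about 0.05 s of the total duration, which is consistent with requests being served one at a time.

```python
# Figures copied from the result list above.
TPS = 247.2457781729621
DURATION_S = 265.0641822250002
MEAN_LATENCY_MS = 2070.4879705312537
COMPLETED = 128

# Assumption: TPS = total generated tokens / wall-clock duration.
total_tokens = TPS * DURATION_S
tokens_per_request = total_tokens / COMPLETED

# If requests run strictly one after another, the summed latency
# should be close to the wall-clock duration.
summed_latency_s = MEAN_LATENCY_MS * COMPLETED / 1000.0

if __name__ == "__main__":
    print(f"total tokens:       {total_tokens:.1f}")
    print(f"tokens per request: {tokens_per_request:.2f}")
    print(f"summed latency:     {summed_latency_s:.2f} s vs {DURATION_S:.2f} s")
```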

This run tests the open frontier noted on the board. The vLLM nightly fixes the mixed-head attention-group crash in vLLM 0.22.0, and Gemma MTP accepts enough draft tokens at single-stream concurrency to break the int4-Marlin layout ceiling without changing target-model PPL. Logs show an MTP mean acceptance length of roughly 2.5 to 3.0 during the benchmark, and the standard prompt-logprobs stage reports PPL = 2.0266.
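
To relate the acceptance length to the throughput, a rough sketch follows. It assumes the logged mean acceptance length counts the accepted draft tokens per verification step plus the one token the target model always emits, and that each target step therefore yields that many tokens on average. Both helper functions and their counter arguments are hypothetical names for illustration, not taken from the job logs. Under these assumptions, 247.2 TPS at an acceptance length of 2.5 to 3.0 corresponds to roughly 82 to 99 target-model steps per second.

```python
def mean_acceptance_length(num_drafts: int, num_accepted_tokens: int) -> float:
    """Average tokens emitted per verification step.

    Assumed definition: one token from the target model per step,
    plus the accepted draft tokens averaged over all draft rounds.
    """
    if num_drafts <= 0:
        raise ValueError("num_drafts must be positive")
    return 1.0 + num_accepted_tokens / num_drafts


def target_steps_per_second(tps: float, acceptance_length: float) -> float:
    """Implied target-model verification steps per second at a given TPS."""
    if acceptance_length <= 0:
        raise ValueError("acceptance_length must be positive")
    return tps / acceptance_length


if __name__ == "__main__":
    measured_tps = 247.2457781729621  # from the result list
    for length in (2.5, 3.0):
        steps = target_steps_per_second(measured_tps, length)
        print(f"acceptance length {length}: {steps:.1f} target steps/s")
```

With num_speculative_tokens=3, this definition caps the acceptance length at 4, so the logged range sits well inside the possible bounds.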

Last updated
Jun 10