gemma-challenge/gemma-lastchance / artifacts / mtp-g128-chanhead

11.7 GB, 363 files, updated 1 day ago


Name	Size	Uploaded	Xet hash
README.md	1.12 kB	2 days ago	114b8de1
benchmark.jsonl	1.04 kB	2 days ago	e23af600
job_logs.txt	88.3 kB	2 days ago	475dc697
manifest.json	879 B	2 days ago	bbfc95da
ppl_summary.json	361 B	2 days ago	cdfd422d
serve.py	2.4 kB	2 days ago	4aad408b
summary.json	1.01 kB	2 days ago	0e5c4f4a

README.md

mtp-g128-chanhead

High-upside run for lastchance: stack vLLM nightly MTP speculative decoding on the current PPL-safe 127-class int4 checkpoint.

Target weights: hf://buckets/gemma-challenge/gemma-ml-intern/weights/int4-g128-chanhead
Runtime: vLLM nightly 0.22.1rc1.dev307+g3e8afdf78
Drafter: google/gemma-4-E4B-it-assistant
MTP config: num_speculative_tokens=3
PPL contract: unchanged target model, all modalities loaded, vLLM completions path intact
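With num_speculative_tokens=3, each target-model verification step can emit at most 4 tokens: up to 3 accepted draft tokens plus one token produced by the target itself. The sketch below uses the common simplified model in which every draft token is accepted independently with the same probability alpha. That i.i.d. assumption, and the assumption that the logged mean acceptance length counts the target's extra token (so it ranges from 1 to 4), are ours and are not taken from this run. Under that model, the mean acceptance lengths of 2.5-3.0 noted later in this README correspond to a per-token acceptance rate of roughly 0.69-0.81. The helper names are illustrative only.

```python
def expected_acceptance_length(alpha: float, k: int) -> float:
    """Expected tokens emitted per target verification step (illustrative model).

    Assumes each of the k drafted tokens is accepted independently with
    probability alpha, and counts the one token the target always contributes.
    Equals 1 + alpha + alpha**2 + ... + alpha**k.
    """
    return sum(alpha ** i for i in range(k + 1))


def implied_acceptance_rate(mean_len: float, k: int, tol: float = 1e-9) -> float:
    """Invert expected_acceptance_length by bisection (it increases with alpha)."""
    if not 1.0 <= mean_len <= k + 1:
        raise ValueError("mean acceptance length must lie in [1, k + 1]")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_acceptance_length(mid, k) < mean_len:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


K = 3  # num_speculative_tokens from the MTP config above
print(f"upper bound: {expected_acceptance_length(1.0, K):.1f} tokens/step")
for observed in (2.5, 3.0):
    alpha = implied_acceptance_rate(observed, K)
    print(f"mean length {observed} -> per-token acceptance ~ {alpha:.3f}")
```

Real acceptance is position-dependent (later draft tokens are accepted less often), so treat the implied rate as a rough summary rather than a measured quantity.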

Result

Measured on the challenge a10g-small harness:

TPS: 247.2457781729621
PPL: 2.026637462855503
Completed: 128 / 128
Duration: 265.0641822250002 seconds
Mean latency: 2070.4879705312537 ms
Job: 6a284950c4f53f9fc5aa2df7
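The result figures can be cross-checked with a little arithmetic. This sketch assumes TPS means total generated tokens divided by wall-clock duration, which the harness output here does not state. Under that reading, TPS × duration comes out at almost exactly 65,536 = 128 × 512 tokens, consistent with (but not proof of) each request generating 512 tokens. Separately, 128 × mean latency lands within about 0.04 s of the total duration, which is what strictly sequential, single-stream requests would produce.

```python
# Figures copied from the result block above.
TPS = 247.2457781729621
DURATION_S = 265.0641822250002
MEAN_LATENCY_MS = 2070.4879705312537
COMPLETED = 128

# Assumption: TPS = total generated tokens / wall-clock duration.
total_tokens = TPS * DURATION_S
tokens_per_request = total_tokens / COMPLETED

# If requests ran one at a time, their latencies should add up to the duration;
# the small remainder is time spent outside individual requests.
serial_time_s = MEAN_LATENCY_MS * COMPLETED / 1000
overhead_s = DURATION_S - serial_time_s

print(f"total tokens       ~ {total_tokens:.1f}")
print(f"tokens per request ~ {tokens_per_request:.1f}")
print(f"sum of latencies   = {serial_time_s:.3f} s (gap {overhead_s:.3f} s)")
```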

This run tests the open frontier noted on the board: the vLLM nightly build fixes the mixed-head attention-group crash in vLLM 0.22.0, and Gemma MTP accepts enough draft tokens at single-stream concurrency to break the int4-Marlin layout ceiling without changing target-model PPL. The logs show a mean MTP acceptance length of around 2.5-3.0 during the benchmark, and the standard prompt-logprobs stage reports PPL = 2.0266.
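Perplexity from a prompt-logprobs pass is conventionally the exponential of the negative mean log-probability of each prompt token given its prefix. Since the prompt tokens are scored by the target model, MTP drafting should not change this number, which matches the PPL contract stated above. A minimal sketch, assuming you have already extracted the natural-log probability of each actual prompt token from vLLM's prompt_logprobs (whose first position is None because the first token has no prefix). It pools all tokens across prompts, which may differ from how the challenge harness aggregates; corpus_perplexity is a hypothetical helper name.

```python
import math
from typing import Iterable, Optional


def corpus_perplexity(docs: Iterable[Iterable[Optional[float]]]) -> float:
    """Pooled perplexity: exp(-(sum of token logprobs) / (number of scored tokens)).

    Each inner iterable holds natural-log probabilities of one prompt's tokens,
    in order. None marks a position with no conditional logprob (the first
    token of a prompt) and is skipped.
    """
    total, count = 0.0, 0
    for doc in docs:
        for lp in doc:
            if lp is None:
                continue
            total += lp
            count += 1
    if count == 0:
        raise ValueError("no scored tokens")
    return math.exp(-total / count)


# Toy example: every scored token has probability 1/2, so perplexity is 2
# (up to float rounding).
example = [[None, math.log(0.5), math.log(0.5)], [None, math.log(0.5)]]
print(corpus_perplexity(example))
```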


Last updated: Jun 10

