MTP adding some mistakes?

#37
by Dumbledore2 - opened

I notice that when using MTP, it seems to make more mistakes with complex prompts.

@Dumbledore2 Good thing to flag, but MTP shouldn't be doing that. Speculative decoding is lossless by design: the draft model only proposes tokens; the main model then verifies them and accepts only the ones it would have produced itself, replacing any it wouldn't. So in a correct setup, MTP changes speed, never quality.
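To make the propose-then-verify idea concrete, here is a minimal toy sketch of greedy speculative decoding. Everything in it is an assumption for illustration: `target_next` and `draft_next` are made-up stand-ins for the main model and the draft, not any real engine's API, and a real implementation verifies the whole draft in one batched forward pass rather than token by token.

```python
from typing import Callable, List

# A "model" here is just a function from a token prefix to its greedy next token.
Model = Callable[[List[int]], int]

def target_next(prefix: List[int]) -> int:
    # Hypothetical main model: any deterministic function works for the demo.
    return (sum(prefix) * 31 + len(prefix) * 7) % 50

def draft_next(prefix: List[int]) -> int:
    # Hypothetical draft: agrees with the target except when the prefix
    # length is a multiple of 3, where it deliberately guesses wrong.
    guess = target_next(prefix)
    return guess if len(prefix) % 3 else (guess + 1) % 50

def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out

def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. The draft proposes up to k tokens ahead.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target checks each proposed token in order.
        for tok in proposal:
            expected = target(out)
            out.append(expected)  # always the target's own choice
            if tok != expected:
                break  # mismatch: discard the rest of the draft
    return out

prompt = [1, 2, 3]
plain = greedy_decode(target_next, prompt, 20)
spec = speculative_decode(target_next, draft_next, prompt, 20)
print(plain == spec)
```

Every token that ends up in the output is the one the target would have picked, so a wrong draft only costs speed. That is the property the greedy on/off comparison relies on.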

Two things most likely explain what you're seeing:

  1. Sampling variance. At temp 1.0 every run is random, and MTP on vs off consume the RNG differently, so you get different answers each time, not worse ones, exactly like re-rolling without MTP. Proper test: set temp 0 (greedy) with a fixed prompt and compare MTP on vs off. The outputs should be token-for-token identical. If they are, MTP isn't adding mistakes; you were comparing two different random draws. Complex prompts are longer, so that run-to-run variance just shows up more.

  2. A buggy build. If the greedy outputs do differ, that's a build bug, not MTP. Some recent builds have a regression in the Gemma 4 MTP draft path; the known-good one is b9553. Drop to that and the divergence goes away.
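For the greedy comparison itself, a small helper like the one below can report where two runs first disagree. This is only a sketch: `first_divergence` is a hypothetical name, and the token lists are made-up placeholders for whatever token IDs you capture from your own MTP-on and MTP-off runs.

```python
from itertools import zip_longest
from typing import Optional, Sequence

_MISSING = object()  # sentinel so a shorter output counts as a divergence

def first_divergence(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the index of the first differing token, or None if identical."""
    for i, (x, y) in enumerate(zip_longest(a, b, fillvalue=_MISSING)):
        if x != y:
            return i
    return None

# Made-up token IDs standing in for two temp-0 runs on the same prompt.
mtp_off = [101, 7, 42, 42, 9, 13]
mtp_on = [101, 7, 42, 42, 9, 13]

idx = first_divergence(mtp_off, mtp_on)
if idx is None:
    print("identical: MTP is not changing the output")
else:
    print(f"outputs diverge at token {idx}")
```

Comparing token IDs rather than decoded text avoids false alarms from whitespace or detokenization differences.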

So MTP isn't degrading the model. You're either catching normal sampling variance or a bad build.

Dumbledore2 changed discussion status to closed
Dumbledore2 changed discussion status to open
Dumbledore2 changed discussion status to closed
