[Request] GGUF version with MTP (Multi-Token Prediction) support

#10
by rggaini - opened

Description:

Hi author, thanks for the great model!

Since llama.cpp now officially supports MTP (Multi-Token Prediction), could you please upload a GGUF version with MTP enabled?

Enabling MTP would significantly boost the inference speed (tokens/s) via speculative decoding. It would be very helpful for local deployment.
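For anyone wondering why this helps, here is a rough toy sketch of greedy draft-and-verify speculative decoding. It is not llama.cpp code: `speculative_decode`, `target_next`, and `draft_next` are hypothetical names made up for illustration, and the two "models" are simple functions over integers. As I understand it, with MTP the draft tokens would come from the model's own extra prediction heads rather than a separate draft model, and a real runtime would score all drafted positions in one batched forward pass, which the sketch only simulates by counting one pass per verification round.

```python
from typing import Callable, List, Tuple

Token = int
NextFn = Callable[[List[Token]], Token]


def speculative_decode(target_next: NextFn, draft_next: NextFn,
                       prompt: List[Token], n_new: int,
                       k: int = 4) -> Tuple[List[Token], int]:
    """Toy greedy draft-and-verify loop (hypothetical helper, not a real API).

    Returns the generated sequence and the number of verification rounds,
    each of which stands in for one batched pass of the large model.
    """
    out = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(out) < goal:
        # 1. Cheaply propose k tokens ahead.
        draft: List[Token] = []
        for _ in range(k):
            draft.append(draft_next(out + draft))

        # 2. Verify the proposals against the target model.
        passes += 1
        accepted: List[Token] = []
        for i in range(k):
            expected = target_next(out + draft[:i])
            if expected != draft[i]:
                # First mismatch: keep the target's token and drop the rest.
                accepted.append(expected)
                break
            accepted.append(draft[i])
        else:
            # All k drafts accepted: the same pass yields one extra token.
            accepted.append(target_next(out + draft))
        out.extend(accepted)
    return out[:goal], passes


# Toy stand-ins: the target counts upward modulo 10; the draft agrees
# with it except right after a 6, where it guesses wrong.
def target_next(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft_next(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 6 else (ctx[-1] + 1) % 10


tokens, passes = speculative_decode(target_next, draft_next, [0], n_new=12)
print(tokens, passes)
```

The output is identical to what the target model would produce on its own, because every drafted token is checked before it is kept. The gain comes only from needing fewer passes of the large model than tokens generated, so the actual speedup depends on how often the drafts are accepted.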

Thanks!
