[Request] GGUF version with MTP (Multi-Token Prediction) support

#26
by rggaini - opened

(Description):

Hi author, thanks for the great model!

Since llama.cpp now officially supports MTP (Multi-Token Prediction), could you please upload a GGUF version with MTP enabled?

Enabling MTP would significantly boost the inference speed (tokens/s) via speculative decoding. It would be very helpful for local deployment.

Thanks!

Hi there,

Thank you for sharing this great model with the community!

I'm writing to ask whether you could upload a GGUF version with MTP (Multi-Token Prediction) enabled. Since llama.cpp has officially added support for MTP, using this feature would allow speculative decoding, which can significantly improve inference speed (tokens/sec) for local deployment.

If the model architecture supports it, enabling MTP during the GGUF conversion would be a huge help for users running models on edge devices.

Thanks again for your hard work and consideration!

Best regards.
