Does llama-server now support this?

#2
by ZombieWormHole - opened

"It is not currently possible to inference Gemma 4 26B A4B NVFP4 using official llama.cpp". Has this changed since the llama.cpp merge?

Thanks for sharing the model!

Yes, llama.cpp now supports this quant. I've just tested it on the server-cuda13 version. You no longer need to rely on my custom Docker image.
Updated the model description.

Thanks Sasha!

ZombieWormHole changed discussion status to closed
