Smaller quants?

#1
by dnhkng - opened

Can this work with smaller quants, like IQ3_S?

Yeah, would be great

Unsloth AI org

Will check and confirm if it can work!

I wouldn't mind a simple Q4_0 as well 🙏🏻

Thanks Daniel! I gather the diffusion architecture is particularly sensitive to integer quantisation. Perhaps an NVFP4 GGUF is finally in order?

It would be nice to get models that fit on a 16GB card too, like the original 26B-A4B you made. I would hope we could have an IQ4_XS that fits with an okay context, but looking at the size I imagine we can maybe only hold IQ3_XXS at 16GB, unless some more magic is done.

This comment has been hidden (marked as Off-Topic)
This comment has been hidden (marked as Resolved)

Oh crap, ignore my last comment, wrong repo.
