Request: Qwen3.6-35B-A3B quantization (2.5bpw)

#1
by tranvantruong801 - opened

Hi UnstableLlama,
I really appreciate your high-quality EXL3 conversions. If you have some spare compute, could you please consider converting the new "Qwen3.6-35B-A3B" model to the exl3 format at 2.5bpw?
This specific bitrate would be very helpful for those of us with limited VRAM. Thank you so much for your great work!

For sure! Thanks for the tip, always happy to see which quants people actually want. I'll try to get that done tonight and uploaded tomorrow. When I do, I will let you know here. If you want, you can join the exllama discord for more quant requests or general help.

https://discord.gg/AD2mVhZzf

UnstableLlama changed discussion status to closed
UnstableLlama changed discussion status to open

Hi,
I'd like to ask whether there's a lower bound past which the model is considered too "dumb": for example delta PPL beyond +-0.2, delta KLD ??, Same Top % below 93%, or whatever?
My use case is all-day, long-context coding, plus text and number understanding and the like.

Hey ghit, I don’t have hard numbers, but Turboderp has said that 0.05 and below is considered “imperceptible”, and he thinks anything 0.2 and below is good enough to use.
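
For anyone unsure what the metrics named in the question measure, here is a minimal sketch of KL divergence and top-token agreement between a reference model and a quant. It is not the exllamav3 evaluation script: the function names are made up for illustration, and it assumes you already have per-position logits from both models as plain lists of floats.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one vocabulary row.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def kl_divergence(ref_logits, quant_logits):
    # KL(ref || quant) for a single token position, in nats.
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def argmax(row):
    return max(range(len(row)), key=row.__getitem__)

def compare(ref_rows, quant_rows):
    # Returns (mean KLD, fraction of positions with the same top-1 token).
    n = len(ref_rows)
    mean_kld = sum(kl_divergence(r, q) for r, q in zip(ref_rows, quant_rows)) / n
    same_top = sum(argmax(r) == argmax(q) for r, q in zip(ref_rows, quant_rows)) / n
    return mean_kld, same_top

# Toy data: two positions, three-token vocabulary (hypothetical values).
ref = [[2.0, 1.0, 0.0], [0.0, 1.0, 2.0]]
quant = [[2.0, 1.0, 0.0], [0.0, 2.0, 1.0]]
mean_kld, same_top = compare(ref, quant)
print(f"mean KLD: {mean_kld:.4f}, same top-1: {same_top:.0%}")
```

Delta PPL would be the difference in perplexity between the two models on the same text, which needs the true next tokens in addition to the logits, so it is left out of this sketch.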

Also, uploading a 2.49bpw right now, will be done within a couple hours.

This is a 2.46bpw quant made with the -hq argument for extra MoE precision, which brings the total up to 2.49bpw (plus head).
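
For a rough idea of what that bitrate means for VRAM, a back-of-the-envelope sketch: weight size is about parameter count times bits per weight, divided by 8. The 35e9 parameter count is an assumption taken from the "35B" in the model name, and the helper name is made up. The result ignores the output head, any tensors kept at higher precision, the KV cache, and runtime overhead, so treat it as a lower bound rather than a measured figure.

```python
def estimate_weight_size_gb(n_params, bits_per_weight):
    # bits -> bytes -> gigabytes (1 GB = 1e9 bytes); quantized weights only.
    return n_params * bits_per_weight / 8 / 1e9

# Assumed nominal parameter count (from the model name, not measured).
N_PARAMS = 35e9

for bpw in (2.46, 2.49):
    size_gb = estimate_weight_size_gb(N_PARAMS, bpw)
    print(f"{bpw} bpw -> about {size_gb:.1f} GB of weights")
```

Under those assumptions, 2.49bpw works out to roughly 10.9 GB of weights before the head and cache are counted.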

https://huggingface.co/UnstableLlama/Qwen3.6-35B-A3B-exl3-2.49bpw
