Request: Qwen3.6-35B-A3B quantization (2.5bpw)


Hi UnstableLlama,
I really appreciate your high-quality EXL3 conversions. If you have some spare compute, could you please consider converting the new "Qwen3.6-35B-A3B" model to the exl3 format at 2.5bpw?
This specific bitrate would be very helpful for those of us with limited VRAM. Thank you so much for your great work!

UnstableLlama

Owner May 14

For sure! Thanks for the tip, always happy to see which quants people actually want. I'll try to get that done tonight and uploaded tomorrow, and I'll let you know here when it's up. If you want, you can join the exllama Discord for more quant requests or general help.

https://discord.gg/AD2mVhZzf

UnstableLlama changed discussion status to closed May 14

UnstableLlama changed discussion status to open May 15

ghit72

May 15

Hi,
I'd like to ask whether there's a lower bound below which the model is considered too "dumb": for example, a delta PPL above ±0.2, a delta KLD above some value (??), a Same Top % below 93%, or whatever else?
The use case is all-day long-context coding, plus text and number understanding and the like.
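
For readers unfamiliar with the three metrics named in the question, here is a minimal sketch of how they could be computed from per-token logits of a reference model and a quantized model. It is an illustration only, not exllamav3's evaluation code: the function names, the KL direction (reference relative to quantized), and the toy logits are all assumptions made for this example.

```python
import math

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def compare_logits(ref_logits, quant_logits, targets):
    # ref_logits / quant_logits: one list of vocabulary logits per token position.
    # targets: the correct next-token id at each position.
    n = len(targets)
    ref_nll = quant_nll = kld = 0.0
    same_top = 0
    for ref_row, quant_row, t in zip(ref_logits, quant_logits, targets):
        ref_lp = log_softmax(ref_row)
        quant_lp = log_softmax(quant_row)
        ref_nll -= ref_lp[t]
        quant_nll -= quant_lp[t]
        # KL(ref || quant) at this position (direction is an assumption).
        kld += sum(math.exp(p) * (p - q) for p, q in zip(ref_lp, quant_lp))
        # Do both models pick the same most likely token?
        same_top += ref_row.index(max(ref_row)) == quant_row.index(max(quant_row))
    return {
        "delta_ppl": math.exp(quant_nll / n) - math.exp(ref_nll / n),
        "kld": kld / n,
        "same_top1": same_top / n,
    }

# Toy logits for two positions over a three-token vocabulary (made-up values).
ref = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
quant = [[1.8, 0.7, -1.1], [0.2, 1.2, 0.4]]
targets = [0, 1]
metrics = compare_logits(ref, quant, targets)
print(metrics)
```

A real comparison would average over many thousands of token positions from a held-out text, so the toy numbers here say nothing about any actual quant.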

UnstableLlama

Owner May 15

Hey ghit, I don’t have hard numbers, but Turboderp has said that 0.05 and below is considered “imperceptible”, and he thinks anything 0.2 and below is good enough to use.

Also, uploading a 2.49bpw right now, will be done within a couple hours.

UnstableLlama

Owner May 15


edited May 15

This is a 2.46bpw quant made with the -hq argument for extra MoE precision, which brings the total up to 2.49bpw (plus head).

https://huggingface.co/UnstableLlama/Qwen3.6-35B-A3B-exl3-2.49bpw
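
For anyone budgeting VRAM, a rough back-of-the-envelope sketch of what these bitrates mean for weight storage. It assumes roughly 35 billion parameters (taken from the model name only) and applies the average bitrate to all of them; the head mentioned above, the KV cache, and activations are not included, so actual usage will be higher. The function name is hypothetical.

```python
def estimate_weight_gb(n_params, bpw):
    # bits per weight -> bytes -> decimal gigabytes
    return n_params * bpw / 8 / 1e9

N_PARAMS = 35e9  # assumption: parameter count inferred from the model name

for bpw in (2.46, 2.49):
    print(f"{bpw} bpw: about {estimate_weight_gb(N_PARAMS, bpw):.2f} GB of weights")
```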
