Will this work on spicyneuron/Qwen3.5-397B-A17B-MLX-2.6bit ?

#32

by openSourcerer9000 - opened 15 days ago

I have long context issues with this model, especially in interactive chat, where things get long enough it will literally re-answer a previous message in the context history, ignoring the latest message.

Are there any differences to account for in the larger model templates?

Thanks

szwedek

14 days ago

Have you set preserve_thinking?

openSourcerer9000

14 days ago

We have AI now, why do we need to preserve thinking?

openSourcerer9000

14 days ago

I'm trying the template but it seems to break randomly.

Is that a param I can set from lm studio? Or a modification needed for the template?

froggeric

Owner 12 days ago

Yes, it works fine since it’s the same base architecture. The long context repetition is typically an issue with the context window size configuration or KV-caching in MLX or LM Studio rather than the template itself. Double-check that your context limits match the model, and make sure preserve_thinking is enabled to keep the reasoning chain intact.

openSourcerer9000

12 days ago

Like I mentioned I tried it in and it breaks randomly, I had to switch back to the original template., has anyone tried it on this specific model? For any 2 bit 397?

Preserve thinking doesn't seem to be an option for this model either.

froggeric

Owner 12 days ago

Honestly, you can't expect much from a brain deranged 2bit quantisation.

openSourcerer9000

12 days ago

Lol, these benchmarks say otherwise
https://kaitchup.substack.com/p/lessons-from-gguf-evaluations-ternary

That may be another Factor though.

I was hoping it was a template issue and this would address it, I don't see my problem in the readme though. It may just be an issue with thia specific quant, I'll probably need to just download another model.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment