Will this work on spicyneuron/Qwen3.5-397B-A17B-MLX-2.6bit ?

#32
by openSourcerer9000 - opened

I have long context issues with this model, especially in interactive chat, where things get long enough it will literally re-answer a previous message in the context history, ignoring the latest message.

Are there any differences to account for in the larger model templates?

Thanks

Have you set preserve_thinking?

We have AI now, why do we need to preserve thinking?

I'm trying the template but it seems to break randomly.

Is that a param I can set from lm studio? Or a modification needed for the template?

Yes, it works fine since it’s the same base architecture. The long context repetition is typically an issue with the context window size configuration or KV-caching in MLX or LM Studio rather than the template itself. Double-check that your context limits match the model, and make sure preserve_thinking is enabled to keep the reasoning chain intact.

Like I mentioned I tried it in and it breaks randomly, I had to switch back to the original template., has anyone tried it on this specific model? For any 2 bit 397?

Preserve thinking doesn't seem to be an option for this model either.

Honestly, you can't expect much from a brain deranged 2bit quantisation.

Lol, these benchmarks say otherwise
https://kaitchup.substack.com/p/lessons-from-gguf-evaluations-ternary

That may be another Factor though.

I was hoping it was a template issue and this would address it, I don't see my problem in the readme though. It may just be an issue with thia specific quant, I'll probably need to just download another model.

Sign up or log in to comment