Imatrix GGUF

#1
by Austriani - opened

Hello, I was wondering whether you are considering adding imatrix quantizations. I would appreciate imatrix quantizations for <10B models (only if your GPU allows it, of course). There is a public imatrix dataset if you need an example: https://github.com/ggerganov/llama.cpp/files/15440637/groups_merged-enhancedV3.txt

Owner

No, what I can do is Safetensors, GGUF, GPTQ (4-bit and 8-bit precision), and maybe AWQ (4-bit precision).

Okay, thanks.

Austriani changed discussion status to closed
Owner

Yeah, sorry, I have a full plate already, as they say. I can't add more workload on top of everything.

Owner

Okay, I have a bit of free time and I am working on it. Are you still interested? If so, let me know which models you want imatrix GGUFs of.

Sorry for answering so late. I'm interested, of course, but I actually had the idea of using IQK quantizations. These are special quants for ik_llama.cpp (a llama.cpp fork). There is only one popular IQK creator, ubergaem; if you would make them as well, I would appreciate it a lot.

I would actually make my own imatrix/IQK quants if I had a GPU, but renting a GPU in my country costs 4x what it does in a country like the USA, because of the difference in salaries.

As for which model I would like you to quantize: your Qwen3.5-27B-Heretic-v2, or, if you prefer, a different Qwen3.5-27B-Heretic model.

You can search the internet or ask any AI model how to make IQK quantizations, because it's a fairly long process. Anyway, if you don't want to dig into it, I wouldn't be against a basic imatrix quantization; it's good as well. If you are not doing IQK quants, then I would prefer the IQ4_XS quantization.
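
For reference, the basic imatrix route mentioned above can be sketched as three commands: convert the model to an f16 GGUF, measure an importance matrix on a calibration text, then quantize with that matrix. The sketch below only builds and prints the command lines; it does not run them. It assumes a stock upstream llama.cpp checkout with `convert_hf_to_gguf.py`, `llama-imatrix` and `llama-quantize` available. The helper name `imatrix_commands`, the `out` directory, the output file names, and the local model folder are hypothetical. IQK types such as IQ4_K would need the ik_llama.cpp build of these tools, which this sketch does not cover.

```python
import shlex
from pathlib import Path


def imatrix_commands(model_dir, calibration_txt, quant_type="IQ4_XS", work_dir="out"):
    """Build (but do not run) the convert -> imatrix -> quantize command lines.

    All paths and file names are illustrative assumptions, not fixed conventions.
    """
    work = Path(work_dir)
    f16 = work / "model-f16.gguf"          # unquantized GGUF used as the source
    imatrix = work / "imatrix.dat"         # importance matrix from calibration
    quantized = work / f"model-{quant_type}.gguf"

    # 1. Convert the Hugging Face checkpoint to an f16 GGUF file.
    convert = ["python", "convert_hf_to_gguf.py", str(model_dir),
               "--outfile", str(f16), "--outtype", "f16"]
    # 2. Run the calibration text through the model to collect the imatrix.
    measure = ["llama-imatrix", "-m", str(f16),
               "-f", str(calibration_txt), "-o", str(imatrix)]
    # 3. Quantize the f16 GGUF, guided by the imatrix.
    quantize = ["llama-quantize", "--imatrix", str(imatrix),
                str(f16), str(quantized), quant_type]
    return [convert, measure, quantize]


if __name__ == "__main__":
    # Hypothetical local folder holding the downloaded model weights.
    for cmd in imatrix_commands("Qwen3.5-27B-heretic-v2",
                                "groups_merged-enhancedV3.txt"):
        print(shlex.join(cmd))
```

Building the commands as lists keeps the steps easy to inspect before any GPU time is spent; each list could later be passed to `subprocess.run` once the paths are confirmed.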

Owner
•
edited Mar 9

Here's the imatrix version that you asked for: https://huggingface.co/llmfan46/Qwen3.5-27B-heretic-v2-i1-GGUF

Still working on IQK (requires building ik_llama.cpp).

If you find my work useful, consider supporting me on Patreon: https://patreon.com/LLMfan46

llmfan46 changed discussion status to open

Thank you for making the imatrix version! I think I will download it soon.

By the way, I think I'm going to try to make a Thireus quantization of this model, if my system allows it.

Update: I hadn't read that you are making IQK quants. I won't even try it (both because you are making them and because my resources are probably insufficient).

Anyway, thank you once again for making the IQK quantization!

Owner
•
edited Mar 9

No problem, but keep in mind that I am still building the tools needed for this and I have to fulfill other people's requests. If you can wait a few days, you will have IQK; in the meantime, I hope you can make use of the imatrix quantizations that I just posted.

Owner

Which IQK quant are you looking for? Just wondering, because you did not mention it before.

Owner
•
edited Mar 10

Since you didn't specify which quant type, I went ahead and made IQ4_K and IQ4_KSS. Hopefully one of these is what you were looking for; if not, let me know and I can make the one you need: https://huggingface.co/llmfan46/Qwen3.5-27B-heretic-v2-IQK-GGUF

If you like my work and find the releases helpful, consider subscribing to my Patreon (https://patreon.com/LLMfan46) or sending me a tip on Ko-Fi (https://ko-fi.com/llmfan46). Your support helps cover compute costs and motivates more releases!

llmfan46 changed discussion status to closed
