---
license: mit
language:
- zh
- en
pipeline_tag: text-generation
base_model:
- THUDM/GLM-Z1-9B-0414
library_name: transformers
tags:
- abliterated
- uncensored
---
# Melvin56/GLM-Z1-9B-0414-abliterated-GGUF

Original model: [huihui-ai/GLM-Z1-9B-0414-abliterated](https://huggingface.co/huihui-ai/GLM-Z1-9B-0414-abliterated)

Llama.cpp build: 1d735c0b (5165)

I created all of these quants with imatrix, using this [dataset](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8) for calibration.

With llama.cpp (Ollama and LM Studio were not tested), you'll need to add these [specific flags](https://github.com/ggml-org/llama.cpp/issues/12946) to your command:

```
--override-kv glm4.rope.dimension_count=int:64 \
--override-kv tokenizer.ggml.eos_token_id=int:151336 \
--chat-template chatglm4
```
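
As a rough sketch of how these flags fit into a full invocation, the snippet below assembles a `llama-cli` command and prints it for review. The model filename is a hypothetical placeholder (substitute whichever quant you downloaded), and it assumes the `llama-cli` binary from your llama.cpp build is on your `PATH`.

```shell
# Hypothetical filename; replace with the quant file you actually downloaded.
MODEL="GLM-Z1-9B-0414-abliterated-Q4_K_M.gguf"

# Assemble the arguments: the model path plus the three flags listed above.
set -- \
  -m "$MODEL" \
  --override-kv glm4.rope.dimension_count=int:64 \
  --override-kv tokenizer.ggml.eos_token_id=int:151336 \
  --chat-template chatglm4

# Print the command for review; drop the `echo` to actually run it.
echo llama-cli "$@"
```

Note that each `\` must be the last character on its line. A trailing space after the backslash breaks the line continuation, and the shell then treats the next line as a separate command.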

|               | CPU (AVX2) | CPU (ARM NEON) | Metal | cuBLAS | rocBLAS | SYCL | CLBlast | Vulkan | Kompute |
| :------------ | :---------: | :------------: | :---: | :----: | :-----: | :---: | :------: | :----: | :------: |
| K-quants      |      ✅     |       ✅      |   ✅  |   ✅   |    ✅   |  ✅  |   ✅ 🐢⁵  |  ✅ 🐢⁵ |    ❌    |
| I-quants      |    ✅ 🐢⁴   |     ✅ 🐢⁴    |  ✅ 🐢⁴ |   ✅   |    ✅   | Partial¹ |    ❌    |   ❌  |    ❌    |
```
✅: feature works
❌: feature does not work
❓: unknown, please contribute if you can test it yourself
🐢: feature is slow
¹: IQ3_S and IQ1_S, see #5886
²: Only with -ngl 0
³: Inference is 50% slower
⁴: Slower than K-quants of comparable size
⁵: Slower than cuBLAS/rocBLAS on similar cards
⁶: Only q8_0 and iq4_nl
```