---
license: mit
language:
- zh
- en
pipeline_tag: text-generation
base_model:
- THUDM/GLM-Z1-9B-0414
library_name: transformers
tags:
- abliterated
- uncensored
---

# Melvin56/GLM-Z1-9B-0414-abliterated-GGUF

Original Model : [huihui-ai/GLM-Z1-9B-0414-abliterated](https://huggingface.co/huihui-ai/GLM-Z1-9B-0414-abliterated)

Llama.cpp build: 1d735c0b (5165)

I used imatrix to create all these quants using this [Dataset](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8).

With llama.cpp (Ollama and LM Studio were not tested), you'll need to add these [specific options](https://github.com/ggml-org/llama.cpp/issues/12946):

```
--override-kv glm4.rope.dimension_count=int:64 \
--override-kv tokenizer.ggml.eos_token_id=int:151336 \
--chat-template chatglm4
```

|               | CPU (AVX2) | CPU (ARM NEON) | Metal | cuBLAS | rocBLAS | SYCL     | CLBlast | Vulkan | Kompute |
| :------------ | :--------: | :------------: | :---: | :----: | :-----: | :------: | :-----: | :----: | :-----: |
| K-quants      | ✅         | ✅             | ✅    | ✅     | ✅      | ✅       | ✅ 🐢⁵  | ✅ 🐢⁵ | ❌      |
| I-quants      | ✅ 🐢⁴     | ✅ 🐢⁴         | ✅ 🐢⁴ | ✅     | ✅      | Partial¹ | ❌      | ❌     | ❌      |

```
✅: feature works
❌: feature does not work
❓: unknown, please contribute if you can test it yourself
🐢: feature is slow
¹: IQ3_S and IQ1_S, see #5886
²: Only with -ngl 0
³: Inference is 50% slower
⁴: Slower than K-quants of comparable size
⁵: Slower than cuBLAS/rocBLAS on similar cards
⁶: Only q8_0 and iq4_nl
```
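
The llama.cpp overrides listed earlier are only flags, so a full invocation may help. The following is a minimal sketch, assuming the `llama-cli` binary from a llama.cpp build is on your `PATH`. The model file name is a hypothetical placeholder, not a file name confirmed by this repository; replace it with the quant you actually downloaded.

```shell
# Hypothetical placeholder file name; point MODEL at the quant you downloaded.
MODEL="${MODEL:-GLM-Z1-9B-0414-abliterated-Q4_K_M.gguf}"

# Collect the arguments: the model path plus the three overrides
# from llama.cpp issue #12946 quoted in this card.
set -- -m "$MODEL" \
  --override-kv glm4.rope.dimension_count=int:64 \
  --override-kv tokenizer.ggml.eos_token_id=int:151336 \
  --chat-template chatglm4

# Show the command that would be run.
CMD="llama-cli $*"
echo "$CMD"

# Run it only when llama-cli is installed and the model file exists.
if command -v llama-cli >/dev/null 2>&1 && [ -f "$MODEL" ]; then
  llama-cli "$@"
fi
```

Printing the command first makes it easy to check the overrides before loading a multi-gigabyte file. Set `MODEL=/path/to/file.gguf` in the environment to use a different quant without editing the script.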