---
library_name: gguf
license: apache-2.0
base_model: huihui-ai/Qwen2.5-7B-Instruct-abliterated-v2
quantized_by: koorbmeh
---

# Qwen2.5-7B-Instruct-Abliterated-v2 Q4_K_M

This is a Q4_K_M quantization of the [huihui-ai/Qwen2.5-7B-Instruct-abliterated-v2](https://huggingface.co/huihui-ai/Qwen2.5-7B-Instruct-abliterated-v2) model, packaged as a single GGUF file.

## Model Details

- **Base Model**: huihui-ai/Qwen2.5-7B-Instruct-abliterated-v2
- **Quantization**: Q4_K_M (4-bit k-quant, medium variant)
- **File Size**: ~4.36 GB
- **VRAM Usage**: ~4.5 GB
- **Format**: GGUF

## Usage

### With Ollama

```bash
# Download the GGUF file into the current directory,
# then create a Modelfile. The quoted 'EOF' keeps the shell
# from expanding anything inside the template.
cat > Modelfile << 'EOF'
FROM ./qwen2.5-abliterated-v2-q4_K_M.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|endoftext|>"
EOF

# Import into Ollama
ollama create qwen2.5-abliterated-v2-q4 -f Modelfile
```

### With llama.cpp

```bash
# Download the GGUF file, then run it with llama.cpp
./llama-cli -m qwen2.5-abliterated-v2-q4_K_M.gguf -p "Your prompt here"
```

## Quantization Details

- **Original Size**: 14.19 GB (FP16)
- **Quantized Size**: 4.36 GB (Q4_K_M)
- **Reduction**: ~69.3%
- **Quantization Method**: llama.cpp Q4_K_M

## Performance

Q4_K_M typically keeps output quality close to the original model while using significantly less VRAM:

- FP16 (original): at least ~14 GB VRAM, since the weights alone are 14.19 GB
- Q4_K_M: ~4.5 GB VRAM

## Credits

- Base model: [huihui-ai/Qwen2.5-7B-Instruct-abliterated-v2](https://huggingface.co/huihui-ai/Qwen2.5-7B-Instruct-abliterated-v2)
- Quantization tool: [llama.cpp](https://github.com/ggerganov/llama.cpp)
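
The reduction figure under Quantization Details can be reproduced from the two file sizes. The sketch below is only a sanity check, not part of the quantization workflow. The helper names `size_reduction_pct` and `effective_bpw` are hypothetical, and the bits-per-weight estimate assumes the original file stores 16 bits per weight and that GGUF metadata overhead is negligible.

```bash
#!/usr/bin/env bash
# Sizes taken from the Quantization Details section (both in the same unit).
fp16_size=14.19
q4_size=4.36

# Percentage reduction in file size: (1 - quantized / original) * 100.
# Hypothetical helper, named for illustration only.
size_reduction_pct() {
  awk -v orig="$1" -v quant="$2" \
    'BEGIN { printf "%.1f", (1 - quant / orig) * 100 }'
}

# Rough average bits per weight after quantization.
# Assumption: the original is pure FP16 (16 bits per weight) and
# metadata overhead in both files is small enough to ignore.
effective_bpw() {
  awk -v orig="$1" -v quant="$2" \
    'BEGIN { printf "%.2f", quant / orig * 16 }'
}

echo "Reduction: $(size_reduction_pct "$fp16_size" "$q4_size")%"
echo "Approx. bits per weight: $(effective_bpw "$fp16_size" "$q4_size")"
```

With the sizes listed above this prints a reduction of 69.3% and roughly 4.92 bits per weight. The second number is above 4 because Q4_K_M stores per-block scales alongside the 4-bit values and keeps some tensors at higher precision.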