---
library_name: gguf
license: apache-2.0
base_model: huihui-ai/Qwen2.5-7B-Instruct-abliterated-v2
quantized_by: koorbmeh
---

# Qwen2.5-7B-Instruct-Abliterated-v2 Q4_K_M

This is a Q4_K_M quantized version of the [huihui-ai/Qwen2.5-7B-Instruct-abliterated-v2](https://huggingface.co/huihui-ai/Qwen2.5-7B-Instruct-abliterated-v2) model.

## Model Details

- **Base Model**: huihui-ai/Qwen2.5-7B-Instruct-abliterated-v2
- **Quantization**: Q4_K_M (4-bit k-quant, medium variant)
- **File Size**: ~4.36 GB
- **VRAM Usage**: ~4.5 GB (approximate; grows with context length)
- **Format**: GGUF

## Usage

### With Ollama

```bash
# Download the GGUF file
# Then create a Modelfile:
cat > Modelfile << 'EOF'
FROM ./qwen2.5-abliterated-v2-q4_K_M.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|endoftext|>"
EOF

# Import into Ollama
ollama create qwen2.5-abliterated-v2-q4 -f Modelfile
```
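
Once the import succeeds, the model can be queried by the name given to `ollama create`. The following is a minimal sketch, assuming `ollama` is on the `PATH` and the model was created under the name used above. The prompt text is only a placeholder.

```bash
MODEL_NAME="qwen2.5-abliterated-v2-q4"   # name passed to `ollama create` above

if ! command -v ollama >/dev/null 2>&1; then
  echo "ollama not found on PATH" >&2
elif ollama list 2>/dev/null | grep -qF "$MODEL_NAME"; then
  # One-shot prompt; omit the quoted text to start an interactive session
  ollama run "$MODEL_NAME" "Explain what a GGUF file is in one sentence."
else
  echo "model $MODEL_NAME not imported yet; run 'ollama create' first" >&2
fi
```

The checks only exist so the snippet fails with a readable message instead of an Ollama error when the binary or the imported model is missing.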

### With llama.cpp

```bash
# Download the GGUF file and use with llama.cpp
./llama-cli -m qwen2.5-abliterated-v2-q4_K_M.gguf -p "Your prompt here"
```

## Quantization Details

- **Original Size**: 14.19 GB (FP16)
- **Quantized Size**: 4.36 GB (Q4_K_M)
- **Reduction**: ~69.3%
- **Quantization Method**: llama.cpp Q4_K_M
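
The reduction figure follows directly from the two sizes listed above. As a quick check, a small sketch using `awk` for the floating-point arithmetic (the variable names are illustrative only):

```bash
original_gb=14.19   # FP16 size listed above
quantized_gb=4.36   # Q4_K_M size listed above

# Reduction = (1 - quantized / original) * 100, rounded to one decimal place
reduction=$(awk -v o="$original_gb" -v q="$quantized_gb" \
  'BEGIN { printf "%.1f", (1 - q / o) * 100 }')

echo "Reduction: ${reduction}%"   # prints "Reduction: 69.3%"
```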

## Performance

Quantization reduces memory requirements substantially, at some cost in output quality. Approximate VRAM needed to load the model:
- FP16 (unquantized): at least ~14.2 GB for the weights alone
- Q4_K_M: ~4.5 GB
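
VRAM use is somewhat higher than the 4.36 GB file size because the runtime also allocates a key/value cache that grows with context length. The sketch below gives a rough estimate of that cache. The architecture values (28 layers, 4 key/value heads, head dimension 128) are assumptions taken from the upstream Qwen2.5-7B configuration and are not stated in this card; the context length and FP16 cache type are hypothetical settings.

```bash
# Assumed Qwen2.5-7B architecture values (not stated in this card)
n_layers=28        # transformer layers
n_kv_heads=4       # key/value heads (grouped-query attention)
head_dim=128       # dimension per attention head
bytes_per_elem=2   # FP16 cache entries (assumed cache type)
n_ctx=4096         # hypothetical context length

# Keys and values are both cached, hence the leading factor of 2
kv_bytes_per_token=$(( 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem ))
kv_mib=$(( kv_bytes_per_token * n_ctx / 1024 / 1024 ))

echo "KV cache: ${kv_bytes_per_token} bytes/token, ${kv_mib} MiB at ${n_ctx} tokens"
```

Under these assumptions the cache adds a few hundred MiB at a 4096-token context, and proportionally more at longer contexts. Actual usage also depends on the backend's compute buffers, so treat the result as an order-of-magnitude guide.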

## Credits

- Base model: [huihui-ai/Qwen2.5-7B-Instruct-abliterated-v2](https://huggingface.co/huihui-ai/Qwen2.5-7B-Instruct-abliterated-v2)
- Quantization tool: [llama.cpp](https://github.com/ggerganov/llama.cpp)