Use from the llama-cpp-python library
# !pip install llama-cpp-python huggingface-hub
# (Llama.from_pretrained needs huggingface-hub to download the file)

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="koorbmeh/qwen2.5-7b-instruct-abliterated-v2-q4_K_M",
	filename="qwen2.5-abliterated-v2-q4_K_M.gguf",
)
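# messages must be a list of role/content dicts, not a bare string
response = llm.create_chat_completion(
	messages=[
		{"role": "user", "content": "Your prompt here"}
	]
)
print(response["choices"][0]["message"]["content"])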

Qwen2.5-7B-Instruct-Abliterated-v2 Q4_K_M

This is a Q4_K_M quantized version of the huihui-ai/Qwen2.5-7B-Instruct-abliterated-v2 model.

Model Details

  • Base Model: huihui-ai/Qwen2.5-7B-Instruct-abliterated-v2
  • Quantization: Q4_K_M (4-bit k-quant, medium variant)
  • File Size: ~4.36 GB
  • VRAM Usage: ~4.5 GB
  • Format: GGUF

Usage

With Ollama

# Download the GGUF file
# Then create a Modelfile:
cat > Modelfile << EOF
FROM ./qwen2.5-abliterated-v2-q4_K_M.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|endoftext|>"
EOF

# Import into Ollama
ollama create qwen2.5-abliterated-v2-q4 -f Modelfile
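
To make the Modelfile TEMPLATE easier to read, here is a minimal Python sketch of the ChatML string it is meant to produce. The function name `render_chatml` is hypothetical and only for illustration; Ollama does this rendering itself, so the sketch is useful mainly for checking prompts by hand.

```python
def render_chatml(prompt: str, system: str = "") -> str:
    """Mimic the TEMPLATE: optional system turn, user turn, open assistant turn."""
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    if prompt:
        parts.append(f"<|im_start|>user\n{prompt}<|im_end|>\n")
    # The assistant turn is left open so the model writes the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


text = render_chatml("Your prompt here", system="You are a helpful assistant.")
print(text)
```

Generation then runs until the model emits one of the stop strings declared with PARAMETER stop.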

With llama.cpp

# Download the GGUF file and use with llama.cpp
./llama-cli -m qwen2.5-abliterated-v2-q4_K_M.gguf -p "Your prompt here"

Quantization Details

  • Original Size: 14.19 GB (FP16)
  • Quantized Size: 4.36 GB (Q4_K_M)
  • Reduction: ~69.3%
  • Quantization Method: llama.cpp Q4_K_M
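
The reduction figure can be rechecked from the two sizes above, and the same ratio gives a rough effective bit width. This is a sketch under one assumption: the FP16 file spends 16 bits per weight and file metadata is negligible.

```python
# Sizes as reported in the list above (same unit for both, so the ratio is unit-free).
original_size = 14.19   # FP16
quantized_size = 4.36   # Q4_K_M

reduction_pct = (1 - quantized_size / original_size) * 100

# Assumption: 16 bits per weight in the FP16 file, metadata ignored.
bits_per_weight = 16 * quantized_size / original_size

print(f"Reduction: {reduction_pct:.1f}%")                    # Reduction: 69.3%
print(f"Effective bits per weight: {bits_per_weight:.2f}")   # Effective bits per weight: 4.92
```

The result lands above 4 bits because Q4_K_M stores per-block scales and keeps some tensors at higher precision.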

Performance

Q4_K_M is designed to keep output quality close to the original while using significantly less VRAM:

  • FP16 (unquantized): at least ~14.2 GB VRAM for the weights alone, given the 14.19 GB file size
  • Q4_K_M: ~4.5 GB VRAM
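
VRAM use also grows with context length, because the key/value cache sits next to the weights. The sketch below is a rough estimate only. The layer count, KV-head count, and head dimension are assumed values for Qwen2.5-7B and should be checked against the model's config.json. It also assumes an FP16 cache, treats the reported file size as GiB, and ignores compute buffers.

```python
# Assumed Qwen2.5-7B attention shape; verify against the model's config.json.
N_LAYERS = 28
N_KV_HEADS = 4
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # FP16 cache entries (assumption)


def kv_cache_gib(n_ctx: int) -> float:
    """Size of the key/value cache in GiB for a context of n_ctx tokens."""
    # Factor 2 covers keys plus values, stored per layer and per KV head.
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * n_ctx / 2**30


weights = 4.36  # quantized file size from this card
for n_ctx in (2048, 4096, 8192):
    total = weights + kv_cache_gib(n_ctx)
    print(f"n_ctx={n_ctx}: about {total:.2f} (weights + KV cache)")
```

Under these assumptions, short contexts stay close to the ~4.5 GB figure, while longer contexts need more headroom.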

Credits
