Quantized Coding Assistant (GGUF)

This repository provides a GGUF quantized version of a Qwen2-based coding assistant model for local inference. It is intended to support code-focused questions in a repository-grounded setting.

Model Details

  • Base model: Qwen/Qwen2.5-Coder-7B-Instruct
  • Format: GGUF
  • Architecture: Qwen2
  • Model size: 8B parameters
  • Quantization: 4-bit Q4_K_M
  • File size: 4.68 GB

Notes

This repository contains a quantized GGUF model for inference. The corresponding LoRA adapter repository contains the adapter weights and configuration used during fine-tuning. The adapter was built on top of Qwen/Qwen2.5-Coder-7B-Instruct with LoRA rank 16, alpha 16, dropout 0.05, targeting q_proj, k_proj, v_proj, and o_proj.

Downloads last month
31
GGUF
Model size
8B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for catalin-pangaleanu/qwen25coder-7b-quantized-gguf

Base model

Qwen/Qwen2.5-7B
Quantized
(190)
this model