STiFLeR7/Phi2-GPTQ


🧠 Phi-2 GPTQ (Quantized)

This repository provides a 4-bit GPTQ quantized version of the Phi-2 model by Microsoft, optimized for efficient inference using gptqmodel.

📌 Model Details

Base Model: Microsoft Phi-2
Quantization: GPTQ (4-bit)
Quantizer: GPTQModel
Framework: PyTorch + HuggingFace Transformers
Device Support: CUDA (GPU)
License: Apache 2.0

🚀 Features

✅ Lightweight: 4-bit weights take roughly a quarter of the memory that FP16 weights need
✅ Fast Inference: Ideal for deployment on consumer GPUs
✅ Compatible: Works with transformers, optimum, and gptqmodel
✅ CUDA-accelerated: Automatically uses GPU for speed
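The memory saving can be approximated with simple arithmetic. The sketch below is an estimate under stated assumptions, not a measurement of this checkpoint: it assumes Phi-2's published size of about 2.7 billion parameters, a GPTQ group size of 128 (a common default, not confirmed for this repository), and one FP16 scale plus one 4-bit zero point per group. The helper name `weight_memory_gb` is hypothetical.

```python
from typing import Optional


def weight_memory_gb(n_params: float, bits: int, group_size: Optional[int] = None) -> float:
    """Estimate weight storage in GB (10^9 bytes) for a given bit width."""
    bits_per_weight = float(bits)
    if group_size is not None:
        # Assumption: each group stores one FP16 scale (16 bits)
        # and one zero point that is `bits` wide.
        bits_per_weight += (16 + bits) / group_size
    return n_params * bits_per_weight / 8 / 1e9


PHI2_PARAMS = 2.7e9  # assumption: Phi-2's published parameter count

fp16_gb = weight_memory_gb(PHI2_PARAMS, 16)
int4_gb = weight_memory_gb(PHI2_PARAMS, 4, group_size=128)  # group size is assumed

print(f"FP16 weights:  {fp16_gb:.2f} GB")  # 5.40 GB
print(f"4-bit weights: {int4_gb:.2f} GB")  # 1.40 GB
```

The real checkpoint is likely somewhat larger than this estimate, because layers such as embeddings are usually left in FP16, and runtime memory also includes activations and the KV cache.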

📚 Usage

This model is ready to use with the Hugging Face transformers library.
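A minimal loading sketch, assuming `transformers`, `optimum`, `gptqmodel`, and `accelerate` are installed and a CUDA GPU is available. The helper names `load_model` and `generate`, the example prompt, and the `max_new_tokens` value are illustrative choices, not part of this repository.

```python
MODEL_ID = "STiFLeR7/Phi2-GPTQ"  # repository name as shown on this model card


def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and the quantized model (assumes a CUDA GPU)."""
    # Imported lazily so the heavy dependencies load only when a model is requested.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" relies on accelerate to place the weights on the available GPU.
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return tokenizer, model


def generate(prompt: str, tokenizer, model, max_new_tokens: int = 64) -> str:
    """Tokenize the prompt, generate a continuation, and decode it to text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example (downloads the weights on first use):
# tokenizer, model = load_model()
# print(generate("Explain GPTQ quantization in one sentence.", tokenizer, model))
```

Keeping loading and generation in separate functions lets the model be loaded once and reused across prompts.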

🧪 Intended Use

Research and development
Prototyping generative applications
Fast inference environments with limited GPU memory

📖 References

Microsoft Phi-2: https://huggingface.co/microsoft/phi-2
GPTQModel: https://github.com/ModelCloud/GPTQModel
Transformers: https://github.com/huggingface/transformers

⚖️ License

This model is distributed under the Apache License 2.0.


Safetensors

Model size: 3B params
Tensor type: I32 · F16
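The I32 tensor type reflects how GPTQ checkpoints are typically stored: several 4-bit weights are packed into each 32-bit integer, while scales stay in F16. The sketch below illustrates the idea in plain Python. The function names and the lowest-nibble-first ordering are assumptions made for illustration; the exact layout used by GPTQModel's kernels may differ.

```python
def pack_int4(values):
    """Pack eight unsigned 4-bit integers (0..15) into one 32-bit word, lowest nibble first."""
    if len(values) != 8:
        raise ValueError("expected exactly eight values")
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v <= 15:
            raise ValueError("each value must fit in 4 bits")
        word |= v << (4 * i)  # place value i in bits 4*i .. 4*i+3
    return word


def unpack_int4(word):
    """Recover the eight 4-bit integers from a packed 32-bit word."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]


weights = [3, 15, 0, 7, 9, 1, 12, 4]
packed = pack_int4(weights)
assert unpack_int4(packed) == weights  # the round trip is lossless
```

Eight 4-bit values fit in one 32-bit word, which is why a packed tensor holds one eighth as many elements as the weights it represents.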
