---
base_model:
- meta-llama/Llama-2-7b-hf
base_model_relation: quantized
license: llama2
---
# Model Card

- Base model: `meta-llama/Llama-2-7b-hf`
- Quantization method: Memory-constrained MSQ with Q-Palette
- Target bit-width: 3 bits
- Backend kernel: Q-Palette kernel
- Calibration data: RedPajama (precomputed [Hessians](https://huggingface.co/relaxml/Hessians-Llama-2-7b-6144))
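As a rough, hypothetical illustration of what a 3-bit target means under a memory constraint, the following Python sketch computes the weight-storage budget for Llama-2-7b and checks whether a per-projection bit allocation fits inside it. It assumes that only the decoder's linear layers are quantized (embeddings, `lm_head`, and norms are excluded), takes layer shapes from the public Llama-2-7b configuration, and uses an allocation invented purely for illustration. It is not the Q-Palette allocation algorithm or API, and all function names are hypothetical.

```python
# Hypothetical sketch: memory budget for a 3-bit average target on Llama-2-7b.
# Assumption: only decoder linear layers are quantized; embeddings, lm_head and
# norms are excluded from the budget. Shapes follow the public Llama-2-7b config.

HIDDEN = 4096
INTERMEDIATE = 11008
NUM_LAYERS = 32

# (name, out_features, in_features) for the linear layers of one decoder block.
LINEAR_SHAPES = [
    ("q_proj", HIDDEN, HIDDEN),
    ("k_proj", HIDDEN, HIDDEN),
    ("v_proj", HIDDEN, HIDDEN),
    ("o_proj", HIDDEN, HIDDEN),
    ("gate_proj", INTERMEDIATE, HIDDEN),
    ("up_proj", INTERMEDIATE, HIDDEN),
    ("down_proj", HIDDEN, INTERMEDIATE),
]


def quantized_param_count() -> int:
    """Total number of weights in all decoder linear layers."""
    per_block = sum(out_f * in_f for _, out_f, in_f in LINEAR_SHAPES)
    return per_block * NUM_LAYERS


def budget_bytes(target_bits: float) -> float:
    """Bytes available if every quantized weight used `target_bits` on average."""
    return quantized_param_count() * target_bits / 8


def allocation_bytes(bits_by_name: dict[str, float]) -> float:
    """Bytes used by a hypothetical allocation that assigns one bit-width per
    projection type and reuses it in every decoder block."""
    bits_per_block = sum(
        out_f * in_f * bits_by_name[name] for name, out_f, in_f in LINEAR_SHAPES
    )
    return bits_per_block * NUM_LAYERS / 8


def average_bits(bits_by_name: dict[str, float]) -> float:
    """Parameter-weighted average bit-width of an allocation."""
    return allocation_bytes(bits_by_name) * 8 / quantized_param_count()


def fits_budget(bits_by_name: dict[str, float], target_bits: float = 3.0) -> bool:
    """True if the allocation stays within the memory budget of `target_bits`."""
    return allocation_bytes(bits_by_name) <= budget_bytes(target_bits)


if __name__ == "__main__":
    # Hypothetical mixed allocation: more bits for attention, fewer for the MLP.
    mixed = {
        "q_proj": 4, "k_proj": 4, "v_proj": 4, "o_proj": 4,
        "gate_proj": 2.5, "up_proj": 2.5, "down_proj": 2.5,
    }
    print(f"budget: {budget_bytes(3) / 2**30:.2f} GiB")
    print(f"average bits: {average_bits(mixed):.3f}, fits: {fits_budget(mixed)}")
```

The point of the comparison is that a memory constraint limits the parameter-weighted average bit-width, so individual layers can sit above or below 3 bits as long as the total stays within budget.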

# How to run
- Follow the instructions in the [Q-Palette repository](https://github.com/snu-mllab/Q-Palette).

# References
- [Q-Palette paper (arXiv:2509.20214)](https://arxiv.org/abs/2509.20214)