

EdgeRazor for Lightweight LLMs



Model Overview

Model Bit-Widths

| Mixed-Precision Recipe | Bit-Width | This Repo | GGUF Type |
| --- | --- | --- | --- |
| 100% 4-bit + 0% 1.58-bit | 4 | ✔️ | Q4_0 |
| 50% 4-bit + 50% 1.58-bit | 2.79 | ✖️ | Not supported |
| 12.5% 4-bit + 87.5% 1.58-bit | 1.88 | ✖️ | Not supported |
| 0% 4-bit + 100% 1.58-bit | 1.58 | ✔️ | TQ1_0, TQ2_0 |
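
The bit-widths listed for each recipe are consistent with a share-weighted average of the two precisions. The sketch below assumes that interpretation, which is not stated explicitly in this card; the function name `effective_bit_width` and its parameters are illustrative only.

```python
def effective_bit_width(frac_4bit: float,
                        high_bits: float = 4.0,
                        low_bits: float = 1.58) -> float:
    """Average bits per weight for a mix of 4-bit and 1.58-bit weights.

    Assumption: the overall bit-width is the share-weighted mean of the
    two precisions, which reproduces the values in the recipe table.
    """
    if not 0.0 <= frac_4bit <= 1.0:
        raise ValueError("frac_4bit must be in [0, 1]")
    return frac_4bit * high_bits + (1.0 - frac_4bit) * low_bits


# Recompute the bit-width for each recipe in the table.
for frac in (1.0, 0.5, 0.125, 0.0):
    print(f"{frac:.1%} 4-bit -> {effective_bit_width(frac):.2f} bits")
```

For example, the 12.5% recipe gives 0.125 × 4 + 0.875 × 1.58 = 1.8825, which rounds to the 1.88 shown in the table.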

Get Started

Use llama.cpp for efficient inference on edge devices.

Check the cli.sh script for basic usage.
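
As a rough illustration of what a llama.cpp invocation might look like, the sketch below builds a `llama-cli` command line without executing it. It relies on llama.cpp's `-hf <user>/<model>[:quant]` option together with `-p` (prompt) and `-n` (tokens to generate). Whether the quant tag resolves to a file in this repository is an assumption, and the helper name `build_llama_cli_command` is hypothetical; the `cli.sh` script remains the authoritative reference.

```python
import shlex

# Repository named in this model card.
REPO = "zhangsq-nju/Qwen3-1.7B-EdgeRazor-GGUF"


def build_llama_cli_command(quant: str, prompt: str, n_predict: int = 128) -> list[str]:
    """Return an argv list for llama.cpp's llama-cli; nothing is executed here."""
    return [
        "llama-cli",
        "-hf", f"{REPO}:{quant}",  # assumed: the repo:quant tag matches a GGUF file
        "-p", prompt,              # prompt text
        "-n", str(n_predict),      # number of tokens to generate
    ]


# Build a command for the Q4_0 variant listed in the bit-width table.
cmd = build_llama_cli_command("Q4_0", "Hello")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where `llama-cli` is installed and on the `PATH`.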

Model list:

Citation

If you find our project useful in your research, please consider citing our paper ✍️:

@article{zhangsh-edgerazor,
  title={{EdgeRazor}: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation},
  author={Shu-Hao Zhang and Le-Tong Huang and Xiang-Sheng Deng and Xin-Yi Zou and Chen Wu and Nan Li and Shao-Qun Zhang},
  year={2026},
  journal={arXiv preprint arXiv:2605.04062}
}

Format: GGUF
Model size: 2B params
Architecture: qwen3

Model tree for zhangsq-nju/Qwen3-1.7B-EdgeRazor-GGUF

Base model: Qwen/Qwen3-1.7B (this repository is a quantized version of it)
