How to use from
Pi
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf zhangsq-nju/Qwen3-1.7B-EdgeRazor-GGUF:
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "zhangsq-nju/Qwen3-1.7B-EdgeRazor-GGUF:"
        }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
Quick Links

EdgeRazor Logo

EdgeRazor for Lightweight LLMs

arXiv EdgeRazor GitHub EdgeRazor PyPI EdgeRazor

Contents

Model Overview

Model Bit-Widths

Mixed-Precision Recipe Bit-Width This Repo GGUF Type
100% 4-bit + 0% 1.58-bit 4 โœ”๏ธ Q4_0
50% 4-bit + 50% 1.58-bit 2.79 โœ–๏ธ Not supported
12.5% 4-bit + 87.5% 1.58-bit 1.88 โœ–๏ธ Not supported
0% 4-bit + 100% 1.58-bit 1.58 โœ”๏ธ TQ1_0, TQ2_0

Get Started

Use llama.cpp to conduct efficient inference on edge devices.

Check the cli.sh script for basic usage.

Model list:

Citation

If you find our project useful in your research, please consider kindly citing our papers โœ๏ธ:

@article{zhangsh-edgerazor,
  title={{EdgeRazor}: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation},
  author={Shu-Hao Zhang and Le-Tong Huang and Xiang-Sheng Deng and Xin-Yi Zou and Chen Wu and Nan Li and Shao-Qun Zhang},
  year={2026},
  journal={arXiv preprint arXiv:2605.04062}
}
Downloads last month
1,010
GGUF
Model size
2B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

1-bit

2-bit

4-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for zhangsq-nju/Qwen3-1.7B-EdgeRazor-GGUF

Finetuned
Qwen/Qwen3-1.7B
Quantized
(276)
this model

Spaces using zhangsq-nju/Qwen3-1.7B-EdgeRazor-GGUF 2

Collection including zhangsq-nju/Qwen3-1.7B-EdgeRazor-GGUF

Paper for zhangsq-nju/Qwen3-1.7B-EdgeRazor-GGUF