---
base_model: Qwen/Qwen3-1.7B
pipeline_tag: text-generation
tags:
- qwen3
- edgerazor
- quantization
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-1.7B/blob/main/LICENSE
---

## Contents

- [Contents](#contents)
- [Model Overview](#model-overview)
- [Model Bit-Widths](#model-bit-widths)
- [Get Started](#get-started)
- [Citation](#citation)

## Model Overview

- Base Model: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
- Training: [zhangsq-nju/EdgeRazor](https://github.com/zhangsq-nju/EdgeRazor)
- Inference: [ggml-org/llama.cpp](https://github.com/ggml-org/llama.cpp)

## Model Bit-Widths

| Mixed-Precision Recipe       | Bit-Width | This Repo | GGUF Type     |
| ---------------------------- | --------- | --------- | ------------- |
| 100% 4-bit + 0% 1.58-bit     | 4         | ✔️        | Q4_0          |
| 50% 4-bit + 50% 1.58-bit     | 2.79      | ✖️        | Not supported |
| 12.5% 4-bit + 87.5% 1.58-bit | 1.88      | ✖️        | Not supported |
| 0% 4-bit + 100% 1.58-bit     | 1.58      | ✔️        | TQ1_0, TQ2_0  |

## Get Started

Use llama.cpp to run efficient inference on edge devices. See the [cli.sh](./cli.sh) script for basic usage.

Model list:

- `Qwen3-1.7B-BF16.gguf`: BF16 model from the original Qwen3-1.7B
- `Qwen3-1.7B-EdgeRazor-Q4_0.gguf`: Q4_0 model from [Qwen3-1.7B-EdgeRazor-4bit](https://huggingface.co/zhangsq-nju/Qwen3-1.7B-EdgeRazor-4bit)
- `Qwen3-1.7B-EdgeRazor-TQ1_0.gguf`: TQ1_0 model from [Qwen3-1.7B-EdgeRazor-1.58bit](https://huggingface.co/zhangsq-nju/Qwen3-1.7B-EdgeRazor-1.58bit)
- `Qwen3-1.7B-EdgeRazor-TQ2_0.gguf`: TQ2_0 model from [Qwen3-1.7B-EdgeRazor-1.58bit](https://huggingface.co/zhangsq-nju/Qwen3-1.7B-EdgeRazor-1.58bit)

## Citation

If you find our project useful in your research, please consider citing our paper ✏️:

```
@article{zhangsh-edgerazor,
  title={{EdgeRazor}: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation},
  author={Shu-Hao Zhang and Le-Tong Huang and Xiang-Sheng Deng and Xin-Yi Zou and Chen Wu and Nan Li and Shao-Qun Zhang},
  year={2026},
  journal={arXiv preprint arXiv:2605.04062}
}
```
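
A note on the Model Bit-Widths table: the listed bit-widths appear consistent with a simple weighted average of 4-bit and 1.58-bit weights. The sketch below reproduces the table values (4, 2.79, 1.88, 1.58) under that assumption. The function name `average_bit_width`, and the reading of each percentage as a fraction of the weights, are illustrative assumptions and are not taken from the EdgeRazor code.

```python
# Minimal sketch: average bits per weight for a 4-bit / 1.58-bit mix.
# Assumption (not confirmed by the source): the "Bit-Width" column is a
# plain weighted average over the fraction of weights at each precision.

HIGH_BITS = 4.0
LOW_BITS = 1.58  # ternary weights; log2(3) is about 1.585, the table uses 1.58


def average_bit_width(high_fraction: float) -> float:
    """Weighted-average bit-width when `high_fraction` of the weights
    use 4 bits and the remainder use 1.58 bits."""
    if not 0.0 <= high_fraction <= 1.0:
        raise ValueError("high_fraction must be in [0, 1]")
    return high_fraction * HIGH_BITS + (1.0 - high_fraction) * LOW_BITS


if __name__ == "__main__":
    # The four recipes listed in the table.
    for frac in (1.0, 0.5, 0.125, 0.0):
        print(f"{frac:6.1%} 4-bit -> {average_bit_width(frac):.2f} bits")
```

For example, the 12.5% recipe gives 0.125 × 4 + 0.875 × 1.58 = 1.8825, which rounds to the 1.88 shown in the table.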