---
license: apache-2.0
language:
- zh
base_model: renhehuang/qwen3-1.7b-coffee-sft
tags:
- conversational
- sft
- coffee
- traditional-chinese
- qwen3
- task-oriented-dialogue
- quantized
- int4
- quanto
datasets:
- renhehuang/coffee-order-zhtw
pipeline_tag: text-generation
---

# Qwen3-1.7B Coffee Order Assistant (INT4 Quantized)

This is the INT4-quantized version of [renhehuang/qwen3-1.7b-coffee-sft](https://huggingface.co/renhehuang/qwen3-1.7b-coffee-sft), quantized with [optimum-quanto](https://github.com/huggingface/optimum-quanto).

| | Original model | This quantized model |
|---|---|---|
| Precision | FP32 | INT4 |
| Size | ~6.45 GB | ~1.45 GB |
| Compression ratio | - | ~4.5x |

Suitable for deployment on low-memory edge devices such as Jetson Nano and Raspberry Pi.
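
As a quick check of the table above, the compression ratio is simply the original size divided by the quantized size. The sketch below uses the rounded sizes from the table, so it lands slightly under the quoted 4.5x; the variable names are illustrative only.

```python
# Rounded checkpoint sizes taken from the table above (in GB).
original_size_gb = 6.45   # FP32 checkpoint
quantized_size_gb = 1.45  # INT4 (quanto) checkpoint

# Compression ratio: how many times smaller the quantized checkpoint is.
compression_ratio = original_size_gb / quantized_size_gb

# Fraction of storage saved relative to the original checkpoint.
saved_fraction = 1 - quantized_size_gb / original_size_gb

print(f"compression ratio: {compression_ratio:.2f}x")
print(f"storage saved: {saved_fraction:.0%}")
```

Because both sizes are approximate, a small gap between this figure and the table's value is expected.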

## Usage

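```python
from optimum.quanto import QuantizedModelForCausalLM
from transformers import AutoTokenizer
import torch

model_name = "renhehuang/qwen3-1.7b-coffee-sft-quanto-int4"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Load the checkpoint that was already quantized to INT4 with optimum-quanto.
model = QuantizedModelForCausalLM.from_pretrained(model_name)

# The model was fine-tuned on Traditional Chinese, so the prompts stay in Chinese.
messages = [
    # System: "You are a professional coffee-ordering assistant who helps users
    # complete their order. The menu includes: Americano, latte, oat milk latte, fresh milk."
    {"role": "system", "content": "你是一位專業的咖啡點餐助理，負責協助使用者完成點餐。菜單包含：美式、拿鐵、燕麥奶拿鐵、鮮奶。"},
    # User: "I'd like an iced latte."
    {"role": "user", "content": "我想要一杯冰拿鐵"},
]

# Render the chat template to a prompt string, then tokenize it.
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```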
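
Placing an order often takes more than one turn, for example when the assistant asks about temperature or add-ons. One way to carry the conversation forward is to keep appending turns to the same `messages` list before calling `apply_chat_template` again. The helper below is a hypothetical sketch, not part of the model's API: `add_turn` and the assistant reply text are made up for illustration.

```python
def add_turn(messages, role, content):
    """Return a new message list with one more chat turn appended."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unsupported role: {role}")
    return messages + [{"role": role, "content": content}]


# Start from the same system prompt and first user turn as the usage example.
messages = [
    {"role": "system", "content": "你是一位專業的咖啡點餐助理，負責協助使用者完成點餐。菜單包含：美式、拿鐵、燕麥奶拿鐵、鮮奶。"},
    {"role": "user", "content": "我想要一杯冰拿鐵"},
]

# Placeholder assistant reply ("Would you like an extra espresso shot?");
# in practice this would be the decoded `response` from model.generate().
messages = add_turn(messages, "assistant", "請問需要加一份濃縮嗎？")
# Next user turn: "No, thanks."
messages = add_turn(messages, "user", "不用，謝謝")

print([m["role"] for m in messages])
```

The extended list can then be passed through the chat template and `generate` exactly as in the usage example.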

## Quantization Details

| Item | Value |
|------|-------|
| Quantization tool | [optimum-quanto](https://github.com/huggingface/optimum-quanto) |
| Quantization precision | INT4 (qint4) |
| Quantization scope | weights only |
| Source model | [renhehuang/qwen3-1.7b-coffee-sft](https://huggingface.co/renhehuang/qwen3-1.7b-coffee-sft) |
| Base model | [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) |
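
To give a feel for what weight-only INT4 quantization does, the sketch below maps a small group of floating-point weights onto 16 integer levels and back. It is a simplified, pure-Python illustration under assumed settings (one scale and one minimum offset per group); it is not optimum-quanto's actual implementation, and the weight values are invented.

```python
def quantize_int4(weights):
    """Affine-quantize a list of floats to integers in [0, 15]."""
    w_min, w_max = min(weights), max(weights)
    # 4 bits give 2**4 = 16 levels, i.e. 15 steps between min and max.
    scale = (w_max - w_min) / 15 or 1.0  # fall back to 1.0 for constant groups
    q = [round((w - w_min) / scale) for w in weights]
    return q, scale, w_min


def dequantize_int4(q, scale, w_min):
    """Map the 4-bit integers back to approximate float weights."""
    return [w_min + scale * v for v in q]


# Made-up weights for illustration only.
weights = [-0.42, -0.10, 0.03, 0.27, 0.51]
q, scale, w_min = quantize_int4(weights)
restored = dequantize_int4(q, scale, w_min)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)
print(f"max round-trip error: {max_error:.4f}")
```

In this scheme the round-trip error of each weight is bounded by half a quantization step, which is the source of the small precision loss that comes with storing 4 bits per weight.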

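## Supported Menu

| Drink | Temperature options | Add-on options |
|-------|---------------------|----------------|
| Americano (美式) | Iced / Hot | Extra espresso shot |
| Latte (拿鐵) | Iced / Hot | Extra espresso shot |
| Oat milk latte (燕麥奶拿鐵) | Iced / Hot | Extra espresso shot |
| Fresh milk (鮮奶) | Iced / Hot | Extra espresso shot |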
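
Since the menu is small and fixed, an application could check a parsed order against it before confirming. The sketch below encodes the table above as a dictionary keyed by the Chinese drink names; `MENU` and `validate_order` are hypothetical names, and the card does not describe how orders are parsed from the model's replies.

```python
# Menu from the table above: 美式 (Americano), 拿鐵 (latte),
# 燕麥奶拿鐵 (oat milk latte), 鮮奶 (fresh milk).
DRINKS = ["美式", "拿鐵", "燕麥奶拿鐵", "鮮奶"]
MENU = {
    drink: {
        "temperatures": {"冰", "熱"},   # iced / hot
        "add_ons": {"加一份濃縮"},       # extra espresso shot
    }
    for drink in DRINKS
}


def validate_order(drink, temperature, add_ons=()):
    """Return a list of problems with an order; an empty list means it is valid."""
    item = MENU.get(drink)
    if item is None:
        return [f"drink not on menu: {drink}"]
    problems = []
    if temperature not in item["temperatures"]:
        problems.append(f"unsupported temperature: {temperature}")
    for add_on in add_ons:
        if add_on not in item["add_ons"]:
            problems.append(f"unsupported add-on: {add_on}")
    return problems


print(validate_order("拿鐵", "冰", ["加一份濃縮"]))  # iced latte with an extra shot
print(validate_order("卡布奇諾", "熱"))              # cappuccino is not on the menu
```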

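## Limitations and Notes

- This model was trained only for coffee-ordering dialogues and is not suited to general conversation.
- The menu is fixed; the model cannot handle drinks that are not on it.
- INT4 quantization may cause a slight drop in quality, but the impact on the ordering scenario is small.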
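
The card does not report measurements for the quality drop. One way to check it for a given deployment is to run the same prompts through the original and the quantized model with greedy decoding (`do_sample=False`, so outputs are repeatable) and compare the replies. The sketch below shows only the comparison step; `agreement_rate` is a hypothetical helper and the reply strings are placeholders, not real model outputs.

```python
def agreement_rate(reference_replies, quantized_replies):
    """Fraction of prompts where both models gave the same reply (ignoring outer whitespace)."""
    if len(reference_replies) != len(quantized_replies):
        raise ValueError("both lists must cover the same prompts")
    if not reference_replies:
        return 0.0
    matches = sum(
        ref.strip() == quant.strip()
        for ref, quant in zip(reference_replies, quantized_replies)
    )
    return matches / len(reference_replies)


# Placeholder strings standing in for decoded replies to four prompts.
reference = ["reply A", "reply B", "reply C", "reply D"]
quantized = ["reply A", "reply B ", "reply X", "reply D"]

print(agreement_rate(reference, quantized))
```

Exact string match is a strict criterion; replies that differ in wording but confirm the same order would count as mismatches, so a task-level check (same drink, temperature, and add-ons) may be more informative.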

## License

This model is released under the Apache 2.0 license.

## Citation

```bibtex
@misc{qwen3-coffee-sft-quanto-int4,
  author = {Ren-He Huang},
  title = {Qwen3-1.7B Coffee Order Assistant (INT4 Quantized)},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/renhehuang/qwen3-1.7b-coffee-sft-quanto-int4}
}
```