@@ -1,9 +1,84 @@
 ---
 tags:
-- model_hub_mixin
 ---
-This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
-- Code: [More Information Needed]
-- Paper: [More Information Needed]
-- Docs: [More Information Needed]

 ---
+license: apache-2.0
+language:
+  - zh
+base_model: renhehuang/qwen3-1.7b-coffee-sft
 tags:
+  - conversational
+  - sft
+  - coffee
+  - traditional-chinese
+  - qwen3
+  - task-oriented-dialogue
+  - quantized
+  - int8
+  - quanto
+datasets:
+  - renhehuang/coffee-order-zhtw
+pipeline_tag: text-generation
 ---
+# Qwen3-1.7B Coffee Order Assistant: INT8 Quantized Version
+This is the **INT8-quantized version** of [renhehuang/qwen3-1.7b-coffee-sft](https://huggingface.co/renhehuang/qwen3-1.7b-coffee-sft), quantized with [optimum-quanto](https://github.com/huggingface/optimum-quanto).
+| | Original model | INT8 quantized | INT4 quantized |
+|---|---|---|---|
+| Weight precision | FP32 | **INT8** | INT4 |
+| Size | ~6.45 GB | **~1.91 GB** | ~1.29 GB |
+| Quality | Baseline | **Nearly lossless** | Slightly lower |
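To make the quality column more concrete, here is a minimal pure-Python sketch of symmetric ("absmax") integer quantization of a toy weight vector, comparing the round-trip error at 8 and 4 bits. It is a simplified illustration, not optimum-quanto's implementation: quanto's actual scheme may differ (for example in how scales are grouped), and the weight values are made up.

```python
def quantize_symmetric(values, bits):
    """Map floats to signed integers in [-qmax, qmax] using one absmax scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    absmax = max(abs(v) for v in values)
    scale = absmax / qmax if absmax > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the integers and the scale."""
    return [x * scale for x in q]


def max_abs_error(values, bits):
    """Largest absolute difference between original and round-tripped values."""
    q, scale = quantize_symmetric(values, bits)
    return max(abs(v - r) for v, r in zip(values, dequantize(q, scale)))


# Made-up weights for illustration only.
weights = [0.12, -0.53, 0.07, 0.91, -0.33, 0.0, 0.48, -0.76]

for bits in (8, 4):
    print(f"INT{bits}: max round-trip error = {max_abs_error(weights, bits):.4f}")
```

With a single scale per tensor, the worst-case rounding error is half a step: `absmax / 254` at 8 bits versus `absmax / 14` at 4 bits. That gap is one intuition for why the INT8 build loses less quality than the INT4 one.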
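+> `embed_tokens` and `lm_head` are kept in FP16 (not quantized) to avoid the drop in output quality that over-aggressive quantization of these layers can cause.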
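Keeping these layers in FP16 also explains why the sizes in the table do not scale with bit width alone. The sketch below is a back-of-the-envelope estimate; the parameter counts are hypothetical round numbers, not this model's actual counts, and the helper ignores quantization scales and file metadata.

```python
def estimate_size_gb(n_quantized, n_kept, quant_bits, kept_bits=16):
    """Approximate checkpoint size in GB (10**9 bytes).

    n_quantized parameters are stored at quant_bits; n_kept parameters
    (e.g. embeddings and the output head) stay at kept_bits. Quantization
    scales and file metadata are ignored.
    """
    total_bytes = n_quantized * quant_bits / 8 + n_kept * kept_bits / 8
    return total_bytes / 1e9


# Hypothetical round numbers, NOT this model's real parameter counts.
N_QUANTIZED = 1_400_000_000  # weights that get quantized
N_KEPT = 300_000_000         # parameters left in FP16

for bits in (8, 4):
    print(f"INT{bits}: ~{estimate_size_gb(N_QUANTIZED, N_KEPT, bits):.2f} GB")
```

Because the FP16 part is a fixed cost, halving the bits of the quantized weights (INT8 to INT4) shrinks the total by less than half, which matches the direction of the ~1.91 GB vs ~1.29 GB figures in the table.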
+At ~1.91 GB, this version is suited to deployment on edge devices such as the **Jetson Nano** and Raspberry Pi.
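When budgeting memory on such devices, the KV cache comes on top of the weights and grows linearly with context length. This sketch estimates it. The attention shape (28 layers, 8 key/value heads, head dimension 128) is our assumption based on the published Qwen3-1.7B configuration and should be checked against the model's `config.json`; `bytes_per_elem=2` assumes FP16 activations (use 4 for FP32).

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Bytes needed to cache keys and values for seq_len tokens."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


# ASSUMED Qwen3-1.7B attention shape; verify against the model's config.json.
NUM_LAYERS = 28
NUM_KV_HEADS = 8
HEAD_DIM = 128

for seq_len in (512, 2048, 8192):
    gb = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, seq_len) / 1e9
    print(f"{seq_len:>5} tokens: ~{gb:.2f} GB of FP16 KV cache")
```

Under these assumptions the cache costs 114,688 bytes (112 KiB) per token, so short ordering dialogues add little to the weights, while long contexts can matter on a small board.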
+## Usage
+```python
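+from optimum.quanto import QuantizedModelForCausalLM
+from transformers import AutoTokenizer
+import torch
+
+model_name = "renhehuang/qwen3-1.7b-coffee-sft-quanto-int8"
+
+# Load the tokenizer and the INT8-quantized model from the Hugging Face Hub
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+model = QuantizedModelForCausalLM.from_pretrained(model_name)
+
+messages = [
+    # System prompt (Traditional Chinese): "You are a professional coffee-ordering
+    # assistant who helps users complete their orders. The menu includes:
+    # Americano, latte, oat milk latte, and fresh milk."
+    {"role": "system", "content": "你是一位專業的咖啡點餐助理，負責協助使用者完成點餐。菜單包含：美式、拿鐵、燕麥奶拿鐵、鮮奶。"},
+    # User: "I'd like an iced latte."
+    {"role": "user", "content": "我想要一杯冰拿鐵"},
+]
+
+# Render the conversation with the model's chat template and open the assistant turn
+input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
+
+# Generate a reply with temperature and nucleus (top-p) sampling
+with torch.no_grad():
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=128,
+        do_sample=True,
+        temperature=0.7,
+        top_p=0.9,
+    )
+
+# Decode only the newly generated tokens, skipping the prompt
+response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
+print(response)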
+```
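`apply_chat_template(..., tokenize=False, add_generation_prompt=True)` returns a single prompt string. The sketch below approximates the ChatML-style layout that Qwen chat templates produce for a plain system/user exchange. The real Qwen3 template has extra branches (tool calls, thinking content), so use the tokenizer's own template in practice; `render_chatml` is a hypothetical helper for illustration only.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate ChatML-style prompt text for a list of role/content dicts."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a coffee-ordering assistant."},
    {"role": "user", "content": "I'd like an iced latte."},
]
print(render_chatml(messages))
```

The `outputs[0][inputs["input_ids"].shape[-1]:]` slice in the usage example then drops the prompt tokens, so only the assistant's reply is decoded.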
+## Quantization details
+| Property | Value |
+|------|-----|
+| Quantization tool | [optimum-quanto](https://github.com/huggingface/optimum-quanto) |
+| Quantization precision | INT8 (`qint8`) |
+| Quantization scope | Weights only (excluding `embed_tokens` and `lm_head`) |
+| Source model | [renhehuang/qwen3-1.7b-coffee-sft](https://huggingface.co/renhehuang/qwen3-1.7b-coffee-sft) |
+| Base model | [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) |
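The "weights only, excluding `embed_tokens` and `lm_head`" scope amounts to choosing by name which modules get quantized. The sketch below shows that selection with shell-style patterns from the standard `fnmatch` module; the module names are a hypothetical subset of a Qwen-style decoder, and `modules_to_quantize` is an illustrative helper, not part of optimum-quanto.

```python
from fnmatch import fnmatch

# Hypothetical subset of weight-bearing module names in a Qwen-style decoder.
MODULE_NAMES = [
    "model.embed_tokens",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.mlp.down_proj",
    "lm_head",
]

# Exclusion patterns matching the scope described in the table above.
EXCLUDE = ["*embed_tokens", "lm_head"]


def modules_to_quantize(names, exclude):
    """Return the module names that match none of the exclusion patterns."""
    return [n for n in names if not any(fnmatch(n, pattern) for pattern in exclude)]


for name in modules_to_quantize(MODULE_NAMES, EXCLUDE):
    print("quantize to int8:", name)
```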
+## Other versions
+- Original model: [renhehuang/qwen3-1.7b-coffee-sft](https://huggingface.co/renhehuang/qwen3-1.7b-coffee-sft)
+- INT4 quantized: [renhehuang/qwen3-1.7b-coffee-sft-quanto-int4](https://huggingface.co/renhehuang/qwen3-1.7b-coffee-sft-quanto-int4)
+## License
+This model is released under the Apache 2.0 license.