Commit f7e2376 by renhehuang · verified · 1 Parent(s): a19dada

Update model card with quantization details

Files changed (1)
  1. README.md +100 -5
README.md CHANGED
@@ -1,9 +1,104 @@
  ---
  tags:
- - model_hub_mixin
  ---
 
- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- - Code: [More Information Needed]
- - Paper: [More Information Needed]
- - Docs: [More Information Needed]
  ---
+ license: apache-2.0
+ language:
+ - zh
+ base_model: renhehuang/qwen3-1.7b-coffee-sft
  tags:
+ - conversational
+ - sft
+ - coffee
+ - traditional-chinese
+ - qwen3
+ - task-oriented-dialogue
+ - quantized
+ - int4
+ - quanto
+ datasets:
+ - renhehuang/coffee-order-zhtw
+ pipeline_tag: text-generation
  ---
 
+ # Qwen3-1.7B Coffee Order Assistant (INT4 Quantized)
+
+ This is the **INT4-quantized version** of [renhehuang/qwen3-1.7b-coffee-sft](https://huggingface.co/renhehuang/qwen3-1.7b-coffee-sft), quantized with [optimum-quanto](https://github.com/huggingface/optimum-quanto).
+
+ | | Original model | This quantized model |
+ |---|---|---|
+ | Precision | FP32 | INT4 |
+ | Size | ~6.45 GB | **~1.45 GB** |
+ | Compression ratio | n/a | ~4.5x |
+
+ Suitable for deployment on low-memory edge devices such as **Jetson Nano** and Raspberry Pi.
+
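As a quick sanity check on the size table, the compression ratio can be recomputed from the two reported sizes. The sketch below uses only the approximate figures from the card; it gives about 4.45x, which the card rounds to 4.5x. The comparison against the ideal 8x for FP32 to INT4 is added here for context. One plausible reason for the gap (an assumption, not stated in the card) is that weight-only quantization leaves some tensors and the quantization scales at higher precision.

```python
# Approximate sizes as reported in the model card, in GB.
original_gb = 6.45
quantized_gb = 1.45

# Observed compression ratio and fraction of storage saved.
ratio = original_gb / quantized_gb
saved_fraction = 1 - quantized_gb / original_gb

# Ideal ratio if every 32-bit value were stored in exactly 4 bits.
ideal_ratio = 32 / 4

print(f"compression ratio: {ratio:.2f}x")
print(f"size reduction: {saved_fraction:.0%}")
print(f"ideal FP32 -> INT4 ratio: {ideal_ratio:.0f}x")
```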
+ ## Usage
+
+ ```python
+ from optimum.quanto import QuantizedModelForCausalLM
+ from transformers import AutoTokenizer
+ import torch
+
+ model_name = "renhehuang/qwen3-1.7b-coffee-sft-quanto-int4"
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+ model = QuantizedModelForCausalLM.from_pretrained(model_name)
+
+ # The prompts stay in Traditional Chinese, the language of the fine-tuning data.
+ messages = [
+     # System: "You are a professional coffee-ordering assistant who helps users
+     # complete their order. The menu includes: Americano, latte, oat milk latte, milk."
+     {"role": "system", "content": "你是一位專業的咖啡點餐助理,負責協助使用者完成點餐。菜單包含:美式、拿鐵、燕麥奶拿鐵、鮮奶。"},
+     # User: "I'd like an iced latte."
+     {"role": "user", "content": "我想要一杯冰拿鐵"}
+ ]
+
+ # Render the chat template to a prompt string, then tokenize it.
+ input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
+
+ with torch.no_grad():
+     outputs = model.generate(
+         **inputs,
+         max_new_tokens=128,
+         do_sample=True,
+         temperature=0.7,
+         top_p=0.9,
+     )
+
+ # Decode only the newly generated tokens, skipping the prompt.
+ response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
+ print(response)
+ ```
+
+ ## Quantization Details
+
+ | Item | Value |
+ |------|-----|
+ | Quantization tool | [optimum-quanto](https://github.com/huggingface/optimum-quanto) |
+ | Quantization precision | INT4 (qint4) |
+ | Quantization scope | Weights only |
+ | Original model | [renhehuang/qwen3-1.7b-coffee-sft](https://huggingface.co/renhehuang/qwen3-1.7b-coffee-sft) |
+ | Base model | [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) |
+
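To illustrate what weight-only INT4 quantization means in general, here is a minimal pure-Python sketch of affine 4-bit quantization for one small group of weights. It shows one common scheme and is not necessarily what optimum-quanto does internally. The function names, the example weights, and the group size are all hypothetical and chosen only for illustration.

```python
def quantize_group_int4(weights):
    """Quantize one group of floats to unsigned 4-bit codes (0..15).

    Returns (codes, scale, zero_point) such that
    weight ~= (code - zero_point) * scale.
    """
    lo, hi = min(weights), max(weights)
    # 4 bits give 16 levels, i.e. 15 steps between lo and hi.
    # Fall back to 1.0 when all weights are equal, so scale is never zero.
    scale = (hi - lo) / 15 or 1.0
    zero_point = round(-lo / scale)
    # Round each weight to the nearest step and clamp into the 4-bit range.
    codes = [min(15, max(0, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize_group_int4(codes, scale, zero_point):
    """Map 4-bit codes back to approximate float weights."""
    return [(c - zero_point) * scale for c in codes]


# Hypothetical example group of weights.
example = [-0.8, -0.1, 0.0, 0.35, 0.7]
codes, scale, zero_point = quantize_group_int4(example)
restored = dequantize_group_int4(codes, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(example, restored))
print(codes, scale, zero_point, max_error)
```

Each restored weight differs from the original by at most about half a quantization step, which is the rounding error weight-only schemes accept in exchange for storing 4 bits per weight plus a scale and zero point per group.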
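+ ## Supported Menu
+
+ | Drink | Temperature options (冰/熱) | Add-on (加一份濃縮) |
+ |------|----------|----------|
+ | Americano (美式) | Iced / Hot | Extra espresso shot |
+ | Latte (拿鐵) | Iced / Hot | Extra espresso shot |
+ | Oat milk latte (燕麥奶拿鐵) | Iced / Hot | Extra espresso shot |
+ | Milk (鮮奶) | Iced / Hot | Extra espresso shot |
+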
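Because the menu is fixed, an application wrapping the model could check a parsed order against the menu table before accepting it. The sketch below is one possible way to do that. The names `MENU`, `TEMPERATURES`, `ADD_ONS`, and `validate_order` are hypothetical and not part of the model or its repository; only the menu entries themselves come from the card.

```python
# Menu entries as listed in the card; keys are the Chinese names used in the prompts.
MENU = {
    "美式": "Americano",
    "拿鐵": "Latte",
    "燕麥奶拿鐵": "Oat milk latte",
    "鮮奶": "Milk",
}
TEMPERATURES = {"冰", "熱"}  # iced / hot
ADD_ONS = {"加一份濃縮"}      # extra espresso shot


def validate_order(drink, temperature, add_ons=()):
    """Return a list of problems; an empty list means the order is valid."""
    problems = []
    if drink not in MENU:
        problems.append(f"unknown drink: {drink}")
    if temperature not in TEMPERATURES:
        problems.append(f"unknown temperature: {temperature}")
    for extra in add_ons:
        if extra not in ADD_ONS:
            problems.append(f"unknown add-on: {extra}")
    return problems
```

For example, `validate_order("拿鐵", "冰")` returns an empty list, while an off-menu drink such as `"卡布奇諾"` (cappuccino) produces an `unknown drink` entry that the application can use to ask the user to choose again.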
85
+
86
+ - 此模型僅針對咖啡點餐場景訓練,不適用於一般對話
87
+ - 菜單項目固定,無法處理菜單外的飲品
88
+ - INT4 量化可能造成些微品質下降,但在點餐場景中影響不大
89
+
90
+ ## 授權
91
+
92
+ 本模型基於 Apache 2.0 授權發布。
93
+
94
+ ## 引用
95
+
+ ```bibtex
+ @misc{qwen3-coffee-sft-quanto-int4,
+   author = {Ren-He Huang},
+   title = {Qwen3-1.7B Coffee Order Assistant (INT4 Quantized)},
+   year = {2025},
+   publisher = {HuggingFace},
+   url = {https://huggingface.co/renhehuang/qwen3-1.7b-coffee-sft-quanto-int4}
+ }
+ ```