renhehuang committed on
Commit
85d8de3
·
verified ·
1 Parent(s): be54436

Update model card

  1. README.md +80 -5
README.md CHANGED
@@ -1,9 +1,84 @@
  ---
  tags:
- - model_hub_mixin
  ---

- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- - Code: [More Information Needed]
- - Paper: [More Information Needed]
- - Docs: [More Information Needed]
  ---
+ license: apache-2.0
+ language:
+ - zh
+ base_model: renhehuang/qwen3-1.7b-coffee-sft
  tags:
+ - conversational
+ - sft
+ - coffee
+ - traditional-chinese
+ - qwen3
+ - task-oriented-dialogue
+ - quantized
+ - int8
+ - quanto
+ datasets:
+ - renhehuang/coffee-order-zhtw
+ pipeline_tag: text-generation
  ---

+ # Qwen3-1.7B Coffee Order Assistant (INT8 Quantized)
+
+ This is the **INT8-quantized version** of [renhehuang/qwen3-1.7b-coffee-sft](https://huggingface.co/renhehuang/qwen3-1.7b-coffee-sft), quantized with [optimum-quanto](https://github.com/huggingface/optimum-quanto).
+
+ | | Original model | INT8 quantized | INT4 quantized |
+ |---|---|---|---|
+ | Precision | FP32 | **INT8** | INT4 |
+ | Size | ~6.45 GB | **~1.91 GB** | ~1.29 GB |
+ | Quality | Baseline | **Nearly lossless** | Slightly reduced |
+
+ > `embed_tokens` and `lm_head` are kept in FP16 so that over-aggressive quantization does not degrade output quality.
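As a rough sanity check on the size row in the table above, the following sketch estimates checkpoint size from parameter counts and bit widths. The parameter counts are rounded figures assumed for illustration (about 1.4B quantized weights and about 0.3B embedding weights kept in FP16); they are not taken from this model card. The estimate also ignores quantization scales and file metadata, so it should only land in the same ballpark as the reported sizes.

```python
def estimate_size_gb(quantized_params, kept_params, quant_bits, kept_bits=16):
    """Estimate checkpoint size in GB (10**9 bytes) for weight-only quantization.

    quantized_params: number of weights stored at quant_bits each
    kept_params:      number of weights left unquantized at kept_bits each
    """
    total_bits = quantized_params * quant_bits + kept_params * kept_bits
    return total_bits / 8 / 1e9


# Assumed, rounded parameter counts (hypothetical inputs, not from the card).
NON_EMBEDDING_PARAMS = 1.4e9  # transformer-block weights, quantized
EMBEDDING_PARAMS = 0.3e9      # embed_tokens / lm_head, kept in FP16

# Original model: every weight stored as 32-bit floats.
fp32 = estimate_size_gb(NON_EMBEDDING_PARAMS + EMBEDDING_PARAMS, 0, 32)
# Quantized variants: block weights at 8 or 4 bits, embeddings at 16 bits.
int8 = estimate_size_gb(NON_EMBEDDING_PARAMS, EMBEDDING_PARAMS, 8)
int4 = estimate_size_gb(NON_EMBEDDING_PARAMS, EMBEDDING_PARAMS, 4)

for name, size in [("FP32", fp32), ("INT8", int8), ("INT4", int4)]:
    print(f"{name}: ~{size:.2f} GB")
```

Because the embeddings stay in FP16 in both quantized variants, going from INT8 to INT4 shrinks the file by much less than half under these assumptions.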
+
+ Suitable for deployment on edge devices such as **Jetson Nano** and Raspberry Pi.
+
+ ## Usage
+
+ ```python
+ from optimum.quanto import QuantizedModelForCausalLM
+ from transformers import AutoTokenizer
+ import torch
+
+ model_name = "renhehuang/qwen3-1.7b-coffee-sft-quanto-int8"
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+ model = QuantizedModelForCausalLM.from_pretrained(model_name)
+
+ # The prompts stay in Traditional Chinese, matching the model's `traditional-chinese` tag.
+ # System: "You are a professional coffee-ordering assistant who helps users complete
+ # their order. The menu includes: Americano, latte, oat-milk latte, fresh milk."
+ # User: "I'd like an iced latte."
+ messages = [
+     {"role": "system", "content": "你是一位專業的咖啡點餐助理,負責協助使用者完成點餐。菜單包含:美式、拿鐵、燕麥奶拿鐵、鮮奶。"},
+     {"role": "user", "content": "我想要一杯冰拿鐵"},
+ ]
+
+ # Render the chat template to a prompt string, then tokenize it.
+ input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
+
+ # Sample a reply without tracking gradients.
+ with torch.no_grad():
+     outputs = model.generate(
+         **inputs,
+         max_new_tokens=128,
+         do_sample=True,
+         temperature=0.7,
+         top_p=0.9,
+     )
+
+ # Decode only the newly generated tokens, skipping the prompt.
+ response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
+ print(response)
+ ```
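The usage example hard-codes the menu inside the system prompt. A minimal sketch of a helper that rebuilds the same message list for an arbitrary menu might look like this. `build_messages` is a hypothetical name introduced here, not part of the model's repository, and whether the model handles menus other than the one in the example is not stated in the card.

```python
def build_messages(menu_items, user_text):
    """Return a chat message list with the menu embedded in the system prompt.

    The prompt wording is copied from the usage example; only the menu
    items and the user's request vary.
    """
    menu = "、".join(menu_items)
    system_prompt = (
        "你是一位專業的咖啡點餐助理,負責協助使用者完成點餐。"
        f"菜單包含:{menu}。"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]


# Reproduces the messages from the usage example.
messages = build_messages(["美式", "拿鐵", "燕麥奶拿鐵", "鮮奶"], "我想要一杯冰拿鐵")
print(messages[0]["content"])
```

The returned list can be passed to `tokenizer.apply_chat_template` in place of the hand-written `messages`.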
+
+ ## Quantization details
+
+ | Item | Value |
+ |------|-----|
+ | Quantization tool | [optimum-quanto](https://github.com/huggingface/optimum-quanto) |
+ | Quantization precision | INT8 (qint8) |
+ | Quantization scope | Weights only (`embed_tokens` and `lm_head` excluded) |
+ | Original model | [renhehuang/qwen3-1.7b-coffee-sft](https://huggingface.co/renhehuang/qwen3-1.7b-coffee-sft) |
+ | Base model | [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) |
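To illustrate what weight-only INT8 quantization means in general, here is a generic sketch of symmetric per-tensor quantization in plain Python. It is not optimum-quanto's implementation, and the helper names and sample weights are made up for the example; it only shows the idea of storing integers plus a floating-point scale.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in quantized]


# Hypothetical weight values, chosen only for illustration.
weights = [0.12, -0.5, 0.033, 0.9, -0.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding to the nearest integer bounds the error by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight now costs one byte instead of two or four, at the price of a rounding error no larger than half the scale.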
+
+ ## Other versions
+
+ - Original model: [renhehuang/qwen3-1.7b-coffee-sft](https://huggingface.co/renhehuang/qwen3-1.7b-coffee-sft)
+ - INT4 quantized: [renhehuang/qwen3-1.7b-coffee-sft-quanto-int4](https://huggingface.co/renhehuang/qwen3-1.7b-coffee-sft-quanto-int4)
+
+ ## License
+
+ This model is released under the Apache 2.0 license.