Instructions to use AXERA-TECH/HY-MT1.5-1.8B_GPTQ_INT4-AX620E with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use AXERA-TECH/HY-MT1.5-1.8B_GPTQ_INT4-AX620E with Transformers:
# Use a pipeline as a high-level helper # Warning: Pipeline type "translation" is no longer supported in transformers v5. # You must load the model directly (see below) or downgrade to v4.x with: # 'pip install "transformers<5.0.0' from transformers import pipeline pipe = pipeline("translation", model="AXERA-TECH/HY-MT1.5-1.8B_GPTQ_INT4-AX620E")# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("AXERA-TECH/HY-MT1.5-1.8B_GPTQ_INT4-AX620E", dtype="auto") - Notebooks
- Google Colab
- Kaggle
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("AXERA-TECH/HY-MT1.5-1.8B_GPTQ_INT4-AX620E", dtype="auto")HY-MT1.5-1.8B_GPTQ_INT4-AX620E
This version of HY-MT1.5-1.8B_GPTQ_INT4 has been converted to run on the Axera NPU using w4a16 quantization.
This model has been optimized with the following LoRA:
Compatible with Pulsar2 version: > 5.1-patch1-dirty.
Please note that the context of the model is 2k and the maximum prefill length is 1k.
Convert tools links:
For those who are interested in model conversion, you can try to export axmodel through the original repo:
https://huggingface.co/tencent/HY-MT1.5-1.8B
How to Convert LLM from Huggingface to axmodel
Support Platform
- AX620E
- AX620E DEMO Board
| Chips | ttft | w4a16 |
|---|---|---|
| AX620E | 11538.6 ms (512 prefill) | 4.05 tokens/sec |
How to use
Download all files from this repository to the device
$ tree -L 1
.
├── assets
├── config.json
├── gradio_demo.py
├── hymt1-5_1k_ax620e_axmodel
├── hymt1-5_tokenizer
├── infer_axmodel.py
├── infer_torch.py
├── README.md
└── utils
5 directories, 5 files
Install transformer
pip install transformers==4.57.1
Inference with AX620E Demo Board
Start the OpenAI-compatible API with axllm serve:
axllm serve . --port 8000
本仓库也附带一个 aarch64 axllm 二进制,可直接在本仓库目录下尝试运行:
chmod +x ./bin/axllm
./bin/axllm serve . --port 8000
该二进制与 AX650 仓库中的打包产物同源,来源和校验信息记录在 bin/axllm.version.json 中。当前已完成 AX650 上的 HY-MT OpenAI API 验证,AX620E 板端请结合实机环境继续确认。
Interactive translation using the C++ Gradio Demo:
python3 gradio_cpp_backend.py --api_base http://127.0.0.1:8000 --model AXERA-TECH/HY-MT1.5-1.8B_GPTQ_INT4-AX620E
English Translate to Chinese:
Chinese Translate to Japanese:
If you want to run translation tasks from the command-line terminal, you can run the following command:
$ ./run_hymt1-5_1.8b_ax620e.sh
[I][ Init][ 267]: LLM init ok
[I][ Init][ 269]: Left CMM:3711 MB
Type "q" to exit, Ctrl+c to stop current running
prompt(输入q退出) >> 今天是个好日子,适合读书和运动.
[I][ Run][ 349]: input token num : 23, prefill_split_num : 1
[I][ Run][ 388]: input_num_token:23
[I][ Run][ 581]: ttft: 157.15 ms
Today is a great day. It’s the perfect time to read and exercise.
[N][ Run][ 719]: hit eos,avg 13.61 token/s
[I][ Run][ 724]: decode profile: infer 58.079 ms/token, cache_copy 0.110, post 14.071, callback 0.018, tokens 17
Interactive conversations using the Python Gradio Demo:
$ python3 gradio_demo.py --axmodel_path hymt1-5_1k_ax620e_axmodel --max_seq_len 1023
English Translate to Chinese:
Chinese Translate to Japanese:
Run the following command on the Axera board to start a chat conversation:
$ python3 infer_axmodel.py -q "It’s on the house."
# output
Init InferenceSession: 100%|██████████████████████████████████████████████████████████| 32/32 [00:02<00:00, 14.55it/s]
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1-dirty 43f8606b-dirty
Model loaded successfully!
slice_indices: [0]
Slice prefill done: 0
answer >> 这是免费的。
If you are testing on an AX620E demo board, run the command below:
python3 gradio_demo.py --axmodel_path hymt1-5_1k_ax620e_axmodel --max_seq_len 1023
- Downloads last month
- 5
Model tree for AXERA-TECH/HY-MT1.5-1.8B_GPTQ_INT4-AX620E
Base model
tencent/HY-MT1.5-1.8B



# Use a pipeline as a high-level helper # Warning: Pipeline type "translation" is no longer supported in transformers v5. # You must load the model directly (see below) or downgrade to v4.x with: # 'pip install "transformers<5.0.0' from transformers import pipeline pipe = pipeline("translation", model="AXERA-TECH/HY-MT1.5-1.8B_GPTQ_INT4-AX620E")