---
library_name: transformers
license: bsd-3-clause
base_model:
- tencent/HY-MT1.5-1.8B
tags:
- HY-MT1.5
- HY-MT1.5-1.8B
- HY-MT1.5-1.8B_GPTQ_INT4
- Int4
- translation
language:
- zh
- en
- fr
- pt
- es
- ja
- tr
- ru
- ar
- ko
- th
- it
- de
- vi
- ms
- id
- tl
- hi
- pl
- cs
- nl
- km
- my
- fa
- gu
- ur
- te
- mr
- he
- bn
- ta
- uk
- bo
- kk
- mn
- ug
---

# HY-MT1.5-1.8B_GPTQ_INT4-AX620E

This version of HY-MT1.5-1.8B_GPTQ_INT4 has been converted to run on the Axera NPU using **w4a16** quantization.

This model has been optimized with the following LoRA: 

Compatible with Pulsar2 version: > 5.1-patch1-dirty.

Please note that the context of the model is 2k and the maximum prefill length is 1k.

## Convert tools links:

For those who are interested in model conversion, you can try to export axmodel through the original repo:

https://huggingface.co/tencent/HY-MT1.5-1.8B

[How to Convert LLM from Huggingface to axmodel](https://github.com/AXERA-TECH/HY-MT1.5-1.8B_GPTQ_INT4.axera/tree/main/model_convert)

[AXera NPU HOST LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/ax-internvl) 

[AXera NPU AXCL LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/axcl-internvl)

## Support Platform

- AX620E
  - AX620E DEMO Board
 
|Chips|ttft|w4a16|
|--|--|--|
|AX620E| 11538.6 ms (512 prefill) | 4.05 tokens/sec|


## How to use

Download all files from this repository to the device

```sh
$ tree -L 1
.
├── assets
├── config.json
├── gradio_demo.py
├── hymt1-5_1k_ax620e_axmodel
├── hymt1-5_tokenizer
├── infer_axmodel.py
├── infer_torch.py
├── README.md
└── utils

5 directories, 5 files
```

### Install transformer

```
pip install transformers==4.57.1
```

### Inference with AX620E Demo Board

Start the OpenAI-compatible API with `axllm serve`:

```sh
axllm serve . --port 8000
```

本仓库也附带一个 aarch64 `axllm` 二进制，可直接在本仓库目录下尝试运行：

```sh
chmod +x ./bin/axllm
./bin/axllm serve . --port 8000
```

该二进制与 AX650 仓库中的打包产物同源，来源和校验信息记录在 `bin/axllm.version.json` 中。当前已完成 AX650 上的 HY-MT OpenAI API 验证，AX620E 板端请结合实机环境继续确认。

Interactive translation using the `C++ Gradio Demo`:

```sh
python3 gradio_cpp_backend.py --api_base http://127.0.0.1:8000 --model AXERA-TECH/HY-MT1.5-1.8B_GPTQ_INT4-AX620E
```

English Translate to Chinese:

![demo_1](assets/gradio_cpp_demo_0.png)

Chinese Translate to Japanese:

![demo_2](assets/gradio_cpp_demo_1.png)

If you want to run translation tasks from the command-line terminal, you can run the following command:

```sh
$ ./run_hymt1-5_1.8b_ax620e.sh
[I][                            Init][ 267]: LLM init ok
[I][                            Init][ 269]: Left CMM:3711 MB
Type "q" to exit, Ctrl+c to stop current running
prompt(输入q退出) >> 今天是个好日子,适合读书和运动.
[I][                             Run][ 349]: input token num : 23, prefill_split_num : 1
[I][                             Run][ 388]: input_num_token:23
[I][                             Run][ 581]: ttft: 157.15 ms
Today is a great day. It’s the perfect time to read and exercise.

[N][                             Run][ 719]: hit eos,avg 13.61 token/s

[I][                             Run][ 724]: decode profile: infer 58.079 ms/token, cache_copy 0.110, post 14.071, callback 0.018, tokens 17
```

---

Interactive conversations using the `Python Gradio Demo`:

```bash
$ python3 gradio_demo.py --axmodel_path hymt1-5_1k_ax620e_axmodel --max_seq_len 1023
```

English Translate to Chinese:

![demo_1](assets/gradio_demo_0.png)

Chinese Translate to Japanese:

![demo_2](assets/gradio_demo_1.png)

---

Run the following command on the Axera board to start a chat conversation:

```sh
$ python3 infer_axmodel.py -q "It’s on the house."

# output
Init InferenceSession: 100%|██████████████████████████████████████████████████████████| 32/32 [00:02<00:00, 14.55it/s]
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1-dirty 43f8606b-dirty
Model loaded successfully!
slice_indices: [0]
Slice prefill done: 0
answer >> 这是免费的。
```

If you are testing on an `AX620E` demo board, run the command below:

```sh
python3 gradio_demo.py --axmodel_path hymt1-5_1k_ax620e_axmodel --max_seq_len 1023
```