Image-Text-to-Text
Transformers
English
Chinese
Qwen3-VL
Qwen3-VL-2B-Instruct
Qwen3-VL-4B-Instruct
Int4
VLM
GPTQ
```python
import base64
import glob

import cv2
from openai import OpenAI

# Base URL of the local OpenAI-compatible server.
BASE_URL = "http://localhost:8000/v1"


def img_to_data_url(img_path: str) -> str:
    """Read an image, re-encode it as JPEG, and return it as a base64 data URL."""
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(f"Cannot read image: {img_path}")
    ok, buf = cv2.imencode(".jpg", img)
    if not ok:
        raise RuntimeError("cv2.imencode failed")
    b64 = base64.b64encode(buf).decode("ascii")
    return f"data:image/jpeg;base64,{b64}"


def test(openai_messages):
    """Stream a chat completion, print it as it arrives, and return the full text."""
    client = OpenAI(api_key="not-needed", base_url=BASE_URL)
    stream = client.chat.completions.create(
        model="AXERA-TECH/Qwen3-VL-2B-Instruct-GPTQ-Int4",
        messages=openai_messages,
        stream=True,
    )
    out_chunks = []
    for ev in stream:
        if not ev.choices:
            continue  # skip chunks that carry no choices
        delta = ev.choices[0].delta
        if delta and delta.content:
            out_chunks.append(delta.content)
            print(delta.content, end="", flush=True)
    print()
    return "".join(out_chunks).strip()


def test_image():
    """Ask the model to describe a single frame."""
    image_data = img_to_data_url("../demo_cv308/frame_0075.jpg")
    # The chat completions API takes a list of messages, so wrap the message in a list.
    openai_messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image"},
                {"type": "image_url", "image_url": image_data},
            ],
        }
    ]
    return test(openai_messages)


def test_video():
    """Ask the model to describe a video given as a sorted list of frames."""
    image_list = sorted(glob.glob("../demo_cv308/*.jpg"))
    image_data_list = [img_to_data_url(img) for img in image_list]
    openai_messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this video"},
                # "is_video" with a list of frames follows the original script's payload.
                {"type": "image_url", "is_video": True, "image_url": image_data_list},
            ],
        }
    ]
    return test(openai_messages)


if __name__ == "__main__":
    test_video()
```
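The video test above sends every frame that matches the glob. The source does not say how many frames the server accepts per request, so the following is only a sketch of one way to cap the frame count before encoding, under the assumption that such a cap is useful. The helper name `select_frames` and its `max_frames` parameter are hypothetical and not part of the original script.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_frames(frames: Sequence[T], max_frames: int) -> List[T]:
    """Return at most max_frames items, evenly spaced, keeping first and last.

    Hypothetical helper: the frame limit of the server is not stated in the
    source, so max_frames is an assumption chosen by the caller.
    """
    if max_frames <= 0:
        raise ValueError("max_frames must be positive")
    n = len(frames)
    if n <= max_frames:
        return list(frames)
    if max_frames == 1:
        return [frames[0]]
    # Spread max_frames indices uniformly over the range [0, n - 1].
    step = (n - 1) / (max_frames - 1)
    indices = [round(i * step) for i in range(max_frames)]
    return [frames[i] for i in indices]


if __name__ == "__main__":
    # Illustrative file names only; real paths come from glob in the script above.
    names = [f"frame_{i:04d}.jpg" for i in range(10)]
    print(select_frames(names, 4))
```

In `test_video`, this could be applied to the sorted `image_list` before the frames are converted to data URLs, which also reduces the size of the request body.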