# License for SafeQwen2.5-VL
The SafeQwen2.5-VL project, including all modifications and independently developed code by **Electronics and Telecommunications Research Institute (ETRI)**, is licensed under the Apache License, Version 2.0 (the "License"). This document outlines the license grant, the definition of this derivative work, and the required notices.
---
## 1. Definition of Work
### Model Name
**SafeQwen2.5-VL**
### Reference Publication
This model (SafeQwen2.5-VL) is the official model presented in the academic paper:
> **"HoliSafe: Holistic Safety Benchmarking and Modeling for Vision-Language Model"**
> [https://arxiv.org/abs/2506.04704](https://arxiv.org/abs/2506.04704)
### Base Models & License Provenance
This model is a **Derivative Work** based on the following Qwen2.5-VL models (developed by the Qwen Team, Alibaba Cloud):
- [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
- [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct)
As of November 17, 2025, these base models were provided under the `apache-2.0` license, as specified in their Hugging Face repository metadata (note: a formal LICENSE file was not present in the repositories at that time). This SafeQwen2.5-VL license is predicated on the `apache-2.0` license grant of the base models.
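Because the license provenance above rests on repository metadata rather than a LICENSE file, it can be useful to re-check that metadata before redistributing. The following is a minimal sketch, not part of this license, assuming the `huggingface_hub` package is installed and that the Hub exposes the declared license as a `license:<id>` tag on the model. The helper names `license_from_tags` and `fetch_license` are hypothetical.

```python
from typing import Iterable, Optional


def license_from_tags(tags: Iterable[str]) -> Optional[str]:
    """Return the license id from Hub-style tags such as 'license:apache-2.0'."""
    for tag in tags:
        if tag.startswith("license:"):
            return tag.split(":", 1)[1]
    return None  # no license tag declared


def fetch_license(repo_id: str) -> Optional[str]:
    """Look up a model repository's declared license (requires network access)."""
    # Imported lazily so the pure helper above works without the package.
    from huggingface_hub import HfApi

    info = HfApi().model_info(repo_id)
    return license_from_tags(info.tags or [])
```

Calling `fetch_license("Qwen/Qwen2.5-VL-32B-Instruct")` would be expected to return `apache-2.0` only if the repository metadata still matches the statement above. A different value, or `None`, suggests the provenance should be reviewed.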
### Modifications by ETRI
This work integrates an independently developed **Visual Guard Module (VGM)** to classify harmful image inputs and generate safe text responses. All modifications and additions are the work of ETRI.
---
## 2. Apache License 2.0 Grant
This entire project (SafeQwen2.5-VL) is licensed under the **Apache License, Version 2.0**.
```
Copyright 2025 Electronics and Telecommunications Research Institute (ETRI)
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```
---
## 3. Required Notices for Derivative Work
In compliance with **Section 4 of the Apache License 2.0**, the following notices are provided.
### a) Statement of Modifications
This work (SafeQwen2.5-VL) is a derivative of the original Qwen2.5-VL models. The following significant modifications were made by ETRI:
- Integration of a **Visual Guard Module (VGM)** for harmful content classification
- Fine-tuning of the model for safety alignment based on the VGM's outputs
- *[List other major modifications made by ETRI here, if any]*
### b) Original Copyright Notice (from Qwen2.5-VL)
This project incorporates components from the Qwen2.5-VL models. Users of this software must also comply with the terms of the original license, including the retention of its copyright notices.
The original Qwen models are typically accompanied by a notice similar to:
```
Copyright (c) Alibaba Cloud.
```
> **Note:** If the original Qwen2.5-VL distribution included a NOTICE file, the contents of that NOTICE file must also be included in this distribution.
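One way to act on the NOTICE requirement is to scan a local copy of the upstream distribution for such a file. The sketch below is illustrative only: the function name `find_notice_files` is hypothetical, it looks only at the top level of the given directory, and it treats any file whose base name is `NOTICE` (case-insensitive, any extension) as a candidate.

```python
from pathlib import Path
from typing import List


def find_notice_files(root: str) -> List[Path]:
    """Return top-level files named NOTICE (any extension, any case) under root."""
    return sorted(
        p for p in Path(root).iterdir()
        if p.is_file() and p.stem.upper() == "NOTICE"
    )
```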
---
## 4. Attribution and Contact
This SafeQwen2.5-VL model was developed by the **Electronics and Telecommunications Research Institute (ETRI)** in the Republic of Korea.
For any questions regarding the SafeQwen2.5-VL model or its licensing, please contact:
**Youngwan Lee**
Email: [yw.lee@etri.re.kr](mailto:yw.lee@etri.re.kr)