Instructions to use Qwen/Qwen3.5-27B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Qwen/Qwen3.5-27B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Qwen/Qwen3.5-27B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Qwen/Qwen3.5-27B")
model = AutoModelForMultimodalLM.from_pretrained("Qwen/Qwen3.5-27B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Qwen/Qwen3.5-27B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Qwen/Qwen3.5-27B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Qwen/Qwen3.5-27B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
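The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `http://localhost:8000`; the helper names `build_chat_request` and `post_chat` are hypothetical and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(model, prompt, image_url):
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(base_url, payload, timeout=120):
    """POST the payload to the OpenAI-compatible chat completions endpoint."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (needs a running server, so it is left commented out):
# payload = build_chat_request(
#     "Qwen/Qwen3.5-27B",
#     "Describe this image in one sentence.",
#     "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
# )
# reply = post_chat("http://localhost:8000", payload)
# print(reply["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, the same helper should work against any server exposing `/v1/chat/completions` on another port by changing only the base URL.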

Use Docker

docker model run hf.co/Qwen/Qwen3.5-27B

SGLang

How to use Qwen/Qwen3.5-27B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Qwen/Qwen3.5-27B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Qwen/Qwen3.5-27B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Qwen/Qwen3.5-27B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Qwen/Qwen3.5-27B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use Qwen/Qwen3.5-27B with Docker Model Runner:
```
docker model run hf.co/Qwen/Qwen3.5-27B
```

vLLM Version conflict issue

#19

by innosynth - opened Feb 28

Discussion

innosynth

Feb 28

You may find that the current nightly version of vLLM does not support this model. I have fixed this in my own build; ping me for the whl file.

drguolai

Feb 28

Where can I find the working vLLM whl file?
Thanks.

innosynth

Mar 1

•

edited Mar 1

https://drive.google.com/file/d/1u53vaR-HOicoGsPcsIQx4IGbwhDSIr1h/view?usp=sharing

Please use the link above and let us know if it does not work.

xxang

Mar 1

After installing the vLLM version from the link, I still get the following error when using lm-eval:

  File "/home/wangshuo/wangshuo01/Qwen3.5/lm-evaluation-harness/lm_eval/models/vllm_causallms.py", line 18, in <module>
    from vllm import LLM, SamplingParams, TokensPrompt
  File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/__init__.py", line 70, in __getattr__
    module = import_module(module_name, __package__)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 15, in <module>
    from vllm.beam_search import (
  File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/beam_search.py", line 9, in <module>
    from vllm.multimodal.inputs import MultiModalInputs, mm_inputs
  File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/multimodal/__init__.py", line 14, in <module>
    from .registry import MultiModalRegistry
  File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/multimodal/registry.py", line 13, in <module>
    from .cache import (
  File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/multimodal/cache.py", line 14, in <module>
    from vllm.distributed.device_communicators.shm_object_storage import (
  File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/distributed/__init__.py", line 4, in <module>
    from .communication_op import *
  File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/distributed/communication_op.py", line 9, in <module>
    from .parallel_state import get_tp_group
  File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 256, in <module>
    direct_register_custom_op(
  File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/utils/torch_utils.py", line 785, in direct_register_custom_op
    from vllm.platforms import current_platform
  File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/platforms/__init__.py", line 252, in __getattr__
    _current_platform = resolve_obj_by_qualname(platform_cls_qualname)()
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/utils/import_utils.py", line 111, in resolve_obj_by_qualname
    module = importlib.import_module(module_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/platforms/cuda.py", line 16, in <module>
    import vllm._C  # noqa
    ^^^^^^^^^^^^^^
ImportError: /home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/_C.abi3.so: undefined symbol: _ZN3c104cuda29c10_cuda_check_implementationEiPKcS2_ib

innosynth

Mar 5

Update your torch version. This is what I have installed:

Name: torch
Version: 2.10.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org
Author:
Author-email: PyTorch Team <packages@pytorch.org>
License: BSD-3-Clause
Requires: cuda-bindings, filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-cufile-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-cusparselt-cu12, nvidia-nccl-cu12, nvidia-nvjitlink-cu12, nvidia-nvshmem-cu12, nvidia-nvtx-cu12, setuptools, sympy, triton, typing-extensions
Required-by: compressed-tensors, flashinfer-python, torchaudio, torchvision, vllm, xgrammar
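
The undefined symbol in the traceback above belongs to the `c10::cuda` namespace, which is part of PyTorch's C++ library, so this kind of ImportError usually points to a vLLM binary extension built against a different torch build than the one installed. Below is a minimal sketch for comparing the two without importing vLLM (the import itself is what fails). It assumes the installed vLLM wheel declares its torch requirement in its package metadata; the helper names `declared_specifier` and `installed_version` are hypothetical.

```python
import re
from importlib import metadata

# Matches "name", optional "[extras]", then everything before any ";" marker.
_REQUIREMENT = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*([^;]*)")


def _normalize(name):
    """Normalize a distribution name so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def declared_specifier(requirements, package):
    """Return the version specifier declared for `package`, or None if absent."""
    wanted = _normalize(package)
    for line in requirements or []:
        match = _REQUIREMENT.match(line)
        if match and _normalize(match.group(1)) == wanted:
            return match.group(2).strip().strip("()").strip()
    return None


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report():
    """Print the installed torch/vLLM versions and the torch range vLLM declares."""
    try:
        vllm_requires = metadata.requires("vllm")
    except metadata.PackageNotFoundError:
        vllm_requires = None
    print("installed torch:", installed_version("torch"))
    print("installed vllm: ", installed_version("vllm"))
    print("vllm wants torch:", declared_specifier(vllm_requires, "torch"))


if __name__ == "__main__":
    report()
```

If the installed torch version falls outside the specifier that vLLM declares, reinstalling torch to match (or reinstalling vLLM so pip resolves torch again) is the likely fix, in line with the advice above.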
