Instructions to use Qwen/Qwen3.5-27B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Qwen/Qwen3.5-27B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Qwen/Qwen3.5-27B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Qwen/Qwen3.5-27B")
model = AutoModelForMultimodalLM.from_pretrained("Qwen/Qwen3.5-27B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

- Inference
- HuggingChat
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Qwen/Qwen3.5-27B with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Qwen/Qwen3.5-27B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Qwen/Qwen3.5-27B",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'

Use Docker
docker model run hf.co/Qwen/Qwen3.5-27B
- SGLang
How to use Qwen/Qwen3.5-27B with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Qwen/Qwen3.5-27B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Qwen/Qwen3.5-27B",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'

Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Qwen/Qwen3.5-27B" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Qwen/Qwen3.5-27B",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'

- Docker Model Runner
How to use Qwen/Qwen3.5-27B with Docker Model Runner:
docker model run hf.co/Qwen/Qwen3.5-27B
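The curl calls in the vLLM and SGLang instructions can also be made from Python. The following is a minimal sketch that uses only the standard library. It assumes a server started as shown above (vLLM on port 8000, SGLang on port 30000) and that both servers expose the usual OpenAI-compatible `/v1/models` route. The sketch polls that route before sending a request, because a 27B checkpoint can take a while to load. The helper names (`build_chat_request`, `extract_reply`, `wait_for_server`, `ask`) are hypothetical and are not part of either project.

```python
import json
import time
import urllib.request

# Assumption: a vLLM (port 8000) or SGLang (port 30000) server started as shown
# above, serving the OpenAI-compatible /v1 routes for this model.
MODEL = "Qwen/Qwen3.5-27B"


def build_chat_request(base_url: str, text: str, image_url: str) -> urllib.request.Request:
    """Build the same POST request that the curl examples send (not sent yet)."""
    payload = {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion."""
    return response["choices"][0]["message"]["content"]


def wait_for_server(base_url: str, timeout_s: float = 600.0, interval_s: float = 5.0) -> bool:
    """Poll GET /v1/models until it returns 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            url = f"{base_url.rstrip('/')}/v1/models"
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout or HTTP error (URLError/HTTPError are
            # OSError subclasses): the server is not up yet, so keep polling.
            pass
        time.sleep(interval_s)
    return False


def ask(base_url: str, text: str, image_url: str) -> str:
    """Send one chat request and return the model's reply text."""
    req = build_chat_request(base_url, text, image_url)
    with urllib.request.urlopen(req) as resp:
        return extract_reply(json.load(resp))
```

For example, against the SGLang server you would call wait_for_server("http://localhost:30000") and, if it returns True, pass the same base URL, prompt and image URL to ask(). Against vLLM, use port 8000 instead.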
vLLM Version conflict issue
People may find that the current nightly build of vLLM does not support this model. I have fixed this in my own build; ping me for the .whl file.
Where can I get the working vLLM .whl file?
Thanks.
https://drive.google.com/file/d/1u53vaR-HOicoGsPcsIQx4IGbwhDSIr1h/view?usp=sharing
Please use the link above and let us know if it doesn't work.
After installing the vLLM build from that link, I still get the following error when running lm-eval:
File "/home/wangshuo/wangshuo01/Qwen3.5/lm-evaluation-harness/lm_eval/models/vllm_causallms.py", line 18, in <module>
from vllm import LLM, SamplingParams, TokensPrompt
File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/__init__.py", line 70, in __getattr__
module = import_module(module_name, __package__)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/importlib/__init__.py", line 90, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 15, in <module>
from vllm.beam_search import (
File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/beam_search.py", line 9, in <module>
from vllm.multimodal.inputs import MultiModalInputs, mm_inputs
File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/multimodal/__init__.py", line 14, in <module>
from .registry import MultiModalRegistry
File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/multimodal/registry.py", line 13, in <module>
from .cache import (
File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/multimodal/cache.py", line 14, in <module>
from vllm.distributed.device_communicators.shm_object_storage import (
File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/distributed/__init__.py", line 4, in <module>
from .communication_op import *
File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/distributed/communication_op.py", line 9, in <module>
from .parallel_state import get_tp_group
File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 256, in <module>
direct_register_custom_op(
File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/utils/torch_utils.py", line 785, in direct_register_custom_op
from vllm.platforms import current_platform
File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/platforms/__init__.py", line 252, in __getattr__
_current_platform = resolve_obj_by_qualname(platform_cls_qualname)()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/utils/import_utils.py", line 111, in resolve_obj_by_qualname
module = importlib.import_module(module_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/importlib/__init__.py", line 90, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/platforms/cuda.py", line 16, in <module>
import vllm._C # noqa
^^^^^^^^^^^^^^
ImportError: /home/wangshuo/wangshuo01/anaconda3/envs/EVAL3/lib/python3.12/site-packages/vllm/_C.abi3.so: undefined symbol: _ZN3c104cuda29c10_cuda_check_implementationEiPKcS2_ib
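The traceback shows that the failure only surfaces when vllm/platforms/cuda.py runs `import vllm._C`, several modules deep into `from vllm import LLM`. A quicker way to reproduce it without lm-eval is to import the compiled extension directly. The following minimal diagnostic sketch does that. The helper name `try_import` is hypothetical, and the order of the checks (torch first) is a choice made here rather than something taken from the thread.

```python
import importlib


def try_import(module_name: str) -> tuple[bool, str]:
    """Import module_name and return (success, error message)."""
    try:
        importlib.import_module(module_name)
    except (ImportError, OSError) as exc:
        # An "undefined symbol" ImportError here points at a binary mismatch
        # between a compiled extension and the installed torch.
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""


if __name__ == "__main__":
    # Check torch first: vllm._C links against torch's C++ libraries (c10).
    for name in ("torch", "vllm._C"):
        ok, err = try_import(name)
        print(f"{name}: {'OK' if ok else err}")
```

If torch imports cleanly but vllm._C fails with the same undefined symbol, the problem is confined to how the vLLM wheel was built relative to the installed torch.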
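Update your torch version. The undefined symbol belongs to PyTorch's c10 library, which usually means the vLLM wheel was compiled against a different torch release than the one installed.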
Name: torch
Version: 2.10.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org
Author:
Author-email: PyTorch Team packages@pytorch.org
License: BSD-3-Clause
Requires: cuda-bindings, filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-cufile-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-cusparselt-cu12, nvidia-nccl-cu12, nvidia-nvjitlink-cu12, nvidia-nvshmem-cu12, nvidia-nvtx-cu12, setuptools, sympy, triton, typing-extensions
Required-by: compressed-tensors, flashinfer-python, torchaudio, torchvision, vllm, xgrammar
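Several packages in the Required-by line above (vllm, torchvision, torchaudio, flashinfer-python) ship compiled code that is typically tied to a specific torch release. After changing torch, it is therefore worth checking what each of them is installed at. The thread does not say which versions are compatible, so the following minimal sketch only reports what is installed. It uses the standard-library importlib.metadata module, and the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Same "Version" field that `pip show` prints, for torch and the
    # compiled packages that depend on it.
    pkgs = ["torch", "vllm", "torchvision", "torchaudio", "flashinfer-python"]
    for pkg, ver in installed_versions(pkgs).items():
        print(f"{pkg:20} {ver or 'not installed'}")
```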