error with vLLM
Does it run with vLLM?
The original model works with my settings, but with this one I get this error:
(APIServer pid=3726) raise TypeError(
(APIServer pid=3726) TypeError: Invalid type of HuggingFace config. Expected type: <class 'vllm.transformers_utils.configs.qwen3_5_moe.Qwen3_5MoeConfig'>, but found type: <class 'transformers.models.qwen3_5_moe.configuration_qwen3_5_moe.Qwen3_5MoeTextConfig'>
vLLM v0.19.1
I will test and let you know. Thanks for the heads-up.
Fixed, thanks for the report.
Root cause: config.json had model_type: "qwen3_5_moe_text" (the inner text-config name). vLLM's Qwen3_5MoeConfig is registered against model_type: "qwen3_5_moe", so the auto-mapping was instantiating Qwen3_5MoeTextConfig instead and tripping the type check you saw.
Patched in commit c69703c: a single field flip to model_type: "qwen3_5_moe". Please redownload config.json and retry on vLLM 0.19.1.
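For anyone who wants to confirm that the redownloaded file carries the patched value before restarting the server, here is a minimal sketch. It only reads the top-level model_type field described above; the helper names and the temporary demo files are hypothetical, and in practice you would point it at the config.json in your local model directory.

```python
import json
import tempfile
from pathlib import Path

# Value named in the fix above; everything else here is illustrative.
EXPECTED_MODEL_TYPE = "qwen3_5_moe"


def read_model_type(config_path):
    """Return the top-level model_type stored in a config.json file."""
    with open(config_path, encoding="utf-8") as f:
        return json.load(f).get("model_type")


def is_patched(config_path):
    """True if the file declares the model_type the fix switched to."""
    return read_model_type(config_path) == EXPECTED_MODEL_TYPE


# Demo on throwaway files that mimic the config before and after the patch.
with tempfile.TemporaryDirectory() as tmp:
    old_path = Path(tmp) / "old_config.json"
    new_path = Path(tmp) / "new_config.json"
    old_path.write_text(json.dumps({"model_type": "qwen3_5_moe_text"}))
    new_path.write_text(json.dumps({"model_type": "qwen3_5_moe"}))
    old_ok = is_patched(old_path)
    new_ok = is_patched(new_path)

print(old_ok, new_ok)
```

If the check returns False on your copy, the cached config.json is probably stale and needs to be downloaded again.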
Thanks for the update.
The error changed.
Were you able to run the model with vLLM?
Now I have:
(EngineCore pid=5665) File "/root/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_5.py", line 262, in load_fused_expert_weights
(EngineCore pid=5665) curr_expert_weight = loaded_weight[expert_id]
(EngineCore pid=5665) ~~~~~~~~~~~~~^^^^^^^^^^^
(EngineCore pid=5665) IndexError: index 205 is out of bounds for dimension 0 with size 205
Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s]
I tried vLLM 0.19.1 and 0.21
Yes, I was able to run it. Let me double-check for you.
Updated fix pushed: commit 450c429.
The previous patch (c69703c) switched model_type to qwen3_5_moe but left all fields at the top level in flat form. Qwen3_5MoeConfig.__init__ expects a text_config dict; without one it creates a default Qwen3_5MoeTextConfig() with num_experts=256, which causes IndexError: index 205 is out of bounds on the 205-expert REAP checkpoint.
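The mismatch can be illustrated with a toy reproduction. This is not vLLM's loader: it is a hypothetical function that assumes the loader indexes the fused weight once per configured expert, with a plain Python list standing in for the tensor. Only the two expert counts (256 and 205) come from the explanation above.

```python
DEFAULT_NUM_EXPERTS = 256     # default expert count cited above
CHECKPOINT_NUM_EXPERTS = 205  # experts actually stored in the checkpoint


def split_expert_weights(loaded_weight, num_experts):
    """Toy stand-in for a fused-expert loader: one lookup per configured expert."""
    return [loaded_weight[expert_id] for expert_id in range(num_experts)]


# A list with 205 entries plays the role of the fused weight (dimension 0 = experts).
fused = [f"expert_{i}" for i in range(CHECKPOINT_NUM_EXPERTS)]

# With the correct count, every lookup succeeds.
ok = split_expert_weights(fused, CHECKPOINT_NUM_EXPERTS)

# With the default count, the first lookup past the end (index 205) fails.
try:
    split_expert_weights(fused, DEFAULT_NUM_EXPERTS)
    failed = False
except IndexError:
    failed = True

print(len(ok), failed)
```

Valid indices run from 0 to 204, so index 205 is the first one out of range, which matches the index reported in the traceback.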
This commit uses the proper wrapper structure: outer model_type: "qwen3_5_moe" with all MoE fields (including num_experts: 205) nested inside text_config. Redownload config.json and retry.
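As a rough sketch of the restructuring, the snippet below moves a flat config's fields under text_config while keeping model_type on the outside. The helper name, the choice of which keys stay at the top level, and every sample field other than model_type and num_experts are assumptions for illustration; the real config.json in the commit may be laid out differently in its details.

```python
import json

# Assumption: keys that stay on the outer wrapper. Everything else is nested.
TOP_LEVEL_KEYS = {"architectures", "model_type", "transformers_version"}


def wrap_text_config(flat):
    """Return a wrapper config with non-top-level fields nested in text_config."""
    nested = {k: v for k, v in flat.items() if k in TOP_LEVEL_KEYS}
    nested["model_type"] = "qwen3_5_moe"  # outer model_type named in the fix
    nested["text_config"] = {
        k: v for k, v in flat.items() if k not in TOP_LEVEL_KEYS
    }
    return nested


# Hypothetical flat config; only model_type and num_experts come from the thread.
flat_config = {
    "architectures": ["ExampleArchitecture"],  # placeholder value
    "model_type": "qwen3_5_moe",
    "num_experts": 205,
    "hidden_size": 1024,  # made-up value
}

nested_config = wrap_text_config(flat_config)
print(json.dumps(nested_config, indent=2))
```

With this layout, num_experts is read from text_config instead of falling back to the default.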
Thanks!
It works. I'm running large scale evals to check where it retains accuracy the most.
I'll probably publish the results next week.