Instructions for using 0xSero/Qwen3.6-28B with libraries and local apps.

Libraries

How to use 0xSero/Qwen3.6-28B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="0xSero/Qwen3.6-28B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("0xSero/Qwen3.6-28B")
model = AutoModelForMultimodalLM.from_pretrained("0xSero/Qwen3.6-28B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use 0xSero/Qwen3.6-28B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "0xSero/Qwen3.6-28B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "0xSero/Qwen3.6-28B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
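
The same request can be sent from Python. This is a minimal sketch using only the standard library, assuming the server started above is listening on localhost:8000 and returns the usual OpenAI-style response shape. The helper names `build_chat_request` and `chat` are made up for illustration and are not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response: choices[0].message.content.
    """
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (needs a running server):
# print(chat("http://localhost:8000", "0xSero/Qwen3.6-28B", "What is the capital of France?"))
```

For the SGLang server described further down, the same helper should work with the base URL changed to http://localhost:30000.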

Use Docker

docker model run hf.co/0xSero/Qwen3.6-28B

SGLang

How to use 0xSero/Qwen3.6-28B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "0xSero/Qwen3.6-28B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "0xSero/Qwen3.6-28B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "0xSero/Qwen3.6-28B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "0xSero/Qwen3.6-28B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner

How to use 0xSero/Qwen3.6-28B with Docker Model Runner:
```
docker model run hf.co/0xSero/Qwen3.6-28B
```

error with vLLM

by bnjmnmarie - opened 25 days ago

Discussion

bnjmnmarie

25 days ago

Does it run with vLLM?

The original model works with my settings, but with this one I get the following error:
(APIServer pid=3726) raise TypeError(
(APIServer pid=3726) TypeError: Invalid type of HuggingFace config. Expected type: <class 'vllm.transformers_utils.configs.qwen3_5_moe.Qwen3_5MoeConfig'>, but found type: <class 'transformers.models.qwen3_5_moe.configuration_qwen3_5_moe.Qwen3_5MoeTextConfig'>

vLLM v0.19.1

0xSero

Owner 24 days ago

I will test and let you know. Thanks for the heads-up.

0xSero

Owner 23 days ago

Fixed — thanks for the report.

Root cause: config.json had model_type: "qwen3_5_moe_text" (the inner text-config name). vLLM's Qwen3_5MoeConfig is registered against model_type: "qwen3_5_moe", so the auto-mapping was instantiating Qwen3_5MoeTextConfig instead and tripping the type check you saw.

Patched in commit c69703c — single field flip to model_type: "qwen3_5_moe". Please redownload config.json and retry on vLLM 0.19.1.
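
To confirm that a local copy of config.json has picked up the change, the top-level field can be read directly. This is a minimal sketch, not part of the fix itself: the helper name `check_model_type` is hypothetical, and the path to config.json depends on where the checkpoint was downloaded.

```python
import json


def check_model_type(config_path: str, expected: str = "qwen3_5_moe") -> bool:
    """Return True if the top-level model_type in config.json equals `expected`."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    found = config.get("model_type")
    if found != expected:
        print(f"model_type is {found!r}, expected {expected!r}")
        return False
    return True


# Example (path is an assumption; point it at your local download):
# check_model_type("/path/to/Qwen3.6-28B/config.json")
```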

bnjmnmarie

22 days ago

Thanks for the update.
The error changed.
Could you run the model with vLLM?

Now I have:
(EngineCore pid=5665) File "/root/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_5.py", line 262, in load_fused_expert_weights
(EngineCore pid=5665) curr_expert_weight = loaded_weight[expert_id]
(EngineCore pid=5665) ~~~~~~~~~~~~~^^^^^^^^^^^
(EngineCore pid=5665) IndexError: index 205 is out of bounds for dimension 0 with size 205
Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s]

I tried vLLM 0.19.1 and 0.21.

0xSero

Owner 22 days ago

Yes, I was able to run it. Let me double-check for you.

0xSero

Owner 22 days ago

Updated fix pushed — commit 450c429.

The previous patch (c69703c) switched model_type to qwen3_5_moe but left all fields at the top level in flat form. Qwen3_5MoeConfig.__init__ expects a text_config dict — without one it creates a default Qwen3_5MoeTextConfig() with num_experts=256, which causes IndexError: index 205 is out of bounds on the 205-expert REAP checkpoint.
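
The mismatch can be illustrated with a toy loader. This is not vLLM's code, only a sketch of the failure mode under the assumption that the loader indexes the fused weight once per configured expert. The function name `load_expert_weights` is made up for illustration, and plain lists stand in for tensors.

```python
def load_expert_weights(loaded_weight, num_experts):
    """Take one slice per configured expert from a fused weight (toy version)."""
    return [loaded_weight[expert_id] for expert_id in range(num_experts)]


# The checkpoint holds 205 experts; each "expert" is a dummy 4-element row.
checkpoint = [[0.0] * 4 for _ in range(205)]

# With the correct expert count every index 0..204 exists.
experts = load_expert_weights(checkpoint, num_experts=205)

# With the default of 256 the loop reaches index 205, which does not exist.
try:
    load_expert_weights(checkpoint, num_experts=256)
except IndexError as err:
    print("IndexError:", err)
```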

This commit uses the proper wrapper structure: outer model_type: "qwen3_5_moe" with all MoE fields (including num_experts: 205) nested inside text_config. Redownload config.json and retry.
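
The restructuring can be sketched as a small transformation on the config dictionary. This is an illustration under assumptions, not the contents of the commit: the helper `nest_text_config` is hypothetical, and which keys stay at the top level (here only `architectures`) is a guess. The only names taken from this thread are `model_type`, `text_config`, `num_experts`, and the two model type strings.

```python
def nest_text_config(flat_config, outer_keys=("architectures",)):
    """Wrap a flat config so that MoE fields live under text_config.

    outer_keys is an assumption about which fields stay at the top level.
    """
    outer = {k: v for k, v in flat_config.items() if k in outer_keys}
    inner = {
        k: v
        for k, v in flat_config.items()
        if k not in outer_keys and k != "model_type"
    }
    # Inner config keeps the text-config name; the wrapper gets the outer name.
    inner["model_type"] = "qwen3_5_moe_text"
    outer["model_type"] = "qwen3_5_moe"
    outer["text_config"] = inner
    return outer


# Flat layout as left by the first patch (values other than num_experts are placeholders).
flat = {"model_type": "qwen3_5_moe", "num_experts": 205, "architectures": ["Placeholder"]}
nested = nest_text_config(flat)
print(nested["text_config"]["num_experts"])
```

With the expert count nested this way, the wrapper config no longer falls back to a default text config.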

bnjmnmarie

20 days ago

Thanks!

It works. I'm running large-scale evals to check where it retains the most accuracy.
I'll probably publish the results next week.
