error with vLLM

#3
by bnjmnmarie - opened

Does it run with vLLM?

The original model works with my settings, but for this one I get this error:
(APIServer pid=3726) raise TypeError(
(APIServer pid=3726) TypeError: Invalid type of HuggingFace config. Expected type: <class 'vllm.transformers_utils.configs.qwen3_5_moe.Qwen3_5MoeConfig'>, but found type: <class 'transformers.models.qwen3_5_moe.configuration_qwen3_5_moe.Qwen3_5MoeTextConfig'>

vLLM v0.19.1

I will test and let you know. Thanks for the heads-up.

Fixed. Thanks for the report.

Root cause: config.json had model_type: "qwen3_5_moe_text" (the inner text-config name). vLLM's Qwen3_5MoeConfig is registered against model_type: "qwen3_5_moe", so the auto-mapping was instantiating Qwen3_5MoeTextConfig instead and tripping the type check you saw.

Patched in commit c69703c, a single-field flip to model_type: "qwen3_5_moe". Please redownload config.json and retry on vLLM 0.19.1.
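
For anyone hitting the same type error, a quick check of the top-level model_type in a downloaded config can confirm which variant you have. This is only a minimal sketch: the helper name describe_model_type and the two in-memory samples are hypothetical, and in practice you would load your real config.json with json.load instead.

```python
import json


def describe_model_type(config: dict) -> str:
    """Return a short diagnosis of the top-level model_type field."""
    model_type = config.get("model_type")
    if model_type == "qwen3_5_moe":
        return "ok: outer wrapper model_type"
    if model_type == "qwen3_5_moe_text":
        return "mismatch: inner text-config model_type at top level"
    return f"unexpected model_type: {model_type!r}"


# Hypothetical samples standing in for the contents of config.json.
before_patch = json.loads('{"model_type": "qwen3_5_moe_text"}')
after_patch = json.loads('{"model_type": "qwen3_5_moe"}')

print(describe_model_type(before_patch))
print(describe_model_type(after_patch))
```

A "mismatch" result corresponds to the pre-patch config described above.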

Thanks for the update.
The error changed.
Could you run the model with vLLM?

Now I have:
(EngineCore pid=5665) File "/root/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_5.py", line 262, in load_fused_expert_weights
(EngineCore pid=5665) curr_expert_weight = loaded_weight[expert_id]
(EngineCore pid=5665) ~~~~~~~~~~~~~^^^^^^^^^^^
(EngineCore pid=5665) IndexError: index 205 is out of bounds for dimension 0 with size 205
Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s]

I tried vLLM 0.19.1 and 0.21.

Yes, I was able to run it. Let me double-check for you.

Updated fix pushed: commit 450c429.

The previous patch (c69703c) switched model_type to qwen3_5_moe but left all fields at the top level in flat form. Qwen3_5MoeConfig.__init__ expects a text_config dict. Without one, it creates a default Qwen3_5MoeTextConfig() with num_experts=256, which causes IndexError: index 205 is out of bounds on the 205-expert REAP checkpoint.
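
To illustrate why the expert-count mismatch fails at exactly index 205, here is a toy reproduction in plain Python. It is not vLLM's actual loading code: load_expert_weights is a hypothetical stand-in for a loop that indexes the checkpoint tensor once per configured expert.

```python
def load_expert_weights(loaded_weight: list, num_experts: int) -> list:
    """Toy stand-in for a per-expert loading loop (not vLLM's real code)."""
    return [loaded_weight[expert_id] for expert_id in range(num_experts)]


# A checkpoint holding 205 experts, valid indices 0..204.
checkpoint_experts = [f"expert_{i}" for i in range(205)]

# Config and checkpoint agree: all 205 experts load.
loaded = load_expert_weights(checkpoint_experts, num_experts=205)
print(len(loaded))

# Config falls back to 256 experts: the loop runs past index 204
# and fails on the first missing expert, index 205.
try:
    load_expert_weights(checkpoint_experts, num_experts=256)
    failed = False
except IndexError as exc:
    failed = True
    print(f"IndexError: {exc}")
```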

This commit uses the proper wrapper structure: outer model_type: "qwen3_5_moe" with all MoE fields (including num_experts: 205) nested inside text_config. Redownload config.json and retry.
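
For reference, the flat-to-nested restructuring can be sketched roughly as below. Treat it as an illustration under assumptions, not the exact script used for the commit: nest_text_config is a hypothetical helper, the choice to keep "architectures" at the top level and to label the inner config "qwen3_5_moe_text" is my assumption, and every sample value except num_experts: 205 is made up.

```python
def nest_text_config(flat: dict, keep_top_level=("architectures",)) -> dict:
    """Move flat config fields under text_config, keeping a thin outer wrapper."""
    # Everything except the wrapper-level keys goes into the inner text config.
    text_config = {
        key: value
        for key, value in flat.items()
        if key not in keep_top_level and key != "model_type"
    }
    # Assumption: the inner config carries the text-config model_type name.
    text_config["model_type"] = "qwen3_5_moe_text"

    nested = {key: flat[key] for key in keep_top_level if key in flat}
    nested["model_type"] = "qwen3_5_moe"
    nested["text_config"] = text_config
    return nested


# Hypothetical flat config; only num_experts=205 comes from this thread.
flat_config = {
    "model_type": "qwen3_5_moe",
    "architectures": ["ExampleArchitecture"],
    "num_experts": 205,
    "hidden_size": 1024,
}

nested_config = nest_text_config(flat_config)
print(nested_config["model_type"], nested_config["text_config"]["num_experts"])
```

The point of the nesting is that num_experts now lives where the wrapper config looks for it, so no default text config gets created.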

Thanks!

It works. I'm running large scale evals to check where it retains accuracy the most.
I'll probably publish the results next week.