---
license: apache-2.0
base_model: Qwen/Qwen3.5-35B-A3B
tags:
- qwen3.5
- moe
- text-only
- vllm
---

# Qwen3.5-35B-A3B Text-Only

Text-only weights extracted from [Qwen/Qwen3.5-35B-A3B](https://huggingface.co/Qwen/Qwen3.5-35B-A3B) (VLM, Mixture-of-Experts) for use with vLLM's `Qwen3_5MoeForCausalLM` architecture.

## What this is

Qwen3.5 MoE models are natively multimodal (VLM). Their HuggingFace checkpoints use `Qwen3_5MoeForConditionalGeneration` with weights prefixed as `model.language_model.*`.

This repo provides the **language model backbone only**, with:

- `architectures: ["Qwen3_5MoeForCausalLM"]`
- `model_type: "qwen3_5_moe_text"`
- Weight keys at `model.layers.*` (standard causal LM format, no `language_model.` prefix)
- Vision encoder and MTP weights removed

## Model structure

- **Architecture**: Hybrid GatedDeltaNet + Full Attention, Sparse Mixture-of-Experts
- **Total parameters**: ~35B (3B active per token)
- **Dtype**: bfloat16

## How to use with vLLM

```python
from vllm import LLM

llm = LLM(
    model="codecho/Qwen3.5-35B-A3B-text-only",
    trust_remote_code=True,
    tensor_parallel_size=2,
)
```
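
For readers curious about the key layout described under "What this is", the renaming can be sketched roughly as follows. This is a minimal illustration, not the script used to produce this repo: only the `model.language_model.` prefix is stated above, while the vision and MTP prefixes (`model.visual.`, `mtp.`) and the helper names `remap_key` and `remap_state_dict` are assumptions made for the example.

```python
# Sketch of the weight-key remapping, operating on key names only.
# ASSUMPTION: the drop prefixes below are hypothetical; check the actual
# checkpoint's key names before relying on them.
LM_PREFIX = "model.language_model."
DROP_PREFIXES = ("model.visual.", "mtp.")


def remap_key(key):
    """Return the text-only key name, or None if the tensor should be dropped."""
    if key.startswith(DROP_PREFIXES):
        return None  # vision encoder / MTP weights are removed
    if key.startswith(LM_PREFIX):
        # model.language_model.layers.0... -> model.layers.0...
        return "model." + key[len(LM_PREFIX):]
    return key  # keys outside the prefixes above are kept unchanged


def remap_state_dict(state_dict):
    """Apply remap_key to every entry, skipping dropped tensors."""
    remapped = {}
    for key, tensor in state_dict.items():
        new_key = remap_key(key)
        if new_key is not None:
            remapped[new_key] = tensor
    return remapped


if __name__ == "__main__":
    # Placeholder values stand in for tensors.
    example = {
        "model.language_model.layers.0.mlp.gate.weight": 0,
        "model.visual.blocks.0.attn.qkv.weight": 1,
        "mtp.layers.0.input_layernorm.weight": 2,
        "lm_head.weight": 3,
    }
    for name in remap_state_dict(example):
        print(name)
```

In a real conversion the same mapping would be applied shard by shard to the checkpoint's tensors, and `config.json` would be updated to the `architectures` and `model_type` values listed above.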