--- language: - en - ko license: other license_name: solar-apache-2.0 tags: - upstage - solar - moe - 100b - llm - fp8 base_model: - upstage/Solar-Open-100B --- This is [upstage/Solar-Open-100B](https://huggingface.co/upstage/Solar-Open-100B) compressed with `llm-compressor` 0.13.0 using a data-free recipe. ## This model requires a fork of vLLM. Create and activate a Python virtual environment ```bash uv venv --python 3.12 --seed source .venv/bin/activate ``` Install Solar Open's optimized vLLM ```bash VLLM_PRECOMPILED_WHEEL_LOCATION="https://github.com/vllm-project/vllm/releases/download/v0.12.0/vllm-0.12.0-cp38-abi3-manylinux_2_31_x86_64.whl" \ VLLM_USE_PRECOMPILED=1 \ uv pip install git+https://github.com/UpstageAI/vllm.git@v0.12.0-solar-open ``` ## This model implements custom Logits Processors Start the vLLM server (For 4x48 GPUs) ```bash vllm serve upstage/Solar-Open-100B \ --trust-remote-code \ --enable-auto-tool-choice \ --tool-call-parser solar_open \ --reasoning-parser solar_open \ --logits-processors vllm.model_executor.models.parallel_tool_call_logits_processor:ParallelToolCallLogitsProcessor \ --logits-processors vllm.model_executor.models.solar_open_logits_processor:SolarOpenTemplateLogitsProcessor \ --tensor-parallel-size 4 ``` For 96GB GPUs you should be able to drop down to tp2. ## Reasoning Effort This is not documented in the upstream model card but the solar_open_logits_processor implements `reasoning_effort` with two values: `medium` and `high` See [solar_open_logits_processor.py](https://github.com/UpstageAI/vllm/blob/c9a05e077cd82df8cab4f729396c178c29c81aa8/vllm/model_executor/models/solar_open_logits_processor.py#L59) Example sampling configuration: ``` { "temperature": 0.8, "top_p": 0.95, "top_k": 50, "reasoning_effort": "medium" } ```