---
language:
- en
- ko
license: other
license_name: solar-apache-2.0
tags:
- upstage
- solar
- moe
- 100b
- llm
- fp8
base_model:
- upstage/Solar-Open-100B
---

This is [upstage/Solar-Open-100B](https://huggingface.co/upstage/Solar-Open-100B) compressed with `llm-compressor` 0.13.0 using a data-free recipe.

## This model requires a fork of vLLM.

Create and activate a Python virtual environment
```bash
uv venv --python 3.12 --seed
source .venv/bin/activate
```

Install Solar Open's optimized vLLM
```bash
VLLM_PRECOMPILED_WHEEL_LOCATION="https://github.com/vllm-project/vllm/releases/download/v0.12.0/vllm-0.12.0-cp38-abi3-manylinux_2_31_x86_64.whl" \
VLLM_USE_PRECOMPILED=1 \
uv pip install git+https://github.com/UpstageAI/vllm.git@v0.12.0-solar-open
```

## This model implements custom Logits Processors

Start the vLLM server (For 4x48 GPUs)

```bash
vllm serve upstage/Solar-Open-100B \
    --trust-remote-code \
    --enable-auto-tool-choice \
    --tool-call-parser solar_open \
    --reasoning-parser solar_open \
    --logits-processors vllm.model_executor.models.parallel_tool_call_logits_processor:ParallelToolCallLogitsProcessor \
    --logits-processors vllm.model_executor.models.solar_open_logits_processor:SolarOpenTemplateLogitsProcessor \
    --tensor-parallel-size 4
```

For 96GB GPUs you should be able to drop down to tp2.

## Reasoning Effort

This is not documented in the upstream model card but the solar_open_logits_processor implements `reasoning_effort` with two values: `medium` and `high`

See [solar_open_logits_processor.py](https://github.com/UpstageAI/vllm/blob/c9a05e077cd82df8cab4f729396c178c29c81aa8/vllm/model_executor/models/solar_open_logits_processor.py#L59)

Example sampling configuration:

```
{
    "temperature": 0.8,
    "top_p": 0.95,
    "top_k": 50,
    "reasoning_effort": "medium"
}
```