---
title: ALIA Vapol Runtime Demo
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
---

# ALIA-40B Distill Vapol Demo Space

This Space is a lightweight Gradio scaffold for demonstrating the practical utility of
`apol/alia-40b-distill-vapol` without loading 40B weights inside the Space by default.

The app has two execution paths:

- **Endpoint path:** if endpoint environment variables are configured, the app sends the selected prompt to an external HF Inference Endpoint or OpenAI-compatible chat/completions endpoint.
- **Deterministic demo path:** when no endpoint is configured, the app uses small local draft responses and applies a runtime repair layer modeled on `scripts/repair_eval_responses.py` from the ALIA Vapol workspace.

## What It Demonstrates

The demo focuses on the same deployment-shaped behaviors described in the model card:

- structured JSON output against a compact schema;
- tool-call behavior when required arguments are missing;
- RAG answer synthesis with explicit citation labels;
- simple code repair formatting for `average(xs)`.

The deterministic fallback is not a substitute for model inference. It is a deployable illustration of how ALIA Vapol can be paired with runtime validators and high-confidence repair for formal outputs.

## Optional Endpoint Configuration

Set one of these groups of Space secrets or variables.

### OpenAI-Compatible Endpoint

Use this for vLLM, TGI OpenAI-compatible mode, LM Studio proxies, or hosted gateways.

```text
OPENAI_BASE_URL=https://your-endpoint.example.com/v1
OPENAI_API_KEY=...
OPENAI_MODEL=apol/alia-40b-distill-vapol
```

### HF Inference Endpoint

Use this for a dedicated Hugging Face Inference Endpoint that accepts generation payloads.

```text
HF_INFERENCE_ENDPOINT_URL=https://your-endpoint.endpoints.huggingface.cloud
HF_TOKEN=...
HF_MODEL_ID=apol/alia-40b-distill-vapol
```

If neither path is configured, the Space still runs entirely as a deterministic demo.

## Run Locally

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python app.py
```

On Windows PowerShell:

```powershell
py -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python app.py
```

## Deploy To Hugging Face Spaces

Create a Gradio Space, then upload this folder's contents:

```text
app.py
README.md
requirements.txt
```

The Space does not download or initialize local 40B model weights unless you add that behavior yourself. For a practical public demo, keep inference outside the Space and configure the endpoint secrets above.

## Notes

- The runtime repair layer only handles high-confidence, validator-shaped failures.
- Model-only results and runtime-repaired results should be reported separately.
- The app is intentionally small so it can run on default CPU Spaces.