---
title: ALIA Vapol Runtime Demo
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
---

# ALIA-40B Distill Vapol Demo Space

This Space is a lightweight Gradio scaffold for demonstrating the practical utility of `apol/alia-40b-distill-vapol` without loading 40B weights inside the Space by default.

The app has two execution paths:

- **Endpoint path:** if endpoint environment variables are configured, the app sends the selected prompt to an external HF Inference Endpoint or OpenAI-compatible chat/completions endpoint.
- **Deterministic demo path:** when no endpoint is configured, the app uses small local draft responses and applies a runtime repair layer modeled on `scripts/repair_eval_responses.py` from the ALIA Vapol workspace.

## What It Demonstrates

The demo focuses on the same deployment-shaped behaviors described in the model card:

- structured JSON output against a compact schema;
- tool-call behavior when required arguments are missing;
- RAG answer synthesis with explicit citation labels;
- simple code repair formatting for `average(xs)`.

The deterministic fallback is not a substitute for model inference. It is a deployable illustration of how ALIA Vapol can be paired with runtime validators and high-confidence repair for formal outputs.

## Optional Endpoint Configuration

Set one of these groups of Space secrets or variables.

### OpenAI-Compatible Endpoint

Use this for vLLM, TGI OpenAI-compatible mode, LM Studio proxies, or hosted gateways.

```text
OPENAI_BASE_URL=https://your-endpoint.example.com/v1
OPENAI_API_KEY=...
OPENAI_MODEL=apol/alia-40b-distill-vapol
```

### HF Inference Endpoint

Use this for a dedicated Hugging Face Inference Endpoint that accepts generation payloads.

```text
HF_INFERENCE_ENDPOINT_URL=https://your-endpoint.endpoints.huggingface.cloud
HF_TOKEN=...
HF_MODEL_ID=apol/alia-40b-distill-vapol
```

If neither path is configured, the Space still runs entirely as a deterministic demo.
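
The choice between the endpoint path and the deterministic demo path can be sketched as a small lookup over the environment variables listed above. This is a minimal illustration, not the code in `app.py`: the function name `select_execution_path` is hypothetical, and both the rule that a non-empty URL variable is enough to enable a path and the OpenAI-first precedence when both groups are set are assumptions.

```python
import os
from typing import Mapping, Optional


def select_execution_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "openai", "hf", or "demo" depending on which variables are set.

    Hypothetical helper: the variable names come from this README, but the
    precedence and the "URL alone is enough" rule are assumptions.
    """
    env = os.environ if env is None else env

    # Assumption: a non-empty base URL enables the OpenAI-compatible path.
    if env.get("OPENAI_BASE_URL", "").strip():
        return "openai"

    # Assumption: otherwise a non-empty endpoint URL enables the HF path.
    if env.get("HF_INFERENCE_ENDPOINT_URL", "").strip():
        return "hf"

    # Nothing configured: fall back to the deterministic demo path.
    return "demo"
```

Passing a plain dictionary instead of reading `os.environ` makes the selection easy to check locally, for example `select_execution_path({})` returns `"demo"`.
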
## Run Locally

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python app.py
```

On Windows PowerShell:

```powershell
py -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python app.py
```

## Deploy To Hugging Face Spaces

Create a Gradio Space, then upload this folder's contents:

```text
app.py
README.md
requirements.txt
```

The Space does not download or initialize local 40B model weights unless you add that behavior yourself. For a practical public demo, keep inference outside the Space and configure the endpoint secrets above.

## Notes

- The runtime repair layer only handles high-confidence, validator-shaped failures.
- Model-only results and runtime-repaired results should be reported separately.
- The app is intentionally small so it can run on default CPU Spaces.
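
The first two notes can be illustrated with a small validator-plus-repair sketch for JSON output. It is not the logic of `scripts/repair_eval_responses.py`; the function name `parse_with_repair`, the status labels, and the single repair rule (unwrapping a response that is exactly one fenced block) are all assumptions chosen to show the idea of a narrow, high-confidence repair whose results are tracked apart from model-only results.

```python
import json
import re
from typing import Any, Optional, Tuple

# Built programmatically so the example nests cleanly inside Markdown.
TICKS = "`" * 3

# Matches a response that is exactly one fenced block, optionally tagged "json".
_FENCED = re.compile(
    r"^\s*" + TICKS + r"(?:json)?\s*\n(.*?)\n\s*" + TICKS + r"\s*$",
    re.DOTALL,
)


def parse_with_repair(raw: str) -> Tuple[Optional[Any], str]:
    """Return (value, status); status is "model_only", "repaired", or "failed".

    Hypothetical helper: only one validator-shaped failure is repaired,
    namely valid JSON wrapped in a single code fence.
    """
    # Validator step: accept the raw model output if it already parses.
    try:
        return json.loads(raw), "model_only"
    except json.JSONDecodeError:
        pass

    # Repair step: unwrap a single fence, then validate again.
    match = _FENCED.match(raw)
    if match:
        try:
            return json.loads(match.group(1)), "repaired"
        except json.JSONDecodeError:
            pass

    # Anything else is left unrepaired rather than guessed at.
    return None, "failed"
```

Keeping the status alongside the parsed value lets an evaluation count `"model_only"` and `"repaired"` outcomes in separate columns instead of merging them into one pass rate.
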