# LLM runner: populate a submission's answers

Standalone script that runs one or more LLMs over an intake submission's questions, using the trial's parsed SAP as the only source, and writes the completed `output.json` (plus `output.R`) for each model to local files.

The script only **reads** from HF and **writes outputs locally**; it never uploads anything to Hugging Face. You can also run it **fully local** (local input files, no HF access, no `HF_TOKEN`).

## Setup (uv)

The script declares its dependencies inline (PEP 723), so `uv run` creates an isolated environment and installs them automatically. No manual venv or pip step is needed.

```bash
# install uv once: https://docs.astral.sh/uv/ (curl -LsSf https://astral.sh/uv/install.sh | sh)
export ANTHROPIC_API_KEY=...   # for claude-* models
export OPENAI_API_KEY=...      # for gpt-*, o1-*, o3-* models
# only if you read inputs from HF (not needed for fully-local mode):
export HF_TOKEN=hf_...         # read access to the private intake_form_data repo
uv run run_llm.py --submission NCT02578680__EricZ
```

(Prefer plain pip? Run `pip install -r requirements.txt`, then `python run_llm.py ...`.)

## Run: fully local (no HF)

Point it at a local submission JSON and a local SAP file:

```bash
python run_llm.py \
  --submission NCT02578680__EricZ \
  --submission-file ./sub.json \
  --sap-file ./sap.lines.json \
  --models claude-opus-4-8 gpt-4o
```

- `--submission-file`: a submission record, or a bare `{trial_id, username, prompts}` object.
- `--sap-file`: a `sap.lines.json` (rebuilt with page markers) **or** any `.txt`/`.md` file (used as-is).
- In this mode, `--submission` is used only to name the output folder.
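The `--sap-file` rule above can be sketched in Python (the script's own language) as a small loader. This is a minimal sketch under stated assumptions: the `sap.lines.json` schema used here (a list of `{"page": int, "text": str}` records) and the `[page N]` marker format are assumptions for illustration, not the script's documented format, and `load_sap_text` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_sap_text(path: str) -> str:
    """Hypothetical helper: return the SAP text to send to the models."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix in {".txt", ".md"}:
        # Plain text / markdown: used as-is, no page markers added.
        return p.read_text(encoding="utf-8")
    if suffix != ".json":
        raise ValueError(f"unsupported SAP file type: {p.name}")

    # ASSUMPTION: sap.lines.json is a list of {"page": int, "text": str} records.
    records = json.loads(p.read_text(encoding="utf-8"))
    out: list[str] = []
    current_page = None
    for rec in records:
        if rec["page"] != current_page:
            # ASSUMPTION: "[page N]" marker; the real script may format it differently.
            current_page = rec["page"]
            out.append(f"[page {current_page}]")
        out.append(rec["text"])
    return "\n".join(out)


# Usage: text = load_sap_text("./sap.lines.json")
```

Emitting a marker only when the page changes keeps the text compact while still letting a model cite a page for any line it quotes.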
## Run: reading inputs from HF

```bash
python run_llm.py --submission NCT02578680__EricZ

# pin an exact submission version (default is the latest):
python run_llm.py --submission NCT02578680__EricZ \
  --version 2026-06-07T17-23-05-870000+00-00.json

# pick a specific SAP doc + models:
python run_llm.py --submission NCT02578680__EricZ \
  --doc-id 10.1056_nejmoa1801005 \
  --models claude-opus-4-8 gpt-4o
```

`--version` accepts the bare filename, a full repo path, or a filename with the `+` URL-encoded as `%2B` (which it normalizes back to a literal `+`).

## What it does

1. **Submission**: loads the latest version (or the one pinned with `--version`) of `submissions//.json` from `trialdesignbench/intake_form_data`.
2. **SAP**: maps the NCT id (the prefix of the submission name) to a `documents/` folder via `data/tdr.parquet`. One NCT id can map to several docs; use `--doc-id` to pick one. It then rebuilds the SAP text **with page markers** from `sap.lines.json`, so answers can cite page numbers as the prompt requires.
3. **Prompt block**: turns the questions into the null-placeholder block (`extraction_only` → `extracted_value: null`; `derivation_required` → `dimensions: {inputs_used, method, calculated_value}`).
4. **Models**: sends the task prompt (as the system message) plus the SAP and the prompt block to each model, then parses the fenced `json` block (saved as `output.json`) and the fenced `r` block (saved as `output.R`) from the reply.

## Outputs

```
out//
  prompt_block.json   # the null block the models were asked to fill
  sap.txt             # SAP text fed to the models (with page markers)
  /output.json        # completed block
  /output.R           # R for derivation questions
  /raw.txt            # raw model response (for debugging)
  /error.txt          # present only if the call failed
```

## Notes / caveats

- **Context size:** parsed SAPs are large (e.g. NCT02578680 is ≈ 480K chars, ≈ 120K tokens). That fits Claude's large context window but may exceed smaller ones: within gpt-4o's 128K, a 120K-token SAP leaves little room for the task prompt, the question block, and the answer. For those models, use a long-context model instead or pre-trim the SAP; the script does not truncate.
- **Closed-book:** the prompt forbids outside knowledge, and the script only ever sends the one SAP document.
- **Models are pluggable:** any `claude-*` id routes to Anthropic; any `gpt-*`, `o1-*`, or `o3-*` id routes to OpenAI. Edit `DEFAULT_MODELS` or pass `--models`.
- Anthropic calls cache the SAP-bearing block, so re-runs within the cache's lifetime are cheaper.
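The `--version` normalization described under "Run: reading inputs from HF" can be sketched as below. Reducing the value to its last path component is an assumption made for illustration (the script may keep the repo path), and `normalize_version` is a hypothetical name. The detail that matters is decoding `%2B` with `urllib.parse.unquote`, which, unlike `unquote_plus`, does not turn a literal `+` into a space.

```python
from urllib.parse import unquote


def normalize_version(version: str) -> str:
    """Hypothetical helper: reduce a --version value to a bare, decoded filename."""
    # ASSUMPTION: keep only the last path component, so a full repo path or URL also works.
    name = version.rstrip("/").rsplit("/", 1)[-1]
    # unquote (not unquote_plus) decodes %2B to "+" and leaves a literal "+" alone.
    return unquote(name)


for v in [
    "2026-06-07T17-23-05-870000+00-00.json",
    "2026-06-07T17-23-05-870000%2B00-00.json",
]:
    print(normalize_version(v))  # 2026-06-07T17-23-05-870000+00-00.json both times
```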
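Step 3 of "What it does" (the null-placeholder block) can be sketched like this. Only the two question types and the placeholder fields come from this README; the input shape (a list of dicts with hypothetical `id` and `type` keys), the list-of-entries output layout, and the name `build_prompt_block` are assumptions for illustration.

```python
import json
from typing import Any, Dict, List


def build_prompt_block(questions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: map each question to a null placeholder the model must fill.

    ASSUMPTION: each question is a dict with "id" and "type" keys.
    """
    block = []
    for q in questions:
        entry: Dict[str, Any] = {"id": q["id"]}
        if q["type"] == "extraction_only":
            entry["extracted_value"] = None
        elif q["type"] == "derivation_required":
            entry["dimensions"] = {
                "inputs_used": None,
                "method": None,
                "calculated_value": None,
            }
        else:
            raise ValueError(f"unknown question type: {q['type']!r}")
        block.append(entry)
    return block


questions = [
    {"id": "q1", "type": "extraction_only"},
    {"id": "q2", "type": "derivation_required"},
]
print(json.dumps(build_prompt_block(questions), indent=2))  # None is serialized as null
```

Failing on an unknown type, rather than skipping it, keeps a schema change from silently dropping questions from the block.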
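Step 4 of "What it does" pulls the fenced `json` and `r` blocks out of each model's reply. Below is a minimal regex-based sketch. It assumes the blocks are tagged with a language name and that the first block of each kind is the one to keep. `extract_blocks` is a hypothetical name, and the script's own parser may behave differently.

```python
import json
import re
from typing import Any, Optional, Tuple

# A literal triple backtick, built this way so the snippet can sit inside a Markdown fence.
FENCE = "`" * 3
# Captures the language tag and the body of each fenced block (e.g. json or r).
FENCE_RE = re.compile(FENCE + r"(\w+)[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_blocks(raw: str) -> Tuple[Optional[Any], Optional[str]]:
    """Hypothetical helper: return (parsed output.json, output.R source)."""
    parsed_json: Optional[Any] = None
    r_code: Optional[str] = None
    for lang, body in FENCE_RE.findall(raw):
        lang = lang.lower()
        if lang == "json" and parsed_json is None:
            parsed_json = json.loads(body)  # raises json.JSONDecodeError if malformed
        elif lang == "r" and r_code is None:
            r_code = body
    return parsed_json, r_code


reply = "Answer:\n" + FENCE + 'json\n{"q1": {"extracted_value": 5}}\n' + FENCE + "\n"
print(extract_blocks(reply))  # ({'q1': {'extracted_value': 5}}, None)
```

In this sketch a malformed JSON block raises `json.JSONDecodeError`; the caller decides whether to treat that like a failed call.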
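The prefix routing under "Models are pluggable" can be sketched as below. The sketch also checks that the matching API key from Setup is set before any call is made. `provider_for` and `missing_api_keys` are hypothetical names for illustration. The prefix rules and env var names come from this README, but the script's actual routing code may differ.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Prefix rules from this README: claude-* -> Anthropic; gpt-*, o1-*, o3-* -> OpenAI.
ANTHROPIC_PREFIXES = ("claude-",)
OPENAI_PREFIXES = ("gpt-", "o1-", "o3-")

# Env var holding each provider's API key (per Setup).
API_KEY_ENV = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}


def provider_for(model_id: str) -> str:
    """Hypothetical helper: pick a provider from the model id's prefix."""
    if model_id.startswith(ANTHROPIC_PREFIXES):
        return "anthropic"
    if model_id.startswith(OPENAI_PREFIXES):
        return "openai"
    raise ValueError(f"no provider route for model id {model_id!r}")


def missing_api_keys(
    model_ids: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Hypothetical helper: list required API-key env vars that are unset or empty."""
    env = os.environ if env is None else env
    needed = {API_KEY_ENV[provider_for(m)] for m in model_ids}
    return sorted(name for name in needed if not env.get(name))


print(provider_for("claude-opus-4-8"), provider_for("gpt-4o"))  # anthropic openai
```

Checking routes and keys up front surfaces a typo in `--models` or a missing key before any tokens are spent on the other models.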