title: TDB Intake
emoji: 🔬
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py
pinned: false

Trial Design Benchmark — Intake

A Streamlit intake form for trial statisticians. Submissions are saved to a Hugging Face Dataset repo. An Admin page (in the sidebar) lets reviewers triage submissions (pending / reviewed / needs_fix).

What it does

Form (app.py) — statisticians enter trial_id, username, and a list of questions. Each question has:
- design_element (dropdown — when "Others" is picked, a free-text input appears)
- question_type (dropdown — extraction_only / derivation_required)
- question (free text)
- Rubrics auto-generated by question type:
  - extraction_only → 1 rubric: output.json
  - derivation_required → 4 rubrics: output.json × {Inputs used, Calculated value, Method} + output.R × {Reproducibility}
- Each rubric collects points, tolerance, criterion.
- Load existing submission — re-enter the same trial_id + username and click Load to pull a previous submission back into the form, edit it, and Submit again to update.
Admin page (pages/1_Admin.py) — password-gated review console. A submission can be reviewed many times by different people: each review (status + reviewer name + comment) is written as its own file under reviews/<submission>/, and the page shows the full timeline. The current status is the most recent review's status.
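
The rubric auto-generation described for the form can be sketched as a lookup from question type to (artifact, dimension) pairs. This is only an illustration: `RUBRIC_TEMPLATES` and `default_rubrics` are hypothetical names, and the dimension for the single extraction_only rubric is not stated in this README, so it is left empty here.

```python
from typing import Dict, List

# (artifact, dimension) pairs per question type, taken from the list above.
# Assumption: the extraction_only rubric has no named dimension.
RUBRIC_TEMPLATES = {
    "extraction_only": [("output.json", "")],
    "derivation_required": [
        ("output.json", "Inputs used"),
        ("output.json", "Calculated value"),
        ("output.json", "Method"),
        ("output.R", "Reproducibility"),
    ],
}


def default_rubrics(question_type: str) -> List[Dict[str, str]]:
    """Return blank rubric rows for a question type (hypothetical helper)."""
    if question_type not in RUBRIC_TEMPLATES:
        raise ValueError(f"unknown question_type: {question_type!r}")
    return [
        {
            "artifact": artifact,
            "dimension": dimension,
            # The statistician fills in these three fields on the form.
            "points": "",
            "criterion": "",
            "tolerance": "",
        }
        for artifact, dimension in RUBRIC_TEMPLATES[question_type]
    ]
```

Calling `default_rubrics("derivation_required")` yields four blank rows, three for output.json and one for output.R.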

Run locally

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
streamlit run app.py

Without HF env vars set, submissions land in ./data/submissions/<...>.json on disk — fine for dev.
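
A minimal sketch of what the local fallback could look like. `save_local` is a hypothetical helper, not the code in lib/storage.py. The file name pattern and the handling of createdAt / updatedAt are assumptions modelled on the dataset layout and submission example shown later in this README.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_local(submission: dict, root: str = "data/submissions") -> Path:
    """Upsert one submission as a JSON file on disk (hypothetical helper)."""
    folder = Path(root)
    folder.mkdir(parents=True, exist_ok=True)
    # Assumed name: the same <trial_id>__<username>.json pattern as the dataset.
    path = folder / f"{submission['trial_id']}__{submission['username']}.json"

    now = datetime.now(timezone.utc).isoformat()
    created = now
    if path.exists():
        # Keep the original creation time when overwriting an earlier save.
        with path.open(encoding="utf-8") as fh:
            created = json.load(fh).get("createdAt", now)

    record = {**submission, "createdAt": created, "updatedAt": now}
    with path.open("w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2, ensure_ascii=False)
    return path
```

Saving twice with the same trial_id and username overwrites one file rather than creating a second one, which mirrors the update-on-resubmit behaviour of the deployed app.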

Deploy on Hugging Face Spaces

1. Create a private HF Dataset repo

Sign in at https://huggingface.co
Click your avatar → New Dataset
Owner: your username (e.g. ttt-77)
Name: e.g. tdb-intake-submissions
Visibility: Private
Create. Leave it empty.

2. Generate an HF access token

https://huggingface.co/settings/tokens → New token
Token type: Write
Save the hf_... string.

3. Create the Space

Click your avatar → New Space
Name: e.g. tdb-intake
SDK: Streamlit
Visibility: your choice (public works: the form is intended for public submission, and only the dataset needs to be private)
Create — HF gives you a git repo URL.

4. Push this code to the Space

git remote add hf https://huggingface.co/spaces/<your-username>/tdb-intake
git push hf main

Or keep this GitHub repo in sync with the Space, for example with a GitHub Actions workflow that pushes to the Space remote on every push.

5. Add Space secrets

In the Space → Settings → Variables and secrets → add as secrets:

| Name | Value |
| --- | --- |
| `HF_TOKEN` | the token from step 2 |
| `HF_DATASET_REPO` | `<your-username>/tdb-intake-submissions` |
| `HF_DATASET_BRANCH` | `main` (optional, defaults to `main`) |
| `ADMIN_PASSWORD` | a password to share with reviewers |

The Space will restart automatically and pick up the new secrets.
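
Space secrets are exposed to the app as environment variables. The sketch below shows one way the four settings could be read. `read_settings` and the `use_hf` flag are hypothetical, and the rule "use the dataset only when both token and repo are set" is an assumption based on the local fallback described under Run locally.

```python
import os
from typing import Mapping, Optional


def read_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the four settings from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    settings = {
        "token": env.get("HF_TOKEN", ""),
        "dataset_repo": env.get("HF_DATASET_REPO", ""),
        # HF_DATASET_BRANCH is optional and defaults to main.
        "branch": env.get("HF_DATASET_BRANCH") or "main",
        "admin_password": env.get("ADMIN_PASSWORD", ""),
    }
    # Assumption: remote storage is used only when token and repo are both set;
    # otherwise the app falls back to local files.
    settings["use_hf"] = bool(settings["token"] and settings["dataset_repo"])
    return settings
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise in tests without touching real secrets.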

6. Test

Open the Space URL → fill the form → Submit. A file lands in submissions/<trial_id>__<username>.json in the dataset repo. Submitting again with the same trial_id + username updates that file.
Open the Admin page (left sidebar) → enter password → see the submission with status pending → add a review (your name + status + comment). It appears in the review timeline and a new file lands under reviews/<submission>/. Add more reviews to build up the history.

Dataset layout

One submission file per (trial_id, username) pair — submitting again updates the same file, so a submission can be loaded back and edited. (Edit history is preserved in the dataset's git commits.) Each review is a separate file, so a submission can be reviewed many times by different people and concurrent reviews never conflict.

submissions/<trial>__<user>.json            # the submission (upserted on each submit)
reviews/<trial>__<user>/<stamp>__<rev>.json # one file per review

To edit an existing submission: on the form, enter the same trial_id + username and click Load existing submission, edit, then Submit.
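
The layout above can be expressed as two small path builders. Both functions are hypothetical. The stamp format is an assumption, as is reading `<rev>` as the reviewer's name; the actual values written by the app may differ.

```python
from datetime import datetime, timezone
from typing import Optional


def submission_path(trial_id: str, username: str) -> str:
    """One file per (trial_id, username) pair, so resubmitting overwrites it."""
    return f"submissions/{trial_id}__{username}.json"


def review_path(trial_id: str, username: str, reviewer: str,
                at: Optional[datetime] = None) -> str:
    """One file per review, so concurrent reviews never touch the same path."""
    at = (at or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = at.strftime("%Y%m%dT%H%M%SZ")  # assumed stamp format
    return f"reviews/{trial_id}__{username}/{stamp}__{reviewer}.json"
```

Because the submission path depends only on trial_id and username, loading an existing submission is a lookup of that single path.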

Submission file (`submissions/*.json`)

{
  "submissionId": "submissions/NCT0001__jdoe.json",
  "createdAt": "2026-06-01T...",
  "updatedAt": "2026-06-04T...",
  "submittedAt": "2026-06-01T...",
  "trial_id": "NCT0001",
  "username": "jdoe",
  "comparison": {
    "trial_id": "NCT0001",
    "username": "jdoe",
    "prompts": [
      {
        "id": "P-001",
        "design_element": "Sample size and power",
        "design_element_other": "",
        "question": "Total target PFS events",
        "question_type": "derivation_required",
        "rubrics": [
          {"artifact": "output.json", "dimension": "Inputs used",      "points": "5", "criterion": "...", "tolerance": "..."},
          {"artifact": "output.json", "dimension": "Calculated value", "points": "5", "criterion": "...", "tolerance": "±5%"},
          {"artifact": "output.json", "dimension": "Method",            "points": "5", "criterion": "...", "tolerance": "..."},
          {"artifact": "output.R",    "dimension": "Reproducibility",   "points": "5", "criterion": "...", "tolerance": "..."}
        ]
      }
    ]
  }
}
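
A minimal sketch of a structural check on a submission document shaped like the example above. `validate_submission` is hypothetical and is not the validator in lib/schema.py. The rubric counts come from the rules in What it does; treating points as a numeric string is an assumption based on the example values.

```python
EXPECTED_RUBRICS = {"extraction_only": 1, "derivation_required": 4}


def validate_submission(doc: dict) -> list:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    for key in ("trial_id", "username"):
        if not str(doc.get(key, "")).strip():
            problems.append(f"missing {key}")

    prompts = doc.get("comparison", {}).get("prompts", [])
    if not prompts:
        problems.append("no prompts")

    for prompt in prompts:
        pid = prompt.get("id", "?")
        qtype = prompt.get("question_type")
        rubrics = prompt.get("rubrics", [])
        expected = EXPECTED_RUBRICS.get(qtype)
        if expected is None:
            problems.append(f"{pid}: unknown question_type {qtype!r}")
        elif len(rubrics) != expected:
            problems.append(f"{pid}: expected {expected} rubrics, found {len(rubrics)}")
        for rubric in rubrics:
            # Assumption: points must parse as a number, e.g. "5".
            try:
                float(rubric.get("points", ""))
            except (TypeError, ValueError):
                problems.append(f"{pid}: non-numeric points for {rubric.get('dimension')!r}")
    return problems
```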

Review file (`reviews/<submission>/*.json`)

{
  "submissionId": "submissions/NCT0001__jdoe.json",
  "at": "2026-06-01T16:00:00+00:00",
  "reviewer": "Dr. Lee",
  "status": "needs_fix",
  "note": "still missing the power assumption"
}

The current status of a submission is derived as the most recent review's status (or pending if it has no reviews yet).
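
The derivation rule can be sketched in a few lines. `current_status` is a hypothetical helper; it assumes every review carries an ISO 8601 `at` timestamp with an explicit offset, as in the example review file.

```python
from datetime import datetime
from typing import Iterable, Mapping


def current_status(reviews: Iterable[Mapping[str, str]]) -> str:
    """Status of the most recent review, or "pending" when there are none."""
    latest = max(
        reviews,
        # Parse the timestamp so ordering does not depend on string layout.
        key=lambda r: datetime.fromisoformat(r["at"]),
        default=None,
    )
    return latest["status"] if latest else "pending"
```

Because the status is computed from the review files rather than stored on the submission, adding a review never requires rewriting the submission file.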

Load everything in Python

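import glob
import json
import os

from huggingface_hub import snapshot_download


def load_json(path):
    """Read one JSON file and close the handle afterwards."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# The dataset repo is private: set HF_TOKEN in the environment
# or pass token=... so the download is authenticated.
local = snapshot_download("ttt-77/tdb-intake-submissions", repo_type="dataset")

# submissions keyed by base name, e.g. "NCT0001__jdoe"
submissions = {
    os.path.splitext(os.path.basename(f))[0]: load_json(f)
    for f in glob.glob(f"{local}/submissions/*.json")
}
# reviews grouped by submission base name
reviews = {}
for f in glob.glob(f"{local}/reviews/*/*.json"):
    base = os.path.basename(os.path.dirname(f))
    reviews.setdefault(base, []).append(load_json(f))
for base in reviews:
    reviews[base].sort(key=lambda r: r["at"])  # oldest first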

Project structure

.
├── app.py                  # main intake form (entry point for HF Space)
├── pages/
│   └── 1_Admin.py          # admin review page (shown in sidebar)
├── lib/
│   ├── __init__.py
│   ├── schema.py           # constants, defaults, validators
│   └── storage.py          # HF Dataset I/O + local fs fallback + admin password check
├── requirements.txt
└── README.md

Privacy notes

The dataset repo should be private.
HF_TOKEN and ADMIN_PASSWORD live only in Space secrets — never commit them.
Rotate the token periodically.
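
For the password gate, a constant-time comparison avoids leaking information through response timing. The sketch below is an assumption about how such a check could be written; `check_admin_password` is a hypothetical name and is not necessarily what lib/storage.py does.

```python
import hmac
import os
from typing import Optional


def check_admin_password(candidate: str, expected: Optional[str] = None) -> bool:
    """Compare a typed password with ADMIN_PASSWORD in constant time."""
    if expected is None:
        expected = os.environ.get("ADMIN_PASSWORD", "")
    if not expected:
        # No password configured: refuse access instead of accepting "".
        return False
    return hmac.compare_digest(candidate.encode("utf-8"), expected.encode("utf-8"))
```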

Extending with Python ML libs

NLP or model-based checks can be added in lib/ with a few lines. Examples:

spaCy for entity extraction on submitted SAP excerpts
sentence-transformers for semantic dedup of similar questions
huggingface_hub.InferenceClient for LLM-as-judge on the criterion text
pandas directly in the admin page for batch stats / CSV export
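
As a dependency-free illustration of the CSV export idea, the sketch below flattens submissions into one row per rubric using the standard library's csv module in place of pandas. `FIELDS`, `rubric_rows` and `to_csv` are hypothetical names, and the column choice is an assumption based on the submission file example.

```python
import csv
import io
from typing import Iterable, Iterator

FIELDS = [
    "trial_id", "username", "prompt_id", "question_type",
    "artifact", "dimension", "points", "tolerance", "criterion",
]


def rubric_rows(submission: dict) -> Iterator[dict]:
    """Yield one flat row per rubric in a submission document."""
    for prompt in submission.get("comparison", {}).get("prompts", []):
        for rubric in prompt.get("rubrics", []):
            yield {
                "trial_id": submission.get("trial_id", ""),
                "username": submission.get("username", ""),
                "prompt_id": prompt.get("id", ""),
                "question_type": prompt.get("question_type", ""),
                "artifact": rubric.get("artifact", ""),
                "dimension": rubric.get("dimension", ""),
                "points": rubric.get("points", ""),
                "tolerance": rubric.get("tolerance", ""),
                "criterion": rubric.get("criterion", ""),
            }


def to_csv(submissions: Iterable[dict]) -> str:
    """Render all rubric rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for submission in submissions:
        writer.writerows(rubric_rows(submission))
    return buf.getvalue()
```

The same row dictionaries could be passed to `pandas.DataFrame` if pandas is available in the Space.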