---
title: TDB Intake
emoji: 🔬
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: "1.39.0"
app_file: app.py
pinned: false
---

# Trial Design Benchmark: Intake

A Streamlit intake form for trial statisticians. Submissions are saved to a **Hugging Face Dataset** repo. An **Admin** page (in the sidebar) lets reviewers triage submissions (`pending` / `reviewed` / `needs_fix`).

## What it does

- **Reference PDF panel**: a wide two-column layout shows, on the left,
  open-in-new-tab links to the document's `sap.pdf` / `protocol.pdf` in the
  public `trialdesignbench/source` dataset. The entered `trial_id` is used
  directly as the document id (e.g. `10.1200_jco.22.01989`), so there's no
  ambiguous NCT→document mapping.
- **Form (`app.py`)**: statisticians enter `trial_id`, `username`, and a list of questions. Each question has:
  - `design_element` (dropdown; when "Others" is picked, a free-text input appears)
  - `question_type` (dropdown: `extraction_only` / `derivation_required`)
  - `question` (free text)
  - **Rubric dimensions** auto-generated by question type:
    - `extraction_only` → 1 dimension on `output.json`
    - `derivation_required` → 3 dimensions on `output.json`: {Inputs used, Calculated value, Method}
  - Under each dimension you can add **multiple criteria**; each criterion has its own `criterion` text, `importance` (HIGH / medium / low), and `tolerance`.
  - **Versions**: every Submit saves a new version. Re-enter the same `trial_id` + `username`, click **Find versions**, pick one, and **Load selected version** to pull it back into the form for editing; Submit then saves a new version.
- **Admin page (`pages/1_Admin.py`)**: password-gated review console.
  - Shows **only the latest version of each trial** (one row per `trial_id` + `username`). Submitters can still see and load all their own versions on the form.
  - The questionnaire is rendered in the same layout as the form (read-only).
  - Reviewers can add reviews **per question** *and* an overall review.
  - Review history covers **all versions**: each review is tagged with its version, and per-question reviews are tied to their question.
  - The trial's current status reflects the latest version's most recent overall review.
  - Each review is its own file under `reviews/<trial>__<user>/<version>/`.
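
The auto-generated rubric dimensions described above can be pictured as a simple lookup keyed by `question_type`. This is a minimal sketch, not the code in `lib/schema.py`: the names `DIMENSIONS_BY_TYPE` and `default_rubrics`, the `criteria` field, and the dimension label used for `extraction_only` are all hypothetical, since the source only says that type gets one dimension.

```python
# Hypothetical lookup: question type -> rubric dimensions on output.json.
# The three derivation dimensions come from the README; the single
# extraction dimension name is a placeholder (the README does not name it).
DIMENSIONS_BY_TYPE = {
    "extraction_only": ["Extracted value"],
    "derivation_required": ["Inputs used", "Calculated value", "Method"],
}


def default_rubrics(question_type):
    """Return empty rubric skeletons for a question type (illustrative shape)."""
    try:
        dimensions = DIMENSIONS_BY_TYPE[question_type]
    except KeyError:
        raise ValueError(f"unknown question_type: {question_type!r}") from None
    # Each dimension starts with no criteria; the form lets users add several.
    return [
        {"artifact": "output.json", "dimension": name, "criteria": []}
        for name in dimensions
    ]


if __name__ == "__main__":
    for rubric in default_rubrics("derivation_required"):
        print(rubric["dimension"])
```

Keeping the mapping in one table means a new question type would only need a new entry, assuming the form reads its dimensions from a structure like this.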

## Run locally

```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
streamlit run app.py
```

Without HF env vars set, submissions land in `./data/submissions/<...>.json` on disk, which is fine for dev.
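
The fallback behaviour can be sketched roughly as follows. This is an illustration under assumptions, not the contents of `lib/storage.py`: the function name `save_json`, its parameters, and its return value are hypothetical. The only library call used is `HfApi.upload_file` from `huggingface_hub`, imported lazily so the local path works without the package installed.

```python
import json
import os
from pathlib import Path


def save_json(rel_path, payload, local_root="data"):
    """Save payload to the HF dataset repo if configured, else to local disk.

    Hypothetical helper: returns ("hf", rel_path) or ("local", <file path>).
    """
    body = json.dumps(payload, indent=2).encode("utf-8")
    token = os.environ.get("HF_TOKEN")
    repo = os.environ.get("HF_DATASET_REPO")

    if token and repo:
        # Imported here so local development does not need huggingface_hub.
        from huggingface_hub import HfApi

        HfApi(token=token).upload_file(
            path_or_fileobj=body,
            path_in_repo=rel_path,
            repo_id=repo,
            repo_type="dataset",
            revision=os.environ.get("HF_DATASET_BRANCH", "main"),
        )
        return ("hf", rel_path)

    # No HF configuration: write under ./data (or the given root) instead.
    target = Path(local_root) / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(body)
    return ("local", str(target))
```

Calling `save_json("submissions/demo.json", {"trial_id": "demo"})` with no environment variables set would write `data/submissions/demo.json`; the file name here is purely illustrative.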

## Deploy on Hugging Face Spaces

### 1. Create a private HF Dataset repo

- Sign in at <https://huggingface.co>
- Click your avatar → **New Dataset**
- Owner: your username (e.g. `ttt-77`)
- Name: e.g. `tdb-intake-submissions`
- Visibility: **Private**
- Create. Leave it empty.

### 2. Generate an HF access token

- <https://huggingface.co/settings/tokens> → **New token**
- Token type: **Write**
- Save the `hf_...` string.

### 3. Create the Space

- Click your avatar → **New Space**
- Name: e.g. `tdb-intake`
- SDK: **Streamlit**
- Visibility: your choice (public works; the *form* is intended for public submission, only *data* needs to be private)
- Create. HF gives you a git repo URL.

### 4. Push this code to the Space

```bash
git remote add hf https://huggingface.co/spaces/<your-username>/tdb-intake
git push hf main
```

Or, in the HF Space's **Settings → Repository**, link this GitHub repo and HF will auto-sync on push.

### 5. Add Space secrets

In the Space → **Settings → Variables and secrets** → add as **secrets**:

| Name | Value |
| --- | --- |
| `HF_TOKEN` | the token from step 2 |
| `HF_DATASET_REPO` | `<your-username>/tdb-intake-submissions` |
| `HF_DATASET_BRANCH` | `main` (optional, defaults to `main`) |
| `ADMIN_PASSWORD` | a password to share with reviewers |

The Space will restart automatically and pick up the new secrets.
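
Space secrets are exposed to the running app as environment variables, so reading them can look like the sketch below. The helper names `load_settings` and `check_admin_password` are hypothetical and are not taken from `lib/storage.py`; the constant-time comparison via `hmac.compare_digest` is a suggested practice, not something the source confirms.

```python
import hmac
import os


def load_settings(env=os.environ):
    """Collect the four settings from the table above (hypothetical helper)."""
    return {
        "token": env.get("HF_TOKEN"),
        "repo": env.get("HF_DATASET_REPO"),
        # Optional variable; falls back to "main" when unset or empty.
        "branch": env.get("HF_DATASET_BRANCH") or "main",
        "admin_password": env.get("ADMIN_PASSWORD"),
    }


def check_admin_password(entered, settings):
    """Compare the entered password with ADMIN_PASSWORD in constant time."""
    expected = settings["admin_password"]
    if not expected:
        # Assumption: with no password configured, the admin page stays locked.
        return False
    return hmac.compare_digest(entered.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    demo = load_settings({"ADMIN_PASSWORD": "s3cret"})
    print(demo["branch"], check_admin_password("s3cret", demo))
```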

### 6. Test

- Open the Space URL → fill the form → **Submit**. A file lands in `submissions/<trial_id>__<username>/<stamp>.json` in the dataset repo. Submitting again saves another version in the same folder.
- Open the **Admin** page (left sidebar) → enter the password → see the submission with status `pending` → add a review (your name + status + comment). It appears in the review timeline, and a new file lands under `reviews/<trial_id>__<username>/<stamp>/`. Add more reviews to build up the history.

## Dataset layout

Every submit saves a **new version** under a per-pair (`trial_id` + `username`) folder; nothing is
overwritten, so the full version history is kept and any version can be loaded
back. Each review is a **separate file** keyed to a specific version, so a
version can be reviewed many times by different people and concurrent reviews
never conflict.

```text
submissions/<trial>__<user>/<stamp>.json            # one file per version
reviews/<trial>__<user>/<stamp>/<revstamp>__<rev>.json  # one file per review of that version
```
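
A small sketch of how these paths could be built and parsed. The function names are hypothetical, `<rev>` is assumed to be a reviewer identifier, and the stamp values are passed in as opaque strings because their exact format is not specified here.

```python
def submission_path(trial_id, username, stamp):
    """Path of one submitted version inside the dataset repo."""
    return f"submissions/{trial_id}__{username}/{stamp}.json"


def review_path(trial_id, username, stamp, review_stamp, reviewer):
    """Path of one review of a given version (reviewer naming is an assumption)."""
    return f"reviews/{trial_id}__{username}/{stamp}/{review_stamp}__{reviewer}.json"


def parse_submission_path(path):
    """Split a submission path back into (trial_id, username, stamp).

    Assumes trial_id itself contains no double underscore.
    """
    prefix, pair, filename = path.split("/")
    if prefix != "submissions" or not filename.endswith(".json"):
        raise ValueError(f"not a submission path: {path!r}")
    trial_id, sep, username = pair.partition("__")
    if not sep:
        raise ValueError(f"missing '__' separator in {pair!r}")
    return trial_id, username, filename[: -len(".json")]


if __name__ == "__main__":
    p = submission_path("NCT0001", "jdoe", "STAMP")
    print(p)
    print(parse_submission_path(p))
```

Because each review gets its own file name, two reviewers writing at the same time create two files rather than editing one, which is why concurrent reviews do not conflict.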

To load/edit a previous version: on the form, enter the same `trial_id` +
`username`, click **Find versions**, pick a version, click **Load selected
version**, edit, then **Submit** (which saves a new version).

### Submission file (`submissions/<trial>__<user>/<stamp>.json`)

```json
{
  "submissionId": "submissions/NCT0001__jdoe/2026-06-04T...Z.json",
  "version": "2026-06-04T...Z",
  "submittedAt": "2026-06-04T...",
  "trial_id": "NCT0001",
  "username": "jdoe",
  "comparison": {
    "trial_id": "NCT0001",
    "username": "jdoe",
    "prompts": [
      {
        "id": "P-001",
        "design_element": "Sample size and power",
        "design_element_other": "",
        "question": "Total target PFS events",
        "question_type": "derivation_required",
        "rubrics": [
          {"artifact": "output.json", "dimension": "Inputs used",      "points": "5", "criterion": "...", "tolerance": "..."},
          {"artifact": "output.json", "dimension": "Calculated value", "points": "5", "criterion": "...", "tolerance": "±5%"},
          {"artifact": "output.json", "dimension": "Method",            "points": "5", "criterion": "...", "tolerance": "..."},
          {"artifact": "output.R",    "dimension": "Reproducibility",   "points": "5", "criterion": "...", "tolerance": "..."}
        ]
      }
    ]
  }
}
```

### Review file (`reviews/<trial>__<user>/<stamp>/*.json`)

```json
{
  "submissionId": "submissions/NCT0001__jdoe/2026-06-04T...Z.json",
  "at": "2026-06-04T16:00:00+00:00",
  "reviewer": "Dr. Lee",
  "status": "needs_fix",
  "note": "still missing the power assumption",
  "question_id": "P-002"
}
```

`question_id` ties the review to a specific question; an empty `question_id`
means an overall (whole-version) review. The trial's **current status** is the
most recent *overall* review on the latest version (or `pending` if none).
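
The status rule can be expressed in a few lines. This is a sketch with a hypothetical function name (`current_status`); it assumes the input already contains only the reviews of the latest version and that `at` is an ISO-8601 timestamp like the one in the example above.

```python
from datetime import datetime


def current_status(latest_version_reviews):
    """Return the status of the most recent overall review, or "pending"."""
    # An empty or missing question_id marks an overall (whole-version) review.
    overall = [r for r in latest_version_reviews if not r.get("question_id")]
    if not overall:
        return "pending"
    newest = max(overall, key=lambda r: datetime.fromisoformat(r["at"]))
    return newest["status"]


if __name__ == "__main__":
    sample = [
        {"at": "2026-06-04T16:00:00+00:00", "status": "needs_fix", "question_id": ""},
        {"at": "2026-06-05T09:00:00+00:00", "status": "reviewed", "question_id": ""},
        {"at": "2026-06-06T09:00:00+00:00", "status": "needs_fix", "question_id": "P-002"},
    ]
    print(current_status(sample))
```

In the sample data (invented for illustration), the later per-question review does not change the trial status, because only overall reviews count.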

### Load everything in Python

```python
import json
from pathlib import Path

from huggingface_hub import snapshot_download


def load_json(path):
    """Read one JSON file and close the handle afterwards."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# A private dataset needs credentials: log in first or pass token="hf_..." here.
local = Path(snapshot_download("ttt-77/tdb-intake-submissions", repo_type="dataset"))

# every version: submissions/<trial>__<user>/<stamp>.json
submissions = [load_json(f) for f in sorted(local.glob("submissions/*/*.json"))]

# reviews: reviews/<trial>__<user>/<stamp>/<revstamp>__<rev>.json
# key = "<trial>__<user>/<stamp>" (matches a submission's submissionId minus prefix/suffix)
reviews = {}
for f in local.glob("reviews/*/*/*.json"):
    # relative_to() keeps this independent of the OS path separator
    pair, ver = f.relative_to(local / "reviews").parts[:2]
    reviews.setdefault(f"{pair}/{ver}", []).append(load_json(f))
for key in reviews:
    reviews[key].sort(key=lambda r: r["at"])  # oldest first
```
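
To mirror the Admin page, which shows only the latest version per `trial_id` + `username`, the loaded submissions can be reduced as sketched below. The function name `latest_versions` is hypothetical, and the sketch assumes that `version` stamps sort lexicographically in chronological order, which holds for fixed-width ISO-8601 UTC stamps but is not confirmed by the source.

```python
def latest_versions(submissions):
    """Keep only the newest submission for each (trial_id, username) pair."""
    latest = {}
    for sub in submissions:
        key = (sub["trial_id"], sub["username"])
        # Assumption: a larger version string means a later submission.
        if key not in latest or sub["version"] > latest[key]["version"]:
            latest[key] = sub
    return latest


if __name__ == "__main__":
    # Illustrative records only; real files carry more fields.
    demo = [
        {"trial_id": "NCT0001", "username": "jdoe", "version": "2026-01-01T00:00:00Z"},
        {"trial_id": "NCT0001", "username": "jdoe", "version": "2026-02-01T00:00:00Z"},
        {"trial_id": "NCT0002", "username": "jdoe", "version": "2026-01-15T00:00:00Z"},
    ]
    for (trial, user), sub in latest_versions(demo).items():
        print(trial, user, sub["version"])
```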

## Project structure

```text
.
├── app.py                  # main intake form (entry point for HF Space)
├── pages/
│   └── 1_Admin.py          # admin review page (shown in sidebar)
├── lib/
│   ├── __init__.py
│   ├── schema.py           # constants, defaults, validators
│   └── storage.py          # HF Dataset I/O + local fs fallback + admin password check
├── requirements.txt
└── README.md
```

## Privacy notes

- The dataset repo should be **private**.
- `HF_TOKEN` and `ADMIN_PASSWORD` live only in Space secrets; never commit them.
- Rotate the token periodically.

## Extending with Python ML libs

Adding NLP / model checks takes only a few lines in `lib/`. Examples:

- `spaCy` for entity extraction on submitted SAP excerpts
- `sentence-transformers` for semantic dedup of similar questions
- `huggingface_hub.InferenceClient` for LLM-as-judge on the criterion text
- `pandas` directly in the admin page for batch stats / CSV export