tttjjj committed
Commit a9aec68 · 1 Parent(s): 089c7d1

Save every submit as a new version; let users pick a version to load


- save_submission() now writes submissions/<trial>__<user>/<stamp>.json per
submit (no overwrite); full version history is kept.
- add list_versions(trial_id, username); reviews are keyed per-version under
reviews/<trial>__<user>/<stamp>/.
- Form: 'Find versions' lists all versions for a trial_id+username; a selectbox
+ 'Load selected version' pulls the chosen one in for editing.
- Admin/list_submissions treats each version as its own row with its own
reviews; README + Python loader updated for the nested layout.
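
The nested layout described above can be sketched in a few lines. This is a minimal illustration, not the code from `lib/storage.py`: `safe()` is a hypothetical stand-in for the repo's `_safe()` (whose body is not shown in this diff), and `version_path`, `review_dir`, and `newest_first` are illustrative names. Only the `":"`/`"."` replacement in `stamp()` and the path shapes are taken from the commit.

```python
import re
from typing import List

SUBMISSIONS_PREFIX = "submissions"
REVIEWS_PREFIX = "reviews"


def safe(part: str) -> str:
    # Hypothetical sanitizer; the real _safe() is not shown in this diff.
    return re.sub(r"[^A-Za-z0-9_.-]", "_", part)


def stamp(iso: str) -> str:
    # Same replacements as _stamp(): make an ISO timestamp file-name friendly.
    return iso.replace(":", "-").replace(".", "-")


def version_path(trial_id: str, username: str, iso: str) -> str:
    # One file per submit: submissions/<trial>__<user>/<stamp>.json
    return f"{SUBMISSIONS_PREFIX}/{safe(trial_id)}__{safe(username)}/{stamp(iso)}.json"


def review_dir(submission_id: str) -> str:
    # Reviews for one version live under reviews/<trial>__<user>/<stamp>/
    base = submission_id[len(SUBMISSIONS_PREFIX) + 1 : -len(".json")]
    return f"{REVIEWS_PREFIX}/{base}/"


def newest_first(files: List[str], trial_id: str, username: str) -> List[str]:
    # Stamps built from same-format ISO timestamps sort lexicographically in
    # time order, so a reverse string sort puts the newest version first.
    prefix = f"{SUBMISSIONS_PREFIX}/{safe(trial_id)}__{safe(username)}/"
    return sorted(
        (f for f in files if f.startswith(prefix) and f.endswith(".json")),
        reverse=True,
    )
```

The trailing `/` in the prefix matters: without it, a lookup for `NCT0001__jdoe` would also match files under `NCT0001__jdoe2/`.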

Files changed (3)
  1. README.md +27 -27
  2. app.py +74 -18
  3. lib/storage.py +66 -36
README.md CHANGED
@@ -23,7 +23,7 @@ A Streamlit intake form for trial statisticians. Submissions are saved to a **Hu
23
  - `extraction_only` → 1 rubric: `output.json`
24
  - `derivation_required` → 4 rubrics: `output.json` × {Inputs used, Calculated value, Method} + `output.R` × {Reproducibility}
25
  - Each rubric collects `points`, `tolerance`, `criterion`.
26
- - **Load existing submission** — re-enter the same `trial_id` + `username` and click Load to pull a previous submission back into the form, edit it, and Submit again to update.
27
  - **Admin page (`pages/1_Admin.py`)** — password-gated review console. A submission can be reviewed many times by different people: each review (status + reviewer name + comment) is written as its own file under `reviews/<submission>/`, and the page shows the full timeline. The current status is the most recent review's status.
28
 
29
  ## Run locally
@@ -85,33 +85,33 @@ The Space will restart automatically and pick up the new secrets.
85
 
86
  ### 6. Test
87
 
88
- - Open the Space URL → fill the form → **Submit**. A file lands in `submissions/<trial_id>__<username>.json` in the dataset repo. Submitting again with the same trial_id + username updates that file.
89
  - Open the **Admin** page (left sidebar) → enter password → see the submission with status `pending` → add a review (your name + status + comment). It appears in the review timeline and a new file lands under `reviews/<submission>/`. Add more reviews to build up the history.
90
 
91
  ## Dataset layout
92
 
93
- One submission file per `(trial_id, username)` pair — submitting again
94
- **updates** the same file, so a submission can be loaded back and edited.
95
- (Edit history is preserved in the dataset's git commits.) Each review is a
96
- **separate file**, so a submission can be reviewed many times by different
97
- people and concurrent reviews never conflict.
98
 
99
  ```text
100
- submissions/<trial>__<user>.json # the submission (upserted on each submit)
101
- reviews/<trial>__<user>/<stamp>__<rev>.json # one file per review
102
  ```
103
 
104
- To edit an existing submission: on the form, enter the same `trial_id` +
105
- `username` and click **Load existing submission**, edit, then **Submit**.
 
106
 
107
- ### Submission file (`submissions/*.json`)
108
 
109
  ```json
110
  {
111
- "submissionId": "submissions/NCT0001__jdoe.json",
112
- "createdAt": "2026-06-01T...",
113
- "updatedAt": "2026-06-04T...",
114
- "submittedAt": "2026-06-01T...",
115
  "trial_id": "NCT0001",
116
  "username": "jdoe",
117
  "comparison": {
@@ -136,7 +136,7 @@ To edit an existing submission: on the form, enter the same `trial_id` +
136
  }
137
  ```
138
 
139
- ### Review file (`reviews/<submission>/*.json`)
140
 
141
  ```json
142
  {
@@ -159,17 +159,17 @@ import json, glob, os
159
 
160
  local = snapshot_download("ttt-77/tdb-intake-submissions", repo_type="dataset")
161
 
162
- submissions = {
163
- os.path.basename(f)[:-5]: json.load(open(f))
164
- for f in glob.glob(f"{local}/submissions/*.json")
165
- }
166
- # reviews grouped by submission base name
167
  reviews = {}
168
- for f in glob.glob(f"{local}/reviews/*/*.json"):
169
- base = os.path.basename(os.path.dirname(f))
170
- reviews.setdefault(base, []).append(json.load(open(f)))
171
- for base in reviews:
172
- reviews[base].sort(key=lambda r: r["at"]) # oldest first
173
  ```
174
 
175
  ## Project structure
 
23
  - `extraction_only` → 1 rubric: `output.json`
24
  - `derivation_required` → 4 rubrics: `output.json` × {Inputs used, Calculated value, Method} + `output.R` × {Reproducibility}
25
  - Each rubric collects `points`, `tolerance`, `criterion`.
26
+ - **Versions** — every Submit saves a new version. Re-enter the same `trial_id` + `username`, click **Find versions**, pick one, and **Load selected version** to pull it back into the form for editing; Submit then saves a new version.
27
  - **Admin page (`pages/1_Admin.py`)** — password-gated review console. A submission can be reviewed many times by different people: each review (status + reviewer name + comment) is written as its own file under `reviews/<submission>/`, and the page shows the full timeline. The current status is the most recent review's status.
28
 
29
  ## Run locally
 
85
 
86
  ### 6. Test
87
 
88
+ - Open the Space URL → fill the form → **Submit**. A file lands in `submissions/<trial_id>__<username>/<stamp>.json` in the dataset repo. Submitting again saves another version in the same folder.
89
  - Open the **Admin** page (left sidebar) → enter password → see the submission with status `pending` → add a review (your name + status + comment). It appears in the review timeline and a new file lands under `reviews/<submission>/`. Add more reviews to build up the history.
90
 
91
  ## Dataset layout
92
 
93
+ Every submit saves a **new version** under a per-pair folder; nothing is
94
+ overwritten, so the full version history is kept and any version can be loaded
95
+ back. Each review is a **separate file** keyed to a specific version, so a
96
+ version can be reviewed many times by different people and concurrent reviews
97
+ never conflict.
98
 
99
  ```text
100
+ submissions/<trial>__<user>/<stamp>.json # one file per version
101
+ reviews/<trial>__<user>/<stamp>/<revstamp>__<rev>.json # one file per review of that version
102
  ```
103
 
104
+ To load/edit a previous version: on the form, enter the same `trial_id` +
105
+ `username`, click **Find versions**, pick a version, click **Load selected
106
+ version**, edit, then **Submit** (which saves a new version).
107
 
108
+ ### Submission file (`submissions/<trial>__<user>/<stamp>.json`)
109
 
110
  ```json
111
  {
112
+ "submissionId": "submissions/NCT0001__jdoe/2026-06-04T...Z.json",
113
+ "version": "2026-06-04T...Z",
114
+ "submittedAt": "2026-06-04T...",
 
115
  "trial_id": "NCT0001",
116
  "username": "jdoe",
117
  "comparison": {
 
136
  }
137
  ```
138
 
139
+ ### Review file (`reviews/<trial>__<user>/<stamp>/*.json`)
140
 
141
  ```json
142
  {
 
159
 
160
  local = snapshot_download("ttt-77/tdb-intake-submissions", repo_type="dataset")
161
 
162
+ # every version: submissions/<trial>__<user>/<stamp>.json
163
+ submissions = [json.load(open(f)) for f in glob.glob(f"{local}/submissions/*/*.json")]
164
+
165
+ # reviews: reviews/<trial>__<user>/<stamp>/<revstamp>__<rev>.json
166
+ # key = "<trial>__<user>/<stamp>" (matches a submission's submissionId minus prefix/suffix)
167
  reviews = {}
168
+ for f in glob.glob(f"{local}/reviews/*/*/*.json"):
169
+ pair, ver = f.split("/reviews/")[1].split("/")[:2]
170
+ reviews.setdefault(f"{pair}/{ver}", []).append(json.load(open(f)))
171
+ for key in reviews:
172
+ reviews[key].sort(key=lambda r: r["at"]) # oldest first
173
  ```
174
 
175
  ## Project structure
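
For readers who want each version together with its reviews, the updated README loader above could be extended roughly as follows. This is a sketch under assumptions: `load_versions_with_reviews` is a hypothetical helper name, and it uses `pathlib` so the `<trial>__<user>/<stamp>` key does not depend on the platform's path separator (the README snippet splits on `"/reviews/"`). The `at` field used for sorting is the one the README loader already sorts on.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_versions_with_reviews(local: str) -> Dict[str, Dict[str, Any]]:
    """Map '<trial>__<user>/<stamp>' -> {'submission': ..., 'reviews': [...]}."""
    root = Path(local)
    out: Dict[str, Dict[str, Any]] = {}

    # submissions/<trial>__<user>/<stamp>.json
    for f in sorted(root.glob("submissions/*/*.json")):
        key = f"{f.parent.name}/{f.stem}"
        out[key] = {
            "submission": json.loads(f.read_text(encoding="utf-8")),
            "reviews": [],
        }

    # reviews/<trial>__<user>/<stamp>/<revstamp>__<rev>.json
    for f in sorted(root.glob("reviews/*/*/*.json")):
        key = f"{f.parent.parent.name}/{f.parent.name}"
        if key in out:  # skip reviews whose version file is missing
            out[key]["reviews"].append(json.loads(f.read_text(encoding="utf-8")))

    # Oldest review first, so the last entry carries the current status.
    for entry in out.values():
        entry["reviews"].sort(key=lambda r: r["at"])
    return out
```

A version with an empty `reviews` list corresponds to the `pending` status shown on the Admin page.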
app.py CHANGED
@@ -22,7 +22,12 @@ from lib.schema import (
22
  next_question_id,
23
  rubrics_for_type,
24
  )
25
- from lib.storage import get_submission_by_key, hf_configured, save_submission
 
 
 
 
 
26
 
27
  st.set_page_config(
28
  page_title="TDB Intake",
@@ -44,6 +49,9 @@ if "last_result" not in st.session_state:
44
  # per-question widgets get fresh keys and actually show the new values.
45
  if "form_nonce" not in st.session_state:
46
  st.session_state.form_nonce = 0
 
 
 
47
 
48
 
49
  # ------------- callbacks -------------------------------------------------
@@ -70,36 +78,59 @@ def _save_draft() -> None:
70
  st.session_state.last_result = {"kind": "draft", "msg": "Draft saved in this browser session."}
71
 
72
 
73
- def _load() -> None:
74
- """Load an existing submission by (trial_id, username) into the form."""
75
  trial_id = st.session_state.trial_id.strip()
76
  username = st.session_state.username.strip()
77
  if not trial_id or not username:
 
78
  st.session_state.last_result = {
79
  "kind": "error",
80
- "msg": "Enter trial_id and username, then click Load.",
81
  }
82
  return
83
  try:
84
- record = get_submission_by_key(trial_id, username)
85
  except Exception as e:
86
- st.session_state.last_result = {"kind": "error", "msg": f"Load failed: {e}"}
 
87
  return
88
- if not record:
 
89
  st.session_state.last_result = {
90
  "kind": "info",
91
- "msg": f"No existing submission for `{trial_id}` / `{username}`. "
92
- "Add questions and Submit to create one.",
93
  }
94
  return
95
  prompts = (record.get("comparison") or {}).get("prompts") or []
96
  st.session_state.questions = prompts
97
  st.session_state.form_nonce += 1 # force question widgets to refresh
98
- updated = record.get("updatedAt") or record.get("submittedAt") or ""
99
  st.session_state.last_result = {
100
  "kind": "success",
101
- "msg": f"Loaded {len(prompts)} question(s) (last updated {updated}). "
102
- "Edit and Submit to update.",
103
  }
104
 
105
 
@@ -117,13 +148,17 @@ def _submit() -> None:
117
  }
118
  try:
119
  result = save_submission(trial_id, username, comparison)
120
- verb = "Updated" if result.get("updated") else "Submitted"
121
  st.session_state.last_result = {
122
  "kind": "success",
123
- "msg": f"{verb}: `{result['submissionId']}`. "
124
- "You can keep editing and Submit again to update.",
125
  "url": result.get("url"),
126
  }
 
 
 
 
 
127
  # Keep the form populated so the user can continue editing.
128
  except Exception as e:
129
  st.session_state.last_result = {"kind": "error", "msg": f"Submit failed: {e}"}
@@ -149,11 +184,32 @@ with c2:
149
  st.text_input("username", key="username", placeholder="e.g., jdoe")
150
 
151
  st.button(
152
- "Load existing submission",
153
- on_click=_load,
154
- help="If you already submitted for this trial_id + username, load it back to edit.",
155
  )
156
 
157
  st.divider()
158
 
159
  # ------------- questions list --------------------------------------------
 
22
  next_question_id,
23
  rubrics_for_type,
24
  )
25
+ from lib.storage import (
26
+ get_submission,
27
+ hf_configured,
28
+ list_versions,
29
+ save_submission,
30
+ )
31
 
32
  st.set_page_config(
33
  page_title="TDB Intake",
 
49
  # per-question widgets get fresh keys and actually show the new values.
50
  if "form_nonce" not in st.session_state:
51
  st.session_state.form_nonce = 0
52
+ # Versions found for the current trial_id + username (after "Find versions").
53
+ if "versions" not in st.session_state:
54
+ st.session_state.versions = []
55
 
56
 
57
  # ------------- callbacks -------------------------------------------------
 
78
  st.session_state.last_result = {"kind": "draft", "msg": "Draft saved in this browser session."}
79
 
80
 
81
+ def _find_versions() -> None:
82
+ """Look up all saved versions for the current trial_id + username."""
83
  trial_id = st.session_state.trial_id.strip()
84
  username = st.session_state.username.strip()
85
  if not trial_id or not username:
86
+ st.session_state.versions = []
87
  st.session_state.last_result = {
88
  "kind": "error",
89
+ "msg": "Enter trial_id and username, then click Find versions.",
90
  }
91
  return
92
  try:
93
+ versions = list_versions(trial_id, username)
94
  except Exception as e:
95
+ st.session_state.versions = []
96
+ st.session_state.last_result = {"kind": "error", "msg": f"Lookup failed: {e}"}
97
  return
98
+ st.session_state.versions = versions
99
+ if not versions:
100
  st.session_state.last_result = {
101
  "kind": "info",
102
+ "msg": f"No versions yet for `{trial_id}` / `{username}`. "
103
+ "Add questions and Submit to create the first one.",
104
  }
105
+ else:
106
+ st.session_state.last_result = {
107
+ "kind": "success",
108
+ "msg": f"Found {len(versions)} version(s). Pick one below and click "
109
+ "“Load selected version”.",
110
+ }
111
+
112
+
113
+ def _load_selected() -> None:
114
+ """Load the version chosen in the version selectbox into the form."""
115
+ sub_id = st.session_state.get("version_select")
116
+ if not sub_id:
117
+ st.session_state.last_result = {"kind": "error", "msg": "Pick a version first."}
118
+ return
119
+ try:
120
+ record = get_submission(sub_id)
121
+ except Exception as e:
122
+ st.session_state.last_result = {"kind": "error", "msg": f"Load failed: {e}"}
123
+ return
124
+ if not record:
125
+ st.session_state.last_result = {"kind": "error", "msg": "That version could not be loaded."}
126
  return
127
  prompts = (record.get("comparison") or {}).get("prompts") or []
128
  st.session_state.questions = prompts
129
  st.session_state.form_nonce += 1 # force question widgets to refresh
 
130
  st.session_state.last_result = {
131
  "kind": "success",
132
+ "msg": f"Loaded version {record.get('version', '')} "
133
+ f"({len(prompts)} question(s)). Edit and Submit to save a new version.",
134
  }
135
 
136
 
 
148
  }
149
  try:
150
  result = save_submission(trial_id, username, comparison)
 
151
  st.session_state.last_result = {
152
  "kind": "success",
153
+ "msg": f"Saved as new version `{result['version']}`. "
154
+ "Use “Find versions” to see all versions.",
155
  "url": result.get("url"),
156
  }
157
+ # Refresh the version list so the new version shows up.
158
+ try:
159
+ st.session_state.versions = list_versions(trial_id, username)
160
+ except Exception:
161
+ pass
162
  # Keep the form populated so the user can continue editing.
163
  except Exception as e:
164
  st.session_state.last_result = {"kind": "error", "msg": f"Submit failed: {e}"}
 
184
  st.text_input("username", key="username", placeholder="e.g., jdoe")
185
 
186
  st.button(
187
+ "Find versions",
188
+ on_click=_find_versions,
189
+ help="List all previously submitted versions for this trial_id + username.",
190
  )
191
 
192
+ versions = st.session_state.versions
193
+ if versions:
194
+ options = [v["submissionId"] for v in versions]
195
+ labels = {
196
+ v["submissionId"]: f"{v['submittedAt']} · v{v['version']} · "
197
+ f"{v['num_questions']} question(s)"
198
+ for v in versions
199
+ }
200
+ vc1, vc2 = st.columns([3, 1])
201
+ with vc1:
202
+ st.selectbox(
203
+ "Select a version to load",
204
+ options=options,
205
+ format_func=lambda sid: labels.get(sid, sid),
206
+ key="version_select",
207
+ )
208
+ with vc2:
209
+ st.write("")
210
+ st.write("")
211
+ st.button("Load selected version", on_click=_load_selected, use_container_width=True)
212
+
213
  st.divider()
214
 
215
  # ------------- questions list --------------------------------------------
lib/storage.py CHANGED
@@ -59,7 +59,27 @@ def _stamp(iso: Optional[str] = None) -> str:
59
  return (iso or _now_iso()).replace(":", "-").replace(".", "-")
60
 
61
 
 
 
 
 
 
62
  def _base_id(submission_id: str) -> str:
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
63
  """'submissions/foo.json' -> 'foo'"""
64
  name = submission_id.split("/")[-1]
65
  return name[:-5] if name.endswith(".json") else name
@@ -134,56 +154,66 @@ def _all_files() -> List[str]:
134
 
135
  # ---- public API ----------------------------------------------------------
136
 
137
- def submission_id_for(trial_id: str, username: str) -> str:
138
- """Stable submission id (path) for a (trial_id, username) pair.
139
-
140
- One submission per pair — submitting again updates the same file, so a
141
- submission can be loaded back and edited.
142
- """
143
- return f"{SUBMISSIONS_PREFIX}/{_safe(trial_id)}__{_safe(username)}.json"
144
-
145
-
146
- def get_submission_by_key(trial_id: str, username: str) -> Optional[Dict[str, Any]]:
147
- """Load an existing submission by (trial_id, username), or None."""
148
- return get_submission(submission_id_for(trial_id, username))
149
-
150
-
151
  def save_submission(trial_id: str, username: str, comparison: Dict[str, Any]) -> Dict[str, Any]:
152
- """Create or update the submission for (trial_id, username).
153
 
154
- If a submission already exists for this pair, it is updated in place
155
- (createdAt is preserved); otherwise a new one is created.
 
156
  """
157
- submission_id = submission_id_for(trial_id, username)
158
  now = _now_iso()
159
- existing = get_submission(submission_id)
160
- created_at = (existing or {}).get("createdAt") or (existing or {}).get("submittedAt") or now
161
- is_update = existing is not None
162
-
163
  record = {
164
  "submissionId": submission_id,
165
- "createdAt": created_at,
166
- "updatedAt": now,
167
- # kept for backward compatibility with older records / admin display
168
- "submittedAt": created_at,
169
  "trial_id": trial_id,
170
  "username": username,
171
  "comparison": comparison,
172
  }
173
- verb = "Update" if is_update else "Add"
174
- _write_json(submission_id, record, f"{verb} submission: {trial_id} — {username}")
 
 
 
175
  url = (
176
  f"https://huggingface.co/datasets/{HF_DATASET_REPO}"
177
  f"/blob/{HF_DATASET_BRANCH}/{submission_id}"
178
  if hf_configured
179
  else None
180
  )
181
- return {
182
- "submissionId": submission_id,
183
- "url": url,
184
- "record": record,
185
- "updated": is_update,
186
- }
 
187
 
188
 
189
  def add_review(submission_id: str, status: str, reviewer: str, note: str = "") -> Dict[str, Any]:
@@ -245,8 +275,8 @@ def list_submissions() -> List[Dict[str, Any]]:
245
  "submissionId": sp,
246
  "trial_id": sub.get("trial_id", ""),
247
  "username": sub.get("username", ""),
 
248
  "submittedAt": sub.get("submittedAt", ""),
249
- "updatedAt": sub.get("updatedAt", sub.get("submittedAt", "")),
250
  "status": latest["status"] if latest else "pending",
251
  "reviewedAt": latest["at"] if latest else "",
252
  "reviewer": latest["reviewer"] if latest else "",
@@ -255,7 +285,7 @@ def list_submissions() -> List[Dict[str, Any]]:
255
  "submission": sub,
256
  }
257
  )
258
- result.sort(key=lambda r: r.get("updatedAt", ""), reverse=True)
259
  return result
260
 
261
 
 
59
  return (iso or _now_iso()).replace(":", "-").replace(".", "-")
60
 
61
 
62
+ def _pair_dir(trial_id: str, username: str) -> str:
63
+ """Folder holding all versions for a (trial_id, username) pair."""
64
+ return f"{SUBMISSIONS_PREFIX}/{_safe(trial_id)}__{_safe(username)}"
65
+
66
+
67
  def _base_id(submission_id: str) -> str:
68
+ """Path of a submission relative to the submissions/ prefix, without .json.
69
+
70
+ 'submissions/NCT99__jdoe/2026-...json' -> 'NCT99__jdoe/2026-...'
71
+ Used to key the matching reviews/ folder so reviews stay grouped per
72
+ (pair, version).
73
+ """
74
+ s = submission_id
75
+ if s.startswith(f"{SUBMISSIONS_PREFIX}/"):
76
+ s = s[len(SUBMISSIONS_PREFIX) + 1 :]
77
+ if s.endswith(".json"):
78
+ s = s[:-5]
79
+ return s
80
+
81
+
82
+ def _legacy_base_id(submission_id: str) -> str:
83
  """'submissions/foo.json' -> 'foo'"""
84
  name = submission_id.split("/")[-1]
85
  return name[:-5] if name.endswith(".json") else name
 
154
 
155
  # ---- public API ----------------------------------------------------------
156
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
157
  def save_submission(trial_id: str, username: str, comparison: Dict[str, Any]) -> Dict[str, Any]:
158
+ """Save a NEW version for (trial_id, username).
159
 
160
+ Every submit creates a new version file under
161
+ submissions/<trial>__<user>/<stamp>.json. Nothing is overwritten, so the
162
+ full version history is kept and any version can be loaded back.
163
  """
 
164
  now = _now_iso()
165
+ version = _stamp(now)
166
+ submission_id = f"{_pair_dir(trial_id, username)}/{version}.json"
 
 
167
  record = {
168
  "submissionId": submission_id,
169
+ "version": version,
170
+ "submittedAt": now,
 
 
171
  "trial_id": trial_id,
172
  "username": username,
173
  "comparison": comparison,
174
  }
175
+ _write_json(
176
+ submission_id,
177
+ record,
178
+ f"Add submission: {trial_id} — {username} ({version})",
179
+ )
180
  url = (
181
  f"https://huggingface.co/datasets/{HF_DATASET_REPO}"
182
  f"/blob/{HF_DATASET_BRANCH}/{submission_id}"
183
  if hf_configured
184
  else None
185
  )
186
+ return {"submissionId": submission_id, "url": url, "record": record, "version": version}
187
+
188
+
189
+ def list_versions(
190
+ trial_id: str, username: str, all_files: Optional[List[str]] = None
191
+ ) -> List[Dict[str, Any]]:
192
+ """All saved versions for (trial_id, username), newest first.
193
+
194
+ Each item: submissionId, version, submittedAt, num_questions.
195
+ """
196
+ prefix = f"{_pair_dir(trial_id, username)}/"
197
+ files = all_files if all_files is not None else _all_files()
198
+ paths = sorted(
199
+ (f for f in files if f.startswith(prefix) and f.endswith(".json")),
200
+ reverse=True,
201
+ )
202
+ out: List[Dict[str, Any]] = []
203
+ for p in paths:
204
+ rec = _read_json(p)
205
+ if not rec:
206
+ continue
207
+ prompts = (rec.get("comparison") or {}).get("prompts") or []
208
+ out.append(
209
+ {
210
+ "submissionId": p,
211
+ "version": rec.get("version", ""),
212
+ "submittedAt": rec.get("submittedAt", ""),
213
+ "num_questions": len(prompts),
214
+ }
215
+ )
216
+ return out
217
 
218
 
219
  def add_review(submission_id: str, status: str, reviewer: str, note: str = "") -> Dict[str, Any]:
 
275
  "submissionId": sp,
276
  "trial_id": sub.get("trial_id", ""),
277
  "username": sub.get("username", ""),
278
+ "version": sub.get("version", ""),
279
  "submittedAt": sub.get("submittedAt", ""),
 
280
  "status": latest["status"] if latest else "pending",
281
  "reviewedAt": latest["at"] if latest else "",
282
  "reviewer": latest["reviewer"] if latest else "",
 
285
  "submission": sub,
286
  }
287
  )
288
+ result.sort(key=lambda r: r.get("submittedAt", ""), reverse=True)
289
  return result
290
 
291