tttjjj committed on
Commit 23b513a · 1 Parent(s): 322e8c4

Rubrics: drop Reproducibility, multiple criteria per dimension, importance

- derivation_required now has 3 dimensions on output.json (Inputs used,
Calculated value, Method) — Reproducibility/output.R removed.
- Each dimension holds multiple criteria; each criterion has its own
criterion text, importance (HIGH/medium/low), and tolerance ('points'
replaced by an importance selectbox).
- Stable criterion ids (cid) keep widget state correct across add/remove.
- Admin renders criteria read-only; content hash + load handle the new shape
(with backward-compat for old single-criterion records).
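
To illustrate the stable-id bullet above, here is a minimal sketch of why keys built from a criterion id survive a removal while keys built from a list position do not. The plain `state` dict stands in for `st.session_state`, and `positional_key`, the uid `7`, and the sample texts are assumptions made up for illustration; only the `kc` key format comes from this commit.

```python
# Plain dict standing in for st.session_state (assumption: no Streamlit needed here).
state: dict = {}


def kc(uid: int, j: int, cid: int, field: str) -> str:
    """Key format from this commit: question uid, dimension index j, criterion id."""
    return f"q{uid}_r{j}_c{cid}_{field}"


def positional_key(uid: int, j: int, pos: int, field: str) -> str:
    """Hypothetical alternative keyed by list position, shown only for contrast."""
    return f"q{uid}_r{j}_p{pos}_{field}"


# Sample criterion texts (made up for this sketch), indexed by criterion id.
texts = {1: "inputs match protocol", 2: "alpha is two-sided", 3: "power >= 0.8"}
criteria = [1, 2, 3]  # stable ids, in display order

for pos, cid in enumerate(criteria):
    state[kc(7, 0, cid, "criterion")] = texts[cid]
    state[positional_key(7, 0, pos, "criterion")] = texts[cid]

# Remove the middle criterion (cid 2): drop its id and only its own key.
criteria.remove(2)
state.pop(kc(7, 0, 2, "criterion"), None)

# Keys built from ids still point at the right text for each survivor.
by_cid = [state[kc(7, 0, cid, "criterion")] for cid in criteria]
# Keys built from positions now read the removed criterion's text at position 1.
by_position = [
    state[positional_key(7, 0, pos, "criterion")] for pos in range(len(criteria))
]

print(by_cid)
print(by_position)
```

With id-based keys the two surviving criteria keep their own text; with position-based keys the second slot would show the text of the criterion that was just removed.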

Files changed (4)
  1. README.md +4 -4
  2. app.py +119 -33
  3. lib/schema.py +28 -21
  4. pages/1_Admin.py +20 -20
README.md CHANGED
@@ -24,10 +24,10 @@ A Streamlit intake form for trial statisticians. Submissions are saved to a **Hu
  - `design_element` (dropdown — when "Others" is picked, a free-text input appears)
  - `question_type` (dropdown — `extraction_only` / `derivation_required`)
  - `question` (free text)
- - **Rubrics** auto-generated by question type:
- - `extraction_only` → 1 rubric: `output.json`
- - `derivation_required` → 4 rubrics: `output.json` × {Inputs used, Calculated value, Method} + `output.R` × {Reproducibility}
- - Each rubric collects `points`, `tolerance`, `criterion`.
+ - **Rubric dimensions** auto-generated by question type:
+ - `extraction_only` → 1 dimension on `output.json`
+ - `derivation_required` → 3 dimensions on `output.json`: {Inputs used, Calculated value, Method}
+ - Under each dimension you can add **multiple criteria**; each criterion has its own `criterion` text, `importance` (HIGH / medium / low), and `tolerance`.
  - **Versions** — every Submit saves a new version. Re-enter the same `trial_id` + `username`, click **Find versions**, pick one, and **Load selected version** to pull it back into the form for editing; Submit then saves a new version.
  - **Admin page (`pages/1_Admin.py`)** — password-gated review console. Shows **only the latest version of each trial** (one row per `trial_id` + `username`). The questionnaire is rendered in the same layout as the form (read-only). Reviewers can add reviews **per question** *and* an overall review; review history covers **all versions** (each review tagged with its version, and per-question reviews tied to their question). The trial's current status reflects the latest version's most recent overall review. Each review is its own file under `reviews/<trial>__<user>/<version>/`. (Submitters can still see and load all their own versions on the form.)
 
app.py CHANGED
@@ -22,7 +22,12 @@ import json

  import streamlit as st

- from lib.schema import DESIGN_ELEMENTS, QUESTION_TYPES, rubrics_for_type
+ from lib.schema import (
+     DESIGN_ELEMENTS,
+     IMPORTANCE_OPTIONS,
+     QUESTION_TYPES,
+     dimensions_for_type,
+ )
  from lib.storage import (
      get_submission,
      hf_configured,
@@ -51,8 +56,9 @@ def kq(uid: int, field: str) -> str:
      return f"q{uid}_{field}"


- def kr(uid: int, j: int, field: str) -> str:
-     return f"q{uid}_r{j}_{field}"
+ def kc(uid: int, j: int, cid: int, field: str) -> str:
+     """Key for a criterion field: question uid, dimension index j, criterion id."""
+     return f"q{uid}_r{j}_c{cid}_{field}"


  def _next_uid() -> int:
@@ -60,6 +66,11 @@ def _next_uid() -> int:
      return st.session_state.uid_counter


+ def _next_cid() -> int:
+     st.session_state.cid_counter += 1
+     return st.session_state.cid_counter
+
+
  def _next_question_id() -> str:
      nums = []
      for q in st.session_state.questions:
@@ -78,6 +89,8 @@ if "questions" not in st.session_state:
      st.session_state.questions = []
  if "uid_counter" not in st.session_state:
      st.session_state.uid_counter = 0
+ if "cid_counter" not in st.session_state:
+     st.session_state.cid_counter = 0
  if "trial_id" not in st.session_state:
      st.session_state.trial_id = ""
  if "username" not in st.session_state:
@@ -105,25 +118,46 @@ def _remove_question(idx: int) -> None:
      st.session_state.questions.pop(idx)


+ def _clear_criterion_keys(uid: int, j: int, cid: int) -> None:
+     for f in ("criterion", "importance", "tolerance"):
+         st.session_state.pop(kc(uid, j, cid, f), None)
+
+
  def _on_type_change(uid: int) -> None:
-     """Regenerate rubric structure (and seed blank value fields) on type change."""
+     """Rebuild the dimension blocks (each with one starter criterion) on type change."""
      qt = st.session_state.get(kq(uid, "qt"), "")
      q = next((x for x in st.session_state.questions if x["_uid"] == uid), None)
      if q is None:
          return
-     # Clear old rubric value fields.
-     for j in range(len(q["rubrics"])):
-         for f in ("points", "tolerance", "criterion"):
-             st.session_state.pop(kr(uid, j, f), None)
-     # New structure (artifact + dimension fixed by type).
-     new_rubrics = [
-         {"artifact": r["artifact"], "dimension": r["dimension"]}
-         for r in rubrics_for_type(qt)
-     ]
+     # Clear all existing criterion fields.
+     for j, rub in enumerate(q["rubrics"]):
+         for cid in rub.get("criteria", []):
+             _clear_criterion_keys(uid, j, cid)
+     # New dimension blocks (artifact + dimension fixed by type), one criterion each.
+     new_rubrics = []
+     for dim in dimensions_for_type(qt):
+         cid = _next_cid()
+         new_rubrics.append(
+             {"artifact": dim["artifact"], "dimension": dim["dimension"], "criteria": [cid]}
+         )
      q["rubrics"] = new_rubrics
-     for j in range(len(new_rubrics)):
-         for f in ("points", "tolerance", "criterion"):
-             st.session_state[kr(uid, j, f)] = ""
+
+
+ def _add_criterion(uid: int, j: int) -> None:
+     q = next((x for x in st.session_state.questions if x["_uid"] == uid), None)
+     if q is None or j >= len(q["rubrics"]):
+         return
+     q["rubrics"][j]["criteria"].append(_next_cid())
+
+
+ def _remove_criterion(uid: int, j: int, cid: int) -> None:
+     q = next((x for x in st.session_state.questions if x["_uid"] == uid), None)
+     if q is None or j >= len(q["rubrics"]):
+         return
+     crits = q["rubrics"][j]["criteria"]
+     if cid in crits:
+         crits.remove(cid)
+     _clear_criterion_keys(uid, j, cid)


  def _build_prompts() -> list:
@@ -145,9 +179,16 @@ def _build_prompts() -> list:
                      {
                          "artifact": rub["artifact"],
                          "dimension": rub["dimension"],
-                         "points": st.session_state.get(kr(uid, j, "points"), ""),
-                         "tolerance": st.session_state.get(kr(uid, j, "tolerance"), ""),
-                         "criterion": st.session_state.get(kr(uid, j, "criterion"), ""),
+                         "criteria": [
+                             {
+                                 "criterion": st.session_state.get(kc(uid, j, cid, "criterion"), ""),
+                                 "importance": st.session_state.get(
+                                     kc(uid, j, cid, "importance"), IMPORTANCE_OPTIONS[0]
+                                 ),
+                                 "tolerance": st.session_state.get(kc(uid, j, cid, "tolerance"), ""),
+                             }
+                             for cid in rub.get("criteria", [])
+                         ],
                      }
                      for j, rub in enumerate(q["rubrics"])
                  ],
@@ -210,20 +251,39 @@ def _load_selected() -> None:
      new_questions = []
      for qp in prompts:
          uid = _next_uid()
-         rubrics = [
-             {"artifact": r.get("artifact", ""), "dimension": r.get("dimension", "")}
-             for r in (qp.get("rubrics") or [])
-         ]
-         new_questions.append({"_uid": uid, "rubrics": rubrics})
          st.session_state[kq(uid, "id")] = qp.get("id", "")
          st.session_state[kq(uid, "de")] = qp.get("design_element", "")
          st.session_state[kq(uid, "deother")] = qp.get("design_element_other", "")
          st.session_state[kq(uid, "qt")] = qp.get("question_type", "")
          st.session_state[kq(uid, "question")] = qp.get("question", "")
+
+         rubrics = []
          for j, r in enumerate(qp.get("rubrics") or []):
-             st.session_state[kr(uid, j, "points")] = r.get("points", "")
-             st.session_state[kr(uid, j, "tolerance")] = r.get("tolerance", "")
-             st.session_state[kr(uid, j, "criterion")] = r.get("criterion", "")
+             # New format has r["criteria"]; old format had a single
+             # points/criterion/tolerance on the rubric itself.
+             saved_crits = r.get("criteria")
+             if saved_crits is None:
+                 saved_crits = [
+                     {
+                         "criterion": r.get("criterion", ""),
+                         "importance": IMPORTANCE_OPTIONS[0],
+                         "tolerance": r.get("tolerance", ""),
+                     }
+                 ]
+             cids = []
+             for c in saved_crits:
+                 cid = _next_cid()
+                 cids.append(cid)
+                 st.session_state[kc(uid, j, cid, "criterion")] = c.get("criterion", "")
+                 imp = c.get("importance", "")
+                 st.session_state[kc(uid, j, cid, "importance")] = (
+                     imp if imp in IMPORTANCE_OPTIONS else IMPORTANCE_OPTIONS[0]
+                 )
+                 st.session_state[kc(uid, j, cid, "tolerance")] = c.get("tolerance", "")
+             rubrics.append(
+                 {"artifact": r.get("artifact", ""), "dimension": r.get("dimension", ""), "criteria": cids}
+             )
+         new_questions.append({"_uid": uid, "rubrics": rubrics})

      st.session_state.questions = new_questions
      st.session_state.loaded_version = record.get("version", "")
@@ -386,12 +446,38 @@ def _questions_fragment() -> None:
                          meta_parts.append(f"**Dimension:** {rub['dimension']}")
                      st.markdown(" · ".join(meta_parts))

-                     rc1, rc2 = st.columns(2)
-                     with rc1:
-                         st.text_input("points", key=kr(uid, j, "points"))
-                     with rc2:
-                         st.text_input("tolerance", key=kr(uid, j, "tolerance"))
-                     st.text_area("criterion", key=kr(uid, j, "criterion"), height=80)
+                     criteria = rub.get("criteria", [])
+                     for ci, cid in enumerate(criteria):
+                         st.text_area(
+                             f"criterion {ci + 1}",
+                             key=kc(uid, j, cid, "criterion"),
+                             height=70,
+                         )
+                         cc1, cc2, cc3 = st.columns([2, 2, 1])
+                         with cc1:
+                             st.selectbox(
+                                 "importance",
+                                 options=IMPORTANCE_OPTIONS,
+                                 key=kc(uid, j, cid, "importance"),
+                             )
+                         with cc2:
+                             st.text_input("tolerance", key=kc(uid, j, cid, "tolerance"))
+                         with cc3:
+                             st.write("")
+                             st.write("")
+                             st.button(
+                                 "✕",
+                                 key=f"rmc_{uid}_{j}_{cid}",
+                                 help="Remove this criterion",
+                                 on_click=_remove_criterion,
+                                 args=(uid, j, cid),
+                             )
+                     st.button(
+                         "+ Add criterion",
+                         key=f"addc_{uid}_{j}",
+                         on_click=_add_criterion,
+                         args=(uid, j),
+                     )

      st.button("+ Add question", on_click=_add_question)
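
The backward-compat branch in `_load_selected` can be read as a pure function. The sketch below is a hedged restatement of that logic without Streamlit: `upgrade_rubric` and the sample `legacy` record are hypothetical names introduced here, not part of the commit. As in the loader, a legacy record's `points` value is not carried over, and a missing or unknown importance falls back to the first option.

```python
from typing import Any, Dict, List

# Mirrors IMPORTANCE_OPTIONS from lib/schema.py in this commit.
IMPORTANCE_OPTIONS: List[str] = ["HIGH", "medium", "low"]


def upgrade_rubric(r: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a rubric in the multi-criterion shape."""
    saved = r.get("criteria")
    if saved is None:
        # Legacy record: a single criterion stored directly on the rubric.
        saved = [
            {
                "criterion": r.get("criterion", ""),
                "importance": IMPORTANCE_OPTIONS[0],
                "tolerance": r.get("tolerance", ""),
            }
        ]
    criteria = []
    for c in saved:
        imp = c.get("importance", "")
        criteria.append(
            {
                "criterion": c.get("criterion", ""),
                # Unknown or missing importance falls back to the first option.
                "importance": imp if imp in IMPORTANCE_OPTIONS else IMPORTANCE_OPTIONS[0],
                "tolerance": c.get("tolerance", ""),
            }
        )
    return {
        "artifact": r.get("artifact", ""),
        "dimension": r.get("dimension", ""),
        "criteria": criteria,
    }


# Sample legacy record (made up for this sketch).
legacy = {
    "artifact": "output.json",
    "dimension": "Method",
    "points": "5",
    "criterion": "Uses a log-rank test",
    "tolerance": "",
}
upgraded = upgrade_rubric(legacy)
print(upgraded)
```

Keeping the conversion separate from widget state, as sketched here, would make the old-to-new mapping testable on its own; the commit itself performs the same steps inline while seeding `st.session_state`.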
 
lib/schema.py CHANGED
@@ -22,13 +22,21 @@ Status = Literal["pending", "reviewed", "needs_fix"]

  VALID_STATUSES: List[str] = ["pending", "reviewed", "needs_fix"]

+ # Each criterion's importance (replaces the old numeric "points").
+ IMPORTANCE_OPTIONS: List[str] = ["HIGH", "medium", "low"]
+
+
+ class Criterion(TypedDict):
+     criterion: str
+     importance: str
+     tolerance: str
+

  class Rubric(TypedDict):
+     # A dimension block; holds one or more criteria.
      artifact: str
      dimension: str
-     points: str
-     criterion: str
-     tolerance: str
+     criteria: List[Criterion]


  class Question(TypedDict):
@@ -40,25 +48,16 @@ class Question(TypedDict):
      rubrics: List[Rubric]


- def blank_rubric(artifact: str = "", dimension: str = "") -> Rubric:
-     return {
-         "artifact": artifact,
-         "dimension": dimension,
-         "points": "",
-         "criterion": "",
-         "tolerance": "",
-     }
-
-
- def rubrics_for_type(qt: str) -> List[Rubric]:
+ def dimensions_for_type(qt: str):
+     """Fixed (artifact, dimension) blocks for a question type. The user fills in
+     one or more criteria under each."""
      if qt == "extraction_only":
-         return [blank_rubric("output.json", "")]
+         return [{"artifact": "output.json", "dimension": ""}]
      if qt == "derivation_required":
          return [
-             blank_rubric("output.json", "Inputs used"),
-             blank_rubric("output.json", "Calculated value"),
-             blank_rubric("output.json", "Method"),
-             blank_rubric("output.R", "Reproducibility"),
+             {"artifact": "output.json", "dimension": "Inputs used"},
+             {"artifact": "output.json", "dimension": "Calculated value"},
+             {"artifact": "output.json", "dimension": "Method"},
          ]
      return []

@@ -100,8 +99,16 @@ def question_content_hash(q: dict) -> str:
          "question_type": q.get("question_type", ""),
          "rubrics": [
              {
-                 k: r.get(k, "")
-                 for k in ("artifact", "dimension", "points", "criterion", "tolerance")
+                 "artifact": r.get("artifact", ""),
+                 "dimension": r.get("dimension", ""),
+                 "criteria": [
+                     {
+                         "criterion": c.get("criterion", ""),
+                         "importance": c.get("importance", ""),
+                         "tolerance": c.get("tolerance", ""),
+                     }
+                     for c in (r.get("criteria") or [])
+                 ],
              }
              for r in (q.get("rubrics") or [])
          ],
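
The hunk above shows only the payload that `question_content_hash` builds, not how it is digested. As a rough sketch of how such a payload could become a stable hash, the example below serializes it with sorted keys and applies SHA-256. The function name `content_hash`, the JSON serialization, and the choice of SHA-256 are all assumptions; the real function also includes fields that are not visible in this diff.

```python
import hashlib
import json
from typing import Any, Dict


def content_hash(q: Dict[str, Any]) -> str:
    """Hypothetical digest over the nested rubric payload shown in the diff."""
    payload = {
        "question_type": q.get("question_type", ""),
        "rubrics": [
            {
                "artifact": r.get("artifact", ""),
                "dimension": r.get("dimension", ""),
                "criteria": [
                    {
                        "criterion": c.get("criterion", ""),
                        "importance": c.get("importance", ""),
                        "tolerance": c.get("tolerance", ""),
                    }
                    for c in (r.get("criteria") or [])
                ],
            }
            for r in (q.get("rubrics") or [])
        ],
    }
    # Assumption: canonical JSON (sorted keys) hashed with SHA-256.
    blob = json.dumps(payload, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


# Sample question (made up for this sketch).
question = {
    "question_type": "derivation_required",
    "rubrics": [
        {
            "artifact": "output.json",
            "dimension": "Method",
            "criteria": [
                {"criterion": "Uses a log-rank test", "importance": "HIGH", "tolerance": ""}
            ],
        }
    ],
}
print(content_hash(question))
```

Because only whitelisted fields enter the payload, editing any criterion's text, importance, or tolerance changes the digest, while UI-only keys such as `_uid` do not. A legacy rubric with no `criteria` key contributes an empty list under this payload.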
pages/1_Admin.py CHANGED
@@ -97,26 +97,26 @@ def render_questions(
          rubrics = q.get("rubrics") or []
          if rubrics:
              st.markdown(f"**Rubrics ({len(rubrics)})**")
-             for j, r in enumerate(rubrics):
-                 with st.container(border=True):
-                     meta = f"**Artifact:** `{r.get('artifact', '')}`"
-                     if r.get("dimension"):
-                         meta += f" · **Dimension:** {r['dimension']}"
-                     st.markdown(meta)
-                     rc1, rc2 = st.columns(2)
-                     with rc1:
-                         st.text_input(
-                             "points", value=r.get("points", ""), disabled=True,
-                             key=f"pt_{submission_id}_{qid}_{j}",
-                         )
-                     with rc2:
-                         st.text_input(
-                             "tolerance", value=r.get("tolerance", ""), disabled=True,
-                             key=f"to_{submission_id}_{qid}_{j}",
-                         )
-                     st.text_area(
-                         "criterion", value=r.get("criterion", ""), disabled=True,
-                         key=f"cr_{submission_id}_{qid}_{j}", height=70,
+             for r in rubrics:
+                 meta = f"**Artifact:** `{r.get('artifact', '')}`"
+                 if r.get("dimension"):
+                     meta += f" · **Dimension:** {r['dimension']}"
+                 st.markdown(meta)
+                 # New format: list of criteria; old format: single fields.
+                 criteria = r.get("criteria")
+                 if criteria is None:
+                     criteria = [
+                         {
+                             "criterion": r.get("criterion", ""),
+                             "importance": r.get("points", ""),
+                             "tolerance": r.get("tolerance", ""),
+                         }
+                     ]
+                 for ci, c in enumerate(criteria):
+                     st.markdown(
+                         f"&nbsp;&nbsp;{ci + 1}. _importance:_ **{c.get('importance', '') or '—'}** "
+                         f"· _tolerance:_ {c.get('tolerance', '') or '—'} \n"
+                         f"&nbsp;&nbsp;&nbsp;&nbsp;{c.get('criterion', '') or '_(no criterion)_'}"
                      )

          # ---- modified-since-last-review flag ----