Space: trialdesignbench/tdb-intake

tttjjj committed 10 days ago

Commit: 23b513a
Parent: 322e8c4

Rubrics: drop Reproducibility, multiple criteria per dimension, importance

- derivation_required now has 3 dimensions on output.json (Inputs used,
Calculated value, Method) — Reproducibility/output.R removed.
- Each dimension holds multiple criteria; each criterion has its own
criterion text, importance (HIGH/medium/low), and tolerance ('points'
replaced by an importance selectbox).
- Stable criterion ids (cid) keep widget state correct across add/remove.
- Admin renders criteria read-only; content hash + load handle the new shape
(with backward-compat for old single-criterion records).
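
The backward-compat path described above can be sketched as a small normalizer. This is a minimal illustration that mirrors the load logic in the `app.py` diff, not the project's own helper: the name `normalize_rubric` is hypothetical, and `IMPORTANCE_OPTIONS` is copied from `lib/schema.py` so the snippet runs on its own.

```python
from typing import Any, Dict, List

# Copied from lib/schema.py in this commit so the sketch is self-contained.
IMPORTANCE_OPTIONS: List[str] = ["HIGH", "medium", "low"]


def normalize_rubric(r: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a rubric in the new multi-criterion shape."""
    criteria = r.get("criteria")
    if criteria is None:
        # Legacy record: a single criterion stored directly on the rubric.
        # The old free-text "points" is not mapped; importance gets the default.
        criteria = [
            {
                "criterion": r.get("criterion", ""),
                "importance": IMPORTANCE_OPTIONS[0],
                "tolerance": r.get("tolerance", ""),
            }
        ]
    normalized = []
    for c in criteria:
        imp = c.get("importance", "")
        normalized.append(
            {
                "criterion": c.get("criterion", ""),
                # Unknown importance values fall back to the first option.
                "importance": imp if imp in IMPORTANCE_OPTIONS else IMPORTANCE_OPTIONS[0],
                "tolerance": c.get("tolerance", ""),
            }
        )
    return {
        "artifact": r.get("artifact", ""),
        "dimension": r.get("dimension", ""),
        "criteria": normalized,
    }


legacy = {
    "artifact": "output.json",
    "dimension": "Method",
    "points": "2",
    "criterion": "Names the test used",
    "tolerance": "",
}
upgraded = normalize_rubric(legacy)
```

Here `upgraded["criteria"]` holds exactly one entry whose `importance` is `"HIGH"`, matching the default the form applies when loading an old record.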

Files changed (4)

README.md +4 -4
app.py +119 -33
lib/schema.py +28 -21
pages/1_Admin.py +20 -20

README.md

@@ -24,10 +24,10 @@ A Streamlit intake form for trial statisticians. Submissions are saved to a **Hu
   - `design_element` (dropdown — when "Others" is picked, a free-text input appears)
   - `question_type` (dropdown — `extraction_only` / `derivation_required`)
   - `question` (free text)
-  - **Rubrics** auto-generated by question type:
-    - `extraction_only` → 1 rubric: `output.json`
-    - `derivation_required` → 4 rubrics: `output.json` × {Inputs used, Calculated value, Method} + `output.R` × {Reproducibility}
-  - Each rubric collects `points`, `tolerance`, `criterion`.
   - **Versions** — every Submit saves a new version. Re-enter the same `trial_id` + `username`, click **Find versions**, pick one, and **Load selected version** to pull it back into the form for editing; Submit then saves a new version.
 - **Admin page (`pages/1_Admin.py`)** — password-gated review console. Shows **only the latest version of each trial** (one row per `trial_id` + `username`). The questionnaire is rendered in the same layout as the form (read-only). Reviewers can add reviews **per question** *and* an overall review; review history covers **all versions** (each review tagged with its version, and per-question reviews tied to their question). The trial's current status reflects the latest version's most recent overall review. Each review is its own file under `reviews/<trial>__<user>/<version>/`. (Submitters can still see and load all their own versions on the form.)

   - `design_element` (dropdown — when "Others" is picked, a free-text input appears)
   - `question_type` (dropdown — `extraction_only` / `derivation_required`)
   - `question` (free text)
+  - **Rubric dimensions** auto-generated by question type:
+    - `extraction_only` → 1 dimension on `output.json`
+    - `derivation_required` → 3 dimensions on `output.json`: {Inputs used, Calculated value, Method}
+  - Under each dimension you can add **multiple criteria**; each criterion has its own `criterion` text, `importance` (HIGH / medium / low), and `tolerance`.
   - **Versions** — every Submit saves a new version. Re-enter the same `trial_id` + `username`, click **Find versions**, pick one, and **Load selected version** to pull it back into the form for editing; Submit then saves a new version.
 - **Admin page (`pages/1_Admin.py`)** — password-gated review console. Shows **only the latest version of each trial** (one row per `trial_id` + `username`). The questionnaire is rendered in the same layout as the form (read-only). Reviewers can add reviews **per question** *and* an overall review; review history covers **all versions** (each review tagged with its version, and per-question reviews tied to their question). The trial's current status reflects the latest version's most recent overall review. Each review is its own file under `reviews/<trial>__<user>/<version>/`. (Submitters can still see and load all their own versions on the form.)

app.py

@@ -22,7 +22,12 @@ import json
 import streamlit as st
-from lib.schema import DESIGN_ELEMENTS, QUESTION_TYPES, rubrics_for_type
 from lib.storage import (
     get_submission,
     hf_configured,
@@ -51,8 +56,9 @@ def kq(uid: int, field: str) -> str:
     return f"q{uid}_{field}"
-def kr(uid: int, j: int, field: str) -> str:
-    return f"q{uid}_r{j}_{field}"
 def _next_uid() -> int:
@@ -60,6 +66,11 @@ def _next_uid() -> int:
     return st.session_state.uid_counter
 def _next_question_id() -> str:
     nums = []
     for q in st.session_state.questions:
@@ -78,6 +89,8 @@ if "questions" not in st.session_state:
     st.session_state.questions = []
 if "uid_counter" not in st.session_state:
     st.session_state.uid_counter = 0
 if "trial_id" not in st.session_state:
     st.session_state.trial_id = ""
 if "username" not in st.session_state:
@@ -105,25 +118,46 @@ def _remove_question(idx: int) -> None:
     st.session_state.questions.pop(idx)
 def _on_type_change(uid: int) -> None:
-    """Regenerate rubric structure (and seed blank value fields) on type change."""
     qt = st.session_state.get(kq(uid, "qt"), "")
     q = next((x for x in st.session_state.questions if x["_uid"] == uid), None)
     if q is None:
         return
-    # Clear old rubric value fields.
-    for j in range(len(q["rubrics"])):
-        for f in ("points", "tolerance", "criterion"):
-            st.session_state.pop(kr(uid, j, f), None)
-    # New structure (artifact + dimension fixed by type).
-    new_rubrics = [
-        {"artifact": r["artifact"], "dimension": r["dimension"]}
-        for r in rubrics_for_type(qt)
-    ]
     q["rubrics"] = new_rubrics
-    for j in range(len(new_rubrics)):
-        for f in ("points", "tolerance", "criterion"):
-            st.session_state[kr(uid, j, f)] = ""
 def _build_prompts() -> list:
@@ -145,9 +179,16 @@ def _build_prompts() -> list:
                     {
                         "artifact": rub["artifact"],
                         "dimension": rub["dimension"],
-                        "points": st.session_state.get(kr(uid, j, "points"), ""),
-                        "tolerance": st.session_state.get(kr(uid, j, "tolerance"), ""),
-                        "criterion": st.session_state.get(kr(uid, j, "criterion"), ""),
                     }
                     for j, rub in enumerate(q["rubrics"])
                 ],
@@ -210,20 +251,39 @@ def _load_selected() -> None:
     new_questions = []
     for qp in prompts:
         uid = _next_uid()
-        rubrics = [
-            {"artifact": r.get("artifact", ""), "dimension": r.get("dimension", "")}
-            for r in (qp.get("rubrics") or [])
-        ]
-        new_questions.append({"_uid": uid, "rubrics": rubrics})
         st.session_state[kq(uid, "id")] = qp.get("id", "")
         st.session_state[kq(uid, "de")] = qp.get("design_element", "")
         st.session_state[kq(uid, "deother")] = qp.get("design_element_other", "")
         st.session_state[kq(uid, "qt")] = qp.get("question_type", "")
         st.session_state[kq(uid, "question")] = qp.get("question", "")
         for j, r in enumerate(qp.get("rubrics") or []):
-            st.session_state[kr(uid, j, "points")] = r.get("points", "")
-            st.session_state[kr(uid, j, "tolerance")] = r.get("tolerance", "")
-            st.session_state[kr(uid, j, "criterion")] = r.get("criterion", "")
     st.session_state.questions = new_questions
     st.session_state.loaded_version = record.get("version", "")
@@ -386,12 +446,38 @@ def _questions_fragment() -> None:
                             meta_parts.append(f"**Dimension:** {rub['dimension']}")
                         st.markdown(" · ".join(meta_parts))
-                        rc1, rc2 = st.columns(2)
-                        with rc1:
-                            st.text_input("points", key=kr(uid, j, "points"))
-                        with rc2:
-                            st.text_input("tolerance", key=kr(uid, j, "tolerance"))
-                        st.text_area("criterion", key=kr(uid, j, "criterion"), height=80)
     st.button("+ Add question", on_click=_add_question)

 import streamlit as st
+from lib.schema import (
+    DESIGN_ELEMENTS,
+    IMPORTANCE_OPTIONS,
+    QUESTION_TYPES,
+    dimensions_for_type,
+)
 from lib.storage import (
     get_submission,
     hf_configured,
     return f"q{uid}_{field}"
+def kc(uid: int, j: int, cid: int, field: str) -> str:
+    """Key for a criterion field: question uid, dimension index j, criterion id."""
+    return f"q{uid}_r{j}_c{cid}_{field}"
 def _next_uid() -> int:
     return st.session_state.uid_counter
+def _next_cid() -> int:
+    st.session_state.cid_counter += 1
+    return st.session_state.cid_counter
 def _next_question_id() -> str:
     nums = []
     for q in st.session_state.questions:
     st.session_state.questions = []
 if "uid_counter" not in st.session_state:
     st.session_state.uid_counter = 0
+if "cid_counter" not in st.session_state:
+    st.session_state.cid_counter = 0
 if "trial_id" not in st.session_state:
     st.session_state.trial_id = ""
 if "username" not in st.session_state:
     st.session_state.questions.pop(idx)
+def _clear_criterion_keys(uid: int, j: int, cid: int) -> None:
+    for f in ("criterion", "importance", "tolerance"):
+        st.session_state.pop(kc(uid, j, cid, f), None)
 def _on_type_change(uid: int) -> None:
+    """Rebuild the dimension blocks (each with one starter criterion) on type change."""
     qt = st.session_state.get(kq(uid, "qt"), "")
     q = next((x for x in st.session_state.questions if x["_uid"] == uid), None)
     if q is None:
         return
+    # Clear all existing criterion fields.
+    for j, rub in enumerate(q["rubrics"]):
+        for cid in rub.get("criteria", []):
+            _clear_criterion_keys(uid, j, cid)
+    # New dimension blocks (artifact + dimension fixed by type), one criterion each.
+    new_rubrics = []
+    for dim in dimensions_for_type(qt):
+        cid = _next_cid()
+        new_rubrics.append(
+            {"artifact": dim["artifact"], "dimension": dim["dimension"], "criteria": [cid]}
+        )
     q["rubrics"] = new_rubrics
+def _add_criterion(uid: int, j: int) -> None:
+    q = next((x for x in st.session_state.questions if x["_uid"] == uid), None)
+    if q is None or j >= len(q["rubrics"]):
+        return
+    q["rubrics"][j]["criteria"].append(_next_cid())
+def _remove_criterion(uid: int, j: int, cid: int) -> None:
+    q = next((x for x in st.session_state.questions if x["_uid"] == uid), None)
+    if q is None or j >= len(q["rubrics"]):
+        return
+    crits = q["rubrics"][j]["criteria"]
+    if cid in crits:
+        crits.remove(cid)
+        _clear_criterion_keys(uid, j, cid)
 def _build_prompts() -> list:
                     {
                         "artifact": rub["artifact"],
                         "dimension": rub["dimension"],
+                        "criteria": [
+                            {
+                                "criterion": st.session_state.get(kc(uid, j, cid, "criterion"), ""),
+                                "importance": st.session_state.get(
+                                    kc(uid, j, cid, "importance"), IMPORTANCE_OPTIONS[0]
+                                ),
+                                "tolerance": st.session_state.get(kc(uid, j, cid, "tolerance"), ""),
+                            }
+                            for cid in rub.get("criteria", [])
+                        ],
                     }
                     for j, rub in enumerate(q["rubrics"])
                 ],
     new_questions = []
     for qp in prompts:
         uid = _next_uid()
         st.session_state[kq(uid, "id")] = qp.get("id", "")
         st.session_state[kq(uid, "de")] = qp.get("design_element", "")
         st.session_state[kq(uid, "deother")] = qp.get("design_element_other", "")
         st.session_state[kq(uid, "qt")] = qp.get("question_type", "")
         st.session_state[kq(uid, "question")] = qp.get("question", "")
+        rubrics = []
         for j, r in enumerate(qp.get("rubrics") or []):
+            # New format has r["criteria"]; old format had a single
+            # points/criterion/tolerance on the rubric itself.
+            saved_crits = r.get("criteria")
+            if saved_crits is None:
+                saved_crits = [
+                    {
+                        "criterion": r.get("criterion", ""),
+                        "importance": IMPORTANCE_OPTIONS[0],
+                        "tolerance": r.get("tolerance", ""),
+                    }
+                ]
+            cids = []
+            for c in saved_crits:
+                cid = _next_cid()
+                cids.append(cid)
+                st.session_state[kc(uid, j, cid, "criterion")] = c.get("criterion", "")
+                imp = c.get("importance", "")
+                st.session_state[kc(uid, j, cid, "importance")] = (
+                    imp if imp in IMPORTANCE_OPTIONS else IMPORTANCE_OPTIONS[0]
+                )
+                st.session_state[kc(uid, j, cid, "tolerance")] = c.get("tolerance", "")
+            rubrics.append(
+                {"artifact": r.get("artifact", ""), "dimension": r.get("dimension", ""), "criteria": cids}
+            )
+        new_questions.append({"_uid": uid, "rubrics": rubrics})
     st.session_state.questions = new_questions
     st.session_state.loaded_version = record.get("version", "")
                             meta_parts.append(f"**Dimension:** {rub['dimension']}")
                         st.markdown(" · ".join(meta_parts))
+                        criteria = rub.get("criteria", [])
+                        for ci, cid in enumerate(criteria):
+                            st.text_area(
+                                f"criterion {ci + 1}",
+                                key=kc(uid, j, cid, "criterion"),
+                                height=70,
+                            )
+                            cc1, cc2, cc3 = st.columns([2, 2, 1])
+                            with cc1:
+                                st.selectbox(
+                                    "importance",
+                                    options=IMPORTANCE_OPTIONS,
+                                    key=kc(uid, j, cid, "importance"),
+                                )
+                            with cc2:
+                                st.text_input("tolerance", key=kc(uid, j, cid, "tolerance"))
+                            with cc3:
+                                st.write("")
+                                st.write("")
+                                st.button(
+                                    "✕",
+                                    key=f"rmc_{uid}_{j}_{cid}",
+                                    help="Remove this criterion",
+                                    on_click=_remove_criterion,
+                                    args=(uid, j, cid),
+                                )
+                        st.button(
+                            "+ Add criterion",
+                            key=f"addc_{uid}_{j}",
+                            on_click=_add_criterion,
+                            args=(uid, j),
+                        )
     st.button("+ Add question", on_click=_add_question)
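
Why the stable `cid` matters can be shown without Streamlit. The sketch below is illustrative only: a plain dict stands in for `st.session_state`, and `index_key` is a hypothetical positional key added purely for contrast with the `kc` key from the diff.

```python
from typing import Dict, List


def kc(uid: int, j: int, cid: int, field: str) -> str:
    # Same key format as kc() in the diff: keyed by a stable criterion id.
    return f"q{uid}_r{j}_c{cid}_{field}"


def index_key(uid: int, j: int, i: int, field: str) -> str:
    # Hypothetical positional key, shown only for contrast.
    return f"q{uid}_r{j}_i{i}_{field}"


texts = ["A", "B", "C"]

# --- id-based keys: each criterion keeps its own slot ---
state: Dict[str, str] = {}  # stand-in for st.session_state
criteria: List[int] = [1, 2, 3]  # cids as handed out by a counter
for cid, text in zip(criteria, texts):
    state[kc(0, 0, cid, "criterion")] = text

# Remove the middle criterion, as _remove_criterion does: drop the id and its keys.
criteria.remove(2)
state.pop(kc(0, 0, 2, "criterion"), None)
remaining = [state[kc(0, 0, cid, "criterion")] for cid in criteria]

# --- positional keys: removing the middle item shifts later items onto stale slots ---
pos_state = {index_key(0, 0, i, "criterion"): t for i, t in enumerate(texts)}
# After the removal two criteria remain, rendered at positions 0 and 1.
pos_remaining = [pos_state[index_key(0, 0, i, "criterion")] for i in range(2)]
```

With id-based keys `remaining` is `["A", "C"]`. With positional keys `pos_remaining` is `["A", "B"]`: the removed text reappears in the second slot and `"C"` is orphaned, which is the kind of mismatch the stable ids avoid.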

lib/schema.py

@@ -22,13 +22,21 @@ Status = Literal["pending", "reviewed", "needs_fix"]
 VALID_STATUSES: List[str] = ["pending", "reviewed", "needs_fix"]
 class Rubric(TypedDict):
     artifact: str
     dimension: str
-    points: str
-    criterion: str
-    tolerance: str
 class Question(TypedDict):
@@ -40,25 +48,16 @@ class Question(TypedDict):
     rubrics: List[Rubric]
-def blank_rubric(artifact: str = "", dimension: str = "") -> Rubric:
-    return {
-        "artifact": artifact,
-        "dimension": dimension,
-        "points": "",
-        "criterion": "",
-        "tolerance": "",
-    }
-def rubrics_for_type(qt: str) -> List[Rubric]:
     if qt == "extraction_only":
-        return [blank_rubric("output.json", "")]
     if qt == "derivation_required":
         return [
-            blank_rubric("output.json", "Inputs used"),
-            blank_rubric("output.json", "Calculated value"),
-            blank_rubric("output.json", "Method"),
-            blank_rubric("output.R", "Reproducibility"),
         ]
     return []
@@ -100,8 +99,16 @@ def question_content_hash(q: dict) -> str:
         "question_type": q.get("question_type", ""),
         "rubrics": [
             {
-                k: r.get(k, "")
-                for k in ("artifact", "dimension", "points", "criterion", "tolerance")
             }
             for r in (q.get("rubrics") or [])
         ],

 VALID_STATUSES: List[str] = ["pending", "reviewed", "needs_fix"]
+# Each criterion's importance (replaces the old numeric "points").
+IMPORTANCE_OPTIONS: List[str] = ["HIGH", "medium", "low"]
+class Criterion(TypedDict):
+    criterion: str
+    importance: str
+    tolerance: str
 class Rubric(TypedDict):
+    # A dimension block; holds one or more criteria.
     artifact: str
     dimension: str
+    criteria: List[Criterion]
 class Question(TypedDict):
     rubrics: List[Rubric]
+def dimensions_for_type(qt: str):
+    """Fixed (artifact, dimension) blocks for a question type. The user fills in
+    one or more criteria under each."""
     if qt == "extraction_only":
+        return [{"artifact": "output.json", "dimension": ""}]
     if qt == "derivation_required":
         return [
+            {"artifact": "output.json", "dimension": "Inputs used"},
+            {"artifact": "output.json", "dimension": "Calculated value"},
+            {"artifact": "output.json", "dimension": "Method"},
         ]
     return []
         "question_type": q.get("question_type", ""),
         "rubrics": [
             {
+                "artifact": r.get("artifact", ""),
+                "dimension": r.get("dimension", ""),
+                "criteria": [
+                    {
+                        "criterion": c.get("criterion", ""),
+                        "importance": c.get("importance", ""),
+                        "tolerance": c.get("tolerance", ""),
+                    }
+                    for c in (r.get("criteria") or [])
+                ],
             }
             for r in (q.get("rubrics") or [])
         ],
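
The effect of the new hash input can be sketched as follows. The diff shows only which fields feed `question_content_hash`, not how the digest is computed, so the SHA-256 over key-sorted JSON below is an assumption, and `canonical_rubrics` and `rubrics_digest` are hypothetical names.

```python
import copy
import hashlib
import json
from typing import Any, Dict, List


def canonical_rubrics(rubrics: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the fields the diff lists for hashing, in the nested shape."""
    return [
        {
            "artifact": r.get("artifact", ""),
            "dimension": r.get("dimension", ""),
            "criteria": [
                {
                    "criterion": c.get("criterion", ""),
                    "importance": c.get("importance", ""),
                    "tolerance": c.get("tolerance", ""),
                }
                for c in (r.get("criteria") or [])
            ],
        }
        for r in (rubrics or [])
    ]


def rubrics_digest(rubrics: List[Dict[str, Any]]) -> str:
    # Assumption: SHA-256 over key-sorted JSON; the real digest may differ.
    payload = json.dumps(canonical_rubrics(rubrics), sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


base = [
    {
        "artifact": "output.json",
        "dimension": "Method",
        "criteria": [
            {"criterion": "Names the test used", "importance": "HIGH", "tolerance": ""}
        ],
    }
]

edited = copy.deepcopy(base)
edited[0]["criteria"][0]["importance"] = "low"  # a real content change

noisy = copy.deepcopy(base)
noisy[0]["note"] = "ignored"  # a key outside the hashed field set

changed = rubrics_digest(base) != rubrics_digest(edited)
stable = rubrics_digest(base) == rubrics_digest(noisy)
```

Both `changed` and `stable` are `True`: editing a criterion's importance alters the digest, while keys outside the listed fields do not. Note that, as written in the diff, a legacy rubric with no `criteria` key contributes an empty list here.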

pages/1_Admin.py

@@ -97,26 +97,26 @@ def render_questions(
             rubrics = q.get("rubrics") or []
             if rubrics:
                 st.markdown(f"**Rubrics ({len(rubrics)})**")
-                for j, r in enumerate(rubrics):
-                    with st.container(border=True):
-                        meta = f"**Artifact:** `{r.get('artifact', '')}`"
-                        if r.get("dimension"):
-                            meta += f" · **Dimension:** {r['dimension']}"
-                        st.markdown(meta)
-                        rc1, rc2 = st.columns(2)
-                        with rc1:
-                            st.text_input(
-                                "points", value=r.get("points", ""), disabled=True,
-                                key=f"pt_{submission_id}_{qid}_{j}",
-                            )
-                        with rc2:
-                            st.text_input(
-                                "tolerance", value=r.get("tolerance", ""), disabled=True,
-                                key=f"to_{submission_id}_{qid}_{j}",
-                            )
-                        st.text_area(
-                            "criterion", value=r.get("criterion", ""), disabled=True,
-                            key=f"cr_{submission_id}_{qid}_{j}", height=70,
                         )
             # ---- modified-since-last-review flag ----

             rubrics = q.get("rubrics") or []
             if rubrics:
                 st.markdown(f"**Rubrics ({len(rubrics)})**")
+                for r in rubrics:
+                    meta = f"**Artifact:** `{r.get('artifact', '')}`"
+                    if r.get("dimension"):
+                        meta += f" · **Dimension:** {r['dimension']}"
+                    st.markdown(meta)
+                    # New format: list of criteria; old format: single fields.
+                    criteria = r.get("criteria")
+                    if criteria is None:
+                        criteria = [
+                            {
+                                "criterion": r.get("criterion", ""),
+                                "importance": r.get("points", ""),
+                                "tolerance": r.get("tolerance", ""),
+                            }
+                        ]
+                    for ci, c in enumerate(criteria):
+                        st.markdown(
+                            f"&nbsp;&nbsp;{ci + 1}. _importance:_ **{c.get('importance', '') or '—'}** "
+                            f"· _tolerance:_ {c.get('tolerance', '') or '—'}  \n"
+                            f"&nbsp;&nbsp;&nbsp;&nbsp;{c.get('criterion', '') or '_(no criterion)_'}"
                         )
             # ---- modified-since-last-review flag ----
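
The read-only rendering above can be exercised outside Streamlit by building the Markdown strings directly. This is a sketch under assumptions: `criteria_for_display` and `criterion_markdown` are hypothetical names that wrap the logic from the diff, and the placeholder dash is written as an escape for portability.

```python
from typing import Any, Dict, List

DASH = "\u2014"  # placeholder the diff shows for empty importance/tolerance


def criteria_for_display(r: Dict[str, Any]) -> List[Dict[str, Any]]:
    """New format: list of criteria. Old format: single fields on the rubric."""
    criteria = r.get("criteria")
    if criteria is None:
        # In the admin view the legacy "points" value is shown as importance.
        criteria = [
            {
                "criterion": r.get("criterion", ""),
                "importance": r.get("points", ""),
                "tolerance": r.get("tolerance", ""),
            }
        ]
    return criteria


def criterion_markdown(ci: int, c: Dict[str, Any]) -> str:
    """Build the same Markdown string the admin page passes to st.markdown."""
    return (
        f"&nbsp;&nbsp;{ci + 1}. _importance:_ **{c.get('importance', '') or DASH}** "
        f"· _tolerance:_ {c.get('tolerance', '') or DASH}  \n"
        f"&nbsp;&nbsp;&nbsp;&nbsp;{c.get('criterion', '') or '_(no criterion)_'}"
    )


legacy = {"artifact": "output.json", "points": "2", "criterion": "", "tolerance": "0.01"}
lines = [criterion_markdown(i, c) for i, c in enumerate(criteria_for_display(legacy))]
```

Unlike the form's load path, which defaults legacy records to the first importance option, this view surfaces the original `points` text, so a reviewer still sees what the submitter entered.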