# DEMPSTER'S COURT

### Game Design & Product Requirements Document — v1.0

*Owner: Product. Audience: Game Designer + Developer. Status: Ready to build.*

---

## 0. How to read this document

This is the single source of truth for building Dempster's Court. It is written so that:

- A **designer** can take Sections 1–8 and 11 and design every screen, case, and progression beat.
- A **developer** can take Sections 9, 10, 12, and the Appendices and build the engine, the model integration, and the app.
- The **cases in Section 7 are content, not examples.** They are meant to be shipped. Every number in them is real and chosen to make a specific lesson land. Numbers marked *(tunable)* are starting values for playtest balancing; everything else is structural and should not be changed without re-checking the math.

A core principle runs through the whole document: **the AI model never does arithmetic, and the math engine never improvises.** The model is the *mind* of each witness — it talks, lies, hedges, and remembers in character. A deterministic Python engine does every belief calculation. Keep that wall clean and the product is buildable by two people in a hackathon timeframe.

---

## 1. Product vision

### 1.1 One-line pitch

A courtroom puzzle game where you are the Judge, you interrogate AI witnesses who each believe a different version of the truth, and **you decide how to combine their conflicting testimony into a verdict — knowing that the method you choose decides who hangs.**

### 1.2 The fantasy

You are not a detective hunting for one hidden clue. You are a judge drowning in *too much* testimony, all of it partial, biased, or contradictory. Your job is not to *find* the truth. Your job is to decide **how much certainty you are entitled to**, and to live with the consequences when you get it wrong.

### 1.3 The thesis (why this game exists)

Certainty is not handed to you by the evidence.
**Certainty is a choice about how you combine evidence — and that choice has a body count.** In an age of AI systems that are fluently, confidently wrong, a game that lets you *feel* the difference between honest doubt and manufactured certainty is the whole point. The player should leave having internalized one idea in their gut: *the same facts, combined two different ways, convict two different people.*

### 1.4 What makes it original

Murder-mystery games are abundant. Dempster–Shafer theory is a textbook teaching tool. **No shipped game lets the player choose the evidence-combination rule and watch it change the verdict.** That mechanic — *the combination rule is the player's primary verb* — is the novel core. Protect it.

### 1.5 Success criteria

- **Delight / "show a friend":** a player who finishes one case wants to send a friend the same case to see if they convict the same suspect.
- **AI is load-bearing:** remove the model and there are no witnesses to question; remove the belief engine and it collapses into a generic whodunit. Both must be true.
- **Makes you think:** the player can articulate, after playing, why trusting two confident witnesses too much can be unjust.

---

## 2. Design pillars (the non-negotiables)

1. **The rule is the verb.** The player's central, repeated, meaningful decision is *how to combine evidence*, not *who to click*. Every system must funnel toward that choice and make it feel weighty.
2. **Doubt is visible and beautiful.** Belief, plausibility, ignorance, and conflict are all shown on screen as living things. The gap between "I can prove it" and "I can't rule it out" is the emotional center of the UI.
3. **The model is a mind, not a calculator.** Witnesses are conversational, inconsistent, evasive, human. The player questions them freely. Numbers are never asked of the model and never shown as coming from it.
4. **Consequence over correctness.** You are scored on *justice*, not on matching a key.
Hanging an innocent with high confidence is the worst outcome — worse than letting a case go cold.
5. **Small is the aesthetic, not the apology.** One small local model, a handful of suspects, a tight frame. The constraint is the style.

---

## 3. Player role, frame, and core vocabulary

The player is **the Judge**. They never accuse in person; they call witnesses, weigh them, fuse their testimony, and pronounce a verdict. The game's hidden math operates over a **frame of discernment** = the set of suspects for the current case (the game term is **"the Dock"** — the suspects standing in the dock). Everything the player does maps to a Dempster–Shafer operation, but the player never sees jargon. The mapping:

| Player-facing term | DST concept | What it means in play |
|---|---|---|
| The Dock | Frame of discernment Θ | The list of suspects |
| A Testimony | A mass function m | One witness's distributed belief |
| "I'm not sure" mass | m(Θ), ignorance | Belief the witness assigns to "could be any of them" |
| Suspicion bar | Belief, Bel(A) | What you can *prove* against a suspect |
| Shadow bar | Plausibility, Pl(A) | What you *can't rule out* against a suspect |
| Doubt meter | Conflict mass K | How much the testimonies contradict each other |
| Discounting a witness | Reliability discount α | Lowering how much a witness's mass counts |
| The Method (the lever) | Combination rule | Dempster / Yager / PCR5 / Cautious |
| The Verdict | Decision rule + threshold | Convict if Suspicion ≥ threshold |

**Design rule:** the glossary in Appendix A is the only place these two columns appear together. In the UI, only the left column exists.

---

## 4. Core loops

### 4.1 Moment-to-moment loop (inside a case)

1. **Read the case file** (victim, the Dock, the conviction threshold).
2. **Call a witness** from the roster (spend a Court Day — a limited resource).
3. **Interrogate** them in free-form natural language. They answer in character.
4. **Take the Deposition** — lock in that witness's current Testimony (mass) onto the Evidence Board.
5. Repeat 2–4 for as many witnesses as your Court Days allow.
6. Go to the **Fusion Bench**: optionally **discount** witnesses you distrust, then **pull a Method lever** to combine the depositions.
7. Watch the **Suspicion / Shadow bars** and the **Doubt meter** recompute live.
8. If a suspect's Suspicion ≥ threshold, **Deliver Verdict** unlocks. Otherwise, gather more evidence or try another Method.
9. **The Reveal:** ground truth shown. Did you convict the guilty, an innocent, or no one?
10. **Verdict Card** generated (the shareable artifact). Score recorded.

### 4.2 Session loop (across cases)

Play a case → unlock the next → a new Method or pathology is introduced → difficulty stacks → reach Endless Mode (procedurally generated cases).

### 4.3 Meta / retention loop

- **Justice record:** a tally of guilty convicted, innocents hanged, cases gone cold.
- **Verdict Cards** are collectible and shareable; friends compare rulings on the same case.
- **Endless Mode** + a daily generated case give a reason to return.

---

## 5. Game systems

### 5.1 Court Days (resource)

Each case grants a limited number of Court Days (typically 3–5). Calling a witness or commissioning physical evidence costs one. This forces the player to *choose who is worth hearing* — they cannot exhaustively interview everyone. It also makes replay valuable: a different witness subset yields a different evidence structure.

### 5.2 Witnesses and Testimony

A witness is the model role-playing a persona with **secret knowledge** and a **reliability**. Through interrogation the player draws out testimony; when they **Take the Deposition**, the system snapshots that witness's current **mass function over the Dock** as a piece of evidence.

- For **story cases**, the witness's persona and secret knowledge are authored so their testimony converges on a designed target mass (for teachable balance).
The *dialogue is live*; the *target* is seeded. See 9.3 for exactly how.
- For **procedural cases**, the model generates persona, knowledge, dialogue, and mass.

Either way, interrogation is genuine free-form AI conversation — that is the load-bearing AI experience.

### 5.3 The Evidence Board

A visible collection of taken Depositions. Each shows the witness, a one-line summary, and a small bar-strip of their mass (how they spread their suspicion, including their "I'm not sure" slice). The player can discard a deposition before fusing.

### 5.4 Discounting (reliability)

Before fusing, the player can drag a witness's **reliability slider** down. Mechanically this is DST discounting: it bleeds the witness's mass toward "I'm not sure" (ignorance). A fully discounted witness contributes nothing. This is how the player handles a witness they've caught lying or who is obviously drunk/biased — *without* deleting them entirely. Discovering *who to discount* is half of many puzzles.

### 5.5 The Methods (combination rules) — the heart

The player chooses how depositions are fused. Each Method is a lever on the Fusion Bench with a distinct feel and a distinct failure mode. Methods unlock across the campaign so players learn them one at a time.

- **Dempster (the Zealot).** Trusts every witness fully and *throws away* whatever they disagree on, then normalizes what's left. Feels decisive. **Failure mode:** under high conflict it manufactures false certainty — it can convict, with near-total confidence, the one suspect everybody thought *least* likely (Zadeh's classic paradox). Visually it should feel *too clean* when the Doubt meter is high.
- **Yager (the Agnostic).** Like Dempster, but instead of throwing conflict away it pours it into "I'm not sure." **Behavior:** under high conflict it refuses to convict anyone — honest paralysis. The right tool when two equally credible witnesses contradict and you must *not* pretend to know.
- **PCR5 (the Diplomat).** Redistributes each piece of conflict back to exactly the suspects who caused it, in proportion to how strongly each was accused. **Behavior:** a middle path — it still reaches a decision under conflict, but a *fair* one that doesn't crown an uninvolved third party. Unlocks after the player has felt both Dempster's recklessness and Yager's paralysis.
- **Cautious (the Skeptic).** Idempotent: combining a witness with a copy of themselves changes nothing. **Behavior:** immune to manufactured corroboration. The right tool when two "witnesses" are not independent — coached, colluding, or the same rumor twice. Using Dempster on dependent sources inflates confidence; Cautious refuses to be fooled by repetition.

**Design rule:** never label a Method "correct." Each case has a *just* outcome; multiple Methods may reach it or fail it, and the game shows the player what *each* Method would have done at the Reveal.

### 5.6 Suspicion (Belief) and Shadow (Plausibility)

For each suspect, two stacked bars:

- **Suspicion = Belief:** mass committed *specifically* to that suspect (for a group of suspects, mass committed to any subset of the group). What you can prove.
- **Shadow = Plausibility:** all mass not committed *against* that suspect. What you can't rule out.

The space between them is **doubt about that specific person.** A suspect with low Suspicion but high Shadow is "I can't prove it's her, but I can't clear her either" — the most dramatically loaded state in the game.

### 5.7 The Doubt meter (conflict K)

A single prominent gauge showing total conflict among the current depositions. When the player pulls the Dempster lever while this gauge is high, the UI should signal danger (see 11.4). It is the player's cue that "this confident answer might be a lie."

### 5.8 The Verdict, threshold, and outcomes

Each case sets a **conviction threshold** (e.g., Suspicion ≥ 0.70). The player may only convict a suspect at or above it.
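The Fusion Bench mechanics in 5.4 through 5.8 reduce to a handful of small functions. What follows is a minimal sketch of a Belief Engine core, not the engine itself, and it rests on stated assumptions. A mass function is a `dict` keyed by `frozenset`s of suspect names, with the whole Dock standing for Θ. The reliability slider is a value r in [0, 1] that keeps r of each mass; the α in Section 3 may be defined the other way round. PCR5 is the standard two-source form, which is not associative, so fusing three or more witnesses pairwise depends on order and the engine should fix one. All function names are hypothetical. The demo reproduces the Case 1 numbers from Section 7.

```python
from itertools import product

# Hypothetical Belief Engine core. A mass function is a dict mapping
# frozensets of suspect names (focal sets) to masses; the whole Dock is Θ.


def _pairs(m1, m2):
    """Every pair of focal sets: (set from m1, set from m2, intersection, mass product)."""
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        yield a, b, a & b, x * y


def conflict(m1, m2):
    """K, the Doubt meter: mass the two testimonies put on disjoint sets."""
    return sum(p for _, _, inter, p in _pairs(m1, m2) if not inter)


def dempster(m1, m2):
    """Dempster: discard the conflicting mass, renormalize the rest."""
    k = conflict(m1, m2)
    if k >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    out = {}
    for _, _, inter, p in _pairs(m1, m2):
        if inter:
            out[inter] = out.get(inter, 0.0) + p / (1.0 - k)
    return out


def yager(m1, m2, dock):
    """Yager: pour the conflicting mass into Θ ("I'm not sure")."""
    theta = frozenset(dock)
    out = {}
    for _, _, inter, p in _pairs(m1, m2):
        key = inter if inter else theta
        out[key] = out.get(key, 0.0) + p
    return out


def pcr5(m1, m2):
    """Two-source PCR5: each conflicting product returns to the two sets that
    caused it, in proportion to the mass each witness put on its own set."""
    out = {}
    for a, b, inter, p in _pairs(m1, m2):
        if p == 0.0:
            continue
        if inter:
            out[inter] = out.get(inter, 0.0) + p
        else:
            x, y = m1[a], m2[b]
            out[a] = out.get(a, 0.0) + x * x * y / (x + y)
            out[b] = out.get(b, 0.0) + y * y * x / (x + y)
    return out


def discount(m, reliability, dock):
    """Discounting: keep `reliability` of every mass, move the rest to Θ."""
    theta = frozenset(dock)
    out = {s: reliability * v for s, v in m.items()}
    out[theta] = out.get(theta, 0.0) + (1.0 - reliability)
    return out


def bel(m, suspects):
    """Suspicion: mass on non-empty sets inside `suspects`."""
    target = frozenset(suspects)
    return sum(v for s, v in m.items() if s and s <= target)


def pl(m, suspects):
    """Shadow: mass on every set that overlaps `suspects`."""
    target = frozenset(suspects)
    return sum(v for s, v in m.items() if s & target)


def may_convict(m, suspect, threshold):
    """Verdict gate: conviction unlocks only when Suspicion reaches the threshold."""
    return bel(m, {suspect}) >= threshold


if __name__ == "__main__":
    # Case 1, Section 7: the Butler and the Cook.
    dock = {"Gardener", "Maid", "Secretary"}
    G, M, S = (frozenset({name}) for name in ("Gardener", "Maid", "Secretary"))
    butler = {G: 0.99, S: 0.01}
    cook = {M: 0.99, S: 0.01}
    print("K =", round(conflict(butler, cook), 4))  # 0.9999
    print("Dempster Suspicion(Secretary) =", round(bel(dempster(butler, cook), S), 2))  # 1.0
    print("Yager Suspicion(Secretary) =", round(bel(yager(butler, cook, dock), S), 4))  # 0.0001
    print("PCR5 Suspicion(Gardener) =", round(bel(pcr5(butler, cook), G), 4))  # 0.4999
```

Swapping the combination function is the whole lever pull. On the same two testimonies, the Secretary goes from certain guilt under Dempster to near-zero Suspicion under Yager, while PCR5 splits the mass almost evenly between the Gardener and the Maid.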
Outcomes at the Reveal:

- **Just Conviction** — convicted the truly guilty above threshold. Best.
- **Wrongful Execution** — convicted an innocent. Worst; weighted heaviest in scoring. The higher the confidence, the worse.
- **Cold Case** — no suspect reached threshold; the guilty walks. Bad, but better than hanging an innocent. (This must be a *legitimate, sometimes-correct* choice, not just a fail state.)
- **Mistrial (optional)** — player convicts the guilty but only barely / via a Method the game flags as unsound. Partial credit.

### 5.9 Scoring — the Justice Ledger

Track across the campaign: Just Convictions, Wrongful Executions, Cold Cases. Surface a single evocative stat ("On your watch: 11 guilty hanged, 2 innocents lost, 3 killers still free"). Do **not** show a percentage score during a case — reveal consequences narratively first, numbers second.

---

## 6. Progression & difficulty

The campaign teaches the toolkit one concept per case, then stacks. Each case is built around a **witness pathology** — a structural flaw in the evidence that a specific Method or move counters. Learning to *diagnose the pathology* is the skill that deepens and makes the game replayable.

| Case | Teaches | Pathology | New tool unlocked |
|---|---|---|---|
| 0 — The Garden Path | The core loop | None (clean, low-conflict) | Dempster (behaves nicely here) |
| 1 — Two Truths | Conflict & false certainty | Two credible witnesses flatly contradict | Yager |
| 2 — The Echo | Independence | Two non-independent sources inflate confidence | Cautious |
| 3 — The Banquet | The full toolkit | Conflict + an unreliable witness + ignorance | PCR5 + Discounting mastery |
| Endless | Mastery | Procedurally mixed | — |

Pathology vocabulary the designer can reuse and combine (see also 7.6):

- **The Contradiction** — two strong, opposing accusations (→ Yager / PCR5).
- **The Echo** — correlated/coached witnesses (→ Cautious).
- **The Drunk** — a confident but unreliable witness (→ Discount, then fuse).
- **The Fog** — everyone is vague; lots of ignorance (→ read Shadow vs Suspicion; gather more).
- **The Sliver** — every witness puts a tiny mass on the same innocent, whom Dempster then crowns (→ the paradox trap; Yager/PCR5).

---

## 7. THE CASES (shippable content)

> **Notation.** The Dock (suspects) is written as a set. A Testimony is written as a mass function, e.g. `m(Gardener)=0.9, m(Θ)=0.1`, where `Θ` is "I'm not sure / could be anyone." All masses for one witness sum to 1. `Threshold` is the Suspicion (Belief) needed to convict. The **Reveal** lines tell the developer exactly what each Method produces so the "what each Method would have done" panel can be hard-checked against the engine.

---

### CASE 0 — "The Garden Path" *(Tutorial: the loop)*

**Setup.** Lord Pemberton is found dead in his greenhouse, a trowel through his heart. The estate is snowed in; the killer is inside.

**The Dock:** { **Gardener**, **Niece**, **Valet** }. Θ = those three.

**Court Days:** 3. **Threshold:** 0.65. **Methods available:** Dempster only.

**Witnesses & authored target Testimonies:**

1. **The Cook** (reliable, saw a lot). Target: `m(Gardener)=0.7, m(Θ)=0.3`. She saw the Gardener leave the greenhouse with mud to his elbows.
2. **The Stable Boy** (reliable, partial). Target: `m(Gardener)=0.5, m(Θ)=0.5`. Heard a man arguing in the greenhouse; thinks it was the Gardener but isn't sure.
3. **The Coroner** (physical evidence, costs a Day). Target: `m(Gardener)=0.6, m(Θ)=0.4`. The trowel is the Gardener's, but anyone could have taken it.

**Ground truth:** the **Gardener** did it. Low conflict — everyone points the same way, just with different confidence.

**Intended play.** Call two or three witnesses, fuse with Dempster (no conflict, so it behaves), watch Suspicion on the Gardener climb past 0.65. Convict. Just Conviction.
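Because every Case 0 witness splits mass only between the Gardener and Θ, every pair of focal sets overlaps, so K = 0 and Dempster's rule simply multiplies the witnesses' "I'm not sure" masses. A worked check of the intended-play numbers:

$$
\begin{aligned}
K &= 0 \quad \text{(every focal set contains the Gardener)} \\
\mathrm{Bel}_{\text{Cook} \oplus \text{Stable Boy}}(\text{Gardener}) &= 1 - (0.3)(0.5) = 0.85 \\
\mathrm{Bel}_{\text{Cook} \oplus \text{Stable Boy} \oplus \text{Coroner}}(\text{Gardener}) &= 1 - (0.3)(0.5)(0.4) = 0.94
\end{aligned}
$$

Any two of the three witnesses clear the 0.65 threshold: Cook + Coroner gives 1 − (0.3)(0.4) = 0.88, and Stable Boy + Coroner gives 1 − (0.5)(0.4) = 0.80.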
**Reveal / teaching panel.** With Cook + Stable Boy under Dempster: conflict K = 0 (no contradictions), Suspicion(Gardener) rises to 0.85 *(tunable)*. The case exists purely to teach: call → interrogate → deposition → fuse → bars move → convict. No trap. End on a line that plants the seed: *"Justice was easy today. The witnesses agreed. They will not always agree."*

---

### CASE 1 — "Two Truths" *(Conflict, the paradox, and Yager)*

**Setup.** Lord Ashby is poisoned at his own dinner. Two servants are *certain* — and certain of different people.

**The Dock:** { **Gardener**, **Maid**, **Secretary** }. Θ = those three.

**Court Days:** 3. **Threshold:** 0.70. **Methods available:** Dempster, **Yager (newly unlocked).**

**Witnesses & authored target Testimonies (the trap):**

1. **The Butler** (highly credible). Target: `m(Gardener)=0.99, m(Secretary)=0.01`. Swears he saw the Gardener pour the wine; allows a sliver that maybe it was the Secretary.
2. **The Cook** (equally credible). Target: `m(Maid)=0.99, m(Secretary)=0.01`. Swears it was the Maid in the pantry; allows the same sliver for the Secretary.
3. *(Optional, costs a Day)* **The Apothecary** (physical). Target: `m(Gardener)=0.6, m(Θ)=0.4`. The poison matches a compound the Gardener bought.

**Ground truth:** the **Gardener** did it. The Secretary is innocent and was barely suspected by anyone.

**The trap (this is the whole lesson).** Fuse the Butler + Cook with **Dempster**. Their strong accusations conflict and cancel; all that survives normalization is the tiny shared sliver on the **Secretary**. Dempster crowns the Secretary with **Suspicion = 1.00** — the one person *both* witnesses thought least likely. The lever offers you certain conviction of an innocent.

**The escape.**

- **Yager** pours the conflict into "I'm not sure": Suspicion on everyone stays near 0; the Doubt meter is pinned high.
The game is telling you honestly: *two equally credible witnesses contradict — you cannot convict on this alone.*
- The correct *play* is then to spend the Day on the **Apothecary**, which breaks the symmetry toward the Gardener, and fuse sensibly (discount or discard the Cook, whose accusation the physical evidence does not support; at these masses, Yager across all three still leaves the Gardener below threshold) until the Gardener — not the Secretary — clears threshold.

**Reveal / teaching panel (developer must reproduce exactly):**

- Butler ⊕ Cook under **Dempster:** K = 0.9999; m(Secretary) = 0.0001 / (1 − 0.9999) = **1.00**. → If the player convicted here: *Wrongful Execution, maximum confidence.* The harshest beat in the early game.
- Butler ⊕ Cook under **Yager:** m(Secretary)=0.0001, m(Θ)=0.9999. Suspicion(all) ≈ 0; Doubt maxed. → No conviction possible; correct honest result.
- With the Apothecary added and the Cook discounted (Butler ⊕ Apothecary under Dempster), Suspicion(Gardener) = 0.99 / 0.994 ≈ 0.996, clearing 0.70 *(tunable; verify in engine)*. → Just Conviction.

**Closing line (on a Dempster wrongful execution):** *"You were certain. Certainty, it turns out, is the easiest thing in the world to manufacture."*

---

### CASE 2 — "The Echo" *(Independence, and the Cautious method)*

**Setup.** A merchant is strangled in a tavern. Two patrons come forward, both naming the same man, in suspiciously identical words.

**The Dock:** { **Sailor**, **Innkeeper**, **Stranger** }. Θ = those three.

**Court Days:** 4. **Threshold:** 0.70. **Methods:** Dempster, Yager, **Cautious (newly unlocked).**

**Witnesses & authored target Testimonies (the trap):**

1. **First Patron.** Target: `m(Sailor)=0.8, m(Θ)=0.2`. Vivid, confident story blaming the Sailor.
2. **Second Patron.** Target: `m(Sailor)=0.8, m(Θ)=0.2`. *Almost word-for-word the same story.*
3. **The Barmaid** (physical/contextual, costs a Day). Target: `m(Innkeeper)=0.55, m(Θ)=0.45`. Saw the Innkeeper wipe down a rope and pocket the merchant's purse.
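This case's trap and escape hinge on two numbers: Dempster inflating the twin Patron testimonies to 1 − 0.2 × 0.2 = 0.96, and Cautious leaving them at 0.80. The sketch below checks both. It assumes the normalized cautious rule of Denœux, implemented through the canonical weight decomposition (one standard construction; this document does not specify one), and it requires every testimony to keep some mass on Θ. Function names are hypothetical, and brute-force subset enumeration is acceptable only because a Dock has three or four suspects.

```python
from itertools import combinations, product
from math import exp, log, prod


def subsets(dock):
    """Every subset of the Dock, from the empty set up to Θ itself."""
    names = sorted(dock)
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            yield frozenset(combo)


def commonality(m, dock):
    """q(A): total mass on focal sets that contain A."""
    return {a: sum(v for s, v in m.items() if a <= s) for a in subsets(dock)}


def weights(m, dock):
    """Canonical weights w(A) for every A other than Θ; needs m(Θ) > 0."""
    theta = frozenset(dock)
    q = commonality(m, dock)
    if q[theta] <= 0.0:
        raise ValueError("the cautious rule needs some mass on Θ")
    return {
        a: exp(-sum((-1) ** (len(b) - len(a)) * log(q[b]) for b in subsets(dock) if a <= b))
        for a in subsets(dock)
        if a != theta
    }


def from_weights(w, dock):
    """Inverse map: q(B) is the product of w(A) over every A that does not contain B."""
    q = {b: prod(wa for a, wa in w.items() if not b <= a) for b in subsets(dock)}
    m = {}
    for a in subsets(dock):
        v = sum((-1) ** (len(b) - len(a)) * q[b] for b in subsets(dock) if a <= b)
        if abs(v) > 1e-12:  # drop floating-point dust
            m[a] = v
    return m


def cautious(m1, m2, dock):
    """Normalized cautious rule: keep the smaller weight for each set, then renormalize."""
    w1, w2 = weights(m1, dock), weights(m2, dock)
    m = from_weights({a: min(w1[a], w2[a]) for a in w1}, dock)
    empty = m.pop(frozenset(), 0.0)  # conflict, if any
    return {s: v / (1.0 - empty) for s, v in m.items()}


def dempster(m1, m2):
    """Dempster's rule, for contrast: intersect, drop conflict, renormalize."""
    out, k = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        if a & b:
            out[a & b] = out.get(a & b, 0.0) + x * y
        else:
            k += x * y
    return {s: v / (1.0 - k) for s, v in out.items()}


dock = {"Sailor", "Innkeeper", "Stranger"}
sailor = frozenset({"Sailor"})
patron = {sailor: 0.8, frozenset(dock): 0.2}
print(round(dempster(patron, patron)[sailor], 4))  # 0.96
print(round(cautious(patron, patron, dock)[sailor], 4))  # 0.8
```

For a Patron and the Barmaid, whose accusations target different suspects, the weights never compete and `cautious` returns the same result as `dempster`; the two rules part ways only when the same accusation is counted twice.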
**The independence clue.** During interrogation the model (steered by authored knowledge) lets slip — or the player asks the right question — that the two Patrons are **married and rehearsed their story.** A clue card surfaces: *"These two may not be independent."*

**Ground truth:** the **Innkeeper** did it. The two Patrons are covering for the Innkeeper (a relative) by loudly framing the Sailor.

**The trap.** Combine the two Patrons with **Dempster** and their agreement *multiplies*: Suspicion(Sailor) jumps to **0.96** (1 − 0.2 × 0.2) — looks conclusive, clears threshold. Convict and you hang the innocent Sailor, exactly as the colluders intended.

**The escape.**

- Recognize the Echo and combine the two Patrons with **Cautious**: because they're the same evidence twice, Suspicion(Sailor) **stays at 0.80**, not 0.96 — repetition buys no extra certainty. *(Still above threshold — see the design note.)*
- Spend a Day on the **Barmaid.** Now fuse: the Innkeeper evidence conflicts with the (un-inflated) Sailor accusation. The Doubt meter rises, Shadow on the Innkeeper grows, and a careful Judge sees the Sailor case is not safe. The just move is to **discount the colluding Patrons** (you caught them lying) and let the Barmaid carry the Innkeeper over threshold.

**Design note (developer/designer, important).** The teaching beat is *feeling the number not move under Cautious.* Even though 0.80 is still above 0.70, the lesson is delivered by the contrast (0.96 vs 0.80) and by the colluders being exposed and discountable. Tune the threshold, the Patrons' base mass, and the Barmaid's mass so that **Dempster falsely clears, Cautious creates visible hesitation, and discounting + Barmaid is the clean path to the Innkeeper.** Verify in the engine; treat these masses as *(tunable)* around the stated structure.

**Reveal / teaching panel:**

- Patron ⊕ Patron under **Dempster:** Suspicion(Sailor) = 0.96.
→ with no other evidence, *Wrongful Execution.*
- Patron ⊕ Patron under **Cautious:** Suspicion(Sailor) = 0.80 (idempotent — no inflation).
- Discount Patrons + Barmaid: Suspicion(Innkeeper) clears threshold → *Just Conviction.* *(Tunable: discounting only moves mass toward "I'm not sure," so with these three witnesses the Innkeeper can never exceed the Barmaid's own 0.55; her mass must be raised to at least 0.70 for this line to hold.)*

**Closing line:** *"Two voices saying the same thing are not twice as true. Sometimes they are one lie, told twice."*

---

### CASE 3 — "The Banquet" *(Graduation: the full toolkit + PCR5)*

**Setup.** At a crowded banquet the host drops dead. Four guests had reason. The testimony is a mess: one strong accusation, one counter-accusation, one drunk, and a lot of people who saw nothing clearly.

**The Dock:** { **Rival**, **Widow**, **Physician**, **Heir** }. Θ = those four.

**Court Days:** 5. **Threshold:** 0.70. **Methods:** all four (Dempster, Yager, **PCR5 newly unlocked**, Cautious).

**Witnesses & authored target Testimonies:**

1. **The Sommelier** (credible). `m(Rival)=0.8, m(Θ)=0.2`. Saw the Rival linger at the host's glass.
2. **The Widow's Maid** (credible, conflicting). `m(Physician)=0.75, m(Θ)=0.25`. Saw the Physician swap a vial.
3. **The Drunk Earl** (confident, unreliable). `m(Heir)=0.9, m(Θ)=0.1`. Loudly blames the Heir — but he's three bottles in and was face-down for half the night.
4. **The Coroner** (physical, costs a Day). `m(Physician)=0.6, m(Rival)=0.1, m(Θ)=0.3`. Poison was a medical compound; points mostly at the Physician.
5. **The Footman** (contextual, costs a Day). `m({Physician, Widow})=0.6, m(Θ)=0.4` — a *set* accusation: "it was one of the two who left early," without saying which. (Teaches set-valued mass: belief on a pair, not a person.)

**Ground truth:** the **Physician** did it, with the **Widow** as accomplice (hence the set-valued testimony pointing at the pair).

**The puzzle (stacked pathologies).**

- The **Drunk Earl** is the *Drunk* pathology: confident mass on the Heir that must be **discounted** (his reliability is low; interrogation reveals he was unconscious).
Fail to discount and the Heir contaminates every fusion.
- The **Sommelier vs Widow's Maid** is a *Contradiction*: Rival vs Physician. Under **Dempster** the conflict may crown a low-mass third party or overstate the survivor; under **Yager** you get paralysis; under **PCR5** the conflict is fairly returned to the Rival and the Physician in proportion, letting the corroborating Coroner + Footman tip it justly toward the **Physician.**
- The **Footman's set-valued** testimony rewards the player who understands that belief can sit on *a pair* — it raises Shadow on both Physician and Widow without convicting either alone.

**Intended just path.** Discount the Drunk Earl → fuse Sommelier, Widow's Maid, Coroner, Footman with **PCR5** → Suspicion(Physician) clears 0.70 while no innocent is crowned. Convict the Physician. (A strong player may also note the Widow's high Shadow and flag her — an optional secondary conviction for bonus justice.)

**Reveal / teaching panel.** Show, side by side, what each Method does to this evidence set after the Earl is discounted:

- **Dempster:** overstates / risks crowning; high Doubt ignored. *(Developer: report exact bars. With the Earl fully discounted, the listed masses give Bel(Physician) ≈ 0.77 under Dempster, which already clears 0.70; tune the masses if Dempster is meant to fail here.)*
- **Yager:** Physician below threshold; Cold Case risk.
- **PCR5:** Physician clears threshold cleanly; → *Just Conviction.* *(Verify in engine: applying two-source PCR5 pairwise over the listed masses, in witness order, leaves Bel(Physician) near 0.63, below threshold.)*
- **Cautious:** under-commits (sources are independent here, so caution costs you) — teaches that Cautious is the *wrong* tool when witnesses really are independent.

**Closing line:** *"You have learned the Methods. Now learn the harder thing: knowing which one the evidence in front of you is asking for."*

---

### 7.5 Endless Mode (procedural cases)

After Case 3, unlock **Cold Cases:** the model generates a fresh murder each time. Generation contract (developer, see 9.4):

- Pick a Dock of 3–4 archetypal suspects and a setting.
- Pick a hidden ground-truth culprit (and optional accomplice → a set-valued clue).
- Generate 3–6 witnesses, each with a persona, secret knowledge, a reliability, an **independence group** (so Echo pathologies can occur), and a target mass consistent with their knowledge and bias.
- Guarantee at least one pathology from the Section 6 vocabulary so the case has a "point."
- The engine validates that *at least one* Method yields a Just Conviction and that *at least one* Method yields a Wrongful Execution or Cold Case — i.e., the choice must matter. If not, regenerate. (This validation is pure Python over the authored masses; cheap.)

A **Daily Case** (seeded by date) gives everyone the same generated case to compare Verdict Cards — the social hook.

### 7.6 Case authoring schema (so designers can write more)

Every case is a data file (JSON/YAML), never code. Fields:

```
case_id, title, setting_blurb, victim,
dock: [suspect names],                 # the frame Θ
court_days, threshold,
methods_available: [dempster|yager|pcr5|cautious],
witnesses: [
  {
    id, name, portrait_ref,
    persona,                           # persona = how they talk
    secret_knowledge,                  # what they actually know (fed to model, hidden from player)
    reliability: 0..1,                 # ground-truth reliability (drives the "drunk" pathology)
    independence_group: ,              # witnesses sharing a group are NOT independent (Echo)
    cost_days: 0|1,
    target_mass: { "": p, ... },       # authored belief, including "Θ" for ignorance
    is_physical_evidence: bool
  }
],
ground_truth: { culprit, accomplice_or_null },
intended_path: free-text designer note,
reveal_expectations: { dempster: ..., yager: ..., pcr5: ..., cautious: ... }   # for QA
```

The model turns `persona + secret_knowledge + target_mass` into live dialogue; the engine consumes `target_mass` for all math. (See 9.3.)

---

## 8. Narrative & tone

- **Setting:** a stylized, timeless courtroom — gaslit, gothic, a little unreal. Think a fable about justice rather than a specific era. Keeps it culturally neutral and evergreen.
- **Voice:** spare, grave, occasionally wry.
The game speaks like a narrator who has seen many verdicts and trusts none of them easily. Closing lines after each case carry the theme (samples given per case above).
- **Witness writing:** witnesses are *people*, not info-dispensers — they deflect, flatter, contradict themselves, protect others. The small model's "weirdness" is an asset: let witnesses be a little strange.
- **No gore.** Deaths are described with restraint; the drama is moral, not visceral. This keeps the public Space appropriate for all audiences.
- **The recurring motif:** every closing line returns to certainty, doubt, and consequence.

---

## 9. Technical architecture (for the developer)

### 9.1 The hard wall

Two subsystems that never blur:

- **The Witness Engine (the model):** produces *dialogue* and a *proposed mass function*. Unreliable, creative, conversational. Its mass-function output is constrained to valid JSON by a grammar; semantic correctness is not assumed.
- **The Belief Engine (pure Python):** consumes mass functions and performs *all* DST math deterministically. This is the part that must be correct and is fully testable offline. ~150–250 lines. See Appendix B for exact formulas.

If a calculation is ever shown to the player, it came from the Belief Engine. If a sentence is ever shown, it came from the Witness Engine. Never cross.

### 9.2 Model & runtime

- **Model:** a ≤32B instruct model; recommended **7–8B** (e.g. Qwen2.5-7B-Instruct or Llama-3.1-8B-Instruct), GGUF quantized (Q4_K_M ≈ 5GB). Small is on-theme and sufficient — the model only writes short testimony + JSON.
- **Runtime:** **llama.cpp** (via `llama-cpp-python`), which earns the Llama Champion badge and runs fully local (Off the Grid).
- **Structured output:** a **GBNF grammar** (or JSON-schema-to-GBNF) constrains the mass-function JSON so it is always syntactically valid. The engine then validates ranges and renormalizes.
Grammar constraint gives ~100% syntactic validity; never trust the *values* — clamp to [0,1] and renormalize so they sum to 1.

### 9.3 How a witness produces a mass (story vs procedural)

**Conversation:** the player types a question. The system sends the model: the case context + this witness's `persona` + `secret_knowledge` + the running dialogue, and asks for an in-character reply. Pure prose, no grammar.

**Deposition (mass extraction):** when the player "Takes the Deposition," the system makes a *second, separate* constrained call: "Given everything this witness has said and knows, output their belief over the suspects as JSON." The GBNF grammar forces the schema. Then:

- **Story cases:** blend the model's proposed mass toward the authored `target_mass` (e.g. take the authored target as ground truth for balance, and let the live dialogue only flavor the *prose*). Simplest robust version: **use the authored `target_mass` directly for the math**, and use the model purely for dialogue. This guarantees teachable balance and removes all model-math risk. *Recommended for v1.*
- **Procedural cases:** use the model's proposed mass (validated + renormalized) as the witness's `target_mass`.

This is the single most important de-risking decision in the build: **in story mode the lesson never depends on the model getting numbers right.**

### 9.4 Mass-function JSON contract

A witness's belief over a Dock of suspects `S = [a, b, c]` is emitted as:

```json
{
  "focal_elements": [
    { "suspects": ["a"], "mass": 0.7 },
    { "suspects": ["a","b"], "mass": 0.1 },
    { "suspects": ["a","b","c"], "mass": 0.2 }
  ],
  "one_line_summary": "I saw the Gardener leave with mud on his hands."
}
```

- `suspects` is a non-empty subset of the Dock; the full-Dock subset represents ignorance (Θ).
- masses should sum to ~1; the engine renormalizes. Empty set is forbidden by grammar.
- The GBNF grammar enumerates the allowed suspect strings for the current Dock (small frame → tiny grammar). See Appendix C.
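The validation step described in 9.2 and 9.4 (reject anything outside the Dock, clamp each mass to [0, 1], renormalize) can be sketched with the standard `json` module. `parse_deposition` is a hypothetical name; raising on an unknown suspect and treating an all-zero deposition as pure ignorance are assumptions, not rules this document sets.

```python
import json


def parse_deposition(raw, dock):
    """Turn a deposition JSON string into ({frozenset(suspects): mass}, summary)."""
    data = json.loads(raw)
    allowed = frozenset(dock)
    mass = {}
    for element in data["focal_elements"]:
        suspects = frozenset(element["suspects"])
        # The grammar should already forbid these; check again anyway.
        if not suspects or not suspects <= allowed:
            raise ValueError(f"focal set {sorted(suspects)} is empty or outside the Dock")
        value = min(max(float(element["mass"]), 0.0), 1.0)  # clamp to [0, 1]
        mass[suspects] = mass.get(suspects, 0.0) + value  # merge repeated subsets
    summary = data.get("one_line_summary", "")
    total = sum(mass.values())
    if total <= 0.0:
        # Assumption: a deposition with no usable mass counts as total ignorance.
        return {allowed: 1.0}, summary
    return {s: v / total for s, v in mass.items() if v > 0.0}, summary


example = """{
  "focal_elements": [
    { "suspects": ["a"], "mass": 0.7 },
    { "suspects": ["a","b"], "mass": 0.1 },
    { "suspects": ["a","b","c"], "mass": 0.2 }
  ],
  "one_line_summary": "I saw the Gardener leave with mud on his hands."
}"""
mass, summary = parse_deposition(example, ["a", "b", "c"])
print({"".join(sorted(s)): round(v, 2) for s, v in mass.items()}, summary)
```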
### 9.5 App & hosting

- **Framework:** Gradio. **v1:** `gr.Blocks` for speed. **v2:** `gr.Server` with a custom HTML/JS (or React) courtroom frontend for the **Off-Brand** badge — `gr.Server` is a FastAPI server with Gradio's API engine, so the custom frontend keeps queuing/streaming and Spaces hosting.
- **Hosting:** Hugging Face Space. Reality check: free CPU Basic = 2 vCPU / 16GB RAM (a 7B-Q4 fits in RAM but is slow on CPU); free ZeroGPU quota is minimal. **Plan:** record the demo video locally (fast laptop inference); for the public Space, accept slower CPU inference or use PRO ZeroGPU. Do not select a 32B model for the hosted build.
- **Persistence (for the loop):** store Verdict Cards and the Justice Ledger (HF persistent storage or a simple key-value store). Daily Case seeded by date. Optional shared gallery of recent verdicts → virality.

### 9.6 State model (per case)

```
CaseState {
  case: ,
  days_remaining: int,
  depositions: [ { witness_id, mass, discounted_alpha } ],
  active_method: enum,
  combined_mass: ,                     # from Belief Engine
  bel: {suspect: float},
  pl: {suspect: float},
  conflict_K: float,
  verdict: { convicted: suspect|none, method, confidence } | null,
  outcome: just|wrongful|cold|mistrial | null
}
```

All of `combined_mass, bel, pl, conflict_K` are recomputed by the Belief Engine whenever depositions, discounts, or method change. The UI is a pure function of this state.

---

## 10. Build milestones (mapped to badges)

- **M1 — Belief Engine (Day 1).** Pure Python: mass type, Dempster, Yager, PCR5, Cautious, discounting, Bel/Pl, conflict K. Unit-tested against the worked numbers in the cases (esp. Case 1's K=0.9999 → Secretary=1.0). *No UI, no model.* This is the load-bearing core; prove it first.
- **M2 — Witness Engine (Day 1–2).** llama.cpp + GBNF; free-form interrogation + constrained deposition call. Story-mode uses authored masses for math.
- **M3 — Playable v1 (Day 2–3).** `gr.Blocks` UI: case file, call/interrogate, Evidence Board, Fusion Bench with Dempster + Yager, Suspicion/Shadow bars, Doubt meter, Verdict, Reveal, Verdict Card. Ship **Cases 0 + 1.** → badges: Off the Grid, Llama Champion.
- **M4 — Depth (stretch).** Cautious + PCR5 + Discounting; Cases 2 + 3. → makes the toolkit complete.
- **M5 — Off-Brand frontend (stretch).** `gr.Server` + custom courtroom UI. → Off-Brand badge.
- **M6 — Fine-tune (stretch).** Train the 7–8B witness model on synthetic case data (bigger model authors cases + correct masses; distill). → Well-Tuned badge.
- **M7 — Endless + Daily + gallery (stretch).** Procedural cases + shared Verdict Cards. → retention + Sharing-is-Caring + Field Notes (write the blog post on mapping conflict to motion and rules to verdicts).

**Minimum winning submission = M1–M3** (two cases, two Methods, the paradox beat, a Verdict Card, a 60-second video of a Dempster wrongful execution flipping to a Yager-saved innocent). Everything beyond M3 is badge-stacking.

---

## 11. UI / UX direction (for the designer)

### 11.1 Screen inventory

1. **Case File** — victim, the Dock (suspect portraits), threshold, Court Days. Entry to the case.
2. **Interrogation** — a witness portrait + chat. Free-form text input. A "Take Deposition" button.
3. **Evidence Board** — taken depositions as cards, each with a mini mass-strip and a reliability slider (discounting).
4. **Fusion Bench** — the centerpiece. Suspect portraits in the dock; under each, the Suspicion (Belief) and Shadow (Plausibility) bars; the Doubt meter prominent; the Method lever(s). "Deliver Verdict" gated by threshold.
5. **The Reveal** — ground truth; the "what each Method would have done" comparison; the outcome (Just / Wrongful / Cold).
6. **Verdict Card** — the shareable artifact.
7. **Justice Ledger** — the meta record.

### 11.2 The Fusion Bench (most important screen)

- Suspects stand in a row (the Dock).
  Each has stacked **Suspicion** (solid, "proven") and **Shadow** (translucent, "can't rule out") bars that animate when fusion recomputes.
- The **Method lever** is a physical-feeling control (a gavel, a brass lever). Switching Methods re-runs fusion live so the player *sees the bars rearrange* — this is the signature interaction.
- The **Doubt meter** is a large gauge. It is the moral conscience of the screen.

### 11.3 Reading the gap

Teach the player to read Suspicion-vs-Shadow without words: a suspect who is "clearly innocent" has both bars low; "proven guilty" has both high; the tense state — low Suspicion, tall Shadow — should be visually *itchy*, demanding more evidence. Suspicion can never exceed Shadow (Belief ≤ Plausibility), so the solid bar always sits inside the translucent one.

### 11.4 The danger signal (the paradox)

When the player selects **Dempster while the Doubt meter is high**, the resulting too-clean certainty should feel *wrong*: the convicted suspect's bar snaps to full with a cold, sterile flourish; the Doubt meter flares red behind it. The UI should make manufactured certainty feel uncanny, so that when the Reveal shows an innocent hanged, the player already half-knew.

### 11.5 The Verdict Card (share artifact)

A formal "ruling" composition: case title, the convicted (or "Cold Case"), the Suspicion/Shadow at conviction, the **Method used**, and the **outcome stamp** (JUSTICE SERVED / INNOCENT LOST / CASE UNSOLVED). Designed to be screenshot-perfect. The implicit caption it invites: *"I used Dempster and hanged the wrong woman. Which rule did you use?"*

---

## 12. Risks & mitigations (PO register)

| Risk | Severity | Mitigation |
|---|---|---|
| Feels like homework / a lecture | High | Lead with story, drama, and consequence; hide all jargon (Section 3 rule); the gut-punch Reveal carries the lesson, not text. |
| "Pick a rule" is a one-time epiphany, no replay | High | Witness-pathology-as-puzzle; stacked difficulty; Endless + Daily cases; collectible Verdict Cards. |
| Model emits malformed or absurd masses | Med | GBNF guarantees structure; clamp + renormalize; **story mode uses authored masses for math** (9.3) so the lesson never depends on model numbers. |
| 7B on free CPU Space is slow | Med | Record video locally; accept slow public inference or use PRO ZeroGPU; keep model small. |
| Scope creep blows the timeline | High | M1–M3 is a complete, winning submission; everything else is explicitly stretch/badges. |
| Case balance is wrong (Methods don't flip verdicts) | Med | Engine-side QA: every case's `reveal_expectations` are asserted in tests; procedural cases are validated to have a choice that matters (7.5). |
| Witnesses feel flat | Med | Strong personas; free-form interrogation; lean into small-model quirk as character. |

---

## Appendix A — Glossary (game term ↔ DST term)

Reproduces the table in Section 3. This is the *only* place both vocabularies coexist; keep DST terms out of the UI.

## Appendix B — The Belief Engine: exact math (developer)

Frame `Θ` = the Dock. A mass `m` assigns values in [0,1] to subsets of Θ, summing to 1, with `m(∅)=0`.

**Belief and Plausibility.**

```
Bel(A) = Σ_{B ⊆ A} m(B)
Pl(A)  = Σ_{B ∩ A ≠ ∅} m(B) = 1 − Bel(Θ \ A)
```

**Conflict.** For two masses m1, m2:

```
K = Σ_{B ∩ C = ∅} m1(B)·m2(C)
```

**Dempster's rule** (for A ≠ ∅; result(∅)=0; undefined when K = 1):

```
(m1 ⊕ m2)(A) = ( Σ_{B ∩ C = A} m1(B)·m2(C) ) / (1 − K)
```

*Worked check (Case 1):* m1(G)=.99, m1(S)=.01; m2(M)=.99, m2(S)=.01. K = .9801 (G∩M) + .0099 (G∩S) + .0099 (S∩M) = .9999; only S∩S=S survives, at .0001; (m1⊕m2)(S)=.0001/.0001=1.0. The engine MUST reproduce this.
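As a first M1 check, the Bel/Pl, conflict, and Dempster formulas above can be sketched in Python. This is a minimal sketch rather than the shipped engine: it assumes subsets are bitmasks over the Dock (bit i is the i-th suspect), the names `Mass`, `bel`, `pl`, `conjunctive`, `conflict`, and `dempster` are illustrative, and the mapping of Case 1's G, M, S to bits 0, 1, 2 is assumed.

```python
from collections import defaultdict
from itertools import product

# Hypothetical engine core: a mass is {subset bitmask: value}, bit i = i-th suspect.
Mass = dict[int, float]


def bel(m: Mass, a: int) -> float:
    """Bel(A): total mass on non-empty focal sets B with B ⊆ A."""
    return sum(v for b, v in m.items() if b and (b & ~a) == 0)


def pl(m: Mass, a: int) -> float:
    """Pl(A): total mass on focal sets B that intersect A."""
    return sum(v for b, v in m.items() if b & a)


def conjunctive(m1: Mass, m2: Mass) -> Mass:
    """Unnormalized combination: each product m1(B)·m2(C) goes to B ∩ C (∅ is key 0)."""
    q: defaultdict[int, float] = defaultdict(float)
    for (b, v1), (c, v2) in product(m1.items(), m2.items()):
        q[b & c] += v1 * v2
    return dict(q)


def conflict(m1: Mass, m2: Mass) -> float:
    """K: the product mass that lands on the empty set."""
    return conjunctive(m1, m2).get(0, 0.0)


def dempster(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: drop the conflicting mass and rescale what survives."""
    q = conjunctive(m1, m2)
    # Sum of non-conflicting products: equals 1 − K for normalized inputs,
    # but avoids the cancellation of computing 1 − K when K is close to 1.
    norm = sum(v for a, v in q.items() if a)
    if norm == 0.0:
        raise ValueError("total conflict (K = 1): Dempster's rule is undefined")
    return {a: v / norm for a, v in q.items() if a}


if __name__ == "__main__":
    G, M, S = 0b001, 0b010, 0b100  # assumed bit order for Case 1's suspects
    m1 = {G: 0.99, S: 0.01}
    m2 = {M: 0.99, S: 0.01}
    print(round(conflict(m1, m2), 4))  # 0.9999
    print(dempster(m1, m2))  # {4: 1.0}
```

Normalizing by the surviving products instead of computing `1 − K` directly is a deliberate choice: for Case 1 the only survivor is S∩S, so the Secretary comes out exactly 1.0 rather than merely within rounding error of it.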
**Yager's rule** (conjunctive, conflict → ignorance):

```
q(A) = Σ_{B ∩ C = A} m1(B)·m2(C)    # unnormalized conjunctive mass; q(∅) = K
m(A) = q(A)           for A ≠ ∅, A ≠ Θ
m(Θ) = q(Θ) + q(∅)    # all conflict reassigned to the whole frame
m(∅) = 0
```

**PCR5** (two sources; redistribute each partial conflict to its causes):

```
start: m(A) = q(A) for all A ≠ ∅        # q as in Yager's rule
for each pair (B, C), B focal in m1, C focal in m2, with B ∩ C = ∅ and m1(B)+m2(C) > 0:
    m(B) += m1(B)^2 · m2(C) / ( m1(B) + m2(C) )
    m(C) += m2(C)^2 · m1(B) / ( m1(B) + m2(C) )
```

The two shares of each pair sum to m1(B)·m2(C), so all of K ends up on the subsets that caused it. (For >2 sources, fuse pairwise; PCR5 = PCR6 for two sources. Use pairwise for the game. Pairwise PCR5 is not associative, so fix the fusion order, e.g. deposition order, to keep verdicts reproducible.)

**Cautious rule (Denœux)** — idempotent, for non-independent sources:

- Compute the **canonical decomposition** of each non-dogmatic mass (m(Θ) > 0) into simple support functions, giving a weight `w(A)` for each `A ⊊ Θ`, including ∅ (via the commonality function and its log/Möbius transform): with `Q(B) = Σ_{C ⊇ B} m(C)`, `w(A) = Π_{B ⊇ A} Q(B)^((−1)^(|B|−|A|+1))`. Weights may exceed 1, so these are generalized simple mass functions.
- Dogmatic masses (m(Θ) = 0, like both Case 1 witnesses) have no canonical decomposition; discount them by a small α first.
- Combine by taking, for every `A ⊊ Θ`, `w₁₂(A) = min(w₁(A), w₂(A))`.
- Recompose to a mass: `Q₁₂(B) = Π_{A ⊊ Θ, B ⊄ A} w₁₂(A)`, then `m(A) = Σ_{B ⊇ A} (−1)^(|B|−|A|) Q₁₂(B)`. The result may put mass on ∅; normalize it away as Dempster's rule does. With small frames (≤4) this is cheap.

Property to test: `cautious(m, m) == m` (idempotence).

**Discounting** (reliability; α = discount rate ∈ [0,1], α=0 keeps mass, α=1 → total ignorance):

```
m'(A) = (1 − α)·m(A)          for A ≠ Θ
m'(Θ) = (1 − α)·m(Θ) + α
```

**Decision.** Convict suspect `a` iff `Bel({a}) ≥ threshold` and `{a}` has the highest Bel among single suspects. Otherwise Cold Case.

**Implementation notes.** Represent subsets as bitmasks over the Dock (≤4 suspects → ≤16 subsets); all rules are loops over subset pairs, trivial at this scale. Clamp, then renormalize, every incoming mass. Unit-test each rule against the worked cases above and the idempotence property.

## Appendix C — GBNF sketch for the deposition call (developer)

For a Dock of three suspects `"gardener"`, `"maid"`, `"secretary"`, the grammar enumerates the allowed suspect tokens and forces the JSON shape.
For that Dock:

```
# Deposition grammar for the Dock ["gardener", "maid", "secretary"]; regenerate per case.
root    ::= "{" ws "\"focal_elements\":" ws "[" element ("," element)* "]" ws "," ws "\"one_line_summary\":" ws string ws "}"
element ::= "{" ws "\"suspects\":" ws subset "," ws "\"mass\":" ws number ws "}"
subset  ::= "[" suspect ("," suspect)* "]"
suspect ::= "\"gardener\"" | "\"maid\"" | "\"secretary\""
number  ::= "0" "." [0-9]+ | "1" ("." "0"+)?
string  ::= "\"" ( [^"\\\n] | "\\" ["\\/bfnrt] )* "\""
ws      ::= [ \t\n]*
```

Generate this grammar per-case from the Dock. The grammar guarantees valid structure and valid suspect names; the engine guarantees valid values.

---

*End of document. Build M1 first — prove the Belief Engine reproduces the Case 1 paradox — and the rest follows.*