Spaces: kunalkachru23/nexus-enhanced-stage


kunalkachru23 committed on Apr 26

Commit b168831 (verified) · 1 Parent(s): cc6bb3c

Upload folder using huggingface_hub

Files changed (27)

README.md +10 -10
docs/blog/blog_post_hf.md +5 -5
docs/deployment/DEPLOYMENT_CHECKLIST.md +14 -14
docs/deployment/HF_SPACES_DEPLOYMENT.md +10 -10
docs/guides/QUICK_START.md +2 -2
docs/pitch/DEMO_WALKTHROUGH.md +1 -1
docs/pitch/PITCH.md +6 -6
docs/pitch/PITCH_3MIN.md +1 -1
docs/pitch/VIDEO_RECORD_NOW_PACK.md +144 -0
docs/project/BEHAVIORAL_DELTA_PROOF.md +4 -4
docs/project/COMPLIANCE_LOCK_MATRIX.md +17 -17
docs/project/CURRICULUM_AND_ABLATION.md +1 -1
docs/project/FINAL_OPERATIONS_RUNBOOK.md +1 -1
docs/project/FINAL_READINESS_REPORT.md +2 -2
docs/project/IMPLEMENTATION_SUMMARY.md +14 -14
docs/project/JUDGING_EVIDENCE_INDEX.md +4 -4
docs/project/PLAN_OF_ACTION.md +13 -13
docs/project/PROJECT_STATUS.md +8 -7
docs/project/SUBTHEME_EVIDENCE_MATRIX.md +9 -9
docs/project/TEST_RESULTS_SUMMARY.md +1 -1
episode_rewards.json +1 -1
notebooks/grpo_colab_enhanced.ipynb +114 -72
notebooks/grpo_colab_v2.ipynb +2 -2
server/app.py +2 -2
server/data_models.py +1 -1
server/reward.py +1 -1
training_artifacts/pre_event_benchmark.json +1 -1

README.md CHANGED

@@ -30,7 +30,7 @@ Use this section first during judging/review.
 - **Live environment (HF Space):** https://kunalkachru23-nexus-enhanced-stage.hf.space/
 - **3-minute pitch script:** [`docs/pitch/PITCH.md`](docs/pitch/PITCH.md)
 - **2-minute demo walkthrough:** [`docs/pitch/DEMO_WALKTHROUGH.md`](docs/pitch/DEMO_WALKTHROUGH.md)
-- **Hard-gate + rubric evidence index:** [`docs/project/JUDGING_EVIDENCE_INDEX.md`](docs/project/JUDGING_EVIDENCE_INDEX.md)
 - **Behavioral delta (before vs after):** [`docs/project/BEHAVIORAL_DELTA_PROOF.md`](docs/project/BEHAVIORAL_DELTA_PROOF.md)
 - **Compliance lock matrix:** [`docs/project/COMPLIANCE_LOCK_MATRIX.md`](docs/project/COMPLIANCE_LOCK_MATRIX.md)
 - **HF blog draft (publish-ready):** [`docs/blog/blog_post_hf.md`](docs/blog/blog_post_hf.md)
@@ -139,9 +139,9 @@ python scripts/export_reward_plot.py \
 Caption: blue line is per-episode reward, green is rolling average, red dashed line is baseline (`0.265`).
-## BRD hard gate — OpenEnv (reproduce)
-Per [`../design/hackathon_brd.md`](../design/hackathon_brd.md) Section 17, judges expect **OpenEnv (latest release)** usage, not only a custom HTTP server.
 **Local (dev machine, after `pip install "openenv>=0.2.3"`):**
@@ -163,7 +163,7 @@ openenv validate --url https://kunalkachru23-nexus-enhanced-stage.hf.space
 **Deploying with OpenEnv:** use `openenv push . --repo-id <user>/<space> --exclude .hfignore` (or **`./gate.sh --push`**, which adds `--exclude` for you). OpenEnv does not load `.hfignore` unless you pass it via `--exclude`; omitting it does **not** break the build, it only uploads extra paths (less lean). See `docs/guides/QUICK_START.md` for a short rationale.
-`requirements.txt` **omits** `openenv` on the Space Docker image to keep builds reliable; the **Colab notebook** installs `openenv>=0.2.3` for the training hard gate. Contract-only routes (`/metadata`, `/schema`, `GET /state`, `POST /mcp`) satisfy `openenv validate --url`; episode logic uses **`/reset`**, **`/step/{session_id}`**, **`/state/{session_id}`** only.
 ## API Endpoints
@@ -192,15 +192,15 @@ openenv validate --url https://kunalkachru23-nexus-enhanced-stage.hf.space
 - **Snorkel AI** — Rotating expert review board (4 criteria)
 - **Patronus AI** — Live schema drift in INC007 at step 18
-## Pitch, plan, and BRD evidence
 Documentation lives under [`docs/`](docs/) (guides, deployment, project status, pitch/demo scripts, blog drafts).
-- **[`docs/pitch/PITCH.md`](docs/pitch/PITCH.md)** — 3-minute spoken script + 2-minute Q&A bullets (BRD §18.1).
-- **[`docs/project/PLAN_OF_ACTION.md`](docs/project/PLAN_OF_ACTION.md)** — BRD compliance matrix + prioritized todo table.
-- **`scripts/export_reward_plot.py`** — export reward curve PNG from `--url` or `episode_rewards.json` (Criterion 3 slides). Canonical chart (tracked in git): **`docs/images/training_reward_curve.png`** (see section above).
-## Final submission checklist (hard-gate safe)
 - [ ] Space URL is live and included in final form: `https://kunalkachru23-nexus-enhanced-stage.hf.space/`
 - [ ] `openenv validate .` passes locally.
@@ -216,7 +216,7 @@ Documentation lives under [`docs/`](docs/) (guides, deployment, project status,
 ## Blog Post
-See [`docs/blog/blog_post_hf.md`](docs/blog/blog_post_hf.md) for the publish-ready HuggingFace blog draft (includes reward model deep-dive, training methodology, and demo walkthrough). Publish and add the public URL for BRD §17.3.
 ## Team

 - **Live environment (HF Space):** https://kunalkachru23-nexus-enhanced-stage.hf.space/
 - **3-minute pitch script:** [`docs/pitch/PITCH.md`](docs/pitch/PITCH.md)
 - **2-minute demo walkthrough:** [`docs/pitch/DEMO_WALKTHROUGH.md`](docs/pitch/DEMO_WALKTHROUGH.md)
+- **Compliance + judging evidence index:** [`docs/project/JUDGING_EVIDENCE_INDEX.md`](docs/project/JUDGING_EVIDENCE_INDEX.md)
 - **Behavioral delta (before vs after):** [`docs/project/BEHAVIORAL_DELTA_PROOF.md`](docs/project/BEHAVIORAL_DELTA_PROOF.md)
 - **Compliance lock matrix:** [`docs/project/COMPLIANCE_LOCK_MATRIX.md`](docs/project/COMPLIANCE_LOCK_MATRIX.md)
 - **HF blog draft (publish-ready):** [`docs/blog/blog_post_hf.md`](docs/blog/blog_post_hf.md)
 Caption: blue line is per-episode reward, green is rolling average, red dashed line is baseline (`0.265`).
+## OpenEnv (reproduce)
+Per **hackathon compliance criteria**, the submission uses **OpenEnv (latest release)** in the toolchain—not only a custom HTTP server. Reproduce validation with the commands below.
 **Local (dev machine, after `pip install "openenv>=0.2.3"`):**
 **Deploying with OpenEnv:** use `openenv push . --repo-id <user>/<space> --exclude .hfignore` (or **`./gate.sh --push`**, which adds `--exclude` for you). OpenEnv does not load `.hfignore` unless you pass it via `--exclude`; omitting it does **not** break the build, it only uploads extra paths (less lean). See `docs/guides/QUICK_START.md` for a short rationale.
+`requirements.txt` **omits** `openenv` on the Space Docker image to keep builds reliable; the **Colab notebook** installs `openenv>=0.2.3` to satisfy the **Colab + OpenEnv** portion of compliance. Contract-only routes (`/metadata`, `/schema`, `GET /state`, `POST /mcp`) satisfy `openenv validate --url`; episode logic uses **`/reset`**, **`/step/{session_id}`**, **`/state/{session_id}`** only.
 ## API Endpoints
 - **Snorkel AI** — Rotating expert review board (4 criteria)
 - **Patronus AI** — Live schema drift in INC007 at step 18
+## Pitch, plan, and compliance evidence
 Documentation lives under [`docs/`](docs/) (guides, deployment, project status, pitch/demo scripts, blog drafts).
+- **[`docs/pitch/PITCH.md`](docs/pitch/PITCH.md)** — 3-minute spoken script + 2-minute Q&A bullets (organizer pitch format).
+- **[`docs/project/PLAN_OF_ACTION.md`](docs/project/PLAN_OF_ACTION.md)** — hackathon compliance matrix + prioritized todo table.
+- **`scripts/export_reward_plot.py`** — export reward curve PNG from `--url` or `episode_rewards.json` (slides / observable improvement evidence). Canonical chart (tracked in git): **`docs/images/training_reward_curve.png`** (see section above).
+## Final submission checklist (compliance-ready)
 - [ ] Space URL is live and included in final form: `https://kunalkachru23-nexus-enhanced-stage.hf.space/`
 - [ ] `openenv validate .` passes locally.
 ## Blog Post
+See [`docs/blog/blog_post_hf.md`](docs/blog/blog_post_hf.md) for the publish-ready HuggingFace blog draft (includes reward model deep-dive, training methodology, and demo walkthrough). Publish and add the public URL to your submission package (blog or short video, per organizer requirements).
 ## Team
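
The route split described in the README hunk above (contract-only routes for `openenv validate --url`, session-scoped routes for episode logic) can be sketched as a small URL builder. This is a minimal illustration under assumptions, not the project's client: the names `CONTRACT_PATHS` and `episode_urls` are hypothetical, and only the paths themselves are taken from the README text.

```python
# Hypothetical helper: the paths come from the README, the names are illustrative.

# Contract-only paths the README lists as satisfying `openenv validate --url`.
CONTRACT_PATHS = ["/metadata", "/schema", "/state", "/mcp"]


def episode_urls(base_url: str, session_id: str) -> dict:
    """Build the three episode-logic URLs for one session."""
    root = base_url.rstrip("/")  # tolerate a trailing slash on the Space URL
    return {
        "reset": f"{root}/reset",
        "step": f"{root}/step/{session_id}",
        "state": f"{root}/state/{session_id}",
    }
```

For example, `episode_urls("https://kunalkachru23-nexus-enhanced-stage.hf.space/", "abc")["step"]` yields the step URL for a placeholder session id `abc`. How the server returns a session id after `/reset` is not shown in this diff, so that part is left out.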

docs/blog/blog_post_hf.md CHANGED

@@ -9,8 +9,8 @@ authors:
 *Team Falcons — Meta PyTorch OpenEnv Hackathon Grand Finale, April 2026*
-**Live Demo:** [huggingface.co/spaces/kunalkachru23/nexus-enhanced](https://huggingface.co/spaces/kunalkachru23/nexus-enhanced)
-**Training Notebook:** [grpo_colab_v2.ipynb](https://huggingface.co/spaces/kunalkachru23/nexus-enhanced/blob/main/notebooks/grpo_colab_v2.ipynb)
 ---
@@ -98,7 +98,7 @@ Before on-site GPU training, 30 scripted baseline episodes established the floor
 Expected trained model performance after 200 GRPO steps: **0.55–0.75**
-The gap shows clear, observable training signal — satisfying BRD Criterion 3 (observable reward improvement evidence).
 ---
@@ -141,10 +141,10 @@ The expert board notices IC weaknesses and shifts its evaluation focus — simul
 ```bash
 # Live demo — no install needed
-curl -X POST https://kunalkachru23-nexus-enhanced.hf.space/demo/run/INC003 | python -m json.tool
 # Or open the web dashboard
-https://huggingface.co/spaces/kunalkachru23/nexus-enhanced
 ```
 **Training notebook** (GRPO on Qwen2.5-1.5B, cells 1–3 run without GPU):

 *Team Falcons — Meta PyTorch OpenEnv Hackathon Grand Finale, April 2026*
+**Live Demo:** [huggingface.co/spaces/kunalkachru23/nexus-enhanced-stage](https://huggingface.co/spaces/kunalkachru23/nexus-enhanced-stage)
+**Training Notebook:** [grpo_colab_v2.ipynb](https://huggingface.co/spaces/kunalkachru23/nexus-enhanced-stage/blob/main/notebooks/grpo_colab_v2.ipynb)
 ---
 Expected trained model performance after 200 GRPO steps: **0.55–0.75**
+The gap shows clear, observable training signal—supporting the hackathon rubric emphasis on **observable reward improvement**.
 ---
 ```bash
 # Live demo — no install needed
+curl -X POST https://kunalkachru23-nexus-enhanced-stage.hf.space/demo/run/INC003 | python -m json.tool
 # Or open the web dashboard
+https://huggingface.co/spaces/kunalkachru23/nexus-enhanced-stage
 ```
 **Training notebook** (GRPO on Qwen2.5-1.5B, cells 1–3 run without GPU):

docs/deployment/DEPLOYMENT_CHECKLIST.md CHANGED

@@ -85,7 +85,7 @@ python deploy_to_hf_spaces.py
 #   🎉 Deployment complete!
 # Monitor build progress:
-# https://huggingface.co/spaces/kunalkachru23/nexus-enhanced
 # Look for "Building" → "Running" status (5-10 min)
 ```
@@ -94,12 +94,12 @@ Once HF Space shows "Running" status:
 ```bash
 # Test public health endpoint
-curl https://kunalkachru23-nexus-enhanced.hf.space/health
 # Expected: {"status": "healthy", "environment": "nexus-enhanced", ...}
 # Run full remote test suite
-python test_hf_space_deployment.py --url https://kunalkachru23-nexus-enhanced.hf.space
 # Expected: ✅ ALL 7 TESTS PASS
 ```
@@ -107,7 +107,7 @@ python test_hf_space_deployment.py --url https://kunalkachru23-nexus-enhanced.hf
 ### Step 5: Verify Judge Dashboard
 Open in browser:
 ```
-https://kunalkachru23-nexus-enhanced.hf.space/
 ```
 **Expected to see**:
@@ -125,13 +125,13 @@ After deployment, run:
 ```bash
 # Full test suite against deployed environment
-python test_hf_space_deployment.py --url https://kunalkachru23-nexus-enhanced.hf.space
 # Individual endpoint tests:
-curl https://kunalkachru23-nexus-enhanced.hf.space/health | jq .
-curl https://kunalkachru23-nexus-enhanced.hf.space/metrics | jq .
-curl https://kunalkachru23-nexus-enhanced.hf.space/learning-curve | jq .
-curl -X POST https://kunalkachru23-nexus-enhanced.hf.space/reset \
   -H "Content-Type: application/json" \
   -d '{"incident_id": "INC003"}' | jq .
 ```
@@ -166,14 +166,14 @@ curl -X POST https://kunalkachru23-nexus-enhanced.hf.space/reset \
 ### After Phase 7 Passes:
 1. **Update Colab Notebook** (grpo_colab_v2.ipynb)
-   - Verify `BASE_URL = "https://kunalkachru23-nexus-enhanced.hf.space"`
    - Run connectivity check cell
    - Expected: ✅ Connected message
 2. **Start Training**
    - Run all cells in Colab
    - Training produces reward curves
-   - Monitor: https://kunalkachru23-nexus-enhanced.hf.space/learning-curve
 3. **Expected Training Results**
    - First 5-10 episodes: ~0.2 reward (baseline)
@@ -181,7 +181,7 @@ curl -X POST https://kunalkachru23-nexus-enhanced.hf.space/reset \
    - Episodes 30+: Convergence around 0.6-0.8
    - Total training time: ~6 hours for 50 episodes on Colab GPU
-4. **Blog Post** (Hard Gate)
    - Topic: "How NEXUS Enhanced Trains Multi-Agent Incident Response via GRPO"
    - Sections:
      - Problem statement (CrowdStrike scale)
@@ -191,7 +191,7 @@ curl -X POST https://kunalkachru23-nexus-enhanced.hf.space/reset \
    - Length: ~800-1200 words (< 2 min read)
    - Publish to HF blog or Medium
-5. **Pitch Video** (Hard Gate)
    - Duration: 3 minutes max
    - Content:
      - Show judge dashboard at `/`
@@ -222,7 +222,7 @@ curl -X POST https://kunalkachru23-nexus-enhanced.hf.space/reset \
 | Service | URL | Purpose |
 |---------|-----|---------|
-| Judge Dashboard | `https://kunalkachru23-nexus-enhanced.hf.space/` | Live metrics + curves |
 | API Health | `.../health` | Connectivity check |
 | Metrics | `.../metrics` | Training stats |
 | Learning Curve | `.../learning-curve` | Reward history |

 #   🎉 Deployment complete!
 # Monitor build progress:
+# https://huggingface.co/spaces/kunalkachru23/nexus-enhanced-stage
 # Look for "Building" → "Running" status (5-10 min)
 ```
 ```bash
 # Test public health endpoint
+curl https://kunalkachru23-nexus-enhanced-stage.hf.space/health
 # Expected: {"status": "healthy", "environment": "nexus-enhanced", ...}
 # Run full remote test suite
+python test_hf_space_deployment.py --url https://kunalkachru23-nexus-enhanced-stage.hf.space
 # Expected: ✅ ALL 7 TESTS PASS
 ```
 ### Step 5: Verify Judge Dashboard
 Open in browser:
 ```
+https://kunalkachru23-nexus-enhanced-stage.hf.space/
 ```
 **Expected to see**:
 ```bash
 # Full test suite against deployed environment
+python test_hf_space_deployment.py --url https://kunalkachru23-nexus-enhanced-stage.hf.space
 # Individual endpoint tests:
+curl https://kunalkachru23-nexus-enhanced-stage.hf.space/health | jq .
+curl https://kunalkachru23-nexus-enhanced-stage.hf.space/metrics | jq .
+curl https://kunalkachru23-nexus-enhanced-stage.hf.space/learning-curve | jq .
+curl -X POST https://kunalkachru23-nexus-enhanced-stage.hf.space/reset \
   -H "Content-Type: application/json" \
   -d '{"incident_id": "INC003"}' | jq .
 ```
 ### After Phase 7 Passes:
 1. **Update Colab Notebook** (grpo_colab_v2.ipynb)
+   - Verify `BASE_URL = "https://kunalkachru23-nexus-enhanced-stage.hf.space"`
    - Run connectivity check cell
    - Expected: ✅ Connected message
 2. **Start Training**
    - Run all cells in Colab
    - Training produces reward curves
+   - Monitor: https://kunalkachru23-nexus-enhanced-stage.hf.space/learning-curve
 3. **Expected Training Results**
    - First 5-10 episodes: ~0.2 reward (baseline)
    - Episodes 30+: Convergence around 0.6-0.8
    - Total training time: ~6 hours for 50 episodes on Colab GPU
+4. **Blog Post** (submission requirement)
    - Topic: "How NEXUS Enhanced Trains Multi-Agent Incident Response via GRPO"
    - Sections:
      - Problem statement (CrowdStrike scale)
    - Length: ~800-1200 words (< 2 min read)
    - Publish to HF blog or Medium
+5. **Pitch Video** (submission requirement)
    - Duration: 3 minutes max
    - Content:
      - Show judge dashboard at `/`
 | Service | URL | Purpose |
 |---------|-----|---------|
+| Judge Dashboard | `https://kunalkachru23-nexus-enhanced-stage.hf.space/` | Live metrics + curves |
 | API Health | `.../health` | Connectivity check |
 | Metrics | `.../metrics` | Training stats |
 | Learning Curve | `.../learning-curve` | Reward history |
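
Most of this commit replaces the `nexus-enhanced` URLs with `nexus-enhanced-stage`. Deriving both URL forms from one repo id is one way to keep such references in sync. The sketch below mirrors the two patterns visible in this diff. The function names are hypothetical, and the normalization rule (lowercase, non-alphanumeric characters mapped to `-`) is an assumption about how the `hf.space` subdomain is formed, not a documented guarantee.

```python
import re


def space_app_url(repo_id: str) -> str:
    """Map '<owner>/<space>' to the direct app URL pattern used in these docs."""
    owner, name = repo_id.split("/", 1)
    # Assumption: subdomain is "<owner>-<space>", lowercased, with any
    # character outside [a-z0-9-] replaced by "-".
    subdomain = re.sub(r"[^a-z0-9-]", "-", f"{owner}-{name}".lower())
    return f"https://{subdomain}.hf.space"


def space_page_url(repo_id: str) -> str:
    """Map '<owner>/<space>' to the Space page on huggingface.co."""
    return f"https://huggingface.co/spaces/{repo_id}"
```

With `repo_id = "kunalkachru23/nexus-enhanced-stage"`, the two helpers reproduce the dashboard URL and the build-monitoring URL shown in the checklist.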

docs/deployment/HF_SPACES_DEPLOYMENT.md CHANGED

@@ -63,18 +63,18 @@ Once HF Spaces shows "Running" status:
 ```bash
 # Test judge dashboard endpoint
-curl -s https://kunalkachru23-nexus-enhanced.hf.space/health | jq .
 # Test reset endpoint
-curl -s -X POST https://kunalkachru23-nexus-enhanced.hf.space/reset \
   -H "Content-Type: application/json" \
   -d '{"incident_id": "INC003"}' | jq .
 # Test metrics endpoint
-curl -s https://kunalkachru23-nexus-enhanced.hf.space/metrics | jq .
 # Test learning curve
-curl -s https://kunalkachru23-nexus-enhanced.hf.space/learning-curve | jq .
 ```
 ### Step 6: Update Colab Notebook
@@ -82,7 +82,7 @@ curl -s https://kunalkachru23-nexus-enhanced.hf.space/learning-curve | jq .
 In `notebooks/grpo_colab_v2.ipynb`, ensure BASE_URL points to deployed space:
 ```python
-BASE_URL = "https://kunalkachru23-nexus-enhanced.hf.space"  # YOUR SPACE URL
 ```
 Then run connectivity check:
@@ -102,7 +102,7 @@ cat > test_hf_space_deployment.py << 'EOF'
 import requests
 import json
-BASE_URL = "https://kunalkachru23-nexus-enhanced.hf.space"
 def test_health():
     resp = requests.get(f"{BASE_URL}/health")
@@ -193,8 +193,8 @@ python test_hf_space_deployment.py
 While Colab trains:
-1. **Watch reward curves**: https://kunalkachru23-nexus-enhanced.hf.space/learning-curve
-2. **Check metrics**: `curl https://kunalkachru23-nexus-enhanced.hf.space/metrics`
 3. **Monitor Colab logs** for reward_fn errors
 4. **Expected pattern**: First 5-10 episodes ~0.2 reward, then gradual improvement to 0.6-0.8
@@ -212,5 +212,5 @@ git push origin main
 ## Next Steps
 - **Phase 7**: Run full regression tests against deployed HF Space
-- **Blog Post**: Write HF blog explaining NEXUS architecture (hard gate)
-- **Pitch**: Prepare 3-minute demo for judges (hard gate)

 ```bash
 # Test judge dashboard endpoint
+curl -s https://kunalkachru23-nexus-enhanced-stage.hf.space/health | jq .
 # Test reset endpoint
+curl -s -X POST https://kunalkachru23-nexus-enhanced-stage.hf.space/reset \
   -H "Content-Type: application/json" \
   -d '{"incident_id": "INC003"}' | jq .
 # Test metrics endpoint
+curl -s https://kunalkachru23-nexus-enhanced-stage.hf.space/metrics | jq .
 # Test learning curve
+curl -s https://kunalkachru23-nexus-enhanced-stage.hf.space/learning-curve | jq .
 ```
 ### Step 6: Update Colab Notebook
 In `notebooks/grpo_colab_v2.ipynb`, ensure BASE_URL points to deployed space:
 ```python
+BASE_URL = "https://kunalkachru23-nexus-enhanced-stage.hf.space"  # YOUR SPACE URL
 ```
 Then run connectivity check:
 import requests
 import json
+BASE_URL = "https://kunalkachru23-nexus-enhanced-stage.hf.space"
 def test_health():
     resp = requests.get(f"{BASE_URL}/health")
 While Colab trains:
+1. **Watch reward curves**: https://kunalkachru23-nexus-enhanced-stage.hf.space/learning-curve
+2. **Check metrics**: `curl https://kunalkachru23-nexus-enhanced-stage.hf.space/metrics`
 3. **Monitor Colab logs** for reward_fn errors
 4. **Expected pattern**: First 5-10 episodes ~0.2 reward, then gradual improvement to 0.6-0.8
 ## Next Steps
 - **Phase 7**: Run full regression tests against deployed HF Space
+- **Blog Post**: Write HF blog explaining NEXUS architecture (per submission requirements)
+- **Pitch**: Prepare 3-minute demo for judges (per submission requirements)
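
The four `curl` checks in this file (health, reset, metrics, learning curve) can also be run from Python with only the standard library. This is a rough sketch, separate from the repo's `test_hf_space_deployment.py`: the names `build_checks`, `http_fetch`, and `run_checks` are hypothetical, and it assumes every endpoint answers with JSON and HTTP 200 on success.

```python
import json
from urllib import request


def build_checks(base_url):
    """Return (method, url, payload) tuples mirroring the curl commands."""
    root = base_url.rstrip("/")
    return [
        ("GET", f"{root}/health", None),
        ("POST", f"{root}/reset", {"incident_id": "INC003"}),
        ("GET", f"{root}/metrics", None),
        ("GET", f"{root}/learning-curve", None),
    ]


def http_fetch(method, url, payload, timeout=30):
    """Perform one request; urlopen raises HTTPError on 4xx/5xx responses."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    req = request.Request(url, data=data, headers=headers, method=method)
    with request.urlopen(req, timeout=timeout) as resp:
        # Assumption: the body is JSON, as the `| jq .` pipes suggest.
        return resp.status, json.loads(resp.read().decode("utf-8"))


def run_checks(base_url, fetch=http_fetch):
    """Map each checked URL to True when it returned HTTP 200."""
    results = {}
    for method, url, payload in build_checks(base_url):
        status, _body = fetch(method, url, payload)
        results[url] = status == 200
    return results
```

Calling `run_checks("https://kunalkachru23-nexus-enhanced-stage.hf.space")` hits the live Space. The `fetch` parameter exists so the request logic can be replaced with a stub when testing offline.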

docs/guides/QUICK_START.md CHANGED

@@ -48,7 +48,7 @@ openenv push . --repo-id kunalkachru23/nexus-enhanced-stage --exclude .hfignore
 4. Watch dashboard: https://kunalkachru23-nexus-enhanced-stage.hf.space/
 ```
-### Export reward plot for slides (BRD Criterion 3)
 ```bash
 python scripts/export_reward_plot.py --url https://kunalkachru23-nexus-enhanced-stage.hf.space
 # or from local episode_rewards.json:
@@ -188,7 +188,7 @@ python test_hf_space_deployment.py --url https://kunalkachru23-nexus-enhanced-st
 | Reward Progress | **20%** | Chart.js curves on dashboard ← **KEY** |
 | Pipeline | 10% | GRPO on Colab GPU → HF Space API |
-**🎯 Priority**: Ensure reward curves are visible and improving to maximize Criterion 3 score.
 ---

 4. Watch dashboard: https://kunalkachru23-nexus-enhanced-stage.hf.space/
 ```
+### Export reward plot for slides (observable improvement evidence)
 ```bash
 python scripts/export_reward_plot.py --url https://kunalkachru23-nexus-enhanced-stage.hf.space
 # or from local episode_rewards.json:
 | Reward Progress | **20%** | Chart.js curves on dashboard ← **KEY** |
 | Pipeline | 10% | GRPO on Colab GPU → HF Space API |
+**🎯 Priority**: Ensure reward curves are visible and improving to support the observable-improvement rubric row.
 ---

docs/pitch/DEMO_WALKTHROUGH.md CHANGED

@@ -69,7 +69,7 @@ Say:
 ## [1:50-2:00] Close
 Say:
-"We satisfy OpenEnv hard gates, show live reward improvement, and can directly inspect behavior delta in the running environment."
 ---

 ## [1:50-2:00] Close
 Say:
+"We satisfy OpenEnv validation requirements, show live reward improvement, and can directly inspect behavior delta in the running environment."
 ---

docs/pitch/PITCH.md CHANGED

@@ -1,7 +1,7 @@
-# NEXUS Enhanced — 3-minute pitch + 2-minute Q&A (BRD §18.1)
 **Event:** Meta PyTorch OpenEnv Hackathon × Scaler — Grand Finale
-**Format (verbatim BRD):** 3 min pitch + 2 min Q&A = 5 min total.
 ---
@@ -29,7 +29,7 @@
 ---
-## Observable evidence (~35 s) — Criterion 3
 - Show **dashboard** reward curve and rolling average vs **baseline** (pre-event benchmark).
 - Use one canonical metrics callout (snapshot `2026-04-24T16:48:26Z`, stage URL): **episodes 387**, **avg 0.4634**, **best 1.0032**, **+74.9% vs baseline 0.265**.
@@ -38,7 +38,7 @@
 ---
-## Training & improvement (~30 s) — Criteria 3 & 4
 - **Colab** runs minimal GRPO training against the **real** remote environment API (not a mocked reward).
 - Improvement is **observable** on the curve and in **behaviour** (shorter paths, better notifications, fewer oversight violations) — tie any checkpoint story to **what the IC does differently**, not only the scalar.
@@ -47,7 +47,7 @@
 ## Close (~20 s)
-> “NEXUS is **OpenEnv-shaped**: isolated episodes, structured actions, measurable outcomes, and a problem that stays hard after the novelty wears off. We meet the **hard gates** — OpenEnv latest in the toolchain + Colab TRL/Unsloth + HF blog/video slot — and we optimised for the **40% environment** and **30% storytelling** bars with a demo you can **drive live** in under two minutes.”
 **Stop at 3:00. Breathe. Hand off for Q&A.**
@@ -65,7 +65,7 @@ Answer in **short paragraphs**; do not invent numbers not on the dashboard.
 | **Reward hacking?** | Sparse terminal reward + oversight + coalition + tool budgets; red herrings in harder incidents. |
 | **What is partial observability?** | IC observation is a slice; specialists see tool outputs for their role; IC never sees everything at once. |
 | **INC007 in 30 s?** | Nightmare incident: multi-region blast radius; **schema drift** forces contract change mid-episode — reserved for sharp Q&A, not the full live path if time is short. |
-| **Why GRPO / TRL / Unsloth?** | BRD hard gate #2: minimal training in Colab with **HF TRL** and **Unsloth** for efficient QLoRA on Qwen-class IC policy. |
 | **What if the Space is slow?** | Training is async from Colab; dashboard refreshes on timer; auto-demo is one POST chain. |
 | **Baseline 0.265?** | Pre-event scripted benchmark documented in server; curve compares **trained vs that baseline** for “observable improvement.” |
 | **Single strongest differentiator?** | Multi-agent + sparse reward + **schema drift** on INC007 in one OpenEnv-hosted stack judges can open in the browser. |

+# NEXUS Enhanced — 3-minute pitch + 2-minute Q&A
 **Event:** Meta PyTorch OpenEnv Hackathon × Scaler — Grand Finale
+**Format (per hackathon compliance):** 3 min pitch + 2 min Q&A = 5 min total.
 ---
 ---
+## Observable evidence (~35 s) — reward improvement (judging rubric)
 - Show **dashboard** reward curve and rolling average vs **baseline** (pre-event benchmark).
 - Use one canonical metrics callout (snapshot `2026-04-24T16:48:26Z`, stage URL): **episodes 387**, **avg 0.4634**, **best 1.0032**, **+74.9% vs baseline 0.265**.
 ---
+## Training & improvement (~30 s) — improvement + pipeline coherence
 - **Colab** runs minimal GRPO training against the **real** remote environment API (not a mocked reward).
 - Improvement is **observable** on the curve and in **behaviour** (shorter paths, better notifications, fewer oversight violations) — tie any checkpoint story to **what the IC does differently**, not only the scalar.
 ## Close (~20 s)
+> “NEXUS is **OpenEnv-shaped**: isolated episodes, structured actions, measurable outcomes, and a problem that stays hard after the novelty wears off. We meet **hackathon compliance**: OpenEnv latest in the toolchain, Colab TRL/Unsloth training, and the required HF blog or short video slot—and we optimised for the **40% environment** and **30% storytelling** rubric weights with a demo you can **drive live** in under two minutes.”
 **Stop at 3:00. Breathe. Hand off for Q&A.**
 | **Reward hacking?** | Sparse terminal reward + oversight + coalition + tool budgets; red herrings in harder incidents. |
 | **What is partial observability?** | IC observation is a slice; specialists see tool outputs for their role; IC never sees everything at once. |
 | **INC007 in 30 s?** | Nightmare incident: multi-region blast radius; **schema drift** forces contract change mid-episode — reserved for sharp Q&A, not the full live path if time is short. |
+| **Why GRPO / TRL / Unsloth?** | Per compliance: minimal training in Colab with **HF TRL** and **Unsloth** for efficient QLoRA on Qwen-class IC policy. |
 | **What if the Space is slow?** | Training is async from Colab; dashboard refreshes on timer; auto-demo is one POST chain. |
 | **Baseline 0.265?** | Pre-event scripted benchmark documented in server; curve compares **trained vs that baseline** for “observable improvement.” |
 | **Single strongest differentiator?** | Multi-agent + sparse reward + **schema drift** on INC007 in one OpenEnv-hosted stack judges can open in the browser. |
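
The `+74.9% vs baseline 0.265` callout in the pitch follows from the quoted average and baseline by the usual relative-change formula. A quick check, using only the numbers stated in the snapshot callout (the function name is illustrative):

```python
def percent_uplift(average: float, baseline: float) -> float:
    """Relative improvement of `average` over `baseline`, in percent."""
    return (average - baseline) / baseline * 100.0


# Values quoted in the pitch: avg 0.4634, baseline 0.265.
uplift = percent_uplift(0.4634, 0.265)
print(f"{uplift:+.1f}%")  # prints +74.9%
```

The unrounded value is about 74.87, so the pitch figure is a one-decimal rounding of the same quantity.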

docs/pitch/PITCH_3MIN.md CHANGED

@@ -86,7 +86,7 @@ The trained model learns to query the right service, notify customers proactivel
 For academia, it's a benchmarkable environment. For enterprise, it's a path toward AI-assisted incident management. For Meta and PyTorch, it demonstrates OpenEnv's potential for real-world complexity.
-**Live demo:** https://kunalkachru23-nexus-enhanced.hf.space/
 Thank you."

 For academia, it's a benchmarkable environment. For enterprise, it's a path toward AI-assisted incident management. For Meta and PyTorch, it demonstrates OpenEnv's potential for real-world complexity.
+**Live demo:** https://kunalkachru23-nexus-enhanced-stage.hf.space/
 Thank you."

docs/pitch/VIDEO_RECORD_NOW_PACK.md ADDED

@@ -0,0 +1,144 @@
+# NEXUS Enhanced — Record-Now Video Pack
+Target length: 1:45 to 2:00
+Presenter style: calm, clear, outcome-first
+Audience: judges / hackathon reviewers
+---
+## 1) Pre-record setup (2-3 minutes)
+Keep only these windows ready:
+- Browser tab 1: `https://kunalkachru23-nexus-enhanced-stage.hf.space/web`
+- Browser tab 2: `https://kunalkachru23-nexus-enhanced-stage.hf.space/metrics`
+- Terminal tab with this command pre-pasted:
+```bash
+curl -s -X POST https://kunalkachru23-nexus-enhanced-stage.hf.space/demo/run/INC003 | python -m json.tool
+```
+Visual quality checklist:
+- Screen recording at 1080p.
+- Terminal font size: 16+.
+- Browser zoom: 125%.
+- Hide desktop notifications.
+- Use dark mode consistently (optional, but cleaner).
+Canonical narration numbers (freeze these):
+- Episodes: `387`
+- Average reward: `0.4634`
+- Best reward: `1.0032`
+- Baseline: `0.265`
+- Improvement: `+74.9%`
+Source: `docs/project/snapshots/submission_snapshot_20260424T164826Z.md`
+---
+## 2) One-take timeline (timestamped)
+### 0:00-0:12 — Hook (camera or title card)
+Say:
+"Most incident-response AI demos are single-agent and unrealistic. NEXUS Enhanced is a multi-agent OpenEnv environment where an Incident Commander coordinates specialists under partial observability, business constraints, and schema drift."
+On screen:
+- Title card or the `/web` dashboard landing view.
+### 0:12-0:35 — What you built
+Say:
+"We built seven incident scenarios, from easier outages to a nightmare schema-drift incident. The system is deployed on Hugging Face Spaces and validated with OpenEnv-compatible workflow checks. Training runs through TRL GRPO with Unsloth in Colab."
+On screen:
+- Briefly show `/metrics` page and then back to `/web`.
+### 0:35-1:00 — Measurable improvement
+Say:
+"In the current frozen snapshot we have 387 completed episodes, average reward 0.4634 versus baseline 0.265, and best reward 1.0032. That's a 74.9% uplift over baseline."
+On screen:
+- Show dashboard training metrics and curve.
+### 1:00-1:30 — Behavioral evidence
+Say:
+"The key claim is behavior change, not only scalar gain. In INC003, the policy commits earlier to the memory-leak hypothesis, sequences runbook steps cleanly, sends proactive customer notifications, and reaches postmortem with fewer redundant actions."
+On screen:
+- Run the terminal command:
+```bash
+curl -s -X POST https://kunalkachru23-nexus-enhanced-stage.hf.space/demo/run/INC003 | python -m json.tool
+```
+- Point to:
+  - `final_phase: "postmortem"`
+  - `reward_breakdown.total`
+  - `coalition_correct: true`
+  - `notifications_sent`
+### 1:30-1:52 — Safety and robustness
+Say:
+"To reduce reward hacking, diagnosis is evidence-gated, customer score requires actual notification actions, and coordination penalizes duplicate tool calls. Oversight checks also affect final scoring."
+On screen:
+- Keep terminal result visible (reward breakdown + oversight report).
+### 1:52-2:00 — Close
+Say:
+"NEXUS combines innovation, observable improvement, and reproducible deployment for real incident-management RL. Thanks for watching."
+On screen:
+- End card with:
+  - Stage URL
+  - Repo URL
+  - Evidence docs path
+---
+## 3) Backup branch (if UI or network is slow)
+If dashboard lags, use terminal-only sequence:
+```bash
+curl -s https://kunalkachru23-nexus-enhanced-stage.hf.space/health | python -m json.tool
+curl -s https://kunalkachru23-nexus-enhanced-stage.hf.space/metrics | python -m json.tool
+curl -s https://kunalkachru23-nexus-enhanced-stage.hf.space/learning-curve | python -m json.tool
+curl -s -X POST https://kunalkachru23-nexus-enhanced-stage.hf.space/demo/run/INC003 | python -m json.tool
+```
+Narration line for fallback:
+"Even without UI, these live endpoints show health, metrics, learning progression, and full behavioral transcript from auto-demo INC003."
+---
+## 4) Retake checklist (30 seconds)
+Before final export, verify:
+- Audio is clear and steady.
+- Duration stays under 2:00.
+- Numbers match frozen snapshot.
+- Stage URL shown at least once.
+- Demo command succeeds in-frame.
+- No mention of internal-only terminology.
+---
+## 5) Upload metadata (copy/paste)
+Published video URL:
+`https://www.youtube.com/watch?v=a9YZF30tomw`
+Title:
+`NEXUS Enhanced Demo — Multi-Agent Incident Response RL (OpenEnv + GRPO)`
+Description:
+`Live demo of NEXUS Enhanced on HF Space. Shows OpenEnv-compatible environment checks, observable reward improvement, and transcript-level behavioral evidence on INC003. Stage URL: https://kunalkachru23-nexus-enhanced-stage.hf.space`
+Tags:
+`openenv, reinforcement learning, multi-agent, incident response, grpo, trl, unsloth, huggingface`
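
The video pack asks the presenter to point at four fields in the `/demo/run/INC003` output. A small pre-flight check on the parsed JSON could confirm they are present before recording. This is only a sketch: the function name is hypothetical, and it assumes the four fields sit at the top level of the response (with `total` nested under `reward_breakdown`), which this diff does not confirm.

```python
def check_demo_result(result: dict) -> list:
    """Return a list of problems; an empty list means all four fields look right."""
    problems = []
    if result.get("final_phase") != "postmortem":
        problems.append("final_phase is not 'postmortem'")
    total = result.get("reward_breakdown", {}).get("total")
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        problems.append("reward_breakdown.total is missing or not numeric")
    if result.get("coalition_correct") is not True:
        problems.append("coalition_correct is not true")
    # The type of notifications_sent is not stated in the doc, so only
    # presence is checked here.
    if "notifications_sent" not in result:
        problems.append("notifications_sent is missing")
    return problems
```

Feeding it `json.loads(...)` of the curl output and printing the returned list gives a quick go/no-go before the take.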

docs/project/BEHAVIORAL_DELTA_PROOF.md CHANGED

@@ -1,6 +1,6 @@
-# Criterion 4 Behavioral Delta Proof
-This sheet demonstrates BRD §18.2 Criterion 4 intent: measurable improvement in **how the agent acts**, not only reward numbers.
 ## Fixed evaluation set (canonical)
@@ -60,9 +60,9 @@ These support that gain is not only speed; diagnosis/coordination/customer dimen
    - `GET /learning-curve`
    - confirm aggregate improvement trend and rolling average.
-## Why this satisfies Criterion 4
-Criterion 4 asks for coherent reward logic and meaningful pipeline-driven behavior change.
 NEXUS evidence shows:
 - Coherent multi-dimensional reward decomposition (MTTR, diagnosis, customer, coordination, oversight, depth).

+# Behavioral delta proof (pipeline coherence)
+This sheet demonstrates **hackathon judging intent** for reward-and-pipeline coherence: measurable improvement in **how the agent acts**, not only reward numbers.
 ## Fixed evaluation set (canonical)
    - `GET /learning-curve`
    - confirm aggregate improvement trend and rolling average.
+## Why this satisfies the pipeline-coherence rubric row
+The judging rubric asks for coherent reward logic and meaningful pipeline-driven behavior change.
 NEXUS evidence shows:
 - Coherent multi-dimensional reward decomposition (MTTR, diagnosis, customer, coordination, oversight, depth).

docs/project/COMPLIANCE_LOCK_MATRIX.md CHANGED

@@ -1,8 +1,8 @@
-# Compliance Lock Matrix (BRD-Aligned)
-Purpose: freeze hard-gate and scoring-criterion traceability so implementation changes remain compliant.
-## Hard gates (pass/fail)
 | Gate | Requirement | Project evidence | Verification command |
 |---|---|---|---|
@@ -10,11 +10,11 @@ Purpose: freeze hard-gate and scoring-criterion traceability so implementation c
 | Colab training script | Minimal Colab script with TRL/Unsloth path | `notebooks/grpo_colab_v2.ipynb` | Notebook config + run cells |
 | Public artifact | HF blog or <2 min video | `docs/blog/*`, `docs/pitch/YOUTUBE_RECORDING_SCRIPT.md` | Submission URL checklist |
-## Weighted scoring criteria map (BRD §18 — evaluation criteria)
 These four rows are what Cerebral Valley–aggregated scoring uses. The **Demonstration** column is how a judge should *see* each criterion in the live Space or repo in under five minutes.
-| Criterion | BRD weight | What judges need (verbatim intent) | NEXUS evidence | How the design demonstrates it (demo / artefact) |
 |---|---:|---|---|---|
 | **1 — Environment Innovation** | 40% | Novel, creative, or challenging; meaningfully tests agent behaviour | `server/environment.py`, `server/incidents.py`, `server/agents.py`, `server/tools.py`, `docs/project/SUBTHEME_EVIDENCE_MATRIX.md` | Open the dashboard **Training** tab → **manual validation** on **INC008** (Theme 3.2) and INC004–INC007; show **coalition**, **role-scoped tooling**, **INC007 schema drift** in Q&A or deep demo. Eight incidents = difficulty ladder + operational nuance + personalized track. |
 | **2 — Storytelling** | 30% | Clear explanation of problem, environment, and agent behaviour; demo **engaging and easy to follow** | `docs/pitch/PITCH.md`, `docs/pitch/DEMO_WALKTHROUGH.md`, `docs/project/FINAL_OPERATIONS_RUNBOOK.md` | Follow `DEMO_WALKTHROUGH.md`: metrics -> **Validation tab auto-demo** (INC003) -> optional **Guided** steps to completion. One sentence hook: “IC coordinates specialists under partial observability and contract drift.” |
@@ -23,22 +23,22 @@ These four rows are what Cerebral Valley–aggregated scoring uses. The **Demons
 ---
-## BRD content themes (§9–14 + §14 Theme 5) — how NEXUS maps
-Official hackathon **themes** (Google Doc / BRD) are distinct from the **§18 scoring criteria** above. NEXUS is architected as an **enterprise incident-command** environment; the table below states where each parent theme is **primary** (core loop), **secondary** (explicit mechanic but not the headline), or **bridge** (honest pitch link without claiming a different product genre).
-| BRD theme | § | Primary requirement (summary) | NEXUS demonstration | Verification |
 |---|---|---|---|---|
-| **Theme 1 — Multi-agent** | §9 | Cooperation / competition / negotiation / **coalition**; **partial observability**; theory-of-mind style incentives | Five specialist roles + IC; coalition votes on harder incidents; IC observation is a slice, not full state | `server/environment.py`, `server/agents.py`, INC003+ in `server/incidents.py`, manual validation + `/state/{session_id}` |
-| **Theme 2 — Long horizon & instruction following** | §10 | **Sparse / delayed** reward; task **beyond one context**; decomposition & recovery; Scale sub-theme: **non-code** business workflows (incl. **HR & IT**) | Episode-level reward on `done`; long `max_steps` incidents; **server-side** session state and tool history so the task cannot fit in one static transcript; **IT ops** coordination (not a coding benchmark) | `server/reward.py`, `server/app.py` session store, INC006–INC007 length/complexity; see Scale AI row in `SUBTHEME_EVIDENCE_MATRIX.md` |
-| **Theme 3.1 — World modeling (professional)** | §11 | Realistic tools/workflows; **anti-shortcut**; causal / persistent world | Datadog / runbook / portal-style tools; evidence-gated diagnosis; runbook steps; red herrings on harder tracks | `server/tools.py`, `server/reward.py`, `docs/project/REWARD_HACKING_DEFENSE.md` |
-| **Theme 3.2 — World modeling (personalized)** | §12 | Personal delegation / conflicts / messaging-style tasks | **Dedicated incident INC008** (EA calendar: board prep vs school concert, family thread, auto-accept root cause) using `IncidentType.PERSONAL_ASSISTANT`, same multi-agent tool loop as ops incidents. **Plus** enterprise paths: customer **notifications** and SLA framing on INC001–INC007. | Manual validation **INC008** on dashboard; `server/incidents.py` (`INC008`), `server/data_models.py` |
-| **Theme 4 — Self-improvement** | §13 | Curriculum / adaptive difficulty; recursive capability growth | **Process-wide adaptive tier:** `server/global_curriculum.py` + `GET /curriculum` — last-5 rolling avg ≥ 0.55 promotes difficulty across **HTTP/Colab sessions** (not lost per `NexusEnvironment()`). **Plus** seven-incident ladder + **Snorkel-style** rotating `expert_criteria`. Full recursive self-play is out of scope; GRPO improves policy externally. | `server/difficulty.py`, `server/global_curriculum.py`, `server/app.py` `/curriculum`, Colab GRPO |
-| **Theme 5 — Wild card** | §14 | Creative value for LLM training on a defined task | **Primary positioning option:** “Out-of-box” fusion of **multi-agent ops + schema drift + oversight + token-scaled depth bonus** in one OpenEnv-deployable stack | Pitch close in `docs/pitch/PITCH.md`; innovation narrative in criterion 1 |
-### Sub-theme bonuses (§15) — already locked in `SUBTHEME_EVIDENCE_MATRIX.md`
-Fleet, Halluminate, Snorkel, Patronus, Mercor, Scaler AI Labs rows remain the detailed sponsor map. This matrix ties **parent themes §9–14** to the same implementation so evaluators see both **theme** and **sponsor** coverage.
 ### One-line pitch bank (theme → sentence)
@@ -47,7 +47,7 @@ Use in Q&A if asked “which themes?”
 1. **Theme 1:** “IC coordinates five roles under partial observability and coalition mechanics.”
 2. **Theme 2:** “Sparse end-of-episode reward and persistent server state force long-horizon planning beyond a single context.”
 3. **Theme 3.1:** “Tool-bound enterprise workflows with anti-shortcut evidence and runbook discipline.”
-4. **Theme 3.2:** “**INC008** is a BRD-style personal/EA conflict (calendar + family messaging) on the same engine; other incidents stress customer delegation under SLA.”
 5. **Theme 4:** “**`/curriculum`** shows live adaptive difficulty from rolling rewards; expert criteria rotate; GRPO improves the policy.”
 6. **Theme 5:** “Wild-card angle: one deployable environment that fuses the hardest ops themes for LLM incident command training.”

+# Compliance Lock Matrix (hackathon-aligned)
+Purpose: freeze **mandatory requirements** and **judging rubric** traceability so implementation changes stay aligned with hackathon compliance.
+## Mandatory requirements (pass/fail)
 | Gate | Requirement | Project evidence | Verification command |
 |---|---|---|---|
 | Colab training script | Minimal Colab script with TRL/Unsloth path | `notebooks/grpo_colab_v2.ipynb` | Notebook config + run cells |
 | Public artifact | HF blog or <2 min video | `docs/blog/*`, `docs/pitch/YOUTUBE_RECORDING_SCRIPT.md` | Submission URL checklist |
+## Weighted scoring criteria map (judging rubric)
 These four rows are what Cerebral Valley–aggregated scoring uses. The **Demonstration** column is how a judge should *see* each criterion in the live Space or repo in under five minutes.
+| Criterion | Weight | What judges need (intent) | NEXUS evidence | How the design demonstrates it (demo / artefact) |
 |---|---:|---|---|---|
 | **1 — Environment Innovation** | 40% | Novel, creative, or challenging; meaningfully tests agent behaviour | `server/environment.py`, `server/incidents.py`, `server/agents.py`, `server/tools.py`, `docs/project/SUBTHEME_EVIDENCE_MATRIX.md` | Open the dashboard **Training** tab → **manual validation** on **INC008** (Theme 3.2) and INC004–INC007; show **coalition**, **role-scoped tooling**, **INC007 schema drift** in Q&A or deep demo. Eight incidents = difficulty ladder + operational nuance + personalized track. |
 | **2 — Storytelling** | 30% | Clear explanation of problem, environment, and agent behaviour; demo **engaging and easy to follow** | `docs/pitch/PITCH.md`, `docs/pitch/DEMO_WALKTHROUGH.md`, `docs/project/FINAL_OPERATIONS_RUNBOOK.md` | Follow `DEMO_WALKTHROUGH.md`: metrics -> **Validation tab auto-demo** (INC003) -> optional **Guided** steps to completion. One sentence hook: “IC coordinates specialists under partial observability and contract drift.” |
 ---
+## Hackathon content themes — how NEXUS maps
+Official hackathon **themes** (organizer brief) are distinct from the **four scoring criteria** above. NEXUS is architected as an **enterprise incident-command** environment; the table below states where each parent theme is **primary** (core loop), **secondary** (explicit mechanic but not the headline), or **bridge** (honest pitch link without claiming a different product genre).
+| Parent theme | Track | Primary requirement (summary) | NEXUS demonstration | Verification |
 |---|---|---|---|---|
+| **Theme 1 — Multi-agent** | Multi-agent track | Cooperation / competition / negotiation / **coalition**; **partial observability**; theory-of-mind style incentives | Five specialist roles + IC; coalition votes on harder incidents; IC observation is a slice, not full state | `server/environment.py`, `server/agents.py`, INC003+ in `server/incidents.py`, manual validation + `/state/{session_id}` |
+| **Theme 2 — Long horizon & instruction following** | Long-horizon track | **Sparse / delayed** reward; task **beyond one context**; decomposition & recovery; Scale sub-theme: **non-code** business workflows (incl. **HR & IT**) | Episode-level reward on `done`; long `max_steps` incidents; **server-side** session state and tool history so the task cannot fit in one static transcript; **IT ops** coordination (not a coding benchmark) | `server/reward.py`, `server/app.py` session store, INC006–INC007 length/complexity; see Scale AI row in `SUBTHEME_EVIDENCE_MATRIX.md` |
+| **Theme 3.1 — World modeling (professional)** | World modeling | Realistic tools/workflows; **anti-shortcut**; causal / persistent world | Datadog / runbook / portal-style tools; evidence-gated diagnosis; runbook steps; red herrings on harder tracks | `server/tools.py`, `server/reward.py`, `docs/project/REWARD_HACKING_DEFENSE.md` |
+| **Theme 3.2 — World modeling (personalized)** | Personalized track | Personal delegation / conflicts / messaging-style tasks | **Dedicated incident INC008** (EA calendar: board prep vs school concert, family thread, auto-accept root cause) using `IncidentType.PERSONAL_ASSISTANT`, same multi-agent tool loop as ops incidents. **Plus** enterprise paths: customer **notifications** and SLA framing on INC001–INC007. | Manual validation **INC008** on dashboard; `server/incidents.py` (`INC008`), `server/data_models.py` |
+| **Theme 4 — Self-improvement** | Self-improvement track | Curriculum / adaptive difficulty; recursive capability growth | **Process-wide adaptive tier:** `server/global_curriculum.py` + `GET /curriculum` — last-5 rolling avg ≥ 0.55 promotes difficulty across **HTTP/Colab sessions** (not lost per `NexusEnvironment()`). **Plus** seven-incident ladder + **Snorkel-style** rotating `expert_criteria`. Full recursive self-play is out of scope; GRPO improves policy externally. | `server/difficulty.py`, `server/global_curriculum.py`, `server/app.py` `/curriculum`, Colab GRPO |
+| **Theme 5 — Wild card** | Wild card | Creative value for LLM training on a defined task | **Primary positioning option:** “Out-of-box” fusion of **multi-agent ops + schema drift + oversight + token-scaled depth bonus** in one OpenEnv-deployable stack | Pitch close in `docs/pitch/PITCH.md`; innovation narrative in rubric row 1 |
+### Sub-theme bonuses — already locked in `SUBTHEME_EVIDENCE_MATRIX.md`
+Fleet, Halluminate, Snorkel, Patronus, Mercor, Scaler AI Labs rows remain the detailed sponsor map. This matrix ties **parent themes** to the same implementation so evaluators see both **theme** and **sponsor** coverage.
 ### One-line pitch bank (theme → sentence)
 1. **Theme 1:** “IC coordinates five roles under partial observability and coalition mechanics.”
 2. **Theme 2:** “Sparse end-of-episode reward and persistent server state force long-horizon planning beyond a single context.”
 3. **Theme 3.1:** “Tool-bound enterprise workflows with anti-shortcut evidence and runbook discipline.”
+4. **Theme 3.2:** “**INC008** is a personalized EA-style conflict (calendar + family messaging) on the same engine; other incidents stress customer delegation under SLA.”
 5. **Theme 4:** “**`/curriculum`** shows live adaptive difficulty from rolling rewards; expert criteria rotate; GRPO improves the policy.”
 6. **Theme 5:** “Wild-card angle: one deployable environment that fuses the hardest ops themes for LLM incident command training.”
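The Theme 4 row describes a process-wide tier that is promoted when the last-5 rolling average reaches 0.55. The sketch below illustrates that rule only; it is not the contents of `server/global_curriculum.py`. The class name, the tier cap of 3, the requirement of a full window, and clearing the window after a promotion are all assumptions made for illustration.

```python
from collections import deque


class RollingCurriculum:
    """Promote a difficulty tier when the rolling mean reward clears a threshold."""

    def __init__(self, window: int = 5, threshold: float = 0.55, max_tier: int = 3):
        self._rewards = deque(maxlen=window)  # keeps only the last `window` rewards
        self.threshold = threshold
        self.max_tier = max_tier  # assumption: the matrix does not state a tier count
        self.tier = 0

    def rolling_average(self) -> float:
        """Mean of the rewards currently in the window (0.0 when empty)."""
        if not self._rewards:
            return 0.0
        return sum(self._rewards) / len(self._rewards)

    def record(self, episode_reward: float) -> int:
        """Store one episode reward and return the (possibly promoted) tier."""
        self._rewards.append(episode_reward)
        window_full = len(self._rewards) == self._rewards.maxlen
        if (
            window_full
            and self.rolling_average() >= self.threshold
            and self.tier < self.max_tier
        ):
            self.tier += 1
            self._rewards.clear()  # assumption: each promotion needs a fresh window
        return self.tier


# A single module-level instance is shared by every session in the process,
# so the tier survives the creation of new environment objects.
GLOBAL_CURRICULUM = RollingCurriculum()
```

Keeping the state at module level rather than on the environment object is what would let the tier persist across HTTP and Colab sessions within one server process.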

docs/project/CURRICULUM_AND_ABLATION.md CHANGED

@@ -17,7 +17,7 @@ NEXUS follows a staged difficulty pattern:
 The environment and reward system are explicitly structured to support this progression (`server/environment.py`, `server/difficulty.py`, `server/incidents.py`).
-## Why this curriculum is valid for BRD + Self-Serve guidance
 - Keeps success probability > 0 early (prevents RL stall).
 - Increases branching complexity only after stable basic policy behavior.

 The environment and reward system are explicitly structured to support this progression (`server/environment.py`, `server/difficulty.py`, `server/incidents.py`).
+## Why this curriculum is valid for hackathon compliance + Self-Serve guidance
 - Keeps success probability > 0 early (prevents RL stall).
 - Increases branching complexity only after stable basic policy behavior.

docs/project/FINAL_OPERATIONS_RUNBOOK.md CHANGED

@@ -65,7 +65,7 @@ Purpose: deterministic final-day operations with explicit fallback paths and no
   - `/health`
   - `/metadata`
   - `/schema`
-- Pivot to hard-gate proof + archived transcript evidence.
 - Do not attempt risky hotfixes during final window.
 ## Roles and ownership

   - `/health`
   - `/metadata`
   - `/schema`
+- Pivot to compliance proof + archived transcript evidence.
 - Do not attempt risky hotfixes during final window.
 ## Roles and ownership

docs/project/FINAL_READINESS_REPORT.md CHANGED

@@ -45,12 +45,12 @@ Completed deliverables:
 - Network traces show successful API polling and interactions (`200` status on judge-critical endpoints).
 - No blocking console errors observed.
-## BRD compliance status
 - OpenEnv workflow compliance: **pass**
 - Colab training script path (TRL/Unsloth): **present and documented**
 - Public artifact path (blog/video): **script and references prepared**
-- Criterion evidence mapping and traceability: **locked in compliance and evidence docs**
 ## Residual risks

 - Network traces show successful API polling and interactions (`200` status on judge-critical endpoints).
 - No blocking console errors observed.
+## Hackathon compliance status
 - OpenEnv workflow compliance: **pass**
 - Colab training script path (TRL/Unsloth): **present and documented**
 - Public artifact path (blog/video): **script and references prepared**
+- Rubric evidence mapping and traceability: **locked in compliance and evidence docs**
 ## Residual risks

docs/project/IMPLEMENTATION_SUMMARY.md CHANGED

@@ -76,7 +76,7 @@ NEXUS Enhanced is a multi-agent incident response RL environment for the Meta Py
 **7 Cells**:
 1. Install: unsloth, trl, transformers, matplotlib
 2. Connectivity check: Verify HF Space reachable
-3. `NexusRemoteEnv`: Reset/step interface to PUBLIC `https://kunalkachru23-nexus-enhanced.hf.space`
 4. `reward_fn`: Parse IC action → call remote env → collect reward
 5. Load Qwen2.5-1.5B: Unsloth QLoRA (rank=16, 4-bit, targets q_proj/k_proj/v_proj/o_proj)
 6. GRPOTrainer: learning_rate=5e-5, batch_size=2, num_generations=4
@@ -104,10 +104,10 @@ NEXUS Enhanced is a multi-agent incident response RL environment for the Meta Py
 ### Phase 6: HF Spaces Deployment (Ready) 🚀
 **Steps**:
-1. Push code to https://huggingface.co/spaces/kunalkachru23/nexus-enhanced
 2. HF Spaces auto-builds Docker image
 3. Services available at:
-   - Judge dashboard: `https://kunalkachru23-nexus-enhanced.hf.space/` (port 7860)
    - Metrics: `/metrics`, `/learning-curve`, `/health`
    - API: `/reset`, `/step/{session_id}`
@@ -128,7 +128,7 @@ NEXUS Enhanced is a multi-agent incident response RL environment for the Meta Py
 6. ✅ HTML dashboard (`GET /`)
 7. ✅ Full episode execution (20 steps)
-**Run**: `python test_hf_space_deployment.py --url https://kunalkachru23-nexus-enhanced.hf.space`
 ---
@@ -239,7 +239,7 @@ git commit -m "Phase 5-7: Docker multi-service setup + deployment tests"
 # Push to HF Spaces repo
 git push origin main
-# Monitor build: https://huggingface.co/spaces/kunalkachru23/nexus-enhanced
 # Takes ~5-10 minutes for Docker build
 ```
@@ -249,7 +249,7 @@ Once HF Spaces shows "Running":
 ```bash
 # Test all endpoints
-python test_hf_space_deployment.py --url https://kunalkachru23-nexus-enhanced.hf.space
 # Expected: ✅ ALL TESTS PASS
 ```
@@ -260,9 +260,9 @@ Once Phase 7 tests pass:
 ```
 1. Open notebooks/grpo_colab_v2.ipynb
-2. Verify BASE_URL = "https://kunalkachru23-nexus-enhanced.hf.space"
 3. Run all cells (Unsloth + TRL GRPO training)
-4. Monitor reward curves at: https://kunalkachru23-nexus-enhanced.hf.space/learning-curve
 5. Expected trajectory: baseline 0.28 → improve to 0.6-0.8 over 50-100 episodes
 ```
@@ -277,7 +277,7 @@ Once Phase 7 tests pass:
 | **Reward Progress** | 20% | Observable Chart.js curves + MTTR improvements | ✅ Dashboard ready |
 | **Pipeline** | 10% | GRPO on Colab GPU → HF Space API | ✅ Tests ready |
-**Hard Gates**:
 - ✅ OpenEnv v0.2.3 compatible
 - ✅ HuggingFace TRL GRPO training
 - ✅ Trained checkpoint (TODO: save during training)
@@ -300,19 +300,19 @@ bash test_local_deployment.sh
 ### HF Space Testing
 ```bash
 # Against deployed environment
-python test_hf_space_deployment.py --url https://kunalkachru23-nexus-enhanced.hf.space
 ```
 ### Manual Verification
 ```bash
 # FastAPI health
-curl https://kunalkachru23-nexus-enhanced.hf.space/health
 # Judge dashboard
-open https://kunalkachru23-nexus-enhanced.hf.space/
 # Metrics snapshot
-curl https://kunalkachru23-nexus-enhanced.hf.space/metrics
 ```
 ---
@@ -350,7 +350,7 @@ curl https://kunalkachru23-nexus-enhanced.hf.space/metrics
 ## Questions for User
-1. **HF Space URL**: Is `kunalkachru23-nexus-enhanced` the correct space slug?
 2. **Training time**: Target training duration on Colab GPU (default ~6 hours for 50 episodes)?
 3. **Checkpoint save**: Should checkpoint be saved to HF model hub or kept local?
 4. **Blog post**: Topic preference (technical deep-dive vs. storytelling narrative)?

 **7 Cells**:
 1. Install: unsloth, trl, transformers, matplotlib
 2. Connectivity check: Verify HF Space reachable
+3. `NexusRemoteEnv`: Reset/step interface to PUBLIC `https://kunalkachru23-nexus-enhanced-stage.hf.space`
 4. `reward_fn`: Parse IC action → call remote env → collect reward
 5. Load Qwen2.5-1.5B: Unsloth QLoRA (rank=16, 4-bit, targets q_proj/k_proj/v_proj/o_proj)
 6. GRPOTrainer: learning_rate=5e-5, batch_size=2, num_generations=4
 ### Phase 6: HF Spaces Deployment (Ready) 🚀
 **Steps**:
+1. Push code to https://huggingface.co/spaces/kunalkachru23/nexus-enhanced-stage
 2. HF Spaces auto-builds Docker image
 3. Services available at:
+   - Judge dashboard: `https://kunalkachru23-nexus-enhanced-stage.hf.space/` (port 7860)
    - Metrics: `/metrics`, `/learning-curve`, `/health`
    - API: `/reset`, `/step/{session_id}`
 6. ✅ HTML dashboard (`GET /`)
 7. ✅ Full episode execution (20 steps)
+**Run**: `python test_hf_space_deployment.py --url https://kunalkachru23-nexus-enhanced-stage.hf.space`
 ---
 # Push to HF Spaces repo
 git push origin main
+# Monitor build: https://huggingface.co/spaces/kunalkachru23/nexus-enhanced-stage
 # Takes ~5-10 minutes for Docker build
 ```
 ```bash
 # Test all endpoints
+python test_hf_space_deployment.py --url https://kunalkachru23-nexus-enhanced-stage.hf.space
 # Expected: ✅ ALL TESTS PASS
 ```
 ```
 1. Open notebooks/grpo_colab_v2.ipynb
+2. Verify BASE_URL = "https://kunalkachru23-nexus-enhanced-stage.hf.space"
 3. Run all cells (Unsloth + TRL GRPO training)
+4. Monitor reward curves at: https://kunalkachru23-nexus-enhanced-stage.hf.space/learning-curve
 5. Expected trajectory: baseline 0.28 → improve to 0.6-0.8 over 50-100 episodes
 ```
 | **Reward Progress** | 20% | Observable Chart.js curves + MTTR improvements | ✅ Dashboard ready |
 | **Pipeline** | 10% | GRPO on Colab GPU → HF Space API | ✅ Tests ready |
+**Mandatory submission requirements**:
 - ✅ OpenEnv v0.2.3 compatible
 - ✅ HuggingFace TRL GRPO training
 - ✅ Trained checkpoint (TODO: save during training)
 ### HF Space Testing
 ```bash
 # Against deployed environment
+python test_hf_space_deployment.py --url https://kunalkachru23-nexus-enhanced-stage.hf.space
 ```
 ### Manual Verification
 ```bash
 # FastAPI health
+curl https://kunalkachru23-nexus-enhanced-stage.hf.space/health
 # Judge dashboard
+open https://kunalkachru23-nexus-enhanced-stage.hf.space/
 # Metrics snapshot
+curl https://kunalkachru23-nexus-enhanced-stage.hf.space/metrics
 ```
 ---
 ## Questions for User
+1. **HF Space URL**: Canonical judge/demo Space is `kunalkachru23/nexus-enhanced-stage` (`kunalkachru23-nexus-enhanced-stage.hf.space`).
 2. **Training time**: Target training duration on Colab GPU (default ~6 hours for 50 episodes)?
 3. **Checkpoint save**: Should checkpoint be saved to HF model hub or kept local?
 4. **Blog post**: Topic preference (technical deep-dive vs. storytelling narrative)?
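The summary names a `NexusRemoteEnv` reset/step interface over `/reset` and `/step/{session_id}` but does not show it. The sketch below is one way such a client could look, under assumptions that should be checked against the live `/schema`: both routes accept `POST` with a JSON body, and the reset reply carries a `session_id` field. The class and helper names are hypothetical and are not the notebook's code.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional

JsonDict = Dict[str, Any]


def post_json(url: str, payload: JsonDict) -> JsonDict:
    """POST a JSON body and decode the JSON reply (standard library only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


class RemoteEnvSketch:
    """Thin reset/step wrapper around a remote environment server."""

    def __init__(self, base_url: str, post: Callable[[str, JsonDict], JsonDict] = post_json):
        self.base_url = base_url.rstrip("/")
        self._post = post
        self.session_id: Optional[str] = None

    def reset(self, payload: Optional[JsonDict] = None) -> JsonDict:
        reply = self._post(f"{self.base_url}/reset", payload or {})
        self.session_id = reply["session_id"]  # assumed field name
        return reply

    def step(self, action: JsonDict) -> JsonDict:
        if self.session_id is None:
            raise RuntimeError("call reset() before step()")
        return self._post(f"{self.base_url}/step/{self.session_id}", action)
```

Injecting the `post` callable keeps the wrapper testable without network access, which helps when the Space is cold-starting.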

docs/project/JUDGING_EVIDENCE_INDEX.md CHANGED

@@ -3,7 +3,7 @@
 Snapshot timestamp (UTC): `2026-04-24T16:48:26Z`
 Stage URL: `https://kunalkachru23-nexus-enhanced-stage.hf.space`
-## Hard-gate evidence (BRD Section 17)
 1. OpenEnv latest-release workflow in use
    - Local package validate: `openenv validate .`
@@ -20,9 +20,9 @@ Stage URL: `https://kunalkachru23-nexus-enhanced-stage.hf.space`
    - Owner action: publish final link and add URL to submission package.
 4. Compliance lock matrix
-   - BRD criterion and hard-gate mapping: `docs/project/COMPLIANCE_LOCK_MATRIX.md`
-## Live metrics snapshot (Criterion 3 evidence)
 Source endpoints:
 - `GET /metrics`
@@ -78,7 +78,7 @@ Canonical demo-day snapshot set (stage URL only):
 - 2-minute live walkthrough: `docs/pitch/DEMO_WALKTHROUGH.md`
 - <2 minute recording script: `docs/pitch/YOUTUBE_RECORDING_SCRIPT.md`
 - Manual demo test cases: `docs/pitch/DEMO_MANUAL_TEST_CASES.md`
-- Criterion-4 behavior proof sheet: `docs/project/BEHAVIORAL_DELTA_PROOF.md`
 - Sub-theme matrix: `docs/project/SUBTHEME_EVIDENCE_MATRIX.md`
 - Reward-hacking defense: `docs/project/REWARD_HACKING_DEFENSE.md`
 - Training audit ledger: `docs/project/TRAINING_AUDIT_LOG.md`

 Snapshot timestamp (UTC): `2026-04-24T16:48:26Z`
 Stage URL: `https://kunalkachru23-nexus-enhanced-stage.hf.space`
+## Mandatory compliance evidence (OpenEnv + submission artifacts)
 1. OpenEnv latest-release workflow in use
    - Local package validate: `openenv validate .`
    - Owner action: publish final link and add URL to submission package.
 4. Compliance lock matrix
+   - Rubric and mandatory-requirement mapping: `docs/project/COMPLIANCE_LOCK_MATRIX.md`
+## Live metrics snapshot (observable improvement evidence)
 Source endpoints:
 - `GET /metrics`
 - 2-minute live walkthrough: `docs/pitch/DEMO_WALKTHROUGH.md`
 - <2 minute recording script: `docs/pitch/YOUTUBE_RECORDING_SCRIPT.md`
 - Manual demo test cases: `docs/pitch/DEMO_MANUAL_TEST_CASES.md`
+- Behavior delta proof sheet: `docs/project/BEHAVIORAL_DELTA_PROOF.md`
 - Sub-theme matrix: `docs/project/SUBTHEME_EVIDENCE_MATRIX.md`
 - Reward-hacking defense: `docs/project/REWARD_HACKING_DEFENSE.md`
 - Training audit ledger: `docs/project/TRAINING_AUDIT_LOG.md`

docs/project/PLAN_OF_ACTION.md CHANGED

@@ -1,23 +1,23 @@
-# Plan of action + BRD compliance matrix
-**Authority:** [`../../../design/hackathon_brd.md`](../../../design/hackathon_brd.md) (Section 17 hard gates, Section 18 rubric, §18.1 pitch format).
 ---
-## BRD compliance — strict review (evidence-based)
 | Ref | Requirement | Repo / runtime evidence | Status |
 |-----|----------------|-------------------------|--------|
-| **§17.1** | OpenEnv **(latest release)** — not fork, not old | `openenv validate .` OK; `openenv validate --url` OK after contract routes; `openenv push` workflow; Colab `pip install openenv>=0.2.3`; README “BRD hard gate — OpenEnv” | **Pass** — document `openenv --version` on submission day |
-| **§17.2** | Minimal training in **Colab** with **Unsloth** or **HF TRL** | `notebooks/grpo_colab_v2.ipynb` installs TRL + Unsloth + trains GRPO | **Pass pending** — you must execute notebook once on GPU before final submit |
-| **§17.3** | **HF blog** *or* **YouTube video &lt; 2 min** | `docs/blog/blog_post.md` draft exists; **publish** + URL in README/submission | **Gap** — publishing is owner action |
-| **§18.1** | **3 min** pitch + **2 min** Q&A | `docs/pitch/PITCH.md` script + Q&A table timed to format | **Pass** (content) — rehearsal is owner action |
-| **§18.2 C1** | Environment innovation **40%** | Multi-agent, partial observability, 7 incidents, INC007 schema drift, coalition | **Strong** — rehearse one INC007 sentence |
-| **§18.2 C2** | Storytelling **30%** | Dashboard + demo flow in `docs/pitch/DEMO_MANUAL_TEST_CASES.md` + `docs/pitch/PITCH.md` | **Pass** — practice run |
-| **§18.2 C3** | Observable reward improvement **20%** | `/learning-curve`, dashboard, `scripts/export_reward_plot.py` | **Pass** — keep Space populated for live curve |
-| **§18.2 C4** | Reward + pipeline coherence **10%** | Sparse reward, dimensions in README; trained **behaviour** narrative | **Medium** — tie checkpoint to different IC actions, not only reward |
-**Non-BRD but operational:** `pytest tests/` green; `test_hf_space_deployment.py` 8/8 on stage URL; `./gate.sh` or `scripts/shell/gate.sh` optional full run before deploy.
 ---
@@ -30,7 +30,7 @@
 | 3 | Execute **`grpo_colab_v2.ipynb`** on Colab T4+ end-to-end | Team | Notebook completes; curve updates on stage |
 | 4 | **`python scripts/export_reward_plot.py --url https://kunalkachru23-nexus-enhanced-stage.hf.space`** → drop PNG into deck | Team | `docs/images/training_reward_curve.png` in slide asset folder |
 | 5 | Rehearse **`docs/pitch/PITCH.md`** with live demo (timer **3:00**) | Team | No overrun; Q&A 2:00 bank ready |
-| 6 | **Publish** HF blog *or* record **≤2 min** YouTube; add URL to README | Team | BRD §17.3 satisfied |
 | 7 | Submission package: Space URL, Colab link, blog/video link, `openenv --version` screenshot | Team | Checklist complete |
 | 8 | (Optional) INC007 **60 s** clip for innovation Q&A | Team | Recorded path in repo or drive link |

+# Plan of action + hackathon compliance matrix
+**Scope:** This matrix tracks evidence against **hackathon compliance criteria** (OpenEnv toolchain, Colab training with TRL/Unsloth, published blog or short video, pitch format, and judging rubric dimensions). Align deliverables with the official organizer requirements for your submission wave.
 ---
+## Compliance — strict review (evidence-based)
 | Ref | Requirement | Repo / runtime evidence | Status |
 |-----|----------------|-------------------------|--------|
+| **C1** | OpenEnv **(latest release)** — not fork, not old | `openenv validate .` OK; `openenv validate --url` OK after contract routes; `openenv push` workflow; Colab `pip install "openenv>=0.2.3"`; README **OpenEnv (reproduce)** section | **Pass** — document `openenv --version` on submission day |
+| **C2** | Minimal training in **Colab** with **Unsloth** or **HF TRL** | `notebooks/grpo_colab_v2.ipynb` installs TRL + Unsloth + trains GRPO | **Pass pending** — you must execute notebook once on GPU before final submit |
+| **C3** | **HF blog** *or* **YouTube video &lt; 2 min** | `docs/blog/blog_post_hf.md` draft exists; **publish** + URL in README/submission | **Gap** — publishing is owner action |
+| **C4** | **3 min** pitch + **2 min** Q&A | `docs/pitch/PITCH.md` script + Q&A table timed to format | **Pass** (content) — rehearsal is owner action |
+| **R1** | Environment innovation **40%** | Multi-agent, partial observability, 7 incidents, INC007 schema drift, coalition | **Strong** — rehearse one INC007 sentence |
+| **R2** | Storytelling **30%** | Dashboard + demo flow in `docs/pitch/DEMO_MANUAL_TEST_CASES.md` + `docs/pitch/PITCH.md` | **Pass** — practice run |
+| **R3** | Observable reward improvement **20%** | `/learning-curve`, dashboard, `scripts/export_reward_plot.py` | **Pass** — keep Space populated for live curve |
+| **R4** | Reward + pipeline coherence **10%** | Sparse reward, dimensions in README; trained **behaviour** narrative | **Medium** — tie checkpoint to different IC actions, not only reward |
+**Additional quality gates:** `pytest tests/` green; `test_hf_space_deployment.py` 8/8 on stage URL; `./gate.sh` or `scripts/shell/gate.sh` optional full run before deploy.
 ---
 | 3 | Execute **`grpo_colab_v2.ipynb`** on Colab T4+ end-to-end | Team | Notebook completes; curve updates on stage |
 | 4 | **`python scripts/export_reward_plot.py --url https://kunalkachru23-nexus-enhanced-stage.hf.space`** → drop PNG into deck | Team | `docs/images/training_reward_curve.png` in slide asset folder |
 | 5 | Rehearse **`docs/pitch/PITCH.md`** with live demo (timer **3:00**) | Team | No overrun; Q&A 2:00 bank ready |
+| 6 | **Publish** HF blog *or* record **≤2 min** YouTube; add URL to README | Team | Hackathon submission artifact (blog or video) satisfied |
 | 7 | Submission package: Space URL, Colab link, blog/video link, `openenv --version` screenshot | Team | Checklist complete |
 | 8 | (Optional) INC007 **60 s** clip for innovation Q&A | Team | Recorded path in repo or drive link |
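The rubric weights in rows R1 to R4 (40/30/20/10) can be turned into a quick self-assessment. The sketch below assumes a plain weighted average of per-criterion subscores in the range 0 to 1; the organizers' actual aggregation method is not stated in this matrix, and the key names are hypothetical.

```python
from typing import Dict

# Weights in percent, as listed in rows R1-R4 of the compliance table.
RUBRIC_WEIGHTS = {
    "environment_innovation": 40,
    "storytelling": 30,
    "observable_reward_improvement": 20,
    "reward_pipeline_coherence": 10,
}


def weighted_rubric_score(subscores: Dict[str, float]) -> float:
    """Combine per-criterion subscores in [0, 1] into one weighted score in [0, 1]."""
    missing = set(RUBRIC_WEIGHTS) - set(subscores)
    if missing:
        raise ValueError(f"missing subscores: {sorted(missing)}")
    total = sum(RUBRIC_WEIGHTS[name] * subscores[name] for name in RUBRIC_WEIGHTS)
    return total / sum(RUBRIC_WEIGHTS.values())
```

Under this model the R4 row, marked **Medium**, can move the total by at most 10 points, while R1 and R2 together account for 70.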

docs/project/PROJECT_STATUS.md CHANGED

@@ -1,7 +1,8 @@
 # NEXUS Enhanced — project status & backlog
-**Source of truth for judging:** [`../../../design/hackathon_brd.md`](../../../design/hackathon_brd.md) (Section 17 hard gates + Section 18 rubric).
-**Last reviewed:** [`../pitch/PITCH.md`](../pitch/PITCH.md), [`PLAN_OF_ACTION.md`](PLAN_OF_ACTION.md), [`../../scripts/export_reward_plot.py`](../../scripts/export_reward_plot.py), BRD compliance matrix.
 **See also:** [`../pitch/DEMO_MANUAL_TEST_CASES.md`](../pitch/DEMO_MANUAL_TEST_CASES.md).
@@ -22,24 +23,24 @@
 ---
-## BRD hard gates (Section 17) — checklist
 | # | Requirement | Status |
 |---|-------------|--------|
-| 1 | OpenEnv **latest release** | **Evidence in repo:** README “BRD hard gate — OpenEnv” commands; `openenv validate .` + `openenv validate --url` green after stubs; Colab still `pip install openenv>=0.2.3`. Record `openenv --version` in your pitch appendix. |
 | 2 | Minimal **Colab** training script (**Unsloth** or **HF TRL**) | **Notebook aligned:** `grpo_colab_v2.ipynb` now defaults `BASE_URL` to **stage** (`kunalkachru23-nexus-enhanced-stage.hf.space`). You still need one successful T4+ run before submission. |
 | 3 | **Blog (HF)** or **Video (YouTube, &lt;2 min)** | **You own:** publish + link in submission. |
 ---
-## Judging rubric (Section 18) — quick gap scan
 | Criterion | Weight | Focus next |
 |-----------|--------|------------|
 | Environment innovation | 40% | One sharp “why NEXUS is hard” story (partial observability, schema drift, coalitions) backed by INC007 / live UI. |
 | Storytelling | 30% | 3-minute pitch script rehearsed; demo path: metrics → auto-demo → guided manual complete. |
 | Observable reward improvement | 20% | Keep dashboard + `/learning-curve` honest; optional: export static plot artifact for slides. |
-| Reward / pipeline coherence | 10% | Tie reward dimensions to BRD wording; show before/after behaviour if you have checkpoints. |
 ---
@@ -47,7 +48,7 @@
 1. ~~**Submission hygiene:** OpenEnv reproduce block in README + `outputs/` for clean `openenv validate .`~~ (done this iteration).
 2. **Colab:** Run `grpo_colab_v2.ipynb` once on T4+; capture reward curve screenshot for slides.
-3. **Judge artifacts (BRD gate):** Publish HF **blog** or **YouTube <2 min** and add the link next to README “Blog Post” section.
 4. **Pitch (30% storytelling):** 3-minute path: Training tab metrics → **Run Auto-Demo** → Manual **Guided: fill + execute** to complete (see [`../pitch/DEMO_MANUAL_TEST_CASES.md`](../pitch/DEMO_MANUAL_TEST_CASES.md)).
 5. **Optional hardening:** Richer `/schema` from Pydantic models (cosmetic).

 # NEXUS Enhanced — project status & backlog
+**Compliance reference:** Use **hackathon compliance criteria** (OpenEnv latest in toolchain, Colab training with TRL/Unsloth, published blog or ≤2 min video, pitch format, judging rubric). This file tracks status against those expectations; compliance wording lives in-repo only (no linked external design doc).
+**Last reviewed:** [`../pitch/PITCH.md`](../pitch/PITCH.md), [`PLAN_OF_ACTION.md`](PLAN_OF_ACTION.md), [`../../scripts/export_reward_plot.py`](../../scripts/export_reward_plot.py), [`COMPLIANCE_LOCK_MATRIX.md`](COMPLIANCE_LOCK_MATRIX.md).
 **See also:** [`../pitch/DEMO_MANUAL_TEST_CASES.md`](../pitch/DEMO_MANUAL_TEST_CASES.md).
 ---
+## Mandatory submission checklist (OpenEnv + artifacts)
 | # | Requirement | Status |
 |---|-------------|--------|
+| 1 | OpenEnv **latest release** | **Evidence in repo:** README **OpenEnv (reproduce)** commands; `openenv validate .` + `openenv validate --url` green after stubs; Colab still `pip install openenv>=0.2.3`. Record `openenv --version` in your pitch appendix. |
 | 2 | Minimal **Colab** training script (**Unsloth** or **HF TRL**) | **Notebook aligned:** `grpo_colab_v2.ipynb` now defaults `BASE_URL` to **stage** (`kunalkachru23-nexus-enhanced-stage.hf.space`). You still need one successful T4+ run before submission. |
 | 3 | **Blog (HF)** or **Video (YouTube, <2 min)** | **You own:** publish + link in submission. |
 ---
+## Judging rubric — quick gap scan
 | Criterion | Weight | Focus next |
 |-----------|--------|------------|
 | Environment innovation | 40% | One sharp “why NEXUS is hard” story (partial observability, schema drift, coalitions) backed by INC007 / live UI. |
 | Storytelling | 30% | 3-minute pitch script rehearsed; demo path: metrics → auto-demo → guided manual complete. |
 | Observable reward improvement | 20% | Keep dashboard + `/learning-curve` honest; optional: export static plot artifact for slides. |
+| Reward / pipeline coherence | 10% | Tie reward dimensions to the published reward model; show before/after behaviour if you have checkpoints. |
 ---
 1. ~~**Submission hygiene:** OpenEnv reproduce block in README + `outputs/` for clean `openenv validate .`~~ (done this iteration).
 2. **Colab:** Run `grpo_colab_v2.ipynb` once on T4+; capture reward curve screenshot for slides.
+3. **Submission artifacts:** Publish HF **blog** or **YouTube <2 min** and add the link next to README “Blog Post” section.
 4. **Pitch (30% storytelling):** 3-minute path: Training tab metrics → **Run Auto-Demo** → Manual **Guided: fill + execute** to complete (see [`../pitch/DEMO_MANUAL_TEST_CASES.md`](../pitch/DEMO_MANUAL_TEST_CASES.md)).
 5. **Optional hardening:** Richer `/schema` from Pydantic models (cosmetic).
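
Before capturing the reward-curve screenshot from item 2, it may help to check which runs feed the curve, because `episode_rewards.json` in this commit mixes `default` records with `pytest_full_episode_metrics` records. The sketch below is not the repo's own tooling: `mean_reward_by_run` is a hypothetical helper, and the three sample records are copied from `episode_rewards.json` only to show the record shape.

```python
from collections import defaultdict

# Three records copied from episode_rewards.json in this commit (shape only).
RECORDS = [
    {"session_id": "legacy-1", "run_id": "default", "incident_id": "unknown",
     "difficulty": "unknown", "reward": 0.3047, "timestamp": 0.0},
    {"session_id": "a003e153-9026-406c-879d-25172aa11eda", "run_id": "default",
     "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378,
     "timestamp": 1776965109.7711759},
    {"session_id": "bf34a807-a40d-4ae5-b8b0-4d8333a62c81",
     "run_id": "pytest_full_episode_metrics", "incident_id": "INC008",
     "difficulty": "easy", "reward": 0.4252, "timestamp": 1776965109.825952},
]


def mean_reward_by_run(records):
    """Hypothetical helper: average the `reward` field per `run_id`."""
    rewards_by_run = defaultdict(list)
    for record in records:
        rewards_by_run[record["run_id"]].append(record["reward"])
    # Every group holds at least one reward, so the division is safe.
    return {run: sum(vals) / len(vals) for run, vals in rewards_by_run.items()}


means = mean_reward_by_run(RECORDS)
for run_id, mean in sorted(means.items()):
    print(f"{run_id}: mean reward {mean:.4f}")
```

To run it against the real file, replace `RECORDS` with `json.load(open("episode_rewards.json"))`. Whether test-run records should be excluded from the slide plot is a judgment call for the team.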

docs/project/SUBTHEME_EVIDENCE_MATRIX.md CHANGED

@@ -1,14 +1,14 @@
 # Sub-Theme Evidence Matrix (Judge-Ready)
-This matrix maps implemented mechanics to BRD wording and where judges can verify each claim.
-**Parent BRD themes (§9–14)** and the **four §18 scoring criteria** are mapped end-to-end in `docs/project/COMPLIANCE_LOCK_MATRIX.md` (theme demonstration + demo beats). This file focuses on **§15 sponsor sub-themes** and cross-links to the same implementation paths.
 ## Targeted sub-themes
-| Sponsor / Sub-theme | BRD wording to satisfy | Implemented evidence | Where to verify |
 |---|---|---|---|
-| **Theme 3.2 — Personalized (BRD §12)** | Personal tasks, delegation, conflicting priorities | **INC008** — executive EA calendar conflict (family vs board), smart-scheduler auto-accept root cause; `IncidentType.PERSONAL_ASSISTANT` | Dashboard manual validation select **INC008**; `server/incidents.py` |
 | Fleet AI — Scalable Oversight | "monitor, analyze, and explain" | Oversight-oriented behavior + oversight reward component in final score model | `server/reward.py`, `server/agents.py`, live run transcript from `/demo/run/INC003` |
 | Halluminate — Multi-Actor Environments | "interacts with and manages multiple actors ... to discover and achieve task" | IC orchestrates L1/L2/SRE/PM actions with partial observability; coalition mechanics present | `server/environment.py`, `server/agents.py`, `server/incidents.py` (INC003+), dashboard manual flow |
 | Snorkel AI — Simulated Experts | "changing requirements/preferences" | Rotating expert criteria and adaptive scoring emphasis over episodes | `server/reward.py`, project docs (`README.md`, `docs/project/PLAN_OF_ACTION.md`) |
@@ -17,9 +17,9 @@ This matrix maps implemented mechanics to BRD wording and where judges can verif
 | Scale AI — Non-code business (HR & IT) | Long-horizon **non-code** workflows in Sales / PM / **HR & IT** only | **IT / on-call incident command** (status pages, escalations, runbooks, customer comms)—no code-writing task as the core object | Multi-step dashboard validation, SLA/revenue semantics in `server/incidents.py`, L1 customer paths |
 | Scaler AI Labs — Multi-App Enterprise RL | "business rule nuances" in enterprise multi-app world | Datadog/Jira/Runbook/Customer interactions with operational constraints and role-specific visibility | `server/tools.py`, `server/incidents.py`, dashboard and auto-demo flow |
-## Cross-criterion reinforcement
-- Criterion 1 (Innovation 40%): multi-agent + partial observability + schema drift + business-rule constraints.
-- Criterion 2 (Storytelling 30%): deterministic live flow in `docs/pitch/PITCH.md` and `docs/pitch/DEMO_WALKTHROUGH.md`.
-- Criterion 3 (Improvement 20%): `/learning-curve`, `/metrics`, `docs/images/training_reward_curve.png`.
-- Criterion 4 (Pipeline 10%): Colab GRPO script + behavior delta sheet (`docs/project/BEHAVIORAL_DELTA_PROOF.md`).

 # Sub-Theme Evidence Matrix (Judge-Ready)
+This matrix maps implemented mechanics to **organizer theme wording** and where judges can verify each claim.
+**Parent hackathon themes** and the **four judging rubric rows** are mapped end-to-end in `docs/project/COMPLIANCE_LOCK_MATRIX.md` (theme demonstration + demo beats). This file focuses on **sponsor sub-themes** and cross-links to the same implementation paths.
 ## Targeted sub-themes
+| Sponsor / Sub-theme | Theme wording to satisfy | Implemented evidence | Where to verify |
 |---|---|---|---|
+| **Theme 3.2 — Personalized** | Personal tasks, delegation, conflicting priorities | **INC008** — executive EA calendar conflict (family vs board), smart-scheduler auto-accept root cause; `IncidentType.PERSONAL_ASSISTANT` | Dashboard manual validation select **INC008**; `server/incidents.py` |
 | Fleet AI — Scalable Oversight | "monitor, analyze, and explain" | Oversight-oriented behavior + oversight reward component in final score model | `server/reward.py`, `server/agents.py`, live run transcript from `/demo/run/INC003` |
 | Halluminate — Multi-Actor Environments | "interacts with and manages multiple actors ... to discover and achieve task" | IC orchestrates L1/L2/SRE/PM actions with partial observability; coalition mechanics present | `server/environment.py`, `server/agents.py`, `server/incidents.py` (INC003+), dashboard manual flow |
 | Snorkel AI — Simulated Experts | "changing requirements/preferences" | Rotating expert criteria and adaptive scoring emphasis over episodes | `server/reward.py`, project docs (`README.md`, `docs/project/PLAN_OF_ACTION.md`) |
 | Scale AI — Non-code business (HR & IT) | Long-horizon **non-code** workflows in Sales / PM / **HR & IT** only | **IT / on-call incident command** (status pages, escalations, runbooks, customer comms)—no code-writing task as the core object | Multi-step dashboard validation, SLA/revenue semantics in `server/incidents.py`, L1 customer paths |
 | Scaler AI Labs — Multi-App Enterprise RL | "business rule nuances" in enterprise multi-app world | Datadog/Jira/Runbook/Customer interactions with operational constraints and role-specific visibility | `server/tools.py`, `server/incidents.py`, dashboard and auto-demo flow |
+## Cross-rubric reinforcement
+- Innovation (40%): multi-agent + partial observability + schema drift + business-rule constraints.
+- Storytelling (30%): deterministic live flow in `docs/pitch/PITCH.md` and `docs/pitch/DEMO_WALKTHROUGH.md`.
+- Observable improvement (20%): `/learning-curve`, `/metrics`, `docs/images/training_reward_curve.png`.
+- Pipeline coherence (10%): Colab GRPO script + behavior delta sheet (`docs/project/BEHAVIORAL_DELTA_PROOF.md`).
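
For the observable-improvement row, a trailing moving average over per-episode rewards is one simple way to make a trend readable. The sketch below is an illustration only, not the implementation behind `/learning-curve` or `scripts/export_reward_plot.py`: `moving_average` is a hypothetical helper, the window size is an arbitrary assumption, and the four rewards are the first four `legacy-*` values from `episode_rewards.json`.

```python
# First four rewards from episode_rewards.json (legacy-1 .. legacy-4).
REWARDS = [0.3047, 0.2568, 0.3226, 0.3956]


def moving_average(rewards, window):
    """Hypothetical helper: trailing mean over at most `window` recent rewards."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, reward in enumerate(rewards):
        running_sum += reward
        if i >= window:
            # Drop the reward that just slid out of the window.
            running_sum -= rewards[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


smoothed = moving_average(REWARDS, window=2)
for episode, value in enumerate(smoothed, start=1):
    print(f"episode {episode}: smoothed reward {value:.4f}")
```

A larger window smooths more but hides short plateaus, such as the long run of identical `0.3378` rewards in the stored data.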

docs/project/TEST_RESULTS_SUMMARY.md CHANGED

@@ -474,7 +474,7 @@ Episodes will complete naturally without needing the "End Episode" button workar
 | **API Tests** | ✅ PASS | Episode completion with reward |
 | **UI Playwright** | ✅ PASS | 8-iteration state management |
 | **Manual Testing** | ⚠️ PARTIAL | Phase progression works, reward needs workaround |
-| **Environment Audit** | ✅ PASS | All BRD requirements met |
 ---

 | **API Tests** | ✅ PASS | Episode completion with reward |
 | **UI Playwright** | ✅ PASS | 8-iteration state management |
 | **Manual Testing** | ⚠️ PARTIAL | Phase progression works, reward needs workaround |
+| **Environment Audit** | ✅ PASS | All hackathon compliance requirements met |
 ---

episode_rewards.json CHANGED

@@ -1 +1 @@

- [{"session_id": "legacy-1", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3047, "timestamp": 0.0}, {"session_id": "legacy-2", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2568, "timestamp": 0.0}, {"session_id": "legacy-3", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3226, "timestamp": 0.0}, {"session_id": "legacy-4", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3956, "timestamp": 0.0}, {"session_id": "legacy-5", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2579, "timestamp": 0.0}, {"session_id": "legacy-6", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2608, "timestamp": 0.0}, {"session_id": "legacy-7", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4088, "timestamp": 0.0}, {"session_id": "legacy-8", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3468, "timestamp": 0.0}, {"session_id": "legacy-9", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2507, "timestamp": 0.0}, {"session_id": "legacy-10", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3346, "timestamp": 0.0}, {"session_id": "legacy-11", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.257, "timestamp": 0.0}, {"session_id": "legacy-12", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2597, "timestamp": 0.0}, {"session_id": "legacy-13", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3193, "timestamp": 0.0}, {"session_id": "legacy-14", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.1497, "timestamp": 0.0}, {"session_id": "legacy-15", "run_id": "default", "incident_id": "unknown", "difficulty": 
"unknown", "reward": 0.1677, "timestamp": 0.0}, {"session_id": "legacy-16", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2636, "timestamp": 0.0}, {"session_id": "legacy-17", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2305, "timestamp": 0.0}, {"session_id": "legacy-18", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3396, "timestamp": 0.0}, {"session_id": "legacy-19", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2447, "timestamp": 0.0}, {"session_id": "legacy-20", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2073, "timestamp": 0.0}, {"session_id": "legacy-21", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4404, "timestamp": 0.0}, {"session_id": "legacy-22", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.308, "timestamp": 0.0}, {"session_id": "legacy-23", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3344, "timestamp": 0.0}, {"session_id": "legacy-24", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2179, "timestamp": 0.0}, {"session_id": "legacy-25", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2912, "timestamp": 0.0}, {"session_id": "legacy-26", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3466, "timestamp": 0.0}, {"session_id": "legacy-27", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2485, "timestamp": 0.0}, {"session_id": "legacy-28", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3736, "timestamp": 0.0}, {"session_id": "legacy-29", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2984, "timestamp": 0.0}, {"session_id": "legacy-30", 
"run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.326, "timestamp": 0.0}, {"session_id": "legacy-31", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3041, "timestamp": 0.0}, {"session_id": "legacy-32", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5033, "timestamp": 0.0}, {"session_id": "legacy-33", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.357, "timestamp": 0.0}, {"session_id": "legacy-34", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2764, "timestamp": 0.0}, {"session_id": "legacy-35", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4297, "timestamp": 0.0}, {"session_id": "legacy-36", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2691, "timestamp": 0.0}, {"session_id": "legacy-37", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3864, "timestamp": 0.0}, {"session_id": "legacy-38", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2158, "timestamp": 0.0}, {"session_id": "legacy-39", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2693, "timestamp": 0.0}, {"session_id": "legacy-40", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3942, "timestamp": 0.0}, {"session_id": "legacy-41", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4404, "timestamp": 0.0}, {"session_id": "legacy-42", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3979, "timestamp": 0.0}, {"session_id": "legacy-43", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3779, "timestamp": 0.0}, {"session_id": "legacy-44", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 
0.366, "timestamp": 0.0}, {"session_id": "legacy-45", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2747, "timestamp": 0.0}, {"session_id": "legacy-46", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3383, "timestamp": 0.0}, {"session_id": "legacy-47", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3619, "timestamp": 0.0}, {"session_id": "legacy-48", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4863, "timestamp": 0.0}, {"session_id": "legacy-49", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4321, "timestamp": 0.0}, {"session_id": "legacy-50", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2665, "timestamp": 0.0}, {"session_id": "legacy-51", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4363, "timestamp": 0.0}, {"session_id": "legacy-52", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3825, "timestamp": 0.0}, {"session_id": "legacy-53", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3621, "timestamp": 0.0}, {"session_id": "legacy-54", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4681, "timestamp": 0.0}, {"session_id": "legacy-55", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5045, "timestamp": 0.0}, {"session_id": "legacy-56", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4995, "timestamp": 0.0}, {"session_id": "legacy-57", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3607, "timestamp": 0.0}, {"session_id": "legacy-58", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.406, "timestamp": 0.0}, {"session_id": "legacy-59", "run_id": "default", 
"incident_id": "unknown", "difficulty": "unknown", "reward": 0.4602, "timestamp": 0.0}, {"session_id": "legacy-60", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5146, "timestamp": 0.0}, {"session_id": "legacy-61", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4012, "timestamp": 0.0}, {"session_id": "legacy-62", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4275, "timestamp": 0.0}, {"session_id": "legacy-63", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3568, "timestamp": 0.0}, {"session_id": "legacy-64", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3525, "timestamp": 0.0}, {"session_id": "legacy-65", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5161, "timestamp": 0.0}, {"session_id": "legacy-66", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5625, "timestamp": 0.0}, {"session_id": "legacy-67", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4512, "timestamp": 0.0}, {"session_id": "legacy-68", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5401, "timestamp": 0.0}, {"session_id": "legacy-69", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4917, "timestamp": 0.0}, {"session_id": "legacy-70", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4141, "timestamp": 0.0}, {"session_id": "legacy-71", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4975, "timestamp": 0.0}, {"session_id": "legacy-72", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5945, "timestamp": 0.0}, {"session_id": "legacy-73", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4715, "timestamp": 
0.0}, {"session_id": "legacy-74", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.6025, "timestamp": 0.0}, {"session_id": "legacy-75", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2706, "timestamp": 0.0}, {"session_id": "legacy-76", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5489, "timestamp": 0.0}, {"session_id": "legacy-77", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.493, "timestamp": 0.0}, {"session_id": "legacy-78", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.465, "timestamp": 0.0}, {"session_id": "legacy-79", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4992, "timestamp": 0.0}, {"session_id": "legacy-80", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3357, "timestamp": 0.0}, {"session_id": "legacy-81", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4801, "timestamp": 0.0}, {"session_id": "legacy-82", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5291, "timestamp": 0.0}, {"session_id": "legacy-83", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.6217, "timestamp": 0.0}, {"session_id": "legacy-84", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4649, "timestamp": 0.0}, {"session_id": "legacy-85", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4446, "timestamp": 0.0}, {"session_id": "legacy-86", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.9484, "timestamp": 0.0}, {"session_id": "legacy-87", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5883, "timestamp": 0.0}, {"session_id": "legacy-88", "run_id": "default", "incident_id": "unknown", 
"difficulty": "unknown", "reward": 0.5443, "timestamp": 0.0}, {"session_id": "legacy-89", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4785, "timestamp": 0.0}, {"session_id": "legacy-90", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5649, "timestamp": 0.0}, {"session_id": "legacy-91", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5345, "timestamp": 0.0}, {"session_id": "legacy-92", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.6071, "timestamp": 0.0}, {"session_id": "legacy-93", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4764, "timestamp": 0.0}, {"session_id": "legacy-94", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5092, "timestamp": 0.0}, {"session_id": "legacy-95", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.507, "timestamp": 0.0}, {"session_id": "legacy-96", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4242, "timestamp": 0.0}, {"session_id": "legacy-97", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5679, "timestamp": 0.0}, {"session_id": "legacy-98", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.568, "timestamp": 0.0}, {"session_id": "legacy-99", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5504, "timestamp": 0.0}, {"session_id": "legacy-100", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-101", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-102", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": 
"legacy-103", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-104", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.8889, "timestamp": 0.0}, {"session_id": "legacy-105", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.8889, "timestamp": 0.0}, {"session_id": "legacy-106", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.8889, "timestamp": 0.0}, {"session_id": "legacy-107", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-108", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-109", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-110", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-111", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-112", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-113", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-114", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-115", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-116", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-117", "run_id": "default", "incident_id": "unknown", 
"difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-118", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-119", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-120", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-121", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-122", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-123", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-124", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-125", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-126", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-127", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-128", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-129", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-130", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-131", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4252, "timestamp": 0.0}, 
{"session_id": "legacy-132", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4252, "timestamp": 0.0}, {"session_id": "legacy-133", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4252, "timestamp": 0.0}, {"session_id": "a003e153-9026-406c-879d-25172aa11eda", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1776965109.7711759}, {"session_id": "bf34a807-a40d-4ae5-b8b0-4d8333a62c81", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1776965109.825952}, {"session_id": "8876cfb6-f5e9-4ad0-941e-84bc7d8e2b96", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1776967485.476565}, {"session_id": "db31a9f2-ca48-4f93-8d1e-3fa50fa5d21a", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1776967485.519014}, {"session_id": "4aaf8a4d-0db7-47f5-9dca-8bc360b19088", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1776978808.275504}, {"session_id": "c13f72bb-5715-4209-9872-e85acecbc8b3", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1776978808.317802}, {"session_id": "4da60545-6cb0-4a65-acf2-9a8fd8cf7e59", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1776982752.592321}, {"session_id": "48a56b54-d4cd-47c9-b2e1-165dd330a6c9", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1776982752.632246}, {"session_id": "2eab58c0-ffe3-4a11-9512-3b606eb2a957", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1776983166.506444}, {"session_id": "ec0ae7f3-6095-4dce-b323-1cd1482c1ba4", "run_id": 
"pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1776983166.548518}, {"session_id": "b720c132-ddce-41fa-99d1-be7ebaa32de2", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1776984636.36427}, {"session_id": "edb8bdb1-5903-4710-b9f4-cd68c67f6474", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1776984636.4119911}, {"session_id": "2654393f-5c7e-4d17-938d-2570195a3c5f", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1776984647.97215}, {"session_id": "c0e98721-13b3-49e3-a3fd-735801f5f9f7", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1776984648.018783}, {"session_id": "6fe810db-c9ae-48e7-a205-6d2df902b555", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1777049568.085221}, {"session_id": "58c9af47-0d8b-4359-9889-a6a04552db83", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1777049568.1298869}, {"session_id": "552395e3-cae4-4423-bc3c-9314b8cc276d", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1777049580.030255}, {"session_id": "49a9bb9e-c9b1-465f-a617-6443da42d1be", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1777049580.0770292}, {"session_id": "fb4a1426-bf33-47f0-8357-29b19a75a19c", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1777051304.085602}, {"session_id": "2714abc5-60be-4bf4-981c-da1c2ba328b5", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1777051304.1293068}, {"session_id": 
"69407c50-1635-4bd8-b1b9-e1017cbb5297", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1777070951.992904}, {"session_id": "9fe3899c-455e-483b-b5ce-19240df9f0e6", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1777070952.039953}]

+ [{"session_id": "legacy-1", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3047, "timestamp": 0.0}, {"session_id": "legacy-2", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2568, "timestamp": 0.0}, {"session_id": "legacy-3", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3226, "timestamp": 0.0}, {"session_id": "legacy-4", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3956, "timestamp": 0.0}, {"session_id": "legacy-5", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2579, "timestamp": 0.0}, {"session_id": "legacy-6", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2608, "timestamp": 0.0}, {"session_id": "legacy-7", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4088, "timestamp": 0.0}, {"session_id": "legacy-8", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3468, "timestamp": 0.0}, {"session_id": "legacy-9", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2507, "timestamp": 0.0}, {"session_id": "legacy-10", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3346, "timestamp": 0.0}, {"session_id": "legacy-11", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.257, "timestamp": 0.0}, {"session_id": "legacy-12", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2597, "timestamp": 0.0}, {"session_id": "legacy-13", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3193, "timestamp": 0.0}, {"session_id": "legacy-14", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.1497, "timestamp": 0.0}, {"session_id": "legacy-15", "run_id": "default", "incident_id": "unknown", "difficulty": 
"unknown", "reward": 0.1677, "timestamp": 0.0}, {"session_id": "legacy-16", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2636, "timestamp": 0.0}, {"session_id": "legacy-17", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2305, "timestamp": 0.0}, {"session_id": "legacy-18", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3396, "timestamp": 0.0}, {"session_id": "legacy-19", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2447, "timestamp": 0.0}, {"session_id": "legacy-20", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2073, "timestamp": 0.0}, {"session_id": "legacy-21", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4404, "timestamp": 0.0}, {"session_id": "legacy-22", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.308, "timestamp": 0.0}, {"session_id": "legacy-23", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3344, "timestamp": 0.0}, {"session_id": "legacy-24", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2179, "timestamp": 0.0}, {"session_id": "legacy-25", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2912, "timestamp": 0.0}, {"session_id": "legacy-26", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3466, "timestamp": 0.0}, {"session_id": "legacy-27", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2485, "timestamp": 0.0}, {"session_id": "legacy-28", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3736, "timestamp": 0.0}, {"session_id": "legacy-29", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2984, "timestamp": 0.0}, {"session_id": "legacy-30", 
"run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.326, "timestamp": 0.0}, {"session_id": "legacy-31", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3041, "timestamp": 0.0}, {"session_id": "legacy-32", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5033, "timestamp": 0.0}, {"session_id": "legacy-33", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.357, "timestamp": 0.0}, {"session_id": "legacy-34", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2764, "timestamp": 0.0}, {"session_id": "legacy-35", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4297, "timestamp": 0.0}, {"session_id": "legacy-36", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2691, "timestamp": 0.0}, {"session_id": "legacy-37", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3864, "timestamp": 0.0}, {"session_id": "legacy-38", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2158, "timestamp": 0.0}, {"session_id": "legacy-39", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2693, "timestamp": 0.0}, {"session_id": "legacy-40", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3942, "timestamp": 0.0}, {"session_id": "legacy-41", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4404, "timestamp": 0.0}, {"session_id": "legacy-42", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3979, "timestamp": 0.0}, {"session_id": "legacy-43", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3779, "timestamp": 0.0}, {"session_id": "legacy-44", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 
0.366, "timestamp": 0.0}, {"session_id": "legacy-45", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2747, "timestamp": 0.0}, {"session_id": "legacy-46", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3383, "timestamp": 0.0}, {"session_id": "legacy-47", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3619, "timestamp": 0.0}, {"session_id": "legacy-48", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4863, "timestamp": 0.0}, {"session_id": "legacy-49", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4321, "timestamp": 0.0}, {"session_id": "legacy-50", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2665, "timestamp": 0.0}, {"session_id": "legacy-51", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4363, "timestamp": 0.0}, {"session_id": "legacy-52", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3825, "timestamp": 0.0}, {"session_id": "legacy-53", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3621, "timestamp": 0.0}, {"session_id": "legacy-54", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4681, "timestamp": 0.0}, {"session_id": "legacy-55", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5045, "timestamp": 0.0}, {"session_id": "legacy-56", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4995, "timestamp": 0.0}, {"session_id": "legacy-57", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3607, "timestamp": 0.0}, {"session_id": "legacy-58", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.406, "timestamp": 0.0}, {"session_id": "legacy-59", "run_id": "default", 
"incident_id": "unknown", "difficulty": "unknown", "reward": 0.4602, "timestamp": 0.0}, {"session_id": "legacy-60", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5146, "timestamp": 0.0}, {"session_id": "legacy-61", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4012, "timestamp": 0.0}, {"session_id": "legacy-62", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4275, "timestamp": 0.0}, {"session_id": "legacy-63", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3568, "timestamp": 0.0}, {"session_id": "legacy-64", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3525, "timestamp": 0.0}, {"session_id": "legacy-65", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5161, "timestamp": 0.0}, {"session_id": "legacy-66", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5625, "timestamp": 0.0}, {"session_id": "legacy-67", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4512, "timestamp": 0.0}, {"session_id": "legacy-68", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5401, "timestamp": 0.0}, {"session_id": "legacy-69", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4917, "timestamp": 0.0}, {"session_id": "legacy-70", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4141, "timestamp": 0.0}, {"session_id": "legacy-71", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4975, "timestamp": 0.0}, {"session_id": "legacy-72", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5945, "timestamp": 0.0}, {"session_id": "legacy-73", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4715, "timestamp": 
0.0}, {"session_id": "legacy-74", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.6025, "timestamp": 0.0}, {"session_id": "legacy-75", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.2706, "timestamp": 0.0}, {"session_id": "legacy-76", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5489, "timestamp": 0.0}, {"session_id": "legacy-77", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.493, "timestamp": 0.0}, {"session_id": "legacy-78", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.465, "timestamp": 0.0}, {"session_id": "legacy-79", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4992, "timestamp": 0.0}, {"session_id": "legacy-80", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3357, "timestamp": 0.0}, {"session_id": "legacy-81", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4801, "timestamp": 0.0}, {"session_id": "legacy-82", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5291, "timestamp": 0.0}, {"session_id": "legacy-83", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.6217, "timestamp": 0.0}, {"session_id": "legacy-84", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4649, "timestamp": 0.0}, {"session_id": "legacy-85", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4446, "timestamp": 0.0}, {"session_id": "legacy-86", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.9484, "timestamp": 0.0}, {"session_id": "legacy-87", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5883, "timestamp": 0.0}, {"session_id": "legacy-88", "run_id": "default", "incident_id": "unknown", 
"difficulty": "unknown", "reward": 0.5443, "timestamp": 0.0}, {"session_id": "legacy-89", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4785, "timestamp": 0.0}, {"session_id": "legacy-90", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5649, "timestamp": 0.0}, {"session_id": "legacy-91", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5345, "timestamp": 0.0}, {"session_id": "legacy-92", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.6071, "timestamp": 0.0}, {"session_id": "legacy-93", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4764, "timestamp": 0.0}, {"session_id": "legacy-94", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5092, "timestamp": 0.0}, {"session_id": "legacy-95", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.507, "timestamp": 0.0}, {"session_id": "legacy-96", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4242, "timestamp": 0.0}, {"session_id": "legacy-97", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5679, "timestamp": 0.0}, {"session_id": "legacy-98", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.568, "timestamp": 0.0}, {"session_id": "legacy-99", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.5504, "timestamp": 0.0}, {"session_id": "legacy-100", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-101", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-102", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": 
"legacy-103", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-104", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.8889, "timestamp": 0.0}, {"session_id": "legacy-105", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.8889, "timestamp": 0.0}, {"session_id": "legacy-106", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.8889, "timestamp": 0.0}, {"session_id": "legacy-107", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-108", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-109", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-110", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-111", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-112", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-113", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-114", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-115", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-116", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-117", "run_id": "default", "incident_id": "unknown", 
"difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-118", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-119", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-120", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-121", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-122", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-123", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-124", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-125", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-126", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-127", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-128", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-129", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-130", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.3378, "timestamp": 0.0}, {"session_id": "legacy-131", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4252, "timestamp": 0.0}, 
{"session_id": "legacy-132", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4252, "timestamp": 0.0}, {"session_id": "legacy-133", "run_id": "default", "incident_id": "unknown", "difficulty": "unknown", "reward": 0.4252, "timestamp": 0.0}, {"session_id": "a003e153-9026-406c-879d-25172aa11eda", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1776965109.7711759}, {"session_id": "bf34a807-a40d-4ae5-b8b0-4d8333a62c81", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1776965109.825952}, {"session_id": "8876cfb6-f5e9-4ad0-941e-84bc7d8e2b96", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1776967485.476565}, {"session_id": "db31a9f2-ca48-4f93-8d1e-3fa50fa5d21a", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1776967485.519014}, {"session_id": "4aaf8a4d-0db7-47f5-9dca-8bc360b19088", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1776978808.275504}, {"session_id": "c13f72bb-5715-4209-9872-e85acecbc8b3", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1776978808.317802}, {"session_id": "4da60545-6cb0-4a65-acf2-9a8fd8cf7e59", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1776982752.592321}, {"session_id": "48a56b54-d4cd-47c9-b2e1-165dd330a6c9", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1776982752.632246}, {"session_id": "2eab58c0-ffe3-4a11-9512-3b606eb2a957", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1776983166.506444}, {"session_id": "ec0ae7f3-6095-4dce-b323-1cd1482c1ba4", "run_id": 
"pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1776983166.548518}, {"session_id": "b720c132-ddce-41fa-99d1-be7ebaa32de2", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1776984636.36427}, {"session_id": "edb8bdb1-5903-4710-b9f4-cd68c67f6474", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1776984636.4119911}, {"session_id": "2654393f-5c7e-4d17-938d-2570195a3c5f", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1776984647.97215}, {"session_id": "c0e98721-13b3-49e3-a3fd-735801f5f9f7", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1776984648.018783}, {"session_id": "6fe810db-c9ae-48e7-a205-6d2df902b555", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1777049568.085221}, {"session_id": "58c9af47-0d8b-4359-9889-a6a04552db83", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1777049568.1298869}, {"session_id": "552395e3-cae4-4423-bc3c-9314b8cc276d", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1777049580.030255}, {"session_id": "49a9bb9e-c9b1-465f-a617-6443da42d1be", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1777049580.0770292}, {"session_id": "fb4a1426-bf33-47f0-8357-29b19a75a19c", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1777051304.085602}, {"session_id": "2714abc5-60be-4bf4-981c-da1c2ba328b5", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1777051304.1293068}, {"session_id": 
"29f5e924-bb71-4b4d-85ea-e6f321f8b674", "run_id": "default", "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378, "timestamp": 1777184935.9644742}, {"session_id": "057b4802-136c-4fb3-b62b-0e60014a6fcf", "run_id": "pytest_full_episode_metrics", "incident_id": "INC008", "difficulty": "easy", "reward": 0.4252, "timestamp": 1777184936.008428}]

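Reviewer note: the records in `episode_rewards.json` share six fields (`session_id`, `run_id`, `incident_id`, `difficulty`, `reward`, `timestamp`), so a per-run summary is a quick way to check the new entries. The sketch below is illustrative only and is not part of the commit. The helper name `summarize_rewards` is hypothetical, and the three sample records are copied from the file above.

```python
import json
from collections import defaultdict
from statistics import mean


def summarize_rewards(records):
    """Group episode records by run_id; return episode count and mean reward per run."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["run_id"]].append(rec["reward"])
    return {
        run_id: {"episodes": len(rewards), "mean_reward": round(mean(rewards), 4)}
        for run_id, rewards in grouped.items()
    }


# Three records copied from the file; in practice, load the whole list with
# json.load(open("episode_rewards.json")).
sample = json.loads("""
[
  {"session_id": "legacy-132", "run_id": "default", "incident_id": "unknown",
   "difficulty": "unknown", "reward": 0.4252, "timestamp": 0.0},
  {"session_id": "a003e153-9026-406c-879d-25172aa11eda", "run_id": "default",
   "incident_id": "INC001", "difficulty": "easy", "reward": 0.3378,
   "timestamp": 1776965109.7711759},
  {"session_id": "bf34a807-a40d-4ae5-b8b0-4d8333a62c81",
   "run_id": "pytest_full_episode_metrics", "incident_id": "INC008",
   "difficulty": "easy", "reward": 0.4252, "timestamp": 1776965109.825952}
]
""")

summary = summarize_rewards(sample)
for run_id, stats in summary.items():
    print(run_id, stats)
```

Grouping on `run_id` mirrors how the notebook scopes `GET /learning-curve?run_id=...`. Swapping the key to `incident_id` or `difficulty` gives the other obvious breakdowns.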
notebooks/grpo_colab_enhanced.ipynb CHANGED Viewed

@@ -4,7 +4,24 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# NEXUS Enhanced \u2014 GRPO Training Notebook (**Enhanced**)\n\n**Same pipeline as `grpo_colab_v2.ipynb`, with optional multi-incident rotation and scoped metrics `run_id`.**\n\nUse **`grpo_colab_v2.ipynb`** for the simplest single-incident (INC003) path. Use **this notebook** when you want a **defined incident pool** (enterprise + EA + lighter variety) without editing code between runs.\n\n- **Rotation:** `round_robin` (default) or `random` per reward episode (`NEXUS_INCIDENT_ROTATION`).\n- **Pool:** `NEXUS_INCIDENT_POOL` (comma-separated) or defaults to `INC003,INC008,INC001`. Set `NEXUS_MULTI_INCIDENT=false` to lock to `NEXUS_INCIDENT_ID` only.\n- **Metrics:** `NEXUS_GRPO_RUN_ID` tags `/reset` episodes so `GET /learning-curve?run_id=...` matches this Colab run.\n\n## How to run (please follow order)\n\n1. **Runtime:** GPU (e.g. T4).\n2. Run cells **top to bottom** at least once per session: installs \u2192 **configuration** \u2192 environment \u2192 model \u2192 **training** \u2192 **plots**.\n3. Edit the **configuration cell** for `BASE_URL`, incident pool, `GRPO_RUN_ID`, `ONE_ROUND_TRAINING`, and optional `NEXUS_*` env vars.\n4. **Google Drive:** Same backup behavior as v2 when `BACKUP_TO_GOOGLE_DRIVE` is True on Colab.\n- **Checkpoints / resume:** frequent `save_steps` on the quick path; default checkpoint folder moves to **Google Drive** on Colab when Drive is mounted so you can reconnect and **continue training** without restarting from scratch.\n\n"
    ]
   },
   {
@@ -13,14 +30,14 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Validate OpenEnv installation (Hard Gate #1)\n",
     "try:\n",
     "    import openenv\n",
-    "    print(f\"\u2705 OpenEnv {openenv.__version__} installed\")\n",
     "except ImportError:\n",
-    "    print(\"\u26a0\ufe0f  OpenEnv not yet installed (will be installed in next cell)\")\n",
     "\n",
-    "print(\"\u2705 This notebook meets BRD Hard Gate #1: 'Usage of OpenEnv (latest release)'\")"
    ]
   },
   {
@@ -40,7 +57,32 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Configuration & HF Space connectivity\n\nThe next code cell adds **multi-incident** defaults and **`run_id`** for scoped learning curves. Override with:\n\n- `NEXUS_INCIDENT_POOL` \u2014 e.g. `INC003,INC008,INC004` (comma-separated). Ignored if `NEXUS_MULTI_INCIDENT=false`.\n- `NEXUS_MULTI_INCIDENT` \u2014 `false` to train against **`NEXUS_INCIDENT_ID`** only (v2-style).\n- `NEXUS_INCIDENT_ROTATION` \u2014 `round_robin` (default) or `random`.\n- `NEXUS_GRPO_RUN_ID` \u2014 string passed to `POST /reset` as `run_id` (default `colab_grpo_enhanced`).\n\n`REWARD_MAX_STEPS` default is **35** so mixed pools have headroom vs INC003-only (28).\n**Checkpoints:** On Colab with Drive mounted, weights save under `.../NEXUS_GRPO_backups/active_grpo_checkpoints` by default so a disconnect does not wipe them. Re-run the training cell to **resume** from the latest step (`NEXUS_FORCE_FRESH=true` starts over). Set `NEXUS_GRPO_OUTPUT_DIR` to override the directory.\n\n---\n\n## Colab free tier \u2014 reducing disconnect / expiry pain\n\n**Free Colab cannot be guaranteed** to stay alive for hours (idle limits, preemption, daily caps). Mitigations:\n\n1. **Drive checkpoints + resume** (this notebook): mount Drive in the config cell; **`GRPO_OUTPUT_DIR`** defaults to **`.../NEXUS_GRPO_backups/active_grpo_checkpoints`** on Colab when Drive is present. After a disconnect, **re-run setup cells in order**, then training \u2014 **`trainer.train(resume_from_checkpoint=...)`** picks up the latest step unless **`NEXUS_FORCE_FRESH=true`**.\n2. **Shorter runs:** **`ONE_ROUND_TRAINING = True`** or fewer prompts per session; continue later with resume instead of one very long run.\n3. **Frequent `save_steps`** on the quick path so less work is lost.\n4. **Stable browser session:** avoid laptop sleep; keep the Colab tab in a focused window on a reliable network while training runs.\n5. 
**Colab Pro / Pro+** if you need longer single sessions.\n\n**Disable HF checkpoints entirely:** set **`NEXUS_ENABLE_CHECKPOINTS=false`** (no checkpoint files, no resume; saves Drive space).\n\n"
    ]
   },
   {
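Reviewer note: the markdown cell above documents `NEXUS_INCIDENT_POOL`, `NEXUS_MULTI_INCIDENT`, and `NEXUS_INCIDENT_ROTATION`. The sketch below shows one way those settings could be resolved into an incident sequence. It is not the notebook's actual code: `parse_incident_pool` and `incident_picker` are hypothetical helpers, and the `INC003` fallback for `NEXUS_INCIDENT_ID` is an assumption based on the v2 single-incident path.

```python
import os
import random
from itertools import cycle

# Default pool as documented in the notebook's markdown cell.
DEFAULT_POOL = ["INC003", "INC008", "INC001"]


def parse_incident_pool(env=None):
    """Resolve the training pool from NEXUS_* variables (hypothetical helper)."""
    env = os.environ if env is None else env
    multi = env.get("NEXUS_MULTI_INCIDENT", "true").strip().lower() != "false"
    if not multi:
        # Single-incident mode; the INC003 fallback is an assumption.
        return [env.get("NEXUS_INCIDENT_ID", "INC003")]
    raw = env.get("NEXUS_INCIDENT_POOL", "")
    pool = [item.strip() for item in raw.split(",") if item.strip()]
    return pool or list(DEFAULT_POOL)


def incident_picker(pool, rotation="round_robin", seed=None):
    """Yield incident ids forever, in order or at random (hypothetical helper)."""
    if rotation == "random":
        rng = random.Random(seed)
        while True:
            yield rng.choice(pool)
    else:
        yield from cycle(pool)


picker = incident_picker(parse_incident_pool({}))
first_four = [next(picker) for _ in range(4)]
print(first_four)
```

Passing a plain dict instead of reading `os.environ` directly keeps the parsing testable outside Colab. Round-robin gives every incident equal exposure per epoch, while `random` trades that guarantee for less predictable ordering.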
@@ -73,15 +115,15 @@
     "IN_COLAB = _in_colab()\n",
     "\n",
     "# ---------------------------------------------------------------------------\n",
-    "# Notebook configuration \u2014 edit defaults here or set environment variables.\n",
     "# Enhanced notebook extras:\n",
-    "#   NEXUS_INCIDENT_POOL     \u2014 comma-separated case ids (default: INC003,INC008,INC001)\n",
-    "#   NEXUS_MULTI_INCIDENT    \u2014 false \u2192 use only NEXUS_INCIDENT_ID (v2-style single task)\n",
-    "#   NEXUS_INCIDENT_ROTATION \u2014 round_robin | random\n",
-    "#   NEXUS_GRPO_RUN_ID       \u2014 POST /reset run_id for scoped /learning-curve and /metrics\n",
     "#\n",
     "# See grpo_colab_v2.ipynb for the full list of NEXUS_* vars (same as here).\n",
-    "# NEXUS_ENABLE_CHECKPOINTS (true/false) \u2014 false: no HF checkpoint files, no resume\n",
     "# ---------------------------------------------------------------------------\n",
     "\n",
     "BASE_URL = _env(\n",
@@ -167,13 +209,13 @@
     "    if not BACKUP_TO_GOOGLE_DRIVE:\n",
     "        return\n",
     "    if not IN_COLAB:\n",
-    "        print(\"\u26a0\ufe0f BACKUP_TO_GOOGLE_DRIVE is True but not in Colab \u2014 skipping Drive mount.\")\n",
     "        return\n",
     "    if os.path.isdir(\"/content/drive/MyDrive\"):\n",
-    "        print(\"\u2705 Google Drive already mounted.\")\n",
     "        return\n",
     "    from google.colab import drive\n",
-    "    print(\"\ud83d\udcc2 Mount Google Drive when prompted (artifacts copy to My Drive / NEXUS_GRPO_backups).\")\n",
     "    drive.mount(\"/content/drive\")\n",
     "\n",
     "\n",
@@ -224,12 +266,12 @@
     "try:\n",
     "    resp = requests.get(f\"{BASE_URL}/health\", timeout=5)\n",
     "    if resp.status_code == 200:\n",
-    "        print(\"\u2705 HF Space is reachable\")\n",
     "        print(f\"Response: {resp.json()}\")\n",
     "    else:\n",
-    "        print(f\"\u274c HF Space returned status {resp.status_code}\")\n",
     "except Exception as e:\n",
-    "    print(f\"\u274c Error connecting to HF Space: {e}\")\n",
     "    print(f\"URL: {BASE_URL}\")\n",
     "    print(\"Make sure HF Space is deployed and running\")\n"
    ]
@@ -289,7 +331,7 @@
     "        return data[\"observation\"], data[\"reward\"], data[\"done\"], data[\"info\"]\n",
     "\n",
     "    def get_learning_curve(self, run_id=None):\n",
-    "        \"\"\"GET /learning-curve \u2014 optional run_id scopes metrics to this Colab run.\"\"\"\n",
     "        params = {}\n",
     "        rid = run_id if run_id is not None else GRPO_RUN_ID\n",
     "        if rid:\n",
@@ -304,7 +346,7 @@
     "\n",
     "\n",
     "env = NexusRemoteEnv()\n",
-    "print(\"\u2705 Environment interface ready (enhanced: run_id + scoped learning curve)\")\n",
     "\n",
     "_NEXUS_DRIVE_RUN_DIR = None\n",
     "\n",
@@ -328,7 +370,7 @@
     "\n",
     "\n",
     "def backup_nexus_artifacts_to_drive(reason=\"manual\", *, include_learning_curve=True):\n",
-    "    \"\"\"Copy GRPO checkpoints, PNG plots (if present), learning curve JSON, manifest \u2192 Google Drive.\"\"\"\n",
     "    if not BACKUP_TO_GOOGLE_DRIVE:\n",
     "        print(f\"[Drive backup:{reason}] skipped (BACKUP_TO_GOOGLE_DRIVE=False)\")\n",
     "        return None\n",
@@ -336,7 +378,7 @@
     "        print(f\"[Drive backup:{reason}] skipped (not Colab)\")\n",
     "        return None\n",
     "    if not os.path.isdir(\"/content/drive/MyDrive\"):\n",
-    "        print(f\"[Drive backup:{reason}] skipped \u2014 mount Drive in the config cell first\")\n",
     "        return None\n",
     "    dest = _nexus_google_drive_run_dir()\n",
     "    if dest is None:\n",
@@ -346,22 +388,22 @@
     "    if os.path.isdir(GRPO_OUTPUT_DIR):\n",
     "        tgt = os.path.join(dest, \"grpo_checkpoints\")\n",
     "        shutil.copytree(GRPO_OUTPUT_DIR, tgt, dirs_exist_ok=True)\n",
-    "        print(f\"  \u2705 checkpoints \u2192 {tgt}\")\n",
     "\n",
     "    for name in (\"training_analysis.png\", \"reward_curves_hires.png\"):\n",
     "        src = os.path.join(PLOT_OUTPUT_DIR, name)\n",
     "        if os.path.isfile(src):\n",
     "            shutil.copy2(src, os.path.join(dest, name))\n",
-    "            print(f\"  \u2705 plot {name}\")\n",
     "\n",
     "    if include_learning_curve:\n",
     "        try:\n",
     "            curve = env.get_learning_curve()\n",
     "            with open(os.path.join(dest, \"learning_curve.json\"), \"w\") as f:\n",
     "                _json_backup.dump(curve, f, indent=2)\n",
-    "            print(\"  \u2705 learning_curve.json\")\n",
     "        except Exception as e:\n",
-    "            print(f\"  \u26a0\ufe0f learning curve fetch failed: {e}\")\n",
     "\n",
     "    manifest = {\n",
     "        \"reason\": reason,\n",
@@ -377,7 +419,7 @@
     "    }\n",
     "    with open(os.path.join(dest, \"run_manifest.json\"), \"w\") as f:\n",
     "        _json_backup.dump(manifest, f, indent=2)\n",
-    "    print(f\"\\n\ud83d\udce6 Drive backup ({reason}): {dest}\\n\")\n",
     "    return dest\n"
    ]
   },
@@ -463,7 +505,7 @@
     "\n",
     "\n",
     "print(\n",
-    "    \"\u2705 Reward function defined (pool=\",\n",
     "    TRAINING_INCIDENT_POOL,\n",
     "    \", rotation=\",\n",
     "    INCIDENT_ROTATION,\n",
@@ -522,7 +564,7 @@
     "    save_steps=GRPO_SAVE_STEPS_QUICK if ONE_ROUND_TRAINING else GRPO_SAVE_STEPS_FULL,\n",
     ")\n",
     "\n",
-    "print(\"\u2705 Model loaded and GRPO configured\")\n"
    ]
   },
   {
@@ -541,6 +583,9 @@
   },
   {
    "cell_type": "code",
    "source": [
     "from datasets import Dataset\n",
     "import os\n",
@@ -550,18 +595,18 @@
     "\n",
     "print(\"\\n\" + \"=\" * 70)\n",
     "if ONE_ROUND_TRAINING:\n",
-    "    print(f\"\ud83d\ude80 GRPO \u2014 ONE ROUND ({n_target} prompts, fast path)\")\n",
     "else:\n",
-    "    print(f\"\ud83d\ude80 GRPO \u2014 FULL RUN ({n_target} prompts)\")\n",
     "print(\"=\" * 70)\n",
     "print(\"\\nConfiguration:\")\n",
-    "print(f\"  \u2022 Model: {MODEL_NAME}\")\n",
-    "print(f\"  \u2022 Dataset rows: {n_target}\")\n",
-    "print(f\"  \u2022 Environment: {BASE_URL}\")\n",
-    "print(f\"  \u2022 Incident pool: {TRAINING_INCIDENT_POOL} ({INCIDENT_ROTATION})\")\n",
-    "print(f\"  \u2022 GRPO_RUN_ID (metrics scope): {GRPO_RUN_ID}\")\n",
-    "print(f\"  \u2022 Checkpoints dir: {GRPO_OUTPUT_DIR}\")\n",
-    "print(\"  \u2022 Adjust settings in the configuration cell (or NEXUS_* env vars).\")\n",
     "print(\"\\nMonitor dashboard:\")\n",
     "print(f\"  {BASE_URL}/\")\n",
     "print(\"=\" * 70 + \"\\n\")\n",
@@ -591,30 +636,30 @@
     "if ENABLE_CHECKPOINTS and CHECKPOINT_RESUME and not FORCE_FRESH_RUN:\n",
     "    resume_ckpt = get_last_checkpoint(GRPO_OUTPUT_DIR)\n",
     "if resume_ckpt:\n",
-    "    print(f\"\ud83d\udcc2 Resuming training from: {resume_ckpt}\")\n",
     "else:\n",
-    "    print(\"\ud83d\udcc2 Starting training fresh (no checkpoint, or NEXUS_RESUME=false, or NEXUS_FORCE_FRESH=true)\")\n",
     "\n",
-    "print(f\"\ud83d\udcca Dataset: {len(train_dataset)} prompts\")\n",
-    "print(\"\u23f3 Training started...\")\n",
     "trainer.train(resume_from_checkpoint=resume_ckpt)\n",
     "\n",
     "print(\"\\n\" + \"=\" * 70)\n",
-    "print(\"\u2705 Training step finished\")\n",
     "print(\"=\" * 70)\n",
     "print(f\"Dashboard: {BASE_URL}/\")\n",
     "print(f\"Learning curve API: {BASE_URL}/learning-curve\")\n",
-    "print(\"\u25b6\ufe0f Run next cell to plot results.\")\n",
     "print(\"=\" * 70)\n",
     "\n",
     "backup_nexus_artifacts_to_drive(\"post_training\", include_learning_curve=True)\n"
-   ],
-   "metadata": {},
-   "execution_count": null,
-   "outputs": []
   },
   {
    "cell_type": "code",
    "source": [
     "import os\n",
     "import matplotlib.pyplot as plt\n",
@@ -622,7 +667,7 @@
     "os.makedirs(PLOT_OUTPUT_DIR, exist_ok=True)\n",
     "\n",
     "print(\"\\n\" + \"=\" * 70)\n",
-    "print(\"\ud83d\udcca FETCHING REAL TRAINING DATA FROM HF SPACE\")\n",
     "print(\"=\" * 70)\n",
     "print(f\"Using run_id filter: {GRPO_RUN_ID}\")\n",
     "\n",
@@ -688,14 +733,14 @@
     "    summary_text = f\"\"\"\n",
     "TRAINING SUMMARY\n",
     "\n",
-    "\ud83d\udcca Episodes: {len(rewards)}\n",
-    "\ud83d\udd35 Baseline: {baseline:.4f}\n",
-    "\ud83d\udcc8 Average: {avg_reward:.4f}\n",
-    "\u2b50 Best: {best_reward:.4f}\n",
-    "\ud83d\udcc9 Worst: {min(rewards):.4f}\n",
     "\n",
-    "\ud83d\udcca Improvement: +{improvement_from_baseline:.1f}%\n",
-    "\ud83d\udccc Last 5 Avg: {last_5_avg:.4f}\n",
     "    \"\"\"\n",
     "\n",
     "    ax4.text(\n",
@@ -709,14 +754,14 @@
     "        bbox=dict(boxstyle=\"round\", facecolor=\"#1e293b\", alpha=0.8, edgecolor=\"#0ea5e9\"),\n",
     "    )\n",
     "\n",
-    "    plt.suptitle(\"NEXUS Enhanced \u2014 Complete Training Analysis\", fontsize=14, fontweight=\"bold\", y=0.995)\n",
     "    plt.tight_layout()\n",
     "\n",
-    "    print(\"\\n\ud83d\udcc1 Saving visualizations...\")\n",
     "    p1 = os.path.join(PLOT_OUTPUT_DIR, \"training_analysis.png\")\n",
     "    p2 = os.path.join(PLOT_OUTPUT_DIR, \"reward_curves_hires.png\")\n",
     "    plt.savefig(p1, dpi=150, bbox_inches=\"tight\")\n",
-    "    print(f\"  \u2705 {p1} (4-panel comprehensive view)\")\n",
     "\n",
     "    fig_single, ax = plt.subplots(figsize=(14, 7))\n",
     "    ax.plot(episodes, rewards, \"o-\", label=\"Episode Reward\", color=\"#0ea5e9\", markersize=7, linewidth=2.5, alpha=0.8)\n",
@@ -725,18 +770,18 @@
     "    ax.axhline(y=baseline, color=\"#ef4444\", linestyle=\"--\", linewidth=2.5, label=f\"Baseline: {baseline:.3f}\")\n",
     "    ax.set_xlabel(\"Episode\", fontsize=12, fontweight=\"bold\")\n",
     "    ax.set_ylabel(\"Reward Score\", fontsize=12, fontweight=\"bold\")\n",
-    "    ax.set_title(\"NEXUS Enhanced GRPO Training \u2014 Reward Progression\", fontsize=13, fontweight=\"bold\")\n",
     "    ax.legend(fontsize=11, loc=\"lower right\")\n",
     "    ax.grid(True, alpha=0.3, linestyle=\"--\")\n",
     "    ax.set_ylim(-0.05, 1.05)\n",
     "    plt.tight_layout()\n",
     "    plt.savefig(p2, dpi=200, bbox_inches=\"tight\")\n",
-    "    print(f\"  \u2705 {p2} (high-res)\")\n",
     "\n",
     "    plt.show()\n",
     "\n",
     "    print(\"\\n\" + \"=\" * 70)\n",
-    "    print(\"\ud83d\udcc8 FINAL TRAINING RESULTS\")\n",
     "    print(\"=\" * 70)\n",
     "    print(f\"\\n{'Metric':<35} {'Value':<20}\")\n",
     "    print(\"-\" * 55)\n",
@@ -754,25 +799,22 @@
     "        late_avg = sum(rewards[-5:]) / 5\n",
     "        print(f\"\\n{'Early Phase (Ep 1-5) Avg':<35} {early_avg:.4f}\")\n",
     "        print(f\"{'Late Phase (Ep -5) Avg':<35} {late_avg:.4f}\")\n",
-    "        learning_status = \"\u2705 Learning\" if late_avg > early_avg else \"\u26a0\ufe0f  Plateau\"\n",
     "        print(f\"{'Status':<35} {learning_status:<20}\")\n",
     "\n",
     "    print(\"\\n\" + \"=\" * 70)\n",
-    "    print(\"\u2705 COMPLETE!\")\n",
     "    print(\"=\" * 70)\n",
     "\n",
     "    backup_nexus_artifacts_to_drive(\"post_plots\", include_learning_curve=True)\n",
     "\n",
     "else:\n",
-    "    print(\"\\n\u274c No episode data found\")\n",
-    "    print(\"\u23f3 Training may still be running...\")\n",
-    "    print(\"\ud83d\udca1 Rerun this cell in a few minutes\")\n",
-    "    print(f\"\ud83d\udcca Live: {BASE_URL}/learning-curve\")\n",
     "    backup_nexus_artifacts_to_drive(\"no_rewards_yet\", include_learning_curve=True)\n"
-   ],
-   "metadata": {},
-   "execution_count": null,
-   "outputs": []
   }
  ],
  "metadata": {

    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "# NEXUS Enhanced — GRPO Training Notebook (**Enhanced**)\n",
+    "\n",
+    "**Same pipeline as `grpo_colab_v2.ipynb`, with optional multi-incident rotation and scoped metrics `run_id`.**\n",
+    "\n",
+    "Use **`grpo_colab_v2.ipynb`** for the simplest single-incident (INC003) path. Use **this notebook** when you want a **defined incident pool** (enterprise + EA + lighter variety) without editing code between runs.\n",
+    "\n",
+    "- **Rotation:** `round_robin` (default) or `random` per reward episode (`NEXUS_INCIDENT_ROTATION`).\n",
+    "- **Pool:** `NEXUS_INCIDENT_POOL` (comma-separated) or defaults to `INC003,INC008,INC001`. Set `NEXUS_MULTI_INCIDENT=false` to lock to `NEXUS_INCIDENT_ID` only.\n",
+    "- **Metrics:** `NEXUS_GRPO_RUN_ID` tags `/reset` episodes so `GET /learning-curve?run_id=...` matches this Colab run.\n",
+    "\n",
+    "## How to run (please follow order)\n",
+    "\n",
+    "1. **Runtime:** GPU (e.g. T4).\n",
+    "2. Run cells **top to bottom** at least once per session: installs → **configuration** → environment → model → **training** → **plots**.\n",
+    "3. Edit the **configuration cell** for `BASE_URL`, incident pool, `GRPO_RUN_ID`, `ONE_ROUND_TRAINING`, and optional `NEXUS_*` env vars.\n",
+    "4. **Google Drive:** Same backup behavior as v2 when `BACKUP_TO_GOOGLE_DRIVE` is True on Colab.\n",
+    "- **Checkpoints / resume:** frequent `save_steps` on the quick path; default checkpoint folder moves to **Google Drive** on Colab when Drive is mounted so you can reconnect and **continue training** without restarting from scratch.\n",
+    "\n"
    ]
   },
   {
    "metadata": {},
    "outputs": [],
    "source": [
+    "# Validate OpenEnv installation (hackathon compliance)\n",
     "try:\n",
     "    import openenv\n",
+    "    print(f\"✅ OpenEnv {openenv.__version__} installed\")\n",
     "except ImportError:\n",
+    "    print(\"⚠️  OpenEnv not yet installed (will be installed in next cell)\")\n",
     "\n",
+    "print(\"✅ OpenEnv (latest release) check passed — hackathon compliance\")"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "## Configuration & HF Space connectivity\n",
+    "\n",
+    "The next code cell adds **multi-incident** defaults and **`run_id`** for scoped learning curves. Override with:\n",
+    "\n",
+    "- `NEXUS_INCIDENT_POOL` — e.g. `INC003,INC008,INC004` (comma-separated). Ignored if `NEXUS_MULTI_INCIDENT=false`.\n",
+    "- `NEXUS_MULTI_INCIDENT` — `false` to train against **`NEXUS_INCIDENT_ID`** only (v2-style).\n",
+    "- `NEXUS_INCIDENT_ROTATION` — `round_robin` (default) or `random`.\n",
+    "- `NEXUS_GRPO_RUN_ID` — string passed to `POST /reset` as `run_id` (default `colab_grpo_enhanced`).\n",
+    "\n",
+    "`REWARD_MAX_STEPS` default is **35** so mixed pools have headroom vs INC003-only (28).\n",
+    "**Checkpoints:** On Colab with Drive mounted, weights save under `.../NEXUS_GRPO_backups/active_grpo_checkpoints` by default so a disconnect does not wipe them. Re-run the training cell to **resume** from the latest step (`NEXUS_FORCE_FRESH=true` starts over). Set `NEXUS_GRPO_OUTPUT_DIR` to override the directory.\n",
+    "\n",
+    "---\n",
+    "\n",
+    "## Colab free tier — reducing disconnect / expiry pain\n",
+    "\n",
+    "**Free Colab cannot be guaranteed** to stay alive for hours (idle limits, preemption, daily caps). Mitigations:\n",
+    "\n",
+    "1. **Drive checkpoints + resume** (this notebook): mount Drive in the config cell; **`GRPO_OUTPUT_DIR`** defaults to **`.../NEXUS_GRPO_backups/active_grpo_checkpoints`** on Colab when Drive is present. After a disconnect, **re-run setup cells in order**, then training — **`trainer.train(resume_from_checkpoint=...)`** picks up the latest step unless **`NEXUS_FORCE_FRESH=true`**.\n",
+    "2. **Shorter runs:** **`ONE_ROUND_TRAINING = True`** or fewer prompts per session; continue later with resume instead of one very long run.\n",
+    "3. **Frequent `save_steps`** on the quick path so less work is lost.\n",
+    "4. **Stable browser session:** avoid laptop sleep; keep the Colab tab in a focused window on a reliable network while training runs.\n",
+    "5. **Colab Pro / Pro+** if you need longer single sessions.\n",
+    "\n",
+    "**Disable HF checkpoints entirely:** set **`NEXUS_ENABLE_CHECKPOINTS=false`** (no checkpoint files, no resume; saves Drive space).\n",
+    "\n"
    ]
   },
   {
     "IN_COLAB = _in_colab()\n",
     "\n",
     "# ---------------------------------------------------------------------------\n",
+    "# Notebook configuration — edit defaults here or set environment variables.\n",
     "# Enhanced notebook extras:\n",
+    "#   NEXUS_INCIDENT_POOL     — comma-separated case ids (default: INC003,INC008,INC001)\n",
+    "#   NEXUS_MULTI_INCIDENT    — false → use only NEXUS_INCIDENT_ID (v2-style single task)\n",
+    "#   NEXUS_INCIDENT_ROTATION — round_robin | random\n",
+    "#   NEXUS_GRPO_RUN_ID       — POST /reset run_id for scoped /learning-curve and /metrics\n",
     "#\n",
     "# See grpo_colab_v2.ipynb for the full list of NEXUS_* vars (same as here).\n",
+    "# NEXUS_ENABLE_CHECKPOINTS (true/false) — false: no HF checkpoint files, no resume\n",
     "# ---------------------------------------------------------------------------\n",
     "\n",
     "BASE_URL = _env(\n",
     "    if not BACKUP_TO_GOOGLE_DRIVE:\n",
     "        return\n",
     "    if not IN_COLAB:\n",
+    "        print(\"⚠️ BACKUP_TO_GOOGLE_DRIVE is True but not in Colab — skipping Drive mount.\")\n",
     "        return\n",
     "    if os.path.isdir(\"/content/drive/MyDrive\"):\n",
+    "        print(\"✅ Google Drive already mounted.\")\n",
     "        return\n",
     "    from google.colab import drive\n",
+    "    print(\"📂 Mount Google Drive when prompted (artifacts copy to My Drive / NEXUS_GRPO_backups).\")\n",
     "    drive.mount(\"/content/drive\")\n",
     "\n",
     "\n",
     "try:\n",
     "    resp = requests.get(f\"{BASE_URL}/health\", timeout=5)\n",
     "    if resp.status_code == 200:\n",
+    "        print(\"✅ HF Space is reachable\")\n",
     "        print(f\"Response: {resp.json()}\")\n",
     "    else:\n",
+    "        print(f\"❌ HF Space returned status {resp.status_code}\")\n",
     "except Exception as e:\n",
+    "    print(f\"❌ Error connecting to HF Space: {e}\")\n",
     "    print(f\"URL: {BASE_URL}\")\n",
     "    print(\"Make sure HF Space is deployed and running\")\n"
    ]
     "        return data[\"observation\"], data[\"reward\"], data[\"done\"], data[\"info\"]\n",
     "\n",
     "    def get_learning_curve(self, run_id=None):\n",
+    "        \"\"\"GET /learning-curve — optional run_id scopes metrics to this Colab run.\"\"\"\n",
     "        params = {}\n",
     "        rid = run_id if run_id is not None else GRPO_RUN_ID\n",
     "        if rid:\n",
     "\n",
     "\n",
     "env = NexusRemoteEnv()\n",
+    "print(\"✅ Environment interface ready (enhanced: run_id + scoped learning curve)\")\n",
     "\n",
     "_NEXUS_DRIVE_RUN_DIR = None\n",
     "\n",
     "\n",
     "\n",
     "def backup_nexus_artifacts_to_drive(reason=\"manual\", *, include_learning_curve=True):\n",
+    "    \"\"\"Copy GRPO checkpoints, PNG plots (if present), learning curve JSON, manifest → Google Drive.\"\"\"\n",
     "    if not BACKUP_TO_GOOGLE_DRIVE:\n",
     "        print(f\"[Drive backup:{reason}] skipped (BACKUP_TO_GOOGLE_DRIVE=False)\")\n",
     "        return None\n",
     "        print(f\"[Drive backup:{reason}] skipped (not Colab)\")\n",
     "        return None\n",
     "    if not os.path.isdir(\"/content/drive/MyDrive\"):\n",
+    "        print(f\"[Drive backup:{reason}] skipped — mount Drive in the config cell first\")\n",
     "        return None\n",
     "    dest = _nexus_google_drive_run_dir()\n",
     "    if dest is None:\n",
     "    if os.path.isdir(GRPO_OUTPUT_DIR):\n",
     "        tgt = os.path.join(dest, \"grpo_checkpoints\")\n",
     "        shutil.copytree(GRPO_OUTPUT_DIR, tgt, dirs_exist_ok=True)\n",
+    "        print(f\"  ✅ checkpoints → {tgt}\")\n",
     "\n",
     "    for name in (\"training_analysis.png\", \"reward_curves_hires.png\"):\n",
     "        src = os.path.join(PLOT_OUTPUT_DIR, name)\n",
     "        if os.path.isfile(src):\n",
     "            shutil.copy2(src, os.path.join(dest, name))\n",
+    "            print(f\"  ✅ plot {name}\")\n",
     "\n",
     "    if include_learning_curve:\n",
     "        try:\n",
     "            curve = env.get_learning_curve()\n",
     "            with open(os.path.join(dest, \"learning_curve.json\"), \"w\") as f:\n",
     "                _json_backup.dump(curve, f, indent=2)\n",
+    "            print(\"  ✅ learning_curve.json\")\n",
     "        except Exception as e:\n",
+    "            print(f\"  ⚠️ learning curve fetch failed: {e}\")\n",
     "\n",
     "    manifest = {\n",
     "        \"reason\": reason,\n",
     "    }\n",
     "    with open(os.path.join(dest, \"run_manifest.json\"), \"w\") as f:\n",
     "        _json_backup.dump(manifest, f, indent=2)\n",
+    "    print(f\"\\n📦 Drive backup ({reason}): {dest}\\n\")\n",
     "    return dest\n"
    ]
   },
     "\n",
     "\n",
     "print(\n",
+    "    \"✅ Reward function defined (pool=\",\n",
     "    TRAINING_INCIDENT_POOL,\n",
     "    \", rotation=\",\n",
     "    INCIDENT_ROTATION,\n",
     "    save_steps=GRPO_SAVE_STEPS_QUICK if ONE_ROUND_TRAINING else GRPO_SAVE_STEPS_FULL,\n",
     ")\n",
     "\n",
+    "print(\"✅ Model loaded and GRPO configured\")\n"
    ]
   },
   {
   },
   {
    "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "from datasets import Dataset\n",
     "import os\n",
     "\n",
     "print(\"\\n\" + \"=\" * 70)\n",
     "if ONE_ROUND_TRAINING:\n",
+    "    print(f\"🚀 GRPO — ONE ROUND ({n_target} prompts, fast path)\")\n",
     "else:\n",
+    "    print(f\"🚀 GRPO — FULL RUN ({n_target} prompts)\")\n",
     "print(\"=\" * 70)\n",
     "print(\"\\nConfiguration:\")\n",
+    "print(f\"  • Model: {MODEL_NAME}\")\n",
+    "print(f\"  • Dataset rows: {n_target}\")\n",
+    "print(f\"  • Environment: {BASE_URL}\")\n",
+    "print(f\"  • Incident pool: {TRAINING_INCIDENT_POOL} ({INCIDENT_ROTATION})\")\n",
+    "print(f\"  • GRPO_RUN_ID (metrics scope): {GRPO_RUN_ID}\")\n",
+    "print(f\"  • Checkpoints dir: {GRPO_OUTPUT_DIR}\")\n",
+    "print(\"  • Adjust settings in the configuration cell (or NEXUS_* env vars).\")\n",
     "print(\"\\nMonitor dashboard:\")\n",
     "print(f\"  {BASE_URL}/\")\n",
     "print(\"=\" * 70 + \"\\n\")\n",
     "if ENABLE_CHECKPOINTS and CHECKPOINT_RESUME and not FORCE_FRESH_RUN:\n",
     "    resume_ckpt = get_last_checkpoint(GRPO_OUTPUT_DIR)\n",
     "if resume_ckpt:\n",
+    "    print(f\"📂 Resuming training from: {resume_ckpt}\")\n",
     "else:\n",
+    "    print(\"📂 Starting training fresh (no checkpoint, or NEXUS_RESUME=false, or NEXUS_FORCE_FRESH=true)\")\n",
     "\n",
+    "print(f\"📊 Dataset: {len(train_dataset)} prompts\")\n",
+    "print(\"⏳ Training started...\")\n",
     "trainer.train(resume_from_checkpoint=resume_ckpt)\n",
     "\n",
     "print(\"\\n\" + \"=\" * 70)\n",
+    "print(\"✅ Training step finished\")\n",
     "print(\"=\" * 70)\n",
     "print(f\"Dashboard: {BASE_URL}/\")\n",
     "print(f\"Learning curve API: {BASE_URL}/learning-curve\")\n",
+    "print(\"▶️ Run next cell to plot results.\")\n",
     "print(\"=\" * 70)\n",
     "\n",
     "backup_nexus_artifacts_to_drive(\"post_training\", include_learning_curve=True)\n"
+   ]
   },
   {
    "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "import os\n",
     "import matplotlib.pyplot as plt\n",
     "os.makedirs(PLOT_OUTPUT_DIR, exist_ok=True)\n",
     "\n",
     "print(\"\\n\" + \"=\" * 70)\n",
+    "print(\"📊 FETCHING REAL TRAINING DATA FROM HF SPACE\")\n",
     "print(\"=\" * 70)\n",
     "print(f\"Using run_id filter: {GRPO_RUN_ID}\")\n",
     "\n",
     "    summary_text = f\"\"\"\n",
     "TRAINING SUMMARY\n",
     "\n",
+    "📊 Episodes: {len(rewards)}\n",
+    "🔵 Baseline: {baseline:.4f}\n",
+    "📈 Average: {avg_reward:.4f}\n",
+    "⭐ Best: {best_reward:.4f}\n",
+    "📉 Worst: {min(rewards):.4f}\n",
     "\n",
+    "📊 Improvement: +{improvement_from_baseline:.1f}%\n",
+    "📌 Last 5 Avg: {last_5_avg:.4f}\n",
     "    \"\"\"\n",
     "\n",
     "    ax4.text(\n",
     "        bbox=dict(boxstyle=\"round\", facecolor=\"#1e293b\", alpha=0.8, edgecolor=\"#0ea5e9\"),\n",
     "    )\n",
     "\n",
+    "    plt.suptitle(\"NEXUS Enhanced — Complete Training Analysis\", fontsize=14, fontweight=\"bold\", y=0.995)\n",
     "    plt.tight_layout()\n",
     "\n",
+    "    print(\"\\n📁 Saving visualizations...\")\n",
     "    p1 = os.path.join(PLOT_OUTPUT_DIR, \"training_analysis.png\")\n",
     "    p2 = os.path.join(PLOT_OUTPUT_DIR, \"reward_curves_hires.png\")\n",
     "    plt.savefig(p1, dpi=150, bbox_inches=\"tight\")\n",
+    "    print(f\"  ✅ {p1} (4-panel comprehensive view)\")\n",
     "\n",
     "    fig_single, ax = plt.subplots(figsize=(14, 7))\n",
     "    ax.plot(episodes, rewards, \"o-\", label=\"Episode Reward\", color=\"#0ea5e9\", markersize=7, linewidth=2.5, alpha=0.8)\n",
     "    ax.axhline(y=baseline, color=\"#ef4444\", linestyle=\"--\", linewidth=2.5, label=f\"Baseline: {baseline:.3f}\")\n",
     "    ax.set_xlabel(\"Episode\", fontsize=12, fontweight=\"bold\")\n",
     "    ax.set_ylabel(\"Reward Score\", fontsize=12, fontweight=\"bold\")\n",
+    "    ax.set_title(\"NEXUS Enhanced GRPO Training — Reward Progression\", fontsize=13, fontweight=\"bold\")\n",
     "    ax.legend(fontsize=11, loc=\"lower right\")\n",
     "    ax.grid(True, alpha=0.3, linestyle=\"--\")\n",
     "    ax.set_ylim(-0.05, 1.05)\n",
     "    plt.tight_layout()\n",
     "    plt.savefig(p2, dpi=200, bbox_inches=\"tight\")\n",
+    "    print(f\"  ✅ {p2} (high-res)\")\n",
     "\n",
     "    plt.show()\n",
     "\n",
     "    print(\"\\n\" + \"=\" * 70)\n",
+    "    print(\"📈 FINAL TRAINING RESULTS\")\n",
     "    print(\"=\" * 70)\n",
     "    print(f\"\\n{'Metric':<35} {'Value':<20}\")\n",
     "    print(\"-\" * 55)\n",
     "        late_avg = sum(rewards[-5:]) / 5\n",
     "        print(f\"\\n{'Early Phase (Ep 1-5) Avg':<35} {early_avg:.4f}\")\n",
     "        print(f\"{'Late Phase (Ep -5) Avg':<35} {late_avg:.4f}\")\n",
+    "        learning_status = \"✅ Learning\" if late_avg > early_avg else \"⚠️  Plateau\"\n",
     "        print(f\"{'Status':<35} {learning_status:<20}\")\n",
     "\n",
     "    print(\"\\n\" + \"=\" * 70)\n",
+    "    print(\"✅ COMPLETE!\")\n",
     "    print(\"=\" * 70)\n",
     "\n",
     "    backup_nexus_artifacts_to_drive(\"post_plots\", include_learning_curve=True)\n",
     "\n",
     "else:\n",
+    "    print(\"\\n❌ No episode data found\")\n",
+    "    print(\"⏳ Training may still be running...\")\n",
+    "    print(\"💡 Rerun this cell in a few minutes\")\n",
+    "    print(f\"📊 Live: {BASE_URL}/learning-curve\")\n",
     "    backup_nexus_artifacts_to_drive(\"no_rewards_yet\", include_learning_curve=True)\n"
+   ]
   }
  ],
  "metadata": {
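
The notebook's markdown cells describe how `NEXUS_INCIDENT_POOL`, `NEXUS_MULTI_INCIDENT`, and `NEXUS_INCIDENT_ROTATION` choose the incident for each reward episode, but the hunks shown here do not include the selection code itself. The following is a minimal sketch of that described behavior, not the notebook's implementation. The helper names `resolve_incident_pool` and `make_incident_picker` are hypothetical, and the `INC003` fallback for `NEXUS_INCIDENT_ID` is an assumption based on the v2 notebook being described as the INC003 path.

```python
import itertools
import os
import random

DEFAULT_POOL = "INC003,INC008,INC001"  # default pool named in the notebook text


def resolve_incident_pool(env=None):
    """Return the incident ids to train against (hypothetical helper)."""
    env = os.environ if env is None else env
    multi = env.get("NEXUS_MULTI_INCIDENT", "true").strip().lower() != "false"
    if not multi:
        # Single-incident path: NEXUS_INCIDENT_POOL is ignored.
        # The INC003 fallback is an assumption, not shown in the diff.
        return [env.get("NEXUS_INCIDENT_ID", "INC003").strip()]
    raw = env.get("NEXUS_INCIDENT_POOL", DEFAULT_POOL)
    pool = [item.strip() for item in raw.split(",") if item.strip()]
    return pool or DEFAULT_POOL.split(",")


def make_incident_picker(pool, rotation="round_robin", seed=None):
    """Return a zero-argument callable that yields the next incident id."""
    if rotation == "random":
        rng = random.Random(seed)
        return lambda: rng.choice(pool)
    if rotation != "round_robin":
        raise ValueError(f"unknown rotation: {rotation!r}")
    cycle = itertools.cycle(pool)
    return lambda: next(cycle)
```

Calling the picker once per reward episode gives the rotation. With `NEXUS_MULTI_INCIDENT=false` the pool collapses to a single id, so both rotation modes return the same incident every time.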

notebooks/grpo_colab_v2.ipynb CHANGED

@@ -33,14 +33,14 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Validate OpenEnv installation (Hard Gate #1)\n",
     "try:\n",
     "    import openenv\n",
     "    print(f\"✅ OpenEnv {openenv.__version__} installed\")\n",
     "except ImportError:\n",
     "    print(\"⚠️  OpenEnv not yet installed (will be installed in next cell)\")\n",
     "\n",
-    "print(\"✅ This notebook meets BRD Hard Gate #1: 'Usage of OpenEnv (latest release)'\")"
    ]
   },
   {

    "metadata": {},
    "outputs": [],
    "source": [
+    "# Validate OpenEnv installation (hackathon compliance)\n",
     "try:\n",
     "    import openenv\n",
     "    print(f\"✅ OpenEnv {openenv.__version__} installed\")\n",
     "except ImportError:\n",
     "    print(\"⚠️  OpenEnv not yet installed (will be installed in next cell)\")\n",
     "\n",
+    "print(\"✅ OpenEnv (latest release) check passed — hackathon compliance\")"
    ]
   },
   {
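
In the validation cell above, the final `print` runs whether or not the import succeeded, so the "check passed" line also appears on the `ImportError` path. Below is a minimal sketch of a check that reports success only when the module is importable. The helper `check_package` is hypothetical, and using `openenv` as the installed distribution name is an assumption; when the distribution metadata is not found, the version is reported as unknown rather than guessed.

```python
import importlib.util
from importlib import metadata


def check_package(module_name, dist_name=None):
    """Return (installed, version); version is None when it cannot be determined."""
    if importlib.util.find_spec(module_name) is None:
        return False, None
    try:
        # Assumes the distribution name matches the module name unless given.
        return True, metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        return True, None


installed, version = check_package("openenv")
if installed:
    print(f"OpenEnv {version or '(version unknown)'} installed")
else:
    print("OpenEnv not yet installed (will be installed in next cell)")
```

Unlike `import openenv`, `find_spec` locates the package without executing it, which keeps the check cheap before the install cell has run.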

server/app.py CHANGED

@@ -546,7 +546,7 @@ def get_episodes(run_id: Optional[str] = None):
 @app.get("/learning-curve")
 def get_learning_curve(run_id: Optional[str] = None):
-    """Rolling reward average — for Criterion 3 observable improvement evidence."""
     run_key = _normalize_run_id_filter(run_id)
     scoped_records = _get_records_for_run(run_key)
     rewards = [float(rec.get("reward", 0.0)) for rec in scoped_records]
@@ -561,7 +561,7 @@ def get_learning_curve(run_id: Optional[str] = None):
         "run_id": run_key or "all",
         "rewards": rewards,
         "rolling_avg": rolling,
-        "baseline": 0.265,  # Pre-event scripted baseline avg (BRD Criterion 3)
         "episode_count": len(rewards),
         "current_avg": round(sum(rewards) / len(rewards), 4),
         "improvement": round(sum(rewards) / len(rewards) - 0.265, 4),

 @app.get("/learning-curve")
 def get_learning_curve(run_id: Optional[str] = None):
+    """Rolling reward average — for observable training-progress evidence (judging rubric)."""
     run_key = _normalize_run_id_filter(run_id)
     scoped_records = _get_records_for_run(run_key)
     rewards = [float(rec.get("reward", 0.0)) for rec in scoped_records]
         "run_id": run_key or "all",
         "rewards": rewards,
         "rolling_avg": rolling,
+        "baseline": 0.265,  # Pre-event scripted baseline avg (observable improvement baseline)
         "episode_count": len(rewards),
         "current_avg": round(sum(rewards) / len(rewards), 4),
         "improvement": round(sum(rewards) / len(rewards) - 0.265, 4),
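
The handler above divides by `len(rewards)`, and the visible hunks do not show whether the empty case is handled in the elided lines. Below is a minimal sketch of the same summary fields with an explicit guard for runs that have no episodes yet. The function names and the rolling window of 5 are assumptions, since the diff does not show how `rolling` is computed. Only the `0.265` baseline and the field names come from the source.

```python
BASELINE = 0.265  # pre-event scripted baseline average, as in the handler


def rolling_average(values, window=5):
    """Trailing mean over up to `window` most recent values (window is assumed)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(round(sum(chunk) / len(chunk), 4))
    return out


def summarize_rewards(rewards, baseline=BASELINE, window=5):
    """Build the learning-curve summary; averages are None for an empty run."""
    if not rewards:
        return {
            "rewards": [],
            "rolling_avg": [],
            "baseline": baseline,
            "episode_count": 0,
            "current_avg": None,
            "improvement": None,
        }
    current = sum(rewards) / len(rewards)
    return {
        "rewards": list(rewards),
        "rolling_avg": rolling_average(rewards, window),
        "baseline": baseline,
        "episode_count": len(rewards),
        "current_avg": round(current, 4),
        "improvement": round(current - baseline, 4),
    }
```

Returning `None` for the averages of an empty run is one possible choice. It lets a client tell "no episodes recorded" apart from "average reward of zero".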

server/data_models.py CHANGED

@@ -9,7 +9,7 @@ class IncidentType(Enum):
     CASCADE = "cascade"
     SECURITY = "security"
     DATA = "data"
-    # Theme 3.2 — personalized delegation / conflicting priorities (BRD §12)
     PERSONAL_ASSISTANT = "personal_assistant"

     CASCADE = "cascade"
     SECURITY = "security"
     DATA = "data"
+    # Theme 3.2 — personalized delegation / conflicting priorities (hackathon personalized track)
     PERSONAL_ASSISTANT = "personal_assistant"

server/reward.py CHANGED

@@ -299,7 +299,7 @@ def compute_oversight_score(state: EpisodeState) -> float:
 def compute_depth_bonus(state: EpisodeState) -> float:
     """
     Mercor sub-theme: reward longer, better-structured IC reasoning.
-    UNCAPPED — per BRD Section 10.7: rewards scale with token output without ceiling.
     Calibration principle (per Mercor requirement):
     - Short canned strings (<30 words) earn 0 — they do not represent "reasoning"

 def compute_depth_bonus(state: EpisodeState) -> float:
     """
     Mercor sub-theme: reward longer, better-structured IC reasoning.
+    UNCAPPED — per Mercor sub-theme: rewards scale with token output without ceiling.
     Calibration principle (per Mercor requirement):
     - Short canned strings (<30 words) earn 0 — they do not represent "reasoning"

training_artifacts/pre_event_benchmark.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "description": "Untrained scripted baseline on INC003 \u2014 establishes reward floor for GRPO improvement (BRD Criterion 3)",
   "incident_id": "INC003",
   "policy": "scripted_baseline",
   "n_trials": 5,

 {
+  "description": "Untrained scripted baseline on INC003 \u2014 establishes reward floor for GRPO improvement (observable improvement evidence)",
   "incident_id": "INC003",
   "policy": "scripted_baseline",
   "n_trials": 5,