---
title: NEXUS Enhanced
emoji: ⚡
colorFrom: blue
colorTo: red
sdk: docker
app_port: 7860
tags:
  - reinforcement-learning
  - multi-agent
  - incident-response
  - openenv
  - grpo
  - pytorch
  - deployed
---

# ⚡ NEXUS Enhanced

**Multi-Agent Enterprise Incident Response RL Environment**  
*Meta PyTorch OpenEnv Hackathon Grand Finale — Team Falcons*

NEXUS Enhanced trains an AI Incident Commander to orchestrate 5 specialist agents across 7 production incident scenarios, culminating in a CrowdStrike-scale global failure affecting 8.5 million machines.

## Judge Fast Path (3-5 min)

Use this section first during judging/review.

- **Live environment (HF Space):** https://kunalkachru23-nexus-enhanced-stage.hf.space/
- **3-minute pitch script:** [`docs/pitch/PITCH_3MIN.md`](docs/pitch/PITCH_3MIN.md)
- **2-minute demo walkthrough:** [`docs/pitch/DEMO_WALKTHROUGH.md`](docs/pitch/DEMO_WALKTHROUGH.md)
- **Compliance + judging evidence index:** [`docs/project/JUDGING_EVIDENCE_INDEX.md`](docs/project/JUDGING_EVIDENCE_INDEX.md)
- **Frozen submission snapshot:** [`docs/project/snapshots/submission_snapshot_20260424T164826Z.md`](docs/project/snapshots/submission_snapshot_20260424T164826Z.md)
- **HF blog draft (publish-ready):** [`docs/blog/blog_post_hf.md`](docs/blog/blog_post_hf.md)
- **YouTube (1 min, latest):** [https://www.youtube.com/watch?v=yZnUS1-F5p0](https://www.youtube.com/watch?v=yZnUS1-F5p0)
- **YouTube (8 min, detailed):** [https://www.youtube.com/watch?v=a9YZF30tomw](https://www.youtube.com/watch?v=a9YZF30tomw)

### Canonical evidence snapshot (frozen)

From [`docs/project/snapshots/submission_snapshot_20260424T164826Z.md`](docs/project/snapshots/submission_snapshot_20260424T164826Z.md):

- Episodes: `387`
- Average reward: `0.4634`
- Best reward: `1.0032`
- Baseline reward: `0.265`
- Improvement: `+74.9%`

### Baseline vs trained (quick read)

| Signal | Baseline (pre-event benchmark) | Trained (latest frozen snapshot) |
|---|---:|---:|
| Average reward | 0.2650 | 0.4634 |
| Best reward | - | 1.0032 |
| Improvement | - | +74.9% |
| Behavioral evidence | Scripted/weak coordination baseline pattern | See [`docs/project/JUDGING_EVIDENCE_INDEX.md`](docs/project/JUDGING_EVIDENCE_INDEX.md) |

## Quick Start

```bash
pip install -r requirements.txt

# Run all tests (~220+)
pytest tests/ -q

# Start the server
uvicorn server.app:app --reload --port 7860

# Open the incident command dashboard
open http://localhost:7860/web

# Auto-demo (no server needed)
python -c "from server.app import run_demo; import json; print(json.dumps(run_demo('INC003'), indent=2))"
```

## Architecture

6 agents | 5 enterprise tools | 7 incident cases | OpenEnv v0.2.3

```
Incident Commander (IC)    ← trained via GRPO on Qwen2.5-1.5B
├── L1 Support             → SimSlack + SimCustomerPortal
├── L2 Engineer            → SimDatadog (rate-limited)
├── SRE Agent              → SimRunbook (schema drift v1→v2 in INC007)
├── Product Manager        → SimJira (VP approval + change freeze)
└── Oversight Agent        → monitor() + analyse() + explain()
```

Each agent has **partial observability** — only sees its role-scoped tool outputs. The IC synthesizes partial views into a coordinated incident response.

## Reward Model

```
episode_reward = (
    0.30 × mttr_score       # faster resolution
  + 0.25 × diagnosis_score  # root cause + evidence (anti-shortcut)
  + 0.20 × customer_score   # proactive notification required
  + 0.15 × coordination     # no duplicate tool queries
  + 0.05 × oversight        # protocol compliance
  + depth_bonus             # UNCAPPED reasoning quality (Mercor)
)
```

Expert criteria rotate every 4 episodes (speed/communication/technical/cost) — Snorkel AI sub-theme.

## Incident Library

| ID | Difficulty | Key Challenge |
|----|------------|---------------|
| INC001 | Easy | Payment service timeout |
| INC002 | Easy | DB pool exhaustion, cascade |
| **INC003** | **Medium** | **Red herrings + ML memory leak** ← primary demo |
| INC004 | Hard | Vendor retry storm, masked root cause |
| INC005 | Hard | JWT key mismatch, conflicting signals |
| INC006 | Very Hard | Multi-region CDN misrouting |
| INC007 | Nightmare | CrowdStrike-scale + live schema drift |

## Training

```bash
# GRPO fine-tuning (run in Colab with GPU)
# Open notebooks/grpo_colab_v2.ipynb

# Pre-event baseline (30 episodes, avg reward 0.265)
python training/train.py --episodes 30 --difficulties easy,medium
```

## Training data visualization

Episode rewards from the deployed Space (`GET /learning-curve`) — same data as the dashboard training tab.

```bash
python scripts/export_reward_plot.py \
  --url https://kunalkachru23-nexus-enhanced-stage.hf.space \
  --out docs/images/training_reward_curve.png
```

![Reward history (HF stage Space, x-axis=episode, y-axis=reward)](docs/images/training_reward_curve.png)

Caption: blue line is per-episode reward, green is rolling average, red dashed line is baseline (`0.265`).

## OpenEnv (reproduce)

Per **hackathon compliance criteria**, the submission uses **OpenEnv (latest release)** in the toolchain—not only a custom HTTP server. Reproduce validation with the commands below.

**Local (dev machine, after `pip install "openenv>=0.2.3"`):**

```bash
cd nexus-enhanced
openenv validate .
pytest tests/ -q
uvicorn server.app:app --host 127.0.0.1 --port 7860
# second terminal:
openenv validate --url http://127.0.0.1:7860
```

**HF Space (after `openenv push`):** use your Space URL, e.g. `https://kunalkachru23-nexus-enhanced-stage.hf.space`:

```bash
openenv validate --url https://kunalkachru23-nexus-enhanced-stage.hf.space
./gate.sh --skip-regression --skip-local-api --hf-url https://kunalkachru23-nexus-enhanced-stage.hf.space
```

**Deploying with OpenEnv:** use `openenv push . --repo-id <user>/<space> --exclude .hfignore` (or **`./gate.sh --push`**, which adds `--exclude` for you). OpenEnv does not load `.hfignore` unless you pass it via `--exclude`; omitting it does **not** break the build, it only uploads extra paths (less lean). See `docs/guides/QUICK_START.md` for a short rationale.

`requirements.txt` **omits** `openenv` on the Space Docker image to keep builds reliable; the **Colab notebook** installs `openenv>=0.2.3` to satisfy the **Colab + OpenEnv** portion of compliance. Contract-only routes (`/metadata`, `/schema`, `GET /state`, `POST /mcp`) satisfy `openenv validate --url`; episode logic uses **`/reset`**, **`/step/{session_id}`**, **`/state/{session_id}`** only.

## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | `/reset` | Start new episode |
| POST | `/step/{session_id}` | Execute IC action |
| GET | `/state/{session_id}` | Full episode state |
| GET | `/reward/{session_id}` | Live reward breakdown |
| POST | `/demo/run/{incident_id}` | Auto-demo mode |
| GET | `/web` | Incident command dashboard |
| GET | `/health` | Health (`status: healthy` for OpenEnv CLI) |
| GET | `/metadata` | OpenEnv discovery stub |
| GET | `/schema` | OpenEnv schema stub |
| GET | `/state` | OpenAPI stub (use `/state/{session_id}` for data) |
| POST | `/mcp` | OpenEnv JSON-RPC stub |
| GET | `/metrics` | Training metrics |

## Sub-Theme Coverage

- **Scaler AI Labs** — 5 enterprise tools with business rule nuances
- **Fleet AI** — OversightAgent: monitor + analyse + explain
- **Halluminate** — 6 agents + coalition debate + partial observability
- **Scale AI** — IT incident management domain
- **Mercor** — Uncapped reasoning depth bonus
- **Snorkel AI** — Rotating expert review board (4 criteria)
- **Patronus AI** — Live schema drift in INC007 at step 18

## Pitch, plan, and compliance evidence

Documentation lives under [`docs/`](docs/) (guides, deployment, project status, pitch/demo scripts, blog drafts).

- **[`docs/pitch/PITCH_3MIN.md`](docs/pitch/PITCH_3MIN.md)** — 3-minute spoken script.
- **[`docs/project/JUDGING_EVIDENCE_INDEX.md`](docs/project/JUDGING_EVIDENCE_INDEX.md)** — canonical compliance and judging evidence map.
- **`scripts/export_reward_plot.py`** — export reward curve PNG from `--url` or `episode_rewards.json` (slides / observable improvement evidence). Canonical chart (tracked in git): **`docs/images/training_reward_curve.png`** (see section above).

## Final submission checklist (compliance-ready)

- [ ] Space URL is live and included in final form: `https://kunalkachru23-nexus-enhanced-stage.hf.space/`
- [ ] `openenv validate .` passes locally.
- [ ] `openenv validate --url https://kunalkachru23-nexus-enhanced-stage.hf.space` passes.
- [ ] Full gate green: `./gate.sh --push`
- [ ] Latest frozen snapshot files are refreshed and referenced:
  - `docs/project/snapshots/submission_snapshot_20260424T164826Z.md`
  - `docs/project/snapshots/component_metrics_20260424T164826Z.md`
- [ ] Blog/video/slide links are present and clickable from this README:
  - Blog draft: [`docs/blog/blog_post_hf.md`](docs/blog/blog_post_hf.md)
  - Video link (1 min, latest): [https://www.youtube.com/watch?v=yZnUS1-F5p0](https://www.youtube.com/watch?v=yZnUS1-F5p0)
  - Video link (8 min, detailed): [https://www.youtube.com/watch?v=a9YZF30tomw](https://www.youtube.com/watch?v=a9YZF30tomw)
- [ ] Pitch numbers match frozen snapshot values (no stale metrics in scripts).

## Blog Post

See [`docs/blog/blog_post_hf.md`](docs/blog/blog_post_hf.md) for the publish-ready HuggingFace blog draft (includes reward model deep-dive, training methodology, and demo walkthrough). Publish and add the public URL to your submission package (blog or short video, per organizer requirements).

## Team

Team Falcons — [kunalkachru23@gmail.com](mailto:kunalkachru23@gmail.com)