# Evidence Contract

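This project is organized as a research-development workspace. Every visible
project statement should point to a local artifact that a reader can inspect
before relying on the dashboard as a basis for further work. The table below
maps each statement to its evidence, its verification status, and the scope
that evidence covers.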

| Project statement | Current evidence | Status | Current scope |
| --- | --- | --- | --- |
| A first-pass reader has a compact current-state summary. | `PROJECT_STATUS.md`, `docs/data/project_status.json` | Verified guide | Summarizes existing evidence and current limitations |
| The research roadmap is explicit. | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | Current roadmap | Connects public-sample task development to multi-episode data preparation, Qwen3-Omni LoRA, robustness runs, and larger omni-model extensions |
| The public dataset description is aligned with the official gated Xperience-10M dataset card and public sample card. | `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`, `docs/data/xperience10m_dataset_card_alignment.json` | Verified description alignment | Summarizes upstream public metadata, API listing facts, sample license/tooling, and card facts; does not grant access or mirror raw data |
| Source facts, sample details, API-listing notes, and project coverage are aligned across repo, website, and HF cards. | `SOURCE_ALIGNMENT_AUDIT.md`, `docs/data/source_alignment_audit.json`, `scripts/validate_source_alignment.py` | Source alignment recorded | Offline committed-fact report; does not fetch private gated data |
| Public figures are indexed as project evidence. | `FIGURE_INDEX.md`, `docs/data/figure_index.json`, `scripts/build_figure_index.py` | Verified visual evidence | Derived figures and thumbnails only; does not include raw MP4/HDF5/RRD data |
| The project logo is consistently packaged across public surfaces. | `docs/data/brand_assets.json`, `docs/assets/brand/`, `scripts/build_brand_assets.py` | Verified brand packaging | Generated presentation assets only; does not contain raw Xperience-10M data or model weights |
| The public Xperience-10M sample has been converted into aligned model windows. | `results/episode_task_suite/windows.csv`, `results/episode_task_suite/shared_windows.npz`, `results/episode_task_suite/summary_report.json` | Verified for 5,821 frames and 1,161 windows | One public sample episode only |
| The current feature contract is explicit and inspectable. | `results/episode_task_suite/feature_manifest.json`, `results/episode_task_suite/available_modalities.json` | Verified for an 8,546-d feature vector | Synchronized video, audio, depth, pose/SLAM, motion, inertial, calibration, and language signals are represented |
| The task evaluation protocol is explicit and generated from committed metrics. | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json`, `scripts/build_evaluation_protocol.py` | Verified protocol | Defines windows, split, per-task metrics, leakage controls, and current limitations |
| The public sample modalities are inspectable without raw data redistribution. | `docs/data/modality_atlas.json`, `docs/assets/modalities/`, website modality atlas | Verified derived thumbnail atlas | Thumbnails are presentation assets, not a replacement for official raw data access |
| Public task cards stay readable for non-expert readers. | `docs/data/task_surface_integrity.json`, `scripts/validate_task_surface.py`, website task cards/player | Task-surface report | Presentation layer only; it does not add model quality or new data |
| The unified 20-task suite is implemented with saved metrics, predictions, and source-linked result rows. | `TASK_SUITE_20.md`, `docs/data/task_suite_20.json`, `docs/data/task_method_20_result_matrix.json`, `TASK_METHOD_20_SOURCE_AUDIT.md` | Verified for all 20 task definitions and 180 method-task result rows | Chronological single-episode split for public-sample rows; selected 128-episode rows use the documented 96/16/16 split |
| Minimal and neural heads use the same task contracts. | `scripts/neural_task_models.py`, `results/episode_task_suite/neural_mlp/`, `docs/assets/task_architectures.png` | Verified for 12 minimal heads and 12 neural MLP heads | Small heads only; not a foundation model |
| Four Ropedia research directions are mapped honestly as direct, proxy, or diagnostic evidence. | `results/episode_task_suite/research_directions/research_direction_taxonomy.json`, `docs/data/research_directions.json` | Verified taxonomy | Some directions remain proxy-only |
| Four extra direction probes are coded and evaluated. | `results/episode_task_suite/research_direction_extensions/research_direction_extension_results.json`, `docs/data/research_direction_extensions.json` | Verified single-episode probes | Not full human modeling, neural rendering, intent modeling, or world modeling solutions |
| Qwen3-Omni infrastructure has passed setup checks. | `results/omni_finetune/RUN_REPORT.md`, `results/omni_finetune/dataset_manifest.json`, `results/omni_finetune/metrics_eval.json` | Setup-stage evidence | One episode, 128 train windows; full metrics require completed multi-episode data preparation and held-out evaluation |
| The Qwen3-Omni LoRA pilot is in selected multi-episode preparation. | `results/omni_finetune/DATA_ACCESS_STATUS.md`, `results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md`, `results/omni_finetune/source_discovery.json` | Data preparation | The gated Xperience-10M dataset is available; held-out metrics come after manifest construction, training, and test evaluation |
| Older pilot path strings are tracked as setup-file provenance. | `scripts/validate_scope_claims.py`, `docs/data/scope_claims_audit.json` | Multi-episode pilot status | Run/path identifiers stay separate from completed held-out-episode results |
| Prepared GitHub/Hugging Face mirrors carry matching critical files. | `scripts/validate_mirror_parity.py`, `docs/data/mirror_parity.json` | Mirror parity report | Compares prepared data files, visual assets, website HTML, and validator scripts before upload; live URLs are checked after publishing |
| The public GitHub and Hugging Face bundles are ready to share. | `scripts/validate_publication_package.py`, `docs/data/publication_audit.json` | Public bundle contents | Covers public files, HF bundles, and current public-card assets; temporary local outputs are excluded |
| The public repo, website, and Hugging Face cards present one cohesive research project. | `PUBLIC_SURFACE_QA.md`, `scripts/build_public_surface_qa.py`, `docs/data/public_surface_qa.json` | Public project surface | Covers SEO/social metadata, accessible tab semantics, public links, project links, and clear project presentation |
| The public website has validated local references. | `scripts/validate_website_integrity.py`, `docs/data/website_integrity.json` | Website reference report | Covers local links, anchors, JSON data, and referenced images; external URLs are not fetched |
| The rendered website walkthrough has a browser-level interaction check. | `RENDERED_SITE_CHECK.md`, `scripts/build_rendered_site_check.py`, `docs/data/rendered_site_check.json` | Rendered website check | Covers local page load, tab switch, walkthrough deep link, player controls, and console health |
| The release checks are explicit. | `QUALITY_GATES.md`, `scripts/build_quality_gates.py`, `docs/data/quality_gates.json` | Release checks | Summarizes packaging and live-mirror checks; cross-episode model quality is measured by later held-out reports |
| The live public mirrors are verified after upload. | `scripts/verify_live_publication.py`, `docs/data/live_publication_status.json` | Live publication report | Fetches public GitHub/HF URLs; it does not validate private training state |
| The core project artifacts are indexed and grouped for fast reading. | `ARTIFACT_GUIDE.md`, `scripts/build_artifact_index.py`, `docs/data/artifact_index.json` | Verified guide and index | Selective source-of-truth catalog, not a complete inventory of every output file |
| The public reproduction path is documented. | `REPRODUCIBILITY.md`, `docs/data/reproducibility_matrix.json`, `notes/reproducibility_audit.md` | Verified documentation and prior exact-match check | Publicly reproduces the single-episode pipeline; multi-episode Qwen3-Omni metrics are added only after staging and held-out evaluation |
| The project is externally citable and machine-readable. | `CITATION.cff`, `codemeta.json`, `docs/data/project_manifest.json`, `LICENSE` | Verified metadata files | Code license does not override original Xperience-10M dataset terms |
| A first-time reader has an explicit project path. | `docs/data/project_packet.json`, website project path section, README project path | Verified project packet | Guides inspection across data, tasks, results, and scale-up status |
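
Because every row above names its evidence as backticked paths, the contract can be spot-checked mechanically. The following is a minimal sketch, not a script that ships with this repository: the function names `evidence_paths` and `missing_artifacts` are hypothetical, and it assumes the table keeps its current four-column layout with paths written relative to the repository root. Evidence described in plain words (for example, a website section) is not backticked and is therefore skipped.

```python
import re
from pathlib import Path


def evidence_paths(markdown: str) -> list[tuple[str, str]]:
    """Return (statement, path) pairs from a Markdown evidence table.

    Assumption: column 1 holds the project statement and column 2 holds
    the evidence, with each artifact path wrapped in backticks.
    """
    pairs = []
    for line in markdown.splitlines():
        if not line.startswith("|"):
            continue  # not a table row
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) < 2:
            continue
        statement, evidence = cells[0], cells[1]
        # Skip the header row and the `| --- |` separator row.
        if statement == "Project statement" or set(statement) <= set("-: "):
            continue
        for path in re.findall(r"`([^`]+)`", evidence):
            pairs.append((statement, path))
    return pairs


def missing_artifacts(markdown: str, root: str) -> list[tuple[str, str]]:
    """Return the (statement, path) pairs whose path does not exist under root."""
    base = Path(root)
    return [(s, p) for s, p in evidence_paths(markdown) if not (base / p).exists()]
```

Calling `missing_artifacts(text, ".")` from the repository root, where `text` is the content of this file, should return an empty list if every cited artifact is present. An existence check says nothing about whether an artifact actually supports its statement; that still requires reading it.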

## Reading Order

1. Read `PROJECT_STATUS.md` and `docs/data/project_status.json` for
   the quickest summary of the project's current state and limitations.
2. Read `RESEARCH_ROADMAP.md` and `docs/data/research_roadmap.json` for the
   research path from public-sample development to multi-episode modeling.
3. Read `docs/data/project_packet.json` for the shortest project path and
   current scope.
4. Read `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md` and
   `docs/data/xperience10m_dataset_card_alignment.json` to check the official
   dataset-card wording and how the current repo is scoped against it.
5. Read `SOURCE_ALIGNMENT_AUDIT.md` and
   `docs/data/source_alignment_audit.json` to confirm that the same source
   facts appear across the repo, website, and HF cards.
6. Read `FIGURE_INDEX.md`, `docs/data/figure_index.json`, and
   `docs/data/brand_assets.json` to inspect public figures, charts, modality
   thumbnails, logo assets, dimensions, hashes, and source scripts.
7. Read `EVALUATION_PROTOCOL.md` and `docs/data/evaluation_protocol.json` to
   check windowing, split policy, per-task metrics, leakage controls, and
   current limitations.
8. Read `ARTIFACT_GUIDE.md` and `docs/data/artifact_index.json` to see grouped
   project artifacts, indexed supporting artifacts, sizes, and stable-file
   hashes.
9. Open `docs/assets/task_suite_infographic.png` for the high-level map and
   read `docs/data/modality_atlas.json` for the modality atlas.
10. Read `REPRODUCIBILITY.md` and `docs/data/reproducibility_matrix.json` before
   rerunning the public pipeline.
11. Inspect `results/episode_task_suite/summary_report.json` for the task and
   metric source of truth.
12. Inspect `results/episode_task_suite/feature_manifest.json` to see which
   modalities enter the current feature vector.
13. Inspect `results/episode_task_suite/neural_mlp/` to compare minimal and
   neural heads under the same splits.
14. Inspect `docs/data/scope_claims_audit.json` before interpreting older
   Qwen3-Omni setup artifacts.
15. Inspect `docs/data/mirror_parity.json` before assuming the GitHub and
   Hugging Face mirrors contain the same critical data, visual, HTML, and
   validator files.
16. Inspect `results/omni_finetune/DATA_ACCESS_STATUS.md` and
   `results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md` before interpreting
   any Qwen3-Omni artifact.
17. Inspect `QUALITY_GATES.md`, `docs/data/quality_gates.json`,
   `PUBLIC_SURFACE_QA.md`, `docs/data/public_surface_qa.json`,
   `docs/data/publication_audit.json`, and `docs/data/website_integrity.json`
   before sharing a new public release.
18. Inspect `CITATION.cff`, `codemeta.json`, and `LICENSE` before reusing or
   citing the project.
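
Step 8 mentions sizes and stable-file hashes. When comparing a local checkout against those records, a file's size and digest can be recomputed as in the sketch below. This is illustrative only: the helper name `fingerprint` is hypothetical, and SHA-256 is an assumption, since this document does not state which hash algorithm `docs/data/artifact_index.json` uses or how its entries are keyed.

```python
import hashlib
from pathlib import Path


def fingerprint(path: str, chunk_size: int = 1 << 20) -> dict:
    """Return the byte size and SHA-256 hex digest of a file.

    The file is read in chunks so large artifacts (for example .npz files)
    do not need to fit in memory. SHA-256 is an assumed algorithm.
    """
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return {
        "path": Path(path).as_posix(),
        "size_bytes": size,
        "sha256": digest.hexdigest(),
    }
```

For example, `fingerprint("results/episode_task_suite/summary_report.json")` yields a record that can be compared by hand against the matching index entry, once the index's actual field names and hash algorithm have been confirmed.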