Evidence Contract
This project is intentionally audit-first. Every visible claim should point to a local artifact that a reader can inspect before trusting the dashboard.
| Claim | Current evidence | Status | Boundary |
|---|---|---|---|
| The public Xperience-10M sample has been converted into aligned model windows. | `artifacts/episode_task_suite/windows.csv`, `artifacts/episode_task_suite/shared_windows.npz`, `artifacts/episode_task_suite/summary_report.json` | Verified for 5,821 frames and 1,161 windows | One public sample episode only |
| The current feature contract is explicit and reviewable. | `artifacts/episode_task_suite/feature_manifest.json`, `artifacts/episode_task_suite/available_modalities.json` | Verified for an 8,378-d feature vector | Audio is present in MP4 streams but not yet a feature block |
| The public sample modalities are inspectable without raw data redistribution. | `metrics/modality_atlas.json`, `assets/modalities/`, website modality atlas | Verified derived thumbnail atlas | Thumbnails are presentation/review assets, not a replacement for official raw data access |
| The 12 task heads are real scripts and artifacts, not presentation placeholders. | `scripts/episode_task_suite.py`, `artifacts/episode_task_suite/*/metrics.json`, `artifacts/episode_task_suite/*/predictions.*` | Verified for all 12 task definitions | Chronological single-episode split, not cross-episode generalization |
| Minimal and neural heads use the same task contracts. | `scripts/neural_task_models.py`, `artifacts/episode_task_suite/neural_mlp/`, `assets/task_architectures.png` | Verified for 12 minimal heads and 12 neural MLP heads | Small heads only; not a foundation model |
| Four Ropedia research directions are mapped honestly as direct, proxy, or diagnostic evidence. | `artifacts/episode_task_suite/research_directions/research_direction_taxonomy.json`, `metrics/research_directions.json` | Verified taxonomy | Some directions remain proxy-only |
| Four extra direction probes are coded and evaluated. | `artifacts/episode_task_suite/research_direction_extensions/research_direction_extension_results.json`, `metrics/research_direction_extensions.json` | Verified single-episode probes | Not full human modeling, neural rendering, intent modeling, or world modeling solutions |
| Qwen3-Omni infrastructure has passed technical smoke checks. | Companion GitHub repo: `results/omni_finetune/RUN_REPORT.md`, `results/omni_finetune/dataset_manifest.json`, `results/omni_finetune/metrics_eval.json` | Smoke-only evidence | One episode, 128 train windows; not a 32-episode pilot |
| The real 32-episode LoRA pilot is blocked on gated data access, not on repo presentation. | Companion GitHub repo: `results/omni_finetune/DATA_BLOCKER_REPORT.md`, `results/omni_finetune/A100_HF_RELAY_STATUS.md`, `results/omni_finetune/source_discovery.json` | Blocker documented | No 32-episode metric should be claimed until the gate passes |
| Historical `32ep` path strings are not treated as 32-episode results. | `scripts/validate_scope_claims.py`, `metrics/scope_claims_audit.json` | Verified pass | Classifies old run/path identifiers and fails if public presentation claims real 32-episode metrics |
| Prepared GitHub/Hugging Face mirrors carry matching critical files. | `scripts/validate_mirror_parity.py`, `metrics/mirror_parity.json` | Verified pass | Compares prepared data files, visual assets, website HTML, and validator scripts before upload; live URLs are checked after publishing |
| The public GitHub and Hugging Face bundles are publication-clean. | `scripts/validate_publication_package.py`, `metrics/publication_audit.json` | Verified pass | Checks public files, HF bundles, and public-card freshness; ignored local scratch outputs are excluded |
| The public website has checked local references. | `scripts/validate_website_integrity.py`, `metrics/website_integrity.json` | Verified pass | Checks local links, anchors, JSON data, and referenced images; external URLs are not fetched |
| The core proof artifacts are indexed and grouped for fast review. | `ARTIFACT_GUIDE.md`, `scripts/build_artifact_index.py`, `metrics/artifact_index.json` | Verified guide and index | Selective source-of-truth catalog, not a complete inventory of every output file |
| The public reproduction path is documented. | `REPRODUCIBILITY.md`, `metrics/reproducibility_matrix.json`, `notes/reproducibility_audit.md` | Verified documentation and prior exact-match audit | Publicly reproduces the single-episode pipeline, not the gated 32-episode Qwen3-Omni pilot |
| The project is externally citable and machine-readable. | `CITATION.cff`, `codemeta.json`, `metrics/project_manifest.json`, `LICENSE` | Verified metadata files | Code license does not override original Xperience-10M dataset terms |
| A first-time reviewer has an explicit audit path. | `metrics/reviewer_packet.json`, website reviewer section, README reviewer path | Verified reviewer packet | It guides inspection; it does not add new experimental claims |
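
Because every claim in the table is supposed to resolve to a local artifact, a reviewer may want a quick existence check before opening individual files. The following is a minimal sketch, not one of the project's validator scripts: the function name `missing_evidence` is hypothetical, and it assumes the evidence entries are paths relative to the repository root, with `*` treated as a glob wildcard.

```python
from pathlib import Path


def missing_evidence(paths, root="."):
    """Return the evidence entries that resolve to nothing under `root`.

    Hypothetical helper for illustration; the project's own validators
    (for example scripts/validate_website_integrity.py) may work differently.
    """
    root = Path(root)
    missing = []
    for entry in paths:
        if any(ch in entry for ch in "*?["):
            # Wildcard entries such as artifacts/episode_task_suite/*/metrics.json
            # count as present if at least one file matches.
            found = any(root.glob(entry))
        else:
            # Plain entries may be files or directories.
            found = (root / entry).exists()
        if not found:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    # A few entries copied from the table above; extend as needed.
    evidence = [
        "artifacts/episode_task_suite/windows.csv",
        "artifacts/episode_task_suite/feature_manifest.json",
        "artifacts/episode_task_suite/*/metrics.json",
        "metrics/scope_claims_audit.json",
    ]
    for entry in missing_evidence(evidence):
        print("missing:", entry)
```

An empty result only shows that the files are present; it says nothing about whether their contents support the claim, which is what the status and boundary columns are for.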
Review Order
- Read `metrics/reviewer_packet.json` for the shortest audit path and proof boundary.
- Read `ARTIFACT_GUIDE.md` and `metrics/artifact_index.json` to see grouped reviewer artifacts, indexed proof artifacts, sizes, and stable-file hashes.
- Read `assets/task_suite_infographic.png` and `metrics/modality_atlas.json` for the high-level map and modality atlas.
- Read `REPRODUCIBILITY.md` and `metrics/reproducibility_matrix.json` before rerunning the public pipeline.
- Inspect `artifacts/episode_task_suite/summary_report.json` for the task and metric source of truth.
- Inspect `artifacts/episode_task_suite/feature_manifest.json` to see which modalities enter the current feature vector.
- Inspect `artifacts/episode_task_suite/neural_mlp/` to compare minimal and neural heads under the same splits.
- Inspect `metrics/scope_claims_audit.json` before interpreting historical `32ep` strings in Qwen3-Omni smoke artifacts.
- Inspect `metrics/mirror_parity.json` before assuming the GitHub and Hugging Face mirrors contain the same critical data, visual, HTML, and validator files.
- Inspect the companion GitHub repo's `results/omni_finetune/DATA_BLOCKER_REPORT.md` before interpreting any Qwen3-Omni artifact.
- Inspect `metrics/publication_audit.json` and `metrics/website_integrity.json` before publishing or sharing the project externally.
- Inspect `CITATION.cff`, `codemeta.json`, and `LICENSE` before reusing or citing the project.
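
The review order mentions sizes and stable-file hashes in `metrics/artifact_index.json`. The schema of that index is not shown here, so the sketch below only illustrates how a reviewer could recompute a size and SHA-256 digest for one local file and compare them by hand. The function names and the `path`/`bytes`/`sha256` keys are assumptions, and SHA-256 itself is an assumed choice of hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_record(path):
    """Build a size-and-hash record for one file.

    The key names are hypothetical and may not match the real index.
    """
    p = Path(path)
    return {
        "path": p.as_posix(),
        "bytes": p.stat().st_size,
        "sha256": sha256_of(p),
    }
```

For example, `file_record("artifacts/episode_task_suite/summary_report.json")` yields a record whose values can be compared against the corresponding entry in the index, whatever field names that entry actually uses.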