	# Evidence Contract

	This project is intentionally audit-first. Every visible claim should point to a
	local artifact that a reader can inspect before trusting the dashboard.

	| Claim | Current evidence | Status | Boundary |
	| --- | --- | --- | --- |
	| The public Xperience-10M sample has been converted into aligned model windows. | `artifacts/episode_task_suite/windows.csv`, `artifacts/episode_task_suite/shared_windows.npz`, `artifacts/episode_task_suite/summary_report.json` | Verified for 5,821 frames and 1,161 windows | One public sample episode only |
	| The current feature contract is explicit and reviewable. | `artifacts/episode_task_suite/feature_manifest.json`, `artifacts/episode_task_suite/available_modalities.json` | Verified for an 8,378-d feature vector | Audio is present in MP4 streams but not yet a feature block |
	| The public sample modalities are inspectable without raw data redistribution. | `metrics/modality_atlas.json`, `assets/modalities/`, website modality atlas | Verified derived thumbnail atlas | Thumbnails are presentation/review assets, not a replacement for official raw data access |
	| The 12 task heads are real scripts and artifacts, not presentation placeholders. | `scripts/episode_task_suite.py`, `artifacts/episode_task_suite//metrics.json`, `artifacts/episode_task_suite//predictions.*` | Verified for all 12 task definitions | Chronological single-episode split, not cross-episode generalization |
	| Minimal and neural heads use the same task contracts. | `scripts/neural_task_models.py`, `artifacts/episode_task_suite/neural_mlp/`, `assets/task_architectures.png` | Verified for 12 minimal heads and 12 neural MLP heads | Small heads only; not a foundation model |
	| Four Ropedia research directions are mapped honestly as direct, proxy, or diagnostic evidence. | `artifacts/episode_task_suite/research_directions/research_direction_taxonomy.json`, `metrics/research_directions.json` | Verified taxonomy | Some directions remain proxy-only |
	| Four extra direction probes are coded and evaluated. | `artifacts/episode_task_suite/research_direction_extensions/research_direction_extension_results.json`, `metrics/research_direction_extensions.json` | Verified single-episode probes | Not full human modeling, neural rendering, intent modeling, or world modeling solutions |
	| Qwen3-Omni infrastructure has passed technical smoke checks. | Companion GitHub repo: `results/omni_finetune/RUN_REPORT.md`, `results/omni_finetune/dataset_manifest.json`, `results/omni_finetune/metrics_eval.json` | Smoke-only evidence | One episode, 128 train windows; not a 32-episode pilot |
	| The real 32-episode LoRA pilot is blocked on gated data access, not on repo presentation. | Companion GitHub repo: `results/omni_finetune/DATA_BLOCKER_REPORT.md`, `results/omni_finetune/A100_HF_RELAY_STATUS.md`, `results/omni_finetune/source_discovery.json` | Blocker documented | No 32-episode metric should be claimed until the gate passes |
	| Historical `32ep` path strings are not treated as 32-episode results. | `scripts/validate_scope_claims.py`, `metrics/scope_claims_audit.json` | Verified pass | Classifies old run/path identifiers and fails if public presentation claims real 32-episode metrics |
	| Prepared GitHub/Hugging Face mirrors carry matching critical files. | `scripts/validate_mirror_parity.py`, `metrics/mirror_parity.json` | Verified pass | Compares prepared data files, visual assets, website HTML, and validator scripts before upload; live URLs are checked after publishing |
	| The public GitHub and Hugging Face bundles are publication-clean. | `scripts/validate_publication_package.py`, `metrics/publication_audit.json` | Verified pass | Checks public files, HF bundles, and public-card freshness; ignored local scratch outputs are excluded |
	| The public website has checked local references. | `scripts/validate_website_integrity.py`, `metrics/website_integrity.json` | Verified pass | Checks local links, anchors, JSON data, and referenced images; external URLs are not fetched |
	| The core proof artifacts are indexed and grouped for fast review. | `ARTIFACT_GUIDE.md`, `scripts/build_artifact_index.py`, `metrics/artifact_index.json` | Verified guide and index | Selective source-of-truth catalog, not a complete inventory of every output file |
	| The public reproduction path is documented. | `REPRODUCIBILITY.md`, `metrics/reproducibility_matrix.json`, `notes/reproducibility_audit.md` | Verified documentation and prior exact-match audit | Publicly reproduces the single-episode pipeline, not the gated 32-episode Qwen3-Omni pilot |
	| The project is externally citable and machine-readable. | `CITATION.cff`, `codemeta.json`, `metrics/project_manifest.json`, `LICENSE` | Verified metadata files | Code license does not override original Xperience-10M dataset terms |
	| A first-time reviewer has an explicit audit path. | `metrics/reviewer_packet.json`, website reviewer section, README reviewer path | Verified reviewer packet | It guides inspection; it does not add new experimental claims |
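
Because every claim in the table points to a local artifact, a reviewer may want to confirm that the listed paths actually exist in a checkout before reading further. The following is a minimal sketch of such a check, not one of the project's validator scripts. The names `EVIDENCE_PATHS` and `missing_artifacts` are hypothetical, the path list is only a subset copied from the table, and the script assumes it is run from the repository root.

```python
from pathlib import Path

# Illustrative subset of evidence paths copied from the table above.
# This list and the helper below are hypothetical, not part of the repo.
EVIDENCE_PATHS = [
    "artifacts/episode_task_suite/windows.csv",
    "artifacts/episode_task_suite/shared_windows.npz",
    "artifacts/episode_task_suite/summary_report.json",
    "artifacts/episode_task_suite/feature_manifest.json",
    "metrics/scope_claims_audit.json",
    "metrics/mirror_parity.json",
]


def missing_artifacts(repo_root, relative_paths):
    """Return the relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in relative_paths if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    missing = missing_artifacts(".", EVIDENCE_PATHS)
    for rel in missing:
        print(f"MISSING: {rel}")
    found = len(EVIDENCE_PATHS) - len(missing)
    print(f"{found} of {len(EVIDENCE_PATHS)} evidence paths found")
```

An existence check only shows that a file is present. It says nothing about whether the contents support the claim, so it complements the review steps rather than replacing them.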

	## Review Order

	1. Read `metrics/reviewer_packet.json` for the shortest audit path and proof
	boundary.
	2. Read `ARTIFACT_GUIDE.md` and `metrics/artifact_index.json` to see grouped
	reviewer artifacts, indexed proof artifacts, sizes, and stable-file hashes.
	3. Open `assets/task_suite_infographic.png` and
	`metrics/modality_atlas.json` for the high-level map and modality atlas.
	4. Read `REPRODUCIBILITY.md` and `metrics/reproducibility_matrix.json` before
	rerunning the public pipeline.
	5. Inspect `artifacts/episode_task_suite/summary_report.json` for the task and
	metric source of truth.
	6. Inspect `artifacts/episode_task_suite/feature_manifest.json` to see which
	modalities enter the current feature vector.
	7. Inspect `artifacts/episode_task_suite/neural_mlp/` to compare minimal and
	neural heads under the same splits.
	8. Inspect `metrics/scope_claims_audit.json` before interpreting historical
	`32ep` strings in Qwen3-Omni smoke artifacts.
	9. Inspect `metrics/mirror_parity.json` before assuming the GitHub and
	Hugging Face mirrors contain the same critical data, visual, HTML, and
	validator files.
	10. Inspect the companion GitHub repo's
	`results/omni_finetune/DATA_BLOCKER_REPORT.md` before interpreting any
	Qwen3-Omni artifact.
	11. Inspect `metrics/publication_audit.json` and
	`metrics/website_integrity.json` before publishing or sharing the project
	externally.
	12. Inspect `CITATION.cff`, `codemeta.json`, and `LICENSE` before reusing or
	citing the project.
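
Steps 2 and 9 rely on file hashes and on mirror parity. For a reviewer who wants to spot-check parity by hand, the idea can be sketched as a SHA-256 comparison between two local directories. This is an illustrative sketch only and does not reproduce `scripts/validate_mirror_parity.py`. The function names `sha256_of` and `parity_mismatches`, and the directory names in the usage comment, are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parity_mismatches(dir_a, dir_b, relative_paths):
    """Return paths that are missing from either directory or differ in content."""
    mismatches = []
    for rel in relative_paths:
        file_a = Path(dir_a) / rel
        file_b = Path(dir_b) / rel
        if not file_a.is_file() or not file_b.is_file():
            mismatches.append(rel)
        elif sha256_of(file_a) != sha256_of(file_b):
            mismatches.append(rel)
    return mismatches


# Hypothetical usage, assuming two local checkouts of the prepared mirrors:
# parity_mismatches("github_mirror", "hf_mirror", ["metrics/mirror_parity.json"])
```

An empty result means the listed files are byte-identical in both directories. It does not cover files left out of the list, which is why the project's own parity report remains the reference.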