cy0307 committed · Commit 94a5118 · verified · 1 Parent(s): d1e380e

Publish Ropedia Xperience-10M task baseline cards

ARTIFACT_GUIDE.md CHANGED
@@ -8,13 +8,15 @@ The project intentionally separates five layers:
8
 
9
  1. **Proof boundary:** what is claimed, what is smoke-only, and what remains
10
  gated by data access.
11
- 2. **Data contract:** how one public Xperience-10M sample episode becomes
 
 
12
  aligned model windows and feature blocks.
13
- 3. **Task evidence:** minimal and neural results for the 12 task contracts plus
14
  four research-direction extension probes.
15
- 4. **Reproducibility:** public commands, expected outputs, and exact-match audit
16
  evidence for the single-episode pipeline.
17
- 5. **Scale-up status:** scripts and reports for the planned 32-episode
18
  Qwen3-Omni pilot, without claiming those results before data access lands.
19
 
20
  ## Start Here
@@ -23,8 +25,10 @@ The project intentionally separates five layers:
23
  | --- | --- |
24
  | [`EVIDENCE_CONTRACT.md`](EVIDENCE_CONTRACT.md) | Defines which claims are verified and which are explicitly not claimed. |
25
  | [`QUALITY_GATES.md`](QUALITY_GATES.md) | Lists the automated release gates and post-publish checks required before presenting a release as current. |
 
26
  | [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md) | Defines public reproduction commands, expected outputs, and unreproducible boundaries. |
27
  | [`docs/data/artifact_index.json`](docs/data/artifact_index.json) | Lists reviewer-critical files with existence, size, and stable hashes. |
 
28
  | [`docs/data/quality_gates.json`](docs/data/quality_gates.json) | Machine-readable quality-gate summary for website and HF mirrors. |
29
  | [`docs/data/live_publication_status.json`](docs/data/live_publication_status.json) | Last live GitHub/HF verification after upload. |
30
  | [`docs/data/mirror_parity.json`](docs/data/mirror_parity.json) | Confirms prepared HF Space, artifact, and model mirrors match the repo for critical data, figures, website HTML, and validator scripts. |
@@ -33,6 +37,13 @@ The project intentionally separates five layers:
33
  | [`docs/data/website_integrity.json`](docs/data/website_integrity.json) | Confirms local site links, anchors, JSON bundles, and referenced images resolve. |
34
  | [`docs/data/reviewer_packet.json`](docs/data/reviewer_packet.json) | Gives the shortest machine-readable reviewer route. |
35
 
36
  ## Data Contract
37
 
38
  | Artifact | What it proves |
 
8
 
9
  1. **Proof boundary:** what is claimed, what is smoke-only, and what remains
10
  gated by data access.
11
+ 2. **Official source alignment:** what the upstream Xperience-10M dataset card
12
+ says, and which parts this repo currently covers.
13
+ 3. **Data contract:** how one public Xperience-10M sample episode becomes
14
  aligned model windows and feature blocks.
15
+ 4. **Task evidence:** minimal and neural results for the 12 task contracts plus
16
  four research-direction extension probes.
17
+ 5. **Reproducibility:** public commands, expected outputs, and exact-match audit
18
  evidence for the single-episode pipeline.
19
+ 6. **Scale-up status:** scripts and reports for the planned 32-episode
20
  Qwen3-Omni pilot, without claiming those results before data access lands.
21
 
22
  ## Start Here
 
25
  | --- | --- |
26
  | [`EVIDENCE_CONTRACT.md`](EVIDENCE_CONTRACT.md) | Defines which claims are verified and which are explicitly not claimed. |
27
  | [`QUALITY_GATES.md`](QUALITY_GATES.md) | Lists the automated release gates and post-publish checks required before presenting a release as current. |
28
+ | [`XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`](XPERIENCE10M_DATASET_CARD_ALIGNMENT.md) | Aligns this repo's public dataset wording with the official gated Xperience-10M dataset card. |
29
  | [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md) | Defines public reproduction commands, expected outputs, and unreproducible boundaries. |
30
  | [`docs/data/artifact_index.json`](docs/data/artifact_index.json) | Lists reviewer-critical files with existence, size, and stable hashes. |
31
+ | [`docs/data/xperience10m_dataset_card_alignment.json`](docs/data/xperience10m_dataset_card_alignment.json) | Machine-readable official dataset-card alignment summary. |
32
  | [`docs/data/quality_gates.json`](docs/data/quality_gates.json) | Machine-readable quality-gate summary for website and HF mirrors. |
33
  | [`docs/data/live_publication_status.json`](docs/data/live_publication_status.json) | Last live GitHub/HF verification after upload. |
34
  | [`docs/data/mirror_parity.json`](docs/data/mirror_parity.json) | Confirms prepared HF Space, artifact, and model mirrors match the repo for critical data, figures, website HTML, and validator scripts. |
 
37
  | [`docs/data/website_integrity.json`](docs/data/website_integrity.json) | Confirms local site links, anchors, JSON bundles, and referenced images resolve. |
38
  | [`docs/data/reviewer_packet.json`](docs/data/reviewer_packet.json) | Gives the shortest machine-readable reviewer route. |
39
 
40
+ ## Official Source Alignment
41
+
42
+ | Artifact | What it proves |
43
+ | --- | --- |
44
+ | [`XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`](XPERIENCE10M_DATASET_CARD_ALIGNMENT.md) | Human-readable summary of the official gated Xperience-10M dataset card, scale, modalities, access boundary, intended uses, and limitations. |
45
+ | [`docs/data/xperience10m_dataset_card_alignment.json`](docs/data/xperience10m_dataset_card_alignment.json) | Machine-readable copy of the same alignment facts for website and HF mirrors. |
46
+
47
  ## Data Contract
48
 
49
  | Artifact | What it proves |
EVIDENCE_CONTRACT.md CHANGED
@@ -5,6 +5,7 @@ local artifact that a reader can inspect before trusting the dashboard.
5
 
6
  | Claim | Current evidence | Status | Boundary |
7
  | --- | --- | --- | --- |
 
8
  | The public Xperience-10M sample has been converted into aligned model windows. | `results/episode_task_suite/windows.csv`, `results/episode_task_suite/shared_windows.npz`, `results/episode_task_suite/summary_report.json` | Verified for 5,821 frames and 1,161 windows | One public sample episode only |
9
  | The current feature contract is explicit and reviewable. | `results/episode_task_suite/feature_manifest.json`, `results/episode_task_suite/available_modalities.json` | Verified for an 8,378-d feature vector | Audio is present in MP4 streams but not yet a feature block |
10
  | The public sample modalities are inspectable without raw data redistribution. | `docs/data/modality_atlas.json`, `docs/assets/modalities/`, website modality atlas | Verified derived thumbnail atlas | Thumbnails are presentation/review assets, not a replacement for official raw data access |
@@ -29,28 +30,31 @@ local artifact that a reader can inspect before trusting the dashboard.
29
 
30
  1. Read `docs/data/reviewer_packet.json` for the shortest audit path and proof
31
  boundary.
32
- 2. Read `ARTIFACT_GUIDE.md` and `docs/data/artifact_index.json` to see grouped
 
 
 
33
  reviewer artifacts, indexed proof artifacts,
34
  sizes, and stable-file hashes.
35
- 3. Read `docs/assets/task_suite_infographic.png` and
36
  `docs/data/modality_atlas.json` for the high-level map and modality atlas.
37
- 4. Read `REPRODUCIBILITY.md` and `docs/data/reproducibility_matrix.json` before
38
  rerunning the public pipeline.
39
- 5. Inspect `results/episode_task_suite/summary_report.json` for the task and
40
  metric source of truth.
41
- 6. Inspect `results/episode_task_suite/feature_manifest.json` to see which
42
  modalities enter the current feature vector.
43
- 7. Inspect `results/episode_task_suite/neural_mlp/` to compare minimal and
44
  neural heads under the same splits.
45
- 8. Inspect `docs/data/scope_claims_audit.json` before interpreting historical
46
  `32ep` strings in Qwen3-Omni smoke artifacts.
47
- 9. Inspect `docs/data/mirror_parity.json` before assuming the GitHub and
48
  Hugging Face mirrors contain the same critical data, visual, HTML, and
49
  validator files.
50
- 10. Inspect `results/omni_finetune/DATA_BLOCKER_REPORT.md` before interpreting
51
  any Qwen3-Omni artifact.
52
- 11. Inspect `QUALITY_GATES.md`, `docs/data/quality_gates.json`,
53
  `docs/data/publication_audit.json`, and `docs/data/website_integrity.json`
54
  before publishing or sharing the project externally.
55
- 12. Inspect `CITATION.cff`, `codemeta.json`, and `LICENSE` before reusing or
56
  citing the project.
 
5
 
6
  | Claim | Current evidence | Status | Boundary |
7
  | --- | --- | --- | --- |
8
+ | The public dataset description is aligned with the official gated Xperience-10M dataset card. | `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`, `docs/data/xperience10m_dataset_card_alignment.json` | Verified description alignment | Summarizes upstream public metadata and card facts; does not grant access or mirror raw data |
9
  | The public Xperience-10M sample has been converted into aligned model windows. | `results/episode_task_suite/windows.csv`, `results/episode_task_suite/shared_windows.npz`, `results/episode_task_suite/summary_report.json` | Verified for 5,821 frames and 1,161 windows | One public sample episode only |
10
  | The current feature contract is explicit and reviewable. | `results/episode_task_suite/feature_manifest.json`, `results/episode_task_suite/available_modalities.json` | Verified for an 8,378-d feature vector | Audio is present in MP4 streams but not yet a feature block |
11
  | The public sample modalities are inspectable without raw data redistribution. | `docs/data/modality_atlas.json`, `docs/assets/modalities/`, website modality atlas | Verified derived thumbnail atlas | Thumbnails are presentation/review assets, not a replacement for official raw data access |
 
30
 
31
  1. Read `docs/data/reviewer_packet.json` for the shortest audit path and proof
32
  boundary.
33
+ 2. Read `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md` and
34
+ `docs/data/xperience10m_dataset_card_alignment.json` to check the official
35
+ dataset-card wording and how the current repo is scoped against it.
36
+ 3. Read `ARTIFACT_GUIDE.md` and `docs/data/artifact_index.json` to see grouped
37
  reviewer artifacts, indexed proof artifacts,
38
  sizes, and stable-file hashes.
39
+ 4. Read `docs/assets/task_suite_infographic.png` and
40
  `docs/data/modality_atlas.json` for the high-level map and modality atlas.
41
+ 5. Read `REPRODUCIBILITY.md` and `docs/data/reproducibility_matrix.json` before
42
  rerunning the public pipeline.
43
+ 6. Inspect `results/episode_task_suite/summary_report.json` for the task and
44
  metric source of truth.
45
+ 7. Inspect `results/episode_task_suite/feature_manifest.json` to see which
46
  modalities enter the current feature vector.
47
+ 8. Inspect `results/episode_task_suite/neural_mlp/` to compare minimal and
48
  neural heads under the same splits.
49
+ 9. Inspect `docs/data/scope_claims_audit.json` before interpreting historical
50
  `32ep` strings in Qwen3-Omni smoke artifacts.
51
+ 10. Inspect `docs/data/mirror_parity.json` before assuming the GitHub and
52
  Hugging Face mirrors contain the same critical data, visual, HTML, and
53
  validator files.
54
+ 11. Inspect `results/omni_finetune/DATA_BLOCKER_REPORT.md` before interpreting
55
  any Qwen3-Omni artifact.
56
+ 12. Inspect `QUALITY_GATES.md`, `docs/data/quality_gates.json`,
57
  `docs/data/publication_audit.json`, and `docs/data/website_integrity.json`
58
  before publishing or sharing the project externally.
59
+ 13. Inspect `CITATION.cff`, `codemeta.json`, and `LICENSE` before reusing or
60
  citing the project.
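The index entries in this review path carry a `path`, a `bytes` size, and either a `sha256` or a `hash_policy` of `existence_and_size_only`, as shown in the `metrics/artifact_index.json` diff. A minimal sketch of how a reader might check one entry against local files follows. The function name `check_entry` and the `root` argument are hypothetical and are not taken from the repo's own validator scripts.

```python
import hashlib
from pathlib import Path


def check_entry(entry: dict, root: Path) -> list[str]:
    """Return a list of problems for one index entry (empty list means it matches)."""
    target = root / entry["path"]
    if not target.is_file():
        return [f"missing: {entry['path']}"]

    problems = []
    data = target.read_bytes()
    if "bytes" in entry and len(data) != entry["bytes"]:
        problems.append(f"size mismatch: {entry['path']}")

    # Volatile entries use hash_policy "existence_and_size_only" and carry no sha256,
    # so the hash comparison is skipped for them.
    size_only = entry.get("hash_policy") == "existence_and_size_only"
    if not size_only and "sha256" in entry:
        if hashlib.sha256(data).hexdigest() != entry["sha256"]:
            problems.append(f"sha256 mismatch: {entry['path']}")
    return problems
```

Calling `check_entry(entry, Path("."))` for each entry and collecting the non-empty results gives a quick local audit. It says nothing about the live mirrors, which the mirror-parity and live-publication reports cover separately.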
README.md CHANGED
@@ -73,6 +73,13 @@ map, then mirror the responsive modality atlas metadata in
73
  `metrics/modality_atlas.json`, with standalone derived thumbnails in
74
  `assets/modalities/`.
75
 
76
  The committed heads are intentionally small:
77
 
78
  - z-score + linear softmax classifiers,
@@ -98,6 +105,7 @@ Their purpose is to make every input/output contract auditable before scaling to
98
  | 5 | What is still pending? | companion GitHub `results/omni_finetune/DATA_BLOCKER_REPORT.md` and `A100_HF_RELAY_STATUS.md` |
99
 
100
  Human-readable artifact guide mirror: `ARTIFACT_GUIDE.md`.
 
101
  Publication quality gates mirror: `QUALITY_GATES.md` and `metrics/quality_gates.json`.
102
  Live publication status mirror: `metrics/live_publication_status.json`.
103
  Machine-readable reviewer packet mirror: `metrics/reviewer_packet.json`.
@@ -118,6 +126,7 @@ Source-of-truth artifact index mirror: `metrics/artifact_index.json`.
118
  | Website integrity | `metrics/website_integrity.json` and validator script mirror | local links, anchors, JSON bundles, and referenced images only |
119
  | Quality gates | `QUALITY_GATES.md`, `metrics/quality_gates.json`, and `scripts/build_quality_gates.py` | automated release gates plus live post-publish checks |
120
  | Live publication | `metrics/live_publication_status.json`, `scripts/verify_live_publication.py` | last public GitHub/HF URL verification after upload |
 
121
  | Artifact index | `metrics/artifact_index.json` and `scripts/build_artifact_index.py` | compact catalog of the reviewer-critical proof artifacts |
122
  | Artifact guide | `ARTIFACT_GUIDE.md` | human-readable map of proof boundary, task evidence, mirrors, and scale-up status |
123
  | Reproducibility | `REPRODUCIBILITY.md`, `metrics/reproducibility_matrix.json` | public commands, expected outputs, exact-match audit evidence, and non-reproducible boundaries |
@@ -149,6 +158,7 @@ transfers them to H20 for manifest building, training, and evaluation.
149
  | `assets/task_architectures.png` | shows the shared pipeline and all 12 heads |
150
  | `assets/task_suite_infographic.png` | presents the shared processing contract, 12 heads, verified metrics, and public-sample modality thumbnails |
151
  | `assets/modalities/`, `metrics/modality_atlas.json` | responsive modality-card thumbnails and metadata for sample inspection |
 
152
  | `metrics/artifact_index.json` | indexes proof artifacts with existence, size, and stable-file hashes |
153
  | `metrics/mirror_parity.json` | verifies prepared repo/HF mirrors have matching critical data, figures, website HTML, and validator files before upload |
154
  | `metrics/scope_claims_audit.json` | verifies historical `32ep` smoke-run identifiers are not presented as real 32-episode results |
 
73
  `metrics/modality_atlas.json`, with standalone derived thumbnails in
74
  `assets/modalities/`.
75
 
76
+ The model repo also mirrors the official-source alignment artifact at
77
+ `metrics/xperience10m_dataset_card_alignment.json` plus
78
+ `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`. That file records the official
79
+ `ropedia-ai/xperience-10m` card scope, gated access, full-scale modalities,
80
+ episode layout, intended uses, and the claims this small baseline repo does
81
+ not make.
82
+
83
  The committed heads are intentionally small:
84
 
85
  - z-score + linear softmax classifiers,
 
105
  | 5 | What is still pending? | companion GitHub `results/omni_finetune/DATA_BLOCKER_REPORT.md` and `A100_HF_RELAY_STATUS.md` |
106
 
107
  Human-readable artifact guide mirror: `ARTIFACT_GUIDE.md`.
108
+ Official dataset-card alignment mirror: `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md` and `metrics/xperience10m_dataset_card_alignment.json`.
109
  Publication quality gates mirror: `QUALITY_GATES.md` and `metrics/quality_gates.json`.
110
  Live publication status mirror: `metrics/live_publication_status.json`.
111
  Machine-readable reviewer packet mirror: `metrics/reviewer_packet.json`.
 
126
  | Website integrity | `metrics/website_integrity.json` and validator script mirror | local links, anchors, JSON bundles, and referenced images only |
127
  | Quality gates | `QUALITY_GATES.md`, `metrics/quality_gates.json`, and `scripts/build_quality_gates.py` | automated release gates plus live post-publish checks |
128
  | Live publication | `metrics/live_publication_status.json`, `scripts/verify_live_publication.py` | last public GitHub/HF URL verification after upload |
129
+ | Official dataset card alignment | `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`, `metrics/xperience10m_dataset_card_alignment.json` | official source scope, gated access, modality coverage, scale, and this repo's single-episode boundary |
130
  | Artifact index | `metrics/artifact_index.json` and `scripts/build_artifact_index.py` | compact catalog of the reviewer-critical proof artifacts |
131
  | Artifact guide | `ARTIFACT_GUIDE.md` | human-readable map of proof boundary, task evidence, mirrors, and scale-up status |
132
  | Reproducibility | `REPRODUCIBILITY.md`, `metrics/reproducibility_matrix.json` | public commands, expected outputs, exact-match audit evidence, and non-reproducible boundaries |
 
158
  | `assets/task_architectures.png` | shows the shared pipeline and all 12 heads |
159
  | `assets/task_suite_infographic.png` | presents the shared processing contract, 12 heads, verified metrics, and public-sample modality thumbnails |
160
  | `assets/modalities/`, `metrics/modality_atlas.json` | responsive modality-card thumbnails and metadata for sample inspection |
161
+ | `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`, `metrics/xperience10m_dataset_card_alignment.json` | aligns public wording with the official gated Xperience-10M dataset card |
162
  | `metrics/artifact_index.json` | indexes proof artifacts with existence, size, and stable-file hashes |
163
  | `metrics/mirror_parity.json` | verifies prepared repo/HF mirrors have matching critical data, figures, website HTML, and validator files before upload |
164
  | `metrics/scope_claims_audit.json` | verifies historical `32ep` smoke-run identifiers are not presented as real 32-episode results |
XPERIENCE10M_DATASET_CARD_ALIGNMENT.md ADDED
@@ -0,0 +1,170 @@
1
+ # Xperience-10M Official Dataset Card Alignment
2
+
3
+ This file records the public description of the official
4
+ [`ropedia-ai/xperience-10m`](https://huggingface.co/datasets/ropedia-ai/xperience-10m)
5
+ dataset card and how this repo uses only one public sample episode from that
6
+ larger source. It is a description-alignment artifact, not a raw-data mirror.
7
+
8
+ Checked on: 2026-06-01.
9
+
10
+ ## Official Dataset Scope
11
+
12
+ The official Xperience-10M dataset is described by Ropedia as a large-scale
13
+ egocentric multimodal dataset for embodied AI, robotics, world models, and
14
+ spatial intelligence. The dataset card frames it as human-experience data with
15
+ roughly 10 million interaction/experience units and about 10,000 hours of
16
+ synchronized first-person recording.
17
+
18
+ The official card metadata lists these task and modality categories:
19
+
20
+ - task categories: video classification, image-to-text, depth estimation, robotics
21
+ - modalities: 3D, audio, video
22
+ - language: English
23
+ - license field: `other`
24
+ - size category: `1M<n<10M`
25
+ - access: manually gated, reviewed access for approved non-commercial use
26
+
27
+ The current public Hugging Face API metadata reports the dataset repo as
28
+ `gated: manual` and notes that an external DocuSign agreement may be required
29
+ before approval.
30
+
31
+ ## Official Modalities
32
+
33
+ The official dataset card describes the full dataset as synchronized 4D
34
+ multimodal egocentric data spanning:
35
+
36
+ - six RGB video streams: four fisheye views and two rectified stereo views
37
+ - audio embedded in the video streams
38
+ - stereo depth and depth confidence
39
+ - camera pose, SLAM trajectory, and point-cloud information
40
+ - two-hand motion capture, including hand joints and MANO-related data
41
+ - full-body motion capture, keypoints, contacts, and body orientation data
42
+ - inertial sensing from accelerometer and gyroscope streams
43
+ - hierarchical language/caption annotations
44
+ - metadata and calibration records
45
+
46
+ ## Official Scale Statistics
47
+
48
+ The official dataset card describes Xperience-10M at full scale with these
49
+ headline counts:
50
+
51
+ | Quantity | Official-card scale |
52
+ | --- | --- |
53
+ | Human experience / interaction units | about 10 million |
54
+ | Recording duration | about 10,000 hours |
55
+ | RGB frames | about 2.88 billion |
56
+ | Depth frames | about 720 million |
57
+ | Camera-pose records | about 576 million |
58
+ | Motion-capture frames | about 576 million |
59
+ | IMU records | about 7.2 billion |
60
+ | Caption sentences | about 16 million |
61
+ | Caption words | about 200 million |
62
+ | Vocabulary size | about 6,000 words |
63
+ | Object annotations | about 350,000 objects |
64
+ | Trajectory distance | about 39,000 km |
65
+ | Total storage described by the card | about 1 PB |
66
+
67
+ The public Hugging Face page may show a smaller currently listed file-size
68
+ summary for the gated repo. This project keeps those concepts separate: the
69
+ official card scale describes the dataset design, while this repo validates
70
+ only the files that are actually available to the project.
71
+
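As a back-of-envelope consistency check, the headline counts in the table above can be divided by the stated recording duration to get implied average rates. The sketch below assumes every count spans the full 10,000 hours uniformly. That is an assumption made here for illustration, not a sensor specification from the official card, and the names `COUNTS` and `implied_rates` are hypothetical.

```python
# Headline counts copied from the table above (all approximate).
HOURS = 10_000
COUNTS = {
    "rgb_frames": 2.88e9,
    "depth_frames": 720e6,
    "camera_pose_records": 576e6,
    "mocap_frames": 576e6,
    "imu_records": 7.2e9,
}


def implied_rates(counts: dict, hours: float) -> dict:
    """Average records per second of recording, assuming uniform coverage."""
    seconds = hours * 3600
    return {name: total / seconds for name, total in counts.items()}


rates = implied_rates(COUNTS, HOURS)
for name, per_second in rates.items():
    print(f"{name}: about {per_second:g} per second")
```

Under that assumption the figures work out to roughly 80 RGB frames per second summed over all video streams, 20 depth frames, 16 pose and motion-capture records, and 200 IMU records per second. These are averages implied by rounded headline numbers and should not be read as documented capture rates.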
72
+ ## Episode File Layout
73
+
74
+ The official gated file listing and the public sample use episode folders with
75
+ this practical layout:
76
+
77
+ ```text
78
+ <session_uuid>/
79
+   ep<episode_id>/
80
+     fisheye_cam0.mp4
81
+     fisheye_cam1.mp4
82
+     fisheye_cam2.mp4
83
+     fisheye_cam3.mp4
84
+     stereo_left.mp4
85
+     stereo_right.mp4
86
+     annotation.hdf5
87
+     visualization.rrd # optional viewer artifact; excluded from training downloads
88
+ ```
89
+
90
+ For this repo, a valid training/evaluation episode requires `annotation.hdf5`.
91
+ Full-omni mode prefers all six MP4 streams. Degraded mode may use
92
+ `fisheye_cam0.mp4` plus the annotation file, but must record missing views in
93
+ the manifest.
94
+
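The validity rule in the paragraph above can be sketched as a small classifier over one episode folder. This is a minimal illustration, not the repo's manifest builder: the mode labels `full_omni`, `degraded`, and `invalid`, the function name `classify_episode`, and the returned dictionary shape are all hypothetical. It also assumes that degraded mode requires `fisheye_cam0.mp4` to be present, which the text implies but does not state outright.

```python
from pathlib import Path

# The six video streams named in the episode layout.
VIDEO_STREAMS = (
    "fisheye_cam0.mp4",
    "fisheye_cam1.mp4",
    "fisheye_cam2.mp4",
    "fisheye_cam3.mp4",
    "stereo_left.mp4",
    "stereo_right.mp4",
)


def classify_episode(episode_dir: Path) -> dict:
    """Classify one episode folder and list the missing views for the manifest."""
    present = {p.name for p in episode_dir.iterdir() if p.is_file()}
    missing = [name for name in VIDEO_STREAMS if name not in present]

    if "annotation.hdf5" not in present:
        mode = "invalid"    # no annotation file: not usable for training/evaluation
    elif not missing:
        mode = "full_omni"  # all six MP4 streams are available
    elif "fisheye_cam0.mp4" in present:
        mode = "degraded"   # usable, but missing views must be recorded
    else:
        mode = "invalid"    # assumption: degraded mode needs fisheye_cam0.mp4

    return {"episode": episode_dir.name, "mode": mode, "missing_views": missing}
```

Returning `missing_views` even for full-omni episodes keeps the manifest rows uniform, so a downstream reader never has to guess whether an absent field means "none missing" or "not checked".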
95
+ ## Annotation File Content
96
+
97
+ The official card describes the HDF5 annotation file as carrying aligned
98
+ multimodal records. The relevant groups include:
99
+
100
+ - calibration: camera intrinsics/extrinsics for fisheye and stereo cameras
101
+ - SLAM/camera pose: quaternions, translations, frame names, and point cloud
102
+ - depth: depth map, confidence, scale, min/max, and validity metadata
103
+ - hand motion capture: left/right hand joints, translations, and MANO-related records
104
+ - full-body motion capture: body keypoints, contacts, transforms, and body rotations
105
+ - IMU: timestamps, accelerometer, gyroscope, and keyframe metadata
106
+ - video timing: timestamps, frame numbers, and video duration
107
+ - language/caption annotations and metadata
108
+
109
+ This repo's current 8,378-d feature vector uses video-derived statistics,
110
+ depth, pose/SLAM, calibration, mocap, IMU, and language-derived blocks. Audio
111
+ is documented and visualized, but it is not yet extracted into the current
112
+ baseline feature vector.
113
+
114
+ ## Intended Research Uses
115
+
116
+ The official dataset card supports research directions such as:
117
+
118
+ - egocentric video/action understanding
119
+ - task and subtask recognition
120
+ - temporal action localization and human-object interaction analysis
121
+ - object grounding and caption/language grounding
122
+ - audio-visual learning and multimodal pretraining
123
+ - embodied reasoning, world-model learning, and robotics imitation learning
124
+ - depth estimation, visual odometry, camera trajectory, SLAM, and scene reconstruction
125
+ - hand/body pose, human motion understanding, and sensor fusion
126
+
127
+ This repo currently implements a single-episode audit suite that starts several
128
+ of those directions, but it does not solve the full official task list. The 12
129
+ current tasks cover action/subtask labels, next-action prediction, transition
130
+ and temporal diagnostics, hand trajectory forecasting, contact prediction,
131
+ object relevance, caption grounding, cross-modal retrieval, modality
132
+ reconstruction, and misalignment detection. Missing or only-proxy coverage
133
+ includes real audio-visual modeling, full caption generation, depth-pixel
134
+ estimation, full SLAM estimation, neural rendering, policy learning, and
135
+ cross-episode generalization.
136
+
137
+ ## Responsible-Use Boundary
138
+
139
+ The official dataset is gated and intended for approved non-commercial research
140
+ use. This repo therefore does not redistribute raw MP4 files, raw
141
+ `annotation.hdf5`, private gated data, raw `visualization.rrd`, or any full
142
+ Qwen weights. Public assets here are derived metrics, small thumbnails,
143
+ manifests, scripts, charts, and lightweight baseline artifacts.
144
+
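One way to spot-check the redistribution boundary described above is to scan a prepared public bundle for the raw file types it names. The sketch below is an illustration under stated assumptions, not the repo's publication audit. The deny-list `RAW_SUFFIXES` and the function `find_raw_files` are hypothetical, and a suffix check alone does not detect model weights, renamed files, or token strings.

```python
from pathlib import Path

# Assumed deny-list, derived from the raw file types named in the paragraph above.
RAW_SUFFIXES = {".mp4", ".hdf5", ".rrd"}


def find_raw_files(bundle_dir: Path) -> list[str]:
    """Return sorted relative paths of files whose suffix is on the deny-list."""
    return sorted(
        p.relative_to(bundle_dir).as_posix()
        for p in bundle_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in RAW_SUFFIXES
    )
```

An empty result is a necessary condition for a clean bundle under this rule, not a sufficient one.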
145
+ The official card also makes clear that the data is not meant for identity
146
+ recognition, re-identification, biometric profiling, surveillance, sensitive
147
+ attribute inference, or safety-critical deployment without appropriate
148
+ safeguards.
149
+
150
+ ## Limitations To Preserve In This Project
151
+
152
+ When describing Xperience-10M in this repo, keep these limitations visible:
153
+
154
+ - one public sample episode cannot prove cross-environment generalization
155
+ - full-dataset claims require gated access, many episodes, and held-out episode splits
156
+ - motion capture, SLAM, depth, captions, and other annotations can contain noise
157
+ - language annotations are not exhaustive descriptions of every scene state
158
+ - large-scale training requires substantial storage, preprocessing, and compute
159
+ - the current feature vector does not include an extracted audio feature block
160
+
161
+ ## Current Project Alignment
162
+
163
+ | Official dataset card concept | Current repo status |
164
+ | --- | --- |
165
+ | Full Xperience-10M is large, gated, and multi-episode | Acknowledged; not redistributed |
166
+ | Public sample includes video/audio/depth/pose/mocap/IMU/language | Represented in the modality atlas |
167
+ | Episode layout uses six MP4 streams and `annotation.hdf5` | Used by sample inspection and pilot-readiness scripts |
168
+ | Audio exists in MP4 streams | Documented and visualized, not featurized |
169
+ | 4D reconstruction/world modeling are intended research directions | Represented by proxy/diagnostic tasks only |
170
+ | Real model quality requires held-out multi-episode evaluation | Not claimed yet; 32-episode pilot remains gated |
metrics/artifact_index.json CHANGED
@@ -1,12 +1,13 @@
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
- "generated_at_utc": "2026-06-01T07:34:10+00:00",
4
  "status": "pass",
5
- "artifact_count": 33,
6
  "missing": [],
7
  "by_kind": {
8
  "claim_boundary": 1,
9
  "review_path": 3,
 
10
  "quality_gate": 4,
11
  "reproducibility": 2,
12
  "hygiene_report": 1,
@@ -36,8 +37,8 @@
36
  "surface": "repo",
37
  "proves": "Defines what is verified, what is smoke-only, and what must not be inferred.",
38
  "exists": true,
39
- "bytes": 7046,
40
- "sha256": "fd4d09938147487f9c3e713c6ced07b3e6103426f3ccc58266047365bf4ed1ea"
41
  },
42
  {
43
  "id": "reviewer_packet",
@@ -47,8 +48,8 @@
47
  "surface": "website_hf",
48
  "proves": "Gives a short audit path with scope status and public surfaces.",
49
  "exists": true,
50
- "bytes": 4406,
51
- "sha256": "c3669df9fce7adc2cbdb95fa4d1cd75644ababf4bcda88bb19090b4296f8514a"
52
  },
53
  {
54
  "id": "artifact_guide",
@@ -58,8 +59,30 @@
58
  "surface": "repo_hf",
59
  "proves": "Gives the human-readable map from proof boundary to data, tasks, platform mirrors, and scale-up status.",
60
  "exists": true,
61
- "bytes": 6943,
62
- "sha256": "81204b332da6bd1c3ebec603990eeacbec984534499df59463cad9aa6ab7841f"
63
  },
64
  {
65
  "id": "quality_gates",
@@ -81,7 +104,7 @@
81
  "proves": "Machine-readable release-gate summary for validators, mirrors, and reviewer surfaces.",
82
  "exists": true,
83
  "bytes": 4228,
84
- "sha256": "d5b145e83d6a520c628353a894ad3a438418604e262b032767674e66911f893e"
85
  },
86
  {
87
  "id": "live_publication_status",
@@ -103,8 +126,8 @@
103
  "surface": "repo",
104
  "proves": "Fetches the published GitHub/HF URLs and compares live hashes and public-card markers against the release assets.",
105
  "exists": true,
106
- "bytes": 10587,
107
- "sha256": "dd8456784c1442ccb622c0fb0da0369cad587dc0023142038b08613ec28a40b4"
108
  },
109
  {
110
  "id": "reproducibility_contract",
@@ -136,8 +159,8 @@
136
  "surface": "repo_hf",
137
  "proves": "Generates the selective proof-artifact catalog from local files.",
138
  "exists": true,
139
- "bytes": 12875,
140
- "sha256": "9dd7b6e3a511db843d15f15f33d7f0481c41c19dc80031749f006f123162a637"
141
  },
142
  {
143
  "id": "publication_audit",
@@ -148,7 +171,7 @@
148
  "volatile": true,
149
  "proves": "Confirms public bundles pass raw-data, cache, archive, and token-string checks.",
150
  "exists": true,
151
- "bytes": 5508,
152
  "hash_policy": "existence_and_size_only"
153
  },
154
  {
@@ -172,7 +195,7 @@
172
  "volatile": true,
173
  "proves": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
174
  "exists": true,
175
- "bytes": 48916,
176
  "hash_policy": "existence_and_size_only"
177
  },
178
  {
@@ -184,7 +207,7 @@
184
  "volatile": true,
185
  "proves": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
186
  "exists": true,
187
- "bytes": 6159,
188
  "hash_policy": "existence_and_size_only"
189
  },
190
  {
@@ -195,8 +218,8 @@
195
  "surface": "website_hf",
196
  "proves": "Lists public URLs, upstream sources, and machine-readable project metadata.",
197
  "exists": true,
198
- "bytes": 2789,
199
- "sha256": "333d9876affa556f502d7038ad299c242a68b7bf3be90c3f6e10edf0e081c010"
200
  },
201
  {
202
  "id": "task_summary",
 
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
+ "generated_at_utc": "2026-06-01T08:04:57+00:00",
4
  "status": "pass",
5
+ "artifact_count": 35,
6
  "missing": [],
7
  "by_kind": {
8
  "claim_boundary": 1,
9
  "review_path": 3,
10
+ "source_alignment": 2,
11
  "quality_gate": 4,
12
  "reproducibility": 2,
13
  "hygiene_report": 1,
 
37
  "surface": "repo",
38
  "proves": "Defines what is verified, what is smoke-only, and what must not be inferred.",
39
  "exists": true,
40
+ "bytes": 7572,
41
+ "sha256": "1b4c78c3d92c8592dcc7532b94103743bfef2a36b025245968c79fd51fa5c42c"
42
  },
43
  {
44
  "id": "reviewer_packet",
 
48
  "surface": "website_hf",
49
  "proves": "Gives a short audit path with scope status and public surfaces.",
50
  "exists": true,
51
+ "bytes": 5044,
52
+ "sha256": "9b99e1828b74ba2cf99f281925ee6c113d0c73d9e06e700e924513e391c83cd8"
53
  },
54
  {
55
  "id": "artifact_guide",
 
59
  "surface": "repo_hf",
60
  "proves": "Gives the human-readable map from proof boundary to data, tasks, platform mirrors, and scale-up status.",
61
  "exists": true,
62
+ "bytes": 7925,
63
+ "sha256": "79c81d9f5631df046892e020f979773ac0933381f17b6d2b9f3ff503d6c332b7"
64
+ },
65
+ {
66
+ "id": "official_dataset_card_alignment",
67
+ "title": "Official Xperience-10M dataset-card alignment",
68
+ "path": "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
69
+ "kind": "source_alignment",
70
+ "surface": "repo_hf",
71
+ "proves": "Aligns public dataset wording with the official gated Xperience-10M dataset card and records unsupported areas.",
72
+ "exists": true,
73
+ "bytes": 7654,
74
+ "sha256": "0866357a6d9922c961dc89a872b88b9517a37adf7b8130bfeb64b471045d01da"
75
+ },
76
+ {
77
+ "id": "official_dataset_card_alignment_json",
78
+ "title": "Official Xperience-10M dataset-card alignment JSON",
79
+ "path": "docs/data/xperience10m_dataset_card_alignment.json",
80
+ "kind": "source_alignment",
81
+ "surface": "website_hf",
82
+ "proves": "Machine-readable upstream dataset-card alignment facts for website and HF mirrors.",
83
+ "exists": true,
84
+ "bytes": 5103,
85
+ "sha256": "157f8616cb6cb45ad4d72bc371d4d68c60a990340e4257a4d7e874c577d44f24"
86
  },
87
  {
88
  "id": "quality_gates",
 
104
  "proves": "Machine-readable release-gate summary for validators, mirrors, and reviewer surfaces.",
105
  "exists": true,
106
  "bytes": 4228,
107
+ "sha256": "42cd50ceb83503bbda33245dd1442ae956a9002f9f0d8a729b3b7d068217b836"
108
  },
109
  {
110
  "id": "live_publication_status",
 
126
  "surface": "repo",
127
  "proves": "Fetches the published GitHub/HF URLs and compares live hashes and public-card markers against the release assets.",
128
  "exists": true,
129
+ "bytes": 11753,
130
+ "sha256": "297364d079c1eea4e790fd7b2f8ae42ddd7d93aa28d7a2362806729789813626"
131
  },
132
  {
133
  "id": "reproducibility_contract",
 
159
  "surface": "repo_hf",
160
  "proves": "Generates the selective proof-artifact catalog from local files.",
161
  "exists": true,
162
+ "bytes": 13641,
163
+ "sha256": "3d0a88e0c2212913699c13362027eb97a0cd84789a47e72b41d43cf3d2d6545b"
164
  },
165
  {
166
  "id": "publication_audit",
 
171
  "volatile": true,
172
  "proves": "Confirms public bundles pass raw-data, cache, archive, and token-string checks.",
173
  "exists": true,
174
+ "bytes": 5624,
175
  "hash_policy": "existence_and_size_only"
176
  },
177
  {
 
195
  "volatile": true,
196
  "proves": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
197
  "exists": true,
198
+ "bytes": 51819,
199
  "hash_policy": "existence_and_size_only"
200
  },
201
  {
 
207
  "volatile": true,
208
  "proves": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
209
  "exists": true,
210
+ "bytes": 6286,
211
  "hash_policy": "existence_and_size_only"
212
  },
213
  {
 
218
  "surface": "website_hf",
219
  "proves": "Lists public URLs, upstream sources, and machine-readable project metadata.",
220
  "exists": true,
221
+ "bytes": 3411,
222
+ "sha256": "99e0d386e088e7c532a318c33e4519da9b77d8d7c300c123c5aa7f866cd3c6b4"
223
  },
224
  {
225
  "id": "task_summary",
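The index entries above pair each file with `exists`, `bytes`, and either a `sha256` or, for entries marked `volatile`, a `hash_policy` of `existence_and_size_only`. A minimal sketch of how such fields could be derived from a local file follows. The function name `describe_artifact` is hypothetical, and this is not taken from `scripts/build_artifact_index.py`, whose actual logic is not shown in this diff.

```python
import hashlib
import tempfile
from pathlib import Path


def describe_artifact(path: Path, volatile: bool = False) -> dict:
    """Build existence/size/hash fields for one index entry (hypothetical helper)."""
    entry = {"exists": path.is_file()}
    if not entry["exists"]:
        return entry
    data = path.read_bytes()
    entry["bytes"] = len(data)
    if volatile:
        # Assumption: volatile reports skip hashing, mirroring the
        # "existence_and_size_only" policy seen in the index.
        entry["hash_policy"] = "existence_and_size_only"
    else:
        entry["sha256"] = hashlib.sha256(data).hexdigest()
    return entry


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.json"
        sample.write_text('{"status": "pass"}\n', encoding="utf-8")
        print(describe_artifact(sample))
        print(describe_artifact(sample, volatile=True))
```

Skipping the hash for volatile reports would keep the index stable when a report embeds a timestamp, which is one plausible reason for the policy.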
metrics/evidence_contract.json CHANGED
@@ -2,6 +2,17 @@
2
  "project": "Ropedia Xperience-10M Task Suite",
3
  "scope": "single public Xperience-10M sample episode",
4
  "claims": [
5
  {
6
  "id": "aligned_windows",
7
  "claim": "The public Xperience-10M sample has been converted into aligned model windows.",
 
2
  "project": "Ropedia Xperience-10M Task Suite",
3
  "scope": "single public Xperience-10M sample episode",
4
  "claims": [
5
+ {
6
+ "id": "official_dataset_card_alignment",
7
+ "claim": "The public dataset description is aligned with the official gated Xperience-10M dataset card.",
8
+ "status": "verified",
9
+ "evidence": [
10
+ "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
11
+ "docs/data/xperience10m_dataset_card_alignment.json",
12
+ "https://huggingface.co/datasets/ropedia-ai/xperience-10m"
13
+ ],
14
+ "boundary": "summarizes upstream public metadata and dataset-card facts; does not grant access or mirror raw data"
15
+ },
16
  {
17
  "id": "aligned_windows",
18
  "claim": "The public Xperience-10M sample has been converted into aligned model windows.",
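Each claim in the evidence contract above carries `id`, `claim`, `status`, `evidence`, and `boundary`. A small shape check along these lines could catch a claim added without its boundary or evidence list. The helper name `claim_problems` and the rule that `evidence` must be a non-empty list are assumptions for illustration, not part of the published validators.

```python
REQUIRED_FIELDS = ("id", "claim", "status", "evidence", "boundary")


def claim_problems(claim: dict) -> list[str]:
    """Return human-readable problems for one claim entry (hypothetical check)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in claim]
    if "evidence" in claim:
        evidence = claim["evidence"]
        # Assumption: a claim should cite at least one evidence path or URL.
        if not isinstance(evidence, list) or not evidence:
            problems.append("evidence must be a non-empty list")
    return problems


example = {
    "id": "official_dataset_card_alignment",
    "claim": "The public dataset description is aligned with the official gated Xperience-10M dataset card.",
    "status": "verified",
    "evidence": [
        "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
        "docs/data/xperience10m_dataset_card_alignment.json",
    ],
    "boundary": "summarizes upstream public metadata and dataset-card facts; does not grant access or mirror raw data",
}

if __name__ == "__main__":
    print(claim_problems(example))
```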
metrics/mirror_parity.json CHANGED
@@ -1,9 +1,9 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-01T07:35:01+00:00",
4
  "hf_root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish",
5
  "summary": {
6
- "group_count": 34,
7
  "failure_count": 0,
8
  "failures_by_surface": {}
9
  },
@@ -36,27 +36,27 @@
36
  "local": {
37
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/artifact_index.json",
38
  "exists": true,
39
- "bytes": 14654,
40
- "sha256": "28ddd6791143bc03508dcc1e82925d4721c3fd24e0fd10aa6b57c8baa431995d"
41
  },
42
  "mirrors": {
43
  "hf_space": {
44
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/artifact_index.json",
45
  "exists": true,
46
- "bytes": 14654,
47
- "sha256": "28ddd6791143bc03508dcc1e82925d4721c3fd24e0fd10aa6b57c8baa431995d"
48
  },
49
  "hf_artifacts": {
50
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/artifact_index.json",
51
  "exists": true,
52
- "bytes": 14654,
53
- "sha256": "28ddd6791143bc03508dcc1e82925d4721c3fd24e0fd10aa6b57c8baa431995d"
54
  },
55
  "hf_model": {
56
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/artifact_index.json",
57
  "exists": true,
58
- "bytes": 14654,
59
- "sha256": "28ddd6791143bc03508dcc1e82925d4721c3fd24e0fd10aa6b57c8baa431995d"
60
  }
61
  },
62
  "failures": []
@@ -67,27 +67,27 @@
67
  "local": {
68
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/evidence_contract.json",
69
  "exists": true,
70
- "bytes": 7954,
71
- "sha256": "bf3a8a9f4c8dd618358ffb1387e60fc5446dfb6d901af447b3ec729c08c70fe5"
72
  },
73
  "mirrors": {
74
  "hf_space": {
75
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/evidence_contract.json",
76
  "exists": true,
77
- "bytes": 7954,
78
- "sha256": "bf3a8a9f4c8dd618358ffb1387e60fc5446dfb6d901af447b3ec729c08c70fe5"
79
  },
80
  "hf_artifacts": {
81
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/evidence_contract.json",
82
  "exists": true,
83
- "bytes": 7954,
84
- "sha256": "bf3a8a9f4c8dd618358ffb1387e60fc5446dfb6d901af447b3ec729c08c70fe5"
85
  },
86
  "hf_model": {
87
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/evidence_contract.json",
88
  "exists": true,
89
- "bytes": 7954,
90
- "sha256": "bf3a8a9f4c8dd618358ffb1387e60fc5446dfb6d901af447b3ec729c08c70fe5"
91
  }
92
  },
93
  "failures": []
@@ -160,27 +160,27 @@
160
  "local": {
161
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/project_manifest.json",
162
  "exists": true,
163
- "bytes": 2789,
164
- "sha256": "333d9876affa556f502d7038ad299c242a68b7bf3be90c3f6e10edf0e081c010"
165
  },
166
  "mirrors": {
167
  "hf_space": {
168
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/project_manifest.json",
169
  "exists": true,
170
- "bytes": 2789,
171
- "sha256": "333d9876affa556f502d7038ad299c242a68b7bf3be90c3f6e10edf0e081c010"
172
  },
173
  "hf_artifacts": {
174
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/project_manifest.json",
175
  "exists": true,
176
- "bytes": 2789,
177
- "sha256": "333d9876affa556f502d7038ad299c242a68b7bf3be90c3f6e10edf0e081c010"
178
  },
179
  "hf_model": {
180
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/project_manifest.json",
181
  "exists": true,
182
- "bytes": 2789,
183
- "sha256": "333d9876affa556f502d7038ad299c242a68b7bf3be90c3f6e10edf0e081c010"
184
  }
185
  },
186
  "failures": []
@@ -191,27 +191,27 @@
191
  "local": {
192
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/publication_audit.json",
193
  "exists": true,
194
- "bytes": 5508,
195
- "sha256": "cb6ec1c4cf3ec8de45f94c82a3aa1b074dde08f0ea582f7ae960622d909f5825"
196
  },
197
  "mirrors": {
198
  "hf_space": {
199
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/publication_audit.json",
200
  "exists": true,
201
- "bytes": 5508,
202
- "sha256": "cb6ec1c4cf3ec8de45f94c82a3aa1b074dde08f0ea582f7ae960622d909f5825"
203
  },
204
  "hf_artifacts": {
205
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/publication_audit.json",
206
  "exists": true,
207
- "bytes": 5508,
208
- "sha256": "cb6ec1c4cf3ec8de45f94c82a3aa1b074dde08f0ea582f7ae960622d909f5825"
209
  },
210
  "hf_model": {
211
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/publication_audit.json",
212
  "exists": true,
213
- "bytes": 5508,
214
- "sha256": "cb6ec1c4cf3ec8de45f94c82a3aa1b074dde08f0ea582f7ae960622d909f5825"
215
  }
216
  },
217
  "failures": []
@@ -223,26 +223,26 @@
223
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/quality_gates.json",
224
  "exists": true,
225
  "bytes": 4228,
226
- "sha256": "d5b145e83d6a520c628353a894ad3a438418604e262b032767674e66911f893e"
227
  },
228
  "mirrors": {
229
  "hf_space": {
230
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/quality_gates.json",
231
  "exists": true,
232
  "bytes": 4228,
233
- "sha256": "d5b145e83d6a520c628353a894ad3a438418604e262b032767674e66911f893e"
234
  },
235
  "hf_artifacts": {
236
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/quality_gates.json",
237
  "exists": true,
238
  "bytes": 4228,
239
- "sha256": "d5b145e83d6a520c628353a894ad3a438418604e262b032767674e66911f893e"
240
  },
241
  "hf_model": {
242
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/quality_gates.json",
243
  "exists": true,
244
  "bytes": 4228,
245
- "sha256": "d5b145e83d6a520c628353a894ad3a438418604e262b032767674e66911f893e"
246
  }
247
  },
248
  "failures": []
@@ -346,27 +346,27 @@
346
  "local": {
347
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/reviewer_packet.json",
348
  "exists": true,
349
- "bytes": 4406,
350
- "sha256": "c3669df9fce7adc2cbdb95fa4d1cd75644ababf4bcda88bb19090b4296f8514a"
351
  },
352
  "mirrors": {
353
  "hf_space": {
354
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/reviewer_packet.json",
355
  "exists": true,
356
- "bytes": 4406,
357
- "sha256": "c3669df9fce7adc2cbdb95fa4d1cd75644ababf4bcda88bb19090b4296f8514a"
358
  },
359
  "hf_artifacts": {
360
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/reviewer_packet.json",
361
  "exists": true,
362
- "bytes": 4406,
363
- "sha256": "c3669df9fce7adc2cbdb95fa4d1cd75644ababf4bcda88bb19090b4296f8514a"
364
  },
365
  "hf_model": {
366
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/reviewer_packet.json",
367
  "exists": true,
368
- "bytes": 4406,
369
- "sha256": "c3669df9fce7adc2cbdb95fa4d1cd75644ababf4bcda88bb19090b4296f8514a"
370
  }
371
  },
372
  "failures": []
@@ -378,26 +378,26 @@
378
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/scope_claims_audit.json",
379
  "exists": true,
380
  "bytes": 19964,
381
- "sha256": "105f1861f0adf139150ab04058d9b424812e687d13449f696a33c8d63e2a4c27"
382
  },
383
  "mirrors": {
384
  "hf_space": {
385
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/scope_claims_audit.json",
386
  "exists": true,
387
  "bytes": 19964,
388
- "sha256": "105f1861f0adf139150ab04058d9b424812e687d13449f696a33c8d63e2a4c27"
389
  },
390
  "hf_artifacts": {
391
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/scope_claims_audit.json",
392
  "exists": true,
393
  "bytes": 19964,
394
- "sha256": "105f1861f0adf139150ab04058d9b424812e687d13449f696a33c8d63e2a4c27"
395
  },
396
  "hf_model": {
397
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/scope_claims_audit.json",
398
  "exists": true,
399
  "bytes": 19964,
400
- "sha256": "105f1861f0adf139150ab04058d9b424812e687d13449f696a33c8d63e2a4c27"
401
  }
402
  },
403
  "failures": []
@@ -470,27 +470,58 @@
470
  "local": {
471
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/website_integrity.json",
472
  "exists": true,
473
- "bytes": 6159,
474
- "sha256": "02e6ec63a7d67c64717b7e8ca235c4519f0e54467171ff8febc98278f23529db"
475
  },
476
  "mirrors": {
477
  "hf_space": {
478
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/website_integrity.json",
479
  "exists": true,
480
- "bytes": 6159,
481
- "sha256": "02e6ec63a7d67c64717b7e8ca235c4519f0e54467171ff8febc98278f23529db"
482
  },
483
  "hf_artifacts": {
484
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/website_integrity.json",
485
  "exists": true,
486
- "bytes": 6159,
487
- "sha256": "02e6ec63a7d67c64717b7e8ca235c4519f0e54467171ff8febc98278f23529db"
488
  },
489
  "hf_model": {
490
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/website_integrity.json",
491
  "exists": true,
492
- "bytes": 6159,
493
- "sha256": "02e6ec63a7d67c64717b7e8ca235c4519f0e54467171ff8febc98278f23529db"
494
  }
495
  },
496
  "failures": []
@@ -871,21 +902,21 @@
871
  "local": {
872
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/scripts/build_artifact_index.py",
873
  "exists": true,
874
- "bytes": 12875,
875
- "sha256": "9dd7b6e3a511db843d15f15f33d7f0481c41c19dc80031749f006f123162a637"
876
  },
877
  "mirrors": {
878
  "hf_artifacts": {
879
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/scripts/build_artifact_index.py",
880
  "exists": true,
881
- "bytes": 12875,
882
- "sha256": "9dd7b6e3a511db843d15f15f33d7f0481c41c19dc80031749f006f123162a637"
883
  },
884
  "hf_model": {
885
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/scripts/build_artifact_index.py",
886
  "exists": true,
887
- "bytes": 12875,
888
- "sha256": "9dd7b6e3a511db843d15f15f33d7f0481c41c19dc80031749f006f123162a637"
889
  }
890
  },
891
  "failures": []
@@ -921,21 +952,21 @@
921
  "local": {
922
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/scripts/verify_live_publication.py",
923
  "exists": true,
924
- "bytes": 10587,
925
- "sha256": "dd8456784c1442ccb622c0fb0da0369cad587dc0023142038b08613ec28a40b4"
926
  },
927
  "mirrors": {
928
  "hf_artifacts": {
929
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/scripts/verify_live_publication.py",
930
  "exists": true,
931
- "bytes": 10587,
932
- "sha256": "dd8456784c1442ccb622c0fb0da0369cad587dc0023142038b08613ec28a40b4"
933
  },
934
  "hf_model": {
935
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/scripts/verify_live_publication.py",
936
  "exists": true,
937
- "bytes": 10587,
938
- "sha256": "dd8456784c1442ccb622c0fb0da0369cad587dc0023142038b08613ec28a40b4"
939
  }
940
  },
941
  "failures": []
@@ -946,21 +977,21 @@
946
  "local": {
947
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/scripts/validate_mirror_parity.py",
948
  "exists": true,
949
- "bytes": 8423,
950
- "sha256": "213f46788af2f22763ba2a998b23dc8db17596c148196654c45ae287a58a330f"
951
  },
952
  "mirrors": {
953
  "hf_artifacts": {
954
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/scripts/validate_mirror_parity.py",
955
  "exists": true,
956
- "bytes": 8423,
957
- "sha256": "213f46788af2f22763ba2a998b23dc8db17596c148196654c45ae287a58a330f"
958
  },
959
  "hf_model": {
960
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/scripts/validate_mirror_parity.py",
961
  "exists": true,
962
- "bytes": 8423,
963
- "sha256": "213f46788af2f22763ba2a998b23dc8db17596c148196654c45ae287a58a330f"
964
  }
965
  },
966
  "failures": []
@@ -971,21 +1002,21 @@
971
  "local": {
972
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/scripts/validate_publication_package.py",
973
  "exists": true,
974
- "bytes": 12630,
975
- "sha256": "c7ca01135dfc6414b3accc42ec833905feada6e8e82f65b6e3bf855657dba5d9"
976
  },
977
  "mirrors": {
978
  "hf_artifacts": {
979
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/scripts/validate_publication_package.py",
980
  "exists": true,
981
- "bytes": 12630,
982
- "sha256": "c7ca01135dfc6414b3accc42ec833905feada6e8e82f65b6e3bf855657dba5d9"
983
  },
984
  "hf_model": {
985
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/scripts/validate_publication_package.py",
986
  "exists": true,
987
- "bytes": 12630,
988
- "sha256": "c7ca01135dfc6414b3accc42ec833905feada6e8e82f65b6e3bf855657dba5d9"
989
  }
990
  },
991
  "failures": []
@@ -1046,21 +1077,21 @@
1046
  "local": {
1047
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/index.html",
1048
  "exists": true,
1049
- "bytes": 91007,
1050
- "sha256": "4e1a1fd3d4b3de962adbd2e2b1b3c6fe5771c77a65a478035d4ca33e7d999263"
1051
  },
1052
  "mirrors": {
1053
  "hf_space": {
1054
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/index.html",
1055
  "exists": true,
1056
- "bytes": 91007,
1057
- "sha256": "4e1a1fd3d4b3de962adbd2e2b1b3c6fe5771c77a65a478035d4ca33e7d999263"
1058
  },
1059
  "hf_artifacts_docs": {
1060
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/index.html",
1061
  "exists": true,
1062
- "bytes": 91007,
1063
- "sha256": "4e1a1fd3d4b3de962adbd2e2b1b3c6fe5771c77a65a478035d4ca33e7d999263"
1064
  }
1065
  },
1066
  "failures": []
@@ -1095,6 +1126,37 @@
1095
  }
1096
  },
1097
  "failures": []
1098
  }
1099
  ],
1100
  "failures": []
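Every group in this report compares one `local` file against its `mirrors` (`hf_space`, `hf_artifacts`, `hf_model`) on `exists`, `bytes`, and `sha256`, then rolls the results into `group_count`, `failure_count`, and `failures_by_surface`. A minimal sketch of that comparison follows. It assumes one failure is recorded per mismatching field; the names `check_group` and `summarize` and the failure record shape are hypothetical, and `scripts/validate_mirror_parity.py` may well count failures differently.

```python
def check_group(group: dict) -> list[dict]:
    """Compare each mirror's metadata with the local file's metadata."""
    local = group["local"]
    failures = []
    for surface, mirror in group["mirrors"].items():
        for field in ("exists", "bytes", "sha256"):
            if mirror.get(field) != local.get(field):
                # Assumed failure record shape: which surface and which field differ.
                failures.append({"surface": surface, "field": field})
    return failures


def summarize(groups: list[dict]) -> dict:
    """Roll per-group failures into a report-level summary."""
    by_surface: dict[str, int] = {}
    for group in groups:
        for failure in check_group(group):
            by_surface[failure["surface"]] = by_surface.get(failure["surface"], 0) + 1
    failure_count = sum(by_surface.values())
    return {
        "status": "pass" if failure_count == 0 else "fail",
        "summary": {
            "group_count": len(groups),
            "failure_count": failure_count,
            "failures_by_surface": by_surface,
        },
    }


if __name__ == "__main__":
    meta = {"exists": True, "bytes": 4228, "sha256": "42cd50ce"}  # shortened hash for the demo
    groups = [{"local": meta, "mirrors": {"hf_space": dict(meta), "hf_model": dict(meta)}}]
    print(summarize(groups))
```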
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-01T08:04:40+00:00",
4
  "hf_root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish",
5
  "summary": {
6
+ "group_count": 36,
7
  "failure_count": 0,
8
  "failures_by_surface": {}
9
  },
 
36
  "local": {
37
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/artifact_index.json",
38
  "exists": true,
39
+ "bytes": 15675,
40
+ "sha256": "d929367afd699e719d223c6c0edbedc090040cc64633782e187706617fdaaaa0"
41
  },
42
  "mirrors": {
43
  "hf_space": {
44
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/artifact_index.json",
45
  "exists": true,
46
+ "bytes": 15675,
47
+ "sha256": "d929367afd699e719d223c6c0edbedc090040cc64633782e187706617fdaaaa0"
48
  },
49
  "hf_artifacts": {
50
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/artifact_index.json",
51
  "exists": true,
52
+ "bytes": 15675,
53
+ "sha256": "d929367afd699e719d223c6c0edbedc090040cc64633782e187706617fdaaaa0"
54
  },
55
  "hf_model": {
56
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/artifact_index.json",
57
  "exists": true,
58
+ "bytes": 15675,
59
+ "sha256": "d929367afd699e719d223c6c0edbedc090040cc64633782e187706617fdaaaa0"
60
  }
61
  },
62
  "failures": []
 
67
  "local": {
68
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/evidence_contract.json",
69
  "exists": true,
70
+ "bytes": 8483,
71
+ "sha256": "3d6035195dd3db9b2adaa074bd9e824c498f791fb2c735a907fd0b95d5490c2e"
72
  },
73
  "mirrors": {
74
  "hf_space": {
75
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/evidence_contract.json",
76
  "exists": true,
77
+ "bytes": 8483,
78
+ "sha256": "3d6035195dd3db9b2adaa074bd9e824c498f791fb2c735a907fd0b95d5490c2e"
79
  },
80
  "hf_artifacts": {
81
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/evidence_contract.json",
82
  "exists": true,
83
+ "bytes": 8483,
84
+ "sha256": "3d6035195dd3db9b2adaa074bd9e824c498f791fb2c735a907fd0b95d5490c2e"
85
  },
86
  "hf_model": {
87
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/evidence_contract.json",
88
  "exists": true,
89
+ "bytes": 8483,
90
+ "sha256": "3d6035195dd3db9b2adaa074bd9e824c498f791fb2c735a907fd0b95d5490c2e"
91
  }
92
  },
93
  "failures": []
 
160
  "local": {
161
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/project_manifest.json",
162
  "exists": true,
163
+ "bytes": 3411,
164
+ "sha256": "99e0d386e088e7c532a318c33e4519da9b77d8d7c300c123c5aa7f866cd3c6b4"
165
  },
166
  "mirrors": {
167
  "hf_space": {
168
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/project_manifest.json",
169
  "exists": true,
170
+ "bytes": 3411,
171
+ "sha256": "99e0d386e088e7c532a318c33e4519da9b77d8d7c300c123c5aa7f866cd3c6b4"
172
  },
173
  "hf_artifacts": {
174
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/project_manifest.json",
175
  "exists": true,
176
+ "bytes": 3411,
177
+ "sha256": "99e0d386e088e7c532a318c33e4519da9b77d8d7c300c123c5aa7f866cd3c6b4"
178
  },
179
  "hf_model": {
180
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/project_manifest.json",
181
  "exists": true,
182
+ "bytes": 3411,
183
+ "sha256": "99e0d386e088e7c532a318c33e4519da9b77d8d7c300c123c5aa7f866cd3c6b4"
184
  }
185
  },
186
  "failures": []
 
191
  "local": {
192
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/publication_audit.json",
193
  "exists": true,
194
+ "bytes": 5624,
195
+ "sha256": "0b7db27a09446d851787fd59b6f552b720a04f38a604e22d2951c10041e0cdd8"
196
  },
197
  "mirrors": {
198
  "hf_space": {
199
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/publication_audit.json",
200
  "exists": true,
201
+ "bytes": 5624,
202
+ "sha256": "0b7db27a09446d851787fd59b6f552b720a04f38a604e22d2951c10041e0cdd8"
203
  },
204
  "hf_artifacts": {
205
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/publication_audit.json",
206
  "exists": true,
207
+ "bytes": 5624,
208
+ "sha256": "0b7db27a09446d851787fd59b6f552b720a04f38a604e22d2951c10041e0cdd8"
209
  },
210
  "hf_model": {
211
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/publication_audit.json",
212
  "exists": true,
213
+ "bytes": 5624,
214
+ "sha256": "0b7db27a09446d851787fd59b6f552b720a04f38a604e22d2951c10041e0cdd8"
215
  }
216
  },
217
  "failures": []
 
223
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/quality_gates.json",
224
  "exists": true,
225
  "bytes": 4228,
226
+ "sha256": "42cd50ceb83503bbda33245dd1442ae956a9002f9f0d8a729b3b7d068217b836"
227
  },
228
  "mirrors": {
229
  "hf_space": {
230
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/quality_gates.json",
231
  "exists": true,
232
  "bytes": 4228,
233
+ "sha256": "42cd50ceb83503bbda33245dd1442ae956a9002f9f0d8a729b3b7d068217b836"
234
  },
235
  "hf_artifacts": {
236
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/quality_gates.json",
237
  "exists": true,
238
  "bytes": 4228,
239
+ "sha256": "42cd50ceb83503bbda33245dd1442ae956a9002f9f0d8a729b3b7d068217b836"
240
  },
241
  "hf_model": {
242
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/quality_gates.json",
243
  "exists": true,
244
  "bytes": 4228,
245
+ "sha256": "42cd50ceb83503bbda33245dd1442ae956a9002f9f0d8a729b3b7d068217b836"
246
  }
247
  },
248
  "failures": []
 
346
  "local": {
347
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/reviewer_packet.json",
348
  "exists": true,
349
+ "bytes": 5044,
350
+ "sha256": "9b99e1828b74ba2cf99f281925ee6c113d0c73d9e06e700e924513e391c83cd8"
351
  },
352
  "mirrors": {
353
  "hf_space": {
354
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/reviewer_packet.json",
355
  "exists": true,
356
+ "bytes": 5044,
357
+ "sha256": "9b99e1828b74ba2cf99f281925ee6c113d0c73d9e06e700e924513e391c83cd8"
358
  },
359
  "hf_artifacts": {
360
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/reviewer_packet.json",
361
  "exists": true,
362
+ "bytes": 5044,
363
+ "sha256": "9b99e1828b74ba2cf99f281925ee6c113d0c73d9e06e700e924513e391c83cd8"
364
  },
365
  "hf_model": {
366
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/reviewer_packet.json",
367
  "exists": true,
368
+ "bytes": 5044,
369
+ "sha256": "9b99e1828b74ba2cf99f281925ee6c113d0c73d9e06e700e924513e391c83cd8"
370
  }
371
  },
372
  "failures": []
 
378
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/scope_claims_audit.json",
379
  "exists": true,
380
  "bytes": 19964,
381
+ "sha256": "9f094a164b423aa9e51b90549ec0c1bc73a10dcf52c4f89d7144c9b15db53682"
382
  },
383
  "mirrors": {
384
  "hf_space": {
385
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/scope_claims_audit.json",
386
  "exists": true,
387
  "bytes": 19964,
388
+ "sha256": "9f094a164b423aa9e51b90549ec0c1bc73a10dcf52c4f89d7144c9b15db53682"
389
  },
390
  "hf_artifacts": {
391
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/scope_claims_audit.json",
392
  "exists": true,
393
  "bytes": 19964,
394
+ "sha256": "9f094a164b423aa9e51b90549ec0c1bc73a10dcf52c4f89d7144c9b15db53682"
395
  },
396
  "hf_model": {
397
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/scope_claims_audit.json",
398
  "exists": true,
399
  "bytes": 19964,
400
+ "sha256": "9f094a164b423aa9e51b90549ec0c1bc73a10dcf52c4f89d7144c9b15db53682"
401
  }
402
  },
403
  "failures": []
 
470
  "local": {
471
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/website_integrity.json",
472
  "exists": true,
473
+ "bytes": 6286,
474
+ "sha256": "aced09460f6ae6af5fd1962b5e028ff15b12e79f44907c772afbe53ca324d661"
475
  },
476
  "mirrors": {
477
  "hf_space": {
478
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/website_integrity.json",
479
  "exists": true,
480
+ "bytes": 6286,
481
+ "sha256": "aced09460f6ae6af5fd1962b5e028ff15b12e79f44907c772afbe53ca324d661"
482
  },
483
  "hf_artifacts": {
484
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/website_integrity.json",
485
  "exists": true,
486
+ "bytes": 6286,
487
+ "sha256": "aced09460f6ae6af5fd1962b5e028ff15b12e79f44907c772afbe53ca324d661"
488
  },
489
  "hf_model": {
490
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/website_integrity.json",
491
  "exists": true,
492
+ "bytes": 6286,
493
+ "sha256": "aced09460f6ae6af5fd1962b5e028ff15b12e79f44907c772afbe53ca324d661"
494
+ }
495
+ },
496
+ "failures": []
497
+ },
498
+ {
499
+ "name": "data/xperience10m_dataset_card_alignment.json",
500
+ "status": "pass",
501
+ "local": {
502
+ "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/xperience10m_dataset_card_alignment.json",
503
+ "exists": true,
504
+ "bytes": 5103,
505
+ "sha256": "157f8616cb6cb45ad4d72bc371d4d68c60a990340e4257a4d7e874c577d44f24"
506
+ },
507
+ "mirrors": {
508
+ "hf_space": {
509
+ "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/xperience10m_dataset_card_alignment.json",
510
+ "exists": true,
511
+ "bytes": 5103,
512
+ "sha256": "157f8616cb6cb45ad4d72bc371d4d68c60a990340e4257a4d7e874c577d44f24"
513
+ },
514
+ "hf_artifacts": {
515
+ "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/xperience10m_dataset_card_alignment.json",
516
+ "exists": true,
517
+ "bytes": 5103,
518
+ "sha256": "157f8616cb6cb45ad4d72bc371d4d68c60a990340e4257a4d7e874c577d44f24"
519
+ },
520
+ "hf_model": {
521
+ "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/xperience10m_dataset_card_alignment.json",
522
+ "exists": true,
523
+ "bytes": 5103,
524
+ "sha256": "157f8616cb6cb45ad4d72bc371d4d68c60a990340e4257a4d7e874c577d44f24"
525
  }
526
  },
527
  "failures": []
 
902
  "local": {
903
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/scripts/build_artifact_index.py",
904
  "exists": true,
905
+ "bytes": 13641,
906
+ "sha256": "3d0a88e0c2212913699c13362027eb97a0cd84789a47e72b41d43cf3d2d6545b"
907
  },
908
  "mirrors": {
909
  "hf_artifacts": {
910
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/scripts/build_artifact_index.py",
911
  "exists": true,
912
+ "bytes": 13641,
913
+ "sha256": "3d0a88e0c2212913699c13362027eb97a0cd84789a47e72b41d43cf3d2d6545b"
914
  },
915
  "hf_model": {
916
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/scripts/build_artifact_index.py",
917
  "exists": true,
918
+ "bytes": 13641,
919
+ "sha256": "3d0a88e0c2212913699c13362027eb97a0cd84789a47e72b41d43cf3d2d6545b"
920
  }
921
  },
922
  "failures": []
 
952
  "local": {
953
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/scripts/verify_live_publication.py",
954
  "exists": true,
955
+ "bytes": 11753,
956
+ "sha256": "297364d079c1eea4e790fd7b2f8ae42ddd7d93aa28d7a2362806729789813626"
957
  },
958
  "mirrors": {
959
  "hf_artifacts": {
960
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/scripts/verify_live_publication.py",
961
  "exists": true,
962
+ "bytes": 11753,
963
+ "sha256": "297364d079c1eea4e790fd7b2f8ae42ddd7d93aa28d7a2362806729789813626"
964
  },
965
  "hf_model": {
966
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/scripts/verify_live_publication.py",
967
  "exists": true,
968
+ "bytes": 11753,
969
+ "sha256": "297364d079c1eea4e790fd7b2f8ae42ddd7d93aa28d7a2362806729789813626"
970
  }
971
  },
972
  "failures": []
 
977
  "local": {
978
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/scripts/validate_mirror_parity.py",
979
  "exists": true,
980
+ "bytes": 8517,
981
+ "sha256": "990b2d29ae7623a0f184c9fba8560604aef0d6311617a54cb0de94bd4fd48305"
982
  },
983
  "mirrors": {
984
  "hf_artifacts": {
985
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/scripts/validate_mirror_parity.py",
986
  "exists": true,
987
+ "bytes": 8517,
988
+ "sha256": "990b2d29ae7623a0f184c9fba8560604aef0d6311617a54cb0de94bd4fd48305"
989
  },
990
  "hf_model": {
991
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/scripts/validate_mirror_parity.py",
992
  "exists": true,
993
+ "bytes": 8517,
994
+ "sha256": "990b2d29ae7623a0f184c9fba8560604aef0d6311617a54cb0de94bd4fd48305"
995
  }
996
  },
997
  "failures": []
 
1002
  "local": {
1003
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/scripts/validate_publication_package.py",
1004
  "exists": true,
1005
+ "bytes": 13018,
1006
+ "sha256": "730bd76ebadf907045fb713f549dd132c80b292a4ca6bcf52dee1bac4748cbd6"
1007
  },
1008
  "mirrors": {
1009
  "hf_artifacts": {
1010
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/scripts/validate_publication_package.py",
1011
  "exists": true,
1012
+ "bytes": 13018,
1013
+ "sha256": "730bd76ebadf907045fb713f549dd132c80b292a4ca6bcf52dee1bac4748cbd6"
1014
  },
1015
  "hf_model": {
1016
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/scripts/validate_publication_package.py",
1017
  "exists": true,
1018
+ "bytes": 13018,
1019
+ "sha256": "730bd76ebadf907045fb713f549dd132c80b292a4ca6bcf52dee1bac4748cbd6"
1020
  }
1021
  },
1022
  "failures": []
 
1077
  "local": {
1078
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/index.html",
1079
  "exists": true,
1080
+ "bytes": 94421,
1081
+ "sha256": "a8ff86e5b6f5898ffa807255986eb430d238da6a3bdbc1915c438a1d38d9dc82"
1082
  },
1083
  "mirrors": {
1084
  "hf_space": {
1085
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/index.html",
1086
  "exists": true,
1087
+ "bytes": 94421,
1088
+ "sha256": "a8ff86e5b6f5898ffa807255986eb430d238da6a3bdbc1915c438a1d38d9dc82"
1089
  },
1090
  "hf_artifacts_docs": {
1091
  "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/index.html",
1092
  "exists": true,
1093
+ "bytes": 94421,
1094
+ "sha256": "a8ff86e5b6f5898ffa807255986eb430d238da6a3bdbc1915c438a1d38d9dc82"
1095
  }
1096
  },
1097
  "failures": []
 
1126
  }
1127
  },
1128
  "failures": []
1129
+ },
1130
+ {
1131
+ "name": "docs/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
1132
+ "status": "pass",
1133
+ "local": {
1134
+ "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
1135
+ "exists": true,
1136
+ "bytes": 7654,
1137
+ "sha256": "0866357a6d9922c961dc89a872b88b9517a37adf7b8130bfeb64b471045d01da"
1138
+ },
1139
+ "mirrors": {
1140
+ "hf_space": {
1141
+ "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
1142
+ "exists": true,
1143
+ "bytes": 7654,
1144
+ "sha256": "0866357a6d9922c961dc89a872b88b9517a37adf7b8130bfeb64b471045d01da"
1145
+ },
1146
+ "hf_artifacts": {
1147
+ "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
1148
+ "exists": true,
1149
+ "bytes": 7654,
1150
+ "sha256": "0866357a6d9922c961dc89a872b88b9517a37adf7b8130bfeb64b471045d01da"
1151
+ },
1152
+ "hf_model": {
1153
+ "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
1154
+ "exists": true,
1155
+ "bytes": 7654,
1156
+ "sha256": "0866357a6d9922c961dc89a872b88b9517a37adf7b8130bfeb64b471045d01da"
1157
+ }
1158
+ },
1159
+ "failures": []
1160
  }
1161
  ],
1162
  "failures": []
metrics/project_manifest.json CHANGED
@@ -30,8 +30,18 @@
30
  "xperience10m_hf": "https://huggingface.co/datasets/ropedia-ai/xperience-10m",
31
  "xperience10m_sample_hf": "https://huggingface.co/datasets/ropedia-ai/xperience-10m-sample"
32
  },
33
  "evidence_files": {
34
  "artifact_guide": "ARTIFACT_GUIDE.md",
 
 
35
  "reproducibility_contract": "REPRODUCIBILITY.md",
36
  "reproducibility_matrix": "docs/data/reproducibility_matrix.json",
37
  "evidence_contract": "docs/data/evidence_contract.json",
 
30
  "xperience10m_hf": "https://huggingface.co/datasets/ropedia-ai/xperience-10m",
31
  "xperience10m_sample_hf": "https://huggingface.co/datasets/ropedia-ai/xperience-10m-sample"
32
  },
33
+ "upstream_dataset_card_alignment": {
34
+ "source_repo": "ropedia-ai/xperience-10m",
35
+ "source_url": "https://huggingface.co/datasets/ropedia-ai/xperience-10m",
36
+ "observed_last_modified": "2026-04-21T05:03:45.000Z",
37
+ "observed_access": "manual gated access for approved non-commercial use",
38
+ "alignment_doc": "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
39
+ "alignment_json": "docs/data/xperience10m_dataset_card_alignment.json"
40
+ },
41
  "evidence_files": {
42
  "artifact_guide": "ARTIFACT_GUIDE.md",
43
+ "official_dataset_card_alignment": "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
44
+ "official_dataset_card_alignment_json": "docs/data/xperience10m_dataset_card_alignment.json",
45
  "reproducibility_contract": "REPRODUCIBILITY.md",
46
  "reproducibility_matrix": "docs/data/reproducibility_matrix.json",
47
  "evidence_contract": "docs/data/evidence_contract.json",
metrics/publication_audit.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-01T07:33:21+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
@@ -45,6 +45,7 @@
45
  "codemeta.json": true,
46
  "ARTIFACT_GUIDE.md": true,
47
  "QUALITY_GATES.md": true,
 
48
  "REPRODUCIBILITY.md": true,
49
  "EVIDENCE_CONTRACT.md": true,
50
  "DATA_NOTICE.md": true,
@@ -59,6 +60,7 @@
59
  "docs/data/quality_gates.json": true,
60
  "docs/data/project_manifest.json": true,
61
  "docs/data/reviewer_packet.json": true,
 
62
  "docs/data/reproducibility_matrix.json": true,
63
  "docs/data/modality_atlas.json": true,
64
  "docs/data/mirror_parity.json": true,
@@ -95,7 +97,7 @@
95
  "surface": "github_repo",
96
  "path": "README.md",
97
  "exists": true,
98
- "required_marker_count": 3,
99
  "missing_markers": [],
100
  "status": "pass"
101
  },
@@ -103,7 +105,7 @@
103
  "surface": "hf_space_bundle",
104
  "path": "README.md",
105
  "exists": true,
106
- "required_marker_count": 4,
107
  "missing_markers": [],
108
  "status": "pass"
109
  },
@@ -111,7 +113,7 @@
111
  "surface": "hf_artifact_bundle",
112
  "path": "README.md",
113
  "exists": true,
114
- "required_marker_count": 3,
115
  "missing_markers": [],
116
  "status": "pass"
117
  },
@@ -119,7 +121,7 @@
119
  "surface": "hf_artifact_bundle",
120
  "path": "PROJECT_README.md",
121
  "exists": true,
122
- "required_marker_count": 3,
123
  "missing_markers": [],
124
  "status": "pass"
125
  },
@@ -127,7 +129,7 @@
127
  "surface": "hf_model_bundle",
128
  "path": "README.md",
129
  "exists": true,
130
- "required_marker_count": 4,
131
  "missing_markers": [],
132
  "status": "pass"
133
  }
@@ -136,8 +138,8 @@
136
  "github_repo": {
137
  "root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy",
138
  "exists": true,
139
- "file_count": 291,
140
- "text_file_count": 236,
141
  "largest_file": {
142
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
143
  "bytes": 52601010
@@ -147,8 +149,8 @@
147
  "hf_space_bundle": {
148
  "root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space",
149
  "exists": true,
150
- "file_count": 54,
151
- "text_file_count": 41,
152
  "largest_file": {
153
  "path": "assets/task_suite_infographic.png",
154
  "bytes": 2600527
@@ -158,8 +160,8 @@
158
  "hf_artifact_bundle": {
159
  "root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts",
160
  "exists": true,
161
- "file_count": 271,
162
- "text_file_count": 229,
163
  "largest_file": {
164
  "path": "results/episode_task_suite/neural_mlp/temporal_order/model.pt",
165
  "bytes": 13406129
@@ -169,8 +171,8 @@
169
  "hf_model_bundle": {
170
  "root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model",
171
  "exists": true,
172
- "file_count": 203,
173
- "text_file_count": 160,
174
  "largest_file": {
175
  "path": "artifacts/episode_task_suite/cross_modal_retrieval/model.npz",
176
  "bytes": 41310574
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-01T08:04:40+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
 
45
  "codemeta.json": true,
46
  "ARTIFACT_GUIDE.md": true,
47
  "QUALITY_GATES.md": true,
48
+ "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md": true,
49
  "REPRODUCIBILITY.md": true,
50
  "EVIDENCE_CONTRACT.md": true,
51
  "DATA_NOTICE.md": true,
 
60
  "docs/data/quality_gates.json": true,
61
  "docs/data/project_manifest.json": true,
62
  "docs/data/reviewer_packet.json": true,
63
+ "docs/data/xperience10m_dataset_card_alignment.json": true,
64
  "docs/data/reproducibility_matrix.json": true,
65
  "docs/data/modality_atlas.json": true,
66
  "docs/data/mirror_parity.json": true,
 
97
  "surface": "github_repo",
98
  "path": "README.md",
99
  "exists": true,
100
+ "required_marker_count": 4,
101
  "missing_markers": [],
102
  "status": "pass"
103
  },
 
105
  "surface": "hf_space_bundle",
106
  "path": "README.md",
107
  "exists": true,
108
+ "required_marker_count": 5,
109
  "missing_markers": [],
110
  "status": "pass"
111
  },
 
113
  "surface": "hf_artifact_bundle",
114
  "path": "README.md",
115
  "exists": true,
116
+ "required_marker_count": 4,
117
  "missing_markers": [],
118
  "status": "pass"
119
  },
 
121
  "surface": "hf_artifact_bundle",
122
  "path": "PROJECT_README.md",
123
  "exists": true,
124
+ "required_marker_count": 4,
125
  "missing_markers": [],
126
  "status": "pass"
127
  },
 
129
  "surface": "hf_model_bundle",
130
  "path": "README.md",
131
  "exists": true,
132
+ "required_marker_count": 5,
133
  "missing_markers": [],
134
  "status": "pass"
135
  }
 
138
  "github_repo": {
139
  "root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy",
140
  "exists": true,
141
+ "file_count": 293,
142
+ "text_file_count": 238,
143
  "largest_file": {
144
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
145
  "bytes": 52601010
 
149
  "hf_space_bundle": {
150
  "root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space",
151
  "exists": true,
152
+ "file_count": 56,
153
+ "text_file_count": 43,
154
  "largest_file": {
155
  "path": "assets/task_suite_infographic.png",
156
  "bytes": 2600527
 
160
  "hf_artifact_bundle": {
161
  "root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts",
162
  "exists": true,
163
+ "file_count": 273,
164
+ "text_file_count": 231,
165
  "largest_file": {
166
  "path": "results/episode_task_suite/neural_mlp/temporal_order/model.pt",
167
  "bytes": 13406129
 
171
  "hf_model_bundle": {
172
  "root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model",
173
  "exists": true,
174
+ "file_count": 205,
175
+ "text_file_count": 162,
176
  "largest_file": {
177
  "path": "artifacts/episode_task_suite/cross_modal_retrieval/model.npz",
178
  "bytes": 41310574
metrics/quality_gates.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Publication Quality Gates",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-01T06:49:16+00:00",
5
  "rule": "Do not present a release as current unless every automated gate passes, then verify live GitHub/HF mirrors after publishing.",
6
  "automated_gates": [
7
  {
 
1
  {
2
  "title": "Ropedia Xperience-10M Publication Quality Gates",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-01T08:03:23+00:00",
5
  "rule": "Do not present a release as current unless every automated gate passes, then verify live GitHub/HF mirrors after publishing.",
6
  "automated_gates": [
7
  {
metrics/reviewer_packet.json CHANGED
@@ -21,8 +21,10 @@
21
  "primary_artifacts": [
22
  "EVIDENCE_CONTRACT.md",
23
  "ARTIFACT_GUIDE.md",
 
24
  "docs/data/evidence_contract.json",
25
  "docs/data/artifact_index.json",
 
26
  "docs/data/mirror_parity.json",
27
  "docs/data/publication_audit.json",
28
  "docs/data/scope_claims_audit.json",
@@ -32,6 +34,16 @@
32
  },
33
  {
34
  "step": 2,
35
  "question": "How can the public pipeline be reproduced?",
36
  "primary_artifacts": [
37
  "REPRODUCIBILITY.md",
@@ -41,7 +53,7 @@
41
  "readout": "The public sample pipeline has explicit commands, expected outputs, and a prior exact-match audit over the committed metrics."
42
  },
43
  {
44
- "step": 3,
45
  "question": "What is inside one model input?",
46
  "primary_artifacts": [
47
  "results/episode_task_suite/windows.csv",
@@ -52,7 +64,7 @@
52
  "readout": "The current model input is an 8,378-dimensional aligned window vector with explicit feature-block boundaries, and the readable atlas shows each public-sample modality without raw data redistribution."
53
  },
54
  {
55
- "step": 4,
56
  "question": "Do the task metrics have committed evidence?",
57
  "primary_artifacts": [
58
  "results/episode_task_suite/summary_report.json",
@@ -62,7 +74,7 @@
62
  "readout": "Each of the 12 tasks has minimal-head metrics and a matching neural MLP result over the same window contracts."
63
  },
64
  {
65
- "step": 5,
66
  "question": "How should this scale beyond one episode?",
67
  "primary_artifacts": [
68
  "results/omni_finetune/DATA_BLOCKER_REPORT.md",
 
21
  "primary_artifacts": [
22
  "EVIDENCE_CONTRACT.md",
23
  "ARTIFACT_GUIDE.md",
24
+ "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
25
  "docs/data/evidence_contract.json",
26
  "docs/data/artifact_index.json",
27
+ "docs/data/xperience10m_dataset_card_alignment.json",
28
  "docs/data/mirror_parity.json",
29
  "docs/data/publication_audit.json",
30
  "docs/data/scope_claims_audit.json",
 
34
  },
35
  {
36
  "step": 2,
37
+ "question": "What does the official Xperience-10M dataset card say?",
38
+ "primary_artifacts": [
39
+ "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
40
+ "docs/data/xperience10m_dataset_card_alignment.json",
41
+ "https://huggingface.co/datasets/ropedia-ai/xperience-10m"
42
+ ],
43
+ "readout": "The full upstream dataset is a manually gated large-scale 4D multimodal egocentric source; this repo validates only one public sample episode and records unsupported areas explicitly."
44
+ },
45
+ {
46
+ "step": 3,
47
  "question": "How can the public pipeline be reproduced?",
48
  "primary_artifacts": [
49
  "REPRODUCIBILITY.md",
 
53
  "readout": "The public sample pipeline has explicit commands, expected outputs, and a prior exact-match audit over the committed metrics."
54
  },
55
  {
56
+ "step": 4,
57
  "question": "What is inside one model input?",
58
  "primary_artifacts": [
59
  "results/episode_task_suite/windows.csv",
 
64
  "readout": "The current model input is an 8,378-dimensional aligned window vector with explicit feature-block boundaries, and the readable atlas shows each public-sample modality without raw data redistribution."
65
  },
66
  {
67
+ "step": 5,
68
  "question": "Do the task metrics have committed evidence?",
69
  "primary_artifacts": [
70
  "results/episode_task_suite/summary_report.json",
 
74
  "readout": "Each of the 12 tasks has minimal-head metrics and a matching neural MLP result over the same window contracts."
75
  },
76
  {
77
+ "step": 6,
78
  "question": "How should this scale beyond one episode?",
79
  "primary_artifacts": [
80
  "results/omni_finetune/DATA_BLOCKER_REPORT.md",
metrics/scope_claims_audit.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-01T07:19:55+00:00",
4
  "summary": {
5
  "qwen3_omni_32_episode_claim": false,
6
  "dataset_manifest_num_episodes": 1,
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-01T07:59:42+00:00",
4
  "summary": {
5
  "qwen3_omni_32_episode_claim": false,
6
  "dataset_manifest_num_episodes": 1,
metrics/website_integrity.json CHANGED
@@ -1,13 +1,13 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-01T07:19:57+00:00",
4
  "docs_root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs",
5
  "site_base": "/ropedia-xperience-10m-task-suite/",
6
  "summary": {
7
  "html_pages": 2,
8
- "local_references": 61,
9
- "external_reference_count": 58,
10
- "json_files": 16,
11
  "image_assets_referenced": 18,
12
  "failure_count": 0
13
  },
@@ -43,30 +43,30 @@
43
  },
44
  {
45
  "path": "index.html",
46
- "id_count": 31,
47
- "reference_count": 60,
48
  "image_count": 20
49
  }
50
  ],
51
  "json_files": [
52
  {
53
  "path": "data/artifact_index.json",
54
- "bytes": 14655,
55
  "top_level_type": "dict"
56
  },
57
  {
58
  "path": "data/evidence_contract.json",
59
- "bytes": 7954,
60
  "top_level_type": "dict"
61
  },
62
  {
63
  "path": "data/live_publication_status.json",
64
- "bytes": 13893,
65
  "top_level_type": "dict"
66
  },
67
  {
68
  "path": "data/mirror_parity.json",
69
- "bytes": 48916,
70
  "top_level_type": "dict"
71
  },
72
  {
@@ -76,7 +76,7 @@
76
  },
77
  {
78
  "path": "data/project_manifest.json",
79
- "bytes": 2789,
80
  "top_level_type": "dict"
81
  },
82
  {
@@ -106,7 +106,7 @@
106
  },
107
  {
108
  "path": "data/reviewer_packet.json",
109
- "bytes": 4406,
110
  "top_level_type": "dict"
111
  },
112
  {
@@ -128,6 +128,11 @@
128
  "path": "data/website_integrity.json",
129
  "bytes": 6159,
130
  "top_level_type": "dict"
 
 
 
 
 
131
  }
132
  ],
133
  "images": [
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-01T07:59:42+00:00",
4
  "docs_root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs",
5
  "site_base": "/ropedia-xperience-10m-task-suite/",
6
  "summary": {
7
  "html_pages": 2,
8
+ "local_references": 67,
9
+ "external_reference_count": 63,
10
+ "json_files": 17,
11
  "image_assets_referenced": 18,
12
  "failure_count": 0
13
  },
 
43
  },
44
  {
45
  "path": "index.html",
46
+ "id_count": 32,
47
+ "reference_count": 66,
48
  "image_count": 20
49
  }
50
  ],
51
  "json_files": [
52
  {
53
  "path": "data/artifact_index.json",
54
+ "bytes": 14654,
55
  "top_level_type": "dict"
56
  },
57
  {
58
  "path": "data/evidence_contract.json",
59
+ "bytes": 8483,
60
  "top_level_type": "dict"
61
  },
62
  {
63
  "path": "data/live_publication_status.json",
64
+ "bytes": 9711,
65
  "top_level_type": "dict"
66
  },
67
  {
68
  "path": "data/mirror_parity.json",
69
+ "bytes": 48912,
70
  "top_level_type": "dict"
71
  },
72
  {
 
76
  },
77
  {
78
  "path": "data/project_manifest.json",
79
+ "bytes": 3411,
80
  "top_level_type": "dict"
81
  },
82
  {
 
106
  },
107
  {
108
  "path": "data/reviewer_packet.json",
109
+ "bytes": 5044,
110
  "top_level_type": "dict"
111
  },
112
  {
 
128
  "path": "data/website_integrity.json",
129
  "bytes": 6159,
130
  "top_level_type": "dict"
131
+ },
132
+ {
133
+ "path": "data/xperience10m_dataset_card_alignment.json",
134
+ "bytes": 5103,
135
+ "top_level_type": "dict"
136
  }
137
  ],
138
  "images": [
metrics/xperience10m_dataset_card_alignment.json ADDED
@@ -0,0 +1,143 @@
1
+ {
2
+ "title": "Xperience-10M Official Dataset Card Alignment",
3
+ "checked_at_utc": "2026-06-01T00:00:00+00:00",
4
+ "source_urls": {
5
+ "official_hf_dataset": "https://huggingface.co/datasets/ropedia-ai/xperience-10m",
6
+ "official_hf_api": "https://huggingface.co/api/datasets/ropedia-ai/xperience-10m",
7
+ "official_sample": "https://huggingface.co/datasets/ropedia-ai/xperience-10m-sample",
8
+ "ropedia_dataset_site": "https://ropedia.com/dataset",
9
+ "ropedia_release_page": "https://ropedia.com/blog/20260316_xperience_10m",
10
+ "homie_toolkit": "https://github.com/Ropedia/HOMIE-toolkit"
11
+ },
12
+ "hf_repo_metadata_observed": {
13
+ "repo_id": "ropedia-ai/xperience-10m",
14
+ "last_modified": "2026-04-21T05:03:45.000Z",
15
+ "gated": "manual",
16
+ "task_categories": [
17
+ "video-classification",
18
+ "image-to-text",
19
+ "depth-estimation",
20
+ "robotics"
21
+ ],
22
+ "modalities": [
23
+ "3d",
24
+ "audio",
25
+ "video"
26
+ ],
27
+ "language": [
28
+ "en"
29
+ ],
30
+ "size_categories": [
31
+ "1M<n<10M"
32
+ ],
33
+ "license": "other",
34
+ "access_note": "Reviewed gated access for approved non-commercial use; an external agreement-signing step may be required before approval."
35
+ },
36
+ "official_dataset_summary": {
37
+ "description": "Large-scale egocentric multimodal human-experience data for embodied AI, robotics, world models, and spatial intelligence.",
38
+ "experience_units": "about 10 million",
39
+ "recording_hours": "about 10,000",
40
+ "storage_described_by_card": "about 1 PB"
41
+ },
42
+ "official_scale_statistics": {
43
+ "rgb_frames": "about 2.88 billion",
44
+ "depth_frames": "about 720 million",
45
+ "camera_pose_records": "about 576 million",
46
+ "motion_capture_frames": "about 576 million",
47
+ "imu_records": "about 7.2 billion",
48
+ "caption_sentences": "about 16 million",
49
+ "caption_words": "about 200 million",
50
+ "vocabulary_words": "about 6,000",
51
+ "object_annotations": "about 350,000",
52
+ "trajectory_distance": "about 39,000 km"
53
+ },
54
+ "official_modalities": [
55
+ "six RGB video streams: four fisheye views and two rectified stereo views",
56
+ "audio embedded in the video streams",
57
+ "stereo depth and confidence",
58
+ "camera pose, SLAM trajectory, and point cloud",
59
+ "two-hand motion capture",
60
+ "full-body motion capture",
61
+ "inertial accelerometer and gyroscope streams",
62
+ "hierarchical language and caption annotations",
63
+ "metadata and calibration records"
64
+ ],
65
+ "episode_layout": {
66
+ "folder_pattern": "<session_uuid>/ep<episode_id>/",
67
+ "required_for_valid_episode_in_this_repo": [
68
+ "annotation.hdf5"
69
+ ],
70
+ "preferred_for_full_omni_in_this_repo": [
71
+ "fisheye_cam0.mp4",
72
+ "fisheye_cam1.mp4",
73
+ "fisheye_cam2.mp4",
74
+ "fisheye_cam3.mp4",
75
+ "stereo_left.mp4",
76
+ "stereo_right.mp4"
77
+ ],
78
+ "optional_or_excluded": [
79
+ "visualization.rrd"
80
+ ]
81
+ },
82
+ "annotation_hdf5_groups": [
83
+ "calibration",
84
+ "slam / camera pose",
85
+ "depth",
86
+ "hand_mocap",
87
+ "full_body_mocap",
88
+ "imu",
89
+ "video timing",
90
+ "metadata",
91
+ "caption / language annotations"
92
+ ],
93
+ "official_intended_uses": [
94
+ "egocentric video and action understanding",
95
+ "task and subtask recognition",
96
+ "temporal action localization",
97
+ "human-object interaction analysis",
98
+ "object grounding and caption/language grounding",
99
+ "audio-visual learning and multimodal pretraining",
100
+ "embodied reasoning and world-model learning",
101
+ "robotics imitation learning",
102
+ "depth estimation, odometry, SLAM, and scene reconstruction",
103
+ "hand/body pose and human motion understanding",
104
+ "sensor fusion"
105
+ ],
106
+ "current_repo_alignment": {
107
+ "validated_episode_count": 1,
108
+ "validated_frames": 5821,
109
+ "validated_windows": 1161,
110
+ "current_feature_dim": 8378,
111
+ "raw_data_redistributed": false,
112
+ "audio_feature_status": "Audio is present in the sample MP4 streams and visualized, but not extracted into the current baseline feature vector.",
113
+ "implemented_task_count": 12,
114
+ "neural_head_count": 12,
115
+ "covered_by_current_tasks": [
116
+ "action/subtask recognition",
117
+ "next-action prediction",
118
+ "transition and temporal diagnostics",
119
+ "hand trajectory forecasting",
120
+ "contact prediction",
121
+ "object relevance",
122
+ "caption grounding",
123
+ "cross-modal retrieval",
124
+ "modality reconstruction",
125
+ "misalignment detection"
126
+ ],
127
+ "not_yet_claimed": [
128
+ "full audio-visual learning",
129
+ "caption generation",
130
+ "depth-pixel estimation",
131
+ "SLAM estimation",
132
+ "neural rendering",
133
+ "policy learning",
134
+ "cross-episode generalization",
135
+ "real 32-episode Qwen3-Omni model quality"
136
+ ]
137
+ },
138
+ "responsible_use_boundary": [
139
+ "No raw MP4, raw annotation.hdf5, private gated data, raw visualization.rrd, or full Qwen weights are redistributed.",
140
+ "The project does not support identity recognition, re-identification, biometric profiling, surveillance, sensitive attribute inference, or safety-critical deployment.",
141
+ "Dataset use remains governed by the official Ropedia/Xperience-10M terms."
142
+ ]
143
+ }
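The alignment file above separates what the current tasks cover from what is not yet claimed. A minimal sketch of how a reader might sanity-check that boundary follows. The function name `alignment_problems` and the two rules it applies are assumptions made for illustration, not part of the committed scripts, and the `sample` document is a trimmed stand-in rather than the full file.

```python
import json


def alignment_problems(doc: dict) -> list:
    """Return human-readable problems found in an alignment document.

    Hypothetical helper: checks only two boundaries that the JSON above
    states explicitly, using the key names that appear in that file.
    """
    repo = doc.get("current_repo_alignment", {})
    problems = []
    # The file records that no raw data is redistributed.
    if repo.get("raw_data_redistributed") is not False:
        problems.append("raw_data_redistributed must be false")
    # An area should not be listed as both covered and not yet claimed.
    covered = set(repo.get("covered_by_current_tasks", []))
    unclaimed = set(repo.get("not_yet_claimed", []))
    for item in sorted(covered & unclaimed):
        problems.append(f"listed as both covered and not yet claimed: {item}")
    return problems


# Trimmed stand-in for the committed JSON (values copied from the file above).
sample = json.loads("""
{
  "current_repo_alignment": {
    "raw_data_redistributed": false,
    "covered_by_current_tasks": ["caption grounding", "contact prediction"],
    "not_yet_claimed": ["caption generation", "policy learning"]
  }
}
""")

print(alignment_problems(sample))
```

To run the same check against the real file, replace `sample` with `json.load(open("docs/data/xperience10m_dataset_card_alignment.json"))`.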
scripts/build_artifact_index.py CHANGED
@@ -41,6 +41,22 @@ ARTIFACTS = [
41
  "surface": "repo_hf",
42
  "proves": "Gives the human-readable map from proof boundary to data, tasks, platform mirrors, and scale-up status.",
43
  },
44
  {
45
  "id": "quality_gates",
46
  "title": "Publication quality gates",
 
41
  "surface": "repo_hf",
42
  "proves": "Gives the human-readable map from proof boundary to data, tasks, platform mirrors, and scale-up status.",
43
  },
44
+ {
45
+ "id": "official_dataset_card_alignment",
46
+ "title": "Official Xperience-10M dataset-card alignment",
47
+ "path": "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
48
+ "kind": "source_alignment",
49
+ "surface": "repo_hf",
50
+ "proves": "Aligns public dataset wording with the official gated Xperience-10M dataset card and records unsupported areas.",
51
+ },
52
+ {
53
+ "id": "official_dataset_card_alignment_json",
54
+ "title": "Official Xperience-10M dataset-card alignment JSON",
55
+ "path": "docs/data/xperience10m_dataset_card_alignment.json",
56
+ "kind": "source_alignment",
57
+ "surface": "website_hf",
58
+ "proves": "Machine-readable upstream dataset-card alignment facts for website and HF mirrors.",
59
+ },
60
  {
61
  "id": "quality_gates",
62
  "title": "Publication quality gates",
scripts/validate_mirror_parity.py CHANGED
@@ -35,6 +35,7 @@ DATA_FILES = [
35
  "summary_metrics.json",
36
  "task_walkthroughs.json",
37
  "website_integrity.json",
 
38
  ]
39
 
40
  ASSET_FILES = [
@@ -66,6 +67,7 @@ WEBSITE_FILES = [
66
 
67
  DOC_FILES = [
68
  "QUALITY_GATES.md",
 
69
  ]
70
 
71
 
 
35
  "summary_metrics.json",
36
  "task_walkthroughs.json",
37
  "website_integrity.json",
38
+ "xperience10m_dataset_card_alignment.json",
39
  ]
40
 
41
  ASSET_FILES = [
 
67
 
68
  DOC_FILES = [
69
  "QUALITY_GATES.md",
70
+ "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
71
  ]
72
 
73
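The file lists in this script feed the parity report shown earlier in this commit, where each entry records `exists`, `bytes`, and `sha256` for the local copy and for every mirror. The comparison can be sketched as below. This is an illustration under assumptions, not the body of `validate_mirror_parity.py`: the helper names `file_fingerprint` and `parity_failures` and the wording of the failure messages are hypothetical.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path: Path) -> dict:
    """Return the three fields a parity entry records for one file."""
    if not path.is_file():
        return {"exists": False, "bytes": None, "sha256": None}
    data = path.read_bytes()
    return {
        "exists": True,
        "bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def parity_failures(local: dict, mirrors: dict) -> list:
    """List every mirror field that differs from the local fingerprint."""
    failures = []
    for name, mirror in mirrors.items():
        for key in ("exists", "bytes", "sha256"):
            if mirror.get(key) != local.get(key):
                failures.append(f"{name}: {key} mismatch")
    return failures
```

An empty list from `parity_failures` corresponds to the `"failures": []` value in the report entries. A byte-count match with a differing hash would surface as a single `sha256 mismatch` line.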
 
scripts/validate_publication_package.py CHANGED
@@ -53,6 +53,7 @@ CARD_FRESHNESS_EXPECTATIONS = [
53
  "relative_path": "README.md",
54
  "required": [
55
  "xperience10m-taskfirst-v12-modality-xl",
 
56
  "all 12 task families before the",
57
  "Public-sample modality thumbnails remain enlarged below",
58
  ],
@@ -62,6 +63,7 @@ CARD_FRESHNESS_EXPECTATIONS = [
62
  "relative_path": "README.md",
63
  "required": [
64
  "xperience10m-taskfirst-v12-modality-xl",
 
65
  "task-first 12-task infographic",
66
  "native responsive modality atlas",
67
  "website HTML",
@@ -72,6 +74,7 @@ CARD_FRESHNESS_EXPECTATIONS = [
72
  "relative_path": "README.md",
73
  "required": [
74
  "xperience10m-taskfirst-v12-modality-xl",
 
75
  "task-first 12-task map",
76
  "including critical website HTML",
77
  ],
@@ -81,6 +84,7 @@ CARD_FRESHNESS_EXPECTATIONS = [
81
  "relative_path": "PROJECT_README.md",
82
  "required": [
83
  "xperience10m-taskfirst-v12-modality-xl",
 
84
  "all 12 task families before the",
85
  "Public-sample modality thumbnails remain enlarged below",
86
  ],
@@ -90,6 +94,7 @@ CARD_FRESHNESS_EXPECTATIONS = [
90
  "relative_path": "README.md",
91
  "required": [
92
  "xperience10m-taskfirst-v12-modality-xl",
 
93
  "task-first 12-head",
94
  "responsive modality atlas",
95
  "website HTML",
@@ -194,6 +199,7 @@ def required_assets(root: Path) -> dict[str, bool]:
194
  "codemeta.json",
195
  "ARTIFACT_GUIDE.md",
196
  "QUALITY_GATES.md",
 
197
  "REPRODUCIBILITY.md",
198
  "EVIDENCE_CONTRACT.md",
199
  "DATA_NOTICE.md",
@@ -208,6 +214,7 @@ def required_assets(root: Path) -> dict[str, bool]:
208
  "docs/data/quality_gates.json",
209
  "docs/data/project_manifest.json",
210
  "docs/data/reviewer_packet.json",
 
211
  "docs/data/reproducibility_matrix.json",
212
  "docs/data/modality_atlas.json",
213
  "docs/data/mirror_parity.json",
 
53
  "relative_path": "README.md",
54
  "required": [
55
  "xperience10m-taskfirst-v12-modality-xl",
56
+ "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
57
  "all 12 task families before the",
58
  "Public-sample modality thumbnails remain enlarged below",
59
  ],
 
63
  "relative_path": "README.md",
64
  "required": [
65
  "xperience10m-taskfirst-v12-modality-xl",
66
+ "xperience10m_dataset_card_alignment.json",
67
  "task-first 12-task infographic",
68
  "native responsive modality atlas",
69
  "website HTML",
 
74
  "relative_path": "README.md",
75
  "required": [
76
  "xperience10m-taskfirst-v12-modality-xl",
77
+ "xperience10m_dataset_card_alignment.json",
78
  "task-first 12-task map",
79
  "including critical website HTML",
80
  ],
 
84
  "relative_path": "PROJECT_README.md",
85
  "required": [
86
  "xperience10m-taskfirst-v12-modality-xl",
87
+ "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
88
  "all 12 task families before the",
89
  "Public-sample modality thumbnails remain enlarged below",
90
  ],
 
94
  "relative_path": "README.md",
95
  "required": [
96
  "xperience10m-taskfirst-v12-modality-xl",
97
+ "xperience10m_dataset_card_alignment.json",
98
  "task-first 12-head",
99
  "responsive modality atlas",
100
  "website HTML",
 
199
  "codemeta.json",
200
  "ARTIFACT_GUIDE.md",
201
  "QUALITY_GATES.md",
202
+ "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
203
  "REPRODUCIBILITY.md",
204
  "EVIDENCE_CONTRACT.md",
205
  "DATA_NOTICE.md",
 
214
  "docs/data/quality_gates.json",
215
  "docs/data/project_manifest.json",
216
  "docs/data/reviewer_packet.json",
217
+ "docs/data/xperience10m_dataset_card_alignment.json",
218
  "docs/data/reproducibility_matrix.json",
219
  "docs/data/modality_atlas.json",
220
  "docs/data/mirror_parity.json",
scripts/verify_live_publication.py CHANGED
@@ -47,6 +47,17 @@ HASH_GROUPS = [
47
  "hf_model": "https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines/resolve/main/metrics/quality_gates.json",
48
  },
49
  },
50
  {
51
  "id": "quality_gates_markdown",
52
  "title": "Quality-gate Markdown",
@@ -69,6 +80,7 @@ MARKER_CHECKS = [
69
  "required": [
70
  "Release gates are explicit",
71
  "quality_gates.json",
 
72
  "xperience10m-taskfirst-v12-modality-xl",
73
  ],
74
  "forbidden": [
@@ -83,6 +95,7 @@ MARKER_CHECKS = [
83
  "required": [
84
  "Release gates are explicit",
85
  "quality_gates.json",
 
86
  "xperience10m-taskfirst-v12-modality-xl",
87
  ],
88
  "forbidden": [
@@ -94,14 +107,22 @@ MARKER_CHECKS = [
94
  "id": "hf_artifacts_card_current",
95
  "title": "HF artifact card links quality gates",
96
  "url": "https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts/raw/main/README.md",
97
- "required": ["QUALITY_GATES.md", "docs/data/quality_gates.json"],
 
 
 
 
98
  "forbidden": ["xperience10m-" + "taskfirst-v10"],
99
  },
100
  {
101
  "id": "hf_model_card_current",
102
  "title": "HF model card links quality gates",
103
  "url": "https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines/raw/main/README.md",
104
- "required": ["QUALITY_GATES.md", "metrics/quality_gates.json"],
 
 
 
 
105
  "forbidden": ["xperience10m-" + "taskfirst-v10"],
106
  },
107
  ]
 
47
  "hf_model": "https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines/resolve/main/metrics/quality_gates.json",
48
  },
49
  },
50
+ {
51
+ "id": "xperience10m_dataset_card_alignment_json",
52
+ "title": "Official Xperience-10M dataset-card alignment JSON",
53
+ "local_path": "docs/data/xperience10m_dataset_card_alignment.json",
54
+ "urls": {
55
+ "github_pages": "https://chaoyue0307.github.io/ropedia-xperience-10m-task-suite/data/xperience10m_dataset_card_alignment.json",
56
+ "hf_space": "https://huggingface.co/spaces/cy0307/ropedia-xperience-10m-task-suite/raw/main/data/xperience10m_dataset_card_alignment.json",
57
+ "hf_artifacts": "https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts/resolve/main/docs/data/xperience10m_dataset_card_alignment.json",
58
+ "hf_model": "https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines/resolve/main/metrics/xperience10m_dataset_card_alignment.json",
59
+ },
60
+ },
61
  {
62
  "id": "quality_gates_markdown",
63
  "title": "Quality-gate Markdown",
 
80
  "required": [
81
  "Release gates are explicit",
82
  "quality_gates.json",
83
+ "xperience10m_dataset_card_alignment.json",
84
  "xperience10m-taskfirst-v12-modality-xl",
85
  ],
86
  "forbidden": [
 
95
  "required": [
96
  "Release gates are explicit",
97
  "quality_gates.json",
98
+ "xperience10m_dataset_card_alignment.json",
99
  "xperience10m-taskfirst-v12-modality-xl",
100
  ],
101
  "forbidden": [
 
107
  "id": "hf_artifacts_card_current",
108
  "title": "HF artifact card links quality gates",
109
  "url": "https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts/raw/main/README.md",
110
+ "required": [
111
+ "QUALITY_GATES.md",
112
+ "docs/data/quality_gates.json",
113
+ "xperience10m_dataset_card_alignment.json",
114
+ ],
115
  "forbidden": ["xperience10m-" + "taskfirst-v10"],
116
  },
117
  {
118
  "id": "hf_model_card_current",
119
  "title": "HF model card links quality gates",
120
  "url": "https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines/raw/main/README.md",
121
+ "required": [
122
+ "QUALITY_GATES.md",
123
+ "metrics/quality_gates.json",
124
+ "xperience10m_dataset_card_alignment.json",
125
+ ],
126
  "forbidden": ["xperience10m-" + "taskfirst-v10"],
127
  },
128
  ]
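
The entries above suggest two kinds of live checks. A `HASH_GROUPS` entry pairs one `local_path` with several mirror `urls`, and a `MARKER_CHECKS` entry lists strings that a fetched page must (`required`) or must not (`forbidden`) contain. The sketch below shows how both could be evaluated under those assumptions. The function names and the injected `fetch` callable are hypothetical and are not taken from `scripts/verify_live_publication.py`, and SHA-256 is an assumed choice of hash.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable


def fetch_url(url: str) -> bytes:
    """Download a URL and return its raw bytes (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def evaluate_hash_group(
    group: dict, root: Path, fetch: Callable[[str], bytes] = fetch_url
) -> dict[str, bool]:
    """Map each mirror name to True when its bytes hash like the local file."""
    local_hash = hashlib.sha256((root / group["local_path"]).read_bytes()).hexdigest()
    return {
        name: hashlib.sha256(fetch(url)).hexdigest() == local_hash
        for name, url in group["urls"].items()
    }


def evaluate_marker_check(
    check: dict, fetch: Callable[[str], bytes] = fetch_url
) -> dict[str, list[str]]:
    """Report required markers that are missing and forbidden ones that appear."""
    text = fetch(check["url"]).decode("utf-8", errors="replace")
    return {
        "missing": [m for m in check["required"] if m not in text],
        "present_forbidden": [m for m in check.get("forbidden", []) if m in text],
    }
```

Passing `fetch` as a parameter keeps the comparison logic testable offline: a test can supply a dictionary lookup in place of `fetch_url`, while a live run uses the default.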