cy0307 committed 25 days ago

Commit 94a5118 (verified), 1 parent: d1e380e

Publish Ropedia Xperience-10M task baseline cards

Files changed (18)

ARTIFACT_GUIDE.md +15 -4
EVIDENCE_CONTRACT.md +15 -11
README.md +10 -0
XPERIENCE10M_DATASET_CARD_ALIGNMENT.md +170 -0
metrics/artifact_index.json +41 -18
metrics/evidence_contract.json +11 -0
metrics/mirror_parity.json +150 -88
metrics/project_manifest.json +10 -0
metrics/publication_audit.json +16 -14
metrics/quality_gates.json +1 -1
metrics/reviewer_packet.json +15 -3
metrics/scope_claims_audit.json +1 -1
metrics/website_integrity.json +17 -12
metrics/xperience10m_dataset_card_alignment.json +143 -0
scripts/build_artifact_index.py +16 -0
scripts/validate_mirror_parity.py +2 -0
scripts/validate_publication_package.py +7 -0
scripts/verify_live_publication.py +23 -2

ARTIFACT_GUIDE.md CHANGED

@@ -8,13 +8,15 @@ The project intentionally separates five layers:
 1. **Proof boundary:** what is claimed, what is smoke-only, and what remains
    gated by data access.
-2. **Data contract:** how one public Xperience-10M sample episode becomes
    aligned model windows and feature blocks.
-3. **Task evidence:** minimal and neural results for the 12 task contracts plus
    four research-direction extension probes.
-4. **Reproducibility:** public commands, expected outputs, and exact-match audit
    evidence for the single-episode pipeline.
-5. **Scale-up status:** scripts and reports for the planned 32-episode
    Qwen3-Omni pilot, without claiming those results before data access lands.
 ## Start Here
@@ -23,8 +25,10 @@ The project intentionally separates five layers:
 | --- | --- |
 | [`EVIDENCE_CONTRACT.md`](EVIDENCE_CONTRACT.md) | Defines which claims are verified and which are explicitly not claimed. |
 | [`QUALITY_GATES.md`](QUALITY_GATES.md) | Lists the automated release gates and post-publish checks required before presenting a release as current. |
 | [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md) | Defines public reproduction commands, expected outputs, and unreproducible boundaries. |
 | [`docs/data/artifact_index.json`](docs/data/artifact_index.json) | Lists reviewer-critical files with existence, size, and stable hashes. |
 | [`docs/data/quality_gates.json`](docs/data/quality_gates.json) | Machine-readable quality-gate summary for website and HF mirrors. |
 | [`docs/data/live_publication_status.json`](docs/data/live_publication_status.json) | Last live GitHub/HF verification after upload. |
 | [`docs/data/mirror_parity.json`](docs/data/mirror_parity.json) | Confirms prepared HF Space, artifact, and model mirrors match the repo for critical data, figures, website HTML, and validator scripts. |
@@ -33,6 +37,13 @@ The project intentionally separates five layers:
 | [`docs/data/website_integrity.json`](docs/data/website_integrity.json) | Confirms local site links, anchors, JSON bundles, and referenced images resolve. |
 | [`docs/data/reviewer_packet.json`](docs/data/reviewer_packet.json) | Gives the shortest machine-readable reviewer route. |
 ## Data Contract
 | Artifact | What it proves |

 1. **Proof boundary:** what is claimed, what is smoke-only, and what remains
    gated by data access.
+2. **Official source alignment:** what the upstream Xperience-10M dataset card
+   says, and which parts this repo currently covers.
+3. **Data contract:** how one public Xperience-10M sample episode becomes
    aligned model windows and feature blocks.
+4. **Task evidence:** minimal and neural results for the 12 task contracts plus
    four research-direction extension probes.
+5. **Reproducibility:** public commands, expected outputs, and exact-match audit
    evidence for the single-episode pipeline.
+6. **Scale-up status:** scripts and reports for the planned 32-episode
    Qwen3-Omni pilot, without claiming those results before data access lands.
 ## Start Here
 | --- | --- |
 | [`EVIDENCE_CONTRACT.md`](EVIDENCE_CONTRACT.md) | Defines which claims are verified and which are explicitly not claimed. |
 | [`QUALITY_GATES.md`](QUALITY_GATES.md) | Lists the automated release gates and post-publish checks required before presenting a release as current. |
+| [`XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`](XPERIENCE10M_DATASET_CARD_ALIGNMENT.md) | Aligns this repo's public dataset wording with the official gated Xperience-10M dataset card. |
 | [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md) | Defines public reproduction commands, expected outputs, and unreproducible boundaries. |
 | [`docs/data/artifact_index.json`](docs/data/artifact_index.json) | Lists reviewer-critical files with existence, size, and stable hashes. |
+| [`docs/data/xperience10m_dataset_card_alignment.json`](docs/data/xperience10m_dataset_card_alignment.json) | Machine-readable official dataset-card alignment summary. |
 | [`docs/data/quality_gates.json`](docs/data/quality_gates.json) | Machine-readable quality-gate summary for website and HF mirrors. |
 | [`docs/data/live_publication_status.json`](docs/data/live_publication_status.json) | Last live GitHub/HF verification after upload. |
 | [`docs/data/mirror_parity.json`](docs/data/mirror_parity.json) | Confirms prepared HF Space, artifact, and model mirrors match the repo for critical data, figures, website HTML, and validator scripts. |
 | [`docs/data/website_integrity.json`](docs/data/website_integrity.json) | Confirms local site links, anchors, JSON bundles, and referenced images resolve. |
 | [`docs/data/reviewer_packet.json`](docs/data/reviewer_packet.json) | Gives the shortest machine-readable reviewer route. |
+## Official Source Alignment
+| Artifact | What it proves |
+| --- | --- |
+| [`XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`](XPERIENCE10M_DATASET_CARD_ALIGNMENT.md) | Human-readable summary of the official gated Xperience-10M dataset card, scale, modalities, access boundary, intended uses, and limitations. |
+| [`docs/data/xperience10m_dataset_card_alignment.json`](docs/data/xperience10m_dataset_card_alignment.json) | Machine-readable copy of the same alignment facts for website and HF mirrors. |
 ## Data Contract
 | Artifact | What it proves |

EVIDENCE_CONTRACT.md CHANGED

@@ -5,6 +5,7 @@ local artifact that a reader can inspect before trusting the dashboard.
 | Claim | Current evidence | Status | Boundary |
 | --- | --- | --- | --- |
 | The public Xperience-10M sample has been converted into aligned model windows. | `results/episode_task_suite/windows.csv`, `results/episode_task_suite/shared_windows.npz`, `results/episode_task_suite/summary_report.json` | Verified for 5,821 frames and 1,161 windows | One public sample episode only |
 | The current feature contract is explicit and reviewable. | `results/episode_task_suite/feature_manifest.json`, `results/episode_task_suite/available_modalities.json` | Verified for an 8,378-d feature vector | Audio is present in MP4 streams but not yet a feature block |
 | The public sample modalities are inspectable without raw data redistribution. | `docs/data/modality_atlas.json`, `docs/assets/modalities/`, website modality atlas | Verified derived thumbnail atlas | Thumbnails are presentation/review assets, not a replacement for official raw data access |
@@ -29,28 +30,31 @@ local artifact that a reader can inspect before trusting the dashboard.
 1. Read `docs/data/reviewer_packet.json` for the shortest audit path and proof
    boundary.
-2. Read `ARTIFACT_GUIDE.md` and `docs/data/artifact_index.json` to see grouped
    reviewer artifacts, indexed proof artifacts,
    sizes, and stable-file hashes.
-3. Read `docs/assets/task_suite_infographic.png` and
    `docs/data/modality_atlas.json` for the high-level map and modality atlas.
-4. Read `REPRODUCIBILITY.md` and `docs/data/reproducibility_matrix.json` before
    rerunning the public pipeline.
-5. Inspect `results/episode_task_suite/summary_report.json` for the task and
    metric source of truth.
-6. Inspect `results/episode_task_suite/feature_manifest.json` to see which
    modalities enter the current feature vector.
-7. Inspect `results/episode_task_suite/neural_mlp/` to compare minimal and
    neural heads under the same splits.
-8. Inspect `docs/data/scope_claims_audit.json` before interpreting historical
    `32ep` strings in Qwen3-Omni smoke artifacts.
-9. Inspect `docs/data/mirror_parity.json` before assuming the GitHub and
    Hugging Face mirrors contain the same critical data, visual, HTML, and
    validator files.
-10. Inspect `results/omni_finetune/DATA_BLOCKER_REPORT.md` before interpreting
    any Qwen3-Omni artifact.
-11. Inspect `QUALITY_GATES.md`, `docs/data/quality_gates.json`,
    `docs/data/publication_audit.json`, and `docs/data/website_integrity.json`
    before publishing or sharing the project externally.
-12. Inspect `CITATION.cff`, `codemeta.json`, and `LICENSE` before reusing or
    citing the project.

 | Claim | Current evidence | Status | Boundary |
 | --- | --- | --- | --- |
+| The public dataset description is aligned with the official gated Xperience-10M dataset card. | `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`, `docs/data/xperience10m_dataset_card_alignment.json` | Verified description alignment | Summarizes upstream public metadata and card facts; does not grant access or mirror raw data |
 | The public Xperience-10M sample has been converted into aligned model windows. | `results/episode_task_suite/windows.csv`, `results/episode_task_suite/shared_windows.npz`, `results/episode_task_suite/summary_report.json` | Verified for 5,821 frames and 1,161 windows | One public sample episode only |
 | The current feature contract is explicit and reviewable. | `results/episode_task_suite/feature_manifest.json`, `results/episode_task_suite/available_modalities.json` | Verified for an 8,378-d feature vector | Audio is present in MP4 streams but not yet a feature block |
 | The public sample modalities are inspectable without raw data redistribution. | `docs/data/modality_atlas.json`, `docs/assets/modalities/`, website modality atlas | Verified derived thumbnail atlas | Thumbnails are presentation/review assets, not a replacement for official raw data access |
 1. Read `docs/data/reviewer_packet.json` for the shortest audit path and proof
    boundary.
+2. Read `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md` and
+   `docs/data/xperience10m_dataset_card_alignment.json` to check the official
+   dataset-card wording and how the current repo is scoped against it.
+3. Read `ARTIFACT_GUIDE.md` and `docs/data/artifact_index.json` to see grouped
    reviewer artifacts, indexed proof artifacts,
    sizes, and stable-file hashes.
+4. Read `docs/assets/task_suite_infographic.png` and
    `docs/data/modality_atlas.json` for the high-level map and modality atlas.
+5. Read `REPRODUCIBILITY.md` and `docs/data/reproducibility_matrix.json` before
    rerunning the public pipeline.
+6. Inspect `results/episode_task_suite/summary_report.json` for the task and
    metric source of truth.
+7. Inspect `results/episode_task_suite/feature_manifest.json` to see which
    modalities enter the current feature vector.
+8. Inspect `results/episode_task_suite/neural_mlp/` to compare minimal and
    neural heads under the same splits.
+9. Inspect `docs/data/scope_claims_audit.json` before interpreting historical
    `32ep` strings in Qwen3-Omni smoke artifacts.
+10. Inspect `docs/data/mirror_parity.json` before assuming the GitHub and
    Hugging Face mirrors contain the same critical data, visual, HTML, and
    validator files.
+11. Inspect `results/omni_finetune/DATA_BLOCKER_REPORT.md` before interpreting
    any Qwen3-Omni artifact.
+12. Inspect `QUALITY_GATES.md`, `docs/data/quality_gates.json`,
    `docs/data/publication_audit.json`, and `docs/data/website_integrity.json`
    before publishing or sharing the project externally.
+13. Inspect `CITATION.cff`, `codemeta.json`, and `LICENSE` before reusing or
    citing the project.

README.md CHANGED

@@ -73,6 +73,13 @@ map, then mirror the responsive modality atlas metadata in
 `metrics/modality_atlas.json`, with standalone derived thumbnails in
 `assets/modalities/`.
 The committed heads are intentionally small:
 - z-score + linear softmax classifiers,
@@ -98,6 +105,7 @@ Their purpose is to make every input/output contract auditable before scaling to
 | 5 | What is still pending? | companion GitHub `results/omni_finetune/DATA_BLOCKER_REPORT.md` and `A100_HF_RELAY_STATUS.md` |
 Human-readable artifact guide mirror: `ARTIFACT_GUIDE.md`.
 Publication quality gates mirror: `QUALITY_GATES.md` and `metrics/quality_gates.json`.
 Live publication status mirror: `metrics/live_publication_status.json`.
 Machine-readable reviewer packet mirror: `metrics/reviewer_packet.json`.
@@ -118,6 +126,7 @@ Source-of-truth artifact index mirror: `metrics/artifact_index.json`.
 | Website integrity | `metrics/website_integrity.json` and validator script mirror | local links, anchors, JSON bundles, and referenced images only |
 | Quality gates | `QUALITY_GATES.md`, `metrics/quality_gates.json`, and `scripts/build_quality_gates.py` | automated release gates plus live post-publish checks |
 | Live publication | `metrics/live_publication_status.json`, `scripts/verify_live_publication.py` | last public GitHub/HF URL verification after upload |
 | Artifact index | `metrics/artifact_index.json` and `scripts/build_artifact_index.py` | compact catalog of the reviewer-critical proof artifacts |
 | Artifact guide | `ARTIFACT_GUIDE.md` | human-readable map of proof boundary, task evidence, mirrors, and scale-up status |
 | Reproducibility | `REPRODUCIBILITY.md`, `metrics/reproducibility_matrix.json` | public commands, expected outputs, exact-match audit evidence, and non-reproducible boundaries |
@@ -149,6 +158,7 @@ transfers them to H20 for manifest building, training, and evaluation.
 | `assets/task_architectures.png` | shows the shared pipeline and all 12 heads |
 | `assets/task_suite_infographic.png` | presents the shared processing contract, 12 heads, verified metrics, and public-sample modality thumbnails |
 | `assets/modalities/`, `metrics/modality_atlas.json` | responsive modality-card thumbnails and metadata for sample inspection |
 | `metrics/artifact_index.json` | indexes proof artifacts with existence, size, and stable-file hashes |
 | `metrics/mirror_parity.json` | verifies prepared repo/HF mirrors have matching critical data, figures, website HTML, and validator files before upload |
 | `metrics/scope_claims_audit.json` | verifies historical `32ep` smoke-run identifiers are not presented as real 32-episode results |

 `metrics/modality_atlas.json`, with standalone derived thumbnails in
 `assets/modalities/`.
+The model repo also mirrors the official-source alignment artifact at
+`metrics/xperience10m_dataset_card_alignment.json` plus
+`XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`. That file records the official
+`ropedia-ai/xperience-10m` card scope, gated access, full-scale modalities,
+episode layout, intended uses, and the claims this small baseline repo does
+not make.
 The committed heads are intentionally small:
 - z-score + linear softmax classifiers,
 | 5 | What is still pending? | companion GitHub `results/omni_finetune/DATA_BLOCKER_REPORT.md` and `A100_HF_RELAY_STATUS.md` |
 Human-readable artifact guide mirror: `ARTIFACT_GUIDE.md`.
+Official dataset-card alignment mirror: `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md` and `metrics/xperience10m_dataset_card_alignment.json`.
 Publication quality gates mirror: `QUALITY_GATES.md` and `metrics/quality_gates.json`.
 Live publication status mirror: `metrics/live_publication_status.json`.
 Machine-readable reviewer packet mirror: `metrics/reviewer_packet.json`.
 | Website integrity | `metrics/website_integrity.json` and validator script mirror | local links, anchors, JSON bundles, and referenced images only |
 | Quality gates | `QUALITY_GATES.md`, `metrics/quality_gates.json`, and `scripts/build_quality_gates.py` | automated release gates plus live post-publish checks |
 | Live publication | `metrics/live_publication_status.json`, `scripts/verify_live_publication.py` | last public GitHub/HF URL verification after upload |
+| Official dataset card alignment | `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`, `metrics/xperience10m_dataset_card_alignment.json` | official source scope, gated access, modality coverage, scale, and this repo's single-episode boundary |
 | Artifact index | `metrics/artifact_index.json` and `scripts/build_artifact_index.py` | compact catalog of the reviewer-critical proof artifacts |
 | Artifact guide | `ARTIFACT_GUIDE.md` | human-readable map of proof boundary, task evidence, mirrors, and scale-up status |
 | Reproducibility | `REPRODUCIBILITY.md`, `metrics/reproducibility_matrix.json` | public commands, expected outputs, exact-match audit evidence, and non-reproducible boundaries |
 | `assets/task_architectures.png` | shows the shared pipeline and all 12 heads |
 | `assets/task_suite_infographic.png` | presents the shared processing contract, 12 heads, verified metrics, and public-sample modality thumbnails |
 | `assets/modalities/`, `metrics/modality_atlas.json` | responsive modality-card thumbnails and metadata for sample inspection |
+| `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`, `metrics/xperience10m_dataset_card_alignment.json` | aligns public wording with the official gated Xperience-10M dataset card |
 | `metrics/artifact_index.json` | indexes proof artifacts with existence, size, and stable-file hashes |
 | `metrics/mirror_parity.json` | verifies prepared repo/HF mirrors have matching critical data, figures, website HTML, and validator files before upload |
 | `metrics/scope_claims_audit.json` | verifies historical `32ep` smoke-run identifiers are not presented as real 32-episode results |

XPERIENCE10M_DATASET_CARD_ALIGNMENT.md ADDED

	@@ -0,0 +1,170 @@

+# Xperience-10M Official Dataset Card Alignment
+This file records the public description of the official
+[`ropedia-ai/xperience-10m`](https://huggingface.co/datasets/ropedia-ai/xperience-10m)
+dataset card and how this repo uses only one public sample episode from that
+larger source. It is a description-alignment artifact, not a raw-data mirror.
+Checked on: 2026-06-01.
+## Official Dataset Scope
+The official Xperience-10M dataset is described by Ropedia as a large-scale
+egocentric multimodal dataset for embodied AI, robotics, world models, and
+spatial intelligence. The dataset card frames it as human-experience data with
+roughly 10 million interaction/experience units and about 10,000 hours of
+synchronized first-person recording.
+The official card metadata lists these task and modality categories:
+- task categories: video classification, image-to-text, depth estimation, robotics
+- modalities: 3D, audio, video
+- language: English
+- license field: `other`
+- size category: `1M<n<10M`
+- access: manually gated, reviewed access for approved non-commercial use
+The current public Hugging Face API metadata reports the dataset repo as
+`gated: manual` and notes that an external DocuSign agreement may be required
+before approval.
+## Official Modalities
+The official dataset card describes the full dataset as synchronized 4D
+multimodal egocentric data spanning:
+- six RGB video streams: four fisheye views and two rectified stereo views
+- audio embedded in the video streams
+- stereo depth and depth confidence
+- camera pose, SLAM trajectory, and point-cloud information
+- two-hand motion capture, including hand joints and MANO-related data
+- full-body motion capture, keypoints, contacts, and body orientation data
+- inertial sensing from accelerometer and gyroscope streams
+- hierarchical language/caption annotations
+- metadata and calibration records
+## Official Scale Statistics
+The official dataset card describes Xperience-10M at full scale with these
+headline counts:
+| Quantity | Official-card scale |
+| --- | --- |
+| Human experience / interaction units | about 10 million |
+| Recording duration | about 10,000 hours |
+| RGB frames | about 2.88 billion |
+| Depth frames | about 720 million |
+| Camera-pose records | about 576 million |
+| Motion-capture frames | about 576 million |
+| IMU records | about 7.2 billion |
+| Caption sentences | about 16 million |
+| Caption words | about 200 million |
+| Vocabulary size | about 6,000 words |
+| Object annotations | about 350,000 objects |
+| Trajectory distance | about 39,000 km |
+| Total storage described by the card | about 1 PB |
+The public Hugging Face page may show a smaller currently listed file-size
+summary for the gated repo. This project keeps those concepts separate: the
+official card scale describes the dataset design, while this repo validates
+only the files that are actually available to the project.
+## Episode File Layout
+The official gated file listing and the public sample use episode folders with
+this practical layout:
+```text
+<session_uuid>/
+  ep<episode_id>/
+    fisheye_cam0.mp4
+    fisheye_cam1.mp4
+    fisheye_cam2.mp4
+    fisheye_cam3.mp4
+    stereo_left.mp4
+    stereo_right.mp4
+    annotation.hdf5
+    visualization.rrd        # optional viewer artifact; excluded from training downloads
+```
+For this repo, a valid training/evaluation episode requires `annotation.hdf5`.
+Full-omni mode prefers all six MP4 streams. Degraded mode may use
+`fisheye_cam0.mp4` plus the annotation file, but must record missing views in
+the manifest.
+## Annotation File Content
+The official card describes the HDF5 annotation file as carrying aligned
+multimodal records. The relevant groups include:
+- calibration: camera intrinsics/extrinsics for fisheye and stereo cameras
+- SLAM/camera pose: quaternions, translations, frame names, and point cloud
+- depth: depth map, confidence, scale, min/max, and validity metadata
+- hand motion capture: left/right hand joints, translations, and MANO-related records
+- full-body motion capture: body keypoints, contacts, transforms, and body rotations
+- IMU: timestamps, accelerometer, gyroscope, and keyframe metadata
+- video timing: timestamps, frame numbers, and video duration
+- language/caption annotations and metadata
+This repo's current 8,378-d feature vector uses video-derived statistics,
+depth, pose/SLAM, calibration, mocap, IMU, and language-derived blocks. Audio
+is documented and visualized, but it is not yet extracted into the current
+baseline feature vector.
+## Intended Research Uses
+The official dataset card supports research directions such as:
+- egocentric video/action understanding
+- task and subtask recognition
+- temporal action localization and human-object interaction analysis
+- object grounding and caption/language grounding
+- audio-visual learning and multimodal pretraining
+- embodied reasoning, world-model learning, and robotics imitation learning
+- depth estimation, visual odometry, camera trajectory, SLAM, and scene reconstruction
+- hand/body pose, human motion understanding, and sensor fusion
+This repo currently implements a single-episode audit suite that starts several
+of those directions, but it does not solve the full official task list. The 12
+current tasks cover action/subtask labels, next-action prediction, transition
+and temporal diagnostics, hand trajectory forecasting, contact prediction,
+object relevance, caption grounding, cross-modal retrieval, modality
+reconstruction, and misalignment detection. Missing or only-proxy coverage
+includes real audio-visual modeling, full caption generation, depth-pixel
+estimation, full SLAM estimation, neural rendering, policy learning, and
+cross-episode generalization.
+## Responsible-Use Boundary
+The official dataset is gated and intended for approved non-commercial research
+use. This repo therefore does not redistribute raw MP4 files, raw
+`annotation.hdf5`, private gated data, raw `visualization.rrd`, or any full
+Qwen weights. Public assets here are derived metrics, small thumbnails,
+manifests, scripts, charts, and lightweight baseline artifacts.
+The official card also makes clear that the data is not meant for identity
+recognition, re-identification, biometric profiling, surveillance, sensitive
+attribute inference, or safety-critical deployment without appropriate
+safeguards.
+## Limitations To Preserve In This Project
+When describing Xperience-10M in this repo, keep these limitations visible:
+- one public sample episode cannot prove cross-environment generalization
+- full-dataset claims require gated access, many episodes, and held-out episode splits
+- motion capture, SLAM, depth, captions, and other annotations can contain noise
+- language annotations are not exhaustive descriptions of every scene state
+- large-scale training requires substantial storage, preprocessing, and compute
+- the current feature vector does not include an extracted audio feature block
+## Current Project Alignment
+| Official dataset card concept | Current repo status |
+| --- | --- |
+| Full Xperience-10M is large, gated, and multi-episode | Acknowledged; not redistributed |
+| Public sample includes video/audio/depth/pose/mocap/IMU/language | Represented in the modality atlas |
+| Episode layout uses six MP4 streams and `annotation.hdf5` | Used by sample inspection and pilot-readiness scripts |
+| Audio exists in MP4 streams | Documented and visualized, not featurized |
+| 4D reconstruction/world modeling are intended research directions | Represented by proxy/diagnostic tasks only |
+| Real model quality requires held-out multi-episode evaluation | Not claimed yet; 32-episode pilot remains gated |
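
Reviewer note (not part of the commit): the episode-validity rule stated under "Episode File Layout" in the added file can be sketched as below. This is a minimal illustration under stated assumptions, not the repo's pilot-readiness script, whose code is not shown in this diff. The function name `classify_episode`, the mode labels, and the returned record fields are hypothetical. Treating an episode that has `annotation.hdf5` but lacks `fisheye_cam0.mp4` as invalid is also an assumption, since the file only says degraded mode "may use" that stream.

```python
from pathlib import Path

# File names copied from the episode layout listed in the alignment file.
MP4_STREAMS = (
    "fisheye_cam0.mp4",
    "fisheye_cam1.mp4",
    "fisheye_cam2.mp4",
    "fisheye_cam3.mp4",
    "stereo_left.mp4",
    "stereo_right.mp4",
)
ANNOTATION = "annotation.hdf5"


def classify_episode(episode_dir):
    """Hypothetical helper: build a manifest-style record for one episode folder."""
    episode_dir = Path(episode_dir)
    present = [name for name in MP4_STREAMS if (episode_dir / name).is_file()]
    missing = [name for name in MP4_STREAMS if name not in present]

    if not (episode_dir / ANNOTATION).is_file():
        mode = "invalid"      # the annotation file is required in every mode
    elif not missing:
        mode = "full_omni"    # all six MP4 streams are available
    elif "fisheye_cam0.mp4" in present:
        mode = "degraded"     # usable, but missing views must be recorded
    else:
        mode = "invalid"      # assumption: no primary fisheye view, so skip

    return {"episode": episode_dir.name, "mode": mode, "missing_views": missing}
```

Recording `missing_views` in every record, including full-omni ones where the list is empty, keeps the manifest shape uniform so a later audit can count degraded episodes without special cases.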

metrics/artifact_index.json CHANGED

@@ -1,12 +1,13 @@
 {
   "title": "Ropedia Xperience-10M Task Suite Artifact Index",
-  "generated_at_utc": "2026-06-01T07:34:10+00:00",
   "status": "pass",
-  "artifact_count": 33,
   "missing": [],
   "by_kind": {
     "claim_boundary": 1,
     "review_path": 3,
     "quality_gate": 4,
     "reproducibility": 2,
     "hygiene_report": 1,
@@ -36,8 +37,8 @@
       "surface": "repo",
       "proves": "Defines what is verified, what is smoke-only, and what must not be inferred.",
       "exists": true,
-      "bytes": 7046,
-      "sha256": "fd4d09938147487f9c3e713c6ced07b3e6103426f3ccc58266047365bf4ed1ea"
     },
     {
       "id": "reviewer_packet",
@@ -47,8 +48,8 @@
       "surface": "website_hf",
       "proves": "Gives a short audit path with scope status and public surfaces.",
       "exists": true,
-      "bytes": 4406,
-      "sha256": "c3669df9fce7adc2cbdb95fa4d1cd75644ababf4bcda88bb19090b4296f8514a"
     },
     {
       "id": "artifact_guide",
@@ -58,8 +59,30 @@
       "surface": "repo_hf",
       "proves": "Gives the human-readable map from proof boundary to data, tasks, platform mirrors, and scale-up status.",
       "exists": true,
-      "bytes": 6943,
-      "sha256": "81204b332da6bd1c3ebec603990eeacbec984534499df59463cad9aa6ab7841f"
     },
     {
       "id": "quality_gates",
@@ -81,7 +104,7 @@
       "proves": "Machine-readable release-gate summary for validators, mirrors, and reviewer surfaces.",
       "exists": true,
       "bytes": 4228,
-      "sha256": "d5b145e83d6a520c628353a894ad3a438418604e262b032767674e66911f893e"
     },
     {
       "id": "live_publication_status",
@@ -103,8 +126,8 @@
       "surface": "repo",
       "proves": "Fetches the published GitHub/HF URLs and compares live hashes and public-card markers against the release assets.",
       "exists": true,
-      "bytes": 10587,
-      "sha256": "dd8456784c1442ccb622c0fb0da0369cad587dc0023142038b08613ec28a40b4"
     },
     {
       "id": "reproducibility_contract",
@@ -136,8 +159,8 @@
       "surface": "repo_hf",
       "proves": "Generates the selective proof-artifact catalog from local files.",
       "exists": true,
-      "bytes": 12875,
-      "sha256": "9dd7b6e3a511db843d15f15f33d7f0481c41c19dc80031749f006f123162a637"
     },
     {
       "id": "publication_audit",
@@ -148,7 +171,7 @@
       "volatile": true,
       "proves": "Confirms public bundles pass raw-data, cache, archive, and token-string checks.",
       "exists": true,
-      "bytes": 5508,
       "hash_policy": "existence_and_size_only"
     },
     {
@@ -172,7 +195,7 @@
       "volatile": true,
       "proves": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
       "exists": true,
-      "bytes": 48916,
       "hash_policy": "existence_and_size_only"
     },
     {
@@ -184,7 +207,7 @@
       "volatile": true,
       "proves": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
       "exists": true,
-      "bytes": 6159,
       "hash_policy": "existence_and_size_only"
     },
     {
@@ -195,8 +218,8 @@
       "surface": "website_hf",
       "proves": "Lists public URLs, upstream sources, and machine-readable project metadata.",
       "exists": true,
-      "bytes": 2789,
-      "sha256": "333d9876affa556f502d7038ad299c242a68b7bf3be90c3f6e10edf0e081c010"
     },
     {
       "id": "task_summary",

 {
   "title": "Ropedia Xperience-10M Task Suite Artifact Index",
+  "generated_at_utc": "2026-06-01T08:04:57+00:00",
   "status": "pass",
+  "artifact_count": 35,
   "missing": [],
   "by_kind": {
     "claim_boundary": 1,
     "review_path": 3,
+    "source_alignment": 2,
     "quality_gate": 4,
     "reproducibility": 2,
     "hygiene_report": 1,
       "surface": "repo",
       "proves": "Defines what is verified, what is smoke-only, and what must not be inferred.",
       "exists": true,
+      "bytes": 7572,
+      "sha256": "1b4c78c3d92c8592dcc7532b94103743bfef2a36b025245968c79fd51fa5c42c"
     },
     {
       "id": "reviewer_packet",
       "surface": "website_hf",
       "proves": "Gives a short audit path with scope status and public surfaces.",
       "exists": true,
+      "bytes": 5044,
+      "sha256": "9b99e1828b74ba2cf99f281925ee6c113d0c73d9e06e700e924513e391c83cd8"
     },
     {
       "id": "artifact_guide",
       "surface": "repo_hf",
       "proves": "Gives the human-readable map from proof boundary to data, tasks, platform mirrors, and scale-up status.",
       "exists": true,
+      "bytes": 7925,
+      "sha256": "79c81d9f5631df046892e020f979773ac0933381f17b6d2b9f3ff503d6c332b7"
+    },
+    {
+      "id": "official_dataset_card_alignment",
+      "title": "Official Xperience-10M dataset-card alignment",
+      "path": "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
+      "kind": "source_alignment",
+      "surface": "repo_hf",
+      "proves": "Aligns public dataset wording with the official gated Xperience-10M dataset card and records unsupported areas.",
+      "exists": true,
+      "bytes": 7654,
+      "sha256": "0866357a6d9922c961dc89a872b88b9517a37adf7b8130bfeb64b471045d01da"
+    },
+    {
+      "id": "official_dataset_card_alignment_json",
+      "title": "Official Xperience-10M dataset-card alignment JSON",
+      "path": "docs/data/xperience10m_dataset_card_alignment.json",
+      "kind": "source_alignment",
+      "surface": "website_hf",
+      "proves": "Machine-readable upstream dataset-card alignment facts for website and HF mirrors.",
+      "exists": true,
+      "bytes": 5103,
+      "sha256": "157f8616cb6cb45ad4d72bc371d4d68c60a990340e4257a4d7e874c577d44f24"
     },
     {
       "id": "quality_gates",
       "proves": "Machine-readable release-gate summary for validators, mirrors, and reviewer surfaces.",
       "exists": true,
       "bytes": 4228,
+      "sha256": "42cd50ceb83503bbda33245dd1442ae956a9002f9f0d8a729b3b7d068217b836"
     },
     {
       "id": "live_publication_status",
       "surface": "repo",
       "proves": "Fetches the published GitHub/HF URLs and compares live hashes and public-card markers against the release assets.",
       "exists": true,
+      "bytes": 11753,
+      "sha256": "297364d079c1eea4e790fd7b2f8ae42ddd7d93aa28d7a2362806729789813626"
     },
     {
       "id": "reproducibility_contract",
       "surface": "repo_hf",
       "proves": "Generates the selective proof-artifact catalog from local files.",
       "exists": true,
+      "bytes": 13641,
+      "sha256": "3d0a88e0c2212913699c13362027eb97a0cd84789a47e72b41d43cf3d2d6545b"
     },
     {
       "id": "publication_audit",
       "volatile": true,
       "proves": "Confirms public bundles pass raw-data, cache, archive, and token-string checks.",
       "exists": true,
+      "bytes": 5624,
       "hash_policy": "existence_and_size_only"
     },
     {
       "volatile": true,
       "proves": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
       "exists": true,
+      "bytes": 51819,
       "hash_policy": "existence_and_size_only"
     },
     {
       "volatile": true,
       "proves": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
       "exists": true,
+      "bytes": 6286,
       "hash_policy": "existence_and_size_only"
     },
     {
       "surface": "website_hf",
       "proves": "Lists public URLs, upstream sources, and machine-readable project metadata.",
       "exists": true,
+      "bytes": 3411,
+      "sha256": "99e0d386e088e7c532a318c33e4519da9b77d8d7c300c123c5aa7f866cd3c6b4"
     },
     {
       "id": "task_summary",
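
The index entries above pair each artifact with `exists`, `bytes`, and either a `sha256` or, for entries marked `volatile`, a `hash_policy` of `existence_and_size_only`. A minimal sketch of how such a record might be derived from a local file follows. The helper name `describe_artifact` and its `volatile` parameter are hypothetical illustrations, not the API of `scripts/build_artifact_index.py`.

```python
import hashlib
import tempfile
from pathlib import Path


def describe_artifact(path: Path, volatile: bool = False) -> dict:
    """Hypothetical helper: build an index-style record for one local file."""
    if not path.is_file():
        return {"exists": False}
    data = path.read_bytes()
    record = {"exists": True, "bytes": len(data)}
    if volatile:
        # Assumption: volatile reports are regenerated on every run, so only
        # existence and size are recorded and the content hash is skipped.
        record["hash_policy"] = "existence_and_size_only"
    else:
        record["sha256"] = hashlib.sha256(data).hexdigest()
    return record


# Demonstration on a throwaway file rather than a real release artifact.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.json"
    sample.write_bytes(b'{"status": "pass"}\n')
    stable_record = describe_artifact(sample)
    volatile_record = describe_artifact(sample, volatile=True)
    missing_record = describe_artifact(Path(tmp) / "missing.json")

print(stable_record)
print(volatile_record)
```

Skipping the hash for volatile files would explain why reports that embed timestamps can still be indexed without the index changing whenever their contents do, provided the size is stable.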

metrics/evidence_contract.json CHANGED

@@ -2,6 +2,17 @@
   "project": "Ropedia Xperience-10M Task Suite",
   "scope": "single public Xperience-10M sample episode",
   "claims": [
     {
       "id": "aligned_windows",
       "claim": "The public Xperience-10M sample has been converted into aligned model windows.",

   "project": "Ropedia Xperience-10M Task Suite",
   "scope": "single public Xperience-10M sample episode",
   "claims": [
+    {
+      "id": "official_dataset_card_alignment",
+      "claim": "The public dataset description is aligned with the official gated Xperience-10M dataset card.",
+      "status": "verified",
+      "evidence": [
+        "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
+        "docs/data/xperience10m_dataset_card_alignment.json",
+        "https://huggingface.co/datasets/ropedia-ai/xperience-10m"
+      ],
+      "boundary": "summarizes upstream public metadata and dataset-card facts; does not grant access or mirror raw data"
+    },
     {
       "id": "aligned_windows",
       "claim": "The public Xperience-10M sample has been converted into aligned model windows.",
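
The mirror parity report in this commit records, for each file group, a `local` fingerprint, one fingerprint per mirror surface (`hf_space`, `hf_artifacts`, `hf_model`), and a `failures` list. A rough sketch of such a comparison is shown here under stated assumptions: the function names, the `"fail"` status value, and the failure message format are hypothetical, and this is not the code in `scripts/validate_mirror_parity.py`.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(path: Path) -> dict:
    """Record path, existence, size, and SHA-256 for one file."""
    if not path.is_file():
        return {"path": str(path), "exists": False}
    data = path.read_bytes()
    return {
        "path": str(path),
        "exists": True,
        "bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def check_group(name: str, local: Path, mirrors: dict) -> dict:
    """Compare one local file against its mirror copies (hypothetical helper)."""
    local_fp = fingerprint(local)
    mirror_fps = {surface: fingerprint(p) for surface, p in mirrors.items()}
    failures = []
    if not local_fp["exists"]:
        failures.append("local: missing")
    for surface, fp in mirror_fps.items():
        if not fp["exists"]:
            failures.append(f"{surface}: missing")
        elif local_fp["exists"] and fp["sha256"] != local_fp["sha256"]:
            failures.append(f"{surface}: sha256 mismatch")
    return {
        "name": name,
        "status": "fail" if failures else "pass",
        "local": local_fp,
        "mirrors": mirror_fps,
        "failures": failures,
    }


# Demonstration with temporary files standing in for the real mirror trees.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "local.json").write_bytes(b"{}\n")
    (root / "space.json").write_bytes(b"{}\n")
    (root / "model.json").write_bytes(b"{ }\n")  # deliberately different
    result_ok = check_group(
        "data/example.json", root / "local.json", {"hf_space": root / "space.json"}
    )
    result_bad = check_group(
        "data/example.json", root / "local.json", {"hf_model": root / "model.json"}
    )

print(result_ok["status"], result_bad["failures"])
```

Comparing hashes rather than sizes alone matters for cases like `quality_gates.json` in this diff, where the byte count stays at 4228 while the `sha256` changes.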

metrics/mirror_parity.json CHANGED

@@ -1,9 +1,9 @@
 {
   "status": "pass",
-  "generated_at_utc": "2026-06-01T07:35:01+00:00",
   "hf_root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish",
   "summary": {
-    "group_count": 34,
     "failure_count": 0,
     "failures_by_surface": {}
   },
@@ -36,27 +36,27 @@
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/artifact_index.json",
         "exists": true,
-        "bytes": 14654,
-        "sha256": "28ddd6791143bc03508dcc1e82925d4721c3fd24e0fd10aa6b57c8baa431995d"
       },
       "mirrors": {
         "hf_space": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/artifact_index.json",
           "exists": true,
-          "bytes": 14654,
-          "sha256": "28ddd6791143bc03508dcc1e82925d4721c3fd24e0fd10aa6b57c8baa431995d"
         },
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/artifact_index.json",
           "exists": true,
-          "bytes": 14654,
-          "sha256": "28ddd6791143bc03508dcc1e82925d4721c3fd24e0fd10aa6b57c8baa431995d"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/artifact_index.json",
           "exists": true,
-          "bytes": 14654,
-          "sha256": "28ddd6791143bc03508dcc1e82925d4721c3fd24e0fd10aa6b57c8baa431995d"
         }
       },
       "failures": []
@@ -67,27 +67,27 @@
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/evidence_contract.json",
         "exists": true,
-        "bytes": 7954,
-        "sha256": "bf3a8a9f4c8dd618358ffb1387e60fc5446dfb6d901af447b3ec729c08c70fe5"
       },
       "mirrors": {
         "hf_space": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/evidence_contract.json",
           "exists": true,
-          "bytes": 7954,
-          "sha256": "bf3a8a9f4c8dd618358ffb1387e60fc5446dfb6d901af447b3ec729c08c70fe5"
         },
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/evidence_contract.json",
           "exists": true,
-          "bytes": 7954,
-          "sha256": "bf3a8a9f4c8dd618358ffb1387e60fc5446dfb6d901af447b3ec729c08c70fe5"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/evidence_contract.json",
           "exists": true,
-          "bytes": 7954,
-          "sha256": "bf3a8a9f4c8dd618358ffb1387e60fc5446dfb6d901af447b3ec729c08c70fe5"
         }
       },
       "failures": []
@@ -160,27 +160,27 @@
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/project_manifest.json",
         "exists": true,
-        "bytes": 2789,
-        "sha256": "333d9876affa556f502d7038ad299c242a68b7bf3be90c3f6e10edf0e081c010"
       },
       "mirrors": {
         "hf_space": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/project_manifest.json",
           "exists": true,
-          "bytes": 2789,
-          "sha256": "333d9876affa556f502d7038ad299c242a68b7bf3be90c3f6e10edf0e081c010"
         },
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/project_manifest.json",
           "exists": true,
-          "bytes": 2789,
-          "sha256": "333d9876affa556f502d7038ad299c242a68b7bf3be90c3f6e10edf0e081c010"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/project_manifest.json",
           "exists": true,
-          "bytes": 2789,
-          "sha256": "333d9876affa556f502d7038ad299c242a68b7bf3be90c3f6e10edf0e081c010"
         }
       },
       "failures": []
@@ -191,27 +191,27 @@
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/publication_audit.json",
         "exists": true,
-        "bytes": 5508,
-        "sha256": "cb6ec1c4cf3ec8de45f94c82a3aa1b074dde08f0ea582f7ae960622d909f5825"
       },
       "mirrors": {
         "hf_space": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/publication_audit.json",
           "exists": true,
-          "bytes": 5508,
-          "sha256": "cb6ec1c4cf3ec8de45f94c82a3aa1b074dde08f0ea582f7ae960622d909f5825"
         },
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/publication_audit.json",
           "exists": true,
-          "bytes": 5508,
-          "sha256": "cb6ec1c4cf3ec8de45f94c82a3aa1b074dde08f0ea582f7ae960622d909f5825"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/publication_audit.json",
           "exists": true,
-          "bytes": 5508,
-          "sha256": "cb6ec1c4cf3ec8de45f94c82a3aa1b074dde08f0ea582f7ae960622d909f5825"
         }
       },
       "failures": []
@@ -223,26 +223,26 @@
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/quality_gates.json",
         "exists": true,
         "bytes": 4228,
-        "sha256": "d5b145e83d6a520c628353a894ad3a438418604e262b032767674e66911f893e"
       },
       "mirrors": {
         "hf_space": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/quality_gates.json",
           "exists": true,
           "bytes": 4228,
-          "sha256": "d5b145e83d6a520c628353a894ad3a438418604e262b032767674e66911f893e"
         },
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/quality_gates.json",
           "exists": true,
           "bytes": 4228,
-          "sha256": "d5b145e83d6a520c628353a894ad3a438418604e262b032767674e66911f893e"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/quality_gates.json",
           "exists": true,
           "bytes": 4228,
-          "sha256": "d5b145e83d6a520c628353a894ad3a438418604e262b032767674e66911f893e"
         }
       },
       "failures": []
@@ -346,27 +346,27 @@
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/reviewer_packet.json",
         "exists": true,
-        "bytes": 4406,
-        "sha256": "c3669df9fce7adc2cbdb95fa4d1cd75644ababf4bcda88bb19090b4296f8514a"
       },
       "mirrors": {
         "hf_space": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/reviewer_packet.json",
           "exists": true,
-          "bytes": 4406,
-          "sha256": "c3669df9fce7adc2cbdb95fa4d1cd75644ababf4bcda88bb19090b4296f8514a"
         },
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/reviewer_packet.json",
           "exists": true,
-          "bytes": 4406,
-          "sha256": "c3669df9fce7adc2cbdb95fa4d1cd75644ababf4bcda88bb19090b4296f8514a"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/reviewer_packet.json",
           "exists": true,
-          "bytes": 4406,
-          "sha256": "c3669df9fce7adc2cbdb95fa4d1cd75644ababf4bcda88bb19090b4296f8514a"
         }
       },
       "failures": []
@@ -378,26 +378,26 @@
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/scope_claims_audit.json",
         "exists": true,
         "bytes": 19964,
-        "sha256": "105f1861f0adf139150ab04058d9b424812e687d13449f696a33c8d63e2a4c27"
       },
       "mirrors": {
         "hf_space": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/scope_claims_audit.json",
           "exists": true,
           "bytes": 19964,
-          "sha256": "105f1861f0adf139150ab04058d9b424812e687d13449f696a33c8d63e2a4c27"
         },
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/scope_claims_audit.json",
           "exists": true,
           "bytes": 19964,
-          "sha256": "105f1861f0adf139150ab04058d9b424812e687d13449f696a33c8d63e2a4c27"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/scope_claims_audit.json",
           "exists": true,
           "bytes": 19964,
-          "sha256": "105f1861f0adf139150ab04058d9b424812e687d13449f696a33c8d63e2a4c27"
         }
       },
       "failures": []
@@ -470,27 +470,58 @@
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/website_integrity.json",
         "exists": true,
-        "bytes": 6159,
-        "sha256": "02e6ec63a7d67c64717b7e8ca235c4519f0e54467171ff8febc98278f23529db"
       },
       "mirrors": {
         "hf_space": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/website_integrity.json",
           "exists": true,
-          "bytes": 6159,
-          "sha256": "02e6ec63a7d67c64717b7e8ca235c4519f0e54467171ff8febc98278f23529db"
         },
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/website_integrity.json",
           "exists": true,
-          "bytes": 6159,
-          "sha256": "02e6ec63a7d67c64717b7e8ca235c4519f0e54467171ff8febc98278f23529db"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/website_integrity.json",
           "exists": true,
-          "bytes": 6159,
-          "sha256": "02e6ec63a7d67c64717b7e8ca235c4519f0e54467171ff8febc98278f23529db"
         }
       },
       "failures": []
@@ -871,21 +902,21 @@
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/scripts/build_artifact_index.py",
         "exists": true,
-        "bytes": 12875,
-        "sha256": "9dd7b6e3a511db843d15f15f33d7f0481c41c19dc80031749f006f123162a637"
       },
       "mirrors": {
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/scripts/build_artifact_index.py",
           "exists": true,
-          "bytes": 12875,
-          "sha256": "9dd7b6e3a511db843d15f15f33d7f0481c41c19dc80031749f006f123162a637"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/scripts/build_artifact_index.py",
           "exists": true,
-          "bytes": 12875,
-          "sha256": "9dd7b6e3a511db843d15f15f33d7f0481c41c19dc80031749f006f123162a637"
         }
       },
       "failures": []
@@ -921,21 +952,21 @@
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/scripts/verify_live_publication.py",
         "exists": true,
-        "bytes": 10587,
-        "sha256": "dd8456784c1442ccb622c0fb0da0369cad587dc0023142038b08613ec28a40b4"
       },
       "mirrors": {
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/scripts/verify_live_publication.py",
           "exists": true,
-          "bytes": 10587,
-          "sha256": "dd8456784c1442ccb622c0fb0da0369cad587dc0023142038b08613ec28a40b4"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/scripts/verify_live_publication.py",
           "exists": true,
-          "bytes": 10587,
-          "sha256": "dd8456784c1442ccb622c0fb0da0369cad587dc0023142038b08613ec28a40b4"
         }
       },
       "failures": []
@@ -946,21 +977,21 @@
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/scripts/validate_mirror_parity.py",
         "exists": true,
-        "bytes": 8423,
-        "sha256": "213f46788af2f22763ba2a998b23dc8db17596c148196654c45ae287a58a330f"
       },
       "mirrors": {
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/scripts/validate_mirror_parity.py",
           "exists": true,
-          "bytes": 8423,
-          "sha256": "213f46788af2f22763ba2a998b23dc8db17596c148196654c45ae287a58a330f"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/scripts/validate_mirror_parity.py",
           "exists": true,
-          "bytes": 8423,
-          "sha256": "213f46788af2f22763ba2a998b23dc8db17596c148196654c45ae287a58a330f"
         }
       },
       "failures": []
@@ -971,21 +1002,21 @@
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/scripts/validate_publication_package.py",
         "exists": true,
-        "bytes": 12630,
-        "sha256": "c7ca01135dfc6414b3accc42ec833905feada6e8e82f65b6e3bf855657dba5d9"
       },
       "mirrors": {
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/scripts/validate_publication_package.py",
           "exists": true,
-          "bytes": 12630,
-          "sha256": "c7ca01135dfc6414b3accc42ec833905feada6e8e82f65b6e3bf855657dba5d9"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/scripts/validate_publication_package.py",
           "exists": true,
-          "bytes": 12630,
-          "sha256": "c7ca01135dfc6414b3accc42ec833905feada6e8e82f65b6e3bf855657dba5d9"
         }
       },
       "failures": []
@@ -1046,21 +1077,21 @@
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/index.html",
         "exists": true,
-        "bytes": 91007,
-        "sha256": "4e1a1fd3d4b3de962adbd2e2b1b3c6fe5771c77a65a478035d4ca33e7d999263"
       },
       "mirrors": {
         "hf_space": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/index.html",
           "exists": true,
-          "bytes": 91007,
-          "sha256": "4e1a1fd3d4b3de962adbd2e2b1b3c6fe5771c77a65a478035d4ca33e7d999263"
         },
         "hf_artifacts_docs": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/index.html",
           "exists": true,
-          "bytes": 91007,
-          "sha256": "4e1a1fd3d4b3de962adbd2e2b1b3c6fe5771c77a65a478035d4ca33e7d999263"
         }
       },
       "failures": []
@@ -1095,6 +1126,37 @@
         }
       },
       "failures": []
     }
   ],
   "failures": []

 {
   "status": "pass",
+  "generated_at_utc": "2026-06-01T08:04:40+00:00",
   "hf_root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish",
   "summary": {
+    "group_count": 36,
     "failure_count": 0,
     "failures_by_surface": {}
   },
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/artifact_index.json",
         "exists": true,
+        "bytes": 15675,
+        "sha256": "d929367afd699e719d223c6c0edbedc090040cc64633782e187706617fdaaaa0"
       },
       "mirrors": {
         "hf_space": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/artifact_index.json",
           "exists": true,
+          "bytes": 15675,
+          "sha256": "d929367afd699e719d223c6c0edbedc090040cc64633782e187706617fdaaaa0"
         },
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/artifact_index.json",
           "exists": true,
+          "bytes": 15675,
+          "sha256": "d929367afd699e719d223c6c0edbedc090040cc64633782e187706617fdaaaa0"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/artifact_index.json",
           "exists": true,
+          "bytes": 15675,
+          "sha256": "d929367afd699e719d223c6c0edbedc090040cc64633782e187706617fdaaaa0"
         }
       },
       "failures": []
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/evidence_contract.json",
         "exists": true,
+        "bytes": 8483,
+        "sha256": "3d6035195dd3db9b2adaa074bd9e824c498f791fb2c735a907fd0b95d5490c2e"
       },
       "mirrors": {
         "hf_space": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/evidence_contract.json",
           "exists": true,
+          "bytes": 8483,
+          "sha256": "3d6035195dd3db9b2adaa074bd9e824c498f791fb2c735a907fd0b95d5490c2e"
         },
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/evidence_contract.json",
           "exists": true,
+          "bytes": 8483,
+          "sha256": "3d6035195dd3db9b2adaa074bd9e824c498f791fb2c735a907fd0b95d5490c2e"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/evidence_contract.json",
           "exists": true,
+          "bytes": 8483,
+          "sha256": "3d6035195dd3db9b2adaa074bd9e824c498f791fb2c735a907fd0b95d5490c2e"
         }
       },
       "failures": []
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/project_manifest.json",
         "exists": true,
+        "bytes": 3411,
+        "sha256": "99e0d386e088e7c532a318c33e4519da9b77d8d7c300c123c5aa7f866cd3c6b4"
       },
       "mirrors": {
         "hf_space": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/project_manifest.json",
           "exists": true,
+          "bytes": 3411,
+          "sha256": "99e0d386e088e7c532a318c33e4519da9b77d8d7c300c123c5aa7f866cd3c6b4"
         },
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/project_manifest.json",
           "exists": true,
+          "bytes": 3411,
+          "sha256": "99e0d386e088e7c532a318c33e4519da9b77d8d7c300c123c5aa7f866cd3c6b4"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/project_manifest.json",
           "exists": true,
+          "bytes": 3411,
+          "sha256": "99e0d386e088e7c532a318c33e4519da9b77d8d7c300c123c5aa7f866cd3c6b4"
         }
       },
       "failures": []
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/publication_audit.json",
         "exists": true,
+        "bytes": 5624,
+        "sha256": "0b7db27a09446d851787fd59b6f552b720a04f38a604e22d2951c10041e0cdd8"
       },
       "mirrors": {
         "hf_space": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/publication_audit.json",
           "exists": true,
+          "bytes": 5624,
+          "sha256": "0b7db27a09446d851787fd59b6f552b720a04f38a604e22d2951c10041e0cdd8"
         },
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/publication_audit.json",
           "exists": true,
+          "bytes": 5624,
+          "sha256": "0b7db27a09446d851787fd59b6f552b720a04f38a604e22d2951c10041e0cdd8"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/publication_audit.json",
           "exists": true,
+          "bytes": 5624,
+          "sha256": "0b7db27a09446d851787fd59b6f552b720a04f38a604e22d2951c10041e0cdd8"
         }
       },
       "failures": []
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/quality_gates.json",
         "exists": true,
         "bytes": 4228,
+        "sha256": "42cd50ceb83503bbda33245dd1442ae956a9002f9f0d8a729b3b7d068217b836"
       },
       "mirrors": {
         "hf_space": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/quality_gates.json",
           "exists": true,
           "bytes": 4228,
+          "sha256": "42cd50ceb83503bbda33245dd1442ae956a9002f9f0d8a729b3b7d068217b836"
         },
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/quality_gates.json",
           "exists": true,
           "bytes": 4228,
+          "sha256": "42cd50ceb83503bbda33245dd1442ae956a9002f9f0d8a729b3b7d068217b836"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/quality_gates.json",
           "exists": true,
           "bytes": 4228,
+          "sha256": "42cd50ceb83503bbda33245dd1442ae956a9002f9f0d8a729b3b7d068217b836"
         }
       },
       "failures": []
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/reviewer_packet.json",
         "exists": true,
+        "bytes": 5044,
+        "sha256": "9b99e1828b74ba2cf99f281925ee6c113d0c73d9e06e700e924513e391c83cd8"
       },
       "mirrors": {
         "hf_space": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/reviewer_packet.json",
           "exists": true,
+          "bytes": 5044,
+          "sha256": "9b99e1828b74ba2cf99f281925ee6c113d0c73d9e06e700e924513e391c83cd8"
         },
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/reviewer_packet.json",
           "exists": true,
+          "bytes": 5044,
+          "sha256": "9b99e1828b74ba2cf99f281925ee6c113d0c73d9e06e700e924513e391c83cd8"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/reviewer_packet.json",
           "exists": true,
+          "bytes": 5044,
+          "sha256": "9b99e1828b74ba2cf99f281925ee6c113d0c73d9e06e700e924513e391c83cd8"
         }
       },
       "failures": []
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/scope_claims_audit.json",
         "exists": true,
         "bytes": 19964,
+        "sha256": "9f094a164b423aa9e51b90549ec0c1bc73a10dcf52c4f89d7144c9b15db53682"
       },
       "mirrors": {
         "hf_space": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/scope_claims_audit.json",
           "exists": true,
           "bytes": 19964,
+          "sha256": "9f094a164b423aa9e51b90549ec0c1bc73a10dcf52c4f89d7144c9b15db53682"
         },
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/scope_claims_audit.json",
           "exists": true,
           "bytes": 19964,
+          "sha256": "9f094a164b423aa9e51b90549ec0c1bc73a10dcf52c4f89d7144c9b15db53682"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/scope_claims_audit.json",
           "exists": true,
           "bytes": 19964,
+          "sha256": "9f094a164b423aa9e51b90549ec0c1bc73a10dcf52c4f89d7144c9b15db53682"
         }
       },
       "failures": []
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/website_integrity.json",
         "exists": true,
+        "bytes": 6286,
+        "sha256": "aced09460f6ae6af5fd1962b5e028ff15b12e79f44907c772afbe53ca324d661"
       },
       "mirrors": {
         "hf_space": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/website_integrity.json",
           "exists": true,
+          "bytes": 6286,
+          "sha256": "aced09460f6ae6af5fd1962b5e028ff15b12e79f44907c772afbe53ca324d661"
         },
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/website_integrity.json",
           "exists": true,
+          "bytes": 6286,
+          "sha256": "aced09460f6ae6af5fd1962b5e028ff15b12e79f44907c772afbe53ca324d661"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/website_integrity.json",
           "exists": true,
+          "bytes": 6286,
+          "sha256": "aced09460f6ae6af5fd1962b5e028ff15b12e79f44907c772afbe53ca324d661"
+        }
+      },
+      "failures": []
+    },
+    {
+      "name": "data/xperience10m_dataset_card_alignment.json",
+      "status": "pass",
+      "local": {
+        "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/data/xperience10m_dataset_card_alignment.json",
+        "exists": true,
+        "bytes": 5103,
+        "sha256": "157f8616cb6cb45ad4d72bc371d4d68c60a990340e4257a4d7e874c577d44f24"
+      },
+      "mirrors": {
+        "hf_space": {
+          "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/data/xperience10m_dataset_card_alignment.json",
+          "exists": true,
+          "bytes": 5103,
+          "sha256": "157f8616cb6cb45ad4d72bc371d4d68c60a990340e4257a4d7e874c577d44f24"
+        },
+        "hf_artifacts": {
+          "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/data/xperience10m_dataset_card_alignment.json",
+          "exists": true,
+          "bytes": 5103,
+          "sha256": "157f8616cb6cb45ad4d72bc371d4d68c60a990340e4257a4d7e874c577d44f24"
+        },
+        "hf_model": {
+          "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/metrics/xperience10m_dataset_card_alignment.json",
+          "exists": true,
+          "bytes": 5103,
+          "sha256": "157f8616cb6cb45ad4d72bc371d4d68c60a990340e4257a4d7e874c577d44f24"
         }
       },
       "failures": []
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/scripts/build_artifact_index.py",
         "exists": true,
+        "bytes": 13641,
+        "sha256": "3d0a88e0c2212913699c13362027eb97a0cd84789a47e72b41d43cf3d2d6545b"
       },
       "mirrors": {
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/scripts/build_artifact_index.py",
           "exists": true,
+          "bytes": 13641,
+          "sha256": "3d0a88e0c2212913699c13362027eb97a0cd84789a47e72b41d43cf3d2d6545b"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/scripts/build_artifact_index.py",
           "exists": true,
+          "bytes": 13641,
+          "sha256": "3d0a88e0c2212913699c13362027eb97a0cd84789a47e72b41d43cf3d2d6545b"
         }
       },
       "failures": []
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/scripts/verify_live_publication.py",
         "exists": true,
+        "bytes": 11753,
+        "sha256": "297364d079c1eea4e790fd7b2f8ae42ddd7d93aa28d7a2362806729789813626"
       },
       "mirrors": {
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/scripts/verify_live_publication.py",
           "exists": true,
+          "bytes": 11753,
+          "sha256": "297364d079c1eea4e790fd7b2f8ae42ddd7d93aa28d7a2362806729789813626"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/scripts/verify_live_publication.py",
           "exists": true,
+          "bytes": 11753,
+          "sha256": "297364d079c1eea4e790fd7b2f8ae42ddd7d93aa28d7a2362806729789813626"
         }
       },
       "failures": []
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/scripts/validate_mirror_parity.py",
         "exists": true,
+        "bytes": 8517,
+        "sha256": "990b2d29ae7623a0f184c9fba8560604aef0d6311617a54cb0de94bd4fd48305"
       },
       "mirrors": {
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/scripts/validate_mirror_parity.py",
           "exists": true,
+          "bytes": 8517,
+          "sha256": "990b2d29ae7623a0f184c9fba8560604aef0d6311617a54cb0de94bd4fd48305"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/scripts/validate_mirror_parity.py",
           "exists": true,
+          "bytes": 8517,
+          "sha256": "990b2d29ae7623a0f184c9fba8560604aef0d6311617a54cb0de94bd4fd48305"
         }
       },
       "failures": []
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/scripts/validate_publication_package.py",
         "exists": true,
+        "bytes": 13018,
+        "sha256": "730bd76ebadf907045fb713f549dd132c80b292a4ca6bcf52dee1bac4748cbd6"
       },
       "mirrors": {
         "hf_artifacts": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/scripts/validate_publication_package.py",
           "exists": true,
+          "bytes": 13018,
+          "sha256": "730bd76ebadf907045fb713f549dd132c80b292a4ca6bcf52dee1bac4748cbd6"
         },
         "hf_model": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/scripts/validate_publication_package.py",
           "exists": true,
+          "bytes": 13018,
+          "sha256": "730bd76ebadf907045fb713f549dd132c80b292a4ca6bcf52dee1bac4748cbd6"
         }
       },
       "failures": []
       "local": {
         "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs/index.html",
         "exists": true,
+        "bytes": 94421,
+        "sha256": "a8ff86e5b6f5898ffa807255986eb430d238da6a3bdbc1915c438a1d38d9dc82"
       },
       "mirrors": {
         "hf_space": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/index.html",
           "exists": true,
+          "bytes": 94421,
+          "sha256": "a8ff86e5b6f5898ffa807255986eb430d238da6a3bdbc1915c438a1d38d9dc82"
         },
         "hf_artifacts_docs": {
           "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/docs/index.html",
           "exists": true,
+          "bytes": 94421,
+          "sha256": "a8ff86e5b6f5898ffa807255986eb430d238da6a3bdbc1915c438a1d38d9dc82"
         }
       },
       "failures": []
         }
       },
       "failures": []
+    },
+    {
+      "name": "docs/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
+      "status": "pass",
+      "local": {
+        "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
+        "exists": true,
+        "bytes": 7654,
+        "sha256": "0866357a6d9922c961dc89a872b88b9517a37adf7b8130bfeb64b471045d01da"
+      },
+      "mirrors": {
+        "hf_space": {
+          "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
+          "exists": true,
+          "bytes": 7654,
+          "sha256": "0866357a6d9922c961dc89a872b88b9517a37adf7b8130bfeb64b471045d01da"
+        },
+        "hf_artifacts": {
+          "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
+          "exists": true,
+          "bytes": 7654,
+          "sha256": "0866357a6d9922c961dc89a872b88b9517a37adf7b8130bfeb64b471045d01da"
+        },
+        "hf_model": {
+          "path": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
+          "exists": true,
+          "bytes": 7654,
+          "sha256": "0866357a6d9922c961dc89a872b88b9517a37adf7b8130bfeb64b471045d01da"
+        }
+      },
+      "failures": []
     }
   ],
   "failures": []
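
The mirror-parity records above all share one shape: a `local` entry and one or more `mirrors` entries, each carrying `path`, `exists`, `bytes`, and `sha256`, plus a `failures` list. A minimal sketch of how such a record could be produced is shown here. It is not the contents of `scripts/validate_mirror_parity.py`; the function names `describe` and `check_parity` and the failure message strings are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def describe(path: Path) -> dict:
    """Build one per-file record with the fields seen in mirror_parity.json."""
    if not path.is_file():
        return {"path": str(path), "exists": False, "bytes": None, "sha256": None}
    data = path.read_bytes()
    return {
        "path": str(path),
        "exists": True,
        "bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def check_parity(name: str, local: Path, mirrors: dict[str, Path]) -> dict:
    """Compare a local file against its mirrors by SHA-256 (hypothetical helper)."""
    local_rec = describe(local)
    mirror_recs = {label: describe(p) for label, p in mirrors.items()}
    failures = []
    if not local_rec["exists"]:
        failures.append("local: missing")
    for label, rec in mirror_recs.items():
        if not rec["exists"]:
            failures.append(f"{label}: missing")
        elif rec["sha256"] != local_rec["sha256"]:
            failures.append(f"{label}: sha256 differs from local")
    return {
        "name": name,
        "status": "fail" if failures else "pass",
        "local": local_rec,
        "mirrors": mirror_recs,
        "failures": failures,
    }


if __name__ == "__main__":
    # Throwaway files stand in for the working copy and one mirror bundle.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "local.json").write_bytes(b"{}")
        (root / "mirror.json").write_bytes(b"{}")
        record = check_parity(
            "demo.json", root / "local.json", {"hf_space": root / "mirror.json"}
        )
        print(record["status"], record["failures"])
```

Equal digests imply equal byte counts, so the sketch compares only `sha256`; `bytes` is kept in the record because it makes a mismatch easier to read at a glance.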

metrics/project_manifest.json CHANGED

@@ -30,8 +30,18 @@
     "xperience10m_hf": "https://huggingface.co/datasets/ropedia-ai/xperience-10m",
     "xperience10m_sample_hf": "https://huggingface.co/datasets/ropedia-ai/xperience-10m-sample"
   },
   "evidence_files": {
     "artifact_guide": "ARTIFACT_GUIDE.md",
     "reproducibility_contract": "REPRODUCIBILITY.md",
     "reproducibility_matrix": "docs/data/reproducibility_matrix.json",
     "evidence_contract": "docs/data/evidence_contract.json",

     "xperience10m_hf": "https://huggingface.co/datasets/ropedia-ai/xperience-10m",
     "xperience10m_sample_hf": "https://huggingface.co/datasets/ropedia-ai/xperience-10m-sample"
   },
+  "upstream_dataset_card_alignment": {
+    "source_repo": "ropedia-ai/xperience-10m",
+    "source_url": "https://huggingface.co/datasets/ropedia-ai/xperience-10m",
+    "observed_last_modified": "2026-04-21T05:03:45.000Z",
+    "observed_access": "manual gated access for approved non-commercial use",
+    "alignment_doc": "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
+    "alignment_json": "docs/data/xperience10m_dataset_card_alignment.json"
+  },
   "evidence_files": {
     "artifact_guide": "ARTIFACT_GUIDE.md",
+    "official_dataset_card_alignment": "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
+    "official_dataset_card_alignment_json": "docs/data/xperience10m_dataset_card_alignment.json",
     "reproducibility_contract": "REPRODUCIBILITY.md",
     "reproducibility_matrix": "docs/data/reproducibility_matrix.json",
     "evidence_contract": "docs/data/evidence_contract.json",

metrics/publication_audit.json CHANGED

@@ -1,6 +1,6 @@
 {
   "status": "pass",
-  "generated_at_utc": "2026-06-01T07:33:21+00:00",
   "checks": [
     {
       "name": "required_publication_assets_present",
@@ -45,6 +45,7 @@
     "codemeta.json": true,
     "ARTIFACT_GUIDE.md": true,
     "QUALITY_GATES.md": true,
     "REPRODUCIBILITY.md": true,
     "EVIDENCE_CONTRACT.md": true,
     "DATA_NOTICE.md": true,
@@ -59,6 +60,7 @@
     "docs/data/quality_gates.json": true,
     "docs/data/project_manifest.json": true,
     "docs/data/reviewer_packet.json": true,
     "docs/data/reproducibility_matrix.json": true,
     "docs/data/modality_atlas.json": true,
     "docs/data/mirror_parity.json": true,
@@ -95,7 +97,7 @@
       "surface": "github_repo",
       "path": "README.md",
       "exists": true,
-      "required_marker_count": 3,
       "missing_markers": [],
       "status": "pass"
     },
@@ -103,7 +105,7 @@
       "surface": "hf_space_bundle",
       "path": "README.md",
       "exists": true,
-      "required_marker_count": 4,
       "missing_markers": [],
       "status": "pass"
     },
@@ -111,7 +113,7 @@
       "surface": "hf_artifact_bundle",
       "path": "README.md",
       "exists": true,
-      "required_marker_count": 3,
       "missing_markers": [],
       "status": "pass"
     },
@@ -119,7 +121,7 @@
       "surface": "hf_artifact_bundle",
       "path": "PROJECT_README.md",
       "exists": true,
-      "required_marker_count": 3,
       "missing_markers": [],
       "status": "pass"
     },
@@ -127,7 +129,7 @@
       "surface": "hf_model_bundle",
       "path": "README.md",
       "exists": true,
-      "required_marker_count": 4,
       "missing_markers": [],
       "status": "pass"
     }
@@ -136,8 +138,8 @@
     "github_repo": {
       "root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy",
       "exists": true,
-      "file_count": 291,
-      "text_file_count": 236,
       "largest_file": {
         "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
         "bytes": 52601010
@@ -147,8 +149,8 @@
     "hf_space_bundle": {
       "root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space",
       "exists": true,
-      "file_count": 54,
-      "text_file_count": 41,
       "largest_file": {
         "path": "assets/task_suite_infographic.png",
         "bytes": 2600527
@@ -158,8 +160,8 @@
     "hf_artifact_bundle": {
       "root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts",
       "exists": true,
-      "file_count": 271,
-      "text_file_count": 229,
       "largest_file": {
         "path": "results/episode_task_suite/neural_mlp/temporal_order/model.pt",
         "bytes": 13406129
@@ -169,8 +171,8 @@
     "hf_model_bundle": {
       "root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model",
       "exists": true,
-      "file_count": 203,
-      "text_file_count": 160,
       "largest_file": {
         "path": "artifacts/episode_task_suite/cross_modal_retrieval/model.npz",
         "bytes": 41310574

 {
   "status": "pass",
+  "generated_at_utc": "2026-06-01T08:04:40+00:00",
   "checks": [
     {
       "name": "required_publication_assets_present",
     "codemeta.json": true,
     "ARTIFACT_GUIDE.md": true,
     "QUALITY_GATES.md": true,
+    "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md": true,
     "REPRODUCIBILITY.md": true,
     "EVIDENCE_CONTRACT.md": true,
     "DATA_NOTICE.md": true,
     "docs/data/quality_gates.json": true,
     "docs/data/project_manifest.json": true,
     "docs/data/reviewer_packet.json": true,
+    "docs/data/xperience10m_dataset_card_alignment.json": true,
     "docs/data/reproducibility_matrix.json": true,
     "docs/data/modality_atlas.json": true,
     "docs/data/mirror_parity.json": true,
       "surface": "github_repo",
       "path": "README.md",
       "exists": true,
+      "required_marker_count": 4,
       "missing_markers": [],
       "status": "pass"
     },
       "surface": "hf_space_bundle",
       "path": "README.md",
       "exists": true,
+      "required_marker_count": 5,
       "missing_markers": [],
       "status": "pass"
     },
       "surface": "hf_artifact_bundle",
       "path": "README.md",
       "exists": true,
+      "required_marker_count": 4,
       "missing_markers": [],
       "status": "pass"
     },
       "surface": "hf_artifact_bundle",
       "path": "PROJECT_README.md",
       "exists": true,
+      "required_marker_count": 4,
       "missing_markers": [],
       "status": "pass"
     },
       "surface": "hf_model_bundle",
       "path": "README.md",
       "exists": true,
+      "required_marker_count": 5,
       "missing_markers": [],
       "status": "pass"
     }
     "github_repo": {
       "root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy",
       "exists": true,
+      "file_count": 293,
+      "text_file_count": 238,
       "largest_file": {
         "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
         "bytes": 52601010
     "hf_space_bundle": {
       "root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/space",
       "exists": true,
+      "file_count": 56,
+      "text_file_count": 43,
       "largest_file": {
         "path": "assets/task_suite_infographic.png",
         "bytes": 2600527
     "hf_artifact_bundle": {
       "root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/artifacts",
       "exists": true,
+      "file_count": 273,
+      "text_file_count": 231,
       "largest_file": {
         "path": "results/episode_task_suite/neural_mlp/temporal_order/model.pt",
         "bytes": 13406129
     "hf_model_bundle": {
       "root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/hf_publish/model",
       "exists": true,
+      "file_count": 205,
+      "text_file_count": 162,
       "largest_file": {
         "path": "artifacts/episode_task_suite/cross_modal_retrieval/model.npz",
         "bytes": 41310574

metrics/quality_gates.json CHANGED

@@ -1,7 +1,7 @@
 {
   "title": "Ropedia Xperience-10M Publication Quality Gates",
   "status": "pass",
-  "generated_at_utc": "2026-06-01T06:49:16+00:00",
   "rule": "Do not present a release as current unless every automated gate passes, then verify live GitHub/HF mirrors after publishing.",
   "automated_gates": [
     {

 {
   "title": "Ropedia Xperience-10M Publication Quality Gates",
   "status": "pass",
+  "generated_at_utc": "2026-06-01T08:03:23+00:00",
   "rule": "Do not present a release as current unless every automated gate passes, then verify live GitHub/HF mirrors after publishing.",
   "automated_gates": [
     {

metrics/reviewer_packet.json CHANGED

@@ -21,8 +21,10 @@
       "primary_artifacts": [
         "EVIDENCE_CONTRACT.md",
         "ARTIFACT_GUIDE.md",
         "docs/data/evidence_contract.json",
         "docs/data/artifact_index.json",
         "docs/data/mirror_parity.json",
         "docs/data/publication_audit.json",
         "docs/data/scope_claims_audit.json",
@@ -32,6 +34,16 @@
     },
     {
       "step": 2,
       "question": "How can the public pipeline be reproduced?",
       "primary_artifacts": [
         "REPRODUCIBILITY.md",
@@ -41,7 +53,7 @@
       "readout": "The public sample pipeline has explicit commands, expected outputs, and a prior exact-match audit over the committed metrics."
     },
     {
-      "step": 3,
       "question": "What is inside one model input?",
       "primary_artifacts": [
         "results/episode_task_suite/windows.csv",
@@ -52,7 +64,7 @@
       "readout": "The current model input is an 8,378-dimensional aligned window vector with explicit feature-block boundaries, and the readable atlas shows each public-sample modality without raw data redistribution."
     },
     {
-      "step": 4,
       "question": "Do the task metrics have committed evidence?",
       "primary_artifacts": [
         "results/episode_task_suite/summary_report.json",
@@ -62,7 +74,7 @@
       "readout": "Each of the 12 tasks has minimal-head metrics and a matching neural MLP result over the same window contracts."
     },
     {
-      "step": 5,
       "question": "How should this scale beyond one episode?",
       "primary_artifacts": [
         "results/omni_finetune/DATA_BLOCKER_REPORT.md",

       "primary_artifacts": [
         "EVIDENCE_CONTRACT.md",
         "ARTIFACT_GUIDE.md",
+        "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
         "docs/data/evidence_contract.json",
         "docs/data/artifact_index.json",
+        "docs/data/xperience10m_dataset_card_alignment.json",
         "docs/data/mirror_parity.json",
         "docs/data/publication_audit.json",
         "docs/data/scope_claims_audit.json",
     },
     {
       "step": 2,
+      "question": "What does the official Xperience-10M dataset card say?",
+      "primary_artifacts": [
+        "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
+        "docs/data/xperience10m_dataset_card_alignment.json",
+        "https://huggingface.co/datasets/ropedia-ai/xperience-10m"
+      ],
+      "readout": "The full upstream dataset is a manually gated large-scale 4D multimodal egocentric source; this repo validates only one public sample episode and records unsupported areas explicitly."
+    },
+    {
+      "step": 3,
       "question": "How can the public pipeline be reproduced?",
       "primary_artifacts": [
         "REPRODUCIBILITY.md",
       "readout": "The public sample pipeline has explicit commands, expected outputs, and a prior exact-match audit over the committed metrics."
     },
     {
+      "step": 4,
       "question": "What is inside one model input?",
       "primary_artifacts": [
         "results/episode_task_suite/windows.csv",
       "readout": "The current model input is an 8,378-dimensional aligned window vector with explicit feature-block boundaries, and the readable atlas shows each public-sample modality without raw data redistribution."
     },
     {
+      "step": 5,
       "question": "Do the task metrics have committed evidence?",
       "primary_artifacts": [
         "results/episode_task_suite/summary_report.json",
       "readout": "Each of the 12 tasks has minimal-head metrics and a matching neural MLP result over the same window contracts."
     },
     {
+      "step": 6,
       "question": "How should this scale beyond one episode?",
       "primary_artifacts": [
         "results/omni_finetune/DATA_BLOCKER_REPORT.md",

metrics/scope_claims_audit.json CHANGED

@@ -1,6 +1,6 @@
 {
   "status": "pass",
-  "generated_at_utc": "2026-06-01T07:19:55+00:00",
   "summary": {
     "qwen3_omni_32_episode_claim": false,
     "dataset_manifest_num_episodes": 1,

 {
   "status": "pass",
+  "generated_at_utc": "2026-06-01T07:59:42+00:00",
   "summary": {
     "qwen3_omni_32_episode_claim": false,
     "dataset_manifest_num_episodes": 1,

metrics/website_integrity.json CHANGED

@@ -1,13 +1,13 @@
 {
   "status": "pass",
-  "generated_at_utc": "2026-06-01T07:19:57+00:00",
   "docs_root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs",
   "site_base": "/ropedia-xperience-10m-task-suite/",
   "summary": {
     "html_pages": 2,
-    "local_references": 61,
-    "external_reference_count": 58,
-    "json_files": 16,
     "image_assets_referenced": 18,
     "failure_count": 0
   },
@@ -43,30 +43,30 @@
     },
     {
       "path": "index.html",
-      "id_count": 31,
-      "reference_count": 60,
       "image_count": 20
     }
   ],
   "json_files": [
     {
       "path": "data/artifact_index.json",
-      "bytes": 14655,
       "top_level_type": "dict"
     },
     {
       "path": "data/evidence_contract.json",
-      "bytes": 7954,
       "top_level_type": "dict"
     },
     {
       "path": "data/live_publication_status.json",
-      "bytes": 13893,
       "top_level_type": "dict"
     },
     {
       "path": "data/mirror_parity.json",
-      "bytes": 48916,
       "top_level_type": "dict"
     },
     {
@@ -76,7 +76,7 @@
     },
     {
       "path": "data/project_manifest.json",
-      "bytes": 2789,
       "top_level_type": "dict"
     },
     {
@@ -106,7 +106,7 @@
     },
     {
       "path": "data/reviewer_packet.json",
-      "bytes": 4406,
       "top_level_type": "dict"
     },
     {
@@ -128,6 +128,11 @@
       "path": "data/website_integrity.json",
       "bytes": 6159,
       "top_level_type": "dict"
     }
   ],
   "images": [

 {
   "status": "pass",
+  "generated_at_utc": "2026-06-01T07:59:42+00:00",
   "docs_root": "/Users/chaoyue/Documents/Codex/2026-05-29/i-am-learning-this-dataset-https/working_repo_copy/docs",
   "site_base": "/ropedia-xperience-10m-task-suite/",
   "summary": {
     "html_pages": 2,
+    "local_references": 67,
+    "external_reference_count": 63,
+    "json_files": 17,
     "image_assets_referenced": 18,
     "failure_count": 0
   },
     },
     {
       "path": "index.html",
+      "id_count": 32,
+      "reference_count": 66,
       "image_count": 20
     }
   ],
   "json_files": [
     {
       "path": "data/artifact_index.json",
+      "bytes": 14654,
       "top_level_type": "dict"
     },
     {
       "path": "data/evidence_contract.json",
+      "bytes": 8483,
       "top_level_type": "dict"
     },
     {
       "path": "data/live_publication_status.json",
+      "bytes": 9711,
       "top_level_type": "dict"
     },
     {
       "path": "data/mirror_parity.json",
+      "bytes": 48912,
       "top_level_type": "dict"
     },
     {
     },
     {
       "path": "data/project_manifest.json",
+      "bytes": 3411,
       "top_level_type": "dict"
     },
     {
     },
     {
       "path": "data/reviewer_packet.json",
+      "bytes": 5044,
       "top_level_type": "dict"
     },
     {
       "path": "data/website_integrity.json",
       "bytes": 6159,
       "top_level_type": "dict"
+    },
+    {
+      "path": "data/xperience10m_dataset_card_alignment.json",
+      "bytes": 5103,
+      "top_level_type": "dict"
     }
   ],
   "images": [

metrics/xperience10m_dataset_card_alignment.json ADDED

	@@ -0,0 +1,143 @@

+{
+  "title": "Xperience-10M Official Dataset Card Alignment",
+  "checked_at_utc": "2026-06-01T00:00:00+00:00",
+  "source_urls": {
+    "official_hf_dataset": "https://huggingface.co/datasets/ropedia-ai/xperience-10m",
+    "official_hf_api": "https://huggingface.co/api/datasets/ropedia-ai/xperience-10m",
+    "official_sample": "https://huggingface.co/datasets/ropedia-ai/xperience-10m-sample",
+    "ropedia_dataset_site": "https://ropedia.com/dataset",
+    "ropedia_release_page": "https://ropedia.com/blog/20260316_xperience_10m",
+    "homie_toolkit": "https://github.com/Ropedia/HOMIE-toolkit"
+  },
+  "hf_repo_metadata_observed": {
+    "repo_id": "ropedia-ai/xperience-10m",
+    "last_modified": "2026-04-21T05:03:45.000Z",
+    "gated": "manual",
+    "task_categories": [
+      "video-classification",
+      "image-to-text",
+      "depth-estimation",
+      "robotics"
+    ],
+    "modalities": [
+      "3d",
+      "audio",
+      "video"
+    ],
+    "language": [
+      "en"
+    ],
+    "size_categories": [
+      "1M<n<10M"
+    ],
+    "license": "other",
+    "access_note": "Reviewed gated access for approved non-commercial use; an external agreement-signing step may be required before approval."
+  },
+  "official_dataset_summary": {
+    "description": "Large-scale egocentric multimodal human-experience data for embodied AI, robotics, world models, and spatial intelligence.",
+    "experience_units": "about 10 million",
+    "recording_hours": "about 10,000",
+    "storage_described_by_card": "about 1 PB"
+  },
+  "official_scale_statistics": {
+    "rgb_frames": "about 2.88 billion",
+    "depth_frames": "about 720 million",
+    "camera_pose_records": "about 576 million",
+    "motion_capture_frames": "about 576 million",
+    "imu_records": "about 7.2 billion",
+    "caption_sentences": "about 16 million",
+    "caption_words": "about 200 million",
+    "vocabulary_words": "about 6,000",
+    "object_annotations": "about 350,000",
+    "trajectory_distance": "about 39,000 km"
+  },
+  "official_modalities": [
+    "six RGB video streams: four fisheye views and two rectified stereo views",
+    "audio embedded in the video streams",
+    "stereo depth and confidence",
+    "camera pose, SLAM trajectory, and point cloud",
+    "two-hand motion capture",
+    "full-body motion capture",
+    "inertial accelerometer and gyroscope streams",
+    "hierarchical language and caption annotations",
+    "metadata and calibration records"
+  ],
+  "episode_layout": {
+    "folder_pattern": "<session_uuid>/ep<episode_id>/",
+    "required_for_valid_episode_in_this_repo": [
+      "annotation.hdf5"
+    ],
+    "preferred_for_full_omni_in_this_repo": [
+      "fisheye_cam0.mp4",
+      "fisheye_cam1.mp4",
+      "fisheye_cam2.mp4",
+      "fisheye_cam3.mp4",
+      "stereo_left.mp4",
+      "stereo_right.mp4"
+    ],
+    "optional_or_excluded": [
+      "visualization.rrd"
+    ]
+  },
+  "annotation_hdf5_groups": [
+    "calibration",
+    "slam / camera pose",
+    "depth",
+    "hand_mocap",
+    "full_body_mocap",
+    "imu",
+    "video timing",
+    "metadata",
+    "caption / language annotations"
+  ],
+  "official_intended_uses": [
+    "egocentric video and action understanding",
+    "task and subtask recognition",
+    "temporal action localization",
+    "human-object interaction analysis",
+    "object grounding and caption/language grounding",
+    "audio-visual learning and multimodal pretraining",
+    "embodied reasoning and world-model learning",
+    "robotics imitation learning",
+    "depth estimation, odometry, SLAM, and scene reconstruction",
+    "hand/body pose and human motion understanding",
+    "sensor fusion"
+  ],
+  "current_repo_alignment": {
+    "validated_episode_count": 1,
+    "validated_frames": 5821,
+    "validated_windows": 1161,
+    "current_feature_dim": 8378,
+    "raw_data_redistributed": false,
+    "audio_feature_status": "Audio is present in the sample MP4 streams and visualized, but not extracted into the current baseline feature vector.",
+    "implemented_task_count": 12,
+    "neural_head_count": 12,
+    "covered_by_current_tasks": [
+      "action/subtask recognition",
+      "next-action prediction",
+      "transition and temporal diagnostics",
+      "hand trajectory forecasting",
+      "contact prediction",
+      "object relevance",
+      "caption grounding",
+      "cross-modal retrieval",
+      "modality reconstruction",
+      "misalignment detection"
+    ],
+    "not_yet_claimed": [
+      "full audio-visual learning",
+      "caption generation",
+      "depth-pixel estimation",
+      "SLAM estimation",
+      "neural rendering",
+      "policy learning",
+      "cross-episode generalization",
+      "real 32-episode Qwen3-Omni model quality"
+    ]
+  },
+  "responsible_use_boundary": [
+    "No raw MP4, raw annotation.hdf5, private gated data, raw visualization.rrd, or full Qwen weights are redistributed.",
+    "The project does not support identity recognition, re-identification, biometric profiling, surveillance, sensitive attribute inference, or safety-critical deployment.",
+    "Dataset use remains governed by the official Ropedia/Xperience-10M terms."
+  ]
+}
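
The `current_repo_alignment` block above separates what the repo has validated from what it does not yet claim. One way a reader might sanity-check that boundary is sketched here. The function `scope_violations` and the four rules it applies are assumptions, not checks taken from the repo's scripts; the `EXCERPT` values are copied from the JSON above, with the two lists shortened.

```python
def scope_violations(alignment: dict) -> list[str]:
    """Return human-readable problems found in the claim boundary (hypothetical rules)."""
    repo = alignment["current_repo_alignment"]
    problems = []
    # Assumed rule: the repo validates exactly one public sample episode.
    if repo["validated_episode_count"] != 1:
        problems.append("validated_episode_count is not 1")
    # Assumed rule: no raw data may be redistributed.
    if repo["raw_data_redistributed"]:
        problems.append("raw_data_redistributed is true")
    # Assumed rule: every implemented task has a matching neural head.
    if repo["neural_head_count"] != repo["implemented_task_count"]:
        problems.append("neural_head_count differs from implemented_task_count")
    # Assumed rule: nothing is both covered and explicitly not claimed.
    overlap = set(repo["covered_by_current_tasks"]) & set(repo["not_yet_claimed"])
    if overlap:
        problems.append(f"listed as both covered and not claimed: {sorted(overlap)}")
    return problems


# Shortened excerpt of the JSON above; only the fields the sketch reads.
EXCERPT = {
    "current_repo_alignment": {
        "validated_episode_count": 1,
        "raw_data_redistributed": False,
        "implemented_task_count": 12,
        "neural_head_count": 12,
        "covered_by_current_tasks": ["caption grounding", "cross-modal retrieval"],
        "not_yet_claimed": ["caption generation", "SLAM estimation"],
    }
}

if __name__ == "__main__":
    print(scope_violations(EXCERPT))
```

Returning a list of messages rather than a boolean mirrors the `failures` arrays used elsewhere in this commit's metrics files.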

scripts/build_artifact_index.py CHANGED

@@ -41,6 +41,22 @@ ARTIFACTS = [
         "surface": "repo_hf",
         "proves": "Gives the human-readable map from proof boundary to data, tasks, platform mirrors, and scale-up status.",
     },
     {
         "id": "quality_gates",
         "title": "Publication quality gates",

         "surface": "repo_hf",
         "proves": "Gives the human-readable map from proof boundary to data, tasks, platform mirrors, and scale-up status.",
     },
+    {
+        "id": "official_dataset_card_alignment",
+        "title": "Official Xperience-10M dataset-card alignment",
+        "path": "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
+        "kind": "source_alignment",
+        "surface": "repo_hf",
+        "proves": "Aligns public dataset wording with the official gated Xperience-10M dataset card and records unsupported areas.",
+    },
+    {
+        "id": "official_dataset_card_alignment_json",
+        "title": "Official Xperience-10M dataset-card alignment JSON",
+        "path": "docs/data/xperience10m_dataset_card_alignment.json",
+        "kind": "source_alignment",
+        "surface": "website_hf",
+        "proves": "Machine-readable upstream dataset-card alignment facts for website and HF mirrors.",
+    },
     {
         "id": "quality_gates",
         "title": "Publication quality gates",

scripts/validate_mirror_parity.py CHANGED

@@ -35,6 +35,7 @@ DATA_FILES = [
     "summary_metrics.json",
     "task_walkthroughs.json",
     "website_integrity.json",
 ]
 ASSET_FILES = [
@@ -66,6 +67,7 @@ WEBSITE_FILES = [
 DOC_FILES = [
     "QUALITY_GATES.md",
 ]

     "summary_metrics.json",
     "task_walkthroughs.json",
     "website_integrity.json",
+    "xperience10m_dataset_card_alignment.json",
 ]
 ASSET_FILES = [
 DOC_FILES = [
     "QUALITY_GATES.md",
+    "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
 ]

scripts/validate_publication_package.py CHANGED

@@ -53,6 +53,7 @@ CARD_FRESHNESS_EXPECTATIONS = [
         "relative_path": "README.md",
         "required": [
             "xperience10m-taskfirst-v12-modality-xl",
             "all 12 task families before the",
             "Public-sample modality thumbnails remain enlarged below",
         ],
@@ -62,6 +63,7 @@ CARD_FRESHNESS_EXPECTATIONS = [
         "relative_path": "README.md",
         "required": [
             "xperience10m-taskfirst-v12-modality-xl",
             "task-first 12-task infographic",
             "native responsive modality atlas",
             "website HTML",
@@ -72,6 +74,7 @@ CARD_FRESHNESS_EXPECTATIONS = [
         "relative_path": "README.md",
         "required": [
             "xperience10m-taskfirst-v12-modality-xl",
             "task-first 12-task map",
             "including critical website HTML",
         ],
@@ -81,6 +84,7 @@ CARD_FRESHNESS_EXPECTATIONS = [
         "relative_path": "PROJECT_README.md",
         "required": [
             "xperience10m-taskfirst-v12-modality-xl",
             "all 12 task families before the",
             "Public-sample modality thumbnails remain enlarged below",
         ],
@@ -90,6 +94,7 @@ CARD_FRESHNESS_EXPECTATIONS = [
         "relative_path": "README.md",
         "required": [
             "xperience10m-taskfirst-v12-modality-xl",
             "task-first 12-head",
             "responsive modality atlas",
             "website HTML",
@@ -194,6 +199,7 @@ def required_assets(root: Path) -> dict[str, bool]:
         "codemeta.json",
         "ARTIFACT_GUIDE.md",
         "QUALITY_GATES.md",
         "REPRODUCIBILITY.md",
         "EVIDENCE_CONTRACT.md",
         "DATA_NOTICE.md",
@@ -208,6 +214,7 @@ def required_assets(root: Path) -> dict[str, bool]:
         "docs/data/quality_gates.json",
         "docs/data/project_manifest.json",
         "docs/data/reviewer_packet.json",
         "docs/data/reproducibility_matrix.json",
         "docs/data/modality_atlas.json",
         "docs/data/mirror_parity.json",

         "relative_path": "README.md",
         "required": [
             "xperience10m-taskfirst-v12-modality-xl",
+            "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
             "all 12 task families before the",
             "Public-sample modality thumbnails remain enlarged below",
         ],
         "relative_path": "README.md",
         "required": [
             "xperience10m-taskfirst-v12-modality-xl",
+            "xperience10m_dataset_card_alignment.json",
             "task-first 12-task infographic",
             "native responsive modality atlas",
             "website HTML",
         "relative_path": "README.md",
         "required": [
             "xperience10m-taskfirst-v12-modality-xl",
+            "xperience10m_dataset_card_alignment.json",
             "task-first 12-task map",
             "including critical website HTML",
         ],
         "relative_path": "PROJECT_README.md",
         "required": [
             "xperience10m-taskfirst-v12-modality-xl",
+            "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
             "all 12 task families before the",
             "Public-sample modality thumbnails remain enlarged below",
         ],
         "relative_path": "README.md",
         "required": [
             "xperience10m-taskfirst-v12-modality-xl",
+            "xperience10m_dataset_card_alignment.json",
             "task-first 12-head",
             "responsive modality atlas",
             "website HTML",
         "codemeta.json",
         "ARTIFACT_GUIDE.md",
         "QUALITY_GATES.md",
+        "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
         "REPRODUCIBILITY.md",
         "EVIDENCE_CONTRACT.md",
         "DATA_NOTICE.md",
         "docs/data/quality_gates.json",
         "docs/data/project_manifest.json",
         "docs/data/reviewer_packet.json",
+        "docs/data/xperience10m_dataset_card_alignment.json",
         "docs/data/reproducibility_matrix.json",
         "docs/data/modality_atlas.json",
         "docs/data/mirror_parity.json",
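
The `required_assets(root: Path) -> dict[str, bool]` signature in the hunk headers above suggests a per-path presence map. The function body is not part of this diff, so the following is only a minimal sketch, assuming the function tests whether each listed path exists under `root`. The `REQUIRED_PATHS` name, the shortened two-entry list, the extra `paths` parameter, and the `missing_assets` helper are hypothetical.

```python
from pathlib import Path

# Hypothetical, shortened list: only the two entries added in this commit.
# The real script lists many more paths.
REQUIRED_PATHS: tuple[str, ...] = (
    "XPERIENCE10M_DATASET_CARD_ALIGNMENT.md",
    "docs/data/xperience10m_dataset_card_alignment.json",
)


def required_assets(
    root: Path, paths: tuple[str, ...] = REQUIRED_PATHS
) -> dict[str, bool]:
    """Map each required relative path to whether it exists as a file under root."""
    return {rel: (root / rel).is_file() for rel in paths}


def missing_assets(
    root: Path, paths: tuple[str, ...] = REQUIRED_PATHS
) -> list[str]:
    """Return the required paths that are absent, in list order (assumed helper)."""
    return [rel for rel, present in required_assets(root, paths).items() if not present]
```

Under this reading, adding a path to the list is enough to make the package validation fail whenever the new alignment file is missing from the release tree.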

scripts/verify_live_publication.py CHANGED

@@ -47,6 +47,17 @@ HASH_GROUPS = [
             "hf_model": "https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines/resolve/main/metrics/quality_gates.json",
         },
     },
     {
         "id": "quality_gates_markdown",
         "title": "Quality-gate Markdown",
@@ -69,6 +80,7 @@ MARKER_CHECKS = [
         "required": [
             "Release gates are explicit",
             "quality_gates.json",
             "xperience10m-taskfirst-v12-modality-xl",
         ],
         "forbidden": [
@@ -83,6 +95,7 @@ MARKER_CHECKS = [
         "required": [
             "Release gates are explicit",
             "quality_gates.json",
             "xperience10m-taskfirst-v12-modality-xl",
         ],
         "forbidden": [
@@ -94,14 +107,22 @@ MARKER_CHECKS = [
         "id": "hf_artifacts_card_current",
         "title": "HF artifact card links quality gates",
         "url": "https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts/raw/main/README.md",
-        "required": ["QUALITY_GATES.md", "docs/data/quality_gates.json"],
         "forbidden": ["xperience10m-" + "taskfirst-v10"],
     },
     {
         "id": "hf_model_card_current",
         "title": "HF model card links quality gates",
         "url": "https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines/raw/main/README.md",
-        "required": ["QUALITY_GATES.md", "metrics/quality_gates.json"],
         "forbidden": ["xperience10m-" + "taskfirst-v10"],
     },
 ]

             "hf_model": "https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines/resolve/main/metrics/quality_gates.json",
         },
     },
+    {
+        "id": "xperience10m_dataset_card_alignment_json",
+        "title": "Official Xperience-10M dataset-card alignment JSON",
+        "local_path": "docs/data/xperience10m_dataset_card_alignment.json",
+        "urls": {
+            "github_pages": "https://chaoyue0307.github.io/ropedia-xperience-10m-task-suite/data/xperience10m_dataset_card_alignment.json",
+            "hf_space": "https://huggingface.co/spaces/cy0307/ropedia-xperience-10m-task-suite/raw/main/data/xperience10m_dataset_card_alignment.json",
+            "hf_artifacts": "https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts/resolve/main/docs/data/xperience10m_dataset_card_alignment.json",
+            "hf_model": "https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines/resolve/main/metrics/xperience10m_dataset_card_alignment.json",
+        },
+    },
     {
         "id": "quality_gates_markdown",
         "title": "Quality-gate Markdown",
         "required": [
             "Release gates are explicit",
             "quality_gates.json",
+            "xperience10m_dataset_card_alignment.json",
             "xperience10m-taskfirst-v12-modality-xl",
         ],
         "forbidden": [
         "required": [
             "Release gates are explicit",
             "quality_gates.json",
+            "xperience10m_dataset_card_alignment.json",
             "xperience10m-taskfirst-v12-modality-xl",
         ],
         "forbidden": [
         "id": "hf_artifacts_card_current",
         "title": "HF artifact card links quality gates",
         "url": "https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts/raw/main/README.md",
+        "required": [
+            "QUALITY_GATES.md",
+            "docs/data/quality_gates.json",
+            "xperience10m_dataset_card_alignment.json",
+        ],
         "forbidden": ["xperience10m-" + "taskfirst-v10"],
     },
     {
         "id": "hf_model_card_current",
         "title": "HF model card links quality gates",
         "url": "https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines/raw/main/README.md",
+        "required": [
+            "QUALITY_GATES.md",
+            "metrics/quality_gates.json",
+            "xperience10m_dataset_card_alignment.json",
+        ],
         "forbidden": ["xperience10m-" + "taskfirst-v10"],
     },
 ]
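
Each `MARKER_CHECKS` entry above pairs a URL with `required` and `forbidden` substrings. The code that evaluates these entries is not shown in this diff, so the following is only a sketch of one plausible reading: a check passes when every required marker appears in the fetched text and no forbidden marker does. The `evaluate_marker_check` function and its result keys are hypothetical, and fetching the URL is deliberately left out so the logic can be exercised on a plain string.

```python
def evaluate_marker_check(check: dict, text: str) -> dict:
    """Evaluate one marker check against already-fetched page text.

    Assumption: markers are matched as plain, case-sensitive substrings.
    """
    missing = [m for m in check.get("required", []) if m not in text]
    found_forbidden = [m for m in check.get("forbidden", []) if m in text]
    return {
        "id": check["id"],
        "passed": not missing and not found_forbidden,
        "missing_required": missing,
        "found_forbidden": found_forbidden,
    }


# Entry copied from the diff above (title and url omitted for brevity).
hf_model_card_check = {
    "id": "hf_model_card_current",
    "required": [
        "QUALITY_GATES.md",
        "metrics/quality_gates.json",
        "xperience10m_dataset_card_alignment.json",
    ],
    "forbidden": ["xperience10m-" + "taskfirst-v10"],
}

# Made-up card text that predates this commit: it lacks the new alignment marker.
stale_card = "See QUALITY_GATES.md and metrics/quality_gates.json for release gates."
result = evaluate_marker_check(hf_model_card_check, stale_card)
print(result["passed"], result["missing_required"])
```

With the made-up `stale_card` text, the check fails and reports `xperience10m_dataset_card_alignment.json` as the missing marker, which is the situation the added `required` entries are meant to catch.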