ropedia-xperience-10m-task-baselines / notes /reproducibility_audit.md
cy0307's picture
Publish Ropedia Xperience-10M task baseline cards
c9dfd11 verified
|
Raw
History Blame
3.84 kB

Reproducibility Audit

Audit date: 2026-05-30 Asia/Singapore.

Purpose: verify that the committed Ropedia Xperience-10M Task Suite artifacts are real outputs from the scripts, not placeholder or fabricated metrics.

Raw Inputs Checked

The audit used the local public sample episode:

data/sample/xperience-10m-sample/
  annotation.hdf5
  fisheye_cam0.mp4
  fisheye_cam1.mp4
  fisheye_cam2.mp4
  fisheye_cam3.mp4
  stereo_left.mp4
  stereo_right.mp4

annotation.hdf5 contains 5,821 aligned frames with depth, hand mocap, body mocap, IMU, SLAM, calibration, and caption metadata. The video feature cache was rebuilt from all six video files during the audit.

Commands Re-run

All audit outputs were written outside the repo:

AUDIT=/private/tmp/ropedia-audit
WORKSPACE=/path/to/Ropedia
ANN=$WORKSPACE/data/sample/xperience-10m-sample/annotation.hdf5
PY=$WORKSPACE/.venv/bin/python

$PY -B scripts/train_min_action_model.py \
  --workspace $WORKSPACE \
  --annotation $ANN \
  --output-dir $AUDIT/min_action_model \
  --target action

$PY -B scripts/train_min_action_model.py \
  --workspace $WORKSPACE \
  --annotation $ANN \
  --output-dir $AUDIT/min_subtask_model \
  --target subtask

$PY -B scripts/train_all_modalities_model.py \
  --workspace $WORKSPACE \
  --annotation $ANN \
  --output-dir $AUDIT/min_all_modalities_action_model \
  --cache-dir $AUDIT/cache \
  --target action

$PY -B scripts/train_all_modalities_model.py \
  --workspace $WORKSPACE \
  --annotation $ANN \
  --output-dir $AUDIT/min_all_modalities_subtask_model \
  --cache-dir $AUDIT/cache \
  --target subtask

$PY -B scripts/episode_task_suite.py \
  --workspace $WORKSPACE \
  --annotation $ANN \
  --output-dir $AUDIT/episode_task_suite \
  --cache-dir $AUDIT/cache

Exact Match Checks

The regenerated files matched the committed files:

min_action_model/metrics.json: MATCH
min_subtask_model/metrics.json: MATCH
min_all_modalities_action_model/metrics.json: MATCH
min_all_modalities_subtask_model/metrics.json: MATCH
episode_task_suite/summary_report.json: MATCH
episode_task_suite/feature_manifest.json: MATCH
episode_task_suite/available_modalities.json: MATCH

Every per-task metrics.json also matched:

caption_grounding/metrics.json: MATCH
contact_prediction/metrics.json: MATCH
cross_modal_retrieval/metrics.json: MATCH
hand_trajectory_forecast/metrics.json: MATCH
misalignment_detection/metrics.json: MATCH
modality_reconstruction/metrics.json: MATCH
next_action/metrics.json: MATCH
object_relevance/metrics.json: MATCH
temporal_order/metrics.json: MATCH
timeline_action/metrics.json: MATCH
timeline_subtask/metrics.json: MATCH
transition_detection/metrics.json: MATCH

Fresh Cache Evidence

The all-modality audit rebuilt a fresh feature cache:

depth_n5821_grid8.npz: shape=(5821, 140), nonzero=809107
video_fisheye_cam0_n5821_img32_grid8_hist8.npz: shape=(5821, 98), nonzero=570458
video_fisheye_cam1_n5821_img32_grid8_hist8.npz: shape=(5821, 98), nonzero=570400
video_fisheye_cam2_n5821_img32_grid8_hist8.npz: shape=(5821, 98), nonzero=570458
video_fisheye_cam3_n5821_img32_grid8_hist8.npz: shape=(5821, 98), nonzero=568723
video_stereo_left_n5821_img32_grid8_hist8.npz: shape=(5821, 98), nonzero=570249
video_stereo_right_n5821_img32_grid8_hist8.npz: shape=(5821, 98), nonzero=570430

This confirms the committed metrics are reproducible from the raw sample and that the all-modality pipeline reads real depth/video files instead of using empty placeholder features.

Caveats

The scripts contain a zero-feature fallback if a video file is missing. That is not the path used in this audit: all six videos existed and produced nonzero features. The repo remains a single-episode learning and pipeline-validation project, not evidence of cross-episode generalization.