ropedia-xperience-10m-task-baselines / notes /reproducibility_audit.md
cy0307's picture
Publish Ropedia Xperience-10M task baseline cards
540e67a verified
|
Raw
History Blame
3.85 kB

Reproduction Record

Run date: 2026-05-30 Asia/Singapore.

Purpose: show that the committed Ropedia Xperience-10M Task Suite artifacts are real outputs from the scripts and can be reproduced from the public sample.

Raw Inputs Checked

The run used the local public sample episode:

data/sample/xperience-10m-sample/
  annotation.hdf5
  fisheye_cam0.mp4
  fisheye_cam1.mp4
  fisheye_cam2.mp4
  fisheye_cam3.mp4
  stereo_left.mp4
  stereo_right.mp4

annotation.hdf5 contains 5,821 aligned frames with depth, hand mocap, body mocap, IMU, SLAM, calibration, and caption metadata. The video feature cache was rebuilt from all six video files during the run.

Commands Re-run

All reproduction outputs were written outside the repo:

REPRO=/path/to/ignored-scratch-workspace
WORKSPACE=/path/to/Ropedia
ANN=$WORKSPACE/data/sample/xperience-10m-sample/annotation.hdf5
PY=$WORKSPACE/.venv/bin/python

$PY -B scripts/train_min_action_model.py \
  --workspace $WORKSPACE \
  --annotation $ANN \
  --output-dir $REPRO/min_action_model \
  --target action

$PY -B scripts/train_min_action_model.py \
  --workspace $WORKSPACE \
  --annotation $ANN \
  --output-dir $REPRO/min_subtask_model \
  --target subtask

$PY -B scripts/train_all_modalities_model.py \
  --workspace $WORKSPACE \
  --annotation $ANN \
  --output-dir $REPRO/min_all_modalities_action_model \
  --cache-dir $REPRO/cache \
  --target action

$PY -B scripts/train_all_modalities_model.py \
  --workspace $WORKSPACE \
  --annotation $ANN \
  --output-dir $REPRO/min_all_modalities_subtask_model \
  --cache-dir $REPRO/cache \
  --target subtask

$PY -B scripts/episode_task_suite.py \
  --workspace $WORKSPACE \
  --annotation $ANN \
  --output-dir $REPRO/episode_task_suite \
  --cache-dir $REPRO/cache

Exact Match Checks

The regenerated files matched the committed files:

min_action_model/metrics.json: MATCH
min_subtask_model/metrics.json: MATCH
min_all_modalities_action_model/metrics.json: MATCH
min_all_modalities_subtask_model/metrics.json: MATCH
episode_task_suite/summary_report.json: MATCH
episode_task_suite/feature_manifest.json: MATCH
episode_task_suite/available_modalities.json: MATCH

Every per-task metrics.json also matched:

caption_grounding/metrics.json: MATCH
contact_prediction/metrics.json: MATCH
cross_modal_retrieval/metrics.json: MATCH
hand_trajectory_forecast/metrics.json: MATCH
misalignment_detection/metrics.json: MATCH
modality_reconstruction/metrics.json: MATCH
next_action/metrics.json: MATCH
object_relevance/metrics.json: MATCH
temporal_order/metrics.json: MATCH
timeline_action/metrics.json: MATCH
timeline_subtask/metrics.json: MATCH
transition_detection/metrics.json: MATCH

Fresh Cache Evidence

The all-modality run rebuilt a fresh feature cache:

depth_n5821_grid8.npz: shape=(5821, 140), nonzero=809107
video_fisheye_cam0_n5821_img32_grid8_hist8.npz: shape=(5821, 98), nonzero=570458
video_fisheye_cam1_n5821_img32_grid8_hist8.npz: shape=(5821, 98), nonzero=570400
video_fisheye_cam2_n5821_img32_grid8_hist8.npz: shape=(5821, 98), nonzero=570458
video_fisheye_cam3_n5821_img32_grid8_hist8.npz: shape=(5821, 98), nonzero=568723
video_stereo_left_n5821_img32_grid8_hist8.npz: shape=(5821, 98), nonzero=570249
video_stereo_right_n5821_img32_grid8_hist8.npz: shape=(5821, 98), nonzero=570430

This confirms the committed metrics are reproducible from the raw sample and that the all-modality pipeline reads real depth/video files instead of using empty placeholder features.

Caveats

The scripts contain a zero-feature fallback if a video file is missing. That is not the path used in this run: all six videos existed and produced nonzero features. The repo remains a single-episode learning and pipeline-validation project, not evidence of cross-episode generalization.