Publish Ropedia Xperience-10M task baseline cards

540e67a verified 25 days ago

3.85 kB

	# Reproduction Record

	Run date: 2026-05-30 Asia/Singapore.

	Purpose: show that the committed Ropedia Xperience-10M Task Suite artifacts are
	real outputs from the scripts and can be reproduced from the public sample.

	## Raw Inputs Checked

	The run used the local public sample episode:

	```text
	data/sample/xperience-10m-sample/
	annotation.hdf5
	fisheye_cam0.mp4
	fisheye_cam1.mp4
	fisheye_cam2.mp4
	fisheye_cam3.mp4
	stereo_left.mp4
	stereo_right.mp4
	```

	`annotation.hdf5` contains 5,821 aligned frames with depth, hand mocap, body
	mocap, IMU, SLAM, calibration, and caption metadata. The video feature cache was
	rebuilt from all six video files during the run.

	## Commands Re-run

	All reproduction outputs were written outside the repo:

	```bash
	REPRO=/path/to/ignored-scratch-workspace
	WORKSPACE=/path/to/Ropedia
	ANN=$WORKSPACE/data/sample/xperience-10m-sample/annotation.hdf5
	PY=$WORKSPACE/.venv/bin/python

	$PY -B scripts/train_min_action_model.py \
	--workspace $WORKSPACE \
	--annotation $ANN \
	--output-dir $REPRO/min_action_model \
	--target action

	$PY -B scripts/train_min_action_model.py \
	--workspace $WORKSPACE \
	--annotation $ANN \
	--output-dir $REPRO/min_subtask_model \
	--target subtask

	$PY -B scripts/train_all_modalities_model.py \
	--workspace $WORKSPACE \
	--annotation $ANN \
	--output-dir $REPRO/min_all_modalities_action_model \
	--cache-dir $REPRO/cache \
	--target action

	$PY -B scripts/train_all_modalities_model.py \
	--workspace $WORKSPACE \
	--annotation $ANN \
	--output-dir $REPRO/min_all_modalities_subtask_model \
	--cache-dir $REPRO/cache \
	--target subtask

	$PY -B scripts/episode_task_suite.py \
	--workspace $WORKSPACE \
	--annotation $ANN \
	--output-dir $REPRO/episode_task_suite \
	--cache-dir $REPRO/cache
	```

	## Exact Match Checks

	The regenerated files matched the committed files:

	```text
	min_action_model/metrics.json: MATCH
	min_subtask_model/metrics.json: MATCH
	min_all_modalities_action_model/metrics.json: MATCH
	min_all_modalities_subtask_model/metrics.json: MATCH
	episode_task_suite/summary_report.json: MATCH
	episode_task_suite/feature_manifest.json: MATCH
	episode_task_suite/available_modalities.json: MATCH
	```

	Every per-task `metrics.json` also matched:

	```text
	caption_grounding/metrics.json: MATCH
	contact_prediction/metrics.json: MATCH
	cross_modal_retrieval/metrics.json: MATCH
	hand_trajectory_forecast/metrics.json: MATCH
	misalignment_detection/metrics.json: MATCH
	modality_reconstruction/metrics.json: MATCH
	next_action/metrics.json: MATCH
	object_relevance/metrics.json: MATCH
	temporal_order/metrics.json: MATCH
	timeline_action/metrics.json: MATCH
	timeline_subtask/metrics.json: MATCH
	transition_detection/metrics.json: MATCH
	```

	## Fresh Cache Evidence

	The all-modality run rebuilt a fresh feature cache:

	```text
	depth_n5821_grid8.npz: shape=(5821, 140), nonzero=809107
	video_fisheye_cam0_n5821_img32_grid8_hist8.npz: shape=(5821, 98), nonzero=570458
	video_fisheye_cam1_n5821_img32_grid8_hist8.npz: shape=(5821, 98), nonzero=570400
	video_fisheye_cam2_n5821_img32_grid8_hist8.npz: shape=(5821, 98), nonzero=570458
	video_fisheye_cam3_n5821_img32_grid8_hist8.npz: shape=(5821, 98), nonzero=568723
	video_stereo_left_n5821_img32_grid8_hist8.npz: shape=(5821, 98), nonzero=570249
	video_stereo_right_n5821_img32_grid8_hist8.npz: shape=(5821, 98), nonzero=570430
	```

	This confirms the committed metrics are reproducible from the raw sample and
	that the all-modality pipeline reads real depth/video files instead of using
	empty placeholder features.

	## Caveats

	The scripts contain a zero-feature fallback if a video file is missing. That is
	not the path used in this run: all six videos existed and produced nonzero
	features. The repo remains a single-episode learning and pipeline-validation
	project, not evidence of cross-episode generalization.