Reproducibility Contract

This file defines what can be reproduced from the public repo and the official Xperience-10M sample, what each command should produce, and which claims remain outside the current public data boundary.

Scope

Layer	Reproducible now	Boundary
Sample download	Yes, from `ropedia-ai/xperience-10m-sample` or the ModelScope sample mirror	Raw data is not redistributed in this repo.
Minimal baselines	Yes	One public sample episode, chronological split.
12-task suite	Yes	Uses the current 8,378-d feature contract; audio is documented but not featurized.
Neural MLP heads	Yes, when `torch` is installed	Compact task heads only, not a foundation model.
Website figures and charts	Yes	Generated from committed metrics and sample thumbnails.
Publication audit	Yes	Checks public repo and prepared HF bundles.
32-episode Qwen3-Omni LoRA pilot	Not yet	Gated by full Xperience-10M access and held-out-episode evaluation.
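A "chronological split" orders the frames by time before dividing them, so every test frame comes after every training frame. The sketch below shows that idea. It assumes per-frame timestamps and an 80/20 train/test ratio. Both the ratio and the helper name `chronological_split` are hypothetical and are not taken from the repo's scripts.

```python
from typing import Sequence


def chronological_split(
    timestamps: Sequence[float], train_fraction: float = 0.8
) -> tuple[list[int], list[int]]:
    """Return (train_indices, test_indices) split by time order.

    Hypothetical helper: the 0.8 default is an assumption, not the repo's setting.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be in (0, 1)")
    # Sort frame indices by timestamp so the split respects temporal order.
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    cut = int(len(order) * train_fraction)
    # Earliest frames train the model; the latest frames are held out.
    return order[:cut], order[cut:]


train_idx, test_idx = chronological_split([0.3, 0.1, 0.4, 0.2, 0.5])
print(train_idx, test_idx)  # [1, 3, 0, 2] [4]
```

A random split would let the model see frames from just before and just after a test frame. Splitting by time avoids that leakage, and it matters more here because the data is a single episode.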

Environment

Use Python 3.12 when possible. The current public scripts depend on the HOMIE toolkit environment plus lightweight plotting and Hub tooling.

git clone https://github.com/Ropedia/HOMIE-toolkit.git
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r HOMIE-toolkit/requirements.txt huggingface_hub hf_xet
pip install -r ropedia-xperience-10m-task-suite/requirements.txt
pip install torch  # optional; required only for the neural MLP heads (--include-neural)
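Before you run the scripts, check the interpreter version and see which optional packages are installed. This is a minimal sketch. `check_environment` is a hypothetical helper and is not shipped with the repo.

```python
import importlib.util
import sys


def check_environment(min_version: tuple[int, int] = (3, 12)) -> dict[str, bool]:
    """Report whether the interpreter and optional packages match the contract."""
    return {
        # The contract recommends Python 3.12; older versions may still work.
        "python_ok": sys.version_info[:2] >= min_version,
        # torch is optional: it is needed only for the neural MLP heads.
        "torch_available": importlib.util.find_spec("torch") is not None,
        # huggingface_hub provides the `hf` CLI used for the sample download.
        "huggingface_hub_available": importlib.util.find_spec("huggingface_hub") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```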

Data

Download the public sample from Hugging Face:

hf download ropedia-ai/xperience-10m-sample \
  --repo-type dataset \
  --local-dir data/sample/xperience-10m-sample
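You can also script the same download in Python with `huggingface_hub.snapshot_download`. That function takes the same repository ID, repository type, and local directory. This is a minimal sketch that assumes `huggingface_hub` is installed as in the Environment step. The wrapper names are hypothetical.

```python
from pathlib import Path

# Arguments mirroring the `hf download` command above.
SAMPLE_REPO_ID = "ropedia-ai/xperience-10m-sample"
SAMPLE_LOCAL_DIR = Path("data/sample/xperience-10m-sample")


def sample_download_kwargs(local_dir: Path = SAMPLE_LOCAL_DIR) -> dict:
    """Build keyword arguments for huggingface_hub.snapshot_download."""
    return {
        "repo_id": SAMPLE_REPO_ID,
        "repo_type": "dataset",  # the sample is a dataset repo, not a model repo
        "local_dir": str(local_dir),
    }


def download_sample(local_dir: Path = SAMPLE_LOCAL_DIR) -> str:
    """Download the public sample and return the local snapshot path."""
    # Imported lazily so the kwargs helper works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**sample_download_kwargs(local_dir))
```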

On mainland-China servers, use the included ModelScope helper:

python scripts/omni/download_sample_modelscope.py \
  --output-dir data/sample/xperience-10m-sample \
  --mode all-training

`--mode all-training` downloads `annotation.hdf5` and the six MP4 streams but skips `visualization.rrd`.
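The file selection that `--mode all-training` describes can be written as a path filter. This sketch assumes the helper picks files by name. It does not reproduce the script's actual logic, and the example file names below are hypothetical.

```python
from pathlib import PurePosixPath


def keep_for_all_training(repo_path: str) -> bool:
    """Return True if a sample file belongs to the all-training subset.

    Hypothetical rule based on the description of --mode all-training.
    """
    name = PurePosixPath(repo_path).name
    # annotation.hdf5 and the MP4 streams are kept; visualization.rrd and
    # every other file fall through to False.
    return name == "annotation.hdf5" or name.lower().endswith(".mp4")


# Hypothetical file list, for illustration only.
files = ["annotation.hdf5", "visualization.rrd", "ep0/cam_left.mp4", "README.md"]
print([f for f in files if keep_for_all_training(f)])
```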

Core Commands

Run these from the repo root after setting `WORKSPACE` to the folder that contains `data/sample/xperience-10m-sample`.

export WORKSPACE=/path/to/workspace
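After you export `WORKSPACE`, a quick check can catch a wrong path before any training starts. This is a minimal sketch. `resolve_workspace` is a hypothetical helper, and it only checks that the sample folder exists, not what the folder contains.

```python
import os
from pathlib import Path

SAMPLE_SUBDIR = Path("data/sample/xperience-10m-sample")


def resolve_workspace(env_var: str = "WORKSPACE") -> Path:
    """Return the workspace path, failing early if the sample folder is missing."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set")
    workspace = Path(value).expanduser().resolve()
    sample_dir = workspace / SAMPLE_SUBDIR
    if not sample_dir.is_dir():
        raise FileNotFoundError(f"expected sample at {sample_dir}")
    return workspace


if __name__ == "__main__":
    print(resolve_workspace())
```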

python scripts/train_min_action_model.py --workspace "$WORKSPACE"
python scripts/train_all_modalities_model.py --workspace "$WORKSPACE"

python scripts/episode_task_suite.py \
  --workspace "$WORKSPACE" \
  --include-neural

python scripts/research_direction_taxonomy.py
python scripts/research_direction_extension_tasks.py
python scripts/task_walkthroughs.py
python scripts/generate_visualizations.py
python scripts/render_overview_figures.py
python scripts/render_task_suite_infographic.py
python scripts/export_modality_atlas_assets.py
python scripts/validate_website_integrity.py
python scripts/validate_scope_claims.py
python scripts/build_artifact_index.py
python scripts/validate_mirror_parity.py
python scripts/validate_publication_package.py
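The commands are listed in order, and the validation scripts at the end presumably check outputs from earlier steps. A wrapper that runs each step in sequence and stops at the first failure is therefore a reasonable convenience. This is a minimal sketch. `build_commands` and `run_all` are hypothetical helpers and are not part of the repo.

```python
import os
import subprocess
import sys

# Order taken from the Core Commands list; only these three take --workspace.
# Drop "--include-neural" if torch is not installed.
WORKSPACE_SCRIPTS = [
    ["scripts/train_min_action_model.py"],
    ["scripts/train_all_modalities_model.py"],
    ["scripts/episode_task_suite.py", "--include-neural"],
]
PLAIN_SCRIPTS = [
    "scripts/research_direction_taxonomy.py",
    "scripts/research_direction_extension_tasks.py",
    "scripts/task_walkthroughs.py",
    "scripts/generate_visualizations.py",
    "scripts/render_overview_figures.py",
    "scripts/render_task_suite_infographic.py",
    "scripts/export_modality_atlas_assets.py",
    "scripts/validate_website_integrity.py",
    "scripts/validate_scope_claims.py",
    "scripts/build_artifact_index.py",
    "scripts/validate_mirror_parity.py",
    "scripts/validate_publication_package.py",
]


def build_commands(workspace: str) -> list[list[str]]:
    """Return argv lists in run order; sys.executable keeps the active venv."""
    commands = [
        [sys.executable, script, "--workspace", workspace, *extra]
        for script, *extra in WORKSPACE_SCRIPTS
    ]
    commands += [[sys.executable, script] for script in PLAIN_SCRIPTS]
    return commands


def run_all(workspace: str) -> None:
    for cmd in build_commands(workspace):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises on the first failing step


if __name__ == "__main__":
    run_all(os.environ["WORKSPACE"])
```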

Expected Public Outputs

Command group	Expected artifacts
Minimal baselines	`artifacts/min_action_model/`, `artifacts/min_all_modalities_action_model/`, metrics and model weights
12-task suite	`artifacts/episode_task_suite/summary_report.json`, per-task `metrics.json`, predictions, confusion matrices
Neural heads	`artifacts/episode_task_suite/neural_mlp/**/metrics.json`, histories, model checkpoints
Research directions	`artifacts/episode_task_suite/research_directions/`, `metrics/research_directions.json`
Direction probes	`artifacts/episode_task_suite/research_direction_extensions/`, `metrics/research_direction_extensions.json`
Walkthroughs	`artifacts/episode_task_suite/task_walkthroughs/`, `metrics/task_walkthroughs.json`
Figures	`assets/*.png`, `assets/charts/*.svg`
Modality atlas	`metrics/modality_atlas.json`, `assets/modalities/*`
Website integrity	`metrics/website_integrity.json`
Publication checks	`metrics/artifact_index.json`, `metrics/mirror_parity.json`, `metrics/publication_audit.json`, `metrics/scope_claims_audit.json`
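After a run, confirm that each command group produced at least one of its expected files. This sketch assumes the paths in the table are relative to the repo root, and it checks only one representative pattern per group. The helper name `missing_artifacts` is hypothetical.

```python
from pathlib import Path

# One representative pattern per row of the table above (a subset).
EXPECTED_PATTERNS = {
    "12-task suite": "artifacts/episode_task_suite/summary_report.json",
    "Neural heads": "artifacts/episode_task_suite/neural_mlp/**/metrics.json",
    "Research directions": "metrics/research_directions.json",
    "Direction probes": "metrics/research_direction_extensions.json",
    "Walkthroughs": "metrics/task_walkthroughs.json",
    "Modality atlas": "metrics/modality_atlas.json",
    "Website integrity": "metrics/website_integrity.json",
    "Publication checks": "metrics/publication_audit.json",
}


def missing_artifacts(repo_root: Path) -> list[str]:
    """Return the command groups whose expected pattern matches no file."""
    return [
        group
        for group, pattern in EXPECTED_PATTERNS.items()
        if not any(repo_root.glob(pattern))
    ]


if __name__ == "__main__":
    print(missing_artifacts(Path(".")) or "all expected artifacts present")
```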

Exact-Match Audit

The last full metric-reproducibility audit ran on 2026-05-30 (Asia/Singapore time) in a fresh output directory outside the repo. It rebuilt the minimal baselines, the all-modality baselines, and the 12-task suite from the local public sample. After float normalization, the regenerated metrics matched the committed artifacts.
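One way to run an exact-match check: load each committed and regenerated metrics file, round every float to a fixed precision, and compare the two structures. This is a minimal sketch. The 6-decimal precision, the NaN handling, and the helper names are assumptions, because the audit's exact normalization rule is not documented here.

```python
import json
import math
from typing import Any


def normalize_floats(value: Any, digits: int = 6) -> Any:
    """Recursively round floats so tiny numeric noise does not break equality.

    The 6-digit default is an assumption, not the audit's documented rule.
    """
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"  # NaN != NaN, so map it to a comparable sentinel
        return round(value, digits)
    if isinstance(value, dict):
        return {key: normalize_floats(item, digits) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize_floats(item, digits) for item in value]
    return value  # ints, strings, booleans, and None pass through unchanged


def metrics_match(committed_path: str, regenerated_path: str, digits: int = 6) -> bool:
    """Compare two metrics JSON files after float normalization."""
    with open(committed_path, encoding="utf-8") as f:
        committed = json.load(f)
    with open(regenerated_path, encoding="utf-8") as f:
        regenerated = json.load(f)
    return normalize_floats(committed, digits) == normalize_floats(regenerated, digits)
```

Rounding can still split two values that sit on either side of a rounding boundary. A relative-tolerance comparison such as `math.isclose` avoids that edge case, but it needs a custom structural comparison instead of plain `==`.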

Evidence:

Non-Reproducible From This Public Repo Alone

The following require gated data, large model weights, or private compute state, so this repo does not claim they are publicly reproducible yet:

a real 32-episode Qwen3-Omni LoRA run,
held-out episode metrics for Qwen3-Omni,
full Xperience-10M-scale pretraining,
raw Xperience-10M video or annotation redistribution,
full Qwen weights or large full checkpoints.

Before interpreting any Qwen3-Omni result, read metrics/scope_claims_audit.json, plus the companion GitHub repo's results/omni_finetune/DATA_BLOCKER_REPORT.md and results/omni_finetune/A100_HF_RELAY_STATUS.md.