
# Reproducibility Contract

This file defines what can be reproduced from the public repo and the official
Xperience-10M sample, what each command should produce, and which claims remain
outside the current public data boundary.

## Scope

| Layer | Reproducible now | Boundary |
| --- | --- | --- |
| Sample download | Yes, from `ropedia-ai/xperience-10m-sample` or the ModelScope sample mirror | Raw data is not redistributed in this repo. |
| Minimal baselines | Yes | One public sample episode, chronological split. |
| 12-task suite | Yes | Uses the current 8,378-d feature contract; audio is documented but not featurized. |
| Neural MLP heads | Yes, when `torch` is installed | Compact task heads only, not a foundation model. |
| Website figures and charts | Yes | Generated from committed metrics and sample thumbnails. |
| Publication audit | Yes | Checks public repo and prepared HF bundles. |
| 32-episode Qwen3-Omni LoRA pilot | Not yet | Gated by full Xperience-10M access and held-out-episode evaluation. |

## Environment

Use Python 3.12 when possible. The current public scripts depend on the HOMIE
toolkit environment plus lightweight plotting and Hub tooling.

```bash
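# Fetch the HOMIE toolkit and create an isolated Python 3.12 environment.
git clone https://github.com/Ropedia/HOMIE-toolkit.git
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r HOMIE-toolkit/requirements.txt huggingface_hub hf_xet
# Assumes this repo is checked out as ropedia-xperience-10m-task-suite/ in the current directory.
pip install -r ropedia-xperience-10m-task-suite/requirements.txt
# torch is needed for the neural MLP heads.
pip install torch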
```
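After installation, a quick sanity check can confirm the interpreter version and whether the optional packages are importable. The following is a minimal sketch, not part of the repo: the function name `environment_report` is hypothetical, and the package list only mirrors what the install commands above mention.

```python
import importlib.util
import sys


def environment_report():
    """Summarize the interpreter version and which key packages can be imported."""
    return {
        # The contract recommends Python 3.12 when possible.
        "python_is_3_12": sys.version_info[:2] == (3, 12),
        # torch is only required for the neural MLP heads.
        "torch_available": importlib.util.find_spec("torch") is not None,
        "huggingface_hub_available": importlib.util.find_spec("huggingface_hub") is not None,
    }


if __name__ == "__main__":
    for name, ok in environment_report().items():
        print(f"{name}: {ok}")
```

`importlib.util.find_spec` only looks the package up, so the check stays fast even when `torch` is installed.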

## Data

Download the public sample from Hugging Face:

```bash
hf download ropedia-ai/xperience-10m-sample \
  --repo-type dataset \
  --local-dir data/sample/xperience-10m-sample
```

On servers in mainland China, use the included ModelScope helper instead:

```bash
python scripts/omni/download_sample_modelscope.py \
  --output-dir data/sample/xperience-10m-sample \
  --mode all-training
```

`--mode all-training` downloads `annotation.hdf5` and the six MP4 streams while
skipping `visualization.rrd`.
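The expected contents described above can be checked before training. This is a minimal sketch under stated assumptions: `check_sample_dir` is a hypothetical helper, the directory is assumed to hold exactly one episode, and files are searched recursively because this document does not specify the folder layout inside the sample.

```python
from pathlib import Path


def check_sample_dir(sample_dir):
    """Return a list of problems; an empty list means the layout looks complete."""
    root = Path(sample_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    # Assumption: files may sit in a nested episode folder, so search recursively.
    annotations = list(root.rglob("annotation.hdf5"))
    videos = list(root.rglob("*.mp4"))
    if not annotations:
        problems.append("annotation.hdf5 not found")
    if len(videos) != 6:
        problems.append(f"expected 6 MP4 streams, found {len(videos)}")
    return problems
```

For example, `check_sample_dir("data/sample/xperience-10m-sample")` returns an empty list when one `annotation.hdf5` and six MP4 files are present.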

## Core Commands

Run these from the repo root after setting `WORKSPACE` to the directory that
contains `data/sample/xperience-10m-sample`.

```bash
export WORKSPACE=/path/to/workspace

python scripts/train_min_action_model.py --workspace "$WORKSPACE"
python scripts/train_all_modalities_model.py --workspace "$WORKSPACE"

python scripts/episode_task_suite.py \
  --workspace "$WORKSPACE" \
  --include-neural

python scripts/research_direction_taxonomy.py
python scripts/research_direction_extension_tasks.py
python scripts/task_walkthroughs.py
python scripts/generate_visualizations.py
python scripts/render_overview_figures.py
python scripts/render_task_suite_infographic.py
python scripts/export_modality_atlas_assets.py
python scripts/validate_website_integrity.py
python scripts/validate_scope_claims.py
python scripts/build_artifact_index.py
python scripts/validate_mirror_parity.py
python scripts/validate_publication_package.py
```

## Expected Public Outputs

| Command group | Expected artifacts |
| --- | --- |
| Minimal baselines | `artifacts/min_action_model/`, `artifacts/min_all_modalities_action_model/`, metrics and model weights |
| 12-task suite | `artifacts/episode_task_suite/summary_report.json`, per-task `metrics.json`, predictions, confusion matrices |
| Neural heads | `artifacts/episode_task_suite/neural_mlp/**/metrics.json`, histories, model checkpoints |
| Research directions | `artifacts/episode_task_suite/research_directions/`, `metrics/research_directions.json` |
| Direction probes | `artifacts/episode_task_suite/research_direction_extensions/`, `metrics/research_direction_extensions.json` |
| Walkthroughs | `artifacts/episode_task_suite/task_walkthroughs/`, `metrics/task_walkthroughs.json` |
| Figures | `assets/*.png`, `assets/charts/*.svg` |
| Modality atlas | `metrics/modality_atlas.json`, `assets/modalities/*` |
| Website integrity | `metrics/website_integrity.json` |
| Publication checks | `metrics/artifact_index.json`, `metrics/mirror_parity.json`, `metrics/publication_audit.json`, `metrics/scope_claims_audit.json` |
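
One way to confirm a run produced the artifacts in the table is to test for a handful of the listed paths. This is a minimal sketch: `missing_outputs` is a hypothetical helper, the list covers only the fixed (non-glob) paths from the table, and the root the paths are relative to (repo root or workspace) is an assumption to adjust locally.

```python
from pathlib import Path

# Fixed paths copied from the table above; glob-style entries are omitted.
EXPECTED_OUTPUTS = [
    "artifacts/min_action_model",
    "artifacts/min_all_modalities_action_model",
    "artifacts/episode_task_suite/summary_report.json",
    "metrics/research_directions.json",
    "metrics/research_direction_extensions.json",
    "metrics/task_walkthroughs.json",
    "metrics/modality_atlas.json",
    "metrics/website_integrity.json",
    "metrics/artifact_index.json",
    "metrics/mirror_parity.json",
    "metrics/publication_audit.json",
    "metrics/scope_claims_audit.json",
]


def missing_outputs(root, expected=EXPECTED_OUTPUTS):
    """Return the expected relative paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).exists()]
```

Calling `missing_outputs(".")` from the assumed root returns an empty list when every listed file or directory exists.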

## Exact-Match Audit

The most recent full metric-reproducibility audit was run on **2026-05-30
(Asia/Singapore time)** from a fresh output directory outside the repo. It
rebuilt the minimal baselines, all-modality baselines, and the 12-task suite
from the local public sample. The regenerated metrics matched the committed
artifacts after float normalization.
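
To illustrate what an exact match after float normalization can look like, here is a minimal sketch. It is not the audit's actual procedure, which this file does not specify: the helper names and the choice of rounding to 6 decimal places are assumptions.

```python
import json
from pathlib import Path


def normalize(value, ndigits=6):
    """Recursively round floats so tiny serialization differences are ignored."""
    if isinstance(value, float):
        return round(value, ndigits)
    if isinstance(value, dict):
        return {key: normalize(item, ndigits) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize(item, ndigits) for item in value]
    return value  # ints, strings, booleans, and None pass through unchanged


def metrics_match(committed_path, regenerated_path, ndigits=6):
    """Compare two metrics JSON files after rounding every float (assumed precision)."""
    committed = json.loads(Path(committed_path).read_text())
    regenerated = json.loads(Path(regenerated_path).read_text())
    return normalize(committed, ndigits) == normalize(regenerated, ndigits)
```

Rounding is a simple normalization, but two values that straddle a rounding boundary can still compare unequal; a tolerance-based comparison such as `math.isclose` avoids that edge case at the cost of a less strict match.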

Evidence:

- [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md)
- [`metrics/reproducibility_matrix.json`](metrics/reproducibility_matrix.json)

## Non-Reproducible From This Public Repo Alone

The following require gated data, large model weights, or private compute
state, so this repo does not claim they are publicly reproducible yet:

- a real 32-episode Qwen3-Omni LoRA run,
- held-out episode metrics for Qwen3-Omni,
- full Xperience-10M-scale pretraining,
- raw Xperience-10M video or annotation redistribution,
- full Qwen weights or large full checkpoints.

Before interpreting any Qwen3-Omni result, read
[`metrics/scope_claims_audit.json`](metrics/scope_claims_audit.json),
plus the companion GitHub repo's
`results/omni_finetune/DATA_BLOCKER_REPORT.md` and
`results/omni_finetune/A100_HF_RELAY_STATUS.md`.