# Reproducibility Contract

This file defines what can be reproduced from the public repo and the official
Xperience-10M sample, what each command should produce, and which results remain
outside the current public data scope.

## Scope

| Layer | Reproducible now | Current scope |
| --- | --- | --- |
| Sample download | Yes, from `ropedia-ai/xperience-10m-sample` or ModelScope sample mirror | Sample card lists `cc-by-nc-4.0`; raw data is not redistributed in this repo. |
| Minimal baselines | Yes | One public sample episode, chronological split. |
| 12-task suite | Yes | Uses the current 8,546-d synchronized multimodal feature contract. |
| Neural MLP heads | Yes, when `torch` is installed | Compact task heads only, not a foundation model. |
| Website figures and charts | Yes | Generated from committed metrics and sample thumbnails. |
| Public bundle contents | Yes | Covers public repo and prepared HF bundles. |
| Multi-episode Qwen3-Omni LoRA pilot | Yes, as a public-safe verified result package | The selected 96/16/16 episode split produced a validation-monitored, diagnostic held-out result package: 3,808 exported windows, 512 validation windows, and 448 test predictions. Model-quality metrics are weak and motivate the next structured-output improvement pass. |

## Environment

Use Python 3.12 when possible. The current public scripts depend on the HOMIE
toolkit environment plus lightweight plotting and Hub tooling.

```bash
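# Fetch the HOMIE toolkit.
git clone https://github.com/Ropedia/HOMIE-toolkit.git
# Create and activate an isolated Python 3.12 environment.
python3.12 -m venv .venv
source .venv/bin/activate
# Toolkit dependencies plus Hugging Face Hub tooling.
pip install -r HOMIE-toolkit/requirements.txt huggingface_hub hf_xet
# Task-suite dependencies (plotting and related tooling).
pip install -r ropedia-xperience-10m-task-suite/requirements.txt
# Needed only for the neural MLP heads.
pip install torch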
```
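
Before running the scripts, a quick sanity check of the environment can help. The following is a minimal sketch, not part of the repo: `check_environment` is a hypothetical helper that reports whether the interpreter is Python 3.12 and whether `torch` and `huggingface_hub` are importable.

```python
import importlib.util
import sys


def check_environment(version_info=sys.version_info,
                      find_spec=importlib.util.find_spec):
    """Report whether the interpreter and key packages match the setup above.

    Hypothetical helper for illustration; the arguments are injectable so the
    check can be exercised without changing the real environment.
    """
    return {
        # The contract recommends Python 3.12 when possible.
        "python_3_12": tuple(version_info[:2]) == (3, 12),
        # torch is only required for the neural MLP heads.
        "torch_available": find_spec("torch") is not None,
        # huggingface_hub provides the Hub download tooling.
        "huggingface_hub_available": find_spec("huggingface_hub") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A missing `torch` is not fatal for the non-neural baselines; per the scope table, only the neural MLP heads need it.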

## Data

Download the public sample from Hugging Face:

```bash
hf download ropedia-ai/xperience-10m-sample \
  --repo-type dataset \
  --local-dir data/sample/xperience-10m-sample
```

If Hugging Face access is unavailable in your environment, use the included
ModelScope helper:

```bash
python scripts/omni/download_sample_modelscope.py \
  --output-dir data/sample/xperience-10m-sample \
  --mode all-training
```

`--mode all-training` downloads `annotation.hdf5` and the six MP4 streams while
skipping `visualization.rrd`.
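
To confirm that a download produced the files described above, a small check along these lines may be useful. This is a sketch under assumptions: `check_sample_download` is a hypothetical helper, it searches recursively because the exact directory layout is not specified here, and it assumes the folder holds a single sample episode.

```python
from pathlib import Path


def check_sample_download(sample_dir, expected_mp4=6):
    """Summarize which expected sample files are present under sample_dir.

    Hypothetical helper: assumes one sample episode, so exactly
    `expected_mp4` MP4 streams and one annotation.hdf5 are expected.
    """
    root = Path(sample_dir)
    annotations = sorted(root.rglob("annotation.hdf5"))
    videos = sorted(root.rglob("*.mp4"))
    viewer = sorted(root.rglob("visualization.rrd"))
    return {
        "has_annotation": len(annotations) >= 1,
        "mp4_count": len(videos),
        "has_expected_mp4": len(videos) == expected_mp4,
        # The .rrd viewer file is optional and unused by training/evaluation.
        "has_rrd": bool(viewer),
    }


if __name__ == "__main__":
    print(check_sample_download("data/sample/xperience-10m-sample"))
```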

The sample card points to HOMIE Toolkit for inspecting videos and annotations.
When `visualization.rrd` is downloaded for human inspection, open it with Rerun
0.29.0. The `.rrd` viewer artifact is not used by the training/evaluation
scripts and is excluded from public publication bundles.

## Core Commands

Run these from the repo root after setting `WORKSPACE` to the folder that
contains `data/sample/xperience-10m-sample`.

```bash
export WORKSPACE=/path/to/workspace

python scripts/train_min_action_model.py --workspace "$WORKSPACE"
python scripts/train_all_modalities_model.py --workspace "$WORKSPACE"

python scripts/episode_task_suite.py \
  --workspace "$WORKSPACE" \
  --include-neural

python scripts/research_direction_taxonomy.py
python scripts/research_direction_extension_tasks.py
python scripts/task_walkthroughs.py
python scripts/validate_source_alignment.py
python scripts/build_evaluation_protocol.py
python scripts/generate_visualizations.py
python scripts/render_overview_figures.py
python scripts/render_task_suite_infographic.py
python scripts/export_modality_atlas_assets.py
python scripts/build_brand_assets.py
python scripts/build_figure_index.py
python scripts/validate_website_integrity.py
python scripts/validate_task_surface.py
python scripts/validate_scope_claims.py
python scripts/build_artifact_index.py
python scripts/validate_mirror_parity.py
python scripts/validate_publication_package.py
```

## Expected Public Outputs

| Command group | Expected artifacts |
| --- | --- |
| Minimal baselines | `results/min_action_model/`, `results/min_all_modalities_action_model/`, metrics and model weights |
| 12-task suite | `results/episode_task_suite/summary_report.json`, per-task `metrics.json`, predictions, confusion matrices |
| Neural heads | `results/episode_task_suite/neural_mlp/**/metrics.json`, histories, model checkpoints |
| Research directions | `results/episode_task_suite/research_directions/`, `docs/data/research_directions.json` |
| Direction probes | `results/episode_task_suite/research_direction_extensions/`, `docs/data/research_direction_extensions.json` |
| Walkthroughs | `results/episode_task_suite/task_walkthroughs/`, `docs/data/task_walkthroughs.json` |
| Task surface integrity | `docs/data/task_surface_integrity.json` |
| Source alignment | `SOURCE_ALIGNMENT_AUDIT.md`, `docs/data/source_alignment_audit.json` |
| Evaluation protocol | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json` |
| Figures | `docs/assets/*.png`, `docs/assets/charts/*.svg` |
| Brand assets | `docs/assets/brand/*.png`, `docs/favicon.png`, `docs/apple-touch-icon.png`, `docs/data/brand_assets.json` |
| Figure index | `FIGURE_INDEX.md`, `docs/data/figure_index.json` |
| Modality atlas | `docs/data/modality_atlas.json`, `docs/assets/modalities/*` |
| Website integrity | `docs/data/website_integrity.json` |
| Release reports | `docs/data/artifact_index.json`, `docs/data/mirror_parity.json`, `docs/data/publication_audit.json`, `docs/data/scope_claims_audit.json` |
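
After a full run, the single-file artifacts in the table can be checked for existence with a short script. The sketch below is illustrative only: `missing_artifacts` is a hypothetical helper, the list covers only the non-glob paths from the table, and it assumes outputs were written relative to the repo root.

```python
from pathlib import Path

# Non-glob artifact paths taken from the table above.
EXPECTED_FILES = [
    "results/episode_task_suite/summary_report.json",
    "docs/data/research_directions.json",
    "docs/data/research_direction_extensions.json",
    "docs/data/task_walkthroughs.json",
    "docs/data/task_surface_integrity.json",
    "SOURCE_ALIGNMENT_AUDIT.md",
    "docs/data/source_alignment_audit.json",
    "EVALUATION_PROTOCOL.md",
    "docs/data/evaluation_protocol.json",
    "FIGURE_INDEX.md",
    "docs/data/figure_index.json",
    "docs/data/modality_atlas.json",
    "docs/data/website_integrity.json",
    "docs/data/artifact_index.json",
    "docs/data/mirror_parity.json",
    "docs/data/publication_audit.json",
    "docs/data/scope_claims_audit.json",
]


def missing_artifacts(repo_root, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts(".")
    print("all expected files present" if not missing else "\n".join(missing))
```

An empty result only shows that the files exist; it says nothing about whether their contents match the committed metrics.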

## Exact-Match Reproduction Record

The last full metric reproduction run was completed on **2026-05-30
Asia/Singapore** from a fresh output directory outside the repo. It rebuilt the
minimal baselines, all-modality baselines, and the 12-task suite from the local
public sample. The regenerated metrics matched the committed artifacts after
float normalization.

Evidence:

- [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md)
- [`docs/data/reproducibility_matrix.json`](docs/data/reproducibility_matrix.json)
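
For readers who want to repeat the comparison, one way to check regenerated metrics against committed ones "after float normalization" is sketched below. The normalization used in the recorded run is not described in this file, so rounding floats to six decimals is an assumption, and `normalize` and `metrics_match` are hypothetical helpers.

```python
import json
import math
from pathlib import Path


def normalize(value, ndigits=6):
    """Recursively round floats so tiny numeric noise is not reported as a diff.

    Assumption: six decimals is an illustrative tolerance, not the repo's rule.
    """
    if isinstance(value, float):
        # NaN never equals itself, so compare non-finite values by name.
        if math.isnan(value) or math.isinf(value):
            return str(value)
        return round(value, ndigits)
    if isinstance(value, dict):
        return {key: normalize(item, ndigits) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize(item, ndigits) for item in value]
    return value


def metrics_match(committed_path, regenerated_path, ndigits=6):
    """Return True if two metrics JSON files agree after float normalization."""
    committed = json.loads(Path(committed_path).read_text(encoding="utf-8"))
    regenerated = json.loads(Path(regenerated_path).read_text(encoding="utf-8"))
    return normalize(committed, ndigits) == normalize(regenerated, ndigits)
```

Rounding can still disagree for values that sit right at a rounding boundary; a tolerance-based comparison such as `math.isclose` is an alternative if that matters.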

## Non-Reproducible From This Public Repo Alone

The following depend on gated data, large model weights, or private compute
state, so this repo neither reproduces nor redistributes them:

- rerunning the multi-episode Qwen3-Omni LoRA pilot from raw gated data,
- full Xperience-10M-scale pretraining,
- redistribution of raw Xperience-10M video or annotations,
- full Qwen weights or large full checkpoints.

Before interpreting any Qwen3-Omni result, read
[`docs/data/scope_claims_audit.json`](docs/data/scope_claims_audit.json),
[`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md)
and
[`results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md`](results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md).