# Reproducibility Contract
This file defines what can be reproduced from the public repo and the official
Xperience-10M sample, what each command should produce, and which claims remain
outside the current public data boundary.
## Scope
| Layer | Reproducible now | Boundary |
| --- | --- | --- |
| Sample download | Yes, from `ropedia-ai/xperience-10m-sample` or ModelScope sample mirror | Raw data is not redistributed in this repo. |
| Minimal baselines | Yes | One public sample episode, chronological split. |
| 12-task suite | Yes | Uses the current 8,378-dimensional feature contract; audio is documented but not featurized. |
| Neural MLP heads | Yes, when `torch` is installed | Compact task heads only, not a foundation model. |
| Website figures and charts | Yes | Generated from committed metrics and sample thumbnails. |
| Publication audit | Yes | Checks public repo and prepared HF bundles. |
| 32-episode Qwen3-Omni LoRA pilot | Not yet | Gated by full Xperience-10M access and held-out-episode evaluation. |
## Environment
Use Python 3.12 when possible. The current public scripts depend on the HOMIE
toolkit environment plus lightweight plotting and Hub tooling.
```bash
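# Clone the HOMIE toolkit and create an isolated Python 3.12 environment.
git clone https://github.com/Ropedia/HOMIE-toolkit.git
python3.12 -m venv .venv
source .venv/bin/activate
# Toolkit dependencies plus Hub tooling.
pip install -r HOMIE-toolkit/requirements.txt huggingface_hub hf_xet
# Task-suite dependencies; this path assumes the task-suite repo is checked out
# next to HOMIE-toolkit.
pip install -r ropedia-xperience-10m-task-suite/requirements.txt
# Optional: only needed for the neural MLP heads.
pip install torch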
```
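
Before running anything, it can help to confirm that the interpreter and optional packages match the setup above. The following is a minimal sketch, not part of the repo: the function name `check_environment` is hypothetical, and it only tests whether packages are importable, not whether their versions are compatible.

```python
import importlib.util
import sys


def check_environment(preferred=(3, 12)):
    """Report the Python version and whether optional packages are importable."""
    report = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "python_matches_preferred": sys.version_info[:2] == preferred,
    }
    # find_spec returns None when a top-level package is not installed.
    for name in ("torch", "huggingface_hub"):
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `torch` is reported as `False`, the neural MLP heads listed under Scope would not be reproducible in that environment, while the other layers should be unaffected.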
## Data
Download the public sample from Hugging Face:
```bash
hf download ropedia-ai/xperience-10m-sample \
  --repo-type dataset \
  --local-dir data/sample/xperience-10m-sample
```
On servers in mainland China, use the included ModelScope helper:
```bash
python scripts/omni/download_sample_modelscope.py \
  --output-dir data/sample/xperience-10m-sample \
  --mode all-training
```
`--mode all-training` downloads `annotation.hdf5` and the six MP4 streams while
skipping `visualization.rrd`.
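
A quick way to sanity-check a download is to count the files described above. This is a minimal sketch under stated assumptions: the helper name `check_sample_layout` is hypothetical, the search is recursive because the exact folder nesting is not specified here, and it assumes the six streams use a lowercase `.mp4` extension.

```python
from pathlib import Path


def check_sample_layout(sample_dir, expected_mp4=6):
    """Summarize whether a folder looks like an `all-training` download."""
    root = Path(sample_dir)
    # Search recursively because the episode may sit in a nested subfolder.
    annotations = sorted(root.rglob("annotation.hdf5"))
    videos = sorted(root.rglob("*.mp4"))
    rrd = sorted(root.rglob("visualization.rrd"))
    return {
        "has_annotation": len(annotations) >= 1,
        "mp4_count": len(videos),
        "has_expected_mp4_count": len(videos) == expected_mp4,
        "has_visualization_rrd": len(rrd) >= 1,
    }
```

For example, `check_sample_layout("data/sample/xperience-10m-sample")` after an `all-training` download would be expected to report one annotation file, six MP4 files, and no `visualization.rrd`.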
## Core Commands
Run these from the repo root after setting `WORKSPACE` to the folder that
contains `data/sample/xperience-10m-sample`.
```bash
export WORKSPACE=/path/to/workspace
python scripts/train_min_action_model.py --workspace "$WORKSPACE"
python scripts/train_all_modalities_model.py --workspace "$WORKSPACE"
python scripts/episode_task_suite.py \
  --workspace "$WORKSPACE" \
  --include-neural
python scripts/research_direction_taxonomy.py
python scripts/research_direction_extension_tasks.py
python scripts/task_walkthroughs.py
python scripts/generate_visualizations.py
python scripts/render_overview_figures.py
python scripts/render_task_suite_infographic.py
python scripts/export_modality_atlas_assets.py
python scripts/validate_website_integrity.py
python scripts/validate_scope_claims.py
python scripts/build_artifact_index.py
python scripts/validate_mirror_parity.py
python scripts/validate_publication_package.py
```
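
If you prefer to drive the commands from Python and stop at the first failure, a wrapper along these lines may be useful. It is a sketch only: `build_commands` and `run_all` are hypothetical helpers, only a subset of the commands above is listed, and the fail-fast behavior is a choice made here rather than a requirement stated by the repo.

```python
import subprocess
import sys


def build_commands(workspace):
    """Return a subset of the core commands as argument lists, in order."""
    py = sys.executable
    return [
        [py, "scripts/train_min_action_model.py", "--workspace", workspace],
        [py, "scripts/train_all_modalities_model.py", "--workspace", workspace],
        [py, "scripts/episode_task_suite.py", "--workspace", workspace,
         "--include-neural"],
        [py, "scripts/validate_publication_package.py"],
    ]


def run_all(commands, runner=subprocess.run):
    """Run commands in order; check=True raises on the first non-zero exit."""
    for cmd in commands:
        print("running:", " ".join(cmd))
        runner(cmd, check=True)
```

From the repo root, `run_all(build_commands("/path/to/workspace"))` would run the four listed scripts in sequence. The `runner` parameter exists so the wrapper can be exercised without launching any process.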
## Expected Public Outputs
| Command group | Expected artifacts |
| --- | --- |
| Minimal baselines | `artifacts/min_action_model/`, `artifacts/min_all_modalities_action_model/`, metrics and model weights |
| 12-task suite | `artifacts/episode_task_suite/summary_report.json`, per-task `metrics.json`, predictions, confusion matrices |
| Neural heads | `artifacts/episode_task_suite/neural_mlp/**/metrics.json`, histories, model checkpoints |
| Research directions | `artifacts/episode_task_suite/research_directions/`, `metrics/research_directions.json` |
| Direction probes | `artifacts/episode_task_suite/research_direction_extensions/`, `metrics/research_direction_extensions.json` |
| Walkthroughs | `artifacts/episode_task_suite/task_walkthroughs/`, `metrics/task_walkthroughs.json` |
| Figures | `assets/*.png`, `assets/charts/*.svg` |
| Modality atlas | `metrics/modality_atlas.json`, `assets/modalities/*` |
| Website integrity | `metrics/website_integrity.json` |
| Publication checks | `metrics/artifact_index.json`, `metrics/mirror_parity.json`, `metrics/publication_audit.json`, `metrics/scope_claims_audit.json` |
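
The table can be turned into a simple existence check. The sketch below lists only the concrete paths named in the table and leaves out the glob patterns; `missing_artifacts` is a hypothetical helper, and whether these paths resolve against the repo root or the workspace is an assumption you should adjust for your setup.

```python
from pathlib import Path

# Paths copied from the table above; the base directory they resolve against
# is an assumption of this sketch.
EXPECTED = [
    "artifacts/min_action_model",
    "artifacts/min_all_modalities_action_model",
    "artifacts/episode_task_suite/summary_report.json",
    "metrics/research_directions.json",
    "metrics/research_direction_extensions.json",
    "metrics/task_walkthroughs.json",
    "metrics/modality_atlas.json",
    "metrics/website_integrity.json",
    "metrics/artifact_index.json",
    "metrics/mirror_parity.json",
    "metrics/publication_audit.json",
    "metrics/scope_claims_audit.json",
]


def missing_artifacts(root, expected=EXPECTED):
    """Return the expected paths that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).exists()]
```

An empty list from `missing_artifacts(".")` would indicate that every listed file or folder is present; it says nothing about whether the contents are correct.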
## Exact-Match Audit
The last full metric reproducibility audit was run on **2026-05-30
Asia/Singapore** from a fresh output directory outside the repo. It rebuilt the
minimal baselines, all-modality baselines, and the 12-task suite from the local
public sample. The regenerated metrics matched the committed artifacts after
float normalization.

Evidence:

- [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md)
- [`metrics/reproducibility_matrix.json`](metrics/reproducibility_matrix.json)
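
To illustrate what "matched after float normalization" can mean in practice, the sketch below compares two metrics files after rounding every float. The exact normalization rule used in the audit is not stated here, so the rounding to six decimal places and the helper names `normalize` and `metrics_match` are assumptions for illustration only.

```python
import json
import math
from pathlib import Path


def normalize(value, ndigits=6):
    """Recursively round floats so tiny numeric noise does not cause mismatches."""
    if isinstance(value, float):
        # NaN never equals itself, so map non-finite values to strings.
        if math.isnan(value) or math.isinf(value):
            return str(value)
        return round(value, ndigits)
    if isinstance(value, dict):
        return {key: normalize(item, ndigits) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize(item, ndigits) for item in value]
    return value


def metrics_match(path_a, path_b, ndigits=6):
    """Return True when two JSON files are equal after float normalization."""
    a = json.loads(Path(path_a).read_text(encoding="utf-8"))
    b = json.loads(Path(path_b).read_text(encoding="utf-8"))
    return normalize(a, ndigits) == normalize(b, ndigits)
```

Rounding is a simple choice but can still disagree for values that sit right at a rounding boundary; a tolerance-based comparison such as `math.isclose` is an alternative if that matters.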
## Non-Reproducible From This Public Repo Alone
The following require gated data, large model weights, or private compute
state, so this repo does not claim they are publicly reproducible yet:
- a real 32-episode Qwen3-Omni LoRA run,
- held-out episode metrics for Qwen3-Omni,
- full Xperience-10M-scale pretraining,
- raw Xperience-10M video or annotation redistribution,
- full Qwen weights or large full checkpoints.

Before interpreting any Qwen3-Omni result, read
[`metrics/scope_claims_audit.json`](metrics/scope_claims_audit.json),
plus the companion GitHub repo's
`results/omni_finetune/DATA_BLOCKER_REPORT.md` and
`results/omni_finetune/A100_HF_RELAY_STATUS.md`.