cy0307's picture
Publish Ropedia Xperience-10M task baseline cards
eeac43c verified
|
Raw
History Blame
6.52 kB
# Reproducibility Contract
This file defines what can be reproduced from the public repo and the official
Xperience-10M sample, what each command should produce, and which results remain
outside the current public data scope.
## Scope
| Layer | Reproducible now | Current scope |
| --- | --- | --- |
| Sample download | Yes, from `ropedia-ai/xperience-10m-sample` or ModelScope sample mirror | Sample card lists `cc-by-nc-4.0`; raw data is not redistributed in this repo. |
| Minimal baselines | Yes | One public sample episode, chronological split. |
| 12-task suite | Yes | Uses the current 8,546-d synchronized multimodal feature contract. |
| Neural MLP heads | Yes, when `torch` is installed | Compact task heads only, not a foundation model. |
| Website figures and charts | Yes | Generated from committed metrics and sample thumbnails. |
| Public bundle contents | Yes | Covers public repo and prepared HF bundles. |
| Multi-episode Qwen3-Omni LoRA pilot | Yes, as a public-safe verified result package | The selected 96/16/16 episode split produced a validation-monitored diagnostic held-out result package with 3,808 exported windows, 512 validation windows, 448 test predictions, and weak model-quality metrics that motivate the next structured-output improvement pass. |
## Environment
Use Python 3.12 when possible. The current public scripts depend on the HOMIE
toolkit environment plus lightweight plotting and Hub tooling.
```bash
git clone https://github.com/Ropedia/HOMIE-toolkit.git
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r HOMIE-toolkit/requirements.txt huggingface_hub hf_xet
pip install -r ropedia-xperience-10m-task-suite/requirements.txt
pip install torch
```
## Data
Download the public sample from Hugging Face:
```bash
hf download ropedia-ai/xperience-10m-sample \
--repo-type dataset \
--local-dir data/sample/xperience-10m-sample
```
If Hugging Face access is unavailable in your environment, use the included
ModelScope helper:
```bash
python scripts/omni/download_sample_modelscope.py \
--output-dir data/sample/xperience-10m-sample \
--mode all-training
```
`--mode all-training` downloads `annotation.hdf5` and the six MP4 streams while
skipping `visualization.rrd`.
The sample card points to HOMIE Toolkit for inspecting videos and annotations.
When `visualization.rrd` is downloaded for human inspection, open it with Rerun
0.29.0. The `.rrd` viewer artifact is not used by the training/evaluation
scripts and is excluded from public publication bundles.
## Core Commands
Run these from the repo root after setting `WORKSPACE` to the folder that owns
`data/sample/xperience-10m-sample`.
```bash
export WORKSPACE=/path/to/workspace
python scripts/train_min_action_model.py --workspace "$WORKSPACE"
python scripts/train_all_modalities_model.py --workspace "$WORKSPACE"
python scripts/episode_task_suite.py \
--workspace "$WORKSPACE" \
--include-neural
python scripts/research_direction_taxonomy.py
python scripts/research_direction_extension_tasks.py
python scripts/task_walkthroughs.py
python scripts/validate_source_alignment.py
python scripts/build_evaluation_protocol.py
python scripts/generate_visualizations.py
python scripts/render_overview_figures.py
python scripts/render_task_suite_infographic.py
python scripts/export_modality_atlas_assets.py
python scripts/build_brand_assets.py
python scripts/build_figure_index.py
python scripts/validate_website_integrity.py
python scripts/validate_task_surface.py
python scripts/validate_scope_claims.py
python scripts/build_artifact_index.py
python scripts/validate_mirror_parity.py
python scripts/validate_publication_package.py
```
## Expected Public Outputs
| Command group | Expected artifacts |
| --- | --- |
| Minimal baselines | `results/min_action_model/`, `results/min_all_modalities_action_model/`, metrics and model weights |
| 12-task suite | `results/episode_task_suite/summary_report.json`, per-task `metrics.json`, predictions, confusion matrices |
| Neural heads | `results/episode_task_suite/neural_mlp/**/metrics.json`, histories, model checkpoints |
| Research directions | `results/episode_task_suite/research_directions/`, `docs/data/research_directions.json` |
| Direction probes | `results/episode_task_suite/research_direction_extensions/`, `docs/data/research_direction_extensions.json` |
| Walkthroughs | `results/episode_task_suite/task_walkthroughs/`, `docs/data/task_walkthroughs.json` |
| Task surface integrity | `docs/data/task_surface_integrity.json` |
| Source alignment | `SOURCE_ALIGNMENT_AUDIT.md`, `docs/data/source_alignment_audit.json` |
| Evaluation protocol | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json` |
| Figures | `docs/assets/*.png`, `docs/assets/charts/*.svg` |
| Brand assets | `docs/assets/brand/*.png`, `docs/favicon.png`, `docs/apple-touch-icon.png`, `docs/data/brand_assets.json` |
| Figure index | `FIGURE_INDEX.md`, `docs/data/figure_index.json` |
| Modality atlas | `docs/data/modality_atlas.json`, `docs/assets/modalities/*` |
| Website integrity | `docs/data/website_integrity.json` |
| Release reports | `docs/data/artifact_index.json`, `docs/data/mirror_parity.json`, `docs/data/publication_audit.json`, `docs/data/scope_claims_audit.json` |
## Exact-Match Reproduction Record
The last full metric reproduction run was completed on **2026-05-30
Asia/Singapore** from a fresh output directory outside the repo. It rebuilt the
minimal baselines, all-modality baselines, and the 12-task suite from the local
public sample. The regenerated metrics matched the committed artifacts after
float normalization.
Evidence:
- [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md)
- [`docs/data/reproducibility_matrix.json`](docs/data/reproducibility_matrix.json)
## Non-Reproducible From This Public Repo Alone
The following require gated data, large model weights, or private compute
state, so this repo does not provide public reproduction for:
- rerunning the multi-episode Qwen3-Omni LoRA pilot from raw gated data,
- full Xperience-10M-scale pretraining,
- raw Xperience-10M video or annotation redistribution,
- full Qwen weights or large full checkpoints.
Before interpreting any Qwen3-Omni result, read
[`docs/data/scope_claims_audit.json`](docs/data/scope_claims_audit.json),
[`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md)
and
[`results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md`](results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md).