cy0307's picture
Publish Ropedia Xperience-10M task baseline cards
6a1869c verified
|
Raw
History Blame
6.31 kB

Reproducibility Contract

This file defines what can be reproduced from the public repo and the official Xperience-10M sample, what each command should produce, and which results remain outside the current public data scope.

Scope

Layer Reproducible now Current scope
Sample download Yes, from ropedia-ai/xperience-10m-sample or ModelScope sample mirror Sample card lists cc-by-nc-4.0; raw data is not redistributed in this repo.
Minimal baselines Yes One public sample episode, chronological split.
12-task suite Yes Uses the current 8,378-d feature contract; audio is documented but not featurized.
Neural MLP heads Yes, when torch is installed Compact task heads only, not a foundation model.
Website figures and charts Yes Generated from committed metrics and sample thumbnails.
Public bundle contents Yes Covers public repo and prepared HF bundles.
32-episode Qwen3-Omni LoRA pilot Not yet Gated by full Xperience-10M access and held-out-episode evaluation.

Environment

Use Python 3.12 when possible. The current public scripts depend on the HOMIE toolkit environment plus lightweight plotting and Hub tooling.

git clone https://github.com/Ropedia/HOMIE-toolkit.git
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r HOMIE-toolkit/requirements.txt huggingface_hub hf_xet
pip install -r ropedia-xperience-10m-task-suite/requirements.txt
pip install torch

Data

Download the public sample from Hugging Face:

hf download ropedia-ai/xperience-10m-sample \
  --repo-type dataset \
  --local-dir data/sample/xperience-10m-sample

If Hugging Face access is unavailable in your environment, use the included ModelScope helper:

python scripts/omni/download_sample_modelscope.py \
  --output-dir data/sample/xperience-10m-sample \
  --mode all-training

--mode all-training downloads annotation.hdf5 and the six MP4 streams while skipping visualization.rrd.

The sample card points to HOMIE Toolkit for inspecting videos and annotations. When visualization.rrd is downloaded for human inspection, open it with Rerun 0.29.0. The .rrd viewer artifact is not used by the training/evaluation scripts and is excluded from public publication bundles.

Core Commands

Run these from the repo root after setting WORKSPACE to the folder that owns data/sample/xperience-10m-sample.

export WORKSPACE=/path/to/workspace

python scripts/train_min_action_model.py --workspace "$WORKSPACE"
python scripts/train_all_modalities_model.py --workspace "$WORKSPACE"

python scripts/episode_task_suite.py \
  --workspace "$WORKSPACE" \
  --include-neural

python scripts/research_direction_taxonomy.py
python scripts/research_direction_extension_tasks.py
python scripts/task_walkthroughs.py
python scripts/validate_source_alignment.py
python scripts/build_evaluation_protocol.py
python scripts/generate_visualizations.py
python scripts/render_overview_figures.py
python scripts/render_task_suite_infographic.py
python scripts/export_modality_atlas_assets.py
python scripts/build_brand_assets.py
python scripts/build_figure_index.py
python scripts/validate_website_integrity.py
python scripts/validate_task_surface.py
python scripts/validate_scope_claims.py
python scripts/build_artifact_index.py
python scripts/validate_mirror_parity.py
python scripts/validate_publication_package.py

Expected Public Outputs

Command group Expected artifacts
Minimal baselines results/min_action_model/, results/min_all_modalities_action_model/, metrics and model weights
12-task suite results/episode_task_suite/summary_report.json, per-task metrics.json, predictions, confusion matrices
Neural heads results/episode_task_suite/neural_mlp/**/metrics.json, histories, model checkpoints
Research directions results/episode_task_suite/research_directions/, docs/data/research_directions.json
Direction probes results/episode_task_suite/research_direction_extensions/, docs/data/research_direction_extensions.json
Walkthroughs results/episode_task_suite/task_walkthroughs/, docs/data/task_walkthroughs.json
Task surface integrity docs/data/task_surface_integrity.json
Source alignment SOURCE_ALIGNMENT_AUDIT.md, docs/data/source_alignment_audit.json
Evaluation protocol EVALUATION_PROTOCOL.md, docs/data/evaluation_protocol.json
Figures docs/assets/*.png, docs/assets/charts/*.svg
Brand assets docs/assets/brand/*.png, docs/favicon.png, docs/apple-touch-icon.png, docs/data/brand_assets.json
Figure index FIGURE_INDEX.md, docs/data/figure_index.json
Modality atlas docs/data/modality_atlas.json, docs/assets/modalities/*
Website integrity docs/data/website_integrity.json
Release reports docs/data/artifact_index.json, docs/data/mirror_parity.json, docs/data/publication_audit.json, docs/data/scope_claims_audit.json

Exact-Match Reproduction Record

The last full metric reproduction run was completed on 2026-05-30 Asia/Singapore from a fresh output directory outside the repo. It rebuilt the minimal baselines, all-modality baselines, and the 12-task suite from the local public sample. The regenerated metrics matched the committed artifacts after float normalization.

Evidence:

Non-Reproducible From This Public Repo Alone

The following require gated data, large model weights, or private compute state, so this repo does not yet provide public reproduction for:

  • a real 32-episode Qwen3-Omni LoRA run,
  • held-out episode metrics for Qwen3-Omni,
  • full Xperience-10M-scale pretraining,
  • raw Xperience-10M video or annotation redistribution,
  • full Qwen weights or large full checkpoints.

Before interpreting any Qwen3-Omni result, read docs/data/scope_claims_audit.json, results/omni_finetune/DATA_ACCESS_STATUS.md and results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md.