
Reproducibility Contract

This file defines what can be reproduced from the public repo and the official Xperience-10M sample, what each command should produce, and which claims remain outside the current public data boundary.

Scope

| Layer | Reproducible now | Boundary |
| --- | --- | --- |
| Sample download | Yes, from ropedia-ai/xperience-10m-sample or ModelScope sample mirror | Raw data is not redistributed in this repo. |
| Minimal baselines | Yes | One public sample episode, chronological split. |
| 12-task suite | Yes | Uses the current 8,378-d feature contract; audio is documented but not featurized. |
| Neural MLP heads | Yes, when torch is installed | Compact task heads only, not a foundation model. |
| Website figures and charts | Yes | Generated from committed metrics and sample thumbnails. |
| Publication audit | Yes | Checks public repo and prepared HF bundles. |
| 32-episode Qwen3-Omni LoRA pilot | Not yet | Gated by full Xperience-10M access and held-out-episode evaluation. |
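
The "chronological split" listed for the minimal baselines means the single sample episode is cut along its timeline instead of being shuffled, so the evaluation frames always come after the training frames. The sketch below illustrates the idea only; the function name and the 80/20 ratio are assumptions, not values taken from the repo's scripts.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    frames: Sequence[T], train_fraction: float = 0.8
) -> Tuple[Sequence[T], Sequence[T]]:
    """Split a time-ordered sequence into an earlier and a later part.

    `train_fraction` is a hypothetical default; the real ratio used by
    the baseline scripts is not stated in this document.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(frames) * train_fraction)
    # No shuffling: everything before `cut` trains, everything after evaluates.
    return frames[:cut], frames[cut:]
```

Because both parts come from the same episode, a split like this measures within-episode generalization only, which is why held-out-episode evaluation is listed separately as not yet reproducible.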

Environment

Use Python 3.12 when possible. The current public scripts depend on the HOMIE toolkit environment plus lightweight plotting and Hub tooling.

git clone https://github.com/Ropedia/HOMIE-toolkit.git
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r HOMIE-toolkit/requirements.txt huggingface_hub hf_xet
pip install -r ropedia-xperience-10m-task-suite/requirements.txt
pip install torch  # optional: needed only for the neural MLP heads
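
After installing, a quick sanity check can confirm which interpreter is active and whether the optional torch dependency is importable. This is a minimal sketch; the function name and the default module lists are assumptions chosen for illustration, and the check only tests whether a module can be located, not whether its version is compatible.

```python
import importlib.util
import sys


def check_environment(required=("huggingface_hub",), optional=("torch",)):
    """Report the Python version and any modules that cannot be located."""

    def missing(names):
        # find_spec returns None when a top-level module is not installed.
        return [n for n in names if importlib.util.find_spec(n) is None]

    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "missing_required": missing(required),
        "missing_optional": missing(optional),
    }
```

Calling `check_environment()` inside the activated `.venv` returns a small dictionary; a non-empty `missing_optional` list containing `torch` would suggest that the neural MLP heads cannot be run yet.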

Data

Download the public sample from Hugging Face:

hf download ropedia-ai/xperience-10m-sample \
  --repo-type dataset \
  --local-dir data/sample/xperience-10m-sample

On servers in mainland China, use the included ModelScope helper:

python scripts/omni/download_sample_modelscope.py \
  --output-dir data/sample/xperience-10m-sample \
  --mode all-training

--mode all-training downloads annotation.hdf5 and the six MP4 streams while skipping visualization.rrd.
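
Before running the core commands, it can help to confirm that the download is complete. The sketch below assumes only what is stated above: one annotation.hdf5 and six MP4 streams. The function name is hypothetical, and because the directory layout inside the sample is not described here, the search is recursive.

```python
from pathlib import Path


def check_sample_dir(sample_dir, expected_mp4=6):
    """Count the annotation file and MP4 streams found under sample_dir."""
    root = Path(sample_dir)
    annotations = sorted(root.rglob("annotation.hdf5"))
    videos = sorted(root.rglob("*.mp4"))
    return {
        "has_annotation": len(annotations) >= 1,
        "mp4_count": len(videos),
        # "complete" means the annotation exists and at least the expected
        # number of video streams is present.
        "complete": len(annotations) >= 1 and len(videos) >= expected_mp4,
    }
```

For example, `check_sample_dir("data/sample/xperience-10m-sample")` should report `complete` as true after either download route finishes.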

Core Commands

Run these from the repo root after setting WORKSPACE to the directory that contains data/sample/xperience-10m-sample.

export WORKSPACE=/path/to/workspace

python scripts/train_min_action_model.py --workspace "$WORKSPACE"
python scripts/train_all_modalities_model.py --workspace "$WORKSPACE"

python scripts/episode_task_suite.py \
  --workspace "$WORKSPACE" \
  --include-neural

python scripts/research_direction_taxonomy.py
python scripts/research_direction_extension_tasks.py
python scripts/task_walkthroughs.py
python scripts/generate_visualizations.py
python scripts/render_overview_figures.py
python scripts/render_task_suite_infographic.py
python scripts/export_modality_atlas_assets.py
python scripts/validate_website_integrity.py
python scripts/validate_scope_claims.py
python scripts/build_artifact_index.py
python scripts/validate_mirror_parity.py
python scripts/validate_publication_package.py
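
The commands above are meant to run in order, and later validation steps presumably depend on artifacts written by earlier ones. One way to stop at the first failure is a small driver like the following sketch. The helper names `build_steps` and `run_steps` are hypothetical and not part of the repo; the script paths and flags are copied from the list above.

```python
import subprocess
import sys

POST_SUITE_SCRIPTS = (
    "research_direction_taxonomy",
    "research_direction_extension_tasks",
    "task_walkthroughs",
    "generate_visualizations",
    "render_overview_figures",
    "render_task_suite_infographic",
    "export_modality_atlas_assets",
    "validate_website_integrity",
    "validate_scope_claims",
    "build_artifact_index",
    "validate_mirror_parity",
    "validate_publication_package",
)


def build_steps(workspace, python=sys.executable):
    """Return the command list in the same order as the section above."""
    ws = ["--workspace", workspace]
    steps = [
        [python, "scripts/train_min_action_model.py", *ws],
        [python, "scripts/train_all_modalities_model.py", *ws],
        [python, "scripts/episode_task_suite.py", *ws, "--include-neural"],
    ]
    steps += [[python, f"scripts/{name}.py"] for name in POST_SUITE_SCRIPTS]
    return steps


def run_steps(steps, cwd=None):
    """Run each command in order; check=True raises on the first failure."""
    completed = 0
    for argv in steps:
        print("running:", " ".join(argv))
        subprocess.run(argv, cwd=cwd, check=True)
        completed += 1
    return completed
```

From the repo root, `run_steps(build_steps("/path/to/workspace"))` would execute the full sequence and raise `subprocess.CalledProcessError` as soon as any script exits with a non-zero status.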

Expected Public Outputs

| Command group | Expected artifacts |
| --- | --- |
| Minimal baselines | `artifacts/min_action_model/`, `artifacts/min_all_modalities_action_model/`, metrics and model weights |
| 12-task suite | `artifacts/episode_task_suite/summary_report.json`, per-task `metrics.json`, predictions, confusion matrices |
| Neural heads | `artifacts/episode_task_suite/neural_mlp/**/metrics.json`, histories, model checkpoints |
| Research directions | `artifacts/episode_task_suite/research_directions/`, `metrics/research_directions.json` |
| Direction probes | `artifacts/episode_task_suite/research_direction_extensions/`, `metrics/research_direction_extensions.json` |
| Walkthroughs | `artifacts/episode_task_suite/task_walkthroughs/`, `metrics/task_walkthroughs.json` |
| Figures | `assets/*.png`, `assets/charts/*.svg` |
| Modality atlas | `metrics/modality_atlas.json`, `assets/modalities/*` |
| Website integrity | `metrics/website_integrity.json` |
| Publication checks | `metrics/artifact_index.json`, `metrics/mirror_parity.json`, `metrics/publication_audit.json`, `metrics/scope_claims_audit.json` |
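
The table above can be turned into a quick existence check. The sketch below lists the paths and glob patterns from the table and reports any that match nothing; the function name `missing_artifacts` is hypothetical, and the check says nothing about whether file contents are correct.

```python
from pathlib import Path

EXPECTED_PATTERNS = [
    "artifacts/min_action_model",
    "artifacts/min_all_modalities_action_model",
    "artifacts/episode_task_suite/summary_report.json",
    "artifacts/episode_task_suite/neural_mlp/**/metrics.json",
    "artifacts/episode_task_suite/research_directions",
    "metrics/research_directions.json",
    "artifacts/episode_task_suite/research_direction_extensions",
    "metrics/research_direction_extensions.json",
    "artifacts/episode_task_suite/task_walkthroughs",
    "metrics/task_walkthroughs.json",
    "assets/*.png",
    "assets/charts/*.svg",
    "metrics/modality_atlas.json",
    "assets/modalities/*",
    "metrics/website_integrity.json",
    "metrics/artifact_index.json",
    "metrics/mirror_parity.json",
    "metrics/publication_audit.json",
    "metrics/scope_claims_audit.json",
]


def missing_artifacts(repo_root, patterns=EXPECTED_PATTERNS):
    """Return every pattern that matches no file or directory under repo_root."""
    root = Path(repo_root)
    return [p for p in patterns if not any(root.glob(p))]
```

An empty list from `missing_artifacts(".")` at the repo root indicates that every listed output group produced at least one file or directory.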

Exact-Match Audit

The last full metric reproducibility audit was run on 2026-05-30 (Asia/Singapore time) from a fresh output directory outside the repo. It rebuilt the minimal baselines, the all-modality baselines, and the 12-task suite from the local public sample. The regenerated metrics matched the committed artifacts after float normalization.
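
"Float normalization" is not defined further in this file. One plausible reading is that floating-point values are rounded to a fixed number of decimals before two metrics files are compared, so that platform-level rounding noise does not count as a mismatch. The sketch below implements that reading; the 6-decimal default and the function names are assumptions, not the audit's actual procedure.

```python
import json
import math


def normalize(value, digits=6):
    """Recursively round floats so tiny numeric noise compares equal."""
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"  # NaN never equals itself, so compare it as a label
        return round(value, digits)
    if isinstance(value, dict):
        return {key: normalize(item, digits) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize(item, digits) for item in value]
    return value


def metrics_match(regenerated, committed, digits=6):
    """True when two parsed JSON documents agree after normalization."""
    return normalize(regenerated, digits) == normalize(committed, digits)


def load_json(path):
    """Read one metrics file from disk."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)
```

A comparison would then look like `metrics_match(load_json(new_path), load_json(committed_path))`, where both paths point at corresponding metrics.json files from the fresh run and the repo.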

Evidence:

Non-Reproducible From This Public Repo Alone

The following require gated data, large model weights, or private compute state, so this repo does not claim they are publicly reproducible yet:

  • a real 32-episode Qwen3-Omni LoRA run,
  • held-out episode metrics for Qwen3-Omni,
  • full Xperience-10M-scale pretraining,
  • raw Xperience-10M video or annotation redistribution,
  • full Qwen weights or large full checkpoints.

Before interpreting any Qwen3-Omni result, read metrics/scope_claims_audit.json, plus the companion GitHub repo's results/omni_finetune/DATA_BLOCKER_REPORT.md and results/omni_finetune/A100_HF_RELAY_STATUS.md.