
Reproducibility Contract

This file defines what can be reproduced from the public repo and the official Xperience-10M sample, what each command should produce, and which results remain outside the current public data scope.

Scope

| Layer | Reproducible now | Current scope |
| --- | --- | --- |
| Sample download | Yes, from ropedia-ai/xperience-10m-sample or ModelScope sample mirror | Sample card lists cc-by-nc-4.0; raw data is not redistributed in this repo. |
| Minimal baselines | Yes | One public sample episode, chronological split. |
| Unified 20-task suite | Yes; tasks 13-20 require annotation.hdf5 plus h5py or HOMIE Toolkit for regeneration | Uses the current 8,546-d synchronized multimodal feature contract, the same 20-frame windows, and the same chronological split. |
| Neural MLP heads | Yes, when torch is installed | Compact task heads only, not a foundation model. |
| Website figures and charts | Yes | Generated from committed metrics and sample thumbnails. |
| Public bundle contents | Yes | Covers public repo and prepared HF bundles. |
| Multi-episode Qwen3-Omni LoRA pilot | Yes, as a public-safe verified result package | The selected 96/16/16 episode split produced verified held-out packages; the latest v6 package records 34,269 exported multiscale windows and 4,032 held-out predictions. Public readers can inspect the package, but rerunning requires gated Xperience data and base-model weights. |
| Owner-side staged Qwen3-Omni v6 reproduction | Yes, on the private staged GPU host only | The staged host has the exported media cache, path-rewritten JSONL, Qwen3-Omni base-model cache, v6 adapter, HF mirrors, and a one-sample smoke with exit_code=0 on 2026-06-14. |
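
The scope table refers to 20-frame windows and a chronological split. The sketch below shows one way such a split can be built. The non-overlapping stride and the 70/15/15 percentages are illustrative assumptions, not values taken from this repo; only the 20-frame window length comes from the table.

```python
from typing import List, Tuple

WINDOW = 20  # frames per window, as stated in the scope table


def make_windows(n_frames: int, window: int = WINDOW, stride: int = WINDOW) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges in time order (stride is an assumption)."""
    return [(s, s + window) for s in range(0, n_frames - window + 1, stride)]


def chronological_split(windows, train_pct: int = 70, val_pct: int = 15):
    """Split without shuffling, so held-out windows always come after training windows."""
    n = len(windows)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return (
        windows[:n_train],
        windows[n_train:n_train + n_val],
        windows[n_train + n_val:],
    )


# Hypothetical episode length, for illustration only.
windows = make_windows(n_frames=400)
train, val, test = chronological_split(windows)
```

Because nothing is shuffled, every validation and test window starts at or after the end of the last training window, which is the property a chronological split is meant to guarantee.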

Environment

Use Python 3.12 when possible. The current public scripts depend on the HOMIE Toolkit environment plus lightweight plotting and Hub tooling.

git clone https://github.com/Ropedia/HOMIE-toolkit.git
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r HOMIE-toolkit/requirements.txt huggingface_hub hf_xet
pip install -r ropedia-xperience-10m-task-suite/requirements.txt
pip install torch

Data

Download the public sample from Hugging Face:

hf download ropedia-ai/xperience-10m-sample \
  --repo-type dataset \
  --local-dir data/sample/xperience-10m-sample
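
If you prefer to script the download, the same request can probably be made from Python through `huggingface_hub.snapshot_download`. This is a minimal sketch, not part of the repo: the `download_sample` helper and its `downloader` parameter are hypothetical names introduced here, and whether the dataset needs a logged-in token is not stated in this file.

```python
REPO_ID = "ropedia-ai/xperience-10m-sample"
DEFAULT_DIR = "data/sample/xperience-10m-sample"


def download_sample(local_dir: str = DEFAULT_DIR, downloader=None) -> str:
    """Fetch the public sample into local_dir and return the local path.

    `downloader` is a hypothetical hook for dry runs; by default the real
    huggingface_hub.snapshot_download is used.
    """
    if downloader is None:
        # Imported lazily so the module can be loaded without the Hub client.
        from huggingface_hub import snapshot_download
        downloader = snapshot_download
    return downloader(repo_id=REPO_ID, repo_type="dataset", local_dir=str(local_dir))


# Usage (performs a network download):
#   path = download_sample()
```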

If Hugging Face access is unavailable in your environment, use the included ModelScope helper:

python scripts/omni/download_sample_modelscope.py \
  --output-dir data/sample/xperience-10m-sample \
  --mode all-training

--mode all-training downloads annotation.hdf5 and the six MP4 streams while skipping visualization.rrd.
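
After either download path, a quick layout check can catch an incomplete sample before training starts. The sketch below assumes only what is stated above: one annotation.hdf5, six MP4 streams, and an optional visualization.rrd. It searches recursively because the exact directory nesting is not described here; `check_sample_dir` is a hypothetical helper, not a repo script.

```python
from pathlib import Path

EXPECTED_MP4_COUNT = 6  # "the six MP4 streams" in the text above


def check_sample_dir(sample_dir) -> dict:
    """Summarize which expected sample files are present under sample_dir."""
    root = Path(sample_dir)
    if not root.is_dir():
        return {"annotation_present": False, "mp4_count": 0,
                "rrd_present": False, "ready": False}
    mp4_count = sum(1 for _ in root.rglob("*.mp4"))
    annotation_present = any(root.rglob("annotation.hdf5"))
    return {
        "annotation_present": annotation_present,
        "mp4_count": mp4_count,
        # The .rrd file is only for human inspection, so it never blocks readiness.
        "rrd_present": any(root.rglob("visualization.rrd")),
        "ready": annotation_present and mp4_count == EXPECTED_MP4_COUNT,
    }
```

Call it as `check_sample_dir("data/sample/xperience-10m-sample")` and inspect the `ready` flag.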

The sample card points to HOMIE Toolkit for inspecting videos and annotations. When visualization.rrd is downloaded for human inspection, open it with Rerun 0.29.0. The .rrd viewer artifact is not used by the training/evaluation scripts and is excluded from public publication bundles.

Core Commands

Run these from the repo root after setting WORKSPACE to the folder that owns data/sample/xperience-10m-sample.

export WORKSPACE=/path/to/workspace

python scripts/train_min_action_model.py --workspace "$WORKSPACE"
python scripts/train_all_modalities_model.py --workspace "$WORKSPACE"

python scripts/episode_task_suite.py \
  --workspace "$WORKSPACE" \
  --include-neural

python scripts/research_direction_taxonomy.py
python scripts/research_direction_extension_tasks.py
python scripts/tier2_task_suite.py
python scripts/build_unified_task_suite.py
python scripts/build_unified_task_model_radar.py
python scripts/task_walkthroughs.py
python scripts/validate_source_alignment.py
python scripts/build_evaluation_protocol.py
python scripts/generate_visualizations.py
python scripts/render_overview_figures.py
python scripts/render_task_suite_infographic.py
python scripts/export_modality_atlas_assets.py
python scripts/build_brand_assets.py
python scripts/build_figure_index.py
python scripts/validate_website_integrity.py
python scripts/validate_task_surface.py
python scripts/validate_scope_claims.py
python scripts/build_artifact_index.py
python scripts/validate_mirror_parity.py
python scripts/validate_publication_package.py
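
The commands above are meant to run in order, and a failure early in the chain usually makes later outputs meaningless. A small driver like the following stops at the first non-zero exit. It is a sketch rather than a repo script: `run_in_order` is a hypothetical helper, and the `VALIDATORS` list only repeats some of the script names already listed above.

```python
import subprocess
import sys

# A subset of the scripts listed above, in the same order.
VALIDATORS = [
    [sys.executable, "scripts/validate_website_integrity.py"],
    [sys.executable, "scripts/validate_task_surface.py"],
    [sys.executable, "scripts/validate_scope_claims.py"],
]


def run_in_order(commands, cwd=None):
    """Run commands sequentially; return (failed_command, exit_code) or (None, 0)."""
    for cmd in commands:
        print("running:", " ".join(cmd))
        result = subprocess.run(cmd, cwd=cwd)
        if result.returncode != 0:
            # Stop immediately so later steps never consume stale artifacts.
            return cmd, result.returncode
    return None, 0


# Usage from the repo root:
#   failed, code = run_in_order(VALIDATORS)
```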

scripts/tier2_task_suite.py has a historical file name, but it now regenerates tasks 13-20 for the unified 20-task suite. It can use HOMIE Toolkit when present, or a direct h5py fallback for the public sample's caption JSON. It reads the local raw annotation.hdf5 only to regenerate interaction/object targets; the raw HDF5 is still ignored by git and excluded from public bundles.
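
The "toolkit when present, otherwise h5py" behavior described above can be sketched as a simple availability probe. This illustrates the pattern only and is not the script's actual logic. The import name `homie_toolkit` is a guess made for this example; check the HOMIE Toolkit repo for the real module name.

```python
import importlib.util


def pick_annotation_backend(preferred: str = "homie_toolkit", fallback: str = "h5py"):
    """Return the first importable module name, or None if neither is installed.

    `preferred` defaults to a hypothetical import name for HOMIE Toolkit.
    """
    for name in (preferred, fallback):
        # find_spec returns None for a missing top-level module instead of raising.
        if importlib.util.find_spec(name) is not None:
            return name
    return None
```

A `None` result means tasks 13-20 cannot be regenerated in the current environment until one of the two packages is installed.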

Owner-Side Staged Qwen3-Omni v6 Reproduction

This section is for the private staged GPU host, not for public reruns from the GitHub repo alone. It preserves the verified result path after the original training host is released.

Expected private staging layout:

| Item | Staged path |
| --- | --- |
| Staging root | /mnt/kgc/chaoyue/ropedia-h20-side |
| Repo | <staged-repo-root> |
| Qwen3-Omni base model | /mnt/kgc/chaoyue/ropedia-h20-side/modelscope_models/Qwen__Qwen3-Omni-30B-A3B-Instruct |
| v6 adapter | checkpoints/xperience10m_qwen3_omni_128ep_multiscale_cap96_v6_rank64_lr5e5_full8gpu_lora/adapter_lora |
| Staged eval JSONL | results/omni_finetune/xperience10m_qwen3_omni_128ep_multiscale_cap96_v5_full8gpu_lora_dataset/dataset_a100_eval.jsonl |
| Private handoff manifest | /mnt/kgc/chaoyue/ropedia-h20-side/STAGING_MANIFEST_20260614.md |

The staged JSONL has the same 34,269 rows as the original export JSONL, with exported media paths rewritten from the training-host repo root to the private staging root. Raw upstream Xperience-10M source files are not required for this train/eval cache reproduction and were not copied because the selected raw source tree is about 278 GB.
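
A path rewrite of this kind can be sketched as follows. The JSONL field names are not documented here, so the example rewrites any string value that starts with the old root, wherever it appears in a row. `rewrite_jsonl` and its arguments are hypothetical names; the row count it returns can be compared against the expected 34,269.

```python
import json


def rewrite_paths(value, old_root: str, new_root: str):
    """Recursively replace a leading old_root with new_root in string values."""
    if isinstance(value, str) and value.startswith(old_root):
        return new_root + value[len(old_root):]
    if isinstance(value, list):
        return [rewrite_paths(v, old_root, new_root) for v in value]
    if isinstance(value, dict):
        return {k: rewrite_paths(v, old_root, new_root) for k, v in value.items()}
    return value


def rewrite_jsonl(src: str, dst: str, old_root: str, new_root: str) -> int:
    """Copy src to dst with rewritten paths; return the number of rows written."""
    rows = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if not line.strip():
                continue  # skip blank lines rather than failing on them
            row = rewrite_paths(json.loads(line), old_root, new_root)
            fout.write(json.dumps(row, ensure_ascii=False) + "\n")
            rows += 1
    return rows
```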

Run this from the staged repo:

cd <staged-repo-root>
CUDA_VISIBLE_DEVICES=0,1,2,3 \
RUN_ID=a100_repro_qwen_v6_eval_smoke1_manual \
SAMPLE_LIMIT=1 \
MAX_NEW_TOKENS=1 \
scripts/omni/run_private_gpu_qwen3_v6_repro_smoke.sh

The launcher first applies/checks the narrow Transformers Qwen3-Omni video-feature compatibility patch. The expected compatible installed source hash is da5feea4afc11767db3ca7eedb85ac129c66605643dadc6272c4288b03be7d25; the known incompatible pre-patch hash is 2aa5752c32965dbaeee230a016afbbbb30d459a46a12c88c1d6f712e12ba95ad.
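
To check by hand which state an installed source file is in, you can hash it and compare against the two values above. The sketch assumes the hashes are SHA-256 digests (they are 64 hex characters, but the algorithm is not named here) and leaves the target file path to the caller, since this file does not say which Transformers source file is patched.

```python
import hashlib

COMPATIBLE = "da5feea4afc11767db3ca7eedb85ac129c66605643dadc6272c4288b03be7d25"
INCOMPATIBLE = "2aa5752c32965dbaeee230a016afbbbb30d459a46a12c88c1d6f712e12ba95ad"


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large sources do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_digest(digest: str) -> str:
    """Map a hex digest to 'compatible', 'incompatible', or 'unknown'."""
    digest = digest.lower()
    if digest == COMPATIBLE:
        return "compatible"
    if digest == INCOMPATIBLE:
        return "incompatible"
    return "unknown"
```

An `unknown` result most likely means a different Transformers version is installed, in which case neither recorded hash applies.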

Verified staged-GPU smoke evidence from 2026-06-14:

| Field | Value |
| --- | --- |
| Run id | a100_repro_qwen_v6_eval_smoke1_preflight_busy_20260614 |
| Exit code | 0 |
| Samples | 1 |
| JSON validity | 1.0 |
| Transition accuracy | 1.0 |
| Contact accuracy | 1.0 |
| Object micro-F1 | 0.28571428571428575 |
| Metrics path | results/omni_finetune/a100_repro_qwen_v6_eval_smoke1_preflight_busy_20260614/metrics.json |
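
For readers unfamiliar with the object metric, micro-F1 pools true positives, false positives, and false negatives across all samples before computing F1. The sketch below uses the standard definition, which may differ in detail from the repo's evaluator. The object labels are made up for illustration; they merely show that a score near 0.2857 (2/7) is consistent with, for example, one correct object against five combined misses and false alarms.

```python
def micro_f1(pred_sets, gold_sets) -> float:
    """Micro-averaged F1 over per-sample label sets: 2*TP / (2*TP + FP + FN)."""
    tp = fp = fn = 0
    for pred, gold in zip(pred_sets, gold_sets):
        pred, gold = set(pred), set(gold)
        tp += len(pred & gold)   # predicted and present
        fp += len(pred - gold)   # predicted but absent
        fn += len(gold - pred)   # present but not predicted
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical single-sample example: 1 TP, 2 FP, 3 FN.
score = micro_f1(
    [{"cup", "table", "knife"}],
    [{"cup", "bowl", "spoon", "plate"}],
)
```

With a single evaluated sample, this number says little about model quality; the smoke run is evidence that the pipeline executes, not a benchmark.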

Expected Public Outputs

| Command group | Expected artifacts |
| --- | --- |
| Minimal baselines | results/min_action_model/, results/min_all_modalities_action_model/, metrics and model weights |
| Unified 20-task suite | TASK_SUITE_20.md, docs/data/task_suite_20.json, results/episode_task_suite/summary_report.json, per-task metrics.json, predictions, confusion matrices, and the tasks 13-20 historical tier2_task_suite result bundle |
| Unified 20-task model radar | docs/data/unified_task_model_radar.json, docs/assets/charts/unified_task_model_radar.svg |
| Neural heads | results/episode_task_suite/neural_mlp/**/metrics.json, histories, model checkpoints |
| Research directions | results/episode_task_suite/research_directions/, docs/data/research_directions.json |
| Direction probes | results/episode_task_suite/research_direction_extensions/, docs/data/research_direction_extensions.json |
| Walkthroughs | results/episode_task_suite/task_walkthroughs/, docs/data/task_walkthroughs.json |
| Task surface integrity | docs/data/task_surface_integrity.json |
| Source alignment | SOURCE_ALIGNMENT_AUDIT.md, docs/data/source_alignment_audit.json |
| Evaluation protocol | EVALUATION_PROTOCOL.md, docs/data/evaluation_protocol.json |
| Figures | docs/assets/*.png, docs/assets/charts/*.svg |
| Brand assets | docs/assets/brand/*.png, docs/favicon.png, docs/apple-touch-icon.png, docs/data/brand_assets.json |
| Figure index | FIGURE_INDEX.md, docs/data/figure_index.json |
| Modality atlas | docs/data/modality_atlas.json, docs/assets/modalities/* |
| Website integrity | docs/data/website_integrity.json |
| Release reports | docs/data/artifact_index.json, docs/data/mirror_parity.json, docs/data/publication_audit.json, docs/data/scope_claims_audit.json |

Exact-Match Reproduction Record

The last full metric reproduction run was completed on 2026-05-30 Asia/Singapore from a fresh output directory outside the repo. It rebuilt the minimal baselines, all-modality baselines, and the original 12 task artifacts from the local public sample. The regenerated metrics matched the committed artifacts after float normalization; the current public framing now indexes those artifacts together with tasks 13-20 as one 20-task suite.
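
"Matched after float normalization" can be read as: round every float to a fixed precision, then compare the structures exactly. The sketch below illustrates that idea. The 6-digit precision and the helper names are assumptions, not the procedure actually used for the recorded run.

```python
def normalize(value, digits: int = 6):
    """Round floats recursively so tiny platform-level differences compare equal."""
    if isinstance(value, bool):
        return value  # bool is a subclass of int; leave it untouched
    if isinstance(value, float):
        return round(value, digits)
    if isinstance(value, dict):
        return {k: normalize(v, digits) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize(v, digits) for v in value]
    return value


def metrics_match(regenerated, committed, digits: int = 6) -> bool:
    """True when two parsed metrics.json payloads agree after rounding."""
    return normalize(regenerated, digits) == normalize(committed, digits)
```

Rounding can still split two nearly identical values that straddle a rounding boundary, so a tolerance-based comparison such as `math.isclose` is a reasonable alternative when that matters.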

Evidence:

Non-Reproducible From This Public Repo Alone

This repo does not provide public reproduction for the following, because each requires gated data, large model weights, or private compute state:

  • rerunning the multi-episode Qwen3-Omni LoRA pilot from raw gated data,
  • full Xperience-10M-scale pretraining,
  • raw Xperience-10M video or annotation redistribution,
  • full Qwen weights or large full checkpoints.

Before interpreting any Qwen3-Omni result, read docs/data/scope_claims_audit.json, results/omni_finetune/DATA_ACCESS_STATUS.md, and results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md.