# Reproducibility Contract
This file defines what can be reproduced from the public repo and the official
Xperience-10M sample, what each command should produce, and which results remain
outside the current public data scope.
## Scope
| Layer | Reproducible now | Current scope |
| --- | --- | --- |
| Sample download | Yes, from `ropedia-ai/xperience-10m-sample` or ModelScope sample mirror | Sample card lists `cc-by-nc-4.0`; raw data is not redistributed in this repo. |
| Minimal baselines | Yes | One public sample episode, chronological split. |
| Unified 20-task suite | Yes; the historical provenance bundle requires `annotation.hdf5` plus `h5py` or HOMIE Toolkit for regeneration | Uses the current 8,546-d synchronized multimodal feature contract, the same 20-frame windows, and the same chronological split. |
| Neural MLP heads | Yes, when `torch` is installed | Compact task heads only, not a foundation model. |
| Website figures and charts | Yes | Generated from committed metrics and sample thumbnails. |
| Public bundle contents | Yes | Covers public repo and prepared HF bundles. |
| Multi-episode Qwen3-Omni LoRA pilot | Yes, as a public-safe verified result package | The selected 96/16/16 episode split produced verified held-out packages; the latest v6 package records 34,269 exported multiscale windows and 4,032 held-out predictions. Public readers can inspect the package, but rerunning requires gated Xperience data and base-model weights. |
| Owner-side staged Qwen3-Omni v6 reproduction | Yes, on the private staged GPU host only | The staged host has the exported media cache, path-rewritten JSONL, Qwen3-Omni base-model cache, v6 adapter, HF mirrors, and a one-sample smoke with `exit_code=0` on 2026-06-14. |
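The windowing and chronological split named in the table can be pictured with a minimal sketch. This is not the repo's implementation. The 20-frame window length comes from the table above, while the stride, the split percentages, and the function names `make_windows` and `chronological_split` are assumptions for illustration.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # [start, end) frame indices


def make_windows(num_frames: int, window: int = 20, stride: int = 20) -> List[Window]:
    """Cut one episode into fixed-length windows (stride is an assumed value)."""
    return [(s, s + window) for s in range(0, num_frames - window + 1, stride)]


def chronological_split(windows: List[Window], train_pct: int = 70, val_pct: int = 15):
    """Split windows in time order, never shuffling, so evaluation data is later in time.

    The 70/15/15 percentages are assumptions, not values taken from the repo.
    """
    n = len(windows)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    train = windows[:n_train]
    val = windows[n_train:n_train + n_val]
    test = windows[n_train + n_val:]
    return train, val, test


# Hypothetical 400-frame episode.
windows = make_windows(num_frames=400)
train, val, test = chronological_split(windows)
print(len(train), len(val), len(test))
```

Keeping the windows in time order means every validation and test window starts after the last training window ends, which is the property a chronological split is meant to guarantee.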
## Environment
Use Python 3.12 when possible. The current public scripts depend on the HOMIE
toolkit environment plus lightweight plotting and Hub tooling.
```bash
git clone https://github.com/Ropedia/HOMIE-toolkit.git
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r HOMIE-toolkit/requirements.txt huggingface_hub hf_xet
pip install -r ropedia-xperience-10m-task-suite/requirements.txt
pip install torch
```
## Data
Download the public sample from Hugging Face:
```bash
hf download ropedia-ai/xperience-10m-sample \
  --repo-type dataset \
  --local-dir data/sample/xperience-10m-sample
```
If Hugging Face access is unavailable in your environment, use the included
ModelScope helper:
```bash
python scripts/omni/download_sample_modelscope.py \
  --output-dir data/sample/xperience-10m-sample \
  --mode all-training
```
`--mode all-training` downloads `annotation.hdf5` and the six MP4 streams while
skipping `visualization.rrd`.
The sample card points to HOMIE Toolkit for inspecting videos and annotations.
When `visualization.rrd` is downloaded for human inspection, open it with Rerun
0.29.0. The `.rrd` viewer artifact is not used by the training/evaluation
scripts and is excluded from public publication bundles.
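After either download path, a quick layout check can confirm that the files described above are present. The following is a minimal sketch, not a script from the repo. It assumes the sample directory holds a single episode and searches recursively because the exact nesting is not stated here. The helper names `summarize_sample` and `check_sample` are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def summarize_sample(sample_dir: str) -> Dict[str, List[str]]:
    """List annotation, video, and viewer files found anywhere under sample_dir."""
    root = Path(sample_dir)
    return {
        "annotation": sorted(str(p.relative_to(root)) for p in root.rglob("annotation.hdf5")),
        "mp4": sorted(str(p.relative_to(root)) for p in root.rglob("*.mp4")),
        "rrd": sorted(str(p.relative_to(root)) for p in root.rglob("*.rrd")),
    }


def check_sample(sample_dir: str, expected_mp4: int = 6) -> List[str]:
    """Return a list of problems; an empty list means the layout looks complete.

    expected_mp4=6 follows the "six MP4 streams" mentioned above and assumes
    one episode per sample directory.
    """
    summary = summarize_sample(sample_dir)
    problems = []
    if not summary["annotation"]:
        problems.append("annotation.hdf5 not found")
    if len(summary["mp4"]) != expected_mp4:
        problems.append(f"expected {expected_mp4} MP4 files, found {len(summary['mp4'])}")
    return problems
```

Calling `check_sample("data/sample/xperience-10m-sample")` returns the list of problems. A missing `visualization.rrd` is deliberately not reported, since the training and evaluation scripts do not use it.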
## Core Commands
Run these from the repo root after setting `WORKSPACE` to the folder that owns
`data/sample/xperience-10m-sample`.
```bash
export WORKSPACE=/path/to/workspace
python scripts/train_min_action_model.py --workspace "$WORKSPACE"
python scripts/train_all_modalities_model.py --workspace "$WORKSPACE"
python scripts/episode_task_suite.py \
  --workspace "$WORKSPACE" \
  --include-neural
python scripts/research_direction_taxonomy.py
python scripts/research_direction_extension_tasks.py
python scripts/tier2_task_suite.py
python scripts/build_unified_task_suite.py
python scripts/build_unified_task_model_radar.py
python scripts/task_walkthroughs.py
python scripts/validate_source_alignment.py
python scripts/build_evaluation_protocol.py
python scripts/generate_visualizations.py
python scripts/render_overview_figures.py
python scripts/render_task_suite_infographic.py
python scripts/export_modality_atlas_assets.py
python scripts/build_brand_assets.py
python scripts/build_figure_index.py
python scripts/validate_website_integrity.py
python scripts/validate_task_surface.py
python scripts/validate_scope_claims.py
python scripts/build_artifact_index.py
python scripts/validate_mirror_parity.py
python scripts/validate_publication_package.py
```
`scripts/tier2_task_suite.py` has a historical file name, but it now regenerates
provenance rows inside the unified 20-task suite. It can use HOMIE Toolkit when
present, or a direct `h5py` fallback for the public sample's caption JSON. It
reads the local raw `annotation.hdf5` only to regenerate interaction/object
targets; the raw HDF5 is still ignored by git and excluded from public bundles.
## Owner-Side Staged Qwen3-Omni v6 Reproduction
This section is for the private staged GPU host, not for public reruns from the
GitHub repo alone. It preserves the verified result path after the original
training host is released.
Expected private staging layout:
| Item | Staged path |
| --- | --- |
| Staging root | `/mnt/kgc/chaoyue/ropedia-h20-side` |
| Repo | `<staged-repo-root>` |
| Qwen3-Omni base model | `/mnt/kgc/chaoyue/ropedia-h20-side/modelscope_models/Qwen__Qwen3-Omni-30B-A3B-Instruct` |
| v6 adapter | `checkpoints/xperience10m_qwen3_omni_128ep_multiscale_cap96_v6_rank64_lr5e5_full8gpu_lora/adapter_lora` |
| Staged eval JSONL | `results/omni_finetune/xperience10m_qwen3_omni_128ep_multiscale_cap96_v5_full8gpu_lora_dataset/dataset_a100_eval.jsonl` |
| Private handoff manifest | `/mnt/kgc/chaoyue/ropedia-h20-side/STAGING_MANIFEST_20260614.md` |
The staged JSONL has the same 34,269 rows as the original export JSONL, with
exported media paths rewritten from the training-host repo root to the private
staging root. Raw upstream Xperience-10M source files are not required for this
train/eval cache reproduction and were not copied because the selected raw
source tree is about 278 GB.
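The path rewrite described above can be sketched as follows. This is an illustration, not the script that produced the staged JSONL. The record schema is not given here, so the sketch walks every string value in each row, and the names `rewrite_paths` and `rewrite_jsonl` are hypothetical.

```python
import json


def rewrite_paths(value, old_root: str, new_root: str):
    """Recursively replace a leading old_root with new_root in all string values."""
    if isinstance(value, str) and value.startswith(old_root):
        return new_root + value[len(old_root):]
    if isinstance(value, list):
        return [rewrite_paths(v, old_root, new_root) for v in value]
    if isinstance(value, dict):
        return {k: rewrite_paths(v, old_root, new_root) for k, v in value.items()}
    return value


def rewrite_jsonl(src: str, dst: str, old_root: str, new_root: str) -> int:
    """Copy src to dst with rewritten paths and return the number of rows written."""
    rows = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if not line.strip():
                continue  # skip blank lines so the row count reflects real records
            row = rewrite_paths(json.loads(line), old_root, new_root)
            fout.write(json.dumps(row, ensure_ascii=False) + "\n")
            rows += 1
    return rows
```

Comparing the returned row count with the source file's row count is a cheap way to confirm that the rewrite dropped nothing. Passing roots with a trailing slash avoids accidentally matching a sibling directory that shares the same prefix.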
Run this from the staged repo:
```bash
cd <staged-repo-root>
CUDA_VISIBLE_DEVICES=0,1,2,3 \
RUN_ID=a100_repro_qwen_v6_eval_smoke1_manual \
SAMPLE_LIMIT=1 \
MAX_NEW_TOKENS=1 \
scripts/omni/run_private_gpu_qwen3_v6_repro_smoke.sh
```
The launcher first applies/checks the narrow Transformers Qwen3-Omni
video-feature compatibility patch. The expected compatible installed source
hash is `da5feea4afc11767db3ca7eedb85ac129c66605643dadc6272c4288b03be7d25`;
the known incompatible pre-patch hash is
`2aa5752c32965dbaeee230a016afbbbb30d459a46a12c88c1d6f712e12ba95ad`.
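A hash check of this kind can be sketched in a few lines. The launcher's actual logic is not shown here, so treat the following as an assumption-laden illustration: it assumes the hashes are SHA-256 digests of a single installed source file, and it leaves the path to that file up to the caller because the file is not named above. The function names are hypothetical.

```python
import hashlib

# Hashes quoted in this section.
COMPATIBLE_HASH = "da5feea4afc11767db3ca7eedb85ac129c66605643dadc6272c4288b03be7d25"
INCOMPATIBLE_HASH = "2aa5752c32965dbaeee230a016afbbbb30d459a46a12c88c1d6f712e12ba95ad"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_patch_state(digest: str) -> str:
    """Map a digest to 'compatible', 'incompatible', or 'unknown'."""
    if digest == COMPATIBLE_HASH:
        return "compatible"
    if digest == INCOMPATIBLE_HASH:
        return "incompatible"
    return "unknown"
```

An `unknown` result would usually mean a different Transformers version is installed, in which case neither hash applies and the patch should be reviewed by hand before running the smoke test.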
Verified staged-GPU smoke evidence from 2026-06-14:
| Field | Value |
| --- | --- |
| Run id | `a100_repro_qwen_v6_eval_smoke1_preflight_busy_20260614` |
| Exit code | `0` |
| Samples | `1` |
| JSON validity | `1.0` |
| Transition accuracy | `1.0` |
| Contact accuracy | `1.0` |
| Object micro-F1 | `0.28571428571428575` |
| Metrics path | `results/omni_finetune/a100_repro_qwen_v6_eval_smoke1_preflight_busy_20260614/metrics.json` |
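For readers unfamiliar with the object metric, the sketch below shows one common way to compute micro-F1 over predicted and reference object sets. It is not taken from the repo's evaluation code, and the object labels are made up. The reported value is approximately 2/7, which is consistent with several count combinations, for example one true positive plus five false positives and false negatives combined. The actual counts are not given in this file.

```python
from typing import Iterable, Set


def micro_f1(pred_sets: Iterable[Set[str]], gold_sets: Iterable[Set[str]]) -> float:
    """Pool true positives, false positives, and false negatives across samples."""
    tp = fp = fn = 0
    for pred, gold in zip(pred_sets, gold_sets):
        pred, gold = set(pred), set(gold)
        tp += len(pred & gold)   # objects predicted and present
        fp += len(pred - gold)   # objects predicted but absent
        fn += len(gold - pred)   # objects present but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical single-sample example: 1 TP, 2 FP, 3 FN gives 2 / 7.
pred = [{"cup", "table", "hand"}]
gold = [{"cup", "bowl", "spoon", "sink"}]
print(micro_f1(pred, gold))
```

With a single sample, as in the smoke run above, every metric is dominated by that one example, so these values confirm that the pipeline runs end to end rather than measuring model quality.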
## Expected Public Outputs
| Command group | Expected artifacts |
| --- | --- |
| Minimal baselines | `results/min_action_model/`, `results/min_all_modalities_action_model/`, metrics and model weights |
| Unified 20-task suite | `TASK_SUITE_20.md`, `docs/data/task_suite_20.json`, `results/episode_task_suite/summary_report.json`, per-task `metrics.json`, predictions, confusion matrices, and the historical `tier2_task_suite` provenance bundle |
| Unified 20-task model radar | `docs/data/unified_task_model_radar.json`, `docs/assets/charts/unified_task_model_radar.svg` |
| Neural heads | `results/episode_task_suite/neural_mlp/**/metrics.json`, histories, model checkpoints |
| Research directions | `results/episode_task_suite/research_directions/`, `docs/data/research_directions.json` |
| Direction probes | `results/episode_task_suite/research_direction_extensions/`, `docs/data/research_direction_extensions.json` |
| Walkthroughs | `results/episode_task_suite/task_walkthroughs/`, `docs/data/task_walkthroughs.json` |
| Task surface integrity | `docs/data/task_surface_integrity.json` |
| Source alignment | `SOURCE_ALIGNMENT_AUDIT.md`, `docs/data/source_alignment_audit.json` |
| Evaluation protocol | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json` |
| Figures | `docs/assets/*.png`, `docs/assets/charts/*.svg` |
| Brand assets | `docs/assets/brand/*.png`, `docs/favicon.png`, `docs/apple-touch-icon.png`, `docs/data/brand_assets.json` |
| Figure index | `FIGURE_INDEX.md`, `docs/data/figure_index.json` |
| Modality atlas | `docs/data/modality_atlas.json`, `docs/assets/modalities/*` |
| Website integrity | `docs/data/website_integrity.json` |
| Release reports | `docs/data/artifact_index.json`, `docs/data/mirror_parity.json`, `docs/data/publication_audit.json`, `docs/data/scope_claims_audit.json` |
## Exact-Match Reproduction Record
The last full metric reproduction run was completed on **2026-05-30
Asia/Singapore** from a fresh output directory outside the repo. It rebuilt the
minimal baselines, all-modality baselines, and the original core task artifacts
from the local public sample. The regenerated metrics matched the committed
artifacts after float normalization; the current public framing now indexes
those artifacts together as one 20-task suite.
Evidence:
- [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md)
- [`docs/data/reproducibility_matrix.json`](docs/data/reproducibility_matrix.json)
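A comparison "after float normalization" can be sketched as below. The exact normalization used for the recorded run is not specified here, so rounding every float to a fixed number of decimal places is an assumption, as are the names `normalize` and `metrics_match` and the default of six digits.

```python
def normalize(value, ndigits: int = 6):
    """Round all floats in a nested metrics structure to ndigits decimal places."""
    if isinstance(value, bool):
        return value  # bool is a subclass of int; leave it untouched
    if isinstance(value, float):
        return round(value, ndigits)
    if isinstance(value, dict):
        return {k: normalize(v, ndigits) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize(v, ndigits) for v in value]
    return value


def metrics_match(regenerated, committed, ndigits: int = 6) -> bool:
    """True when two metrics structures are equal after float normalization."""
    return normalize(regenerated, ndigits) == normalize(committed, ndigits)


# Hypothetical metrics that differ only in the last printed digits.
a = {"object_micro_f1": 0.28571428571428575, "samples": 1}
b = {"object_micro_f1": 0.2857142857142857, "samples": 1}
print(metrics_match(a, b))
```

Normalizing before comparing avoids false mismatches caused by platform or library differences in the last few floating-point digits, while still catching any change that affects the reported metric.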
## Non-Reproducible From This Public Repo Alone
The following depend on gated data, large model weights, or private compute
state, so this public repo alone does not support them:
- rerunning the multi-episode Qwen3-Omni LoRA pilot from raw gated data,
- full Xperience-10M-scale pretraining,
- redistribution of raw Xperience-10M video or annotations,
- distribution of full Qwen weights or large full checkpoints.
Before interpreting any Qwen3-Omni result, read
[`docs/data/scope_claims_audit.json`](docs/data/scope_claims_audit.json),
[`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md)
and
[`results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md`](results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md).