Document model download entrypoint
README.md
CHANGED
|
@@ -84,6 +84,35 @@ before the multi-episode omni-model stage becomes a real held-out evaluation.
|
|
| 84 |
| Understand one model input | [`results/episode_task_suite/feature_manifest.json`](results/episode_task_suite/feature_manifest.json), [`results/episode_task_suite/windows.csv`](results/episode_task_suite/windows.csv) |
|
| 85 |
| Check multi-episode data status | [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
|
| 86 |
|
| 87 |
## Research Project Overview
|
| 88 |
|
| 89 |
| Theme | Current implementation |
|
|
@@ -179,7 +208,7 @@ The generated evaluation protocol is at
|
|
| 179 |
The generated research takeaways are at
|
| 180 |
[`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md) and
|
| 181 |
[`docs/data/research_takeaways.json`](docs/data/research_takeaways.json).
|
| 182 |
-
The
|
| 183 |
[`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md) and
|
| 184 |
[`docs/data/research_roadmap.json`](docs/data/research_roadmap.json).
|
| 185 |
The foundation-model selection plan is at
|
|
@@ -352,7 +381,7 @@ Hugging Face Space app:
|
|
| 352 |
| Evaluation protocol | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json` | Defines the task unit, split, metrics, leakage controls, and current limitations |
|
| 353 |
| Task surface integrity | `docs/data/task_surface_integrity.json` | Checks the public task cards, readable task names, representative modality thumbnails, and interactive walkthrough storyboard |
|
| 354 |
| Rendered website check | `RENDERED_SITE_CHECK.md`, `docs/data/rendered_site_check.json` | Records the browser-level page load, tab navigation, walkthrough deep link, player interaction, and console-health result |
|
| 355 |
-
| Research roadmap | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | Shows the
|
| 356 |
| Minimal heads | softmax, ridge projection/regression, multi-label logistic heads | Keeps every input/output contract visible and inspectable |
|
| 357 |
| Neural heads | PyTorch MLP classifiers/regressors under `neural_mlp/` | Checks whether nonlinear heads improve each task without changing features |
|
| 358 |
| Evidence | metrics, predictions, confusion matrices, diagrams, dashboard | Makes the single-episode task development inspectable without rerunning first |
|
|
@@ -481,7 +510,7 @@ docs/
|
|
| 481 |
data/website_integrity.json # machine-readable website integrity check
|
| 482 |
data/project_manifest.json # machine-readable public-surface metadata
|
| 483 |
data/project_packet.json # machine-readable project path and scope summary
|
| 484 |
-
data/research_roadmap.json #
|
| 485 |
data/research_directions.json # four-track website data bundle
|
| 486 |
data/research_direction_extensions.json # four extra probe data bundle
|
| 487 |
data/task_walkthroughs.json # human-readable task-card and walkthrough-storyboard data
|
|
@@ -605,13 +634,13 @@ The useful distinction is:
|
|
| 605 |
The figure shows the intended end-to-end training flow: raw valid episodes enter
|
| 606 |
episode-level split validation, parallel media/sensor export creates Qwen-style
|
| 607 |
JSONL records, Qwen3-Omni receives video/audio/text directly, the sensor bridge
|
| 608 |
-
adds depth/pose/mocap/IMU features, LoRA adapters are trained on
|
| 609 |
train/val episodes, and sealed held-out test evaluation produces predictions,
|
| 610 |
metrics, run reports, and upload-ready adapter artifacts.
|
| 611 |
|
| 612 |
The current scale-up artifacts show that the export, manifest, sensor-feature,
|
| 613 |
LoRA, and evaluation scripts can run on the available sample episode. They do
|
| 614 |
-
not show a real multi-episode result. A real pilot requires
|
| 615 |
episodes, held-out episode splits, training metadata, predictions, metrics, and
|
| 616 |
a run report; the current selected pilot target is 128 episodes.
|
| 617 |
|
|
@@ -653,7 +682,7 @@ Current status in this repo:
|
|
| 653 |
- gated_metadata_audit: 12,102 complete visible episodes across 802 complete sessions
|
| 654 |
- selected_episode_plan: 128 metadata-balanced episodes, 96/16/16 train/val/test
|
| 655 |
- selected_download_size: 277.71 GiB excluding `visualization.rrd`
|
| 656 |
-
- ready_for_held_out_pilot: false until the selected episodes are fully
|
| 657 |
- gated dataset: available for selected multi-episode data preparation
|
| 658 |
- source_discovery: `results/omni_finetune/source_discovery.json`
|
| 659 |
- data_status: `results/omni_finetune/DATA_ACCESS_STATUS.md`
|
|
@@ -668,7 +697,7 @@ episode per top-level session UUID.
|
|
| 668 |
### Progressive Train/Validation Pilot
|
| 669 |
|
| 670 |
The selected 128-episode plan can be used before every episode has arrived by
|
| 671 |
-
training only on
|
| 672 |
The final `test` episodes stay sealed until the end, so early development does
|
| 673 |
not contaminate held-out evaluation.
|
| 674 |
|
|
@@ -688,7 +717,7 @@ running final test evaluation. The exporter uses session-qualified episode IDs
|
|
| 688 |
and path-based split matching so repeated folder names such as `ep1` cannot
|
| 689 |
collide across different sessions.
|
| 690 |
|
| 691 |
-
For larger
|
| 692 |
uses the same split guard, exports episodes in parallel CPU shards, skips and
|
| 693 |
reports episodes that contain no labeled windows under the configured label
|
| 694 |
rule, then launches Qwen3-Omni LoRA with `NUM_PROCESSES=8`.
|
|
@@ -715,7 +744,7 @@ assuming one backbone solves every Xperience-10M objective.
|
|
| 715 |
| Branch | Current role | When to use it |
|
| 716 |
| --- | --- | --- |
|
| 717 |
| Qwen3-Omni | First trainable multimodal LoRA pilot | Use for the selected 128-episode held-out baseline over video/audio/language plus sensor-bridge features. |
|
| 718 |
-
| Cosmos 3 | First world-model/action-generation branch | Use after data
|
| 719 |
| GR00T | Humanoid/action-policy branch | Use after mocap/contact retargeting creates well-defined humanoid action targets. |
|
| 720 |
| OpenVLA / openpi | Open VLA/policy baselines | Use after the project defines robot-compatible or action-token targets. |
|
| 721 |
| Gemini Robotics | External reasoning reference | Use only for qualitative comparison or annotation support unless local trainable access exists. |
|
|
|
|
| 84 |
| Understand one model input | [`results/episode_task_suite/feature_manifest.json`](results/episode_task_suite/feature_manifest.json), [`results/episode_task_suite/windows.csv`](results/episode_task_suite/windows.csv) |
|
| 85 |
| Check multi-episode data status | [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
|
| 86 |
|
| 87 |
+
## Download And Accounting
|
| 88 |
+
|
| 89 |
+
This repository is a baseline/artifact model repository rather than a single
|
| 90 |
+
Transformers checkpoint. The main weights live under task-specific paths such
|
| 91 |
+
as `artifacts/**/model.npz` and
|
| 92 |
+
`artifacts/episode_task_suite/neural_mlp/**/model.pt`.
|
| 93 |
+
|
| 94 |
+
For Hugging Face Hub download accounting, this repo includes a root
|
| 95 |
+
[`config.json`](config.json) as the canonical query file. The displayed
|
| 96 |
+
monthly download count can lag behind actual file requests, and direct browser
|
| 97 |
+
downloads of arbitrary nested files may not be reflected immediately.
|
| 98 |
+
|
| 99 |
+
Recommended programmatic access:
|
| 100 |
+
|
| 101 |
+
```python
|
| 102 |
+
from huggingface_hub import snapshot_download
|
| 103 |
+
|
| 104 |
+
local_dir = snapshot_download(
|
| 105 |
+
repo_id="cy0307/ropedia-xperience-10m-task-baselines",
|
| 106 |
+
allow_patterns=[
|
| 107 |
+
"config.json",
|
| 108 |
+
"artifacts/**/*.npz",
|
| 109 |
+
"artifacts/**/*.pt",
|
| 110 |
+
"artifacts/**/metrics.json",
|
| 111 |
+
"artifacts/**/*predictions*",
|
| 112 |
+
],
|
| 113 |
+
)
|
| 114 |
+
```
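Once the snapshot is on disk, it can help to see which task directories actually contain weights before loading anything. The sketch below is illustrative only: `index_artifacts` is a hypothetical helper, not part of this repository, and it assumes nothing beyond the `.npz` and `.pt` suffixes named above.

```python
from collections import defaultdict
from pathlib import Path


def index_artifacts(local_dir):
    """Group weight files (.npz, .pt) by parent directory, relative to local_dir.

    Hypothetical helper: it only inspects file names and never opens the files.
    """
    root = Path(local_dir)
    index = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in {".npz", ".pt"}:
            index[path.parent.relative_to(root).as_posix()].append(path.name)
    return dict(index)
```

Calling `index_artifacts(local_dir)` on the directory returned by `snapshot_download` gives a mapping from each task directory to its weight files. The array names inside each `model.npz` and the object layout of each `model.pt` are not described here, so inspect them before relying on any particular key.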
|
| 115 |
+
|
| 116 |
## Research Project Overview
|
| 117 |
|
| 118 |
| Theme | Current implementation |
|
|
|
|
| 208 |
The generated research takeaways are at
|
| 209 |
[`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md) and
|
| 210 |
[`docs/data/research_takeaways.json`](docs/data/research_takeaways.json).
|
| 211 |
+
The research roadmap is at
|
| 212 |
[`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md) and
|
| 213 |
[`docs/data/research_roadmap.json`](docs/data/research_roadmap.json).
|
| 214 |
The foundation-model selection plan is at
|
|
|
|
| 381 |
| Evaluation protocol | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json` | Defines the task unit, split, metrics, leakage controls, and current limitations |
|
| 382 |
| Task surface integrity | `docs/data/task_surface_integrity.json` | Checks the public task cards, readable task names, representative modality thumbnails, and interactive walkthrough storyboard |
|
| 383 |
| Rendered website check | `RENDERED_SITE_CHECK.md`, `docs/data/rendered_site_check.json` | Records the browser-level page load, tab navigation, walkthrough deep link, player interaction, and console-health result |
|
| 384 |
+
| Research roadmap | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | Shows the path from sample-level task development to multi-episode and larger omni-model work |
|
| 385 |
| Minimal heads | softmax, ridge projection/regression, multi-label logistic heads | Keeps every input/output contract visible and inspectable |
|
| 386 |
| Neural heads | PyTorch MLP classifiers/regressors under `neural_mlp/` | Checks whether nonlinear heads improve each task without changing features |
|
| 387 |
| Evidence | metrics, predictions, confusion matrices, diagrams, dashboard | Makes the single-episode task development inspectable without rerunning first |
|
|
|
|
| 510 |
data/website_integrity.json # machine-readable website integrity check
|
| 511 |
data/project_manifest.json # machine-readable public-surface metadata
|
| 512 |
data/project_packet.json # machine-readable project path and scope summary
|
| 513 |
+
data/research_roadmap.json # multi-episode and omni-model roadmap
|
| 514 |
data/research_directions.json # four-track website data bundle
|
| 515 |
data/research_direction_extensions.json # four extra probe data bundle
|
| 516 |
data/task_walkthroughs.json # human-readable task-card and walkthrough-storyboard data
|
|
|
|
| 634 |
The figure shows the intended end-to-end training flow: raw valid episodes enter
|
| 635 |
episode-level split validation, parallel media/sensor export creates Qwen-style
|
| 636 |
JSONL records, Qwen3-Omni receives video/audio/text directly, the sensor bridge
|
| 637 |
+
adds depth/pose/mocap/IMU features, LoRA adapters are trained on prepared
|
| 638 |
train/val episodes, and sealed held-out test evaluation produces predictions,
|
| 639 |
metrics, run reports, and upload-ready adapter artifacts.
|
| 640 |
|
| 641 |
The current scale-up artifacts show that the export, manifest, sensor-feature,
|
| 642 |
LoRA, and evaluation scripts can run on the available sample episode. They do
|
| 643 |
+
not show a real multi-episode result. A real pilot requires valid prepared
|
| 644 |
episodes, held-out episode splits, training metadata, predictions, metrics, and
|
| 645 |
a run report; the current selected pilot target is 128 episodes.
|
| 646 |
|
|
|
|
| 682 |
- gated_metadata_audit: 12,102 complete visible episodes across 802 complete sessions
|
| 683 |
- selected_episode_plan: 128 metadata-balanced episodes, 96/16/16 train/val/test
|
| 684 |
- selected_download_size: 277.71 GiB excluding `visualization.rrd`
|
| 685 |
+
- ready_for_held_out_pilot: false until the selected episodes are fully prepared and checked
|
| 686 |
- gated dataset: available for selected multi-episode data preparation
|
| 687 |
- source_discovery: `results/omni_finetune/source_discovery.json`
|
| 688 |
- data_status: `results/omni_finetune/DATA_ACCESS_STATUS.md`
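The 96/16/16 plan above sums to the 128 selected episodes, and a held-out pilot only makes sense if no episode lands in more than one split. A minimal sanity check might look like the sketch below. The `plan` mapping (split name to a list of episode IDs) and the `check_split_plan` helper are assumptions for illustration; the repository's own plan format may differ.

```python
# Split sizes taken from the selected_episode_plan line above.
EXPECTED_SIZES = {"train": 96, "val": 16, "test": 16}


def check_split_plan(plan, expected=EXPECTED_SIZES):
    """Return a list of human-readable problems; an empty list means the plan passes.

    Hypothetical helper: `plan` maps a split name to a list of episode IDs.
    """
    problems = []
    # Check that every split has the expected number of episodes.
    for split, size in expected.items():
        found = len(plan.get(split, []))
        if found != size:
            problems.append(f"{split}: expected {size} episodes, found {found}")
    # Check that no episode ID is assigned to two different splits.
    seen = {}
    for split, episode_ids in plan.items():
        for episode_id in episode_ids:
            if episode_id in seen and seen[episode_id] != split:
                problems.append(
                    f"{episode_id} appears in both {seen[episode_id]} and {split}"
                )
            seen.setdefault(episode_id, split)
    return problems
```

This only checks sizes and overlap. It does not verify the metadata balance of the selection, which depends on fields not shown here.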
|
|
|
|
| 697 |
### Progressive Train/Validation Pilot
|
| 698 |
|
| 699 |
The selected 128-episode plan can be used before every episode has arrived by
|
| 700 |
+
training only on prepared `train` episodes and monitoring prepared `val` episodes.
|
| 701 |
The final `test` episodes stay sealed until the end, so early development does
|
| 702 |
not contaminate held-out evaluation.
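The progressive rule can be sketched as a filter over the plan: keep whatever `train` and `val` episodes are already prepared, and never return `test` episodes even if they happen to be on disk. The helper name and the `plan`/`prepared` inputs are hypothetical, not the repository's actual interface.

```python
def select_progressive_batch(plan, prepared):
    """Return the prepared train/val episodes from the plan; test stays sealed.

    Hypothetical helper: `plan` maps split name to episode IDs, and `prepared`
    is any iterable of episode IDs whose data has already been prepared.
    """
    prepared = set(prepared)
    return {
        split: [ep for ep in plan.get(split, []) if ep in prepared]
        for split in ("train", "val")  # "test" is deliberately never included
    }
```

Because the function never reads `plan["test"]`, early training runs cannot pick up held-out episodes by accident, whatever the download order.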
|
| 703 |
|
|
|
|
| 717 |
and path-based split matching so repeated folder names such as `ep1` cannot
|
| 718 |
collide across different sessions.
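To illustrate why session-qualified IDs avoid collisions, the sketch below builds an ID from the last two path components. The `<session>/<episode>` directory layout and the `session_qualified_id` helper are assumptions for this example; the exporter's actual ID format is not shown here.

```python
from pathlib import PurePosixPath


def session_qualified_id(episode_path):
    """Build an ID from the session folder and the episode folder.

    Assumes (hypothetically) that the last two path components are
    <session>/<episode>, e.g. ".../<session-uuid>/ep1".
    """
    parts = PurePosixPath(episode_path).parts
    if len(parts) < 2:
        raise ValueError(f"need at least <session>/<episode>, got {episode_path!r}")
    return f"{parts[-2]}/{parts[-1]}"
```

With this scheme, two folders that are both named `ep1` map to different IDs as long as they sit under different session folders.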
|
| 719 |
|
| 720 |
+
For larger prepared subsets, `scripts/omni/run_trainval_parallel_export_8gpu.sh`
|
| 721 |
uses the same split guard, exports episodes in parallel CPU shards, skips and
|
| 722 |
reports episodes that contain no labeled windows under the configured label
|
| 723 |
rule, then launches Qwen3-Omni LoRA with `NUM_PROCESSES=8`.
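The shard-then-skip behaviour described above can be approximated in a few lines. This is a sketch, not the script's implementation: the round-robin assignment, the helper names, and the `window_counts` mapping (episode ID to number of labeled windows) are all assumptions, and the number of CPU shards is not specified in this section.

```python
def shard_episodes(episode_ids, num_shards):
    """Assign episodes to shards round-robin after sorting for determinism."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    ordered = sorted(episode_ids)
    return [ordered[i::num_shards] for i in range(num_shards)]


def drop_unlabeled(window_counts):
    """Split episodes into (kept, skipped) by whether they have labeled windows.

    Hypothetical input: `window_counts` maps episode ID to the number of
    windows that received a label under the configured label rule.
    """
    kept = [ep for ep, n in sorted(window_counts.items()) if n > 0]
    skipped = [ep for ep, n in sorted(window_counts.items()) if n <= 0]
    return kept, skipped
```

Returning the skipped list alongside the kept list keeps the "skips and reports" behaviour visible, so empty episodes are logged instead of silently dropped.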
|
|
|
|
| 744 |
| Branch | Current role | When to use it |
|
| 745 |
| --- | --- | --- |
|
| 746 |
| Qwen3-Omni | First trainable multimodal LoRA pilot | Use for the selected 128-episode held-out baseline over video/audio/language plus sensor-bridge features. |
|
| 747 |
+
| Cosmos 3 | First world-model/action-generation branch | Use after data preparation for future-window prediction, action-conditioned world modeling, and synthetic-data usefulness tests. |
|
| 748 |
| GR00T | Humanoid/action-policy branch | Use after mocap/contact retargeting creates well-defined humanoid action targets. |
|
| 749 |
| OpenVLA / openpi | Open VLA/policy baselines | Use after the project defines robot-compatible or action-token targets. |
|
| 750 |
| Gemini Robotics | External reasoning reference | Use only for qualitative comparison or annotation support unless local trainable access exists. |
|