Robotics
PyTorch
Cosmos
xperience10m_task_baseline_suite
embodied-ai
multimodal
xperience-10m
baseline
evaluation
qwen3-omni
Instructions to use cy0307/ropedia-xperience-10m-task-baselines with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Cosmos
How to use cy0307/ropedia-xperience-10m-task-baselines with Cosmos:
# No code snippets available yet for this library. # To use this model, check the repository files and the library's documentation. # Want to help? PRs adding snippets are welcome at: # https://github.com/huggingface/huggingface.js
- Notebooks
- Google Colab
- Kaggle
| # 128-Episode Task Suite Enhancement Pack | |
| Run id: `task_suite_enhancement_128_v1_20260608` | |
| This non-overwriting enhancement pack records how to push the current 128-episode task suite harder without adding more raw episodes. | |
| ## Current Evidence | |
| - Current public export windows: `3808` | |
| - Window split counts: `train 2848 / val 512 / test 448` | |
| - Selected episode split: `train 96 / val 16 / test 16` | |
| - Windowed episode ids in baseline CSV: `train 89 / val 16 / test 14` | |
| - Qwen3 v4 JSON validity: `1.0000` | |
| - Qwen3 v4 action macro-F1: `0.001868` | |
| - Qwen3 v4 subtask accuracy: `0.000000` | |
| - Qwen3 v4 unseen-label sample share: `0.7076` | |
| ## Dense-Window Scenarios | |
| | scenario | estimated windows | multiplier | role | | |
| | --- | ---: | ---: | --- | | |
| | `current_export` | 3808 | 1.0 | current public 128-episode JSON-task export | | |
| | `dense_20f_stride20` | 30422 | 7.99 | non-overlap dense coverage over each observed episode frame span | | |
| | `dense_20f_stride10` | 60725 | 15.95 | 2x overlap action/subtask densification | | |
| | `dense_20f_stride5` | 121331 | 31.86 | high-overlap action boundary and transition stress setting | | |
| | `medium_40f_stride20` | 30303 | 7.96 | subtask/procedure context window | | |
| | `long_80f_stride40` | 15067 | 3.96 | procedure and world-model context window | | |
| | `multiscale_20s10_40s20_80s40` | 106095 | 27.86 | recommended no-new-episode v5 export: short action windows plus medium/long procedure context | | |
| ## Highest-Priority Bottlenecks | |
| | task | priority | simple score | bottleneck | next action | | |
| | --- | --- | ---: | --- | --- | | |
| | Next-Action Prediction | highest | 0.000200 | fine-grained label explosion and held-out unseen labels | add hierarchical action/subtask families plus label-normalized scoring | | |
| | Action Recognition | highest | 0.000175 | fine-grained label explosion and held-out unseen labels | add hierarchical action/subtask families plus label-normalized scoring | | |
| | Procedure Step Recognition | highest | 0.000000 | fine-grained label explosion and held-out unseen labels | add hierarchical action/subtask families plus label-normalized scoring | | |
| | Cross-Modal Retrieval | high | | missing raw 128-episode feature blocks | export compact raw-feature shards for this task before model comparison | | |
| | Hand Trajectory Forecasting | high | | missing raw 128-episode feature blocks | export compact raw-feature shards for this task before model comparison | | |
| | Multimodal Synchronization Detection | high | | missing raw 128-episode feature blocks | export compact raw-feature shards for this task before model comparison | | |
| | Cross-Modal Reconstruction | high | | missing raw 128-episode feature blocks | export compact raw-feature shards for this task before model comparison | | |
| | Language Grounding | medium | 0.012786 | weak public-safe metadata/text baseline | add dense windows and stronger fusion baselines before interpreting model quality | | |
| ## Recommended Next Run | |
| Use `multiscale_20s10_40s20_80s40` as the next export target, then train a Qwen3 v5 hierarchical-target LoRA/partial-unfreeze run against the unchanged 96/16/16 episode split. | |
| In parallel, export compact raw 128-episode feature shards for trajectory, retrieval, reconstruction, and synchronization tasks so the simple and neural baselines can be fully aligned beyond the JSON-supported labels. | |
| The current artifacts remain the baseline; future runs should write new run ids and publish separate verified packages. | |