Robotics
PyTorch
Cosmos
xperience10m_task_baseline_suite
embodied-ai
multimodal
xperience-10m
baseline
evaluation
qwen3-omni
Figure: Measured Audio Delta Across 12 Xperience-10M Tasks

Positive means audio improved the task's primary metric on the single public sample split.

| Task | Audio delta | Primary metric |
|---|---|---|
| Current Action Recognition | +0.0003 | macro_f1 |
| Current Subtask Recognition | +0.0001 | macro_f1 |
| Action Transition Detection | -0.0066 | macro_f1 |
| Next-Action Prediction | -0.0001 | macro_f1 |
| Future Hand Motion Forecasting | -0.1626 | mae |
| Contact State Prediction | +0.0000 | macro_f1 |
| Relevant Object Prediction | +0.0102 | micro_f1 |
| Language-to-Time Grounding | +0.0049 | mrr |
| Cross-Modal Window Retrieval | -0.0141 | mrr |
| Sensor-to-Visual Reconstruction | +0.6524 | mae |
| Temporal Order Verification | +0.0230 | macro_f1 |
| Cross-Modal Misalignment Detection | -0.0052 | macro_f1 |
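The sign convention in the figure above (positive means audio helped) can be applied mechanically to see how many tasks moved in each direction. The following is a minimal sketch, not part of the repository: the `AUDIO_DELTAS` list and the `summarize` helper are hypothetical names, and the values are transcribed by hand from the figure labels.

```python
# Audio deltas transcribed from the figure labels: (task, delta, primary metric).
# Per the figure caption, a positive delta means audio improved the metric.
AUDIO_DELTAS = [
    ("Current Action Recognition", 0.0003, "macro_f1"),
    ("Current Subtask Recognition", 0.0001, "macro_f1"),
    ("Action Transition Detection", -0.0066, "macro_f1"),
    ("Next-Action Prediction", -0.0001, "macro_f1"),
    ("Future Hand Motion Forecasting", -0.1626, "mae"),
    ("Contact State Prediction", 0.0000, "macro_f1"),
    ("Relevant Object Prediction", 0.0102, "micro_f1"),
    ("Language-to-Time Grounding", 0.0049, "mrr"),
    ("Cross-Modal Window Retrieval", -0.0141, "mrr"),
    ("Sensor-to-Visual Reconstruction", 0.6524, "mae"),
    ("Temporal Order Verification", 0.0230, "macro_f1"),
    ("Cross-Modal Misalignment Detection", -0.0052, "macro_f1"),
]


def summarize(deltas):
    """Group task names by the sign of their audio delta."""
    groups = {"improved": [], "worsened": [], "unchanged": []}
    for task, delta, _metric in deltas:
        if delta > 0:
            groups["improved"].append(task)
        elif delta < 0:
            groups["worsened"].append(task)
        else:
            groups["unchanged"].append(task)
    return groups


summary = summarize(AUDIO_DELTAS)
print(
    f"{len(summary['improved'])} improved, "
    f"{len(summary['worsened'])} worsened, "
    f"{len(summary['unchanged'])} unchanged"
)
# prints "6 improved, 5 worsened, 1 unchanged"
```

The sketch only counts directions. The deltas are reported in different metrics (macro_f1, micro_f1, mrr, mae), so their magnitudes are not directly comparable across tasks, and all of them come from a single public sample split.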