# Task Method 20-Result Matrix
Every method has one record for each of the 20 unified task contracts. Numeric scores appear only where a committed runner or verified package produced that task target.
Legend: `score` = direct numeric task score; `proxy` = documented compact substitute target. The current public matrix is complete at 180/180 scored records (9 methods × 20 tasks), of which 6 are proxy scored. The unsupported/not-evaluated labels are retained only for future regression audits.
| Method | Records | Scored | Proxy scored | Scoreless | Status counts |
| --- | ---: | ---: | ---: | ---: | --- |
| Minimal | 20 | 20 | 0 | 0 | scored 20 |
| Neural MLP | 20 | 20 | 0 | 0 | scored 20 |
| 128ep Aligned Simple | 20 | 20 | 1 | 0 | proxy scored 1, scored 19 |
| 128ep Aligned NN | 20 | 20 | 1 | 0 | proxy scored 1, scored 19 |
| 128ep Raw Simple | 20 | 20 | 2 | 0 | proxy scored 2, scored 18 |
| 128ep Raw NN | 20 | 20 | 2 | 0 | proxy scored 2, scored 18 |
| Qwen3-Omni v6 LoRA | 20 | 20 | 0 | 0 | scored 20 |
| Cosmos3-Super Reasoner | 20 | 20 | 0 | 0 | scored 20 |
| Cosmos3-Nano Future Window | 20 | 20 | 0 | 0 | scored 20 |
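The totals in the summary table can be re-checked with a short script. This is only a sketch: the `SUMMARY` dictionary and the `check_summary` helper are hypothetical names, the numbers are copied by hand from the table, and it assumes that the `Scored` column already includes proxy-scored records (which matches the status counts, for example `proxy scored 1, scored 19` for a row with 20 scored).

```python
# Per-method counts copied by hand from the summary table.
# Tuple layout: (records, scored, proxy_scored, scoreless).
SUMMARY = {
    "Minimal": (20, 20, 0, 0),
    "Neural MLP": (20, 20, 0, 0),
    "128ep Aligned Simple": (20, 20, 1, 0),
    "128ep Aligned NN": (20, 20, 1, 0),
    "128ep Raw Simple": (20, 20, 2, 0),
    "128ep Raw NN": (20, 20, 2, 0),
    "Qwen3-Omni v6 LoRA": (20, 20, 0, 0),
    "Cosmos3-Super Reasoner": (20, 20, 0, 0),
    "Cosmos3-Nano Future Window": (20, 20, 0, 0),
}


def check_summary(summary):
    """Validate each row, then return (records, scored, proxy_scored) totals."""
    for method, (records, scored, proxy, scoreless) in summary.items():
        # Assumption: "Scored" counts direct and proxy records together.
        assert scored + scoreless == records, method
        assert proxy <= scored, method
    total_records = sum(row[0] for row in summary.values())
    total_scored = sum(row[1] for row in summary.values())
    total_proxy = sum(row[2] for row in summary.values())
    return total_records, total_scored, total_proxy


print(check_summary(SUMMARY))  # (180, 180, 6)
```

The six proxy-scored records correspond to the `proxy` entries for tasks 15 and 19 in the Status Matrix.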
## Compact Score Matrix
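Cells show `raw metric value`, then `direct/proxy; normalized radar value; metric key`. The raw metric is the value to cite; the normalized value is the exact linear 0-1 score retained in JSON. The SVG radar uses sqrt(normalized score) only for visual radius, so low but real differences remain visible without changing the table values. Column headers abbreviate the methods in the order of the summary table: Min = Minimal, NN = Neural MLP, 128-S = 128ep Aligned Simple, 128-NN = 128ep Aligned NN, 128-RS = 128ep Raw Simple, 128-RN = 128ep Raw NN, Qwen3 = Qwen3-Omni v6 LoRA, C3-S = Cosmos3-Super Reasoner, C3-N = Cosmos3-Nano Future Window.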
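The square-root radius mapping described above can be sketched as follows. The function name `radar_radius` is hypothetical, and the clamping to the 0-1 range is an added assumption for safety; the repository's plotting code may handle out-of-range values differently.

```python
import math


def radar_radius(normalized: float) -> float:
    """Map a normalized 0-1 score to a visual radius via sqrt.

    Assumption: inputs outside [0, 1] are clamped before the square root.
    """
    clamped = min(max(normalized, 0.0), 1.0)
    return math.sqrt(clamped)


# Small scores are stretched outward, so they stay visible on the radar,
# while the endpoints 0 and 1 are unchanged.
for norm in (0.0, 0.008, 0.25, 1.0):
    print(norm, round(radar_radius(norm), 3))
```

Because sqrt is monotonic, the ordering of methods on any axis is the same as in the table; only the spacing near zero changes.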
| # | Task | Min | NN | 128-S | 128-NN | 128-RS | 128-RN | Qwen3 | C3-S | C3-N |
| ---: | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 01 | Action Recognition | 0.0500<br><sub>direct; norm 0.050; macro_f1</sub> | 0.0148<br><sub>direct; norm 0.015; macro_f1</sub> | 0.0083<br><sub>direct; norm 0.008; macro_f1</sub> | 0.0042<br><sub>direct; norm 0.004; macro_f1</sub> | 0.0029<br><sub>direct; norm 0.003; macro_f1</sub> | 0.0015<br><sub>direct; norm 0.001; macro_f1</sub> | 0.0029<br><sub>direct; norm 0.003; action_macro_f1</sub> | 0.0008<br><sub>direct; norm 0.001; action_macro_f1</sub> | 0.0079<br><sub>direct; norm 0.008; action_accuracy_from_retrieved_future</sub> |
| 02 | Procedure Step Recognition | 0.0506<br><sub>direct; norm 0.051; macro_f1</sub> | 0.0281<br><sub>direct; norm 0.028; macro_f1</sub> | 0.0002<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0001<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0001<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0037<br><sub>direct; norm 0.004; subtask_accuracy</sub> | 0.0000<br><sub>direct; norm 0.000; subtask_accuracy</sub> | 0.0000<br><sub>direct; norm 0.000; timeline_subtask_macro_f1</sub> |
| 03 | Action Boundary Detection | 0.6118<br><sub>direct; norm 0.612; macro_f1</sub> | 0.5862<br><sub>direct; norm 0.586; macro_f1</sub> | 0.2965<br><sub>direct; norm 0.297; macro_f1</sub> | 0.4842<br><sub>direct; norm 0.484; macro_f1</sub> | 0.4204<br><sub>direct; norm 0.420; macro_f1</sub> | 0.4902<br><sub>direct; norm 0.490; macro_f1</sub> | 0.9898<br><sub>direct; norm 0.990; transition_accuracy</sub> | 0.3683<br><sub>direct; norm 0.368; transition_accuracy</sub> | 0.9683<br><sub>direct; norm 0.968; transition_accuracy</sub> |
| 04 | Next-Action Prediction | 0.0593<br><sub>direct; norm 0.059; macro_f1</sub> | 0.0419<br><sub>direct; norm 0.042; macro_f1</sub> | 0.0065<br><sub>direct; norm 0.007; macro_f1</sub> | 0.0049<br><sub>direct; norm 0.005; macro_f1</sub> | 0.0033<br><sub>direct; norm 0.003; macro_f1</sub> | 0.0018<br><sub>direct; norm 0.002; macro_f1</sub> | 0.0431<br><sub>direct; norm 0.043; next_action_accuracy</sub> | 0.0134<br><sub>direct; norm 0.013; next_action_accuracy</sub> | 0.0079<br><sub>direct; norm 0.008; action_accuracy_from_retrieved_future</sub> |
| 05 | Hand Trajectory Forecasting | 0.8647<br><sub>direct; norm 0.125; mpjpe</sub> | 0.1079<br><sub>direct; norm 1.000; mpjpe</sub> | 8.817<br><sub>direct; norm 0.012; mpjpe</sub> | 0.4294<br><sub>direct; norm 0.251; mpjpe</sub> | 0.2729<br><sub>direct; norm 0.395; mae</sub> | 0.1848<br><sub>direct; norm 0.584; mae</sub> | 0.7216<br><sub>direct; norm 0.149; hand_trajectory_forecast_mrr</sub> | 0.8915<br><sub>direct; norm 0.121; hand_trajectory_forecast_mrr</sub> | 0.6913<br><sub>direct; norm 0.156; hand_trajectory_forecast_mrr</sub> |
| 06 | Contact State Prediction | 1.000<br><sub>direct; norm 1.000; macro_f1</sub> | 1.000<br><sub>direct; norm 1.000; macro_f1</sub> | 0.4381<br><sub>direct; norm 0.438; macro_f1</sub> | 0.5683<br><sub>direct; norm 0.568; macro_f1</sub> | 0.8870<br><sub>direct; norm 0.887; macro_f1</sub> | 1.000<br><sub>direct; norm 1.000; macro_f1</sub> | 0.8177<br><sub>direct; norm 0.818; contact_accuracy</sub> | 0.3214<br><sub>direct; norm 0.321; contact_accuracy</sub> | 0.7434<br><sub>direct; norm 0.743; contact_accuracy</sub> |
| 07 | Object Relevance Prediction | 0.1803<br><sub>direct; norm 0.180; micro_f1</sub> | 0.1679<br><sub>direct; norm 0.168; micro_f1</sub> | 0.1776<br><sub>direct; norm 0.178; micro_f1</sub> | 0.1866<br><sub>direct; norm 0.187; micro_f1</sub> | 0.0655<br><sub>direct; norm 0.066; micro_f1</sub> | 0.1766<br><sub>direct; norm 0.177; micro_f1</sub> | 0.3065<br><sub>direct; norm 0.306; object_micro_f1</sub> | 0.1370<br><sub>direct; norm 0.137; object_micro_f1</sub> | 0.0005<br><sub>direct; norm 0.000; object_relevance_micro_f1</sub> |
| 08 | Language Grounding | 0.0160<br><sub>direct; norm 0.016; mrr</sub> | 0.0168<br><sub>direct; norm 0.017; mrr</sub> | 0.0023<br><sub>direct; norm 0.002; mrr</sub> | 0.0082<br><sub>direct; norm 0.008; mrr</sub> | 0.0111<br><sub>direct; norm 0.011; mrr</sub> | 0.0063<br><sub>direct; norm 0.006; mrr</sub> | 0.8764<br><sub>direct; norm 0.876; caption_grounding_mrr</sub> | 0.3064<br><sub>direct; norm 0.306; caption_grounding_iou</sub> | 0.5221<br><sub>direct; norm 0.522; caption_grounding_mrr</sub> |
| 09 | Cross-Modal Retrieval | 0.2693<br><sub>direct; norm 0.269; mrr</sub> | 0.1300<br><sub>direct; norm 0.130; mrr</sub> | 0.0026<br><sub>direct; norm 0.003; mrr</sub> | 0.0026<br><sub>direct; norm 0.003; mrr</sub> | 0.0035<br><sub>direct; norm 0.003; mrr</sub> | 0.0025<br><sub>direct; norm 0.003; mrr</sub> | 0.5080<br><sub>direct; norm 0.508; cross_modal_retrieval_mrr</sub> | 0.6628<br><sub>direct; norm 0.663; cross_modal_retrieval_mrr</sub> | 0.0221<br><sub>direct; norm 0.022; future_retrieval_mrr</sub> |
| 10 | Cross-Modal Reconstruction | -0.0153<br><sub>direct; norm 0.000; r2</sub> | -0.0102<br><sub>direct; norm 0.000; r2</sub> | -190.66<br><sub>direct; norm 0.000; r2</sub> | -0.4348<br><sub>direct; norm 0.000; r2</sub> | -1.345<br><sub>direct; norm 0.000; r2</sub> | -1.397<br><sub>direct; norm 0.000; r2</sub> | 0.9671<br><sub>direct; norm 0.967; modality_reconstruction_mrr</sub> | 0.9939<br><sub>direct; norm 0.994; modality_reconstruction_mrr</sub> | 0.0003<br><sub>direct; norm 0.000; feature_reconstruction_quality</sub> |
| 11 | Temporal Order Verification | 0.5400<br><sub>direct; norm 0.540; f1</sub> | 0.8520<br><sub>direct; norm 0.852; f1</sub> | 0.4199<br><sub>direct; norm 0.420; f1</sub> | 0.8252<br><sub>direct; norm 0.825; f1</sub> | 0.4982<br><sub>direct; norm 0.498; macro_f1</sub> | 0.8030<br><sub>direct; norm 0.803; macro_f1</sub> | 0.4098<br><sub>direct; norm 0.410; temporal_order_f1</sub> | 0.6286<br><sub>direct; norm 0.629; temporal_order_f1</sub> | 0.5954<br><sub>direct; norm 0.595; temporal_order_f1</sub> |
| 12 | Multimodal Synchronization Detection | 0.5052<br><sub>direct; norm 0.505; f1</sub> | 0.7153<br><sub>direct; norm 0.715; f1</sub> | 0.4998<br><sub>direct; norm 0.500; f1</sub> | 0.7774<br><sub>direct; norm 0.777; f1</sub> | 0.4959<br><sub>direct; norm 0.496; macro_f1</sub> | 0.8273<br><sub>direct; norm 0.827; macro_f1</sub> | 0.3345<br><sub>direct; norm 0.334; misalignment_detection_f1</sub> | 0.3727<br><sub>direct; norm 0.373; misalignment_detection_f1</sub> | 0.4772<br><sub>direct; norm 0.477; misalignment_detection_f1</sub> |
| 13 | Long-Horizon Next-Action Forecasting | 0.0750<br><sub>direct; norm 0.075; macro_f1</sub> | 0.0655<br><sub>direct; norm 0.065; macro_f1</sub> | 0.0046<br><sub>direct; norm 0.005; macro_f1</sub> | 0.0030<br><sub>direct; norm 0.003; macro_f1</sub> | 0.0024<br><sub>direct; norm 0.002; macro_f1</sub> | 0.0011<br><sub>direct; norm 0.001; macro_f1</sub> | 0.0023<br><sub>direct; norm 0.002; long_horizon_next_action_macro_f1</sub> | 0.0088<br><sub>direct; norm 0.009; long_horizon_next_action_macro_f1</sub> | 0.0025<br><sub>direct; norm 0.002; long_horizon_next_action_macro_f1</sub> |
| 14 | Long-Horizon Next-Subtask Forecasting | 0.0455<br><sub>direct; norm 0.045; macro_f1</sub> | 0.0507<br><sub>direct; norm 0.051; macro_f1</sub> | 0.0001<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0042<br><sub>direct; norm 0.004; next_subtask_forecast_macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; next_subtask_forecast_macro_f1</sub> | 0.0066<br><sub>direct; norm 0.007; next_subtask_forecast_macro_f1</sub> |
| 15 | Interaction Text Prediction | 0.0444<br><sub>direct; norm 0.044; macro_f1</sub> | 0.0381<br><sub>direct; norm 0.038; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0126<br><sub>proxy; norm 0.013; macro_f1</sub> | 0.0098<br><sub>proxy; norm 0.010; macro_f1</sub> | 0.4319<br><sub>direct; norm 0.432; macro_f1</sub> | 0.1795<br><sub>direct; norm 0.179; macro_f1</sub> | 0.1788<br><sub>direct; norm 0.179; macro_f1</sub> |
| 16 | Action-Object Relation Prediction | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0002<br><sub>direct; norm 0.000; action_object_relation_macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; action_object_relation_macro_f1</sub> | 0.0028<br><sub>direct; norm 0.003; action_object_relation_macro_f1</sub> |
| 17 | Future Object-Set Forecasting | 0.1694<br><sub>direct; norm 0.169; micro_f1</sub> | 0.1972<br><sub>direct; norm 0.197; micro_f1</sub> | 0.1766<br><sub>direct; norm 0.177; micro_f1</sub> | 0.1742<br><sub>direct; norm 0.174; micro_f1</sub> | 0.0647<br><sub>direct; norm 0.065; micro_f1</sub> | 0.1752<br><sub>direct; norm 0.175; micro_f1</sub> | 0.1659<br><sub>direct; norm 0.166; object_set_forecast_micro_f1</sub> | 0.0009<br><sub>direct; norm 0.001; object_set_forecast_micro_f1</sub> | 0.0178<br><sub>direct; norm 0.018; object_set_forecast_micro_f1</sub> |
| 18 | IMU-to-Hand Pose Reconstruction | 0.0420<br><sub>direct; norm 1.000; mae</sub> | 0.0426<br><sub>direct; norm 0.988; mae</sub> | 0.2295<br><sub>direct; norm 0.183; mae</sub> | 0.2556<br><sub>direct; norm 0.165; mae</sub> | 0.2294<br><sub>direct; norm 0.183; mae</sub> | 0.2530<br><sub>direct; norm 0.166; mae</sub> | 0.9642<br><sub>direct; norm 0.044; imu_to_hand_pose_mrr</sub> | 0.9897<br><sub>direct; norm 0.042; imu_to_hand_pose_mrr</sub> | 0.9920<br><sub>direct; norm 0.042; imu_to_hand_pose_mrr</sub> |
| 19 | Camera-View Synchronization Retrieval | 0.4943<br><sub>direct; norm 0.494; mrr</sub> | 0.2409<br><sub>direct; norm 0.241; mrr</sub> | 0.0021<br><sub>proxy; norm 0.002; mrr</sub> | 0.0027<br><sub>proxy; norm 0.003; mrr</sub> | 0.0027<br><sub>proxy; norm 0.003; mrr</sub> | 0.0025<br><sub>proxy; norm 0.003; mrr</sub> | 0.6588<br><sub>direct; norm 0.659; camera_view_sync_retrieval_mrr</sub> | 0.9980<br><sub>direct; norm 0.998; camera_view_sync_retrieval_mrr</sub> | 0.9990<br><sub>direct; norm 0.999; camera_view_sync_retrieval_mrr</sub> |
| 20 | Time-to-Next-Transition Regression | 10.54<br><sub>direct; norm 1.000; mae</sub> | 10.55<br><sub>direct; norm 0.998; mae</sub> | 624.81<br><sub>direct; norm 0.017; mae</sub> | 41.47<br><sub>direct; norm 0.254; mae</sub> | 52.33<br><sub>direct; norm 0.201; mae</sub> | 42.37<br><sub>direct; norm 0.249; mae</sub> | 134.07<br><sub>direct; norm 0.079; time_to_transition_mae</sub> | 52.95<br><sub>direct; norm 0.199; time_to_transition_mae</sub> | 33.81<br><sub>direct; norm 0.312; time_to_transition_mae</sub> |
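For readers who want to process the table programmatically, the cell layout (`raw value`, then `direct/proxy; norm <value>; <metric key>` inside `<sub>`) can be split with a regular expression. This is a minimal sketch based only on the cell text shown in this table; `Cell`, `CELL_RE`, and `parse_cell` are hypothetical names, and the JSON files listed under the sources remain the authoritative form of the data.

```python
import re
from typing import NamedTuple


class Cell(NamedTuple):
    raw: float    # raw metric value, the number to cite
    kind: str     # "direct" or "proxy"
    norm: float   # normalized 0-1 radar value
    metric: str   # metric key, e.g. "macro_f1"


# Pattern inferred from the rendered cells; not taken from the repository code.
CELL_RE = re.compile(
    r"^\s*(?P<raw>-?\d+(?:\.\d+)?)<br><sub>"
    r"(?P<kind>direct|proxy); norm (?P<norm>\d+(?:\.\d+)?); "
    r"(?P<metric>[A-Za-z0-9_]+)</sub>\s*$"
)


def parse_cell(text: str) -> Cell:
    """Parse one score cell; raise ValueError if the layout is not recognized."""
    match = CELL_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized cell: {text!r}")
    return Cell(
        raw=float(match["raw"]),
        kind=match["kind"],
        norm=float(match["norm"]),
        metric=match["metric"],
    )


print(parse_cell("0.0500<br><sub>direct; norm 0.050; macro_f1</sub>"))
```

Splitting a table row on `|` and passing each score cell to `parse_cell` yields one `Cell` per method, which makes it straightforward to filter by `kind` or group by `metric`.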
## Status Matrix
| # | Task | Min | NN | 128-S | 128-NN | 128-RS | 128-RN | Qwen3 | C3-S | C3-N |
| ---: | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 01 | Action Recognition | score | score | score | score | score | score | score | score | score |
| 02 | Procedure Step Recognition | score | score | score | score | score | score | score | score | score |
| 03 | Action Boundary Detection | score | score | score | score | score | score | score | score | score |
| 04 | Next-Action Prediction | score | score | score | score | score | score | score | score | score |
| 05 | Hand Trajectory Forecasting | score | score | score | score | score | score | score | score | score |
| 06 | Contact State Prediction | score | score | score | score | score | score | score | score | score |
| 07 | Object Relevance Prediction | score | score | score | score | score | score | score | score | score |
| 08 | Language Grounding | score | score | score | score | score | score | score | score | score |
| 09 | Cross-Modal Retrieval | score | score | score | score | score | score | score | score | score |
| 10 | Cross-Modal Reconstruction | score | score | score | score | score | score | score | score | score |
| 11 | Temporal Order Verification | score | score | score | score | score | score | score | score | score |
| 12 | Multimodal Synchronization Detection | score | score | score | score | score | score | score | score | score |
| 13 | Long-Horizon Next-Action Forecasting | score | score | score | score | score | score | score | score | score |
| 14 | Long-Horizon Next-Subtask Forecasting | score | score | score | score | score | score | score | score | score |
| 15 | Interaction Text Prediction | score | score | score | score | proxy | proxy | score | score | score |
| 16 | Action-Object Relation Prediction | score | score | score | score | score | score | score | score | score |
| 17 | Future Object-Set Forecasting | score | score | score | score | score | score | score | score | score |
| 18 | IMU-to-Hand Pose Reconstruction | score | score | score | score | score | score | score | score | score |
| 19 | Camera-View Synchronization Retrieval | score | score | proxy | proxy | proxy | proxy | score | score | score |
| 20 | Time-to-Next-Transition Regression | score | score | score | score | score | score | score | score | score |
Sources and raw values are in `docs/data/task_method_20_result_matrix.json` and `docs/data/unified_task_model_radar.json`.