Task Method 20-Result Matrix

Every method has one record for each of the 20 unified task contracts. Numeric scores appear only where a committed runner or verified package produced that task target.

Legend: `score` = a direct numeric score on the task's own target; `proxy` = a score on a documented, compact substitute target. The current public matrix is complete: all 180 records (9 methods × 20 tasks) are scored, 6 of them on proxy targets. The unsupported and not-evaluated labels are retained only for future regression audits.

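| Method | Matrix column | Records | Scored (incl. proxy) | Proxy scored | Scoreless | Status counts |
|---|---|---|---|---|---|---|
| Minimal | Min | 20 | 20 | 0 | 0 | scored 20 |
| Neural MLP | NN | 20 | 20 | 0 | 0 | scored 20 |
| 128ep Aligned Simple | 128-S | 20 | 20 | 1 | 0 | proxy scored 1, scored 19 |
| 128ep Aligned NN | 128-NN | 20 | 20 | 1 | 0 | proxy scored 1, scored 19 |
| 128ep Raw Simple | 128-RS | 20 | 20 | 2 | 0 | proxy scored 2, scored 18 |
| 128ep Raw NN | 128-RN | 20 | 20 | 2 | 0 | proxy scored 2, scored 18 |
| Qwen3-Omni v6 LoRA | Qwen3 | 20 | 20 | 0 | 0 | scored 20 |
| Cosmos3-Super Reasoner | C3-S | 20 | 20 | 0 | 0 | scored 20 |
| Cosmos3-Nano Future Window | C3-N | 20 | 20 | 0 | 0 | scored 20 |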
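The status counts can be audited with a short script that tallies records from the Status Matrix later in this file. The sketch below is hedged. The proxy cell positions are transcribed from that matrix. The names `PROXY_CELLS`, `cell_status` and `status_counts` are hypothetical and are not part of the repository's tooling.

```python
from collections import Counter

# Column order used by the matrices in this document.
METHODS = ["Min", "NN", "128-S", "128-NN", "128-RS", "128-RN", "Qwen3", "C3-S", "C3-N"]
TASK_IDS = range(1, 21)  # the 20 unified task contracts

# Hypothetical table: proxy-scored cells transcribed from the Status Matrix.
# Every other (task, method) cell is a direct "score".
PROXY_CELLS = {
    15: {"128-RS", "128-RN"},
    19: {"128-S", "128-NN", "128-RS", "128-RN"},
}


def cell_status(task_id: int, method: str) -> str:
    """Return "proxy" or "score" for one (task, method) record."""
    return "proxy" if method in PROXY_CELLS.get(task_id, set()) else "score"


def status_counts() -> dict:
    """Tally record statuses per method, mirroring the Status counts column."""
    counts = {m: Counter() for m in METHODS}
    for task_id in TASK_IDS:
        for method in METHODS:
            counts[method][cell_status(task_id, method)] += 1
    return counts


counts = status_counts()
total_records = sum(sum(c.values()) for c in counts.values())
print(f"records: {total_records}")
for method in METHODS:
    c = counts[method]  # Counter returns 0 for statuses that never occur
    print(f"{method}: proxy scored {c['proxy']}, scored {c['score']}")
```

The script should report 180 records. Its per-method split should agree with the Status counts column: 19 + 1 for the aligned 128ep baselines and 18 + 2 for the raw 128ep baselines.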

Compact Score Matrix

Each cell shows four things, in order:

- the raw metric value;
- the record type (direct or proxy);
- the normalized radar value;
- the metric key.

Cite the raw metric value. The normalized value is the exact linear 0-1 score retained in the JSON. The SVG radar plots sqrt(normalized score) only as the visual radius. This keeps low but real differences visible without changing any table value.
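This file does not spell out how the normalized value is computed. The matrix values are, however, consistent with a simple row-wise rule:

- **Higher-is-better rows:** the raw value is clipped to [0, 1]. For example, negative R2 becomes 0.
- **Error rows (05, 18, 20):** the row's lowest raw value is divided by each cell's raw value, so the best method scores 1.

The sketch below shows this inferred rule and the display-only square-root radius. It is an assumption for illustration, not the repository's implementation. `normalize` and `radar_radius` are hypothetical names.

```python
import math


def normalize(values: list[float], lower_is_better: bool) -> list[float]:
    """Map one task row of raw metric values to 0-1 radar scores.

    Assumed rule, inferred from the table rather than taken from the repo:
    - higher-is-better rows: clip each raw value to [0, 1];
    - lower-is-better (error) rows: divide the row's smallest error by each value.
    """
    if lower_is_better:
        best = min(values)
        return [best / v for v in values]
    return [min(max(v, 0.0), 1.0) for v in values]


def radar_radius(norm: float) -> float:
    """Display-only transform: the SVG radius is sqrt(normalized score)."""
    return math.sqrt(norm)


# Row 20 (Time-to-Next-Transition Regression): MAE, lower is better.
row20 = [10.54, 10.55, 624.81, 41.47, 52.33, 42.37, 134.07, 52.95, 33.81]
# Row 10 (Cross-Modal Reconstruction): negative R2 values clip to 0.
row10 = [-0.0153, -0.0102, -190.66, -0.4348, -1.345, -1.397, 0.9671, 0.9939, 0.0003]

print([round(x, 3) for x in normalize(row20, lower_is_better=True)])
print([round(x, 3) for x in normalize(row10, lower_is_better=False)])
print(round(radar_radius(0.017), 3))  # a small score drawn at a visibly larger radius
```

Because the raw values in the table are rounded, recomputed scores can differ from the table in the third decimal. For example, 10.54 / 10.55 gives 0.999, while the table shows 0.998.

Applied row-wide, the same rule also reproduces rows 05 and 18. In those rows the Qwen3, C3-S and C3-N cells report an MRR, yet their normalized values match the row's lowest error divided by the cell value.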

| # | Task | Min | NN | 128-S | 128-NN | 128-RS | 128-RN | Qwen3 | C3-S | C3-N |
|---|---|---|---|---|---|---|---|---|---|---|
| 01 | Action Recognition | 0.0500<br>direct; norm 0.050; macro_f1 | 0.0148<br>direct; norm 0.015; macro_f1 | 0.0083<br>direct; norm 0.008; macro_f1 | 0.0042<br>direct; norm 0.004; macro_f1 | 0.0029<br>direct; norm 0.003; macro_f1 | 0.0015<br>direct; norm 0.001; macro_f1 | 0.0029<br>direct; norm 0.003; action_macro_f1 | 0.0008<br>direct; norm 0.001; action_macro_f1 | 0.0079<br>direct; norm 0.008; action_accuracy_from_retrieved_future |
| 02 | Procedure Step Recognition | 0.0506<br>direct; norm 0.051; macro_f1 | 0.0281<br>direct; norm 0.028; macro_f1 | 0.0002<br>direct; norm 0.000; macro_f1 | 0.0001<br>direct; norm 0.000; macro_f1 | 0.0000<br>direct; norm 0.000; macro_f1 | 0.0001<br>direct; norm 0.000; macro_f1 | 0.0037<br>direct; norm 0.004; subtask_accuracy | 0.0000<br>direct; norm 0.000; subtask_accuracy | 0.0000<br>direct; norm 0.000; timeline_subtask_macro_f1 |
| 03 | Action Boundary Detection | 0.6118<br>direct; norm 0.612; macro_f1 | 0.5862<br>direct; norm 0.586; macro_f1 | 0.2965<br>direct; norm 0.297; macro_f1 | 0.4842<br>direct; norm 0.484; macro_f1 | 0.4204<br>direct; norm 0.420; macro_f1 | 0.4902<br>direct; norm 0.490; macro_f1 | 0.9898<br>direct; norm 0.990; transition_accuracy | 0.3683<br>direct; norm 0.368; transition_accuracy | 0.9683<br>direct; norm 0.968; transition_accuracy |
| 04 | Next-Action Prediction | 0.0593<br>direct; norm 0.059; macro_f1 | 0.0419<br>direct; norm 0.042; macro_f1 | 0.0065<br>direct; norm 0.007; macro_f1 | 0.0049<br>direct; norm 0.005; macro_f1 | 0.0033<br>direct; norm 0.003; macro_f1 | 0.0018<br>direct; norm 0.002; macro_f1 | 0.0431<br>direct; norm 0.043; next_action_accuracy | 0.0134<br>direct; norm 0.013; next_action_accuracy | 0.0079<br>direct; norm 0.008; action_accuracy_from_retrieved_future |
| 05 | Hand Trajectory Forecasting | 0.8647<br>direct; norm 0.125; mpjpe | 0.1079<br>direct; norm 1.000; mpjpe | 8.817<br>direct; norm 0.012; mpjpe | 0.4294<br>direct; norm 0.251; mpjpe | 0.2729<br>direct; norm 0.395; mae | 0.1848<br>direct; norm 0.584; mae | 0.7216<br>direct; norm 0.149; hand_trajectory_forecast_mrr | 0.8915<br>direct; norm 0.121; hand_trajectory_forecast_mrr | 0.6913<br>direct; norm 0.156; hand_trajectory_forecast_mrr |
| 06 | Contact State Prediction | 1.000<br>direct; norm 1.000; macro_f1 | 1.000<br>direct; norm 1.000; macro_f1 | 0.4381<br>direct; norm 0.438; macro_f1 | 0.5683<br>direct; norm 0.568; macro_f1 | 0.8870<br>direct; norm 0.887; macro_f1 | 1.000<br>direct; norm 1.000; macro_f1 | 0.8177<br>direct; norm 0.818; contact_accuracy | 0.3214<br>direct; norm 0.321; contact_accuracy | 0.7434<br>direct; norm 0.743; contact_accuracy |
| 07 | Object Relevance Prediction | 0.1803<br>direct; norm 0.180; micro_f1 | 0.1679<br>direct; norm 0.168; micro_f1 | 0.1776<br>direct; norm 0.178; micro_f1 | 0.1866<br>direct; norm 0.187; micro_f1 | 0.0655<br>direct; norm 0.066; micro_f1 | 0.1766<br>direct; norm 0.177; micro_f1 | 0.3065<br>direct; norm 0.306; object_micro_f1 | 0.1370<br>direct; norm 0.137; object_micro_f1 | 0.0005<br>direct; norm 0.000; object_relevance_micro_f1 |
| 08 | Language Grounding | 0.0160<br>direct; norm 0.016; mrr | 0.0168<br>direct; norm 0.017; mrr | 0.0023<br>direct; norm 0.002; mrr | 0.0082<br>direct; norm 0.008; mrr | 0.0111<br>direct; norm 0.011; mrr | 0.0063<br>direct; norm 0.006; mrr | 0.8764<br>direct; norm 0.876; caption_grounding_mrr | 0.3064<br>direct; norm 0.306; caption_grounding_iou | 0.5221<br>direct; norm 0.522; caption_grounding_mrr |
| 09 | Cross-Modal Retrieval | 0.2693<br>direct; norm 0.269; mrr | 0.1300<br>direct; norm 0.130; mrr | 0.0026<br>direct; norm 0.003; mrr | 0.0026<br>direct; norm 0.003; mrr | 0.0035<br>direct; norm 0.003; mrr | 0.0025<br>direct; norm 0.003; mrr | 0.5080<br>direct; norm 0.508; cross_modal_retrieval_mrr | 0.6628<br>direct; norm 0.663; cross_modal_retrieval_mrr | 0.0221<br>direct; norm 0.022; future_retrieval_mrr |
| 10 | Cross-Modal Reconstruction | -0.0153<br>direct; norm 0.000; r2 | -0.0102<br>direct; norm 0.000; r2 | -190.66<br>direct; norm 0.000; r2 | -0.4348<br>direct; norm 0.000; r2 | -1.345<br>direct; norm 0.000; r2 | -1.397<br>direct; norm 0.000; r2 | 0.9671<br>direct; norm 0.967; modality_reconstruction_mrr | 0.9939<br>direct; norm 0.994; modality_reconstruction_mrr | 0.0003<br>direct; norm 0.000; feature_reconstruction_quality |
| 11 | Temporal Order Verification | 0.5400<br>direct; norm 0.540; f1 | 0.8520<br>direct; norm 0.852; f1 | 0.4199<br>direct; norm 0.420; f1 | 0.8252<br>direct; norm 0.825; f1 | 0.4982<br>direct; norm 0.498; macro_f1 | 0.8030<br>direct; norm 0.803; macro_f1 | 0.4098<br>direct; norm 0.410; temporal_order_f1 | 0.6286<br>direct; norm 0.629; temporal_order_f1 | 0.5954<br>direct; norm 0.595; temporal_order_f1 |
| 12 | Multimodal Synchronization Detection | 0.5052<br>direct; norm 0.505; f1 | 0.7153<br>direct; norm 0.715; f1 | 0.4998<br>direct; norm 0.500; f1 | 0.7774<br>direct; norm 0.777; f1 | 0.4959<br>direct; norm 0.496; macro_f1 | 0.8273<br>direct; norm 0.827; macro_f1 | 0.3345<br>direct; norm 0.334; misalignment_detection_f1 | 0.3727<br>direct; norm 0.373; misalignment_detection_f1 | 0.4772<br>direct; norm 0.477; misalignment_detection_f1 |
| 13 | Long-Horizon Next-Action Forecasting | 0.0750<br>direct; norm 0.075; macro_f1 | 0.0655<br>direct; norm 0.065; macro_f1 | 0.0046<br>direct; norm 0.005; macro_f1 | 0.0030<br>direct; norm 0.003; macro_f1 | 0.0024<br>direct; norm 0.002; macro_f1 | 0.0011<br>direct; norm 0.001; macro_f1 | 0.0023<br>direct; norm 0.002; long_horizon_next_action_macro_f1 | 0.0088<br>direct; norm 0.009; long_horizon_next_action_macro_f1 | 0.0025<br>direct; norm 0.002; long_horizon_next_action_macro_f1 |
| 14 | Long-Horizon Next-Subtask Forecasting | 0.0455<br>direct; norm 0.045; macro_f1 | 0.0507<br>direct; norm 0.051; macro_f1 | 0.0001<br>direct; norm 0.000; macro_f1 | 0.0000<br>direct; norm 0.000; macro_f1 | 0.0000<br>direct; norm 0.000; macro_f1 | 0.0000<br>direct; norm 0.000; macro_f1 | 0.0042<br>direct; norm 0.004; next_subtask_forecast_macro_f1 | 0.0000<br>direct; norm 0.000; next_subtask_forecast_macro_f1 | 0.0066<br>direct; norm 0.007; next_subtask_forecast_macro_f1 |
| 15 | Interaction Text Prediction | 0.0444<br>direct; norm 0.044; macro_f1 | 0.0381<br>direct; norm 0.038; macro_f1 | 0.0000<br>direct; norm 0.000; macro_f1 | 0.0000<br>direct; norm 0.000; macro_f1 | 0.0126<br>proxy; norm 0.013; macro_f1 | 0.0098<br>proxy; norm 0.010; macro_f1 | 0.4319<br>direct; norm 0.432; macro_f1 | 0.1795<br>direct; norm 0.179; macro_f1 | 0.1788<br>direct; norm 0.179; macro_f1 |
| 16 | Action-Object Relation Prediction | 0.0000<br>direct; norm 0.000; macro_f1 | 0.0000<br>direct; norm 0.000; macro_f1 | 0.0000<br>direct; norm 0.000; macro_f1 | 0.0000<br>direct; norm 0.000; macro_f1 | 0.0000<br>direct; norm 0.000; macro_f1 | 0.0000<br>direct; norm 0.000; macro_f1 | 0.0002<br>direct; norm 0.000; action_object_relation_macro_f1 | 0.0000<br>direct; norm 0.000; action_object_relation_macro_f1 | 0.0028<br>direct; norm 0.003; action_object_relation_macro_f1 |
| 17 | Future Object-Set Forecasting | 0.1694<br>direct; norm 0.169; micro_f1 | 0.1972<br>direct; norm 0.197; micro_f1 | 0.1766<br>direct; norm 0.177; micro_f1 | 0.1742<br>direct; norm 0.174; micro_f1 | 0.0647<br>direct; norm 0.065; micro_f1 | 0.1752<br>direct; norm 0.175; micro_f1 | 0.1659<br>direct; norm 0.166; object_set_forecast_micro_f1 | 0.0009<br>direct; norm 0.001; object_set_forecast_micro_f1 | 0.0178<br>direct; norm 0.018; object_set_forecast_micro_f1 |
| 18 | IMU-to-Hand Pose Reconstruction | 0.0420<br>direct; norm 1.000; mae | 0.0426<br>direct; norm 0.988; mae | 0.2295<br>direct; norm 0.183; mae | 0.2556<br>direct; norm 0.165; mae | 0.2294<br>direct; norm 0.183; mae | 0.2530<br>direct; norm 0.166; mae | 0.9642<br>direct; norm 0.044; imu_to_hand_pose_mrr | 0.9897<br>direct; norm 0.042; imu_to_hand_pose_mrr | 0.9920<br>direct; norm 0.042; imu_to_hand_pose_mrr |
| 19 | Camera-View Synchronization Retrieval | 0.4943<br>direct; norm 0.494; mrr | 0.2409<br>direct; norm 0.241; mrr | 0.0021<br>proxy; norm 0.002; mrr | 0.0027<br>proxy; norm 0.003; mrr | 0.0027<br>proxy; norm 0.003; mrr | 0.0025<br>proxy; norm 0.003; mrr | 0.6588<br>direct; norm 0.659; camera_view_sync_retrieval_mrr | 0.9980<br>direct; norm 0.998; camera_view_sync_retrieval_mrr | 0.9990<br>direct; norm 0.999; camera_view_sync_retrieval_mrr |
| 20 | Time-to-Next-Transition Regression | 10.54<br>direct; norm 1.000; mae | 10.55<br>direct; norm 0.998; mae | 624.81<br>direct; norm 0.017; mae | 41.47<br>direct; norm 0.254; mae | 52.33<br>direct; norm 0.201; mae | 42.37<br>direct; norm 0.249; mae | 134.07<br>direct; norm 0.079; time_to_transition_mae | 52.95<br>direct; norm 0.199; time_to_transition_mae | 33.81<br>direct; norm 0.312; time_to_transition_mae |

Status Matrix

| # | Task | Min | NN | 128-S | 128-NN | 128-RS | 128-RN | Qwen3 | C3-S | C3-N |
|---|---|---|---|---|---|---|---|---|---|---|
| 01 | Action Recognition | score | score | score | score | score | score | score | score | score |
| 02 | Procedure Step Recognition | score | score | score | score | score | score | score | score | score |
| 03 | Action Boundary Detection | score | score | score | score | score | score | score | score | score |
| 04 | Next-Action Prediction | score | score | score | score | score | score | score | score | score |
| 05 | Hand Trajectory Forecasting | score | score | score | score | score | score | score | score | score |
| 06 | Contact State Prediction | score | score | score | score | score | score | score | score | score |
| 07 | Object Relevance Prediction | score | score | score | score | score | score | score | score | score |
| 08 | Language Grounding | score | score | score | score | score | score | score | score | score |
| 09 | Cross-Modal Retrieval | score | score | score | score | score | score | score | score | score |
| 10 | Cross-Modal Reconstruction | score | score | score | score | score | score | score | score | score |
| 11 | Temporal Order Verification | score | score | score | score | score | score | score | score | score |
| 12 | Multimodal Synchronization Detection | score | score | score | score | score | score | score | score | score |
| 13 | Long-Horizon Next-Action Forecasting | score | score | score | score | score | score | score | score | score |
| 14 | Long-Horizon Next-Subtask Forecasting | score | score | score | score | score | score | score | score | score |
| 15 | Interaction Text Prediction | score | score | score | score | proxy | proxy | score | score | score |
| 16 | Action-Object Relation Prediction | score | score | score | score | score | score | score | score | score |
| 17 | Future Object-Set Forecasting | score | score | score | score | score | score | score | score | score |
| 18 | IMU-to-Hand Pose Reconstruction | score | score | score | score | score | score | score | score | score |
| 19 | Camera-View Synchronization Retrieval | score | score | proxy | proxy | proxy | proxy | score | score | score |
| 20 | Time-to-Next-Transition Regression | score | score | score | score | score | score | score | score | score |

Sources and raw values are in `docs/data/task_method_20_result_matrix.json` and `docs/data/unified_task_model_radar.json`.