# Task Method 20-Result Matrix

Every method has one record for each of the 20 unified task contracts. Numeric scores appear only where a committed runner or verified package produced that task target.

Legend: `score` = direct numeric task score; `proxy` = documented compact substitute target. The current public matrix is complete at 180/180 scored records (9 methods × 20 tasks), with proxy-scored records counted as scored. Unsupported/not-evaluated labels are retained only for future regression audits.

| Method | Records | Scored | Proxy scored | Scoreless | Status counts |
| --- | ---: | ---: | ---: | ---: | --- |
| Minimal | 20 | 20 | 0 | 0 | scored 20 |
| Neural MLP | 20 | 20 | 0 | 0 | scored 20 |
| 128ep Aligned Simple | 20 | 20 | 1 | 0 | proxy scored 1, scored 19 |
| 128ep Aligned NN | 20 | 20 | 1 | 0 | proxy scored 1, scored 19 |
| 128ep Raw Simple | 20 | 20 | 2 | 0 | proxy scored 2, scored 18 |
| 128ep Raw NN | 20 | 20 | 2 | 0 | proxy scored 2, scored 18 |
| Qwen3-Omni v6 LoRA | 20 | 20 | 0 | 0 | scored 20 |
| Cosmos3-Super Reasoner | 20 | 20 | 0 | 0 | scored 20 |
| Cosmos3-Nano Future Window | 20 | 20 | 0 | 0 | scored 20 |

## Compact Score Matrix

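Cells show `raw metric value`, then `direct/proxy; normalized radar value; metric key`. The raw metric is the value to cite; the normalized value is the exact linear 0-1 score retained in JSON. The SVG radar uses sqrt(normalized score) only for visual radius, so low but real differences remain visible without changing the table values. Column abbreviations follow the method order of the summary table: Min = Minimal, NN = Neural MLP, 128-S = 128ep Aligned Simple, 128-NN = 128ep Aligned NN, 128-RS = 128ep Raw Simple, 128-RN = 128ep Raw NN, Qwen3 = Qwen3-Omni v6 LoRA, C3-S = Cosmos3-Super Reasoner, C3-N = Cosmos3-Nano Future Window.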
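
The relation between a normalized score and its radar radius can be sketched as below. This is a minimal illustration, not the project's plotting code: the function names are hypothetical, and the two normalization rules are assumptions inferred from the tabulated values (higher-is-better scores clamped to [0, 1]; error metrics such as `mpjpe` and `mae` mapped to best error divided by this error). Only the square-root radius step is stated in the text above.

```python
import math


def normalize_higher_is_better(value: float) -> float:
    """Assumed rule: clamp a score such as F1, MRR, or R^2 into [0, 1]."""
    return min(1.0, max(0.0, value))


def normalize_lower_is_better(value: float, best: float) -> float:
    """Assumed rule for error metrics: best (smallest) error over this error."""
    if value <= 0 or best <= 0:
        raise ValueError("error metrics are expected to be positive")
    return min(1.0, best / value)


def radar_radius(norm: float) -> float:
    """Visual radius only: sqrt of the normalized score, as described above."""
    return math.sqrt(normalize_higher_is_better(norm))


if __name__ == "__main__":
    # Task 05, Min column: raw mpjpe 0.8647 against the best value in the row, 0.1079.
    norm = normalize_lower_is_better(0.8647, best=0.1079)
    print(f"norm={norm:.3f} radius={radar_radius(norm):.3f}")

    # Task 10, 128-S column: a strongly negative r2 clamps to 0.
    print(normalize_higher_is_better(-190.66))

    # A low score stays visible on the radar: norm 0.008 gives radius of about 0.089.
    print(f"{radar_radius(0.008):.3f}")
```

Because the square root is applied only when drawing, any value read from the table or the JSON remains the unscaled normalized score.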

| # | Task | Min | NN | 128-S | 128-NN | 128-RS | 128-RN | Qwen3 | C3-S | C3-N |
| ---: | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 01 | Action Recognition | 0.0500<br><sub>direct; norm 0.050; macro_f1</sub> | 0.0148<br><sub>direct; norm 0.015; macro_f1</sub> | 0.0083<br><sub>direct; norm 0.008; macro_f1</sub> | 0.0042<br><sub>direct; norm 0.004; macro_f1</sub> | 0.0029<br><sub>direct; norm 0.003; macro_f1</sub> | 0.0015<br><sub>direct; norm 0.001; macro_f1</sub> | 0.0029<br><sub>direct; norm 0.003; action_macro_f1</sub> | 0.0008<br><sub>direct; norm 0.001; action_macro_f1</sub> | 0.0079<br><sub>direct; norm 0.008; action_accuracy_from_retrieved_future</sub> |
| 02 | Procedure Step Recognition | 0.0506<br><sub>direct; norm 0.051; macro_f1</sub> | 0.0281<br><sub>direct; norm 0.028; macro_f1</sub> | 0.0002<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0001<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0001<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0037<br><sub>direct; norm 0.004; subtask_accuracy</sub> | 0.0000<br><sub>direct; norm 0.000; subtask_accuracy</sub> | 0.0000<br><sub>direct; norm 0.000; timeline_subtask_macro_f1</sub> |
| 03 | Action Boundary Detection | 0.6118<br><sub>direct; norm 0.612; macro_f1</sub> | 0.5862<br><sub>direct; norm 0.586; macro_f1</sub> | 0.2965<br><sub>direct; norm 0.297; macro_f1</sub> | 0.4842<br><sub>direct; norm 0.484; macro_f1</sub> | 0.4204<br><sub>direct; norm 0.420; macro_f1</sub> | 0.4902<br><sub>direct; norm 0.490; macro_f1</sub> | 0.9898<br><sub>direct; norm 0.990; transition_accuracy</sub> | 0.3683<br><sub>direct; norm 0.368; transition_accuracy</sub> | 0.9683<br><sub>direct; norm 0.968; transition_accuracy</sub> |
| 04 | Next-Action Prediction | 0.0593<br><sub>direct; norm 0.059; macro_f1</sub> | 0.0419<br><sub>direct; norm 0.042; macro_f1</sub> | 0.0065<br><sub>direct; norm 0.007; macro_f1</sub> | 0.0049<br><sub>direct; norm 0.005; macro_f1</sub> | 0.0033<br><sub>direct; norm 0.003; macro_f1</sub> | 0.0018<br><sub>direct; norm 0.002; macro_f1</sub> | 0.0431<br><sub>direct; norm 0.043; next_action_accuracy</sub> | 0.0134<br><sub>direct; norm 0.013; next_action_accuracy</sub> | 0.0079<br><sub>direct; norm 0.008; action_accuracy_from_retrieved_future</sub> |
| 05 | Hand Trajectory Forecasting | 0.8647<br><sub>direct; norm 0.125; mpjpe</sub> | 0.1079<br><sub>direct; norm 1.000; mpjpe</sub> | 8.817<br><sub>direct; norm 0.012; mpjpe</sub> | 0.4294<br><sub>direct; norm 0.251; mpjpe</sub> | 0.2729<br><sub>direct; norm 0.395; mae</sub> | 0.1848<br><sub>direct; norm 0.584; mae</sub> | 0.7216<br><sub>direct; norm 0.149; hand_trajectory_forecast_mrr</sub> | 0.8915<br><sub>direct; norm 0.121; hand_trajectory_forecast_mrr</sub> | 0.6913<br><sub>direct; norm 0.156; hand_trajectory_forecast_mrr</sub> |
| 06 | Contact State Prediction | 1.000<br><sub>direct; norm 1.000; macro_f1</sub> | 1.000<br><sub>direct; norm 1.000; macro_f1</sub> | 0.4381<br><sub>direct; norm 0.438; macro_f1</sub> | 0.5683<br><sub>direct; norm 0.568; macro_f1</sub> | 0.8870<br><sub>direct; norm 0.887; macro_f1</sub> | 1.000<br><sub>direct; norm 1.000; macro_f1</sub> | 0.8177<br><sub>direct; norm 0.818; contact_accuracy</sub> | 0.3214<br><sub>direct; norm 0.321; contact_accuracy</sub> | 0.7434<br><sub>direct; norm 0.743; contact_accuracy</sub> |
| 07 | Object Relevance Prediction | 0.1803<br><sub>direct; norm 0.180; micro_f1</sub> | 0.1679<br><sub>direct; norm 0.168; micro_f1</sub> | 0.1776<br><sub>direct; norm 0.178; micro_f1</sub> | 0.1866<br><sub>direct; norm 0.187; micro_f1</sub> | 0.0655<br><sub>direct; norm 0.066; micro_f1</sub> | 0.1766<br><sub>direct; norm 0.177; micro_f1</sub> | 0.3065<br><sub>direct; norm 0.306; object_micro_f1</sub> | 0.1370<br><sub>direct; norm 0.137; object_micro_f1</sub> | 0.0005<br><sub>direct; norm 0.000; object_relevance_micro_f1</sub> |
| 08 | Language Grounding | 0.0160<br><sub>direct; norm 0.016; mrr</sub> | 0.0168<br><sub>direct; norm 0.017; mrr</sub> | 0.0023<br><sub>direct; norm 0.002; mrr</sub> | 0.0082<br><sub>direct; norm 0.008; mrr</sub> | 0.0111<br><sub>direct; norm 0.011; mrr</sub> | 0.0063<br><sub>direct; norm 0.006; mrr</sub> | 0.8764<br><sub>direct; norm 0.876; caption_grounding_mrr</sub> | 0.3064<br><sub>direct; norm 0.306; caption_grounding_iou</sub> | 0.5221<br><sub>direct; norm 0.522; caption_grounding_mrr</sub> |
| 09 | Cross-Modal Retrieval | 0.2693<br><sub>direct; norm 0.269; mrr</sub> | 0.1300<br><sub>direct; norm 0.130; mrr</sub> | 0.0026<br><sub>direct; norm 0.003; mrr</sub> | 0.0026<br><sub>direct; norm 0.003; mrr</sub> | 0.0035<br><sub>direct; norm 0.003; mrr</sub> | 0.0025<br><sub>direct; norm 0.003; mrr</sub> | 0.5080<br><sub>direct; norm 0.508; cross_modal_retrieval_mrr</sub> | 0.6628<br><sub>direct; norm 0.663; cross_modal_retrieval_mrr</sub> | 0.0221<br><sub>direct; norm 0.022; future_retrieval_mrr</sub> |
| 10 | Cross-Modal Reconstruction | -0.0153<br><sub>direct; norm 0.000; r2</sub> | -0.0102<br><sub>direct; norm 0.000; r2</sub> | -190.66<br><sub>direct; norm 0.000; r2</sub> | -0.4348<br><sub>direct; norm 0.000; r2</sub> | -1.345<br><sub>direct; norm 0.000; r2</sub> | -1.397<br><sub>direct; norm 0.000; r2</sub> | 0.9671<br><sub>direct; norm 0.967; modality_reconstruction_mrr</sub> | 0.9939<br><sub>direct; norm 0.994; modality_reconstruction_mrr</sub> | 0.0003<br><sub>direct; norm 0.000; feature_reconstruction_quality</sub> |
| 11 | Temporal Order Verification | 0.5400<br><sub>direct; norm 0.540; f1</sub> | 0.8520<br><sub>direct; norm 0.852; f1</sub> | 0.4199<br><sub>direct; norm 0.420; f1</sub> | 0.8252<br><sub>direct; norm 0.825; f1</sub> | 0.4982<br><sub>direct; norm 0.498; macro_f1</sub> | 0.8030<br><sub>direct; norm 0.803; macro_f1</sub> | 0.4098<br><sub>direct; norm 0.410; temporal_order_f1</sub> | 0.6286<br><sub>direct; norm 0.629; temporal_order_f1</sub> | 0.5954<br><sub>direct; norm 0.595; temporal_order_f1</sub> |
| 12 | Multimodal Synchronization Detection | 0.5052<br><sub>direct; norm 0.505; f1</sub> | 0.7153<br><sub>direct; norm 0.715; f1</sub> | 0.4998<br><sub>direct; norm 0.500; f1</sub> | 0.7774<br><sub>direct; norm 0.777; f1</sub> | 0.4959<br><sub>direct; norm 0.496; macro_f1</sub> | 0.8273<br><sub>direct; norm 0.827; macro_f1</sub> | 0.3345<br><sub>direct; norm 0.334; misalignment_detection_f1</sub> | 0.3727<br><sub>direct; norm 0.373; misalignment_detection_f1</sub> | 0.4772<br><sub>direct; norm 0.477; misalignment_detection_f1</sub> |
| 13 | Long-Horizon Next-Action Forecasting | 0.0750<br><sub>direct; norm 0.075; macro_f1</sub> | 0.0655<br><sub>direct; norm 0.065; macro_f1</sub> | 0.0046<br><sub>direct; norm 0.005; macro_f1</sub> | 0.0030<br><sub>direct; norm 0.003; macro_f1</sub> | 0.0024<br><sub>direct; norm 0.002; macro_f1</sub> | 0.0011<br><sub>direct; norm 0.001; macro_f1</sub> | 0.0023<br><sub>direct; norm 0.002; long_horizon_next_action_macro_f1</sub> | 0.0088<br><sub>direct; norm 0.009; long_horizon_next_action_macro_f1</sub> | 0.0025<br><sub>direct; norm 0.002; long_horizon_next_action_macro_f1</sub> |
| 14 | Long-Horizon Next-Subtask Forecasting | 0.0455<br><sub>direct; norm 0.045; macro_f1</sub> | 0.0507<br><sub>direct; norm 0.051; macro_f1</sub> | 0.0001<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0042<br><sub>direct; norm 0.004; next_subtask_forecast_macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; next_subtask_forecast_macro_f1</sub> | 0.0066<br><sub>direct; norm 0.007; next_subtask_forecast_macro_f1</sub> |
| 15 | Interaction Text Prediction | 0.0444<br><sub>direct; norm 0.044; macro_f1</sub> | 0.0381<br><sub>direct; norm 0.038; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0126<br><sub>proxy; norm 0.013; macro_f1</sub> | 0.0098<br><sub>proxy; norm 0.010; macro_f1</sub> | 0.4319<br><sub>direct; norm 0.432; macro_f1</sub> | 0.1795<br><sub>direct; norm 0.179; macro_f1</sub> | 0.1788<br><sub>direct; norm 0.179; macro_f1</sub> |
| 16 | Action-Object Relation Prediction | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; macro_f1</sub> | 0.0002<br><sub>direct; norm 0.000; action_object_relation_macro_f1</sub> | 0.0000<br><sub>direct; norm 0.000; action_object_relation_macro_f1</sub> | 0.0028<br><sub>direct; norm 0.003; action_object_relation_macro_f1</sub> |
| 17 | Future Object-Set Forecasting | 0.1694<br><sub>direct; norm 0.169; micro_f1</sub> | 0.1972<br><sub>direct; norm 0.197; micro_f1</sub> | 0.1766<br><sub>direct; norm 0.177; micro_f1</sub> | 0.1742<br><sub>direct; norm 0.174; micro_f1</sub> | 0.0647<br><sub>direct; norm 0.065; micro_f1</sub> | 0.1752<br><sub>direct; norm 0.175; micro_f1</sub> | 0.1659<br><sub>direct; norm 0.166; object_set_forecast_micro_f1</sub> | 0.0009<br><sub>direct; norm 0.001; object_set_forecast_micro_f1</sub> | 0.0178<br><sub>direct; norm 0.018; object_set_forecast_micro_f1</sub> |
| 18 | IMU-to-Hand Pose Reconstruction | 0.0420<br><sub>direct; norm 1.000; mae</sub> | 0.0426<br><sub>direct; norm 0.988; mae</sub> | 0.2295<br><sub>direct; norm 0.183; mae</sub> | 0.2556<br><sub>direct; norm 0.165; mae</sub> | 0.2294<br><sub>direct; norm 0.183; mae</sub> | 0.2530<br><sub>direct; norm 0.166; mae</sub> | 0.9642<br><sub>direct; norm 0.044; imu_to_hand_pose_mrr</sub> | 0.9897<br><sub>direct; norm 0.042; imu_to_hand_pose_mrr</sub> | 0.9920<br><sub>direct; norm 0.042; imu_to_hand_pose_mrr</sub> |
| 19 | Camera-View Synchronization Retrieval | 0.4943<br><sub>direct; norm 0.494; mrr</sub> | 0.2409<br><sub>direct; norm 0.241; mrr</sub> | 0.0021<br><sub>proxy; norm 0.002; mrr</sub> | 0.0027<br><sub>proxy; norm 0.003; mrr</sub> | 0.0027<br><sub>proxy; norm 0.003; mrr</sub> | 0.0025<br><sub>proxy; norm 0.003; mrr</sub> | 0.6588<br><sub>direct; norm 0.659; camera_view_sync_retrieval_mrr</sub> | 0.9980<br><sub>direct; norm 0.998; camera_view_sync_retrieval_mrr</sub> | 0.9990<br><sub>direct; norm 0.999; camera_view_sync_retrieval_mrr</sub> |
| 20 | Time-to-Next-Transition Regression | 10.54<br><sub>direct; norm 1.000; mae</sub> | 10.55<br><sub>direct; norm 0.998; mae</sub> | 624.81<br><sub>direct; norm 0.017; mae</sub> | 41.47<br><sub>direct; norm 0.254; mae</sub> | 52.33<br><sub>direct; norm 0.201; mae</sub> | 42.37<br><sub>direct; norm 0.249; mae</sub> | 134.07<br><sub>direct; norm 0.079; time_to_transition_mae</sub> | 52.95<br><sub>direct; norm 0.199; time_to_transition_mae</sub> | 33.81<br><sub>direct; norm 0.312; time_to_transition_mae</sub> |

## Status Matrix

| # | Task | Min | NN | 128-S | 128-NN | 128-RS | 128-RN | Qwen3 | C3-S | C3-N |
| ---: | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 01 | Action Recognition | score | score | score | score | score | score | score | score | score |
| 02 | Procedure Step Recognition | score | score | score | score | score | score | score | score | score |
| 03 | Action Boundary Detection | score | score | score | score | score | score | score | score | score |
| 04 | Next-Action Prediction | score | score | score | score | score | score | score | score | score |
| 05 | Hand Trajectory Forecasting | score | score | score | score | score | score | score | score | score |
| 06 | Contact State Prediction | score | score | score | score | score | score | score | score | score |
| 07 | Object Relevance Prediction | score | score | score | score | score | score | score | score | score |
| 08 | Language Grounding | score | score | score | score | score | score | score | score | score |
| 09 | Cross-Modal Retrieval | score | score | score | score | score | score | score | score | score |
| 10 | Cross-Modal Reconstruction | score | score | score | score | score | score | score | score | score |
| 11 | Temporal Order Verification | score | score | score | score | score | score | score | score | score |
| 12 | Multimodal Synchronization Detection | score | score | score | score | score | score | score | score | score |
| 13 | Long-Horizon Next-Action Forecasting | score | score | score | score | score | score | score | score | score |
| 14 | Long-Horizon Next-Subtask Forecasting | score | score | score | score | score | score | score | score | score |
| 15 | Interaction Text Prediction | score | score | score | score | proxy | proxy | score | score | score |
| 16 | Action-Object Relation Prediction | score | score | score | score | score | score | score | score | score |
| 17 | Future Object-Set Forecasting | score | score | score | score | score | score | score | score | score |
| 18 | IMU-to-Hand Pose Reconstruction | score | score | score | score | score | score | score | score | score |
| 19 | Camera-View Synchronization Retrieval | score | score | proxy | proxy | proxy | proxy | score | score | score |
| 20 | Time-to-Next-Transition Regression | score | score | score | score | score | score | score | score | score |
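
The per-method summary counts can be re-derived from the status matrix above. The following is a rough sketch under stated assumptions: the `PROXY_CELLS` mapping is transcribed by hand from the two rows that contain `proxy`, and the helper names are hypothetical rather than part of the repository.

```python
from collections import Counter

METHODS = ["Min", "NN", "128-S", "128-NN", "128-RS", "128-RN", "Qwen3", "C3-S", "C3-N"]

# Task number -> methods marked `proxy` in the status matrix (hand-transcribed).
PROXY_CELLS = {
    15: {"128-RS", "128-RN"},
    19: {"128-S", "128-NN", "128-RS", "128-RN"},
}


def build_status_matrix(n_tasks: int = 20) -> dict[str, list[str]]:
    """Return method -> list of per-task statuses ('score' or 'proxy')."""
    return {
        method: [
            "proxy" if method in PROXY_CELLS.get(task, set()) else "score"
            for task in range(1, n_tasks + 1)
        ]
        for method in METHODS
    }


def summarize(matrix: dict[str, list[str]]) -> dict[str, dict[str, int]]:
    """Count records per method; proxy-scored records also count as scored."""
    summary = {}
    for method, statuses in matrix.items():
        counts = Counter(statuses)
        scored = counts["score"] + counts["proxy"]
        summary[method] = {
            "records": len(statuses),
            "scored": scored,
            "proxy_scored": counts["proxy"],
            "scoreless": len(statuses) - scored,
        }
    return summary


if __name__ == "__main__":
    summary = summarize(build_status_matrix())
    total_scored = sum(row["scored"] for row in summary.values())
    total_records = sum(row["records"] for row in summary.values())
    print(f"{total_scored}/{total_records} scored records")
    for method, row in summary.items():
        print(method, row)
```

A check like this could serve as a regression audit: any future cell that is neither `score` nor `proxy` would surface as a nonzero `scoreless` count.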

Sources and raw values are in `docs/data/task_method_20_result_matrix.json` and `docs/data/unified_task_model_radar.json`.