cy0307 committed · Commit eeaf70e · verified · 1 Parent(s): 581a553

Add files using upload-large-folder tool

Files changed (50). This view is limited to 50 files because the commit contains too many changes.
  1. TASK_METHOD_20_GAP_AUDIT.md +16 -20
  2. TASK_METHOD_20_RESULT_MATRIX.md +10 -10
  3. data/artifact_index.json +26 -26
  4. data/episode128_task_model_radar.json +190 -191
  5. data/mirror_parity.json +0 -0
  6. data/public_surface_qa.json +4 -4
  7. data/publication_audit.json +9 -9
  8. data/quality_gates.json +1 -1
  9. data/scope_claims_audit.json +1 -1
  10. data/single_episode_task_model_radar.json +1 -1
  11. data/source_alignment_audit.json +1 -1
  12. data/task_method_20_gap_audit.json +93 -145
  13. data/task_method_20_result_matrix.json +108 -109
  14. data/task_surface_integrity.json +1 -1
  15. data/unified_task_model_radar.json +253 -254
  16. data/website_integrity.json +8 -8
  17. docs/data/artifact_index.json +26 -26
  18. docs/data/episode128_task_model_radar.json +190 -191
  19. docs/data/mirror_parity.json +0 -0
  20. docs/data/public_surface_qa.json +4 -4
  21. docs/data/publication_audit.json +9 -9
  22. docs/data/quality_gates.json +1 -1
  23. docs/data/scope_claims_audit.json +1 -1
  24. docs/data/single_episode_task_model_radar.json +1 -1
  25. docs/data/source_alignment_audit.json +1 -1
  26. docs/data/task_method_20_gap_audit.json +93 -145
  27. docs/data/task_method_20_result_matrix.json +108 -109
  28. docs/data/task_surface_integrity.json +1 -1
  29. docs/data/unified_task_model_radar.json +253 -254
  30. docs/data/website_integrity.json +8 -8
  31. metrics/artifact_index.json +26 -26
  32. metrics/episode128_task_model_radar.json +190 -191
  33. metrics/mirror_parity.json +0 -0
  34. metrics/public_surface_qa.json +4 -4
  35. metrics/publication_audit.json +9 -9
  36. metrics/quality_gates.json +1 -1
  37. metrics/scope_claims_audit.json +1 -1
  38. metrics/single_episode_task_model_radar.json +1 -1
  39. metrics/source_alignment_audit.json +1 -1
  40. metrics/task_method_20_gap_audit.json +93 -145
  41. metrics/task_method_20_result_matrix.json +108 -109
  42. metrics/task_surface_integrity.json +1 -1
  43. metrics/unified_task_model_radar.json +253 -254
  44. metrics/website_integrity.json +8 -8
  45. results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/BASELINE_ALIGNMENT_REPORT.md +8 -0
  46. results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json +9 -0
  47. results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/confusion_matrix.csv +0 -0
  48. results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json +188 -0
  49. results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/per_class_metrics.csv +1212 -0
  50. results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/predictions.csv +0 -0
TASK_METHOD_20_GAP_AUDIT.md CHANGED
@@ -1,6 +1,6 @@
1
  # Task Method 20-Result Gap Audit
2
 
3
- Generated: `2026-06-18T11:15:34+00:00`
4
 
5
  This audit is the explicit gap ledger for the 9-method x 20-task result matrix.
6
  It keeps missing cells visible while preserving the rule that a numeric score
@@ -9,8 +9,8 @@ requires a real task target and source artifact.
9
  ## Score Summary
10
 
11
  - Method-task records: `180`
12
- - Numeric scored records: `123`
13
- - Scoreless records: `57`
14
  - Proxy-scored records: `4`
15
  - Source matrix: [`docs/data/task_method_20_result_matrix.json`](docs/data/task_method_20_result_matrix.json)
16
 
@@ -20,8 +20,8 @@ requires a real task target and source artifact.
20
  | --- | --- | --- | --- | --- | --- |
21
  | Minimal | minimal | 20/20 | 0 | 0 | scored: 20 |
22
  | Neural MLP | neural_mlp | 20/20 | 0 | 0 | scored: 20 |
23
- | 128ep Metadata Simple | metadata128_simple | 8/20 | 12 | 0 | not_supported_by_metadata_only_package: 8, scored: 8, unsupported_without_required_target: 4 |
24
- | 128ep Metadata NN | metadata128_neural_mlp | 8/20 | 12 | 0 | not_supported_by_metadata_only_package: 12, scored: 8 |
25
  | 128ep Raw Simple | raw128_simple | 20/20 | 0 | 2 | proxy_scored: 2, scored: 18 |
26
  | 128ep Raw NN | raw128_neural_mlp | 20/20 | 0 | 2 | proxy_scored: 2, scored: 18 |
27
  | Qwen3-Omni v6 LoRA | qwen3_omni_v6_lora | 15/20 | 5 | 0 | not_evaluated_in_verified_package: 5, scored: 15 |
@@ -33,14 +33,17 @@ requires a real task target and source artifact.
33
  | Status | Count | Next step |
34
  | --- | --- | --- |
35
  | not_evaluated_in_verified_package | 33 | Generate verified model outputs for this task contract and score them against the held-out labels. |
36
- | not_supported_by_metadata_only_package | 20 | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
37
- | unsupported_without_required_target | 4 | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
38
 
39
  ## Scoreless Records
40
 
41
  | Task | Task label | Method | Status | Required evidence |
42
  | --- | --- | --- | --- | --- |
 
 
43
  | 02 | Procedure Step Recognition | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 
44
  | 05 | Hand Trajectory Forecasting | 128ep Metadata Simple | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
45
  | 05 | Hand Trajectory Forecasting | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
46
  | 05 | Hand Trajectory Forecasting | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
@@ -63,38 +66,31 @@ requires a real task target and source artifact.
63
  | 12 | Multimodal Synchronization Detection | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
64
  | 12 | Multimodal Synchronization Detection | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
65
  | 12 | Multimodal Synchronization Detection | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
66
- | 13 | Long-Horizon Next-Action Forecasting | 128ep Metadata Simple | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
67
- | 13 | Long-Horizon Next-Action Forecasting | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
68
  | 13 | Long-Horizon Next-Action Forecasting | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
69
  | 13 | Long-Horizon Next-Action Forecasting | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
70
- | 14 | Long-Horizon Next-Subtask Forecasting | 128ep Metadata Simple | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
71
- | 14 | Long-Horizon Next-Subtask Forecasting | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
72
  | 14 | Long-Horizon Next-Subtask Forecasting | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
73
  | 14 | Long-Horizon Next-Subtask Forecasting | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
74
- | 15 | Interaction Text Prediction | 128ep Metadata Simple | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
75
  | 15 | Interaction Text Prediction | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
76
  | 15 | Interaction Text Prediction | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
77
  | 15 | Interaction Text Prediction | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
78
  | 15 | Interaction Text Prediction | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
79
- | 16 | Action-Object Relation Prediction | 128ep Metadata Simple | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
80
- | 16 | Action-Object Relation Prediction | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
81
  | 16 | Action-Object Relation Prediction | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
82
- | 17 | Future Object-Set Forecasting | 128ep Metadata Simple | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
83
- | 17 | Future Object-Set Forecasting | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
84
  | 17 | Future Object-Set Forecasting | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
85
  | 17 | Future Object-Set Forecasting | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
86
- | 18 | IMU-to-Hand Pose Reconstruction | 128ep Metadata Simple | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
87
  | 18 | IMU-to-Hand Pose Reconstruction | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
88
  | 18 | IMU-to-Hand Pose Reconstruction | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
89
  | 18 | IMU-to-Hand Pose Reconstruction | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
90
  | 18 | IMU-to-Hand Pose Reconstruction | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
91
- | 19 | Camera-View Synchronization Retrieval | 128ep Metadata Simple | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
92
  | 19 | Camera-View Synchronization Retrieval | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
93
  | 19 | Camera-View Synchronization Retrieval | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
94
  | 19 | Camera-View Synchronization Retrieval | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
95
  | 19 | Camera-View Synchronization Retrieval | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
96
- | 20 | Time-to-Next-Transition Regression | 128ep Metadata Simple | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
97
- | 20 | Time-to-Next-Transition Regression | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
98
  | 20 | Time-to-Next-Transition Regression | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
99
  | 20 | Time-to-Next-Transition Regression | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
100
 
 
1
  # Task Method 20-Result Gap Audit
2
 
3
+ Generated: `2026-06-18T12:07:14+00:00`
4
 
5
  This audit is the explicit gap ledger for the 9-method x 20-task result matrix.
6
  It keeps missing cells visible while preserving the rule that a numeric score
 
9
  ## Score Summary
10
 
11
  - Method-task records: `180`
12
+ - Numeric scored records: `127`
13
+ - Scoreless records: `53`
14
  - Proxy-scored records: `4`
15
  - Source matrix: [`docs/data/task_method_20_result_matrix.json`](docs/data/task_method_20_result_matrix.json)
16
 
 
20
  | --- | --- | --- | --- | --- | --- |
21
  | Minimal | minimal | 20/20 | 0 | 0 | scored: 20 |
22
  | Neural MLP | neural_mlp | 20/20 | 0 | 0 | scored: 20 |
23
+ | 128ep Metadata Simple | metadata128_simple | 13/20 | 7 | 0 | scored: 13, unsupported_without_required_target: 7 |
24
+ | 128ep Metadata NN | metadata128_neural_mlp | 7/20 | 13 | 0 | not_supported_by_metadata_only_package: 7, scored: 7, unsupported_without_required_target: 6 |
25
  | 128ep Raw Simple | raw128_simple | 20/20 | 0 | 2 | proxy_scored: 2, scored: 18 |
26
  | 128ep Raw NN | raw128_neural_mlp | 20/20 | 0 | 2 | proxy_scored: 2, scored: 18 |
27
  | Qwen3-Omni v6 LoRA | qwen3_omni_v6_lora | 15/20 | 5 | 0 | not_evaluated_in_verified_package: 5, scored: 15 |
 
33
  | Status | Count | Next step |
34
  | --- | --- | --- |
35
  | not_evaluated_in_verified_package | 33 | Generate verified model outputs for this task contract and score them against the held-out labels. |
36
+ | not_supported_by_metadata_only_package | 7 | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
37
+ | unsupported_without_required_target | 13 | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
38
 
39
  ## Scoreless Records
40
 
41
  | Task | Task label | Method | Status | Required evidence |
42
  | --- | --- | --- | --- | --- |
43
+ | 01 | Action Recognition | 128ep Metadata NN | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
44
+ | 02 | Procedure Step Recognition | 128ep Metadata NN | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
45
  | 02 | Procedure Step Recognition | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
46
+ | 04 | Next-Action Prediction | 128ep Metadata NN | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
47
  | 05 | Hand Trajectory Forecasting | 128ep Metadata Simple | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
48
  | 05 | Hand Trajectory Forecasting | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
49
  | 05 | Hand Trajectory Forecasting | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 
66
  | 12 | Multimodal Synchronization Detection | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
67
  | 12 | Multimodal Synchronization Detection | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
68
  | 12 | Multimodal Synchronization Detection | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
69
+ | 13 | Long-Horizon Next-Action Forecasting | 128ep Metadata NN | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
 
70
  | 13 | Long-Horizon Next-Action Forecasting | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
71
  | 13 | Long-Horizon Next-Action Forecasting | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
72
+ | 14 | Long-Horizon Next-Subtask Forecasting | 128ep Metadata NN | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
 
73
  | 14 | Long-Horizon Next-Subtask Forecasting | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
74
  | 14 | Long-Horizon Next-Subtask Forecasting | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
75
+ | 15 | Interaction Text Prediction | 128ep Metadata Simple | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
76
  | 15 | Interaction Text Prediction | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
77
  | 15 | Interaction Text Prediction | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
78
  | 15 | Interaction Text Prediction | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
79
  | 15 | Interaction Text Prediction | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
80
+ | 16 | Action-Object Relation Prediction | 128ep Metadata NN | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
 
81
  | 16 | Action-Object Relation Prediction | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 
 
82
  | 17 | Future Object-Set Forecasting | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
83
  | 17 | Future Object-Set Forecasting | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
84
+ | 18 | IMU-to-Hand Pose Reconstruction | 128ep Metadata Simple | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
85
  | 18 | IMU-to-Hand Pose Reconstruction | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
86
  | 18 | IMU-to-Hand Pose Reconstruction | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
87
  | 18 | IMU-to-Hand Pose Reconstruction | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
88
  | 18 | IMU-to-Hand Pose Reconstruction | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
89
+ | 19 | Camera-View Synchronization Retrieval | 128ep Metadata Simple | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
90
  | 19 | Camera-View Synchronization Retrieval | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
91
  | 19 | Camera-View Synchronization Retrieval | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
92
  | 19 | Camera-View Synchronization Retrieval | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
93
  | 19 | Camera-View Synchronization Retrieval | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 
 
94
  | 20 | Time-to-Next-Transition Regression | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
95
  | 20 | Time-to-Next-Transition Regression | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
96
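The Score Summary and the scoreless status table in this audit can be cross-checked with simple arithmetic: scored plus scoreless records should equal the 180 method-task records, and the per-status counts should sum to the scoreless total. The following is a minimal sketch of that check, assuming the counts are copied by hand from the updated (`+`) side of the diff; the function name `check_summary` is hypothetical and not part of the repository.

```python
# Counts copied from the updated (+) side of TASK_METHOD_20_GAP_AUDIT.md.
TOTAL_RECORDS = 180
SCORED = 127
SCORELESS = 53

# Scoreless status counts from the status table in the same file.
STATUS_COUNTS = {
    "not_evaluated_in_verified_package": 33,
    "not_supported_by_metadata_only_package": 7,
    "unsupported_without_required_target": 13,
}


def check_summary(total, scored, scoreless, status_counts):
    """Return a list of human-readable inconsistencies (empty if none).

    Hypothetical helper for illustration; not the repository's validator.
    """
    problems = []
    if scored + scoreless != total:
        problems.append(
            f"scored + scoreless = {scored + scoreless}, expected {total}"
        )
    status_total = sum(status_counts.values())
    if status_total != scoreless:
        problems.append(
            f"status counts sum to {status_total}, expected {scoreless}"
        )
    return problems


if __name__ == "__main__":
    issues = check_summary(TOTAL_RECORDS, SCORED, SCORELESS, STATUS_COUNTS)
    print(issues or "summary counts are consistent")
```

The same check also holds for the previous (`-`) side of the diff (123 scored, 57 scoreless, status counts 33, 20, and 4), so the commit appears to move records between statuses without changing the 180-record total.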
 
TASK_METHOD_20_RESULT_MATRIX.md CHANGED
@@ -8,8 +8,8 @@ Legend: `score` = numeric task score, `proxy` = documented raw128 compact proxy
8
  | --- | ---: | ---: | ---: | ---: | --- |
9
  | Minimal | 20 | 20 | 0 | 0 | scored 20 |
10
  | Neural MLP | 20 | 20 | 0 | 0 | scored 20 |
11
- | 128ep Metadata Simple | 20 | 8 | 0 | 12 | not supported 8, scored 8, unsupported 4 |
12
- | 128ep Metadata NN | 20 | 8 | 0 | 12 | not supported 12, scored 8 |
13
  | 128ep Raw Simple | 20 | 20 | 2 | 0 | proxy scored 2, scored 18 |
14
  | 128ep Raw NN | 20 | 20 | 2 | 0 | proxy scored 2, scored 18 |
15
  | Qwen3-Omni v6 LoRA | 20 | 15 | 0 | 5 | not evaluated 5, scored 15 |
@@ -30,13 +30,13 @@ Legend: `score` = numeric task score, `proxy` = documented raw128 compact proxy
30
  | 10 | Cross-Modal Reconstruction | score | score | unsupported | not supported | score | score | not evaluated | not evaluated | not evaluated |
31
  | 11 | Temporal Order Verification | score | score | score | score | score | score | score | not evaluated | not evaluated |
32
  | 12 | Multimodal Synchronization Detection | score | score | unsupported | not supported | score | score | score | not evaluated | not evaluated |
33
- | 13 | Long-Horizon Next-Action Forecasting | score | score | not supported | not supported | score | score | score | not evaluated | not evaluated |
34
- | 14 | Long-Horizon Next-Subtask Forecasting | score | score | not supported | not supported | score | score | score | not evaluated | not evaluated |
35
- | 15 | Interaction Text Prediction | score | score | not supported | not supported | proxy | proxy | not evaluated | not evaluated | not evaluated |
36
- | 16 | Action-Object Relation Prediction | score | score | not supported | not supported | score | score | score | score | not evaluated |
37
- | 17 | Future Object-Set Forecasting | score | score | not supported | not supported | score | score | score | not evaluated | not evaluated |
38
- | 18 | IMU-to-Hand Pose Reconstruction | score | score | not supported | not supported | score | score | not evaluated | not evaluated | not evaluated |
39
- | 19 | Camera-View Synchronization Retrieval | score | score | not supported | not supported | proxy | proxy | not evaluated | not evaluated | not evaluated |
40
- | 20 | Time-to-Next-Transition Regression | score | score | not supported | not supported | score | score | score | not evaluated | not evaluated |
41
 
42
  Sources and raw values are in `docs/data/task_method_20_result_matrix.json` and `docs/data/unified_task_model_radar.json`.
 
8
  | --- | ---: | ---: | ---: | ---: | --- |
9
  | Minimal | 20 | 20 | 0 | 0 | scored 20 |
10
  | Neural MLP | 20 | 20 | 0 | 0 | scored 20 |
11
+ | 128ep Metadata Simple | 20 | 13 | 0 | 7 | scored 13, unsupported 7 |
12
+ | 128ep Metadata NN | 20 | 13 | 0 | 7 | not supported 7, scored 13 |
13
  | 128ep Raw Simple | 20 | 20 | 2 | 0 | proxy scored 2, scored 18 |
14
  | 128ep Raw NN | 20 | 20 | 2 | 0 | proxy scored 2, scored 18 |
15
  | Qwen3-Omni v6 LoRA | 20 | 15 | 0 | 5 | not evaluated 5, scored 15 |
 
30
  | 10 | Cross-Modal Reconstruction | score | score | unsupported | not supported | score | score | not evaluated | not evaluated | not evaluated |
31
  | 11 | Temporal Order Verification | score | score | score | score | score | score | score | not evaluated | not evaluated |
32
  | 12 | Multimodal Synchronization Detection | score | score | unsupported | not supported | score | score | score | not evaluated | not evaluated |
33
+ | 13 | Long-Horizon Next-Action Forecasting | score | score | score | score | score | score | score | not evaluated | not evaluated |
34
+ | 14 | Long-Horizon Next-Subtask Forecasting | score | score | score | score | score | score | score | not evaluated | not evaluated |
35
+ | 15 | Interaction Text Prediction | score | score | unsupported | not supported | proxy | proxy | not evaluated | not evaluated | not evaluated |
36
+ | 16 | Action-Object Relation Prediction | score | score | score | score | score | score | score | score | not evaluated |
37
+ | 17 | Future Object-Set Forecasting | score | score | score | score | score | score | score | not evaluated | not evaluated |
38
+ | 18 | IMU-to-Hand Pose Reconstruction | score | score | unsupported | not supported | score | score | not evaluated | not evaluated | not evaluated |
39
+ | 19 | Camera-View Synchronization Retrieval | score | score | unsupported | not supported | proxy | proxy | not evaluated | not evaluated | not evaluated |
40
+ | 20 | Time-to-Next-Transition Regression | score | score | score | score | score | score | score | not evaluated | not evaluated |
41
 
42
  Sources and raw values are in `docs/data/task_method_20_result_matrix.json` and `docs/data/unified_task_model_radar.json`.
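The per-task rows above can be tallied mechanically to see how many cells carry each status. The sketch below assumes each row keeps the layout shown in this diff (task number, task label, then one status cell per method); the helper names `parse_matrix_row` and `tally_statuses` are hypothetical, and the method column order is not assumed because the table header is outside the displayed hunk.

```python
from collections import Counter


def parse_matrix_row(row):
    """Split one Markdown table row into (task_id, label, statuses)."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    return cells[0], cells[1], cells[2:]


def tally_statuses(rows):
    """Count how often each status string appears across the given rows."""
    counts = Counter()
    for row in rows:
        _, _, statuses = parse_matrix_row(row)
        counts.update(statuses)
    return counts


# Two rows copied from the updated (+) side of TASK_METHOD_20_RESULT_MATRIX.md.
ROWS = [
    "| 13 | Long-Horizon Next-Action Forecasting | score | score | score | score "
    "| score | score | score | not evaluated | not evaluated |",
    "| 15 | Interaction Text Prediction | score | score | unsupported | not supported "
    "| proxy | proxy | not evaluated | not evaluated | not evaluated |",
]

if __name__ == "__main__":
    for status, count in sorted(tally_statuses(ROWS).items()):
        print(f"{status}: {count}")
```

Running the tally over all 20 rows of the full file, rather than the two sample rows here, would give per-status totals that can be compared against the summary table at the top of the matrix.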
data/artifact_index.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
- "generated_at_utc": "2026-06-18T11:16:44+00:00",
4
  "status": "pass",
5
  "artifact_count": 213,
6
  "missing": [],
@@ -290,8 +290,8 @@
290
  "surface": "repo_hf",
291
  "shows": "Runs simple metadata and neural MLP baselines on the same selected 96/16/16 episode split used by the Qwen3-Omni diagnostic pilot.",
292
  "exists": true,
293
- "bytes": 58012,
294
- "sha256": "a95cdde097b11f83023c758c807f031c3d4cb3fde20d42ed314565440cc68374"
295
  },
296
  {
297
  "id": "task_suite_enhancement_128",
@@ -599,7 +599,7 @@
599
  "shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
600
  "exists": true,
601
  "bytes": 4432,
602
- "sha256": "8494b6983100acdfde9b5929e871b27120897af8ec7b5a3031aa142b598a09ae"
603
  },
604
  {
605
  "id": "source_alignment_validator",
@@ -719,8 +719,8 @@
719
  "surface": "website_hf",
720
  "shows": "Stores normalized 20-axis radar values, raw task metrics, Qwen3/Cosmos overlay mappings, branch-card caveats, and explicit scoreless status records.",
721
  "exists": true,
722
- "bytes": 230951,
723
- "sha256": "8aaed21d08943f2dc53c5160e27872bc4f7f8a405d7289cdaaf7b00d867b84d8"
724
  },
725
  {
726
  "id": "single_episode_task_model_radar_json",
@@ -731,7 +731,7 @@
731
  "shows": "Machine-readable split radar for the one-episode Minimal and Neural MLP baselines, both scored on all 20 task contracts.",
732
  "exists": true,
733
  "bytes": 50973,
734
- "sha256": "d20637e6a17390f7fd44589ff37cb1889318bc39c2259dca6bb7f1a43d8ea26b"
735
  },
736
  {
737
  "id": "episode128_task_model_radar_json",
@@ -741,8 +741,8 @@
741
  "surface": "website_hf",
742
  "shows": "Machine-readable split radar for selected 128-episode metadata/raw baselines and verified Qwen3/Cosmos branches, preserving explicit scoreless cells.",
743
  "exists": true,
744
- "bytes": 187099,
745
- "sha256": "bf2b3fdeb9713a9d4cba0e8645c24c325b88e939cb94f4718a9d3c2db03e2bb3"
746
  },
747
  {
748
  "id": "task_method_20_result_matrix_json",
@@ -752,8 +752,8 @@
752
  "surface": "website_hf",
753
  "shows": "Machine-readable 9-method by 20-task matrix where every method has 20 records and scoreless cells carry unsupported/not-evaluated reasons.",
754
  "exists": true,
755
- "bytes": 129600,
756
- "sha256": "30fd572521991fd7f5741411d91a40d3d442032f001841f9fd1a4e7381eb73d2"
757
  },
758
  {
759
  "id": "task_method_20_result_matrix",
@@ -763,8 +763,8 @@
763
  "surface": "repo_hf",
764
  "shows": "Reader-facing table that separates 20 records per method from numeric scored axes, documented raw128 proxy scores, unsupported metadata targets, and model targets not evaluated in verified packages.",
765
  "exists": true,
766
- "bytes": 4128,
767
- "sha256": "89c73da7db81d2c5f6eb4a16c828531a589ac44cabba2c0c95b171b6ad2060d6"
768
  },
769
  {
770
  "id": "task_method_20_gap_audit_json",
@@ -774,8 +774,8 @@
774
  "surface": "website_hf",
775
  "shows": "Machine-readable 180-record gap ledger with numeric scores, scoreless cells, explicit status reasons, and next evidence needed before new scores can be published.",
776
  "exists": true,
777
- "bytes": 50687,
778
- "sha256": "2cdaa06f9c140a2e194675a3383be341acb1f6e07ddecfa7017cdbe34d704282"
779
  },
780
  {
781
  "id": "task_method_20_gap_audit",
@@ -785,8 +785,8 @@
785
  "surface": "repo_hf",
786
  "shows": "Reader-facing ledger that lists every scoreless method-task cell and the concrete target or model-output evidence required before it can become numeric.",
787
  "exists": true,
788
- "bytes": 14421,
789
- "sha256": "125e658010284dc48570fa7c6a7676e4013d30dd1f22deb24d369e7085a7b700"
790
  },
791
  {
792
  "id": "unified_task_model_radar_chart",
@@ -796,8 +796,8 @@
796
  "surface": "website_hf",
797
  "shows": "Compares minimal and neural MLP baselines across all 20 tasks, with Qwen3/Cosmos task-aligned model overlays.",
798
  "exists": true,
799
- "bytes": 50841,
800
- "sha256": "e5fa2420fc5ed905953e71ef8978ad1ee794c0daf06a7f0ff10374db7f291c72"
801
  },
802
  {
803
  "id": "single_episode_task_model_radar_chart",
@@ -818,8 +818,8 @@
818
  "surface": "website_hf",
819
  "shows": "Separates the selected 128-episode methods: raw-feature simple/NN as complete 20/20 scored polygons and metadata/Qwen/Cosmos as task-aligned overlays.",
820
  "exists": true,
821
- "bytes": 44825,
822
- "sha256": "50b5d87fca4aba303a8440f5ef53470ed493e9f1251cb5edeb16bac90038a11b"
823
  },
824
  {
825
  "id": "unified_task_model_radar_builder",
@@ -906,8 +906,8 @@
906
  "surface": "repo_hf",
907
  "shows": "Rerun of JSONL metadata/text simple and neural baselines over the selected 128-episode multiscale dataset; supports radar overlays on JSONL-supported task axes.",
908
  "exists": true,
909
- "bytes": 50297,
910
- "sha256": "1c1710bcf340ece479e321f19d4cb8302fe369a1103b4584a15853fe73dc226c"
911
  },
912
  {
913
  "id": "a100_128_raw20_task_baselines",
@@ -1105,7 +1105,7 @@
1105
  "shows": "Machine-readable release-check summary for validators, mirrors, and public project surfaces.",
1106
  "exists": true,
1107
  "bytes": 8100,
1108
- "sha256": "6549b0f8da6c3742c72b12b71900db1b89455cd34d5befcdf9d249b4adebbd1a"
1109
  },
1110
  {
1111
  "id": "public_surface_qa",
@@ -1310,7 +1310,7 @@
1310
  "volatile": true,
1311
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
1312
  "exists": true,
1313
- "bytes": 983979,
1314
  "hash_policy": "existence_and_size_only"
1315
  },
1316
  {
@@ -1322,7 +1322,7 @@
1322
  "volatile": true,
1323
  "shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
1324
  "exists": true,
1325
- "bytes": 20022,
1326
  "hash_policy": "existence_and_size_only"
1327
  },
1328
  {
 
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
+ "generated_at_utc": "2026-06-18T12:09:24+00:00",
4
  "status": "pass",
5
  "artifact_count": 213,
6
  "missing": [],
 
290
  "surface": "repo_hf",
291
  "shows": "Runs simple metadata and neural MLP baselines on the same selected 96/16/16 episode split used by the Qwen3-Omni diagnostic pilot.",
292
  "exists": true,
293
+ "bytes": 73236,
294
+ "sha256": "76acae0de25d51413e7e6f11021163e7d9909cfe95d65bf6b02e74043d429e2d"
295
  },
296
  {
297
  "id": "task_suite_enhancement_128",
 
599
  "shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
600
  "exists": true,
601
  "bytes": 4432,
602
+ "sha256": "ae089cc0df132b63365e03b2157a488b5d1569567c0374d7621bcd347da62c9e"
603
  },
604
  {
605
  "id": "source_alignment_validator",
 
719
  "surface": "website_hf",
720
  "shows": "Stores normalized 20-axis radar values, raw task metrics, Qwen3/Cosmos overlay mappings, branch-card caveats, and explicit scoreless status records.",
721
  "exists": true,
722
+ "bytes": 230297,
723
+ "sha256": "437874b1633e73165e3300f55580394663a44759c848288e696859b98f8aad32"
724
  },
725
  {
726
  "id": "single_episode_task_model_radar_json",
 
731
  "shows": "Machine-readable split radar for the one-episode Minimal and Neural MLP baselines, both scored on all 20 task contracts.",
732
  "exists": true,
733
  "bytes": 50973,
734
+ "sha256": "38cb43512f2ac40feeb62333bdea89b3a55e5b48468beb8982cf22536f794ecf"
735
  },
736
  {
737
  "id": "episode128_task_model_radar_json",
 
741
  "surface": "website_hf",
742
  "shows": "Machine-readable split radar for selected 128-episode metadata/raw baselines and verified Qwen3/Cosmos branches, preserving explicit scoreless cells.",
743
  "exists": true,
744
+ "bytes": 186443,
745
+ "sha256": "55e758e8703f406889022976d0ba055181212305c9a7246e899463e0c3c3b554"
746
  },
747
  {
748
  "id": "task_method_20_result_matrix_json",
 
752
  "surface": "website_hf",
753
  "shows": "Machine-readable 9-method by 20-task matrix where every method has 20 records and scoreless cells carry unsupported/not-evaluated reasons.",
754
  "exists": true,
755
+ "bytes": 129242,
756
+ "sha256": "64fb700d51f536edf11291799b6173cf9ae8dd7a41178aac348b8207ed4b1e42"
757
  },
758
  {
759
  "id": "task_method_20_result_matrix",
 
763
  "surface": "repo_hf",
764
  "shows": "Reader-facing table that separates 20 records per method from numeric scored axes, documented raw128 proxy scores, unsupported metadata targets, and model targets not evaluated in verified packages.",
765
  "exists": true,
766
+ "bytes": 4026,
767
+ "sha256": "55e949fc30419a52f7f5ec4dd9544a11b253b076f8e3637ec3e92b3d61a89aab"
768
  },
769
  {
770
  "id": "task_method_20_gap_audit_json",
 
774
  "surface": "website_hf",
775
  "shows": "Machine-readable 180-record gap ledger with numeric scores, scoreless cells, explicit status reasons, and next evidence needed before new scores can be published.",
776
  "exists": true,
777
+ "bytes": 46902,
778
+ "sha256": "2b64dbd013625852679f9b91d25c48d1ed197fec727883b4fe37088b2d594784"
779
  },
780
  {
781
  "id": "task_method_20_gap_audit",
 
785
  "surface": "repo_hf",
786
  "shows": "Reader-facing ledger that lists every scoreless method-task cell and the concrete target or model-output evidence required before it can become numeric.",
787
  "exists": true,
788
+ "bytes": 13387,
789
+ "sha256": "d33461eb704f8e92545b6b54d9fc509e617fbacc9ca9894ac851ca9c3dec0fec"
790
  },
791
  {
792
  "id": "unified_task_model_radar_chart",
 
796
  "surface": "website_hf",
797
  "shows": "Compares minimal and neural MLP baselines across all 20 tasks, with Qwen3/Cosmos task-aligned model overlays.",
798
  "exists": true,
799
+ "bytes": 51953,
800
+ "sha256": "19c001f10319946ef0e4921064f8a012836f29e7c8b272f900c257169faf46a1"
801
  },
802
  {
803
  "id": "single_episode_task_model_radar_chart",
 
818
  "surface": "website_hf",
819
  "shows": "Separates the selected 128-episode methods: raw-feature simple/NN as complete 20/20 scored polygons and metadata/Qwen/Cosmos as task-aligned overlays.",
820
  "exists": true,
821
+ "bytes": 45937,
822
+ "sha256": "b504b1b9c5cad0caa8c822d5bb2971c1b708251cf7b9ef587a92db2c12751e97"
823
  },
824
  {
825
  "id": "unified_task_model_radar_builder",
 
906
  "surface": "repo_hf",
907
  "shows": "Rerun of JSONL metadata/text simple and neural baselines over the selected 128-episode multiscale dataset; supports radar overlays on JSONL-supported task axes.",
908
  "exists": true,
909
+ "bytes": 109248,
910
+ "sha256": "5e7f3085be5012eb3dda46f9c7b5b7c0ae22d6a0fbce71d6e99dd317fecc12af"
911
  },
912
  {
913
  "id": "a100_128_raw20_task_baselines",
 
1105
  "shows": "Machine-readable release-check summary for validators, mirrors, and public project surfaces.",
1106
  "exists": true,
1107
  "bytes": 8100,
1108
+ "sha256": "7800195093b8b81b49c87cdcbcebe601de8141c0c9d8b4490b98f539cb132725"
1109
  },
1110
  {
1111
  "id": "public_surface_qa",
 
1310
  "volatile": true,
1311
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
1312
  "exists": true,
1313
+ "bytes": 994053,
1314
  "hash_policy": "existence_and_size_only"
1315
  },
1316
  {
 
1322
  "volatile": true,
1323
  "shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
1324
  "exists": true,
1325
+ "bytes": 20021,
1326
  "hash_policy": "existence_and_size_only"
1327
  },
1328
  {
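Review note: every change to `artifact_index.json` in this commit is a refreshed `bytes` or `sha256` value, plus the new `generated_at_utc` stamp. Entries marked `"hash_policy": "existence_and_size_only"` carry no digest. A minimal sketch of how one entry could be re-checked against a file on disk follows. The field names `exists`, `bytes`, `sha256`, and `hash_policy` come from the diff. The function name `check_artifact` and the way a path is supplied are assumptions, since the excerpt does not show how an entry maps to a file location.

```python
import hashlib
import os
import tempfile


def check_artifact(entry, path):
    """Return a list of mismatch descriptions for one index entry (empty means OK)."""
    if not os.path.exists(path):
        # A missing file is only a problem if the index claims it exists.
        return ["missing file"] if entry.get("exists") else []
    problems = []
    if "bytes" in entry and os.path.getsize(path) != entry["bytes"]:
        problems.append("size mismatch")
    # Volatile entries with existence_and_size_only have no sha256 to compare.
    if entry.get("hash_policy") != "existence_and_size_only" and "sha256" in entry:
        with open(path, "rb") as fh:
            digest = hashlib.sha256(fh.read()).hexdigest()
        if digest != entry["sha256"]:
            problems.append("sha256 mismatch")
    return problems


# Hypothetical demo file; it stands in for a real artifact.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.json")
    payload = b'{"status": "pass"}\n'
    with open(demo_path, "wb") as fh:
        fh.write(payload)

    good = {"exists": True, "bytes": len(payload),
            "sha256": hashlib.sha256(payload).hexdigest()}
    stale = dict(good, sha256="0" * 64)
    volatile = {"exists": True, "bytes": len(payload),
                "hash_policy": "existence_and_size_only"}

    results = {
        "good": check_artifact(good, demo_path),
        "stale": check_artifact(stale, demo_path),
        "volatile": check_artifact(volatile, demo_path),
    }
```

A stale digest is reported without touching the index, which fits a commit like this one where regenerated files need their recorded sizes and hashes refreshed.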
data/episode128_task_model_radar.json CHANGED
@@ -1,12 +1,12 @@
1
  {
2
  "title": "128-Episode 20-Task Radar",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:15:02+00:00",
5
  "description": "Selected 128-episode metadata/raw baselines plus verified Qwen3/Cosmos branches. Every method has 20 records; numeric scores appear only where the public artifact produced that task target.",
6
  "task_count": 20,
7
  "method_count": 7,
8
  "method_task_record_count": 140,
9
- "scored_method_task_count": 83,
10
  "normalization_policy": {
11
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
12
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
@@ -30,18 +30,17 @@
30
  "method_detail": "128-episode JSONL metadata/text simple baselines.",
31
  "plotted_as": "colored point overlay",
32
  "result_record_count": 20,
33
- "scored_task_count": 8,
34
- "covered_task_count": 8,
35
  "proxy_scored_task_count": 0,
36
- "scoreless_task_count": 12,
37
- "unsupported_task_count": 12,
38
  "not_evaluated_task_count": 0,
39
  "status_counts": {
40
- "not_supported_by_metadata_only_package": 8,
41
- "scored": 8,
42
- "unsupported_without_required_target": 4
43
  },
44
- "coverage_fraction": 0.4,
45
  "result_record_fraction": 1.0
46
  },
47
  {
@@ -55,17 +54,17 @@
55
  "method_detail": "128-episode JSONL metadata/text MLP baselines.",
56
  "plotted_as": "colored point overlay",
57
  "result_record_count": 20,
58
- "scored_task_count": 8,
59
- "covered_task_count": 8,
60
  "proxy_scored_task_count": 0,
61
- "scoreless_task_count": 12,
62
- "unsupported_task_count": 12,
63
  "not_evaluated_task_count": 0,
64
  "status_counts": {
65
- "not_supported_by_metadata_only_package": 12,
66
- "scored": 8
67
  },
68
- "coverage_fraction": 0.4,
69
  "result_record_fraction": 1.0
70
  },
71
  {
@@ -1295,26 +1294,26 @@
1295
  "raw128_proxy_axis": false,
1296
  "values": {
1297
  "metadata128_simple": {
1298
- "raw": null,
1299
  "metric_key": "macro_f1",
1300
- "source": null,
1301
  "scope": "multi_episode_128_metadata_baseline",
1302
- "status": "not_supported_by_metadata_only_package",
1303
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1304
- "normalized_score": null,
1305
- "raw_text": "n/a",
1306
- "status_label": "not supported"
1307
  },
1308
  "metadata128_neural_mlp": {
1309
- "raw": null,
1310
  "metric_key": "macro_f1",
1311
- "source": null,
1312
  "scope": "multi_episode_128_metadata_baseline",
1313
- "status": "not_supported_by_metadata_only_package",
1314
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1315
- "normalized_score": null,
1316
- "raw_text": "n/a",
1317
- "status_label": "not supported"
1318
  },
1319
  "raw128_simple": {
1320
  "raw": 0.0024280172369056294,
@@ -1386,26 +1385,26 @@
1386
  "raw128_proxy_axis": false,
1387
  "values": {
1388
  "metadata128_simple": {
1389
- "raw": null,
1390
  "metric_key": "macro_f1",
1391
- "source": null,
1392
  "scope": "multi_episode_128_metadata_baseline",
1393
- "status": "not_supported_by_metadata_only_package",
1394
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1395
- "normalized_score": null,
1396
- "raw_text": "n/a",
1397
- "status_label": "not supported"
1398
  },
1399
  "metadata128_neural_mlp": {
1400
- "raw": null,
1401
  "metric_key": "macro_f1",
1402
- "source": null,
1403
  "scope": "multi_episode_128_metadata_baseline",
1404
- "status": "not_supported_by_metadata_only_package",
1405
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1406
- "normalized_score": null,
1407
- "raw_text": "n/a",
1408
- "status_label": "not supported"
1409
  },
1410
  "raw128_simple": {
1411
  "raw": 0.0,
@@ -1479,13 +1478,13 @@
1479
  "metadata128_simple": {
1480
  "raw": null,
1481
  "metric_key": "macro_f1",
1482
- "source": null,
1483
  "scope": "multi_episode_128_metadata_baseline",
1484
- "status": "not_supported_by_metadata_only_package",
1485
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1486
  "normalized_score": null,
1487
  "raw_text": "n/a",
1488
- "status_label": "not supported"
1489
  },
1490
  "metadata128_neural_mlp": {
1491
  "raw": null,
@@ -1568,26 +1567,26 @@
1568
  "raw128_proxy_axis": false,
1569
  "values": {
1570
  "metadata128_simple": {
1571
- "raw": null,
1572
  "metric_key": "macro_f1",
1573
- "source": null,
1574
  "scope": "multi_episode_128_metadata_baseline",
1575
- "status": "not_supported_by_metadata_only_package",
1576
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1577
- "normalized_score": null,
1578
- "raw_text": "n/a",
1579
- "status_label": "not supported"
1580
  },
1581
  "metadata128_neural_mlp": {
1582
- "raw": null,
1583
  "metric_key": "macro_f1",
1584
- "source": null,
1585
  "scope": "multi_episode_128_metadata_baseline",
1586
- "status": "not_supported_by_metadata_only_package",
1587
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1588
- "normalized_score": null,
1589
- "raw_text": "n/a",
1590
- "status_label": "not supported"
1591
  },
1592
  "raw128_simple": {
1593
  "raw": 0.0,
@@ -1659,26 +1658,26 @@
1659
  "raw128_proxy_axis": false,
1660
  "values": {
1661
  "metadata128_simple": {
1662
- "raw": null,
1663
  "metric_key": "micro_f1",
1664
- "source": null,
1665
  "scope": "multi_episode_128_metadata_baseline",
1666
- "status": "not_supported_by_metadata_only_package",
1667
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1668
- "normalized_score": null,
1669
- "raw_text": "n/a",
1670
- "status_label": "not supported"
1671
  },
1672
  "metadata128_neural_mlp": {
1673
- "raw": null,
1674
  "metric_key": "micro_f1",
1675
- "source": null,
1676
  "scope": "multi_episode_128_metadata_baseline",
1677
- "status": "not_supported_by_metadata_only_package",
1678
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1679
- "normalized_score": null,
1680
- "raw_text": "n/a",
1681
- "status_label": "not supported"
1682
  },
1683
  "raw128_simple": {
1684
  "raw": 0.06469493412657774,
@@ -1752,13 +1751,13 @@
1752
  "metadata128_simple": {
1753
  "raw": null,
1754
  "metric_key": "mae",
1755
- "source": null,
1756
  "scope": "multi_episode_128_metadata_baseline",
1757
- "status": "not_supported_by_metadata_only_package",
1758
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1759
  "normalized_score": null,
1760
  "raw_text": "n/a",
1761
- "status_label": "not supported"
1762
  },
1763
  "metadata128_neural_mlp": {
1764
  "raw": null,
@@ -1843,13 +1842,13 @@
1843
  "metadata128_simple": {
1844
  "raw": null,
1845
  "metric_key": "mrr",
1846
- "source": null,
1847
  "scope": "multi_episode_128_metadata_baseline",
1848
- "status": "not_supported_by_metadata_only_package",
1849
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1850
  "normalized_score": null,
1851
  "raw_text": "n/a",
1852
- "status_label": "not supported"
1853
  },
1854
  "metadata128_neural_mlp": {
1855
  "raw": null,
@@ -1932,26 +1931,26 @@
1932
  "raw128_proxy_axis": false,
1933
  "values": {
1934
  "metadata128_simple": {
1935
- "raw": null,
1936
  "metric_key": "mae",
1937
- "source": null,
1938
  "scope": "multi_episode_128_metadata_baseline",
1939
- "status": "not_supported_by_metadata_only_package",
1940
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1941
- "normalized_score": null,
1942
- "raw_text": "n/a",
1943
- "status_label": "not supported"
1944
  },
1945
  "metadata128_neural_mlp": {
1946
- "raw": null,
1947
  "metric_key": "mae",
1948
- "source": null,
1949
  "scope": "multi_episode_128_metadata_baseline",
1950
- "status": "not_supported_by_metadata_only_package",
1951
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1952
- "normalized_score": null,
1953
- "raw_text": "n/a",
1954
- "status_label": "not supported"
1955
  },
1956
  "raw128_simple": {
1957
  "raw": 52.32759475708008,
@@ -3530,17 +3529,17 @@
3530
  "task_label": "Long-Horizon Next-Action Forecasting",
3531
  "series_id": "metadata128_simple",
3532
  "method": "128ep Metadata Simple",
3533
- "status": "not_supported_by_metadata_only_package",
3534
- "status_label": "not supported",
3535
- "scored": false,
3536
  "proxy_scored": false,
3537
- "raw": null,
3538
- "raw_text": "n/a",
3539
- "normalized_score": null,
3540
  "metric_key": "macro_f1",
3541
- "source": null,
3542
  "scope": "multi_episode_128_metadata_baseline",
3543
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3544
  },
3545
  {
3546
  "task_number": 13,
@@ -3548,17 +3547,17 @@
3548
  "task_label": "Long-Horizon Next-Action Forecasting",
3549
  "series_id": "metadata128_neural_mlp",
3550
  "method": "128ep Metadata NN",
3551
- "status": "not_supported_by_metadata_only_package",
3552
- "status_label": "not supported",
3553
- "scored": false,
3554
  "proxy_scored": false,
3555
- "raw": null,
3556
- "raw_text": "n/a",
3557
- "normalized_score": null,
3558
  "metric_key": "macro_f1",
3559
- "source": null,
3560
  "scope": "multi_episode_128_metadata_baseline",
3561
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3562
  },
3563
  {
3564
  "task_number": 13,
@@ -3656,17 +3655,17 @@
3656
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3657
  "series_id": "metadata128_simple",
3658
  "method": "128ep Metadata Simple",
3659
- "status": "not_supported_by_metadata_only_package",
3660
- "status_label": "not supported",
3661
- "scored": false,
3662
  "proxy_scored": false,
3663
- "raw": null,
3664
- "raw_text": "n/a",
3665
- "normalized_score": null,
3666
  "metric_key": "macro_f1",
3667
- "source": null,
3668
  "scope": "multi_episode_128_metadata_baseline",
3669
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3670
  },
3671
  {
3672
  "task_number": 14,
@@ -3674,17 +3673,17 @@
3674
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3675
  "series_id": "metadata128_neural_mlp",
3676
  "method": "128ep Metadata NN",
3677
- "status": "not_supported_by_metadata_only_package",
3678
- "status_label": "not supported",
3679
- "scored": false,
3680
  "proxy_scored": false,
3681
- "raw": null,
3682
- "raw_text": "n/a",
3683
- "normalized_score": null,
3684
  "metric_key": "macro_f1",
3685
- "source": null,
3686
  "scope": "multi_episode_128_metadata_baseline",
3687
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3688
  },
3689
  {
3690
  "task_number": 14,
@@ -3782,17 +3781,17 @@
3782
  "task_label": "Interaction Text Prediction",
3783
  "series_id": "metadata128_simple",
3784
  "method": "128ep Metadata Simple",
3785
- "status": "not_supported_by_metadata_only_package",
3786
- "status_label": "not supported",
3787
  "scored": false,
3788
  "proxy_scored": false,
3789
  "raw": null,
3790
  "raw_text": "n/a",
3791
  "normalized_score": null,
3792
  "metric_key": "macro_f1",
3793
- "source": null,
3794
  "scope": "multi_episode_128_metadata_baseline",
3795
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3796
  },
3797
  {
3798
  "task_number": 15,
@@ -3908,17 +3907,17 @@
3908
  "task_label": "Action-Object Relation Prediction",
3909
  "series_id": "metadata128_simple",
3910
  "method": "128ep Metadata Simple",
3911
- "status": "not_supported_by_metadata_only_package",
3912
- "status_label": "not supported",
3913
- "scored": false,
3914
  "proxy_scored": false,
3915
- "raw": null,
3916
- "raw_text": "n/a",
3917
- "normalized_score": null,
3918
  "metric_key": "macro_f1",
3919
- "source": null,
3920
  "scope": "multi_episode_128_metadata_baseline",
3921
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3922
  },
3923
  {
3924
  "task_number": 16,
@@ -3926,17 +3925,17 @@
3926
  "task_label": "Action-Object Relation Prediction",
3927
  "series_id": "metadata128_neural_mlp",
3928
  "method": "128ep Metadata NN",
3929
- "status": "not_supported_by_metadata_only_package",
3930
- "status_label": "not supported",
3931
- "scored": false,
3932
  "proxy_scored": false,
3933
- "raw": null,
3934
- "raw_text": "n/a",
3935
- "normalized_score": null,
3936
  "metric_key": "macro_f1",
3937
- "source": null,
3938
  "scope": "multi_episode_128_metadata_baseline",
3939
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3940
  },
3941
  {
3942
  "task_number": 16,
@@ -4034,17 +4033,17 @@
4034
  "task_label": "Future Object-Set Forecasting",
4035
  "series_id": "metadata128_simple",
4036
  "method": "128ep Metadata Simple",
4037
- "status": "not_supported_by_metadata_only_package",
4038
- "status_label": "not supported",
4039
- "scored": false,
4040
  "proxy_scored": false,
4041
- "raw": null,
4042
- "raw_text": "n/a",
4043
- "normalized_score": null,
4044
  "metric_key": "micro_f1",
4045
- "source": null,
4046
  "scope": "multi_episode_128_metadata_baseline",
4047
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4048
  },
4049
  {
4050
  "task_number": 17,
@@ -4052,17 +4051,17 @@
4052
  "task_label": "Future Object-Set Forecasting",
4053
  "series_id": "metadata128_neural_mlp",
4054
  "method": "128ep Metadata NN",
4055
- "status": "not_supported_by_metadata_only_package",
4056
- "status_label": "not supported",
4057
- "scored": false,
4058
  "proxy_scored": false,
4059
- "raw": null,
4060
- "raw_text": "n/a",
4061
- "normalized_score": null,
4062
  "metric_key": "micro_f1",
4063
- "source": null,
4064
  "scope": "multi_episode_128_metadata_baseline",
4065
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4066
  },
4067
  {
4068
  "task_number": 17,
@@ -4160,17 +4159,17 @@
4160
  "task_label": "IMU-to-Hand Pose Reconstruction",
4161
  "series_id": "metadata128_simple",
4162
  "method": "128ep Metadata Simple",
4163
- "status": "not_supported_by_metadata_only_package",
4164
- "status_label": "not supported",
4165
  "scored": false,
4166
  "proxy_scored": false,
4167
  "raw": null,
4168
  "raw_text": "n/a",
4169
  "normalized_score": null,
4170
  "metric_key": "mae",
4171
- "source": null,
4172
  "scope": "multi_episode_128_metadata_baseline",
4173
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4174
  },
4175
  {
4176
  "task_number": 18,
@@ -4286,17 +4285,17 @@
4286
  "task_label": "Camera-View Synchronization Retrieval",
4287
  "series_id": "metadata128_simple",
4288
  "method": "128ep Metadata Simple",
4289
- "status": "not_supported_by_metadata_only_package",
4290
- "status_label": "not supported",
4291
  "scored": false,
4292
  "proxy_scored": false,
4293
  "raw": null,
4294
  "raw_text": "n/a",
4295
  "normalized_score": null,
4296
  "metric_key": "mrr",
4297
- "source": null,
4298
  "scope": "multi_episode_128_metadata_baseline",
4299
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4300
  },
4301
  {
4302
  "task_number": 19,
@@ -4412,17 +4411,17 @@
4412
  "task_label": "Time-to-Next-Transition Regression",
4413
  "series_id": "metadata128_simple",
4414
  "method": "128ep Metadata Simple",
4415
- "status": "not_supported_by_metadata_only_package",
4416
- "status_label": "not supported",
4417
- "scored": false,
4418
  "proxy_scored": false,
4419
- "raw": null,
4420
- "raw_text": "n/a",
4421
- "normalized_score": null,
4422
  "metric_key": "mae",
4423
- "source": null,
4424
  "scope": "multi_episode_128_metadata_baseline",
4425
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4426
  },
4427
  {
4428
  "task_number": 20,
@@ -4430,17 +4429,17 @@
4430
  "task_label": "Time-to-Next-Transition Regression",
4431
  "series_id": "metadata128_neural_mlp",
4432
  "method": "128ep Metadata NN",
4433
- "status": "not_supported_by_metadata_only_package",
4434
- "status_label": "not supported",
4435
- "scored": false,
4436
  "proxy_scored": false,
4437
- "raw": null,
4438
- "raw_text": "n/a",
4439
- "normalized_score": null,
4440
  "metric_key": "mae",
4441
- "source": null,
4442
  "scope": "multi_episode_128_metadata_baseline",
4443
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4444
  },
4445
  {
4446
  "task_number": 20,
 
1
  {
2
  "title": "128-Episode 20-Task Radar",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "description": "Selected 128-episode metadata/raw baselines plus verified Qwen3/Cosmos branches. Every method has 20 records; numeric scores appear only where the public artifact produced that task target.",
6
  "task_count": 20,
7
  "method_count": 7,
8
  "method_task_record_count": 140,
9
+ "scored_method_task_count": 93,
10
  "normalization_policy": {
11
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
12
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
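Review note: the two `normalization_policy` rules shown in this hunk can be checked against the scored values that appear later in this file. The sketch below is a hedged reading of those two rules, not the project's builder code. The function name `normalize` and the zero-error handling are assumptions. The per-task `best_observed_value` is not shown in this excerpt, so the example backs it out from the neural MLP record for the time-to-transition task (`raw` 41.4664421081543, `normalized_score` 0.25411768748242325).

```python
def normalize(raw, higher_is_better, best_observed=None):
    """Map a raw metric onto a 0-1 radar axis following the two stated rules."""
    if raw is None:
        return None  # scoreless cells stay scoreless
    if higher_is_better:
        # Bounded metrics are plotted directly after clipping to [0, 1].
        return min(1.0, max(0.0, raw))
    if raw == 0:
        # Assumption: a zero error is treated as the best possible score.
        return 1.0
    # Lower-error metrics become best_observed_value / raw_value within the task.
    return best_observed / raw


# macro_f1 is higher-is-better, so the score equals the raw value.
f1_norm = normalize(0.004579592783699693, higher_is_better=True)

# Back out the task's best observed MAE from one record (assumption, see above).
best_mae = 0.25411768748242325 * 41.4664421081543

# Apply it to the metadata simple baseline's MAE from the same task.
simple_norm = normalize(624.8108520507812, higher_is_better=False,
                        best_observed=best_mae)
```

Under this reading, the simple baseline's MAE of 624.81 maps to roughly 0.0169, which agrees with the `normalized_score` recorded for `metadata128_simple` on that task.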
 
30
  "method_detail": "128-episode JSONL metadata/text simple baselines.",
31
  "plotted_as": "colored point overlay",
32
  "result_record_count": 20,
33
+ "scored_task_count": 13,
34
+ "covered_task_count": 13,
35
  "proxy_scored_task_count": 0,
36
+ "scoreless_task_count": 7,
37
+ "unsupported_task_count": 7,
38
  "not_evaluated_task_count": 0,
39
  "status_counts": {
40
+ "scored": 13,
41
+ "unsupported_without_required_target": 7
 
42
  },
43
+ "coverage_fraction": 0.65,
44
  "result_record_fraction": 1.0
45
  },
46
  {
 
54
  "method_detail": "128-episode JSONL metadata/text MLP baselines.",
55
  "plotted_as": "colored point overlay",
56
  "result_record_count": 20,
57
+ "scored_task_count": 13,
58
+ "covered_task_count": 13,
59
  "proxy_scored_task_count": 0,
60
+ "scoreless_task_count": 7,
61
+ "unsupported_task_count": 7,
62
  "not_evaluated_task_count": 0,
63
  "status_counts": {
64
+ "not_supported_by_metadata_only_package": 7,
65
+ "scored": 13
66
  },
67
+ "coverage_fraction": 0.65,
68
  "result_record_fraction": 1.0
69
  },
70
  {
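Review note: the per-method summary fields changed together in this hunk (8 to 13 scored tasks, 12 to 7 scoreless, `coverage_fraction` 0.4 to 0.65), so they look like values derived from the 20 per-task records. The sketch below shows one way such a summary could be recomputed. It assumes that only records with status `scored` count toward coverage; `summarize` is a hypothetical helper, not a function from the repository.

```python
from collections import Counter


def summarize(records):
    """Recompute summary counts from a method's per-task status records."""
    total = len(records)
    status_counts = Counter(r["status"] for r in records)
    scored = status_counts.get("scored", 0)
    return {
        "result_record_count": total,
        "scored_task_count": scored,
        "scoreless_task_count": total - scored,
        "status_counts": dict(status_counts),
        "coverage_fraction": scored / total if total else 0.0,
    }


# Status mix reported for the 128-episode metadata MLP baseline after this commit.
records = (
    [{"status": "scored"}] * 13
    + [{"status": "not_supported_by_metadata_only_package"}] * 7
)
summary = summarize(records)
```

A check like this also makes it easy to confirm that the `status_counts` values sum to `result_record_count`, which holds for both the old and new sides of this hunk.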
 
1294
  "raw128_proxy_axis": false,
1295
  "values": {
1296
  "metadata128_simple": {
1297
+ "raw": 0.004579592783699693,
1298
  "metric_key": "macro_f1",
1299
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
1300
  "scope": "multi_episode_128_metadata_baseline",
1301
+ "status": "scored",
1302
+ "reason": null,
1303
+ "normalized_score": 0.004579592783699693,
1304
+ "raw_text": "0.0046",
1305
+ "status_label": "scored"
1306
  },
1307
  "metadata128_neural_mlp": {
1308
+ "raw": 0.0029821307969142615,
1309
  "metric_key": "macro_f1",
1310
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
1311
  "scope": "multi_episode_128_metadata_baseline",
1312
+ "status": "scored",
1313
+ "reason": null,
1314
+ "normalized_score": 0.0029821307969142615,
1315
+ "raw_text": "0.0030",
1316
+ "status_label": "scored"
1317
  },
1318
  "raw128_simple": {
1319
  "raw": 0.0024280172369056294,
 
1385
  "raw128_proxy_axis": false,
1386
  "values": {
1387
  "metadata128_simple": {
1388
+ "raw": 0.0001206030150753769,
1389
  "metric_key": "macro_f1",
1390
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
1391
  "scope": "multi_episode_128_metadata_baseline",
1392
+ "status": "scored",
1393
+ "reason": null,
1394
+ "normalized_score": 0.0001206030150753769,
1395
+ "raw_text": "0.0001",
1396
+ "status_label": "scored"
1397
  },
1398
  "metadata128_neural_mlp": {
1399
+ "raw": 2.086049543676662e-05,
1400
  "metric_key": "macro_f1",
1401
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
1402
  "scope": "multi_episode_128_metadata_baseline",
1403
+ "status": "scored",
1404
+ "reason": null,
1405
+ "normalized_score": 2.086049543676662e-05,
1406
+ "raw_text": "0.0000",
1407
+ "status_label": "scored"
1408
  },
1409
  "raw128_simple": {
1410
  "raw": 0.0,
 
1478
  "metadata128_simple": {
1479
  "raw": null,
1480
  "metric_key": "macro_f1",
1481
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
1482
  "scope": "multi_episode_128_metadata_baseline",
1483
+ "status": "unsupported_without_required_target",
1484
+ "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
1485
  "normalized_score": null,
1486
  "raw_text": "n/a",
1487
+ "status_label": "unsupported"
1488
  },
1489
  "metadata128_neural_mlp": {
1490
  "raw": null,
 
1567
  "raw128_proxy_axis": false,
1568
  "values": {
1569
  "metadata128_simple": {
1570
+ "raw": 0.0,
1571
  "metric_key": "macro_f1",
1572
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
1573
  "scope": "multi_episode_128_metadata_baseline",
1574
+ "status": "scored",
1575
+ "reason": null,
1576
+ "normalized_score": 0.0,
1577
+ "raw_text": "0.0000",
1578
+ "status_label": "scored"
1579
  },
1580
  "metadata128_neural_mlp": {
1581
+ "raw": 0.0,
1582
  "metric_key": "macro_f1",
1583
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
1584
  "scope": "multi_episode_128_metadata_baseline",
1585
+ "status": "scored",
1586
+ "reason": null,
1587
+ "normalized_score": 0.0,
1588
+ "raw_text": "0.0000",
1589
+ "status_label": "scored"
1590
  },
1591
  "raw128_simple": {
1592
  "raw": 0.0,
 
1658
  "raw128_proxy_axis": false,
1659
  "values": {
1660
  "metadata128_simple": {
1661
+ "raw": 0.17656983343047333,
1662
  "metric_key": "micro_f1",
1663
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
1664
  "scope": "multi_episode_128_metadata_baseline",
1665
+ "status": "scored",
1666
+ "reason": null,
1667
+ "normalized_score": 0.17656983343047333,
1668
+ "raw_text": "0.1766",
1669
+ "status_label": "scored"
1670
  },
1671
  "metadata128_neural_mlp": {
1672
+ "raw": 0.17418550827844048,
1673
  "metric_key": "micro_f1",
1674
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
1675
  "scope": "multi_episode_128_metadata_baseline",
1676
+ "status": "scored",
1677
+ "reason": null,
1678
+ "normalized_score": 0.17418550827844048,
1679
+ "raw_text": "0.1742",
1680
+ "status_label": "scored"
1681
  },
1682
  "raw128_simple": {
1683
  "raw": 0.06469493412657774,
 
1751
  "metadata128_simple": {
1752
  "raw": null,
1753
  "metric_key": "mae",
1754
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
1755
  "scope": "multi_episode_128_metadata_baseline",
1756
+ "status": "unsupported_without_required_target",
1757
+ "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package",
1758
  "normalized_score": null,
1759
  "raw_text": "n/a",
1760
+ "status_label": "unsupported"
1761
  },
1762
  "metadata128_neural_mlp": {
1763
  "raw": null,
 
1842
  "metadata128_simple": {
1843
  "raw": null,
1844
  "metric_key": "mrr",
1845
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
1846
  "scope": "multi_episode_128_metadata_baseline",
1847
+ "status": "unsupported_without_required_target",
1848
+ "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
1849
  "normalized_score": null,
1850
  "raw_text": "n/a",
1851
+ "status_label": "unsupported"
1852
  },
1853
  "metadata128_neural_mlp": {
1854
  "raw": null,
 
1931
  "raw128_proxy_axis": false,
1932
  "values": {
1933
  "metadata128_simple": {
1934
+ "raw": 624.8108520507812,
1935
  "metric_key": "mae",
1936
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
1937
  "scope": "multi_episode_128_metadata_baseline",
1938
+ "status": "scored",
1939
+ "reason": null,
1940
+ "normalized_score": 0.016864874132806403,
1941
+ "raw_text": "624.81",
1942
+ "status_label": "scored"
1943
  },
1944
  "metadata128_neural_mlp": {
1945
+ "raw": 41.4664421081543,
1946
  "metric_key": "mae",
1947
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
1948
  "scope": "multi_episode_128_metadata_baseline",
1949
+ "status": "scored",
1950
+ "reason": null,
1951
+ "normalized_score": 0.25411768748242325,
1952
+ "raw_text": "41.47",
1953
+ "status_label": "scored"
1954
  },
1955
  "raw128_simple": {
1956
  "raw": 52.32759475708008,
 
3529
  "task_label": "Long-Horizon Next-Action Forecasting",
3530
  "series_id": "metadata128_simple",
3531
  "method": "128ep Metadata Simple",
3532
+ "status": "scored",
3533
+ "status_label": "scored",
3534
+ "scored": true,
3535
  "proxy_scored": false,
3536
+ "raw": 0.004579592783699693,
3537
+ "raw_text": "0.0046",
3538
+ "normalized_score": 0.004579592783699693,
3539
  "metric_key": "macro_f1",
3540
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
3541
  "scope": "multi_episode_128_metadata_baseline",
3542
+ "reason": null
3543
  },
3544
  {
3545
  "task_number": 13,
 
3547
  "task_label": "Long-Horizon Next-Action Forecasting",
3548
  "series_id": "metadata128_neural_mlp",
3549
  "method": "128ep Metadata NN",
3550
+ "status": "scored",
3551
+ "status_label": "scored",
3552
+ "scored": true,
3553
  "proxy_scored": false,
3554
+ "raw": 0.0029821307969142615,
3555
+ "raw_text": "0.0030",
3556
+ "normalized_score": 0.0029821307969142615,
3557
  "metric_key": "macro_f1",
3558
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
3559
  "scope": "multi_episode_128_metadata_baseline",
3560
+ "reason": null
3561
  },
3562
  {
3563
  "task_number": 13,
 
3655
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3656
  "series_id": "metadata128_simple",
3657
  "method": "128ep Metadata Simple",
3658
+ "status": "scored",
3659
+ "status_label": "scored",
3660
+ "scored": true,
3661
  "proxy_scored": false,
3662
+ "raw": 0.0001206030150753769,
3663
+ "raw_text": "0.0001",
3664
+ "normalized_score": 0.0001206030150753769,
3665
  "metric_key": "macro_f1",
3666
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
3667
  "scope": "multi_episode_128_metadata_baseline",
3668
+ "reason": null
3669
  },
3670
  {
3671
  "task_number": 14,
 
3673
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3674
  "series_id": "metadata128_neural_mlp",
3675
  "method": "128ep Metadata NN",
3676
+ "status": "scored",
3677
+ "status_label": "scored",
3678
+ "scored": true,
3679
  "proxy_scored": false,
3680
+ "raw": 2.086049543676662e-05,
3681
+ "raw_text": "0.0000",
3682
+ "normalized_score": 2.086049543676662e-05,
3683
  "metric_key": "macro_f1",
3684
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
3685
  "scope": "multi_episode_128_metadata_baseline",
3686
+ "reason": null
3687
  },
3688
  {
3689
  "task_number": 14,
 
3781
  "task_label": "Interaction Text Prediction",
3782
  "series_id": "metadata128_simple",
3783
  "method": "128ep Metadata Simple",
3784
+ "status": "unsupported_without_required_target",
3785
+ "status_label": "unsupported",
3786
  "scored": false,
3787
  "proxy_scored": false,
3788
  "raw": null,
3789
  "raw_text": "n/a",
3790
  "normalized_score": null,
3791
  "metric_key": "macro_f1",
3792
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
3793
  "scope": "multi_episode_128_metadata_baseline",
3794
+ "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
3795
  },
3796
  {
3797
  "task_number": 15,
 
3907
  "task_label": "Action-Object Relation Prediction",
3908
  "series_id": "metadata128_simple",
3909
  "method": "128ep Metadata Simple",
3910
+ "status": "scored",
3911
+ "status_label": "scored",
3912
+ "scored": true,
3913
  "proxy_scored": false,
3914
+ "raw": 0.0,
3915
+ "raw_text": "0.0000",
3916
+ "normalized_score": 0.0,
3917
  "metric_key": "macro_f1",
3918
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
3919
  "scope": "multi_episode_128_metadata_baseline",
3920
+ "reason": null
3921
  },
3922
  {
3923
  "task_number": 16,
 
3925
  "task_label": "Action-Object Relation Prediction",
3926
  "series_id": "metadata128_neural_mlp",
3927
  "method": "128ep Metadata NN",
3928
+ "status": "scored",
3929
+ "status_label": "scored",
3930
+ "scored": true,
3931
  "proxy_scored": false,
3932
+ "raw": 0.0,
3933
+ "raw_text": "0.0000",
3934
+ "normalized_score": 0.0,
3935
  "metric_key": "macro_f1",
3936
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
3937
  "scope": "multi_episode_128_metadata_baseline",
3938
+ "reason": null
3939
  },
3940
  {
3941
  "task_number": 16,
 
4033
  "task_label": "Future Object-Set Forecasting",
4034
  "series_id": "metadata128_simple",
4035
  "method": "128ep Metadata Simple",
4036
+ "status": "scored",
4037
+ "status_label": "scored",
4038
+ "scored": true,
4039
  "proxy_scored": false,
4040
+ "raw": 0.17656983343047333,
4041
+ "raw_text": "0.1766",
4042
+ "normalized_score": 0.17656983343047333,
4043
  "metric_key": "micro_f1",
4044
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
4045
  "scope": "multi_episode_128_metadata_baseline",
4046
+ "reason": null
4047
  },
4048
  {
4049
  "task_number": 17,
 
4051
  "task_label": "Future Object-Set Forecasting",
4052
  "series_id": "metadata128_neural_mlp",
4053
  "method": "128ep Metadata NN",
4054
+ "status": "scored",
4055
+ "status_label": "scored",
4056
+ "scored": true,
4057
  "proxy_scored": false,
4058
+ "raw": 0.17418550827844048,
4059
+ "raw_text": "0.1742",
4060
+ "normalized_score": 0.17418550827844048,
4061
  "metric_key": "micro_f1",
4062
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
4063
  "scope": "multi_episode_128_metadata_baseline",
4064
+ "reason": null
4065
  },
4066
  {
4067
  "task_number": 17,
 
4159
  "task_label": "IMU-to-Hand Pose Reconstruction",
4160
  "series_id": "metadata128_simple",
4161
  "method": "128ep Metadata Simple",
4162
+ "status": "unsupported_without_required_target",
4163
+ "status_label": "unsupported",
4164
  "scored": false,
4165
  "proxy_scored": false,
4166
  "raw": null,
4167
  "raw_text": "n/a",
4168
  "normalized_score": null,
4169
  "metric_key": "mae",
4170
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
4171
  "scope": "multi_episode_128_metadata_baseline",
4172
+ "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package"
4173
  },
4174
  {
4175
  "task_number": 18,
 
4285
  "task_label": "Camera-View Synchronization Retrieval",
4286
  "series_id": "metadata128_simple",
4287
  "method": "128ep Metadata Simple",
4288
+ "status": "unsupported_without_required_target",
4289
+ "status_label": "unsupported",
4290
  "scored": false,
4291
  "proxy_scored": false,
4292
  "raw": null,
4293
  "raw_text": "n/a",
4294
  "normalized_score": null,
4295
  "metric_key": "mrr",
4296
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
4297
  "scope": "multi_episode_128_metadata_baseline",
4298
+ "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
4299
  },
4300
  {
4301
  "task_number": 19,
 
4411
  "task_label": "Time-to-Next-Transition Regression",
4412
  "series_id": "metadata128_simple",
4413
  "method": "128ep Metadata Simple",
4414
+ "status": "scored",
4415
+ "status_label": "scored",
4416
+ "scored": true,
4417
  "proxy_scored": false,
4418
+ "raw": 624.8108520507812,
4419
+ "raw_text": "624.81",
4420
+ "normalized_score": 0.016864874132806403,
4421
  "metric_key": "mae",
4422
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
4423
  "scope": "multi_episode_128_metadata_baseline",
4424
+ "reason": null
4425
  },
4426
  {
4427
  "task_number": 20,
 
4429
  "task_label": "Time-to-Next-Transition Regression",
4430
  "series_id": "metadata128_neural_mlp",
4431
  "method": "128ep Metadata NN",
4432
+ "status": "scored",
4433
+ "status_label": "scored",
4434
+ "scored": true,
4435
  "proxy_scored": false,
4436
+ "raw": 41.4664421081543,
4437
+ "raw_text": "41.47",
4438
+ "normalized_score": 0.25411768748242325,
4439
  "metric_key": "mae",
4440
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
4441
  "scope": "multi_episode_128_metadata_baseline",
4442
+ "reason": null
4443
  },
4444
  {
4445
  "task_number": 20,
data/mirror_parity.json CHANGED
The diff for this file is too large to render. See raw diff
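A note on the `normalized_score` values in the radar hunks earlier in this diff: for the F1 cells the normalized score equals `raw`, but for the two scored `mae` cells of Time-to-Next-Transition Regression it does not. Both cells are consistent with dividing one shared reference error by the raw MAE (624.81 × 0.016865 ≈ 41.47 × 0.254118 ≈ 10.537). The sketch below only reproduces that arithmetic. The helper name `normalize_lower_is_better` and the reading of the reference as a best-case MAE are assumptions; the diff does not show how the reference is actually chosen.

```python
def normalize_lower_is_better(raw, reference):
    """Map a lower-is-better error onto a higher-is-better scale.

    Hypothetical helper: `reference` is assumed to be a shared reference
    error (for example, the best MAE among the compared methods).
    """
    if raw is None:
        return None  # unsupported cells keep a null score
    if raw <= 0:
        raise ValueError("raw error must be positive")
    return reference / raw


# Reference implied by the metadata128_neural_mlp cell: raw * normalized_score.
reference = 41.4664421081543 * 0.25411768748242325

# Apply the same reference to the metadata128_simple cell.
simple_score = normalize_lower_is_better(624.8108520507812, reference)
print(round(reference, 4), round(simple_score, 6))
```

If the reference really is the best MAE in the matrix, the best method would score exactly 1.0 under this scheme; that is not verifiable from the cells shown here.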
 
data/public_surface_qa.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Public Project Surface",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:41:42+00:00",
5
  "scope": "Repo README, GitHub Pages HTML, Hugging Face Space card, artifact dataset card, and model card.",
6
  "checks": [
7
  {
@@ -18,7 +18,7 @@
18
  "website_integrity": {
19
  "exists": true,
20
  "status": "pass",
21
- "generated_at_utc": "2026-06-18T11:18:05+00:00"
22
  },
23
  "rendered_site_check": {
24
  "exists": true,
@@ -43,12 +43,12 @@
43
  "publication_package": {
44
  "exists": true,
45
  "status": "pass",
46
- "generated_at_utc": "2026-06-18T11:18:57+00:00"
47
  },
48
  "mirror_parity": {
49
  "exists": true,
50
  "status": "pass",
51
- "generated_at_utc": "2026-06-18T11:21:54+00:00"
52
  }
53
  },
54
  "failures": {}
 
1
  {
2
  "title": "Ropedia Xperience-10M Public Project Surface",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:09:24+00:00",
5
  "scope": "Repo README, GitHub Pages HTML, Hugging Face Space card, artifact dataset card, and model card.",
6
  "checks": [
7
  {
 
18
  "website_integrity": {
19
  "exists": true,
20
  "status": "pass",
21
+ "generated_at_utc": "2026-06-18T11:41:43+00:00"
22
  },
23
  "rendered_site_check": {
24
  "exists": true,
 
43
  "publication_package": {
44
  "exists": true,
45
  "status": "pass",
46
+ "generated_at_utc": "2026-06-18T11:42:48+00:00"
47
  },
48
  "mirror_parity": {
49
  "exists": true,
50
  "status": "pass",
51
+ "generated_at_utc": "2026-06-18T11:43:59+00:00"
52
  }
53
  },
54
  "failures": {}
data/publication_audit.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-18T11:42:48+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
@@ -215,8 +215,8 @@
215
  "github_repo": {
216
  "root": "repo",
217
  "exists": true,
218
- "file_count": 1276,
219
- "text_file_count": 1072,
220
  "largest_file": {
221
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
222
  "bytes": 55702978
@@ -226,8 +226,8 @@
226
  "hf_space_bundle": {
227
  "root": "hf_publish/space",
228
  "exists": true,
229
- "file_count": 1058,
230
- "text_file_count": 879,
231
  "largest_file": {
232
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
233
  "bytes": 135591061
@@ -237,8 +237,8 @@
237
  "hf_artifact_bundle": {
238
  "root": "hf_publish/artifacts",
239
  "exists": true,
240
- "file_count": 2537,
241
- "text_file_count": 1085,
242
  "largest_file": {
243
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
244
  "bytes": 135591061
@@ -248,8 +248,8 @@
248
  "hf_model_bundle": {
249
  "root": "hf_publish/model",
250
  "exists": true,
251
- "file_count": 2956,
252
- "text_file_count": 1247,
253
  "largest_file": {
254
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
255
  "bytes": 135591061
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-18T12:10:47+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
 
215
  "github_repo": {
216
  "root": "repo",
217
  "exists": true,
218
+ "file_count": 1321,
219
+ "text_file_count": 1108,
220
  "largest_file": {
221
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
222
  "bytes": 55702978
 
226
  "hf_space_bundle": {
227
  "root": "hf_publish/space",
228
  "exists": true,
229
+ "file_count": 1103,
230
+ "text_file_count": 915,
231
  "largest_file": {
232
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
233
  "bytes": 135591061
 
237
  "hf_artifact_bundle": {
238
  "root": "hf_publish/artifacts",
239
  "exists": true,
240
+ "file_count": 2582,
241
+ "text_file_count": 1121,
242
  "largest_file": {
243
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
244
  "bytes": 135591061
 
248
  "hf_model_bundle": {
249
  "root": "hf_publish/model",
250
  "exists": true,
251
+ "file_count": 3001,
252
+ "text_file_count": 1283,
253
  "largest_file": {
254
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
255
  "bytes": 135591061
data/quality_gates.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Release Checks",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:20:56+00:00",
5
  "rule": "A release is current when the automated reports pass and the live GitHub/Hugging Face mirrors are verified after publishing.",
6
  "automated_gates": [
7
  {
 
1
  {
2
  "title": "Ropedia Xperience-10M Release Checks",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:09:24+00:00",
5
  "rule": "A release is current when the automated reports pass and the live GitHub/Hugging Face mirrors are verified after publishing.",
6
  "automated_gates": [
7
  {
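The `rule` string in the hunk above combines two conditions: every automated report passes, and each live mirror has been verified after the publish step. A minimal sketch of that check follows, assuming the statuses and timestamps have already been loaded. The function name, argument shapes, and sample values are all hypothetical and are not taken from the repository's scripts.

```python
from datetime import datetime, timezone


def release_is_current(report_statuses, published_at, mirror_verified_at):
    """Hypothetical check for the release rule.

    report_statuses: mapping of report name -> status string.
    published_at: timezone-aware datetime of the publish step.
    mirror_verified_at: mapping of mirror name -> datetime of its last
        live verification, or None if it was never verified.
    """
    # Condition 1: at least one report exists and all of them pass.
    reports_pass = bool(report_statuses) and all(
        status == "pass" for status in report_statuses.values()
    )
    # Condition 2: every mirror was verified at or after publishing.
    mirrors_verified = bool(mirror_verified_at) and all(
        ts is not None and ts >= published_at
        for ts in mirror_verified_at.values()
    )
    return reports_pass and mirrors_verified


# Illustrative values only; not read from the audit files.
published = datetime(2026, 6, 18, 12, 0, tzinfo=timezone.utc)
reports = {"quality_gates": "pass", "publication_audit": "pass"}
mirrors = {
    "github": datetime(2026, 6, 18, 12, 5, tzinfo=timezone.utc),
    "huggingface": None,  # not yet verified after publishing
}
print(release_is_current(reports, published, mirrors))
```

Treating an unverified mirror as "not current" keeps the check conservative: a passing report alone never marks a release as current.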
data/scope_claims_audit.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-18T11:18:06+00:00",
4
  "summary": {
5
  "qwen3_omni_verified_diagnostic_pilot": true,
6
  "dataset_manifest_num_episodes": 119,
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-18T12:09:48+00:00",
4
  "summary": {
5
  "qwen3_omni_verified_diagnostic_pilot": true,
6
  "dataset_manifest_num_episodes": 119,
data/single_episode_task_model_radar.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Single-Episode 20-Task Radar",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:15:02+00:00",
5
  "description": "Minimal and Neural MLP baselines on the one public sample episode, both scored on all 20 task contracts.",
6
  "task_count": 20,
7
  "method_count": 2,
 
1
  {
2
  "title": "Single-Episode 20-Task Radar",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "description": "Minimal and Neural MLP baselines on the one public sample episode, both scored on all 20 task contracts.",
6
  "task_count": 20,
7
  "method_count": 2,
data/source_alignment_audit.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Source Alignment Note",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:18:04+00:00",
5
  "alignment_json": "docs/data/xperience10m_dataset_card_alignment.json",
6
  "alignment_summary": {
7
  "full_dataset_repo": "ropedia-ai/xperience-10m",
 
1
  {
2
  "title": "Ropedia Xperience-10M Source Alignment Note",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:09:45+00:00",
5
  "alignment_json": "docs/data/xperience10m_dataset_card_alignment.json",
6
  "alignment_summary": {
7
  "full_dataset_repo": "ropedia-ai/xperience-10m",
data/task_method_20_gap_audit.json CHANGED
@@ -1,10 +1,10 @@
1
  {
2
- "generated_at_utc": "2026-06-18T11:15:34+00:00",
3
  "immediate_actions": [
4
  {
5
  "artifact": "docs/data/task_method_20_gap_audit.json",
6
  "id": "gap_audit",
7
- "purpose": "Keep the 57 scoreless cells visible and reproducible."
8
  },
9
  {
10
  "artifact": "scripts/omni/score_model_output_probes.py",
@@ -50,11 +50,12 @@
50
  "proxy_scored_task_count": 0,
51
  "result_record_count": 20,
52
  "scope": "128 selected episodes, JSONL metadata/text only",
53
- "scored_task_count": 8,
54
- "scoreless_task_count": 12,
55
  "status_counts": {
56
- "not_supported_by_metadata_only_package": 12,
57
- "scored": 8
 
58
  }
59
  },
60
  "metadata128_simple": {
@@ -63,12 +64,11 @@
63
  "proxy_scored_task_count": 0,
64
  "result_record_count": 20,
65
  "scope": "128 selected episodes, JSONL metadata/text only",
66
- "scored_task_count": 8,
67
- "scoreless_task_count": 12,
68
  "status_counts": {
69
- "not_supported_by_metadata_only_package": 8,
70
- "scored": 8,
71
- "unsupported_without_required_target": 4
72
  }
73
  },
74
  "minimal": {
@@ -138,18 +138,25 @@
138
  "missing_by_method": {
139
  "cosmos3_nano_future_window": 15,
140
  "cosmos3_super_reasoner": 13,
141
- "metadata128_neural_mlp": 12,
142
- "metadata128_simple": 12,
143
  "qwen3_omni_v6_lora": 5
144
  },
145
  "missing_by_status": {
146
  "not_evaluated_in_verified_package": 33,
147
- "not_supported_by_metadata_only_package": 20,
148
- "unsupported_without_required_target": 4
149
  },
150
  "missing_by_task": {
 
  "02 Procedure Step Recognition": [
152
- "cosmos3_nano_future_window"
153
  ],
154
  "05 Hand Trajectory Forecasting": [
155
  "cosmos3_nano_future_window",
@@ -190,14 +197,12 @@
190
  "13 Long-Horizon Next-Action Forecasting": [
191
  "cosmos3_nano_future_window",
192
  "cosmos3_super_reasoner",
193
- "metadata128_neural_mlp",
194
- "metadata128_simple"
195
  ],
196
  "14 Long-Horizon Next-Subtask Forecasting": [
197
  "cosmos3_nano_future_window",
198
  "cosmos3_super_reasoner",
199
- "metadata128_neural_mlp",
200
- "metadata128_simple"
201
  ],
202
  "15 Interaction Text Prediction": [
203
  "cosmos3_nano_future_window",
@@ -208,14 +213,11 @@
208
  ],
209
  "16 Action-Object Relation Prediction": [
210
  "cosmos3_nano_future_window",
211
- "metadata128_neural_mlp",
212
- "metadata128_simple"
213
  ],
214
  "17 Future Object-Set Forecasting": [
215
  "cosmos3_nano_future_window",
216
- "cosmos3_super_reasoner",
217
- "metadata128_neural_mlp",
218
- "metadata128_simple"
219
  ],
220
  "18 IMU-to-Hand Pose Reconstruction": [
221
  "cosmos3_nano_future_window",
@@ -233,12 +235,36 @@
233
  ],
234
  "20 Time-to-Next-Transition Regression": [
235
  "cosmos3_nano_future_window",
236
- "cosmos3_super_reasoner",
237
- "metadata128_neural_mlp",
238
- "metadata128_simple"
239
  ]
240
  },
241
  "missing_records": [
242
  {
243
  "method": "Cosmos3-Nano Future Window",
244
  "metric_key": "macro_f1",
@@ -252,6 +278,19 @@
252
  "task_label": "Procedure Step Recognition",
253
  "task_number": 2
254
  },
255
  {
256
  "method": "128ep Metadata Simple",
257
  "metric_key": "mpjpe",
@@ -538,28 +577,15 @@
538
  "task_label": "Multimodal Synchronization Detection",
539
  "task_number": 12
540
  },
541
- {
542
- "method": "128ep Metadata Simple",
543
- "metric_key": "macro_f1",
544
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
545
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
546
- "scope": "multi_episode_128_metadata_baseline",
547
- "series_id": "metadata128_simple",
548
- "status": "not_supported_by_metadata_only_package",
549
- "status_label": "not supported",
550
- "task_id": "long_horizon_next_action",
551
- "task_label": "Long-Horizon Next-Action Forecasting",
552
- "task_number": 13
553
- },
554
  {
555
  "method": "128ep Metadata NN",
556
  "metric_key": "macro_f1",
557
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
558
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
559
  "scope": "multi_episode_128_metadata_baseline",
560
  "series_id": "metadata128_neural_mlp",
561
- "status": "not_supported_by_metadata_only_package",
562
- "status_label": "not supported",
563
  "task_id": "long_horizon_next_action",
564
  "task_label": "Long-Horizon Next-Action Forecasting",
565
  "task_number": 13
@@ -590,28 +616,15 @@
590
  "task_label": "Long-Horizon Next-Action Forecasting",
591
  "task_number": 13
592
  },
593
- {
594
- "method": "128ep Metadata Simple",
595
- "metric_key": "macro_f1",
596
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
597
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
598
- "scope": "multi_episode_128_metadata_baseline",
599
- "series_id": "metadata128_simple",
600
- "status": "not_supported_by_metadata_only_package",
601
- "status_label": "not supported",
602
- "task_id": "next_subtask_forecast",
603
- "task_label": "Long-Horizon Next-Subtask Forecasting",
604
- "task_number": 14
605
- },
606
  {
607
  "method": "128ep Metadata NN",
608
  "metric_key": "macro_f1",
609
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
610
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
611
  "scope": "multi_episode_128_metadata_baseline",
612
  "series_id": "metadata128_neural_mlp",
613
- "status": "not_supported_by_metadata_only_package",
614
- "status_label": "not supported",
615
  "task_id": "next_subtask_forecast",
616
  "task_label": "Long-Horizon Next-Subtask Forecasting",
617
  "task_number": 14
@@ -645,12 +658,12 @@
645
  {
646
  "method": "128ep Metadata Simple",
647
  "metric_key": "macro_f1",
648
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
649
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
650
  "scope": "multi_episode_128_metadata_baseline",
651
  "series_id": "metadata128_simple",
652
- "status": "not_supported_by_metadata_only_package",
653
- "status_label": "not supported",
654
  "task_id": "interaction_text_prediction",
655
  "task_label": "Interaction Text Prediction",
656
  "task_number": 15
@@ -707,28 +720,15 @@
707
  "task_label": "Interaction Text Prediction",
708
  "task_number": 15
709
  },
710
- {
711
- "method": "128ep Metadata Simple",
712
- "metric_key": "macro_f1",
713
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
714
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
715
- "scope": "multi_episode_128_metadata_baseline",
716
- "series_id": "metadata128_simple",
717
- "status": "not_supported_by_metadata_only_package",
718
- "status_label": "not supported",
719
- "task_id": "action_object_relation",
720
- "task_label": "Action-Object Relation Prediction",
721
- "task_number": 16
722
- },
723
  {
724
  "method": "128ep Metadata NN",
725
  "metric_key": "macro_f1",
726
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
727
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
728
  "scope": "multi_episode_128_metadata_baseline",
729
  "series_id": "metadata128_neural_mlp",
730
- "status": "not_supported_by_metadata_only_package",
731
- "status_label": "not supported",
732
  "task_id": "action_object_relation",
733
  "task_label": "Action-Object Relation Prediction",
734
  "task_number": 16
@@ -746,32 +746,6 @@
746
  "task_label": "Action-Object Relation Prediction",
747
  "task_number": 16
748
  },
749
- {
750
- "method": "128ep Metadata Simple",
751
- "metric_key": "micro_f1",
752
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
753
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
754
- "scope": "multi_episode_128_metadata_baseline",
755
- "series_id": "metadata128_simple",
756
- "status": "not_supported_by_metadata_only_package",
757
- "status_label": "not supported",
758
- "task_id": "object_set_forecast",
759
- "task_label": "Future Object-Set Forecasting",
760
- "task_number": 17
761
- },
762
- {
763
- "method": "128ep Metadata NN",
764
- "metric_key": "micro_f1",
765
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
766
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
767
- "scope": "multi_episode_128_metadata_baseline",
768
- "series_id": "metadata128_neural_mlp",
769
- "status": "not_supported_by_metadata_only_package",
770
- "status_label": "not supported",
771
- "task_id": "object_set_forecast",
772
- "task_label": "Future Object-Set Forecasting",
773
- "task_number": 17
774
- },
775
  {
776
  "method": "Cosmos3-Super Reasoner",
777
  "metric_key": "micro_f1",
@@ -801,12 +775,12 @@
801
  {
802
  "method": "128ep Metadata Simple",
803
  "metric_key": "mae",
804
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
805
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
806
  "scope": "multi_episode_128_metadata_baseline",
807
  "series_id": "metadata128_simple",
808
- "status": "not_supported_by_metadata_only_package",
809
- "status_label": "not supported",
810
  "task_id": "imu_to_hand_pose",
811
  "task_label": "IMU-to-Hand Pose Reconstruction",
812
  "task_number": 18
@@ -866,12 +840,12 @@
866
  {
867
  "method": "128ep Metadata Simple",
868
  "metric_key": "mrr",
869
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
870
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
871
  "scope": "multi_episode_128_metadata_baseline",
872
  "series_id": "metadata128_simple",
873
- "status": "not_supported_by_metadata_only_package",
874
- "status_label": "not supported",
875
  "task_id": "camera_view_sync_retrieval",
876
  "task_label": "Camera-View Synchronization Retrieval",
877
  "task_number": 19
@@ -928,32 +902,6 @@
928
  "task_label": "Camera-View Synchronization Retrieval",
929
  "task_number": 19
930
  },
931
- {
932
- "method": "128ep Metadata Simple",
933
- "metric_key": "mae",
934
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
935
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
936
- "scope": "multi_episode_128_metadata_baseline",
937
- "series_id": "metadata128_simple",
938
- "status": "not_supported_by_metadata_only_package",
939
- "status_label": "not supported",
940
- "task_id": "time_to_transition",
941
- "task_label": "Time-to-Next-Transition Regression",
942
- "task_number": 20
943
- },
944
- {
945
- "method": "128ep Metadata NN",
946
- "metric_key": "mae",
947
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
948
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
949
- "scope": "multi_episode_128_metadata_baseline",
950
- "series_id": "metadata128_neural_mlp",
951
- "status": "not_supported_by_metadata_only_package",
952
- "status_label": "not supported",
953
- "task_id": "time_to_transition",
954
- "task_label": "Time-to-Next-Transition Regression",
955
- "task_number": 20
956
- },
957
  {
958
  "method": "Cosmos3-Super Reasoner",
959
  "metric_key": "mae",
@@ -1027,8 +975,8 @@
1027
  "method_count": 9,
1028
  "method_task_record_count": 180,
1029
  "proxy_scored_method_task_count": 4,
1030
- "scored_method_task_count": 123,
1031
- "scoreless_method_task_count": 57,
1032
  "task_count": 20
1033
  },
1034
  "source_matrix": "docs/data/task_method_20_result_matrix.json",
 
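The counts on the pre-change side of this file can be cross-checked against each other: 9 methods × 20 tasks gives 180 records, 123 scored plus 57 scoreless gives 180, and both the per-method and per-status breakdowns sum to 57. A small sketch of that consistency check follows, using only the pre-change values shown above. The function name `check_gap_audit` and the way the three dictionaries are passed in are assumptions, not the audit script's actual interface.

```python
def check_gap_audit(totals, missing_by_method, missing_by_status):
    """Return a list of inconsistencies; an empty list means consistent."""
    problems = []

    # Every method is expected to have one record per task.
    expected_records = totals["method_count"] * totals["task_count"]
    if totals["method_task_record_count"] != expected_records:
        problems.append("record count != method_count * task_count")

    # Scored and scoreless cells must partition the records.
    scored = totals["scored_method_task_count"]
    scoreless = totals["scoreless_method_task_count"]
    if scored + scoreless != totals["method_task_record_count"]:
        problems.append("scored + scoreless != record count")

    # Each breakdown of the missing cells must add up to the scoreless total.
    for name, breakdown in (
        ("missing_by_method", missing_by_method),
        ("missing_by_status", missing_by_status),
    ):
        if sum(breakdown.values()) != scoreless:
            problems.append(f"{name} does not sum to scoreless count")

    return problems


# Pre-change values copied from the removed lines of this diff.
totals = {
    "method_count": 9,
    "task_count": 20,
    "method_task_record_count": 180,
    "scored_method_task_count": 123,
    "scoreless_method_task_count": 57,
}
missing_by_method = {
    "cosmos3_nano_future_window": 15,
    "cosmos3_super_reasoner": 13,
    "metadata128_neural_mlp": 12,
    "metadata128_simple": 12,
    "qwen3_omni_v6_lora": 5,
}
missing_by_status = {
    "not_evaluated_in_verified_package": 33,
    "not_supported_by_metadata_only_package": 20,
    "unsupported_without_required_target": 4,
}
print(check_gap_audit(totals, missing_by_method, missing_by_status))
```

The same function can be pointed at the post-change values; the breakdowns shown on the new side (15 + 13 + 13 + 7 + 5 and 33 + 7 + 13) both sum to the 53 scoreless cells named in the updated `purpose` string.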
1
  {
2
+ "generated_at_utc": "2026-06-18T12:07:14+00:00",
3
  "immediate_actions": [
4
  {
5
  "artifact": "docs/data/task_method_20_gap_audit.json",
6
  "id": "gap_audit",
7
+ "purpose": "Keep the 53 scoreless cells visible and reproducible."
8
  },
9
  {
10
  "artifact": "scripts/omni/score_model_output_probes.py",
 
50
  "proxy_scored_task_count": 0,
51
  "result_record_count": 20,
52
  "scope": "128 selected episodes, JSONL metadata/text only",
53
+ "scored_task_count": 7,
54
+ "scoreless_task_count": 13,
55
  "status_counts": {
56
+ "not_supported_by_metadata_only_package": 7,
57
+ "scored": 7,
58
+ "unsupported_without_required_target": 6
59
  }
60
  },
61
  "metadata128_simple": {
 
64
  "proxy_scored_task_count": 0,
65
  "result_record_count": 20,
66
  "scope": "128 selected episodes, JSONL metadata/text only",
67
+ "scored_task_count": 13,
68
+ "scoreless_task_count": 7,
69
  "status_counts": {
70
+ "scored": 13,
71
+ "unsupported_without_required_target": 7
 
72
  }
73
  },
74
  "minimal": {
 
138
  "missing_by_method": {
139
  "cosmos3_nano_future_window": 15,
140
  "cosmos3_super_reasoner": 13,
141
+ "metadata128_neural_mlp": 13,
142
+ "metadata128_simple": 7,
143
  "qwen3_omni_v6_lora": 5
144
  },
145
  "missing_by_status": {
146
  "not_evaluated_in_verified_package": 33,
147
+ "not_supported_by_metadata_only_package": 7,
148
+ "unsupported_without_required_target": 13
149
  },
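The per-method and per-status tallies above should both add up to the scoreless total reported in this file's summary (53 after this commit). The sketch below is a minimal consistency check using only numbers copied from this diff; the function name `audit_problems` and the dictionary layout are illustrative assumptions, not part of the repository's tooling.

```python
# Values copied from the gap-audit JSON shown in this diff.
missing_by_method = {
    "cosmos3_nano_future_window": 15,
    "cosmos3_super_reasoner": 13,
    "metadata128_neural_mlp": 13,
    "metadata128_simple": 7,
    "qwen3_omni_v6_lora": 5,
}
missing_by_status = {
    "not_evaluated_in_verified_package": 33,
    "not_supported_by_metadata_only_package": 7,
    "unsupported_without_required_target": 13,
}
summary = {
    "method_count": 9,
    "task_count": 20,
    "method_task_record_count": 180,
    "scored_method_task_count": 127,
    "scoreless_method_task_count": 53,
}


def audit_problems(summary, by_method, by_status):
    """Return a list of human-readable tally mismatches (empty if consistent).

    Hypothetical helper: the checks encode assumptions about how the counts
    relate, inferred from the numbers in this diff.
    """
    problems = []
    total = summary["method_task_record_count"]
    scored = summary["scored_method_task_count"]
    scoreless = summary["scoreless_method_task_count"]

    if summary["method_count"] * summary["task_count"] != total:
        problems.append("method_count * task_count != method_task_record_count")
    if scored + scoreless != total:
        problems.append("scored + scoreless != method_task_record_count")
    if sum(by_method.values()) != scoreless:
        problems.append("missing_by_method does not sum to scoreless count")
    if sum(by_status.values()) != scoreless:
        problems.append("missing_by_status does not sum to scoreless count")
    return problems


if __name__ == "__main__":
    issues = audit_problems(summary, missing_by_method, missing_by_status)
    print(issues or "all tallies agree")
```

A check of this kind is cheap to run after each regeneration, since the audit and the matrix are rewritten together in every commit.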
150
  "missing_by_task": {
151
+ "01 Action Recognition": [
152
+ "metadata128_neural_mlp"
153
+ ],
154
  "02 Procedure Step Recognition": [
155
+ "cosmos3_nano_future_window",
156
+ "metadata128_neural_mlp"
157
+ ],
158
+ "04 Next-Action Prediction": [
159
+ "metadata128_neural_mlp"
160
  ],
161
  "05 Hand Trajectory Forecasting": [
162
  "cosmos3_nano_future_window",
 
197
  "13 Long-Horizon Next-Action Forecasting": [
198
  "cosmos3_nano_future_window",
199
  "cosmos3_super_reasoner",
200
+ "metadata128_neural_mlp"
 
201
  ],
202
  "14 Long-Horizon Next-Subtask Forecasting": [
203
  "cosmos3_nano_future_window",
204
  "cosmos3_super_reasoner",
205
+ "metadata128_neural_mlp"
 
206
  ],
207
  "15 Interaction Text Prediction": [
208
  "cosmos3_nano_future_window",
 
213
  ],
214
  "16 Action-Object Relation Prediction": [
215
  "cosmos3_nano_future_window",
216
+ "metadata128_neural_mlp"
 
217
  ],
218
  "17 Future Object-Set Forecasting": [
219
  "cosmos3_nano_future_window",
220
+ "cosmos3_super_reasoner"
 
 
221
  ],
222
  "18 IMU-to-Hand Pose Reconstruction": [
223
  "cosmos3_nano_future_window",
 
235
  ],
236
  "20 Time-to-Next-Transition Regression": [
237
  "cosmos3_nano_future_window",
238
+ "cosmos3_super_reasoner"
 
 
239
  ]
240
  },
241
  "missing_records": [
242
+ {
243
+ "method": "128ep Metadata NN",
244
+ "metric_key": "macro_f1",
245
+ "reason": "train class count 896 exceeds --max-neural-classes 512",
246
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
247
+ "scope": "multi_episode_128_metadata_baseline",
248
+ "series_id": "metadata128_neural_mlp",
249
+ "status": "unsupported_without_required_target",
250
+ "status_label": "unsupported",
251
+ "task_id": "timeline_action",
252
+ "task_label": "Action Recognition",
253
+ "task_number": 1
254
+ },
255
+ {
256
+ "method": "128ep Metadata NN",
257
+ "metric_key": "macro_f1",
258
+ "reason": "train class count 652 exceeds --max-neural-classes 512",
259
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
260
+ "scope": "multi_episode_128_metadata_baseline",
261
+ "series_id": "metadata128_neural_mlp",
262
+ "status": "unsupported_without_required_target",
263
+ "status_label": "unsupported",
264
+ "task_id": "timeline_subtask",
265
+ "task_label": "Procedure Step Recognition",
266
+ "task_number": 2
267
+ },
268
  {
269
  "method": "Cosmos3-Nano Future Window",
270
  "metric_key": "macro_f1",
 
278
  "task_label": "Procedure Step Recognition",
279
  "task_number": 2
280
  },
281
+ {
282
+ "method": "128ep Metadata NN",
283
+ "metric_key": "macro_f1",
284
+ "reason": "train class count 891 exceeds --max-neural-classes 512",
285
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
286
+ "scope": "multi_episode_128_metadata_baseline",
287
+ "series_id": "metadata128_neural_mlp",
288
+ "status": "unsupported_without_required_target",
289
+ "status_label": "unsupported",
290
+ "task_id": "next_action",
291
+ "task_label": "Next-Action Prediction",
292
+ "task_number": 4
293
+ },
294
  {
295
  "method": "128ep Metadata Simple",
296
  "metric_key": "mpjpe",
 
577
  "task_label": "Multimodal Synchronization Detection",
578
  "task_number": 12
579
  },
580
  {
581
  "method": "128ep Metadata NN",
582
  "metric_key": "macro_f1",
583
+ "reason": "train class count 887 exceeds --max-neural-classes 512",
584
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
585
  "scope": "multi_episode_128_metadata_baseline",
586
  "series_id": "metadata128_neural_mlp",
587
+ "status": "unsupported_without_required_target",
588
+ "status_label": "unsupported",
589
  "task_id": "long_horizon_next_action",
590
  "task_label": "Long-Horizon Next-Action Forecasting",
591
  "task_number": 13
 
616
  "task_label": "Long-Horizon Next-Action Forecasting",
617
  "task_number": 13
618
  },
619
  {
620
  "method": "128ep Metadata NN",
621
  "metric_key": "macro_f1",
622
+ "reason": "train class count 651 exceeds --max-neural-classes 512",
623
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
624
  "scope": "multi_episode_128_metadata_baseline",
625
  "series_id": "metadata128_neural_mlp",
626
+ "status": "unsupported_without_required_target",
627
+ "status_label": "unsupported",
628
  "task_id": "next_subtask_forecast",
629
  "task_label": "Long-Horizon Next-Subtask Forecasting",
630
  "task_number": 14
 
658
  {
659
  "method": "128ep Metadata Simple",
660
  "metric_key": "macro_f1",
661
+ "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
662
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
663
  "scope": "multi_episode_128_metadata_baseline",
664
  "series_id": "metadata128_simple",
665
+ "status": "unsupported_without_required_target",
666
+ "status_label": "unsupported",
667
  "task_id": "interaction_text_prediction",
668
  "task_label": "Interaction Text Prediction",
669
  "task_number": 15
 
720
  "task_label": "Interaction Text Prediction",
721
  "task_number": 15
722
  },
723
  {
724
  "method": "128ep Metadata NN",
725
  "metric_key": "macro_f1",
726
+ "reason": "train class count 3058 exceeds --max-neural-classes 512",
727
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
728
  "scope": "multi_episode_128_metadata_baseline",
729
  "series_id": "metadata128_neural_mlp",
730
+ "status": "unsupported_without_required_target",
731
+ "status_label": "unsupported",
732
  "task_id": "action_object_relation",
733
  "task_label": "Action-Object Relation Prediction",
734
  "task_number": 16
 
746
  "task_label": "Action-Object Relation Prediction",
747
  "task_number": 16
748
  },
749
  {
750
  "method": "Cosmos3-Super Reasoner",
751
  "metric_key": "micro_f1",
 
775
  {
776
  "method": "128ep Metadata Simple",
777
  "metric_key": "mae",
778
+ "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package",
779
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
780
  "scope": "multi_episode_128_metadata_baseline",
781
  "series_id": "metadata128_simple",
782
+ "status": "unsupported_without_required_target",
783
+ "status_label": "unsupported",
784
  "task_id": "imu_to_hand_pose",
785
  "task_label": "IMU-to-Hand Pose Reconstruction",
786
  "task_number": 18
 
840
  {
841
  "method": "128ep Metadata Simple",
842
  "metric_key": "mrr",
843
+ "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
844
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
845
  "scope": "multi_episode_128_metadata_baseline",
846
  "series_id": "metadata128_simple",
847
+ "status": "unsupported_without_required_target",
848
+ "status_label": "unsupported",
849
  "task_id": "camera_view_sync_retrieval",
850
  "task_label": "Camera-View Synchronization Retrieval",
851
  "task_number": 19
 
902
  "task_label": "Camera-View Synchronization Retrieval",
903
  "task_number": 19
904
  },
905
  {
906
  "method": "Cosmos3-Super Reasoner",
907
  "metric_key": "mae",
 
975
  "method_count": 9,
976
  "method_task_record_count": 180,
977
  "proxy_scored_method_task_count": 4,
978
+ "scored_method_task_count": 127,
979
+ "scoreless_method_task_count": 53,
980
  "task_count": 20
981
  },
982
  "source_matrix": "docs/data/task_method_20_result_matrix.json",
data/task_method_20_result_matrix.json CHANGED
@@ -1,11 +1,11 @@
1
  {
2
  "title": "Task Method 20-Result Matrix",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:15:02+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
- "scored_method_task_count": 123,
9
  "series": [
10
  {
11
  "id": "minimal",
@@ -64,18 +64,17 @@
64
  "method_detail": "128-episode JSONL metadata/text simple baselines.",
65
  "plotted_as": "colored point overlay",
66
  "result_record_count": 20,
67
- "scored_task_count": 8,
68
- "covered_task_count": 8,
69
  "proxy_scored_task_count": 0,
70
- "scoreless_task_count": 12,
71
- "unsupported_task_count": 12,
72
  "not_evaluated_task_count": 0,
73
  "status_counts": {
74
- "not_supported_by_metadata_only_package": 8,
75
- "scored": 8,
76
- "unsupported_without_required_target": 4
77
  },
78
- "coverage_fraction": 0.4,
79
  "result_record_fraction": 1.0
80
  },
81
  {
@@ -89,17 +88,17 @@
89
  "method_detail": "128-episode JSONL metadata/text MLP baselines.",
90
  "plotted_as": "colored point overlay",
91
  "result_record_count": 20,
92
- "scored_task_count": 8,
93
- "covered_task_count": 8,
94
  "proxy_scored_task_count": 0,
95
- "scoreless_task_count": 12,
96
- "unsupported_task_count": 12,
97
  "not_evaluated_task_count": 0,
98
  "status_counts": {
99
- "not_supported_by_metadata_only_package": 12,
100
- "scored": 8
101
  },
102
- "coverage_fraction": 0.4,
103
  "result_record_fraction": 1.0
104
  },
105
  {
@@ -2210,17 +2209,17 @@
2210
  "task_label": "Long-Horizon Next-Action Forecasting",
2211
  "series_id": "metadata128_simple",
2212
  "method": "128ep Metadata Simple",
2213
- "status": "not_supported_by_metadata_only_package",
2214
- "status_label": "not supported",
2215
- "scored": false,
2216
  "proxy_scored": false,
2217
- "raw": null,
2218
- "raw_text": "n/a",
2219
- "normalized_score": null,
2220
  "metric_key": "macro_f1",
2221
- "source": null,
2222
  "scope": "multi_episode_128_metadata_baseline",
2223
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2224
  },
2225
  {
2226
  "task_number": 13,
@@ -2228,17 +2227,17 @@
2228
  "task_label": "Long-Horizon Next-Action Forecasting",
2229
  "series_id": "metadata128_neural_mlp",
2230
  "method": "128ep Metadata NN",
2231
- "status": "not_supported_by_metadata_only_package",
2232
- "status_label": "not supported",
2233
- "scored": false,
2234
  "proxy_scored": false,
2235
- "raw": null,
2236
- "raw_text": "n/a",
2237
- "normalized_score": null,
2238
  "metric_key": "macro_f1",
2239
- "source": null,
2240
  "scope": "multi_episode_128_metadata_baseline",
2241
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2242
  },
2243
  {
2244
  "task_number": 13,
@@ -2372,17 +2371,17 @@
2372
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2373
  "series_id": "metadata128_simple",
2374
  "method": "128ep Metadata Simple",
2375
- "status": "not_supported_by_metadata_only_package",
2376
- "status_label": "not supported",
2377
- "scored": false,
2378
  "proxy_scored": false,
2379
- "raw": null,
2380
- "raw_text": "n/a",
2381
- "normalized_score": null,
2382
  "metric_key": "macro_f1",
2383
- "source": null,
2384
  "scope": "multi_episode_128_metadata_baseline",
2385
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2386
  },
2387
  {
2388
  "task_number": 14,
@@ -2390,17 +2389,17 @@
2390
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2391
  "series_id": "metadata128_neural_mlp",
2392
  "method": "128ep Metadata NN",
2393
- "status": "not_supported_by_metadata_only_package",
2394
- "status_label": "not supported",
2395
- "scored": false,
2396
  "proxy_scored": false,
2397
- "raw": null,
2398
- "raw_text": "n/a",
2399
- "normalized_score": null,
2400
  "metric_key": "macro_f1",
2401
- "source": null,
2402
  "scope": "multi_episode_128_metadata_baseline",
2403
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2404
  },
2405
  {
2406
  "task_number": 14,
@@ -2534,17 +2533,17 @@
2534
  "task_label": "Interaction Text Prediction",
2535
  "series_id": "metadata128_simple",
2536
  "method": "128ep Metadata Simple",
2537
- "status": "not_supported_by_metadata_only_package",
2538
- "status_label": "not supported",
2539
  "scored": false,
2540
  "proxy_scored": false,
2541
  "raw": null,
2542
  "raw_text": "n/a",
2543
  "normalized_score": null,
2544
  "metric_key": "macro_f1",
2545
- "source": null,
2546
  "scope": "multi_episode_128_metadata_baseline",
2547
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2548
  },
2549
  {
2550
  "task_number": 15,
@@ -2696,17 +2695,17 @@
2696
  "task_label": "Action-Object Relation Prediction",
2697
  "series_id": "metadata128_simple",
2698
  "method": "128ep Metadata Simple",
2699
- "status": "not_supported_by_metadata_only_package",
2700
- "status_label": "not supported",
2701
- "scored": false,
2702
  "proxy_scored": false,
2703
- "raw": null,
2704
- "raw_text": "n/a",
2705
- "normalized_score": null,
2706
  "metric_key": "macro_f1",
2707
- "source": null,
2708
  "scope": "multi_episode_128_metadata_baseline",
2709
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2710
  },
2711
  {
2712
  "task_number": 16,
@@ -2714,17 +2713,17 @@
2714
  "task_label": "Action-Object Relation Prediction",
2715
  "series_id": "metadata128_neural_mlp",
2716
  "method": "128ep Metadata NN",
2717
- "status": "not_supported_by_metadata_only_package",
2718
- "status_label": "not supported",
2719
- "scored": false,
2720
  "proxy_scored": false,
2721
- "raw": null,
2722
- "raw_text": "n/a",
2723
- "normalized_score": null,
2724
  "metric_key": "macro_f1",
2725
- "source": null,
2726
  "scope": "multi_episode_128_metadata_baseline",
2727
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2728
  },
2729
  {
2730
  "task_number": 16,
@@ -2858,17 +2857,17 @@
2858
  "task_label": "Future Object-Set Forecasting",
2859
  "series_id": "metadata128_simple",
2860
  "method": "128ep Metadata Simple",
2861
- "status": "not_supported_by_metadata_only_package",
2862
- "status_label": "not supported",
2863
- "scored": false,
2864
  "proxy_scored": false,
2865
- "raw": null,
2866
- "raw_text": "n/a",
2867
- "normalized_score": null,
2868
  "metric_key": "micro_f1",
2869
- "source": null,
2870
  "scope": "multi_episode_128_metadata_baseline",
2871
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2872
  },
2873
  {
2874
  "task_number": 17,
@@ -2876,17 +2875,17 @@
2876
  "task_label": "Future Object-Set Forecasting",
2877
  "series_id": "metadata128_neural_mlp",
2878
  "method": "128ep Metadata NN",
2879
- "status": "not_supported_by_metadata_only_package",
2880
- "status_label": "not supported",
2881
- "scored": false,
2882
  "proxy_scored": false,
2883
- "raw": null,
2884
- "raw_text": "n/a",
2885
- "normalized_score": null,
2886
  "metric_key": "micro_f1",
2887
- "source": null,
2888
  "scope": "multi_episode_128_metadata_baseline",
2889
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2890
  },
2891
  {
2892
  "task_number": 17,
@@ -3020,17 +3019,17 @@
3020
  "task_label": "IMU-to-Hand Pose Reconstruction",
3021
  "series_id": "metadata128_simple",
3022
  "method": "128ep Metadata Simple",
3023
- "status": "not_supported_by_metadata_only_package",
3024
- "status_label": "not supported",
3025
  "scored": false,
3026
  "proxy_scored": false,
3027
  "raw": null,
3028
  "raw_text": "n/a",
3029
  "normalized_score": null,
3030
  "metric_key": "mae",
3031
- "source": null,
3032
  "scope": "multi_episode_128_metadata_baseline",
3033
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3034
  },
3035
  {
3036
  "task_number": 18,
@@ -3182,17 +3181,17 @@
3182
  "task_label": "Camera-View Synchronization Retrieval",
3183
  "series_id": "metadata128_simple",
3184
  "method": "128ep Metadata Simple",
3185
- "status": "not_supported_by_metadata_only_package",
3186
- "status_label": "not supported",
3187
  "scored": false,
3188
  "proxy_scored": false,
3189
  "raw": null,
3190
  "raw_text": "n/a",
3191
  "normalized_score": null,
3192
  "metric_key": "mrr",
3193
- "source": null,
3194
  "scope": "multi_episode_128_metadata_baseline",
3195
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3196
  },
3197
  {
3198
  "task_number": 19,
@@ -3344,17 +3343,17 @@
3344
  "task_label": "Time-to-Next-Transition Regression",
3345
  "series_id": "metadata128_simple",
3346
  "method": "128ep Metadata Simple",
3347
- "status": "not_supported_by_metadata_only_package",
3348
- "status_label": "not supported",
3349
- "scored": false,
3350
  "proxy_scored": false,
3351
- "raw": null,
3352
- "raw_text": "n/a",
3353
- "normalized_score": null,
3354
  "metric_key": "mae",
3355
- "source": null,
3356
  "scope": "multi_episode_128_metadata_baseline",
3357
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3358
  },
3359
  {
3360
  "task_number": 20,
@@ -3362,17 +3361,17 @@
3362
  "task_label": "Time-to-Next-Transition Regression",
3363
  "series_id": "metadata128_neural_mlp",
3364
  "method": "128ep Metadata NN",
3365
- "status": "not_supported_by_metadata_only_package",
3366
- "status_label": "not supported",
3367
- "scored": false,
3368
  "proxy_scored": false,
3369
- "raw": null,
3370
- "raw_text": "n/a",
3371
- "normalized_score": null,
3372
  "metric_key": "mae",
3373
- "source": null,
3374
  "scope": "multi_episode_128_metadata_baseline",
3375
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3376
  },
3377
  {
3378
  "task_number": 20,
 
1
  {
2
  "title": "Task Method 20-Result Matrix",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
+ "scored_method_task_count": 133,
9
  "series": [
10
  {
11
  "id": "minimal",
 
64
  "method_detail": "128-episode JSONL metadata/text simple baselines.",
65
  "plotted_as": "colored point overlay",
66
  "result_record_count": 20,
67
+ "scored_task_count": 13,
68
+ "covered_task_count": 13,
69
  "proxy_scored_task_count": 0,
70
+ "scoreless_task_count": 7,
71
+ "unsupported_task_count": 7,
72
  "not_evaluated_task_count": 0,
73
  "status_counts": {
74
+ "scored": 13,
75
+ "unsupported_without_required_target": 7
 
76
  },
77
+ "coverage_fraction": 0.65,
78
  "result_record_fraction": 1.0
79
  },
80
  {
 
88
  "method_detail": "128-episode JSONL metadata/text MLP baselines.",
89
  "plotted_as": "colored point overlay",
90
  "result_record_count": 20,
91
+ "scored_task_count": 13,
92
+ "covered_task_count": 13,
93
  "proxy_scored_task_count": 0,
94
+ "scoreless_task_count": 7,
95
+ "unsupported_task_count": 7,
96
  "not_evaluated_task_count": 0,
97
  "status_counts": {
98
+ "not_supported_by_metadata_only_package": 7,
99
+ "scored": 13
100
  },
101
+ "coverage_fraction": 0.65,
102
  "result_record_fraction": 1.0
103
  },
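The per-series fields above appear to be derived from `status_counts`: after this commit both metadata series report 13 scored tasks out of 20 records and a `coverage_fraction` of 0.65. The sketch below shows one way such a summary could be recomputed. The helper name `series_summary` is hypothetical, and the rule that coverage equals scored tasks divided by result records is an assumption inferred from the numbers (8/20 = 0.4 before, 13/20 = 0.65 after).

```python
def series_summary(status_counts):
    """Recompute per-series counts from a status_counts mapping.

    Assumption: every status other than "scored" counts as scoreless, and
    coverage_fraction = scored / total records.
    """
    total = sum(status_counts.values())
    scored = status_counts.get("scored", 0)
    return {
        "result_record_count": total,
        "scored_task_count": scored,
        "scoreless_task_count": total - scored,
        "coverage_fraction": scored / total if total else 0.0,
    }


if __name__ == "__main__":
    # status_counts for the two metadata series as they appear after this commit.
    simple = {"scored": 13, "unsupported_without_required_target": 7}
    neural = {"scored": 13, "not_supported_by_metadata_only_package": 7}
    print(series_summary(simple))
    print(series_summary(neural))
```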
104
  {
 
2209
  "task_label": "Long-Horizon Next-Action Forecasting",
2210
  "series_id": "metadata128_simple",
2211
  "method": "128ep Metadata Simple",
2212
+ "status": "scored",
2213
+ "status_label": "scored",
2214
+ "scored": true,
2215
  "proxy_scored": false,
2216
+ "raw": 0.004579592783699693,
2217
+ "raw_text": "0.0046",
2218
+ "normalized_score": 0.004579592783699693,
2219
  "metric_key": "macro_f1",
2220
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
2221
  "scope": "multi_episode_128_metadata_baseline",
2222
+ "reason": null
2223
  },
2224
  {
2225
  "task_number": 13,
 
2227
  "task_label": "Long-Horizon Next-Action Forecasting",
2228
  "series_id": "metadata128_neural_mlp",
2229
  "method": "128ep Metadata NN",
2230
+ "status": "scored",
2231
+ "status_label": "scored",
2232
+ "scored": true,
2233
  "proxy_scored": false,
2234
+ "raw": 0.0029821307969142615,
2235
+ "raw_text": "0.0030",
2236
+ "normalized_score": 0.0029821307969142615,
2237
  "metric_key": "macro_f1",
2238
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
2239
  "scope": "multi_episode_128_metadata_baseline",
2240
+ "reason": null
2241
  },
2242
  {
2243
  "task_number": 13,
 
2371
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2372
  "series_id": "metadata128_simple",
2373
  "method": "128ep Metadata Simple",
2374
+ "status": "scored",
2375
+ "status_label": "scored",
2376
+ "scored": true,
2377
  "proxy_scored": false,
2378
+ "raw": 0.0001206030150753769,
2379
+ "raw_text": "0.0001",
2380
+ "normalized_score": 0.0001206030150753769,
2381
  "metric_key": "macro_f1",
2382
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
2383
  "scope": "multi_episode_128_metadata_baseline",
2384
+ "reason": null
2385
  },
2386
  {
2387
  "task_number": 14,
 
2389
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2390
  "series_id": "metadata128_neural_mlp",
2391
  "method": "128ep Metadata NN",
2392
+ "status": "scored",
2393
+ "status_label": "scored",
2394
+ "scored": true,
2395
  "proxy_scored": false,
2396
+ "raw": 2.086049543676662e-05,
2397
+ "raw_text": "0.0000",
2398
+ "normalized_score": 2.086049543676662e-05,
2399
  "metric_key": "macro_f1",
2400
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
2401
  "scope": "multi_episode_128_metadata_baseline",
2402
+ "reason": null
2403
  },
2404
  {
2405
  "task_number": 14,
 
2533
  "task_label": "Interaction Text Prediction",
2534
  "series_id": "metadata128_simple",
2535
  "method": "128ep Metadata Simple",
2536
+ "status": "unsupported_without_required_target",
2537
+ "status_label": "unsupported",
2538
  "scored": false,
2539
  "proxy_scored": false,
2540
  "raw": null,
2541
  "raw_text": "n/a",
2542
  "normalized_score": null,
2543
  "metric_key": "macro_f1",
2544
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
2545
  "scope": "multi_episode_128_metadata_baseline",
2546
+ "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
2547
  },
2548
  {
2549
  "task_number": 15,
 
2695
  "task_label": "Action-Object Relation Prediction",
2696
  "series_id": "metadata128_simple",
2697
  "method": "128ep Metadata Simple",
2698
+ "status": "scored",
2699
+ "status_label": "scored",
2700
+ "scored": true,
2701
  "proxy_scored": false,
2702
+ "raw": 0.0,
2703
+ "raw_text": "0.0000",
2704
+ "normalized_score": 0.0,
2705
  "metric_key": "macro_f1",
2706
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
2707
  "scope": "multi_episode_128_metadata_baseline",
2708
+ "reason": null
2709
  },
2710
  {
2711
  "task_number": 16,
 
2713
  "task_label": "Action-Object Relation Prediction",
2714
  "series_id": "metadata128_neural_mlp",
2715
  "method": "128ep Metadata NN",
2716
+ "status": "scored",
2717
+ "status_label": "scored",
2718
+ "scored": true,
2719
  "proxy_scored": false,
2720
+ "raw": 0.0,
2721
+ "raw_text": "0.0000",
2722
+ "normalized_score": 0.0,
2723
  "metric_key": "macro_f1",
2724
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
2725
  "scope": "multi_episode_128_metadata_baseline",
2726
+ "reason": null
2727
  },
2728
  {
2729
  "task_number": 16,
 
2857
  "task_label": "Future Object-Set Forecasting",
2858
  "series_id": "metadata128_simple",
2859
  "method": "128ep Metadata Simple",
2860
+ "status": "scored",
2861
+ "status_label": "scored",
2862
+ "scored": true,
2863
  "proxy_scored": false,
2864
+ "raw": 0.17656983343047333,
2865
+ "raw_text": "0.1766",
2866
+ "normalized_score": 0.17656983343047333,
2867
  "metric_key": "micro_f1",
2868
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
2869
  "scope": "multi_episode_128_metadata_baseline",
2870
+ "reason": null
2871
  },
2872
  {
2873
  "task_number": 17,
 
2875
  "task_label": "Future Object-Set Forecasting",
2876
  "series_id": "metadata128_neural_mlp",
2877
  "method": "128ep Metadata NN",
2878
+ "status": "scored",
2879
+ "status_label": "scored",
2880
+ "scored": true,
2881
  "proxy_scored": false,
2882
+ "raw": 0.17418550827844048,
2883
+ "raw_text": "0.1742",
2884
+ "normalized_score": 0.17418550827844048,
2885
  "metric_key": "micro_f1",
2886
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
2887
  "scope": "multi_episode_128_metadata_baseline",
2888
+ "reason": null
2889
  },
2890
  {
2891
  "task_number": 17,
 
3019
  "task_label": "IMU-to-Hand Pose Reconstruction",
3020
  "series_id": "metadata128_simple",
3021
  "method": "128ep Metadata Simple",
3022
+ "status": "unsupported_without_required_target",
3023
+ "status_label": "unsupported",
3024
  "scored": false,
3025
  "proxy_scored": false,
3026
  "raw": null,
3027
  "raw_text": "n/a",
3028
  "normalized_score": null,
3029
  "metric_key": "mae",
3030
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
3031
  "scope": "multi_episode_128_metadata_baseline",
3032
+ "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package"
3033
  },
3034
  {
3035
  "task_number": 18,
 
3181
  "task_label": "Camera-View Synchronization Retrieval",
3182
  "series_id": "metadata128_simple",
3183
  "method": "128ep Metadata Simple",
3184
+ "status": "unsupported_without_required_target",
3185
+ "status_label": "unsupported",
3186
  "scored": false,
3187
  "proxy_scored": false,
3188
  "raw": null,
3189
  "raw_text": "n/a",
3190
  "normalized_score": null,
3191
  "metric_key": "mrr",
3192
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
3193
  "scope": "multi_episode_128_metadata_baseline",
3194
+ "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
3195
  },
3196
  {
3197
  "task_number": 19,
 
3343
  "task_label": "Time-to-Next-Transition Regression",
3344
  "series_id": "metadata128_simple",
3345
  "method": "128ep Metadata Simple",
3346
+ "status": "scored",
3347
+ "status_label": "scored",
3348
+ "scored": true,
3349
  "proxy_scored": false,
3350
+ "raw": 624.8108520507812,
3351
+ "raw_text": "624.81",
3352
+ "normalized_score": 0.016864874132806403,
3353
  "metric_key": "mae",
3354
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
3355
  "scope": "multi_episode_128_metadata_baseline",
3356
+ "reason": null
3357
  },
3358
  {
3359
  "task_number": 20,
 
3361
  "task_label": "Time-to-Next-Transition Regression",
3362
  "series_id": "metadata128_neural_mlp",
3363
  "method": "128ep Metadata NN",
3364
+ "status": "scored",
3365
+ "status_label": "scored",
3366
+ "scored": true,
3367
  "proxy_scored": false,
3368
+ "raw": 41.4664421081543,
3369
+ "raw_text": "41.47",
3370
+ "normalized_score": 0.25411768748242325,
3371
  "metric_key": "mae",
3372
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
3373
  "scope": "multi_episode_128_metadata_baseline",
3374
+ "reason": null
3375
  },
3376
  {
3377
  "task_number": 20,
data/task_surface_integrity.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-18T11:18:04+00:00",
4
  "summary": {
5
  "task_count": 12,
6
  "expected_task_count": 12,
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-18T12:09:25+00:00",
4
  "summary": {
5
  "task_count": 12,
6
  "expected_task_count": 12,
data/unified_task_model_radar.json CHANGED
@@ -1,11 +1,11 @@
1
  {
2
  "title": "Unified 20-Task Model Radar",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:15:02+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
- "scored_method_task_count": 123,
9
  "normalization_policy": {
10
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
11
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
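The two normalization rules quoted above can be sketched in a few lines. This is a minimal illustration rather than the repository's implementation: the function names are hypothetical, and the `implied_best` entry is back-computed from the Task 20 values in this commit (raw MAE times normalized score), not a reported result.

```python
def normalize_higher_is_better(raw):
    """Bounded metrics: clip to [0, 1] and plot directly."""
    if raw is None:
        return None
    return min(1.0, max(0.0, raw))


def normalize_lower_is_better(raw_by_method):
    """Error metrics: best_observed_value / raw_value within one task.

    Methods without a positive numeric score stay None (scoreless).
    """
    scored = {m: v for m, v in raw_by_method.items() if v is not None and v > 0}
    if not scored:
        return {m: None for m in raw_by_method}
    best = min(scored.values())
    return {m: (best / v if m in scored else None) for m, v in raw_by_method.items()}


if __name__ == "__main__":
    # Task 20 (MAE) values from this commit's result matrix.
    simple_raw, simple_norm = 624.8108520507812, 0.016864874132806403
    neural_raw, neural_norm = 41.4664421081543, 0.25411768748242325

    # Assumption: best_observed = raw * normalized. Both rows should imply the
    # same best value if the policy above was applied.
    implied_best = neural_raw * neural_norm
    print(implied_best, simple_raw * simple_norm)

    scores = normalize_lower_is_better({
        "metadata128_simple": simple_raw,
        "metadata128_neural_mlp": neural_raw,
        "implied_best": implied_best,  # placeholder for the unseen best method
        "unscored_method": None,
    })
    print(scores)
```

Under this rule the best method on a task always scores 1.0, so a lower-is-better score is only comparable within the same task, not across tasks.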
@@ -73,18 +73,17 @@
73
  "method_detail": "128-episode JSONL metadata/text simple baselines.",
74
  "plotted_as": "colored point overlay",
75
  "result_record_count": 20,
76
- "scored_task_count": 8,
77
- "covered_task_count": 8,
78
  "proxy_scored_task_count": 0,
79
- "scoreless_task_count": 12,
80
- "unsupported_task_count": 12,
81
  "not_evaluated_task_count": 0,
82
  "status_counts": {
83
- "not_supported_by_metadata_only_package": 8,
84
- "scored": 8,
85
- "unsupported_without_required_target": 4
86
  },
87
- "coverage_fraction": 0.4,
88
  "result_record_fraction": 1.0
89
  },
90
  {
@@ -98,17 +97,17 @@
98
  "method_detail": "128-episode JSONL metadata/text MLP baselines.",
99
  "plotted_as": "colored point overlay",
100
  "result_record_count": 20,
101
- "scored_task_count": 8,
102
- "covered_task_count": 8,
103
  "proxy_scored_task_count": 0,
104
- "scoreless_task_count": 12,
105
- "unsupported_task_count": 12,
106
  "not_evaluated_task_count": 0,
107
  "status_counts": {
108
- "not_supported_by_metadata_only_package": 12,
109
- "scored": 8
110
  },
111
- "coverage_fraction": 0.4,
112
  "result_record_fraction": 1.0
113
  },
114
  {
@@ -1608,6 +1607,28 @@
1608
  "raw_text": "0.0023",
1609
  "status_label": "scored"
1610
  },
1611
  "raw128_simple": {
1612
  "raw": 0.0024280172369056294,
1613
  "metric_key": "macro_f1",
@@ -1630,28 +1651,6 @@
1630
  "raw_text": "0.0011",
1631
  "status_label": "scored"
1632
  },
1633
- "metadata128_simple": {
1634
- "raw": null,
1635
- "metric_key": "macro_f1",
1636
- "source": null,
1637
- "scope": "multi_episode_128_metadata_baseline",
1638
- "status": "not_supported_by_metadata_only_package",
1639
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1640
- "normalized_score": null,
1641
- "raw_text": "n/a",
1642
- "status_label": "not supported"
1643
- },
1644
- "metadata128_neural_mlp": {
1645
- "raw": null,
1646
- "metric_key": "macro_f1",
1647
- "source": null,
1648
- "scope": "multi_episode_128_metadata_baseline",
1649
- "status": "not_supported_by_metadata_only_package",
1650
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1651
- "normalized_score": null,
1652
- "raw_text": "n/a",
1653
- "status_label": "not supported"
1654
- },
1655
  "cosmos3_super_reasoner": {
1656
  "raw": null,
1657
  "metric_key": "macro_f1",
@@ -1719,6 +1718,28 @@
1719
  "raw_text": "0.0042",
1720
  "status_label": "scored"
1721
  },
1722
  "raw128_simple": {
1723
  "raw": 0.0,
1724
  "metric_key": "macro_f1",
@@ -1741,28 +1762,6 @@
1741
  "raw_text": "0.0000",
1742
  "status_label": "scored"
1743
  },
1744
- "metadata128_simple": {
1745
- "raw": null,
1746
- "metric_key": "macro_f1",
1747
- "source": null,
1748
- "scope": "multi_episode_128_metadata_baseline",
1749
- "status": "not_supported_by_metadata_only_package",
1750
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1751
- "normalized_score": null,
1752
- "raw_text": "n/a",
1753
- "status_label": "not supported"
1754
- },
1755
- "metadata128_neural_mlp": {
1756
- "raw": null,
1757
- "metric_key": "macro_f1",
1758
- "source": null,
1759
- "scope": "multi_episode_128_metadata_baseline",
1760
- "status": "not_supported_by_metadata_only_package",
1761
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1762
- "normalized_score": null,
1763
- "raw_text": "n/a",
1764
- "status_label": "not supported"
1765
- },
1766
  "cosmos3_super_reasoner": {
1767
  "raw": null,
1768
  "metric_key": "macro_f1",
@@ -1819,6 +1818,17 @@
1819
  "raw_text": "0.0381",
1820
  "status_label": "scored"
1821
  },
1822
  "raw128_simple": {
1823
  "raw": 0.012611998261547169,
1824
  "metric_key": "macro_f1",
@@ -1841,17 +1851,6 @@
1841
  "raw_text": "0.0098",
1842
  "status_label": "proxy scored"
1843
  },
1844
- "metadata128_simple": {
1845
- "raw": null,
1846
- "metric_key": "macro_f1",
1847
- "source": null,
1848
- "scope": "multi_episode_128_metadata_baseline",
1849
- "status": "not_supported_by_metadata_only_package",
1850
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1851
- "normalized_score": null,
1852
- "raw_text": "n/a",
1853
- "status_label": "not supported"
1854
- },
1855
  "metadata128_neural_mlp": {
1856
  "raw": null,
1857
  "metric_key": "macro_f1",
@@ -1952,6 +1951,28 @@
1952
  "raw_text": "0.0000",
1953
  "status_label": "scored"
1954
  },
1955
  "raw128_simple": {
1956
  "raw": 0.0,
1957
  "metric_key": "macro_f1",
@@ -1974,28 +1995,6 @@
1974
  "raw_text": "0.0000",
1975
  "status_label": "scored"
1976
  },
1977
- "metadata128_simple": {
1978
- "raw": null,
1979
- "metric_key": "macro_f1",
1980
- "source": null,
1981
- "scope": "multi_episode_128_metadata_baseline",
1982
- "status": "not_supported_by_metadata_only_package",
1983
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1984
- "normalized_score": null,
1985
- "raw_text": "n/a",
1986
- "status_label": "not supported"
1987
- },
1988
- "metadata128_neural_mlp": {
1989
- "raw": null,
1990
- "metric_key": "macro_f1",
1991
- "source": null,
1992
- "scope": "multi_episode_128_metadata_baseline",
1993
- "status": "not_supported_by_metadata_only_package",
1994
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1995
- "normalized_score": null,
1996
- "raw_text": "n/a",
1997
- "status_label": "not supported"
1998
- },
1999
  "cosmos3_nano_future_window": {
2000
  "raw": null,
2001
  "metric_key": "macro_f1",
@@ -2052,6 +2051,28 @@
2052
  "raw_text": "0.1659",
2053
  "status_label": "scored"
2054
  },
2055
  "raw128_simple": {
2056
  "raw": 0.06469493412657774,
2057
  "metric_key": "micro_f1",
@@ -2074,28 +2095,6 @@
2074
  "raw_text": "0.1752",
2075
  "status_label": "scored"
2076
  },
2077
- "metadata128_simple": {
2078
- "raw": null,
2079
- "metric_key": "micro_f1",
2080
- "source": null,
2081
- "scope": "multi_episode_128_metadata_baseline",
2082
- "status": "not_supported_by_metadata_only_package",
2083
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2084
- "normalized_score": null,
2085
- "raw_text": "n/a",
2086
- "status_label": "not supported"
2087
- },
2088
- "metadata128_neural_mlp": {
2089
- "raw": null,
2090
- "metric_key": "micro_f1",
2091
- "source": null,
2092
- "scope": "multi_episode_128_metadata_baseline",
2093
- "status": "not_supported_by_metadata_only_package",
2094
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2095
- "normalized_score": null,
2096
- "raw_text": "n/a",
2097
- "status_label": "not supported"
2098
- },
2099
  "cosmos3_super_reasoner": {
2100
  "raw": null,
2101
  "metric_key": "micro_f1",
@@ -2152,6 +2151,17 @@
2152
  "raw_text": "0.0426",
2153
  "status_label": "scored"
2154
  },
2155
  "raw128_simple": {
2156
  "raw": 0.22941437363624573,
2157
  "metric_key": "mae",
@@ -2174,17 +2184,6 @@
2174
  "raw_text": "0.2530",
2175
  "status_label": "scored"
2176
  },
2177
- "metadata128_simple": {
2178
- "raw": null,
2179
- "metric_key": "mae",
2180
- "source": null,
2181
- "scope": "multi_episode_128_metadata_baseline",
2182
- "status": "not_supported_by_metadata_only_package",
2183
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2184
- "normalized_score": null,
2185
- "raw_text": "n/a",
2186
- "status_label": "not supported"
2187
- },
2188
  "metadata128_neural_mlp": {
2189
  "raw": null,
2190
  "metric_key": "mae",
@@ -2263,6 +2262,17 @@
2263
  "raw_text": "0.2409",
2264
  "status_label": "scored"
2265
  },
2266
  "raw128_simple": {
2267
  "raw": 0.0026625150348991156,
2268
  "metric_key": "mrr",
@@ -2285,17 +2295,6 @@
2285
  "raw_text": "0.0025",
2286
  "status_label": "proxy scored"
2287
  },
2288
- "metadata128_simple": {
2289
- "raw": null,
2290
- "metric_key": "mrr",
2291
- "source": null,
2292
- "scope": "multi_episode_128_metadata_baseline",
2293
- "status": "not_supported_by_metadata_only_package",
2294
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2295
- "normalized_score": null,
2296
- "raw_text": "n/a",
2297
- "status_label": "not supported"
2298
- },
2299
  "metadata128_neural_mlp": {
2300
  "raw": null,
2301
  "metric_key": "mrr",
@@ -2385,6 +2384,28 @@
2385
  "raw_text": "134.07",
2386
  "status_label": "scored"
2387
  },
2388
  "raw128_simple": {
2389
  "raw": 52.32759475708008,
2390
  "metric_key": "mae",
@@ -2407,28 +2428,6 @@
2407
  "raw_text": "42.37",
2408
  "status_label": "scored"
2409
  },
2410
- "metadata128_simple": {
2411
- "raw": null,
2412
- "metric_key": "mae",
2413
- "source": null,
2414
- "scope": "multi_episode_128_metadata_baseline",
2415
- "status": "not_supported_by_metadata_only_package",
2416
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2417
- "normalized_score": null,
2418
- "raw_text": "n/a",
2419
- "status_label": "not supported"
2420
- },
2421
- "metadata128_neural_mlp": {
2422
- "raw": null,
2423
- "metric_key": "mae",
2424
- "source": null,
2425
- "scope": "multi_episode_128_metadata_baseline",
2426
- "status": "not_supported_by_metadata_only_package",
2427
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2428
- "normalized_score": null,
2429
- "raw_text": "n/a",
2430
- "status_label": "not supported"
2431
- },
2432
  "cosmos3_super_reasoner": {
2433
  "raw": null,
2434
  "metric_key": "mae",
@@ -2459,7 +2458,7 @@
2459
  "id": "metadata128_simple",
2460
  "title": "128ep Metadata Simple",
2461
  "status": "a100_rerun_pass",
2462
- "coverage": "20 records / 8 scored JSONL-supported axes",
2463
  "headline": "34,269 rows; train/val/test 25,629/4,608/4,032",
2464
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2465
  },
@@ -2467,7 +2466,7 @@
2467
  "id": "metadata128_neural_mlp",
2468
  "title": "128ep Metadata NN",
2469
  "status": "a100_rerun_pass",
2470
- "coverage": "20 records / 8 scored JSONL-supported axes",
2471
  "headline": "compact MLP heads over metadata/text features",
2472
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2473
  },
@@ -4508,17 +4507,17 @@
4508
  "task_label": "Long-Horizon Next-Action Forecasting",
4509
  "series_id": "metadata128_simple",
4510
  "method": "128ep Metadata Simple",
4511
- "status": "not_supported_by_metadata_only_package",
4512
- "status_label": "not supported",
4513
- "scored": false,
4514
  "proxy_scored": false,
4515
- "raw": null,
4516
- "raw_text": "n/a",
4517
- "normalized_score": null,
4518
  "metric_key": "macro_f1",
4519
- "source": null,
4520
  "scope": "multi_episode_128_metadata_baseline",
4521
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4522
  },
4523
  {
4524
  "task_number": 13,
@@ -4526,17 +4525,17 @@
4526
  "task_label": "Long-Horizon Next-Action Forecasting",
4527
  "series_id": "metadata128_neural_mlp",
4528
  "method": "128ep Metadata NN",
4529
- "status": "not_supported_by_metadata_only_package",
4530
- "status_label": "not supported",
4531
- "scored": false,
4532
  "proxy_scored": false,
4533
- "raw": null,
4534
- "raw_text": "n/a",
4535
- "normalized_score": null,
4536
  "metric_key": "macro_f1",
4537
- "source": null,
4538
  "scope": "multi_episode_128_metadata_baseline",
4539
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4540
  },
4541
  {
4542
  "task_number": 13,
@@ -4670,17 +4669,17 @@
4670
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4671
  "series_id": "metadata128_simple",
4672
  "method": "128ep Metadata Simple",
4673
- "status": "not_supported_by_metadata_only_package",
4674
- "status_label": "not supported",
4675
- "scored": false,
4676
  "proxy_scored": false,
4677
- "raw": null,
4678
- "raw_text": "n/a",
4679
- "normalized_score": null,
4680
  "metric_key": "macro_f1",
4681
- "source": null,
4682
  "scope": "multi_episode_128_metadata_baseline",
4683
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4684
  },
4685
  {
4686
  "task_number": 14,
@@ -4688,17 +4687,17 @@
4688
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4689
  "series_id": "metadata128_neural_mlp",
4690
  "method": "128ep Metadata NN",
4691
- "status": "not_supported_by_metadata_only_package",
4692
- "status_label": "not supported",
4693
- "scored": false,
4694
  "proxy_scored": false,
4695
- "raw": null,
4696
- "raw_text": "n/a",
4697
- "normalized_score": null,
4698
  "metric_key": "macro_f1",
4699
- "source": null,
4700
  "scope": "multi_episode_128_metadata_baseline",
4701
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4702
  },
4703
  {
4704
  "task_number": 14,
@@ -4832,17 +4831,17 @@
4832
  "task_label": "Interaction Text Prediction",
4833
  "series_id": "metadata128_simple",
4834
  "method": "128ep Metadata Simple",
4835
- "status": "not_supported_by_metadata_only_package",
4836
- "status_label": "not supported",
4837
  "scored": false,
4838
  "proxy_scored": false,
4839
  "raw": null,
4840
  "raw_text": "n/a",
4841
  "normalized_score": null,
4842
  "metric_key": "macro_f1",
4843
- "source": null,
4844
  "scope": "multi_episode_128_metadata_baseline",
4845
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4846
  },
4847
  {
4848
  "task_number": 15,
@@ -4994,17 +4993,17 @@
4994
  "task_label": "Action-Object Relation Prediction",
4995
  "series_id": "metadata128_simple",
4996
  "method": "128ep Metadata Simple",
4997
- "status": "not_supported_by_metadata_only_package",
4998
- "status_label": "not supported",
4999
- "scored": false,
5000
  "proxy_scored": false,
5001
- "raw": null,
5002
- "raw_text": "n/a",
5003
- "normalized_score": null,
5004
  "metric_key": "macro_f1",
5005
- "source": null,
5006
  "scope": "multi_episode_128_metadata_baseline",
5007
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5008
  },
5009
  {
5010
  "task_number": 16,
@@ -5012,17 +5011,17 @@
5012
  "task_label": "Action-Object Relation Prediction",
5013
  "series_id": "metadata128_neural_mlp",
5014
  "method": "128ep Metadata NN",
5015
- "status": "not_supported_by_metadata_only_package",
5016
- "status_label": "not supported",
5017
- "scored": false,
5018
  "proxy_scored": false,
5019
- "raw": null,
5020
- "raw_text": "n/a",
5021
- "normalized_score": null,
5022
  "metric_key": "macro_f1",
5023
- "source": null,
5024
  "scope": "multi_episode_128_metadata_baseline",
5025
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5026
  },
5027
  {
5028
  "task_number": 16,
@@ -5156,17 +5155,17 @@
5156
  "task_label": "Future Object-Set Forecasting",
5157
  "series_id": "metadata128_simple",
5158
  "method": "128ep Metadata Simple",
5159
- "status": "not_supported_by_metadata_only_package",
5160
- "status_label": "not supported",
5161
- "scored": false,
5162
  "proxy_scored": false,
5163
- "raw": null,
5164
- "raw_text": "n/a",
5165
- "normalized_score": null,
5166
  "metric_key": "micro_f1",
5167
- "source": null,
5168
  "scope": "multi_episode_128_metadata_baseline",
5169
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5170
  },
5171
  {
5172
  "task_number": 17,
@@ -5174,17 +5173,17 @@
5174
  "task_label": "Future Object-Set Forecasting",
5175
  "series_id": "metadata128_neural_mlp",
5176
  "method": "128ep Metadata NN",
5177
- "status": "not_supported_by_metadata_only_package",
5178
- "status_label": "not supported",
5179
- "scored": false,
5180
  "proxy_scored": false,
5181
- "raw": null,
5182
- "raw_text": "n/a",
5183
- "normalized_score": null,
5184
  "metric_key": "micro_f1",
5185
- "source": null,
5186
  "scope": "multi_episode_128_metadata_baseline",
5187
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5188
  },
5189
  {
5190
  "task_number": 17,
@@ -5318,17 +5317,17 @@
5318
  "task_label": "IMU-to-Hand Pose Reconstruction",
5319
  "series_id": "metadata128_simple",
5320
  "method": "128ep Metadata Simple",
5321
- "status": "not_supported_by_metadata_only_package",
5322
- "status_label": "not supported",
5323
  "scored": false,
5324
  "proxy_scored": false,
5325
  "raw": null,
5326
  "raw_text": "n/a",
5327
  "normalized_score": null,
5328
  "metric_key": "mae",
5329
- "source": null,
5330
  "scope": "multi_episode_128_metadata_baseline",
5331
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5332
  },
5333
  {
5334
  "task_number": 18,
@@ -5480,17 +5479,17 @@
5480
  "task_label": "Camera-View Synchronization Retrieval",
5481
  "series_id": "metadata128_simple",
5482
  "method": "128ep Metadata Simple",
5483
- "status": "not_supported_by_metadata_only_package",
5484
- "status_label": "not supported",
5485
  "scored": false,
5486
  "proxy_scored": false,
5487
  "raw": null,
5488
  "raw_text": "n/a",
5489
  "normalized_score": null,
5490
  "metric_key": "mrr",
5491
- "source": null,
5492
  "scope": "multi_episode_128_metadata_baseline",
5493
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5494
  },
5495
  {
5496
  "task_number": 19,
@@ -5642,17 +5641,17 @@
5642
  "task_label": "Time-to-Next-Transition Regression",
5643
  "series_id": "metadata128_simple",
5644
  "method": "128ep Metadata Simple",
5645
- "status": "not_supported_by_metadata_only_package",
5646
- "status_label": "not supported",
5647
- "scored": false,
5648
  "proxy_scored": false,
5649
- "raw": null,
5650
- "raw_text": "n/a",
5651
- "normalized_score": null,
5652
  "metric_key": "mae",
5653
- "source": null,
5654
  "scope": "multi_episode_128_metadata_baseline",
5655
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5656
  },
5657
  {
5658
  "task_number": 20,
@@ -5660,17 +5659,17 @@
5660
  "task_label": "Time-to-Next-Transition Regression",
5661
  "series_id": "metadata128_neural_mlp",
5662
  "method": "128ep Metadata NN",
5663
- "status": "not_supported_by_metadata_only_package",
5664
- "status_label": "not supported",
5665
- "scored": false,
5666
  "proxy_scored": false,
5667
- "raw": null,
5668
- "raw_text": "n/a",
5669
- "normalized_score": null,
5670
  "metric_key": "mae",
5671
- "source": null,
5672
  "scope": "multi_episode_128_metadata_baseline",
5673
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5674
  },
5675
  {
5676
  "task_number": 20,
 
1
  {
2
  "title": "Unified 20-Task Model Radar",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
+ "scored_method_task_count": 133,
9
  "normalization_policy": {
10
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
11
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
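
A minimal sketch of the `normalization_policy` entries above, assuming a hypothetical helper named `normalize_score` that is not part of this repository. Higher-is-better metrics are clipped to [0, 1]; lower-is-better metrics are mapped to `best_observed_value / raw_value` within the same task. The best observed MAE for a task is not shown in this diff, so the example backs it out from one scored record and checks a second record from the same task against it.

```python
from typing import Optional


def normalize_score(
    raw: Optional[float],
    higher_is_better: bool = True,
    best_observed: Optional[float] = None,
) -> Optional[float]:
    """Hypothetical helper mirroring the normalization_policy text.

    Assumes raw > 0 for lower-is-better metrics; unscored records
    (raw is None) stay None, matching "normalized_score": null.
    """
    if raw is None:
        return None
    if higher_is_better:
        # Bounded metrics are plotted directly after clipping to [0, 1].
        return min(max(raw, 0.0), 1.0)
    if best_observed is None:
        raise ValueError("best_observed is required for lower-is-better metrics")
    # Lower-error metrics: best observed value divided by the raw value.
    return best_observed / raw


# Values copied from the Time-to-Next-Transition (mae) records in this diff.
simple_raw, simple_norm = 624.8108520507812, 0.016864874132806403
mlp_raw, mlp_norm = 41.4664421081543, 0.25411768748242325

# Back out the task's best observed MAE from one record (an inference, not a
# value stated in the diff), then re-derive the other record's score from it.
implied_best = simple_raw * simple_norm
mlp_check = normalize_score(mlp_raw, higher_is_better=False, best_observed=implied_best)

# A bounded metric (micro_f1) passes through unchanged.
f1_check = normalize_score(0.17656983343047333)
```

Under this reading, both metadata records for the task imply the same best observed MAE, which is what the policy text predicts for scores normalized within a single task.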
 
73
  "method_detail": "128-episode JSONL metadata/text simple baselines.",
74
  "plotted_as": "colored point overlay",
75
  "result_record_count": 20,
76
+ "scored_task_count": 13,
77
+ "covered_task_count": 13,
78
  "proxy_scored_task_count": 0,
79
+ "scoreless_task_count": 7,
80
+ "unsupported_task_count": 7,
81
  "not_evaluated_task_count": 0,
82
  "status_counts": {
83
+ "scored": 13,
84
+ "unsupported_without_required_target": 7
 
85
  },
86
+ "coverage_fraction": 0.65,
87
  "result_record_fraction": 1.0
88
  },
89
  {
 
97
  "method_detail": "128-episode JSONL metadata/text MLP baselines.",
98
  "plotted_as": "colored point overlay",
99
  "result_record_count": 20,
100
+ "scored_task_count": 13,
101
+ "covered_task_count": 13,
102
  "proxy_scored_task_count": 0,
103
+ "scoreless_task_count": 7,
104
+ "unsupported_task_count": 7,
105
  "not_evaluated_task_count": 0,
106
  "status_counts": {
107
+ "not_supported_by_metadata_only_package": 7,
108
+ "scored": 13
109
  },
110
+ "coverage_fraction": 0.65,
111
  "result_record_fraction": 1.0
112
  },
113
  {
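
The per-method summary fields above appear to follow directly from `status_counts`. The sketch below shows that relationship under the assumption that every status other than `scored` counts as scoreless; `coverage_summary` is a hypothetical name used for illustration, not a function from this repository.

```python
def coverage_summary(status_counts: dict, result_record_count: int) -> dict:
    """Hypothetical recomputation of the summary fields from status_counts."""
    scored = status_counts.get("scored", 0)
    # Assumption: any record whose status is not "scored" is scoreless.
    scoreless = sum(n for status, n in status_counts.items() if status != "scored")
    if scored + scoreless != result_record_count:
        raise ValueError("status_counts do not add up to result_record_count")
    return {
        "scored_task_count": scored,
        "scoreless_task_count": scoreless,
        "coverage_fraction": scored / result_record_count,
    }


# Old and new status_counts for metadata128_simple, copied from this diff.
before = coverage_summary(
    {
        "not_supported_by_metadata_only_package": 8,
        "scored": 8,
        "unsupported_without_required_target": 4,
    },
    20,
)
after = coverage_summary(
    {"scored": 13, "unsupported_without_required_target": 7},
    20,
)

# Both metadata methods gain the same five scored tasks in this commit.
newly_scored_records = 2 * (after["scored_task_count"] - before["scored_task_count"])
```

The ten newly scored records match the change in `scored_method_task_count` from 123 to 133 at the top of the file.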
 
1607
  "raw_text": "0.0023",
1608
  "status_label": "scored"
1609
  },
1610
+ "metadata128_simple": {
1611
+ "raw": 0.004579592783699693,
1612
+ "metric_key": "macro_f1",
1613
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
1614
+ "scope": "multi_episode_128_metadata_baseline",
1615
+ "status": "scored",
1616
+ "reason": null,
1617
+ "normalized_score": 0.004579592783699693,
1618
+ "raw_text": "0.0046",
1619
+ "status_label": "scored"
1620
+ },
1621
+ "metadata128_neural_mlp": {
1622
+ "raw": 0.0029821307969142615,
1623
+ "metric_key": "macro_f1",
1624
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
1625
+ "scope": "multi_episode_128_metadata_baseline",
1626
+ "status": "scored",
1627
+ "reason": null,
1628
+ "normalized_score": 0.0029821307969142615,
1629
+ "raw_text": "0.0030",
1630
+ "status_label": "scored"
1631
+ },
1632
  "raw128_simple": {
1633
  "raw": 0.0024280172369056294,
1634
  "metric_key": "macro_f1",
 
1651
  "raw_text": "0.0011",
1652
  "status_label": "scored"
1653
  },
1654
  "cosmos3_super_reasoner": {
1655
  "raw": null,
1656
  "metric_key": "macro_f1",
 
1718
  "raw_text": "0.0042",
1719
  "status_label": "scored"
1720
  },
1721
+ "metadata128_simple": {
1722
+ "raw": 0.0001206030150753769,
1723
+ "metric_key": "macro_f1",
1724
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
1725
+ "scope": "multi_episode_128_metadata_baseline",
1726
+ "status": "scored",
1727
+ "reason": null,
1728
+ "normalized_score": 0.0001206030150753769,
1729
+ "raw_text": "0.0001",
1730
+ "status_label": "scored"
1731
+ },
1732
+ "metadata128_neural_mlp": {
1733
+ "raw": 2.086049543676662e-05,
1734
+ "metric_key": "macro_f1",
1735
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
1736
+ "scope": "multi_episode_128_metadata_baseline",
1737
+ "status": "scored",
1738
+ "reason": null,
1739
+ "normalized_score": 2.086049543676662e-05,
1740
+ "raw_text": "0.0000",
1741
+ "status_label": "scored"
1742
+ },
1743
  "raw128_simple": {
1744
  "raw": 0.0,
1745
  "metric_key": "macro_f1",
 
1762
  "raw_text": "0.0000",
1763
  "status_label": "scored"
1764
  },
1765
  "cosmos3_super_reasoner": {
1766
  "raw": null,
1767
  "metric_key": "macro_f1",
 
1818
  "raw_text": "0.0381",
1819
  "status_label": "scored"
1820
  },
1821
+ "metadata128_simple": {
1822
+ "raw": null,
1823
+ "metric_key": "macro_f1",
1824
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
1825
+ "scope": "multi_episode_128_metadata_baseline",
1826
+ "status": "unsupported_without_required_target",
1827
+ "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
1828
+ "normalized_score": null,
1829
+ "raw_text": "n/a",
1830
+ "status_label": "unsupported"
1831
+ },
1832
  "raw128_simple": {
1833
  "raw": 0.012611998261547169,
1834
  "metric_key": "macro_f1",
 
1851
  "raw_text": "0.0098",
1852
  "status_label": "proxy scored"
1853
  },
1854
  "metadata128_neural_mlp": {
1855
  "raw": null,
1856
  "metric_key": "macro_f1",
 
1951
  "raw_text": "0.0000",
1952
  "status_label": "scored"
1953
  },
1954
+ "metadata128_simple": {
1955
+ "raw": 0.0,
1956
+ "metric_key": "macro_f1",
1957
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
1958
+ "scope": "multi_episode_128_metadata_baseline",
1959
+ "status": "scored",
1960
+ "reason": null,
1961
+ "normalized_score": 0.0,
1962
+ "raw_text": "0.0000",
1963
+ "status_label": "scored"
1964
+ },
1965
+ "metadata128_neural_mlp": {
1966
+ "raw": 0.0,
1967
+ "metric_key": "macro_f1",
1968
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
1969
+ "scope": "multi_episode_128_metadata_baseline",
1970
+ "status": "scored",
1971
+ "reason": null,
1972
+ "normalized_score": 0.0,
1973
+ "raw_text": "0.0000",
1974
+ "status_label": "scored"
1975
+ },
1976
  "raw128_simple": {
1977
  "raw": 0.0,
1978
  "metric_key": "macro_f1",
 
1995
  "raw_text": "0.0000",
1996
  "status_label": "scored"
1997
  },
1998
  "cosmos3_nano_future_window": {
1999
  "raw": null,
2000
  "metric_key": "macro_f1",
 
2051
  "raw_text": "0.1659",
2052
  "status_label": "scored"
2053
  },
2054
+ "metadata128_simple": {
2055
+ "raw": 0.17656983343047333,
2056
+ "metric_key": "micro_f1",
2057
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
2058
+ "scope": "multi_episode_128_metadata_baseline",
2059
+ "status": "scored",
2060
+ "reason": null,
2061
+ "normalized_score": 0.17656983343047333,
2062
+ "raw_text": "0.1766",
2063
+ "status_label": "scored"
2064
+ },
2065
+ "metadata128_neural_mlp": {
2066
+ "raw": 0.17418550827844048,
2067
+ "metric_key": "micro_f1",
2068
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
2069
+ "scope": "multi_episode_128_metadata_baseline",
2070
+ "status": "scored",
2071
+ "reason": null,
2072
+ "normalized_score": 0.17418550827844048,
2073
+ "raw_text": "0.1742",
2074
+ "status_label": "scored"
2075
+ },
2076
  "raw128_simple": {
2077
  "raw": 0.06469493412657774,
2078
  "metric_key": "micro_f1",
 
2095
  "raw_text": "0.1752",
2096
  "status_label": "scored"
2097
  },
2098
  "cosmos3_super_reasoner": {
2099
  "raw": null,
2100
  "metric_key": "micro_f1",
 
2151
  "raw_text": "0.0426",
2152
  "status_label": "scored"
2153
  },
2154
+ "metadata128_simple": {
2155
+ "raw": null,
2156
+ "metric_key": "mae",
2157
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
2158
+ "scope": "multi_episode_128_metadata_baseline",
2159
+ "status": "unsupported_without_required_target",
2160
+ "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package",
2161
+ "normalized_score": null,
2162
+ "raw_text": "n/a",
2163
+ "status_label": "unsupported"
2164
+ },
2165
  "raw128_simple": {
2166
  "raw": 0.22941437363624573,
2167
  "metric_key": "mae",
 
2184
  "raw_text": "0.2530",
2185
  "status_label": "scored"
2186
  },
2187
  "metadata128_neural_mlp": {
2188
  "raw": null,
2189
  "metric_key": "mae",
 
2262
  "raw_text": "0.2409",
2263
  "status_label": "scored"
2264
  },
2265
+ "metadata128_simple": {
2266
+ "raw": null,
2267
+ "metric_key": "mrr",
2268
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
2269
+ "scope": "multi_episode_128_metadata_baseline",
2270
+ "status": "unsupported_without_required_target",
2271
+ "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
2272
+ "normalized_score": null,
2273
+ "raw_text": "n/a",
2274
+ "status_label": "unsupported"
2275
+ },
2276
  "raw128_simple": {
2277
  "raw": 0.0026625150348991156,
2278
  "metric_key": "mrr",
 
2295
  "raw_text": "0.0025",
2296
  "status_label": "proxy scored"
2297
  },
2298
  "metadata128_neural_mlp": {
2299
  "raw": null,
2300
  "metric_key": "mrr",
 
2384
  "raw_text": "134.07",
2385
  "status_label": "scored"
2386
  },
2387
+ "metadata128_simple": {
2388
+ "raw": 624.8108520507812,
2389
+ "metric_key": "mae",
2390
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
2391
+ "scope": "multi_episode_128_metadata_baseline",
2392
+ "status": "scored",
2393
+ "reason": null,
2394
+ "normalized_score": 0.016864874132806403,
2395
+ "raw_text": "624.81",
2396
+ "status_label": "scored"
2397
+ },
2398
+ "metadata128_neural_mlp": {
2399
+ "raw": 41.4664421081543,
2400
+ "metric_key": "mae",
2401
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
2402
+ "scope": "multi_episode_128_metadata_baseline",
2403
+ "status": "scored",
2404
+ "reason": null,
2405
+ "normalized_score": 0.25411768748242325,
2406
+ "raw_text": "41.47",
2407
+ "status_label": "scored"
2408
+ },
2409
  "raw128_simple": {
2410
  "raw": 52.32759475708008,
2411
  "metric_key": "mae",
 
2428
  "raw_text": "42.37",
2429
  "status_label": "scored"
2430
  },
2431
  "cosmos3_super_reasoner": {
2432
  "raw": null,
2433
  "metric_key": "mae",
 
2458
  "id": "metadata128_simple",
2459
  "title": "128ep Metadata Simple",
2460
  "status": "a100_rerun_pass",
2461
+ "coverage": "20 records / 13 scored JSONL-supported axes",
2462
  "headline": "34,269 rows; train/val/test 25,629/4,608/4,032",
2463
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2464
  },
 
2466
  "id": "metadata128_neural_mlp",
2467
  "title": "128ep Metadata NN",
2468
  "status": "a100_rerun_pass",
2469
+ "coverage": "20 records / 13 scored JSONL-supported axes",
2470
  "headline": "compact MLP heads over metadata/text features",
2471
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2472
  },
 
4507
  "task_label": "Long-Horizon Next-Action Forecasting",
4508
  "series_id": "metadata128_simple",
4509
  "method": "128ep Metadata Simple",
4510
+ "status": "scored",
4511
+ "status_label": "scored",
4512
+ "scored": true,
4513
  "proxy_scored": false,
4514
+ "raw": 0.004579592783699693,
4515
+ "raw_text": "0.0046",
4516
+ "normalized_score": 0.004579592783699693,
4517
  "metric_key": "macro_f1",
4518
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
4519
  "scope": "multi_episode_128_metadata_baseline",
4520
+ "reason": null
4521
  },
4522
  {
4523
  "task_number": 13,
 
4525
  "task_label": "Long-Horizon Next-Action Forecasting",
4526
  "series_id": "metadata128_neural_mlp",
4527
  "method": "128ep Metadata NN",
4528
+ "status": "scored",
4529
+ "status_label": "scored",
4530
+ "scored": true,
4531
  "proxy_scored": false,
4532
+ "raw": 0.0029821307969142615,
4533
+ "raw_text": "0.0030",
4534
+ "normalized_score": 0.0029821307969142615,
4535
  "metric_key": "macro_f1",
4536
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
4537
  "scope": "multi_episode_128_metadata_baseline",
4538
+ "reason": null
4539
  },
4540
  {
4541
  "task_number": 13,
 
4669
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4670
  "series_id": "metadata128_simple",
4671
  "method": "128ep Metadata Simple",
4672
+ "status": "scored",
4673
+ "status_label": "scored",
4674
+ "scored": true,
4675
  "proxy_scored": false,
4676
+ "raw": 0.0001206030150753769,
4677
+ "raw_text": "0.0001",
4678
+ "normalized_score": 0.0001206030150753769,
4679
  "metric_key": "macro_f1",
4680
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
4681
  "scope": "multi_episode_128_metadata_baseline",
4682
+ "reason": null
4683
  },
4684
  {
4685
  "task_number": 14,
 
4687
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4688
  "series_id": "metadata128_neural_mlp",
4689
  "method": "128ep Metadata NN",
4690
+ "status": "scored",
4691
+ "status_label": "scored",
4692
+ "scored": true,
4693
  "proxy_scored": false,
4694
+ "raw": 2.086049543676662e-05,
4695
+ "raw_text": "0.0000",
4696
+ "normalized_score": 2.086049543676662e-05,
4697
  "metric_key": "macro_f1",
4698
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
4699
  "scope": "multi_episode_128_metadata_baseline",
4700
+ "reason": null
4701
  },
4702
  {
4703
  "task_number": 14,
 
4831
  "task_label": "Interaction Text Prediction",
4832
  "series_id": "metadata128_simple",
4833
  "method": "128ep Metadata Simple",
4834
+ "status": "unsupported_without_required_target",
4835
+ "status_label": "unsupported",
4836
  "scored": false,
4837
  "proxy_scored": false,
4838
  "raw": null,
4839
  "raw_text": "n/a",
4840
  "normalized_score": null,
4841
  "metric_key": "macro_f1",
4842
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
4843
  "scope": "multi_episode_128_metadata_baseline",
4844
+ "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
4845
  },
4846
  {
4847
  "task_number": 15,
 
4993
  "task_label": "Action-Object Relation Prediction",
4994
  "series_id": "metadata128_simple",
4995
  "method": "128ep Metadata Simple",
4996
+ "status": "scored",
4997
+ "status_label": "scored",
4998
+ "scored": true,
4999
  "proxy_scored": false,
5000
+ "raw": 0.0,
5001
+ "raw_text": "0.0000",
5002
+ "normalized_score": 0.0,
5003
  "metric_key": "macro_f1",
5004
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
5005
  "scope": "multi_episode_128_metadata_baseline",
5006
+ "reason": null
5007
  },
5008
  {
5009
  "task_number": 16,
 
5011
  "task_label": "Action-Object Relation Prediction",
5012
  "series_id": "metadata128_neural_mlp",
5013
  "method": "128ep Metadata NN",
5014
+ "status": "scored",
5015
+ "status_label": "scored",
5016
+ "scored": true,
5017
  "proxy_scored": false,
5018
+ "raw": 0.0,
5019
+ "raw_text": "0.0000",
5020
+ "normalized_score": 0.0,
5021
  "metric_key": "macro_f1",
5022
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
5023
  "scope": "multi_episode_128_metadata_baseline",
5024
+ "reason": null
5025
  },
5026
  {
5027
  "task_number": 16,
 
5155
  "task_label": "Future Object-Set Forecasting",
5156
  "series_id": "metadata128_simple",
5157
  "method": "128ep Metadata Simple",
5158
+ "status": "scored",
5159
+ "status_label": "scored",
5160
+ "scored": true,
5161
  "proxy_scored": false,
5162
+ "raw": 0.17656983343047333,
5163
+ "raw_text": "0.1766",
5164
+ "normalized_score": 0.17656983343047333,
5165
  "metric_key": "micro_f1",
5166
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
5167
  "scope": "multi_episode_128_metadata_baseline",
5168
+ "reason": null
5169
  },
5170
  {
5171
  "task_number": 17,
 
5173
  "task_label": "Future Object-Set Forecasting",
5174
  "series_id": "metadata128_neural_mlp",
5175
  "method": "128ep Metadata NN",
5176
+ "status": "scored",
5177
+ "status_label": "scored",
5178
+ "scored": true,
5179
  "proxy_scored": false,
5180
+ "raw": 0.17418550827844048,
5181
+ "raw_text": "0.1742",
5182
+ "normalized_score": 0.17418550827844048,
5183
  "metric_key": "micro_f1",
5184
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
5185
  "scope": "multi_episode_128_metadata_baseline",
5186
+ "reason": null
5187
  },
5188
  {
5189
  "task_number": 17,
 
5317
  "task_label": "IMU-to-Hand Pose Reconstruction",
5318
  "series_id": "metadata128_simple",
5319
  "method": "128ep Metadata Simple",
5320
+ "status": "unsupported_without_required_target",
5321
+ "status_label": "unsupported",
5322
  "scored": false,
5323
  "proxy_scored": false,
5324
  "raw": null,
5325
  "raw_text": "n/a",
5326
  "normalized_score": null,
5327
  "metric_key": "mae",
5328
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
5329
  "scope": "multi_episode_128_metadata_baseline",
5330
+ "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package"
5331
  },
5332
  {
5333
  "task_number": 18,
 
5479
  "task_label": "Camera-View Synchronization Retrieval",
5480
  "series_id": "metadata128_simple",
5481
  "method": "128ep Metadata Simple",
5482
+ "status": "unsupported_without_required_target",
5483
+ "status_label": "unsupported",
5484
  "scored": false,
5485
  "proxy_scored": false,
5486
  "raw": null,
5487
  "raw_text": "n/a",
5488
  "normalized_score": null,
5489
  "metric_key": "mrr",
5490
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
5491
  "scope": "multi_episode_128_metadata_baseline",
5492
+ "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
5493
  },
5494
  {
5495
  "task_number": 19,
 
5641
  "task_label": "Time-to-Next-Transition Regression",
5642
  "series_id": "metadata128_simple",
5643
  "method": "128ep Metadata Simple",
5644
+ "status": "scored",
5645
+ "status_label": "scored",
5646
+ "scored": true,
5647
  "proxy_scored": false,
5648
+ "raw": 624.8108520507812,
5649
+ "raw_text": "624.81",
5650
+ "normalized_score": 0.016864874132806403,
5651
  "metric_key": "mae",
5652
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
5653
  "scope": "multi_episode_128_metadata_baseline",
5654
+ "reason": null
5655
  },
5656
  {
5657
  "task_number": 20,
 
5659
  "task_label": "Time-to-Next-Transition Regression",
5660
  "series_id": "metadata128_neural_mlp",
5661
  "method": "128ep Metadata NN",
5662
+ "status": "scored",
5663
+ "status_label": "scored",
5664
+ "scored": true,
5665
  "proxy_scored": false,
5666
+ "raw": 41.4664421081543,
5667
+ "raw_text": "41.47",
5668
+ "normalized_score": 0.25411768748242325,
5669
  "metric_key": "mae",
5670
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
5671
  "scope": "multi_episode_128_metadata_baseline",
5672
+ "reason": null
5673
  },
5674
  {
5675
  "task_number": 20,
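The two Time-to-Next-Transition records above each pair a raw MAE with a `normalized_score`. Both pairs are consistent with the lower-is-better rule in the radar file's `normalization_policy` (`best_observed_value / raw_value` within the same task). The following is a minimal Python sketch of that rule, not the project's actual builder code. The function name `normalize_lower_is_better` is hypothetical. The best observed MAE is not stated in this diff, so it is back-computed here from the published scores.

```python
def normalize_lower_is_better(raw, best_observed):
    """Map a lower-is-better error (e.g. MAE) onto a 0-1 axis as best / raw."""
    if raw is None:
        return None  # scoreless cells keep a null normalized score
    if raw <= 0:
        return 1.0  # assumption: a zero error is treated as the top score
    return min(1.0, best_observed / raw)


# Raw MAE values copied from the two records above.
simple_mae = 624.8108520507812
mlp_mae = 41.4664421081543

# Assumption: the best observed MAE for this task is back-computed from the
# published normalized score of the MLP record; the diff does not state it.
best = mlp_mae * 0.25411768748242325

print(normalize_lower_is_better(simple_mae, best))
print(normalize_lower_is_better(mlp_mae, best))
```

Under this reading, both metadata baselines sit well below the best observed method on this axis, and the Simple baseline's much larger raw MAE is what puts its normalized score near zero.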
data/website_integrity.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-18T11:41:43+00:00",
4
  "docs_root": "docs",
5
  "site_base": "/ropedia-xperience-10m-task-suite/",
6
  "summary": {
@@ -301,7 +301,7 @@
301
  },
302
  {
303
  "path": "data/artifact_index.json",
304
- "bytes": 116109,
305
  "top_level_type": "dict"
306
  },
307
  {
@@ -316,7 +316,7 @@
316
  },
317
  {
318
  "path": "data/episode128_task_model_radar.json",
319
- "bytes": 187099,
320
  "top_level_type": "dict"
321
  },
322
  {
@@ -486,12 +486,12 @@
486
  },
487
  {
488
  "path": "data/task_method_20_gap_audit.json",
489
- "bytes": 50687,
490
  "top_level_type": "dict"
491
  },
492
  {
493
  "path": "data/task_method_20_result_matrix.json",
494
- "bytes": 129600,
495
  "top_level_type": "dict"
496
  },
497
  {
@@ -526,7 +526,7 @@
526
  },
527
  {
528
  "path": "data/unified_task_model_radar.json",
529
- "bytes": 230951,
530
  "top_level_type": "dict"
531
  },
532
  {
@@ -571,7 +571,7 @@
571
  {
572
  "path": "assets/charts/episode128_task_model_radar.svg",
573
  "exists": true,
574
- "bytes": 44825,
575
  "format": "SVG",
576
  "has_viewbox": true
577
  },
@@ -641,7 +641,7 @@
641
  {
642
  "path": "assets/charts/unified_task_model_radar.svg",
643
  "exists": true,
644
- "bytes": 50841,
645
  "format": "SVG",
646
  "has_viewbox": true
647
  },
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-18T12:09:46+00:00",
4
  "docs_root": "docs",
5
  "site_base": "/ropedia-xperience-10m-task-suite/",
6
  "summary": {
 
301
  },
302
  {
303
  "path": "data/artifact_index.json",
304
+ "bytes": 116110,
305
  "top_level_type": "dict"
306
  },
307
  {
 
316
  },
317
  {
318
  "path": "data/episode128_task_model_radar.json",
319
+ "bytes": 186443,
320
  "top_level_type": "dict"
321
  },
322
  {
 
486
  },
487
  {
488
  "path": "data/task_method_20_gap_audit.json",
489
+ "bytes": 46902,
490
  "top_level_type": "dict"
491
  },
492
  {
493
  "path": "data/task_method_20_result_matrix.json",
494
+ "bytes": 129242,
495
  "top_level_type": "dict"
496
  },
497
  {
 
526
  },
527
  {
528
  "path": "data/unified_task_model_radar.json",
529
+ "bytes": 230297,
530
  "top_level_type": "dict"
531
  },
532
  {
 
571
  {
572
  "path": "assets/charts/episode128_task_model_radar.svg",
573
  "exists": true,
574
+ "bytes": 45937,
575
  "format": "SVG",
576
  "has_viewbox": true
577
  },
 
641
  {
642
  "path": "assets/charts/unified_task_model_radar.svg",
643
  "exists": true,
644
+ "bytes": 51953,
645
  "format": "SVG",
646
  "has_viewbox": true
647
  },
docs/data/artifact_index.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
- "generated_at_utc": "2026-06-18T11:16:44+00:00",
4
  "status": "pass",
5
  "artifact_count": 213,
6
  "missing": [],
@@ -290,8 +290,8 @@
290
  "surface": "repo_hf",
291
  "shows": "Runs simple metadata and neural MLP baselines on the same selected 96/16/16 episode split used by the Qwen3-Omni diagnostic pilot.",
292
  "exists": true,
293
- "bytes": 58012,
294
- "sha256": "a95cdde097b11f83023c758c807f031c3d4cb3fde20d42ed314565440cc68374"
295
  },
296
  {
297
  "id": "task_suite_enhancement_128",
@@ -599,7 +599,7 @@
599
  "shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
600
  "exists": true,
601
  "bytes": 4432,
602
- "sha256": "8494b6983100acdfde9b5929e871b27120897af8ec7b5a3031aa142b598a09ae"
603
  },
604
  {
605
  "id": "source_alignment_validator",
@@ -719,8 +719,8 @@
719
  "surface": "website_hf",
720
  "shows": "Stores normalized 20-axis radar values, raw task metrics, Qwen3/Cosmos overlay mappings, branch-card caveats, and explicit scoreless status records.",
721
  "exists": true,
722
- "bytes": 230951,
723
- "sha256": "8aaed21d08943f2dc53c5160e27872bc4f7f8a405d7289cdaaf7b00d867b84d8"
724
  },
725
  {
726
  "id": "single_episode_task_model_radar_json",
@@ -731,7 +731,7 @@
731
  "shows": "Machine-readable split radar for the one-episode Minimal and Neural MLP baselines, both scored on all 20 task contracts.",
732
  "exists": true,
733
  "bytes": 50973,
734
- "sha256": "d20637e6a17390f7fd44589ff37cb1889318bc39c2259dca6bb7f1a43d8ea26b"
735
  },
736
  {
737
  "id": "episode128_task_model_radar_json",
@@ -741,8 +741,8 @@
741
  "surface": "website_hf",
742
  "shows": "Machine-readable split radar for selected 128-episode metadata/raw baselines and verified Qwen3/Cosmos branches, preserving explicit scoreless cells.",
743
  "exists": true,
744
- "bytes": 187099,
745
- "sha256": "bf2b3fdeb9713a9d4cba0e8645c24c325b88e939cb94f4718a9d3c2db03e2bb3"
746
  },
747
  {
748
  "id": "task_method_20_result_matrix_json",
@@ -752,8 +752,8 @@
752
  "surface": "website_hf",
753
  "shows": "Machine-readable 9-method by 20-task matrix where every method has 20 records and scoreless cells carry unsupported/not-evaluated reasons.",
754
  "exists": true,
755
- "bytes": 129600,
756
- "sha256": "30fd572521991fd7f5741411d91a40d3d442032f001841f9fd1a4e7381eb73d2"
757
  },
758
  {
759
  "id": "task_method_20_result_matrix",
@@ -763,8 +763,8 @@
763
  "surface": "repo_hf",
764
  "shows": "Reader-facing table that separates 20 records per method from numeric scored axes, documented raw128 proxy scores, unsupported metadata targets, and model targets not evaluated in verified packages.",
765
  "exists": true,
766
- "bytes": 4128,
767
- "sha256": "89c73da7db81d2c5f6eb4a16c828531a589ac44cabba2c0c95b171b6ad2060d6"
768
  },
769
  {
770
  "id": "task_method_20_gap_audit_json",
@@ -774,8 +774,8 @@
774
  "surface": "website_hf",
775
  "shows": "Machine-readable 180-record gap ledger with numeric scores, scoreless cells, explicit status reasons, and next evidence needed before new scores can be published.",
776
  "exists": true,
777
- "bytes": 50687,
778
- "sha256": "2cdaa06f9c140a2e194675a3383be341acb1f6e07ddecfa7017cdbe34d704282"
779
  },
780
  {
781
  "id": "task_method_20_gap_audit",
@@ -785,8 +785,8 @@
785
  "surface": "repo_hf",
786
  "shows": "Reader-facing ledger that lists every scoreless method-task cell and the concrete target or model-output evidence required before it can become numeric.",
787
  "exists": true,
788
- "bytes": 14421,
789
- "sha256": "125e658010284dc48570fa7c6a7676e4013d30dd1f22deb24d369e7085a7b700"
790
  },
791
  {
792
  "id": "unified_task_model_radar_chart",
@@ -796,8 +796,8 @@
796
  "surface": "website_hf",
797
  "shows": "Compares minimal and neural MLP baselines across all 20 tasks, with Qwen3/Cosmos task-aligned model overlays.",
798
  "exists": true,
799
- "bytes": 50841,
800
- "sha256": "e5fa2420fc5ed905953e71ef8978ad1ee794c0daf06a7f0ff10374db7f291c72"
801
  },
802
  {
803
  "id": "single_episode_task_model_radar_chart",
@@ -818,8 +818,8 @@
818
  "surface": "website_hf",
819
  "shows": "Separates the selected 128-episode methods: raw-feature simple/NN as complete 20/20 scored polygons and metadata/Qwen/Cosmos as task-aligned overlays.",
820
  "exists": true,
821
- "bytes": 44825,
822
- "sha256": "50b5d87fca4aba303a8440f5ef53470ed493e9f1251cb5edeb16bac90038a11b"
823
  },
824
  {
825
  "id": "unified_task_model_radar_builder",
@@ -906,8 +906,8 @@
906
  "surface": "repo_hf",
907
  "shows": "Rerun of JSONL metadata/text simple and neural baselines over the selected 128-episode multiscale dataset; supports radar overlays on JSONL-supported task axes.",
908
  "exists": true,
909
- "bytes": 50297,
910
- "sha256": "1c1710bcf340ece479e321f19d4cb8302fe369a1103b4584a15853fe73dc226c"
911
  },
912
  {
913
  "id": "a100_128_raw20_task_baselines",
@@ -1105,7 +1105,7 @@
1105
  "shows": "Machine-readable release-check summary for validators, mirrors, and public project surfaces.",
1106
  "exists": true,
1107
  "bytes": 8100,
1108
- "sha256": "6549b0f8da6c3742c72b12b71900db1b89455cd34d5befcdf9d249b4adebbd1a"
1109
  },
1110
  {
1111
  "id": "public_surface_qa",
@@ -1310,7 +1310,7 @@
1310
  "volatile": true,
1311
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
1312
  "exists": true,
1313
- "bytes": 983979,
1314
  "hash_policy": "existence_and_size_only"
1315
  },
1316
  {
@@ -1322,7 +1322,7 @@
1322
  "volatile": true,
1323
  "shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
1324
  "exists": true,
1325
- "bytes": 20022,
1326
  "hash_policy": "existence_and_size_only"
1327
  },
1328
  {
 
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
+ "generated_at_utc": "2026-06-18T12:09:24+00:00",
4
  "status": "pass",
5
  "artifact_count": 213,
6
  "missing": [],
 
290
  "surface": "repo_hf",
291
  "shows": "Runs simple metadata and neural MLP baselines on the same selected 96/16/16 episode split used by the Qwen3-Omni diagnostic pilot.",
292
  "exists": true,
293
+ "bytes": 73236,
294
+ "sha256": "76acae0de25d51413e7e6f11021163e7d9909cfe95d65bf6b02e74043d429e2d"
295
  },
296
  {
297
  "id": "task_suite_enhancement_128",
 
599
  "shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
600
  "exists": true,
601
  "bytes": 4432,
602
+ "sha256": "ae089cc0df132b63365e03b2157a488b5d1569567c0374d7621bcd347da62c9e"
603
  },
604
  {
605
  "id": "source_alignment_validator",
 
719
  "surface": "website_hf",
720
  "shows": "Stores normalized 20-axis radar values, raw task metrics, Qwen3/Cosmos overlay mappings, branch-card caveats, and explicit scoreless status records.",
721
  "exists": true,
722
+ "bytes": 230297,
723
+ "sha256": "437874b1633e73165e3300f55580394663a44759c848288e696859b98f8aad32"
724
  },
725
  {
726
  "id": "single_episode_task_model_radar_json",
 
731
  "shows": "Machine-readable split radar for the one-episode Minimal and Neural MLP baselines, both scored on all 20 task contracts.",
732
  "exists": true,
733
  "bytes": 50973,
734
+ "sha256": "38cb43512f2ac40feeb62333bdea89b3a55e5b48468beb8982cf22536f794ecf"
735
  },
736
  {
737
  "id": "episode128_task_model_radar_json",
 
741
  "surface": "website_hf",
742
  "shows": "Machine-readable split radar for selected 128-episode metadata/raw baselines and verified Qwen3/Cosmos branches, preserving explicit scoreless cells.",
743
  "exists": true,
744
+ "bytes": 186443,
745
+ "sha256": "55e758e8703f406889022976d0ba055181212305c9a7246e899463e0c3c3b554"
746
  },
747
  {
748
  "id": "task_method_20_result_matrix_json",
 
752
  "surface": "website_hf",
753
  "shows": "Machine-readable 9-method by 20-task matrix where every method has 20 records and scoreless cells carry unsupported/not-evaluated reasons.",
754
  "exists": true,
755
+ "bytes": 129242,
756
+ "sha256": "64fb700d51f536edf11291799b6173cf9ae8dd7a41178aac348b8207ed4b1e42"
757
  },
758
  {
759
  "id": "task_method_20_result_matrix",
 
763
  "surface": "repo_hf",
764
  "shows": "Reader-facing table that separates 20 records per method from numeric scored axes, documented raw128 proxy scores, unsupported metadata targets, and model targets not evaluated in verified packages.",
765
  "exists": true,
766
+ "bytes": 4026,
767
+ "sha256": "55e949fc30419a52f7f5ec4dd9544a11b253b076f8e3637ec3e92b3d61a89aab"
768
  },
769
  {
770
  "id": "task_method_20_gap_audit_json",
 
774
  "surface": "website_hf",
775
  "shows": "Machine-readable 180-record gap ledger with numeric scores, scoreless cells, explicit status reasons, and next evidence needed before new scores can be published.",
776
  "exists": true,
777
+ "bytes": 46902,
778
+ "sha256": "2b64dbd013625852679f9b91d25c48d1ed197fec727883b4fe37088b2d594784"
779
  },
780
  {
781
  "id": "task_method_20_gap_audit",
 
785
  "surface": "repo_hf",
786
  "shows": "Reader-facing ledger that lists every scoreless method-task cell and the concrete target or model-output evidence required before it can become numeric.",
787
  "exists": true,
788
+ "bytes": 13387,
789
+ "sha256": "d33461eb704f8e92545b6b54d9fc509e617fbacc9ca9894ac851ca9c3dec0fec"
790
  },
791
  {
792
  "id": "unified_task_model_radar_chart",
 
796
  "surface": "website_hf",
797
  "shows": "Compares minimal and neural MLP baselines across all 20 tasks, with Qwen3/Cosmos task-aligned model overlays.",
798
  "exists": true,
799
+ "bytes": 51953,
800
+ "sha256": "19c001f10319946ef0e4921064f8a012836f29e7c8b272f900c257169faf46a1"
801
  },
802
  {
803
  "id": "single_episode_task_model_radar_chart",
 
818
  "surface": "website_hf",
819
  "shows": "Separates the selected 128-episode methods: raw-feature simple/NN as complete 20/20 scored polygons and metadata/Qwen/Cosmos as task-aligned overlays.",
820
  "exists": true,
821
+ "bytes": 45937,
822
+ "sha256": "b504b1b9c5cad0caa8c822d5bb2971c1b708251cf7b9ef587a92db2c12751e97"
823
  },
824
  {
825
  "id": "unified_task_model_radar_builder",
 
906
  "surface": "repo_hf",
907
  "shows": "Rerun of JSONL metadata/text simple and neural baselines over the selected 128-episode multiscale dataset; supports radar overlays on JSONL-supported task axes.",
908
  "exists": true,
909
+ "bytes": 109248,
910
+ "sha256": "5e7f3085be5012eb3dda46f9c7b5b7c0ae22d6a0fbce71d6e99dd317fecc12af"
911
  },
912
  {
913
  "id": "a100_128_raw20_task_baselines",
 
1105
  "shows": "Machine-readable release-check summary for validators, mirrors, and public project surfaces.",
1106
  "exists": true,
1107
  "bytes": 8100,
1108
+ "sha256": "7800195093b8b81b49c87cdcbcebe601de8141c0c9d8b4490b98f539cb132725"
1109
  },
1110
  {
1111
  "id": "public_surface_qa",
 
1310
  "volatile": true,
1311
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
1312
  "exists": true,
1313
+ "bytes": 994053,
1314
  "hash_policy": "existence_and_size_only"
1315
  },
1316
  {
 
1322
  "volatile": true,
1323
  "shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
1324
  "exists": true,
1325
+ "bytes": 20021,
1326
  "hash_policy": "existence_and_size_only"
1327
  },
1328
  {
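Each `artifact_index.json` entry above records `exists` and `bytes`, plus a `sha256` unless its `hash_policy` is `existence_and_size_only`. The following is a minimal Python sketch of how a reader might recheck one entry against a local copy of the file. The `check_entry` helper is hypothetical. The file path is passed in explicitly because the entries' path field is not shown in this diff.

```python
import hashlib
import os


def check_entry(entry, file_path):
    """Compare one index entry (a dict) with a local file; return mismatches."""
    if not os.path.exists(file_path):
        # Only a problem if the index claims the artifact exists.
        return ["missing file"] if entry.get("exists") else []

    problems = []
    size = os.path.getsize(file_path)
    if entry.get("bytes") != size:
        problems.append(f"bytes: index {entry.get('bytes')} != local {size}")

    # Volatile entries are checked by existence and size only, so skip hashing.
    size_only = entry.get("hash_policy") == "existence_and_size_only"
    if not size_only and "sha256" in entry:
        digest = hashlib.sha256()
        with open(file_path, "rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != entry["sha256"]:
            problems.append("sha256 mismatch")
    return problems
```

An empty list means the local file matches the entry. A stale `bytes` or `sha256` value, like those replaced in this commit, shows up as a mismatch.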
docs/data/episode128_task_model_radar.json CHANGED
@@ -1,12 +1,12 @@
1
  {
2
  "title": "128-Episode 20-Task Radar",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:15:02+00:00",
5
  "description": "Selected 128-episode metadata/raw baselines plus verified Qwen3/Cosmos branches. Every method has 20 records; numeric scores appear only where the public artifact produced that task target.",
6
  "task_count": 20,
7
  "method_count": 7,
8
  "method_task_record_count": 140,
9
- "scored_method_task_count": 83,
10
  "normalization_policy": {
11
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
12
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
@@ -30,18 +30,17 @@
30
  "method_detail": "128-episode JSONL metadata/text simple baselines.",
31
  "plotted_as": "colored point overlay",
32
  "result_record_count": 20,
33
- "scored_task_count": 8,
34
- "covered_task_count": 8,
35
  "proxy_scored_task_count": 0,
36
- "scoreless_task_count": 12,
37
- "unsupported_task_count": 12,
38
  "not_evaluated_task_count": 0,
39
  "status_counts": {
40
- "not_supported_by_metadata_only_package": 8,
41
- "scored": 8,
42
- "unsupported_without_required_target": 4
43
  },
44
- "coverage_fraction": 0.4,
45
  "result_record_fraction": 1.0
46
  },
47
  {
@@ -55,17 +54,17 @@
55
  "method_detail": "128-episode JSONL metadata/text MLP baselines.",
56
  "plotted_as": "colored point overlay",
57
  "result_record_count": 20,
58
- "scored_task_count": 8,
59
- "covered_task_count": 8,
60
  "proxy_scored_task_count": 0,
61
- "scoreless_task_count": 12,
62
- "unsupported_task_count": 12,
63
  "not_evaluated_task_count": 0,
64
  "status_counts": {
65
- "not_supported_by_metadata_only_package": 12,
66
- "scored": 8
67
  },
68
- "coverage_fraction": 0.4,
69
  "result_record_fraction": 1.0
70
  },
71
  {
@@ -1295,26 +1294,26 @@
1295
  "raw128_proxy_axis": false,
1296
  "values": {
1297
  "metadata128_simple": {
1298
- "raw": null,
1299
  "metric_key": "macro_f1",
1300
- "source": null,
1301
  "scope": "multi_episode_128_metadata_baseline",
1302
- "status": "not_supported_by_metadata_only_package",
1303
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1304
- "normalized_score": null,
1305
- "raw_text": "n/a",
1306
- "status_label": "not supported"
1307
  },
1308
  "metadata128_neural_mlp": {
1309
- "raw": null,
1310
  "metric_key": "macro_f1",
1311
- "source": null,
1312
  "scope": "multi_episode_128_metadata_baseline",
1313
- "status": "not_supported_by_metadata_only_package",
1314
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1315
- "normalized_score": null,
1316
- "raw_text": "n/a",
1317
- "status_label": "not supported"
1318
  },
1319
  "raw128_simple": {
1320
  "raw": 0.0024280172369056294,
@@ -1386,26 +1385,26 @@
1386
  "raw128_proxy_axis": false,
1387
  "values": {
1388
  "metadata128_simple": {
1389
- "raw": null,
1390
  "metric_key": "macro_f1",
1391
- "source": null,
1392
  "scope": "multi_episode_128_metadata_baseline",
1393
- "status": "not_supported_by_metadata_only_package",
1394
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1395
- "normalized_score": null,
1396
- "raw_text": "n/a",
1397
- "status_label": "not supported"
1398
  },
1399
  "metadata128_neural_mlp": {
1400
- "raw": null,
1401
  "metric_key": "macro_f1",
1402
- "source": null,
1403
  "scope": "multi_episode_128_metadata_baseline",
1404
- "status": "not_supported_by_metadata_only_package",
1405
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1406
- "normalized_score": null,
1407
- "raw_text": "n/a",
1408
- "status_label": "not supported"
1409
  },
1410
  "raw128_simple": {
1411
  "raw": 0.0,
@@ -1479,13 +1478,13 @@
1479
  "metadata128_simple": {
1480
  "raw": null,
1481
  "metric_key": "macro_f1",
1482
- "source": null,
1483
  "scope": "multi_episode_128_metadata_baseline",
1484
- "status": "not_supported_by_metadata_only_package",
1485
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1486
  "normalized_score": null,
1487
  "raw_text": "n/a",
1488
- "status_label": "not supported"
1489
  },
1490
  "metadata128_neural_mlp": {
1491
  "raw": null,
@@ -1568,26 +1567,26 @@
1568
  "raw128_proxy_axis": false,
1569
  "values": {
1570
  "metadata128_simple": {
1571
- "raw": null,
1572
  "metric_key": "macro_f1",
1573
- "source": null,
1574
  "scope": "multi_episode_128_metadata_baseline",
1575
- "status": "not_supported_by_metadata_only_package",
1576
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1577
- "normalized_score": null,
1578
- "raw_text": "n/a",
1579
- "status_label": "not supported"
1580
  },
1581
  "metadata128_neural_mlp": {
1582
- "raw": null,
1583
  "metric_key": "macro_f1",
1584
- "source": null,
1585
  "scope": "multi_episode_128_metadata_baseline",
1586
- "status": "not_supported_by_metadata_only_package",
1587
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1588
- "normalized_score": null,
1589
- "raw_text": "n/a",
1590
- "status_label": "not supported"
1591
  },
1592
  "raw128_simple": {
1593
  "raw": 0.0,
@@ -1659,26 +1658,26 @@
1659
  "raw128_proxy_axis": false,
1660
  "values": {
1661
  "metadata128_simple": {
1662
- "raw": null,
1663
  "metric_key": "micro_f1",
1664
- "source": null,
1665
  "scope": "multi_episode_128_metadata_baseline",
1666
- "status": "not_supported_by_metadata_only_package",
1667
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1668
- "normalized_score": null,
1669
- "raw_text": "n/a",
1670
- "status_label": "not supported"
1671
  },
1672
  "metadata128_neural_mlp": {
1673
- "raw": null,
1674
  "metric_key": "micro_f1",
1675
- "source": null,
1676
  "scope": "multi_episode_128_metadata_baseline",
1677
- "status": "not_supported_by_metadata_only_package",
1678
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1679
- "normalized_score": null,
1680
- "raw_text": "n/a",
1681
- "status_label": "not supported"
1682
  },
1683
  "raw128_simple": {
1684
  "raw": 0.06469493412657774,
@@ -1752,13 +1751,13 @@
1752
  "metadata128_simple": {
1753
  "raw": null,
1754
  "metric_key": "mae",
1755
- "source": null,
1756
  "scope": "multi_episode_128_metadata_baseline",
1757
- "status": "not_supported_by_metadata_only_package",
1758
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1759
  "normalized_score": null,
1760
  "raw_text": "n/a",
1761
- "status_label": "not supported"
1762
  },
1763
  "metadata128_neural_mlp": {
1764
  "raw": null,
@@ -1843,13 +1842,13 @@
1843
  "metadata128_simple": {
1844
  "raw": null,
1845
  "metric_key": "mrr",
1846
- "source": null,
1847
  "scope": "multi_episode_128_metadata_baseline",
1848
- "status": "not_supported_by_metadata_only_package",
1849
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1850
  "normalized_score": null,
1851
  "raw_text": "n/a",
1852
- "status_label": "not supported"
1853
  },
1854
  "metadata128_neural_mlp": {
1855
  "raw": null,
@@ -1932,26 +1931,26 @@
1932
  "raw128_proxy_axis": false,
1933
  "values": {
1934
  "metadata128_simple": {
1935
- "raw": null,
1936
  "metric_key": "mae",
1937
- "source": null,
1938
  "scope": "multi_episode_128_metadata_baseline",
1939
- "status": "not_supported_by_metadata_only_package",
1940
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1941
- "normalized_score": null,
1942
- "raw_text": "n/a",
1943
- "status_label": "not supported"
1944
  },
1945
  "metadata128_neural_mlp": {
1946
- "raw": null,
1947
  "metric_key": "mae",
1948
- "source": null,
1949
  "scope": "multi_episode_128_metadata_baseline",
1950
- "status": "not_supported_by_metadata_only_package",
1951
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1952
- "normalized_score": null,
1953
- "raw_text": "n/a",
1954
- "status_label": "not supported"
1955
  },
1956
  "raw128_simple": {
1957
  "raw": 52.32759475708008,
@@ -3530,17 +3529,17 @@
3530
  "task_label": "Long-Horizon Next-Action Forecasting",
3531
  "series_id": "metadata128_simple",
3532
  "method": "128ep Metadata Simple",
3533
- "status": "not_supported_by_metadata_only_package",
3534
- "status_label": "not supported",
3535
- "scored": false,
3536
  "proxy_scored": false,
3537
- "raw": null,
3538
- "raw_text": "n/a",
3539
- "normalized_score": null,
3540
  "metric_key": "macro_f1",
3541
- "source": null,
3542
  "scope": "multi_episode_128_metadata_baseline",
3543
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3544
  },
3545
  {
3546
  "task_number": 13,
@@ -3548,17 +3547,17 @@
3548
  "task_label": "Long-Horizon Next-Action Forecasting",
3549
  "series_id": "metadata128_neural_mlp",
3550
  "method": "128ep Metadata NN",
3551
- "status": "not_supported_by_metadata_only_package",
3552
- "status_label": "not supported",
3553
- "scored": false,
3554
  "proxy_scored": false,
3555
- "raw": null,
3556
- "raw_text": "n/a",
3557
- "normalized_score": null,
3558
  "metric_key": "macro_f1",
3559
- "source": null,
3560
  "scope": "multi_episode_128_metadata_baseline",
3561
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3562
  },
3563
  {
3564
  "task_number": 13,
@@ -3656,17 +3655,17 @@
3656
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3657
  "series_id": "metadata128_simple",
3658
  "method": "128ep Metadata Simple",
3659
- "status": "not_supported_by_metadata_only_package",
3660
- "status_label": "not supported",
3661
- "scored": false,
3662
  "proxy_scored": false,
3663
- "raw": null,
3664
- "raw_text": "n/a",
3665
- "normalized_score": null,
3666
  "metric_key": "macro_f1",
3667
- "source": null,
3668
  "scope": "multi_episode_128_metadata_baseline",
3669
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3670
  },
3671
  {
3672
  "task_number": 14,
@@ -3674,17 +3673,17 @@
3674
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3675
  "series_id": "metadata128_neural_mlp",
3676
  "method": "128ep Metadata NN",
3677
- "status": "not_supported_by_metadata_only_package",
3678
- "status_label": "not supported",
3679
- "scored": false,
3680
  "proxy_scored": false,
3681
- "raw": null,
3682
- "raw_text": "n/a",
3683
- "normalized_score": null,
3684
  "metric_key": "macro_f1",
3685
- "source": null,
3686
  "scope": "multi_episode_128_metadata_baseline",
3687
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3688
  },
3689
  {
3690
  "task_number": 14,
@@ -3782,17 +3781,17 @@
3782
  "task_label": "Interaction Text Prediction",
3783
  "series_id": "metadata128_simple",
3784
  "method": "128ep Metadata Simple",
3785
- "status": "not_supported_by_metadata_only_package",
3786
- "status_label": "not supported",
3787
  "scored": false,
3788
  "proxy_scored": false,
3789
  "raw": null,
3790
  "raw_text": "n/a",
3791
  "normalized_score": null,
3792
  "metric_key": "macro_f1",
3793
- "source": null,
3794
  "scope": "multi_episode_128_metadata_baseline",
3795
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3796
  },
3797
  {
3798
  "task_number": 15,
@@ -3908,17 +3907,17 @@
3908
  "task_label": "Action-Object Relation Prediction",
3909
  "series_id": "metadata128_simple",
3910
  "method": "128ep Metadata Simple",
3911
- "status": "not_supported_by_metadata_only_package",
3912
- "status_label": "not supported",
3913
- "scored": false,
3914
  "proxy_scored": false,
3915
- "raw": null,
3916
- "raw_text": "n/a",
3917
- "normalized_score": null,
3918
  "metric_key": "macro_f1",
3919
- "source": null,
3920
  "scope": "multi_episode_128_metadata_baseline",
3921
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3922
  },
3923
  {
3924
  "task_number": 16,
@@ -3926,17 +3925,17 @@
3926
  "task_label": "Action-Object Relation Prediction",
3927
  "series_id": "metadata128_neural_mlp",
3928
  "method": "128ep Metadata NN",
3929
- "status": "not_supported_by_metadata_only_package",
3930
- "status_label": "not supported",
3931
- "scored": false,
3932
  "proxy_scored": false,
3933
- "raw": null,
3934
- "raw_text": "n/a",
3935
- "normalized_score": null,
3936
  "metric_key": "macro_f1",
3937
- "source": null,
3938
  "scope": "multi_episode_128_metadata_baseline",
3939
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3940
  },
3941
  {
3942
  "task_number": 16,
@@ -4034,17 +4033,17 @@
4034
  "task_label": "Future Object-Set Forecasting",
4035
  "series_id": "metadata128_simple",
4036
  "method": "128ep Metadata Simple",
4037
- "status": "not_supported_by_metadata_only_package",
4038
- "status_label": "not supported",
4039
- "scored": false,
4040
  "proxy_scored": false,
4041
- "raw": null,
4042
- "raw_text": "n/a",
4043
- "normalized_score": null,
4044
  "metric_key": "micro_f1",
4045
- "source": null,
4046
  "scope": "multi_episode_128_metadata_baseline",
4047
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4048
  },
4049
  {
4050
  "task_number": 17,
@@ -4052,17 +4051,17 @@
4052
  "task_label": "Future Object-Set Forecasting",
4053
  "series_id": "metadata128_neural_mlp",
4054
  "method": "128ep Metadata NN",
4055
- "status": "not_supported_by_metadata_only_package",
4056
- "status_label": "not supported",
4057
- "scored": false,
4058
  "proxy_scored": false,
4059
- "raw": null,
4060
- "raw_text": "n/a",
4061
- "normalized_score": null,
4062
  "metric_key": "micro_f1",
4063
- "source": null,
4064
  "scope": "multi_episode_128_metadata_baseline",
4065
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4066
  },
4067
  {
4068
  "task_number": 17,
@@ -4160,17 +4159,17 @@
4160
  "task_label": "IMU-to-Hand Pose Reconstruction",
4161
  "series_id": "metadata128_simple",
4162
  "method": "128ep Metadata Simple",
4163
- "status": "not_supported_by_metadata_only_package",
4164
- "status_label": "not supported",
4165
  "scored": false,
4166
  "proxy_scored": false,
4167
  "raw": null,
4168
  "raw_text": "n/a",
4169
  "normalized_score": null,
4170
  "metric_key": "mae",
4171
- "source": null,
4172
  "scope": "multi_episode_128_metadata_baseline",
4173
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4174
  },
4175
  {
4176
  "task_number": 18,
@@ -4286,17 +4285,17 @@
4286
  "task_label": "Camera-View Synchronization Retrieval",
4287
  "series_id": "metadata128_simple",
4288
  "method": "128ep Metadata Simple",
4289
- "status": "not_supported_by_metadata_only_package",
4290
- "status_label": "not supported",
4291
  "scored": false,
4292
  "proxy_scored": false,
4293
  "raw": null,
4294
  "raw_text": "n/a",
4295
  "normalized_score": null,
4296
  "metric_key": "mrr",
4297
- "source": null,
4298
  "scope": "multi_episode_128_metadata_baseline",
4299
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4300
  },
4301
  {
4302
  "task_number": 19,
@@ -4412,17 +4411,17 @@
4412
  "task_label": "Time-to-Next-Transition Regression",
4413
  "series_id": "metadata128_simple",
4414
  "method": "128ep Metadata Simple",
4415
- "status": "not_supported_by_metadata_only_package",
4416
- "status_label": "not supported",
4417
- "scored": false,
4418
  "proxy_scored": false,
4419
- "raw": null,
4420
- "raw_text": "n/a",
4421
- "normalized_score": null,
4422
  "metric_key": "mae",
4423
- "source": null,
4424
  "scope": "multi_episode_128_metadata_baseline",
4425
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4426
  },
4427
  {
4428
  "task_number": 20,
@@ -4430,17 +4429,17 @@
4430
  "task_label": "Time-to-Next-Transition Regression",
4431
  "series_id": "metadata128_neural_mlp",
4432
  "method": "128ep Metadata NN",
4433
- "status": "not_supported_by_metadata_only_package",
4434
- "status_label": "not supported",
4435
- "scored": false,
4436
  "proxy_scored": false,
4437
- "raw": null,
4438
- "raw_text": "n/a",
4439
- "normalized_score": null,
4440
  "metric_key": "mae",
4441
- "source": null,
4442
  "scope": "multi_episode_128_metadata_baseline",
4443
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4444
  },
4445
  {
4446
  "task_number": 20,
 
1
  {
2
  "title": "128-Episode 20-Task Radar",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "description": "Selected 128-episode metadata/raw baselines plus verified Qwen3/Cosmos branches. Every method has 20 records; numeric scores appear only where the public artifact produced that task target.",
6
  "task_count": 20,
7
  "method_count": 7,
8
  "method_task_record_count": 140,
9
+ "scored_method_task_count": 93,
10
  "normalization_policy": {
11
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
12
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
 
30
  "method_detail": "128-episode JSONL metadata/text simple baselines.",
31
  "plotted_as": "colored point overlay",
32
  "result_record_count": 20,
33
+ "scored_task_count": 13,
34
+ "covered_task_count": 13,
35
  "proxy_scored_task_count": 0,
36
+ "scoreless_task_count": 7,
37
+ "unsupported_task_count": 7,
38
  "not_evaluated_task_count": 0,
39
  "status_counts": {
40
+ "scored": 13,
41
+ "unsupported_without_required_target": 7
 
42
  },
43
+ "coverage_fraction": 0.65,
44
  "result_record_fraction": 1.0
45
  },
46
  {
 
54
  "method_detail": "128-episode JSONL metadata/text MLP baselines.",
55
  "plotted_as": "colored point overlay",
56
  "result_record_count": 20,
57
+ "scored_task_count": 13,
58
+ "covered_task_count": 13,
59
  "proxy_scored_task_count": 0,
60
+ "scoreless_task_count": 7,
61
+ "unsupported_task_count": 7,
62
  "not_evaluated_task_count": 0,
63
  "status_counts": {
64
+ "not_supported_by_metadata_only_package": 7,
65
+ "scored": 13
66
  },
67
+ "coverage_fraction": 0.65,
68
  "result_record_fraction": 1.0
69
  },
70
  {
 
1294
  "raw128_proxy_axis": false,
1295
  "values": {
1296
  "metadata128_simple": {
1297
+ "raw": 0.004579592783699693,
1298
  "metric_key": "macro_f1",
1299
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
1300
  "scope": "multi_episode_128_metadata_baseline",
1301
+ "status": "scored",
1302
+ "reason": null,
1303
+ "normalized_score": 0.004579592783699693,
1304
+ "raw_text": "0.0046",
1305
+ "status_label": "scored"
1306
  },
1307
  "metadata128_neural_mlp": {
1308
+ "raw": 0.0029821307969142615,
1309
  "metric_key": "macro_f1",
1310
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
1311
  "scope": "multi_episode_128_metadata_baseline",
1312
+ "status": "scored",
1313
+ "reason": null,
1314
+ "normalized_score": 0.0029821307969142615,
1315
+ "raw_text": "0.0030",
1316
+ "status_label": "scored"
1317
  },
1318
  "raw128_simple": {
1319
  "raw": 0.0024280172369056294,
 
1385
  "raw128_proxy_axis": false,
1386
  "values": {
1387
  "metadata128_simple": {
1388
+ "raw": 0.0001206030150753769,
1389
  "metric_key": "macro_f1",
1390
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
1391
  "scope": "multi_episode_128_metadata_baseline",
1392
+ "status": "scored",
1393
+ "reason": null,
1394
+ "normalized_score": 0.0001206030150753769,
1395
+ "raw_text": "0.0001",
1396
+ "status_label": "scored"
1397
  },
1398
  "metadata128_neural_mlp": {
1399
+ "raw": 2.086049543676662e-05,
1400
  "metric_key": "macro_f1",
1401
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
1402
  "scope": "multi_episode_128_metadata_baseline",
1403
+ "status": "scored",
1404
+ "reason": null,
1405
+ "normalized_score": 2.086049543676662e-05,
1406
+ "raw_text": "0.0000",
1407
+ "status_label": "scored"
1408
  },
1409
  "raw128_simple": {
1410
  "raw": 0.0,
 
1478
  "metadata128_simple": {
1479
  "raw": null,
1480
  "metric_key": "macro_f1",
1481
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
1482
  "scope": "multi_episode_128_metadata_baseline",
1483
+ "status": "unsupported_without_required_target",
1484
+ "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
1485
  "normalized_score": null,
1486
  "raw_text": "n/a",
1487
+ "status_label": "unsupported"
1488
  },
1489
  "metadata128_neural_mlp": {
1490
  "raw": null,
 
1567
  "raw128_proxy_axis": false,
1568
  "values": {
1569
  "metadata128_simple": {
1570
+ "raw": 0.0,
1571
  "metric_key": "macro_f1",
1572
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
1573
  "scope": "multi_episode_128_metadata_baseline",
1574
+ "status": "scored",
1575
+ "reason": null,
1576
+ "normalized_score": 0.0,
1577
+ "raw_text": "0.0000",
1578
+ "status_label": "scored"
1579
  },
1580
  "metadata128_neural_mlp": {
1581
+ "raw": 0.0,
1582
  "metric_key": "macro_f1",
1583
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
1584
  "scope": "multi_episode_128_metadata_baseline",
1585
+ "status": "scored",
1586
+ "reason": null,
1587
+ "normalized_score": 0.0,
1588
+ "raw_text": "0.0000",
1589
+ "status_label": "scored"
1590
  },
1591
  "raw128_simple": {
1592
  "raw": 0.0,
 
1658
  "raw128_proxy_axis": false,
1659
  "values": {
1660
  "metadata128_simple": {
1661
+ "raw": 0.17656983343047333,
1662
  "metric_key": "micro_f1",
1663
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
1664
  "scope": "multi_episode_128_metadata_baseline",
1665
+ "status": "scored",
1666
+ "reason": null,
1667
+ "normalized_score": 0.17656983343047333,
1668
+ "raw_text": "0.1766",
1669
+ "status_label": "scored"
1670
  },
1671
  "metadata128_neural_mlp": {
1672
+ "raw": 0.17418550827844048,
1673
  "metric_key": "micro_f1",
1674
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
1675
  "scope": "multi_episode_128_metadata_baseline",
1676
+ "status": "scored",
1677
+ "reason": null,
1678
+ "normalized_score": 0.17418550827844048,
1679
+ "raw_text": "0.1742",
1680
+ "status_label": "scored"
1681
  },
1682
  "raw128_simple": {
1683
  "raw": 0.06469493412657774,
 
1751
  "metadata128_simple": {
1752
  "raw": null,
1753
  "metric_key": "mae",
1754
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
1755
  "scope": "multi_episode_128_metadata_baseline",
1756
+ "status": "unsupported_without_required_target",
1757
+ "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package",
1758
  "normalized_score": null,
1759
  "raw_text": "n/a",
1760
+ "status_label": "unsupported"
1761
  },
1762
  "metadata128_neural_mlp": {
1763
  "raw": null,
 
1842
  "metadata128_simple": {
1843
  "raw": null,
1844
  "metric_key": "mrr",
1845
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
1846
  "scope": "multi_episode_128_metadata_baseline",
1847
+ "status": "unsupported_without_required_target",
1848
+ "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
1849
  "normalized_score": null,
1850
  "raw_text": "n/a",
1851
+ "status_label": "unsupported"
1852
  },
1853
  "metadata128_neural_mlp": {
1854
  "raw": null,
 
1931
  "raw128_proxy_axis": false,
1932
  "values": {
1933
  "metadata128_simple": {
1934
+ "raw": 624.8108520507812,
1935
  "metric_key": "mae",
1936
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
1937
  "scope": "multi_episode_128_metadata_baseline",
1938
+ "status": "scored",
1939
+ "reason": null,
1940
+ "normalized_score": 0.016864874132806403,
1941
+ "raw_text": "624.81",
1942
+ "status_label": "scored"
1943
  },
1944
  "metadata128_neural_mlp": {
1945
+ "raw": 41.4664421081543,
1946
  "metric_key": "mae",
1947
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
1948
  "scope": "multi_episode_128_metadata_baseline",
1949
+ "status": "scored",
1950
+ "reason": null,
1951
+ "normalized_score": 0.25411768748242325,
1952
+ "raw_text": "41.47",
1953
+ "status_label": "scored"
1954
  },
1955
  "raw128_simple": {
1956
  "raw": 52.32759475708008,
 
3529
  "task_label": "Long-Horizon Next-Action Forecasting",
3530
  "series_id": "metadata128_simple",
3531
  "method": "128ep Metadata Simple",
3532
+ "status": "scored",
3533
+ "status_label": "scored",
3534
+ "scored": true,
3535
  "proxy_scored": false,
3536
+ "raw": 0.004579592783699693,
3537
+ "raw_text": "0.0046",
3538
+ "normalized_score": 0.004579592783699693,
3539
  "metric_key": "macro_f1",
3540
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
3541
  "scope": "multi_episode_128_metadata_baseline",
3542
+ "reason": null
3543
  },
3544
  {
3545
  "task_number": 13,
 
3547
  "task_label": "Long-Horizon Next-Action Forecasting",
3548
  "series_id": "metadata128_neural_mlp",
3549
  "method": "128ep Metadata NN",
3550
+ "status": "scored",
3551
+ "status_label": "scored",
3552
+ "scored": true,
3553
  "proxy_scored": false,
3554
+ "raw": 0.0029821307969142615,
3555
+ "raw_text": "0.0030",
3556
+ "normalized_score": 0.0029821307969142615,
3557
  "metric_key": "macro_f1",
3558
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
3559
  "scope": "multi_episode_128_metadata_baseline",
3560
+ "reason": null
3561
  },
3562
  {
3563
  "task_number": 13,
 
3655
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3656
  "series_id": "metadata128_simple",
3657
  "method": "128ep Metadata Simple",
3658
+ "status": "scored",
3659
+ "status_label": "scored",
3660
+ "scored": true,
3661
  "proxy_scored": false,
3662
+ "raw": 0.0001206030150753769,
3663
+ "raw_text": "0.0001",
3664
+ "normalized_score": 0.0001206030150753769,
3665
  "metric_key": "macro_f1",
3666
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
3667
  "scope": "multi_episode_128_metadata_baseline",
3668
+ "reason": null
3669
  },
3670
  {
3671
  "task_number": 14,
 
3673
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3674
  "series_id": "metadata128_neural_mlp",
3675
  "method": "128ep Metadata NN",
3676
+ "status": "scored",
3677
+ "status_label": "scored",
3678
+ "scored": true,
3679
  "proxy_scored": false,
3680
+ "raw": 2.086049543676662e-05,
3681
+ "raw_text": "0.0000",
3682
+ "normalized_score": 2.086049543676662e-05,
3683
  "metric_key": "macro_f1",
3684
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
3685
  "scope": "multi_episode_128_metadata_baseline",
3686
+ "reason": null
3687
  },
3688
  {
3689
  "task_number": 14,
 
3781
  "task_label": "Interaction Text Prediction",
3782
  "series_id": "metadata128_simple",
3783
  "method": "128ep Metadata Simple",
3784
+ "status": "unsupported_without_required_target",
3785
+ "status_label": "unsupported",
3786
  "scored": false,
3787
  "proxy_scored": false,
3788
  "raw": null,
3789
  "raw_text": "n/a",
3790
  "normalized_score": null,
3791
  "metric_key": "macro_f1",
3792
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
3793
  "scope": "multi_episode_128_metadata_baseline",
3794
+ "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
3795
  },
3796
  {
3797
  "task_number": 15,
 
3907
  "task_label": "Action-Object Relation Prediction",
3908
  "series_id": "metadata128_simple",
3909
  "method": "128ep Metadata Simple",
3910
+ "status": "scored",
3911
+ "status_label": "scored",
3912
+ "scored": true,
3913
  "proxy_scored": false,
3914
+ "raw": 0.0,
3915
+ "raw_text": "0.0000",
3916
+ "normalized_score": 0.0,
3917
  "metric_key": "macro_f1",
3918
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
3919
  "scope": "multi_episode_128_metadata_baseline",
3920
+ "reason": null
3921
  },
3922
  {
3923
  "task_number": 16,
 
3925
  "task_label": "Action-Object Relation Prediction",
3926
  "series_id": "metadata128_neural_mlp",
3927
  "method": "128ep Metadata NN",
3928
+ "status": "scored",
3929
+ "status_label": "scored",
3930
+ "scored": true,
3931
  "proxy_scored": false,
3932
+ "raw": 0.0,
3933
+ "raw_text": "0.0000",
3934
+ "normalized_score": 0.0,
3935
  "metric_key": "macro_f1",
3936
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
3937
  "scope": "multi_episode_128_metadata_baseline",
3938
+ "reason": null
3939
  },
3940
  {
3941
  "task_number": 16,
 
4033
  "task_label": "Future Object-Set Forecasting",
4034
  "series_id": "metadata128_simple",
4035
  "method": "128ep Metadata Simple",
4036
+ "status": "scored",
4037
+ "status_label": "scored",
4038
+ "scored": true,
4039
  "proxy_scored": false,
4040
+ "raw": 0.17656983343047333,
4041
+ "raw_text": "0.1766",
4042
+ "normalized_score": 0.17656983343047333,
4043
  "metric_key": "micro_f1",
4044
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
4045
  "scope": "multi_episode_128_metadata_baseline",
4046
+ "reason": null
4047
  },
4048
  {
4049
  "task_number": 17,
 
4051
  "task_label": "Future Object-Set Forecasting",
4052
  "series_id": "metadata128_neural_mlp",
4053
  "method": "128ep Metadata NN",
4054
+ "status": "scored",
4055
+ "status_label": "scored",
4056
+ "scored": true,
4057
  "proxy_scored": false,
4058
+ "raw": 0.17418550827844048,
4059
+ "raw_text": "0.1742",
4060
+ "normalized_score": 0.17418550827844048,
4061
  "metric_key": "micro_f1",
4062
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
4063
  "scope": "multi_episode_128_metadata_baseline",
4064
+ "reason": null
4065
  },
4066
  {
4067
  "task_number": 17,
 
4159
  "task_label": "IMU-to-Hand Pose Reconstruction",
4160
  "series_id": "metadata128_simple",
4161
  "method": "128ep Metadata Simple",
4162
+ "status": "unsupported_without_required_target",
4163
+ "status_label": "unsupported",
4164
  "scored": false,
4165
  "proxy_scored": false,
4166
  "raw": null,
4167
  "raw_text": "n/a",
4168
  "normalized_score": null,
4169
  "metric_key": "mae",
4170
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
4171
  "scope": "multi_episode_128_metadata_baseline",
4172
+ "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package"
4173
  },
4174
  {
4175
  "task_number": 18,
 
4285
  "task_label": "Camera-View Synchronization Retrieval",
4286
  "series_id": "metadata128_simple",
4287
  "method": "128ep Metadata Simple",
4288
+ "status": "unsupported_without_required_target",
4289
+ "status_label": "unsupported",
4290
  "scored": false,
4291
  "proxy_scored": false,
4292
  "raw": null,
4293
  "raw_text": "n/a",
4294
  "normalized_score": null,
4295
  "metric_key": "mrr",
4296
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
4297
  "scope": "multi_episode_128_metadata_baseline",
4298
+ "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
4299
  },
4300
  {
4301
  "task_number": 19,
 
4411
  "task_label": "Time-to-Next-Transition Regression",
4412
  "series_id": "metadata128_simple",
4413
  "method": "128ep Metadata Simple",
4414
+ "status": "scored",
4415
+ "status_label": "scored",
4416
+ "scored": true,
4417
  "proxy_scored": false,
4418
+ "raw": 624.8108520507812,
4419
+ "raw_text": "624.81",
4420
+ "normalized_score": 0.016864874132806403,
4421
  "metric_key": "mae",
4422
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
4423
  "scope": "multi_episode_128_metadata_baseline",
4424
+ "reason": null
4425
  },
4426
  {
4427
  "task_number": 20,
 
4429
  "task_label": "Time-to-Next-Transition Regression",
4430
  "series_id": "metadata128_neural_mlp",
4431
  "method": "128ep Metadata NN",
4432
+ "status": "scored",
4433
+ "status_label": "scored",
4434
+ "scored": true,
4435
  "proxy_scored": false,
4436
+ "raw": 41.4664421081543,
4437
+ "raw_text": "41.47",
4438
+ "normalized_score": 0.25411768748242325,
4439
  "metric_key": "mae",
4440
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
4441
  "scope": "multi_episode_128_metadata_baseline",
4442
+ "reason": null
4443
  },
4444
  {
4445
  "task_number": 20,
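Note on the normalized scores in the radar file above: the `normalization_policy` block says bounded metrics are clipped to [0, 1] and lower-error metrics are converted to `best_observed_value / raw_value` within the same task. The sketch below restates that policy in Python so the `normalized_score` and `coverage_fraction` fields in this diff can be spot-checked. The function names are hypothetical and the handling of `null` or non-positive raw values is an assumption; this is not the project's own scoring script.

```python
from typing import Optional


def normalize_higher_is_better(raw: Optional[float]) -> Optional[float]:
    """Clip a bounded metric (macro_f1, micro_f1, mrr) to the [0, 1] axis."""
    if raw is None:
        # Unsupported cells keep a null score (matches "normalized_score": null).
        return None
    return min(1.0, max(0.0, raw))


def normalize_lower_is_better(raw: Optional[float], best_observed: float) -> Optional[float]:
    """Convert an error metric (mae) to best_observed / raw within one task."""
    if raw is None:
        return None
    if raw <= 0.0:
        # Assumption: a zero error is treated as a perfect score.
        return 1.0
    return min(1.0, best_observed / raw)


def coverage_fraction(scored_task_count: int, task_count: int) -> float:
    """Fraction of task contracts with a numeric score for one method."""
    return scored_task_count / task_count


if __name__ == "__main__":
    # Time-to-Next-Transition Regression (task 20), values copied from the diff.
    simple_raw, simple_norm = 624.8108520507812, 0.016864874132806403
    mlp_raw, mlp_norm = 41.4664421081543, 0.25411768748242325

    # Both records should imply the same best observed MAE for the task.
    print(simple_raw * simple_norm, mlp_raw * mlp_norm)

    # 13 scored tasks out of 20 for each metadata128 baseline.
    print(coverage_fraction(13, 20))
```

Multiplying `raw` by `normalized_score` for the two task-20 records gives the same implied best MAE (about 10.54, a derived value that is not itself listed in this hunk), which is consistent with the stated policy. For the F1 metrics the raw and normalized values are identical, as expected for metrics already inside [0, 1].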
docs/data/mirror_parity.json CHANGED
The diff for this file is too large to render. See raw diff
 
docs/data/public_surface_qa.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Public Project Surface",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:41:42+00:00",
5
  "scope": "Repo README, GitHub Pages HTML, Hugging Face Space card, artifact dataset card, and model card.",
6
  "checks": [
7
  {
@@ -18,7 +18,7 @@
18
  "website_integrity": {
19
  "exists": true,
20
  "status": "pass",
21
- "generated_at_utc": "2026-06-18T11:18:05+00:00"
22
  },
23
  "rendered_site_check": {
24
  "exists": true,
@@ -43,12 +43,12 @@
43
  "publication_package": {
44
  "exists": true,
45
  "status": "pass",
46
- "generated_at_utc": "2026-06-18T11:18:57+00:00"
47
  },
48
  "mirror_parity": {
49
  "exists": true,
50
  "status": "pass",
51
- "generated_at_utc": "2026-06-18T11:21:54+00:00"
52
  }
53
  },
54
  "failures": {}
 
1
  {
2
  "title": "Ropedia Xperience-10M Public Project Surface",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:09:24+00:00",
5
  "scope": "Repo README, GitHub Pages HTML, Hugging Face Space card, artifact dataset card, and model card.",
6
  "checks": [
7
  {
 
18
  "website_integrity": {
19
  "exists": true,
20
  "status": "pass",
21
+ "generated_at_utc": "2026-06-18T11:41:43+00:00"
22
  },
23
  "rendered_site_check": {
24
  "exists": true,
 
43
  "publication_package": {
44
  "exists": true,
45
  "status": "pass",
46
+ "generated_at_utc": "2026-06-18T11:42:48+00:00"
47
  },
48
  "mirror_parity": {
49
  "exists": true,
50
  "status": "pass",
51
+ "generated_at_utc": "2026-06-18T11:43:59+00:00"
52
  }
53
  },
54
  "failures": {}
docs/data/publication_audit.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-18T11:42:48+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
@@ -215,8 +215,8 @@
215
  "github_repo": {
216
  "root": "repo",
217
  "exists": true,
218
- "file_count": 1276,
219
- "text_file_count": 1072,
220
  "largest_file": {
221
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
222
  "bytes": 55702978
@@ -226,8 +226,8 @@
226
  "hf_space_bundle": {
227
  "root": "hf_publish/space",
228
  "exists": true,
229
- "file_count": 1058,
230
- "text_file_count": 879,
231
  "largest_file": {
232
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
233
  "bytes": 135591061
@@ -237,8 +237,8 @@
237
  "hf_artifact_bundle": {
238
  "root": "hf_publish/artifacts",
239
  "exists": true,
240
- "file_count": 2537,
241
- "text_file_count": 1085,
242
  "largest_file": {
243
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
244
  "bytes": 135591061
@@ -248,8 +248,8 @@
248
  "hf_model_bundle": {
249
  "root": "hf_publish/model",
250
  "exists": true,
251
- "file_count": 2956,
252
- "text_file_count": 1247,
253
  "largest_file": {
254
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
255
  "bytes": 135591061
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-18T12:10:47+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
 
215
  "github_repo": {
216
  "root": "repo",
217
  "exists": true,
218
+ "file_count": 1321,
219
+ "text_file_count": 1108,
220
  "largest_file": {
221
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
222
  "bytes": 55702978
 
226
  "hf_space_bundle": {
227
  "root": "hf_publish/space",
228
  "exists": true,
229
+ "file_count": 1103,
230
+ "text_file_count": 915,
231
  "largest_file": {
232
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
233
  "bytes": 135591061
 
237
  "hf_artifact_bundle": {
238
  "root": "hf_publish/artifacts",
239
  "exists": true,
240
+ "file_count": 2582,
241
+ "text_file_count": 1121,
242
  "largest_file": {
243
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
244
  "bytes": 135591061
 
248
  "hf_model_bundle": {
249
  "root": "hf_publish/model",
250
  "exists": true,
251
+ "file_count": 3001,
252
+ "text_file_count": 1283,
253
  "largest_file": {
254
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
255
  "bytes": 135591061
docs/data/quality_gates.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Release Checks",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:20:56+00:00",
5
  "rule": "A release is current when the automated reports pass and the live GitHub/Hugging Face mirrors are verified after publishing.",
6
  "automated_gates": [
7
  {
 
1
  {
2
  "title": "Ropedia Xperience-10M Release Checks",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:09:24+00:00",
5
  "rule": "A release is current when the automated reports pass and the live GitHub/Hugging Face mirrors are verified after publishing.",
6
  "automated_gates": [
7
  {
docs/data/scope_claims_audit.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-18T11:18:06+00:00",
4
  "summary": {
5
  "qwen3_omni_verified_diagnostic_pilot": true,
6
  "dataset_manifest_num_episodes": 119,
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-18T12:09:48+00:00",
4
  "summary": {
5
  "qwen3_omni_verified_diagnostic_pilot": true,
6
  "dataset_manifest_num_episodes": 119,
docs/data/single_episode_task_model_radar.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Single-Episode 20-Task Radar",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:15:02+00:00",
5
  "description": "Minimal and Neural MLP baselines on the one public sample episode, both scored on all 20 task contracts.",
6
  "task_count": 20,
7
  "method_count": 2,
 
1
  {
2
  "title": "Single-Episode 20-Task Radar",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "description": "Minimal and Neural MLP baselines on the one public sample episode, both scored on all 20 task contracts.",
6
  "task_count": 20,
7
  "method_count": 2,
docs/data/source_alignment_audit.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Source Alignment Note",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:18:04+00:00",
5
  "alignment_json": "docs/data/xperience10m_dataset_card_alignment.json",
6
  "alignment_summary": {
7
  "full_dataset_repo": "ropedia-ai/xperience-10m",
 
1
  {
2
  "title": "Ropedia Xperience-10M Source Alignment Note",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:09:45+00:00",
5
  "alignment_json": "docs/data/xperience10m_dataset_card_alignment.json",
6
  "alignment_summary": {
7
  "full_dataset_repo": "ropedia-ai/xperience-10m",
docs/data/task_method_20_gap_audit.json CHANGED
@@ -1,10 +1,10 @@
1
  {
2
- "generated_at_utc": "2026-06-18T11:15:34+00:00",
3
  "immediate_actions": [
4
  {
5
  "artifact": "docs/data/task_method_20_gap_audit.json",
6
  "id": "gap_audit",
7
- "purpose": "Keep the 57 scoreless cells visible and reproducible."
8
  },
9
  {
10
  "artifact": "scripts/omni/score_model_output_probes.py",
@@ -50,11 +50,12 @@
50
  "proxy_scored_task_count": 0,
51
  "result_record_count": 20,
52
  "scope": "128 selected episodes, JSONL metadata/text only",
53
- "scored_task_count": 8,
54
- "scoreless_task_count": 12,
55
  "status_counts": {
56
- "not_supported_by_metadata_only_package": 12,
57
- "scored": 8
 
58
  }
59
  },
60
  "metadata128_simple": {
@@ -63,12 +64,11 @@
63
  "proxy_scored_task_count": 0,
64
  "result_record_count": 20,
65
  "scope": "128 selected episodes, JSONL metadata/text only",
66
- "scored_task_count": 8,
67
- "scoreless_task_count": 12,
68
  "status_counts": {
69
- "not_supported_by_metadata_only_package": 8,
70
- "scored": 8,
71
- "unsupported_without_required_target": 4
72
  }
73
  },
74
  "minimal": {
@@ -138,18 +138,25 @@
138
  "missing_by_method": {
139
  "cosmos3_nano_future_window": 15,
140
  "cosmos3_super_reasoner": 13,
141
- "metadata128_neural_mlp": 12,
142
- "metadata128_simple": 12,
143
  "qwen3_omni_v6_lora": 5
144
  },
145
  "missing_by_status": {
146
  "not_evaluated_in_verified_package": 33,
147
- "not_supported_by_metadata_only_package": 20,
148
- "unsupported_without_required_target": 4
149
  },
150
  "missing_by_task": {
151
  "02 Procedure Step Recognition": [
152
- "cosmos3_nano_future_window"
153
  ],
154
  "05 Hand Trajectory Forecasting": [
155
  "cosmos3_nano_future_window",
@@ -190,14 +197,12 @@
190
  "13 Long-Horizon Next-Action Forecasting": [
191
  "cosmos3_nano_future_window",
192
  "cosmos3_super_reasoner",
193
- "metadata128_neural_mlp",
194
- "metadata128_simple"
195
  ],
196
  "14 Long-Horizon Next-Subtask Forecasting": [
197
  "cosmos3_nano_future_window",
198
  "cosmos3_super_reasoner",
199
- "metadata128_neural_mlp",
200
- "metadata128_simple"
201
  ],
202
  "15 Interaction Text Prediction": [
203
  "cosmos3_nano_future_window",
@@ -208,14 +213,11 @@
208
  ],
209
  "16 Action-Object Relation Prediction": [
210
  "cosmos3_nano_future_window",
211
- "metadata128_neural_mlp",
212
- "metadata128_simple"
213
  ],
214
  "17 Future Object-Set Forecasting": [
215
  "cosmos3_nano_future_window",
216
- "cosmos3_super_reasoner",
217
- "metadata128_neural_mlp",
218
- "metadata128_simple"
219
  ],
220
  "18 IMU-to-Hand Pose Reconstruction": [
221
  "cosmos3_nano_future_window",
@@ -233,12 +235,36 @@
233
  ],
234
  "20 Time-to-Next-Transition Regression": [
235
  "cosmos3_nano_future_window",
236
- "cosmos3_super_reasoner",
237
- "metadata128_neural_mlp",
238
- "metadata128_simple"
239
  ]
240
  },
241
  "missing_records": [
242
  {
243
  "method": "Cosmos3-Nano Future Window",
244
  "metric_key": "macro_f1",
@@ -252,6 +278,19 @@
252
  "task_label": "Procedure Step Recognition",
253
  "task_number": 2
254
  },
255
  {
256
  "method": "128ep Metadata Simple",
257
  "metric_key": "mpjpe",
@@ -538,28 +577,15 @@
538
  "task_label": "Multimodal Synchronization Detection",
539
  "task_number": 12
540
  },
541
- {
542
- "method": "128ep Metadata Simple",
543
- "metric_key": "macro_f1",
544
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
545
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
546
- "scope": "multi_episode_128_metadata_baseline",
547
- "series_id": "metadata128_simple",
548
- "status": "not_supported_by_metadata_only_package",
549
- "status_label": "not supported",
550
- "task_id": "long_horizon_next_action",
551
- "task_label": "Long-Horizon Next-Action Forecasting",
552
- "task_number": 13
553
- },
554
  {
555
  "method": "128ep Metadata NN",
556
  "metric_key": "macro_f1",
557
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
558
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
559
  "scope": "multi_episode_128_metadata_baseline",
560
  "series_id": "metadata128_neural_mlp",
561
- "status": "not_supported_by_metadata_only_package",
562
- "status_label": "not supported",
563
  "task_id": "long_horizon_next_action",
564
  "task_label": "Long-Horizon Next-Action Forecasting",
565
  "task_number": 13
@@ -590,28 +616,15 @@
590
  "task_label": "Long-Horizon Next-Action Forecasting",
591
  "task_number": 13
592
  },
593
- {
594
- "method": "128ep Metadata Simple",
595
- "metric_key": "macro_f1",
596
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
597
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
598
- "scope": "multi_episode_128_metadata_baseline",
599
- "series_id": "metadata128_simple",
600
- "status": "not_supported_by_metadata_only_package",
601
- "status_label": "not supported",
602
- "task_id": "next_subtask_forecast",
603
- "task_label": "Long-Horizon Next-Subtask Forecasting",
604
- "task_number": 14
605
- },
606
  {
607
  "method": "128ep Metadata NN",
608
  "metric_key": "macro_f1",
609
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
610
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
611
  "scope": "multi_episode_128_metadata_baseline",
612
  "series_id": "metadata128_neural_mlp",
613
- "status": "not_supported_by_metadata_only_package",
614
- "status_label": "not supported",
615
  "task_id": "next_subtask_forecast",
616
  "task_label": "Long-Horizon Next-Subtask Forecasting",
617
  "task_number": 14
@@ -645,12 +658,12 @@
645
  {
646
  "method": "128ep Metadata Simple",
647
  "metric_key": "macro_f1",
648
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
649
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
650
  "scope": "multi_episode_128_metadata_baseline",
651
  "series_id": "metadata128_simple",
652
- "status": "not_supported_by_metadata_only_package",
653
- "status_label": "not supported",
654
  "task_id": "interaction_text_prediction",
655
  "task_label": "Interaction Text Prediction",
656
  "task_number": 15
@@ -707,28 +720,15 @@
707
  "task_label": "Interaction Text Prediction",
708
  "task_number": 15
709
  },
710
- {
711
- "method": "128ep Metadata Simple",
712
- "metric_key": "macro_f1",
713
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
714
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
715
- "scope": "multi_episode_128_metadata_baseline",
716
- "series_id": "metadata128_simple",
717
- "status": "not_supported_by_metadata_only_package",
718
- "status_label": "not supported",
719
- "task_id": "action_object_relation",
720
- "task_label": "Action-Object Relation Prediction",
721
- "task_number": 16
722
- },
723
  {
724
  "method": "128ep Metadata NN",
725
  "metric_key": "macro_f1",
726
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
727
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
728
  "scope": "multi_episode_128_metadata_baseline",
729
  "series_id": "metadata128_neural_mlp",
730
- "status": "not_supported_by_metadata_only_package",
731
- "status_label": "not supported",
732
  "task_id": "action_object_relation",
733
  "task_label": "Action-Object Relation Prediction",
734
  "task_number": 16
@@ -746,32 +746,6 @@
746
  "task_label": "Action-Object Relation Prediction",
747
  "task_number": 16
748
  },
749
- {
750
- "method": "128ep Metadata Simple",
751
- "metric_key": "micro_f1",
752
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
753
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
754
- "scope": "multi_episode_128_metadata_baseline",
755
- "series_id": "metadata128_simple",
756
- "status": "not_supported_by_metadata_only_package",
757
- "status_label": "not supported",
758
- "task_id": "object_set_forecast",
759
- "task_label": "Future Object-Set Forecasting",
760
- "task_number": 17
761
- },
762
- {
763
- "method": "128ep Metadata NN",
764
- "metric_key": "micro_f1",
765
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
766
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
767
- "scope": "multi_episode_128_metadata_baseline",
768
- "series_id": "metadata128_neural_mlp",
769
- "status": "not_supported_by_metadata_only_package",
770
- "status_label": "not supported",
771
- "task_id": "object_set_forecast",
772
- "task_label": "Future Object-Set Forecasting",
773
- "task_number": 17
774
- },
775
  {
776
  "method": "Cosmos3-Super Reasoner",
777
  "metric_key": "micro_f1",
@@ -801,12 +775,12 @@
801
  {
802
  "method": "128ep Metadata Simple",
803
  "metric_key": "mae",
804
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
805
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
806
  "scope": "multi_episode_128_metadata_baseline",
807
  "series_id": "metadata128_simple",
808
- "status": "not_supported_by_metadata_only_package",
809
- "status_label": "not supported",
810
  "task_id": "imu_to_hand_pose",
811
  "task_label": "IMU-to-Hand Pose Reconstruction",
812
  "task_number": 18
@@ -866,12 +840,12 @@
866
  {
867
  "method": "128ep Metadata Simple",
868
  "metric_key": "mrr",
869
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
870
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
871
  "scope": "multi_episode_128_metadata_baseline",
872
  "series_id": "metadata128_simple",
873
- "status": "not_supported_by_metadata_only_package",
874
- "status_label": "not supported",
875
  "task_id": "camera_view_sync_retrieval",
876
  "task_label": "Camera-View Synchronization Retrieval",
877
  "task_number": 19
@@ -928,32 +902,6 @@
928
  "task_label": "Camera-View Synchronization Retrieval",
929
  "task_number": 19
930
  },
931
- {
932
- "method": "128ep Metadata Simple",
933
- "metric_key": "mae",
934
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
935
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
936
- "scope": "multi_episode_128_metadata_baseline",
937
- "series_id": "metadata128_simple",
938
- "status": "not_supported_by_metadata_only_package",
939
- "status_label": "not supported",
940
- "task_id": "time_to_transition",
941
- "task_label": "Time-to-Next-Transition Regression",
942
- "task_number": 20
943
- },
944
- {
945
- "method": "128ep Metadata NN",
946
- "metric_key": "mae",
947
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
948
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
949
- "scope": "multi_episode_128_metadata_baseline",
950
- "series_id": "metadata128_neural_mlp",
951
- "status": "not_supported_by_metadata_only_package",
952
- "status_label": "not supported",
953
- "task_id": "time_to_transition",
954
- "task_label": "Time-to-Next-Transition Regression",
955
- "task_number": 20
956
- },
957
  {
958
  "method": "Cosmos3-Super Reasoner",
959
  "metric_key": "mae",
@@ -1027,8 +975,8 @@
1027
  "method_count": 9,
1028
  "method_task_record_count": 180,
1029
  "proxy_scored_method_task_count": 4,
1030
- "scored_method_task_count": 123,
1031
- "scoreless_method_task_count": 57,
1032
  "task_count": 20
1033
  },
1034
  "source_matrix": "docs/data/task_method_20_result_matrix.json",
 
1
  {
2
+ "generated_at_utc": "2026-06-18T12:07:14+00:00",
3
  "immediate_actions": [
4
  {
5
  "artifact": "docs/data/task_method_20_gap_audit.json",
6
  "id": "gap_audit",
7
+ "purpose": "Keep the 53 scoreless cells visible and reproducible."
8
  },
9
  {
10
  "artifact": "scripts/omni/score_model_output_probes.py",
 
50
  "proxy_scored_task_count": 0,
51
  "result_record_count": 20,
52
  "scope": "128 selected episodes, JSONL metadata/text only",
53
+ "scored_task_count": 7,
54
+ "scoreless_task_count": 13,
55
  "status_counts": {
56
+ "not_supported_by_metadata_only_package": 7,
57
+ "scored": 7,
58
+ "unsupported_without_required_target": 6
59
  }
60
  },
61
  "metadata128_simple": {
 
64
  "proxy_scored_task_count": 0,
65
  "result_record_count": 20,
66
  "scope": "128 selected episodes, JSONL metadata/text only",
67
+ "scored_task_count": 13,
68
+ "scoreless_task_count": 7,
69
  "status_counts": {
70
+ "scored": 13,
71
+ "unsupported_without_required_target": 7
 
72
  }
73
  },
74
  "minimal": {
 
138
  "missing_by_method": {
139
  "cosmos3_nano_future_window": 15,
140
  "cosmos3_super_reasoner": 13,
141
+ "metadata128_neural_mlp": 13,
142
+ "metadata128_simple": 7,
143
  "qwen3_omni_v6_lora": 5
144
  },
145
  "missing_by_status": {
146
  "not_evaluated_in_verified_package": 33,
147
+ "not_supported_by_metadata_only_package": 7,
148
+ "unsupported_without_required_target": 13
149
  },
150
  "missing_by_task": {
151
+ "01 Action Recognition": [
152
+ "metadata128_neural_mlp"
153
+ ],
154
  "02 Procedure Step Recognition": [
155
+ "cosmos3_nano_future_window",
156
+ "metadata128_neural_mlp"
157
+ ],
158
+ "04 Next-Action Prediction": [
159
+ "metadata128_neural_mlp"
160
  ],
161
  "05 Hand Trajectory Forecasting": [
162
  "cosmos3_nano_future_window",
 
197
  "13 Long-Horizon Next-Action Forecasting": [
198
  "cosmos3_nano_future_window",
199
  "cosmos3_super_reasoner",
200
+ "metadata128_neural_mlp"
 
201
  ],
202
  "14 Long-Horizon Next-Subtask Forecasting": [
203
  "cosmos3_nano_future_window",
204
  "cosmos3_super_reasoner",
205
+ "metadata128_neural_mlp"
 
206
  ],
207
  "15 Interaction Text Prediction": [
208
  "cosmos3_nano_future_window",
 
213
  ],
214
  "16 Action-Object Relation Prediction": [
215
  "cosmos3_nano_future_window",
216
+ "metadata128_neural_mlp"
 
217
  ],
218
  "17 Future Object-Set Forecasting": [
219
  "cosmos3_nano_future_window",
220
+ "cosmos3_super_reasoner"
 
 
221
  ],
222
  "18 IMU-to-Hand Pose Reconstruction": [
223
  "cosmos3_nano_future_window",
 
235
  ],
236
  "20 Time-to-Next-Transition Regression": [
237
  "cosmos3_nano_future_window",
238
+ "cosmos3_super_reasoner"
 
 
239
  ]
240
  },
241
  "missing_records": [
242
+ {
243
+ "method": "128ep Metadata NN",
244
+ "metric_key": "macro_f1",
245
+ "reason": "train class count 896 exceeds --max-neural-classes 512",
246
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
247
+ "scope": "multi_episode_128_metadata_baseline",
248
+ "series_id": "metadata128_neural_mlp",
249
+ "status": "unsupported_without_required_target",
250
+ "status_label": "unsupported",
251
+ "task_id": "timeline_action",
252
+ "task_label": "Action Recognition",
253
+ "task_number": 1
254
+ },
255
+ {
256
+ "method": "128ep Metadata NN",
257
+ "metric_key": "macro_f1",
258
+ "reason": "train class count 652 exceeds --max-neural-classes 512",
259
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
260
+ "scope": "multi_episode_128_metadata_baseline",
261
+ "series_id": "metadata128_neural_mlp",
262
+ "status": "unsupported_without_required_target",
263
+ "status_label": "unsupported",
264
+ "task_id": "timeline_subtask",
265
+ "task_label": "Procedure Step Recognition",
266
+ "task_number": 2
267
+ },
268
  {
269
  "method": "Cosmos3-Nano Future Window",
270
  "metric_key": "macro_f1",
 
278
  "task_label": "Procedure Step Recognition",
279
  "task_number": 2
280
  },
281
+ {
282
+ "method": "128ep Metadata NN",
283
+ "metric_key": "macro_f1",
284
+ "reason": "train class count 891 exceeds --max-neural-classes 512",
285
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
286
+ "scope": "multi_episode_128_metadata_baseline",
287
+ "series_id": "metadata128_neural_mlp",
288
+ "status": "unsupported_without_required_target",
289
+ "status_label": "unsupported",
290
+ "task_id": "next_action",
291
+ "task_label": "Next-Action Prediction",
292
+ "task_number": 4
293
+ },
294
  {
295
  "method": "128ep Metadata Simple",
296
  "metric_key": "mpjpe",
 
577
  "task_label": "Multimodal Synchronization Detection",
578
  "task_number": 12
579
  },
580
  {
581
  "method": "128ep Metadata NN",
582
  "metric_key": "macro_f1",
583
+ "reason": "train class count 887 exceeds --max-neural-classes 512",
584
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
585
  "scope": "multi_episode_128_metadata_baseline",
586
  "series_id": "metadata128_neural_mlp",
587
+ "status": "unsupported_without_required_target",
588
+ "status_label": "unsupported",
589
  "task_id": "long_horizon_next_action",
590
  "task_label": "Long-Horizon Next-Action Forecasting",
591
  "task_number": 13
 
616
  "task_label": "Long-Horizon Next-Action Forecasting",
617
  "task_number": 13
618
  },
619
  {
620
  "method": "128ep Metadata NN",
621
  "metric_key": "macro_f1",
622
+ "reason": "train class count 651 exceeds --max-neural-classes 512",
623
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
624
  "scope": "multi_episode_128_metadata_baseline",
625
  "series_id": "metadata128_neural_mlp",
626
+ "status": "unsupported_without_required_target",
627
+ "status_label": "unsupported",
628
  "task_id": "next_subtask_forecast",
629
  "task_label": "Long-Horizon Next-Subtask Forecasting",
630
  "task_number": 14
 
658
  {
659
  "method": "128ep Metadata Simple",
660
  "metric_key": "macro_f1",
661
+ "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
662
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
663
  "scope": "multi_episode_128_metadata_baseline",
664
  "series_id": "metadata128_simple",
665
+ "status": "unsupported_without_required_target",
666
+ "status_label": "unsupported",
667
  "task_id": "interaction_text_prediction",
668
  "task_label": "Interaction Text Prediction",
669
  "task_number": 15
 
720
  "task_label": "Interaction Text Prediction",
721
  "task_number": 15
722
  },
723
  {
724
  "method": "128ep Metadata NN",
725
  "metric_key": "macro_f1",
726
+ "reason": "train class count 3058 exceeds --max-neural-classes 512",
727
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
728
  "scope": "multi_episode_128_metadata_baseline",
729
  "series_id": "metadata128_neural_mlp",
730
+ "status": "unsupported_without_required_target",
731
+ "status_label": "unsupported",
732
  "task_id": "action_object_relation",
733
  "task_label": "Action-Object Relation Prediction",
734
  "task_number": 16
 
746
  "task_label": "Action-Object Relation Prediction",
747
  "task_number": 16
748
  },
749
  {
750
  "method": "Cosmos3-Super Reasoner",
751
  "metric_key": "micro_f1",
 
775
  {
776
  "method": "128ep Metadata Simple",
777
  "metric_key": "mae",
778
+ "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package",
779
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
780
  "scope": "multi_episode_128_metadata_baseline",
781
  "series_id": "metadata128_simple",
782
+ "status": "unsupported_without_required_target",
783
+ "status_label": "unsupported",
784
  "task_id": "imu_to_hand_pose",
785
  "task_label": "IMU-to-Hand Pose Reconstruction",
786
  "task_number": 18
 
840
  {
841
  "method": "128ep Metadata Simple",
842
  "metric_key": "mrr",
843
+ "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
844
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
845
  "scope": "multi_episode_128_metadata_baseline",
846
  "series_id": "metadata128_simple",
847
+ "status": "unsupported_without_required_target",
848
+ "status_label": "unsupported",
849
  "task_id": "camera_view_sync_retrieval",
850
  "task_label": "Camera-View Synchronization Retrieval",
851
  "task_number": 19
 
902
  "task_label": "Camera-View Synchronization Retrieval",
903
  "task_number": 19
904
  },
905
  {
906
  "method": "Cosmos3-Super Reasoner",
907
  "metric_key": "mae",
 
975
  "method_count": 9,
976
  "method_task_record_count": 180,
977
  "proxy_scored_method_task_count": 4,
978
+ "scored_method_task_count": 127,
979
+ "scoreless_method_task_count": 53,
980
  "task_count": 20
981
  },
982
  "source_matrix": "docs/data/task_method_20_result_matrix.json",
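Review note: the updated counts in this file can be cross-checked by hand. The sketch below is one possible way to do that in Python, assuming the per-method, per-status, and per-series counts are meant to add up the way the field names suggest. The function names `check_summary` and `check_series` are hypothetical and not part of the repository, and the dictionaries are copied from the "+" side of this diff rather than loaded from the file, because the parent keys that hold them are not visible in the hunks shown.

```python
def check_summary(summary, missing_by_method, missing_by_status):
    """Return a list of inconsistencies in the top-level counts (empty if none)."""
    problems = []
    scored = summary["scored_method_task_count"]
    scoreless = summary["scoreless_method_task_count"]
    total = summary["method_task_record_count"]
    if summary["method_count"] * summary["task_count"] != total:
        problems.append("method_count * task_count != method_task_record_count")
    if scored + scoreless != total:
        problems.append("scored + scoreless != method_task_record_count")
    if sum(missing_by_method.values()) != scoreless:
        problems.append("missing_by_method does not sum to scoreless count")
    if sum(missing_by_status.values()) != scoreless:
        problems.append("missing_by_status does not sum to scoreless count")
    return problems


def check_series(series):
    """Return a list of inconsistencies in one per-method block (empty if none)."""
    problems = []
    records = series["result_record_count"]
    counts = series["status_counts"]
    if series["scored_task_count"] + series["scoreless_task_count"] != records:
        problems.append("scored + scoreless != result_record_count")
    if sum(counts.values()) != records:
        problems.append("status_counts does not sum to result_record_count")
    if counts.get("scored", 0) != series["scored_task_count"]:
        problems.append("status_counts['scored'] != scored_task_count")
    return problems


# Values copied from the new ("+") side of the gap audit diff.
summary = {
    "method_count": 9,
    "method_task_record_count": 180,
    "scored_method_task_count": 127,
    "scoreless_method_task_count": 53,
    "task_count": 20,
}
missing_by_method = {
    "cosmos3_nano_future_window": 15,
    "cosmos3_super_reasoner": 13,
    "metadata128_neural_mlp": 13,
    "metadata128_simple": 7,
    "qwen3_omni_v6_lora": 5,
}
missing_by_status = {
    "not_evaluated_in_verified_package": 33,
    "not_supported_by_metadata_only_package": 7,
    "unsupported_without_required_target": 13,
}
metadata128_simple = {
    "result_record_count": 20,
    "scored_task_count": 13,
    "scoreless_task_count": 7,
    "status_counts": {"scored": 13, "unsupported_without_required_target": 7},
}
metadata128_neural_mlp = {
    "result_record_count": 20,
    "scored_task_count": 7,
    "scoreless_task_count": 13,
    "status_counts": {
        "not_supported_by_metadata_only_package": 7,
        "scored": 7,
        "unsupported_without_required_target": 6,
    },
}

summary_problems = check_summary(summary, missing_by_method, missing_by_status)
series_problems = check_series(metadata128_simple) + check_series(metadata128_neural_mlp)
print(summary_problems, series_problems)
```

Both lists come back empty for the new values, so the move from 57 to 53 scoreless cells is internally consistent with the per-method and per-status breakdowns shown above. The same checks could be pointed at the loaded JSON once the actual key paths are confirmed.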
docs/data/task_method_20_result_matrix.json CHANGED
@@ -1,11 +1,11 @@
1
  {
2
  "title": "Task Method 20-Result Matrix",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:15:02+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
- "scored_method_task_count": 123,
9
  "series": [
10
  {
11
  "id": "minimal",
@@ -64,18 +64,17 @@
64
  "method_detail": "128-episode JSONL metadata/text simple baselines.",
65
  "plotted_as": "colored point overlay",
66
  "result_record_count": 20,
67
- "scored_task_count": 8,
68
- "covered_task_count": 8,
69
  "proxy_scored_task_count": 0,
70
- "scoreless_task_count": 12,
71
- "unsupported_task_count": 12,
72
  "not_evaluated_task_count": 0,
73
  "status_counts": {
74
- "not_supported_by_metadata_only_package": 8,
75
- "scored": 8,
76
- "unsupported_without_required_target": 4
77
  },
78
- "coverage_fraction": 0.4,
79
  "result_record_fraction": 1.0
80
  },
81
  {
@@ -89,17 +88,17 @@
89
  "method_detail": "128-episode JSONL metadata/text MLP baselines.",
90
  "plotted_as": "colored point overlay",
91
  "result_record_count": 20,
92
- "scored_task_count": 8,
93
- "covered_task_count": 8,
94
  "proxy_scored_task_count": 0,
95
- "scoreless_task_count": 12,
96
- "unsupported_task_count": 12,
97
  "not_evaluated_task_count": 0,
98
  "status_counts": {
99
- "not_supported_by_metadata_only_package": 12,
100
- "scored": 8
101
  },
102
- "coverage_fraction": 0.4,
103
  "result_record_fraction": 1.0
104
  },
105
  {
@@ -2210,17 +2209,17 @@
2210
  "task_label": "Long-Horizon Next-Action Forecasting",
2211
  "series_id": "metadata128_simple",
2212
  "method": "128ep Metadata Simple",
2213
- "status": "not_supported_by_metadata_only_package",
2214
- "status_label": "not supported",
2215
- "scored": false,
2216
  "proxy_scored": false,
2217
- "raw": null,
2218
- "raw_text": "n/a",
2219
- "normalized_score": null,
2220
  "metric_key": "macro_f1",
2221
- "source": null,
2222
  "scope": "multi_episode_128_metadata_baseline",
2223
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2224
  },
2225
  {
2226
  "task_number": 13,
@@ -2228,17 +2227,17 @@
2228
  "task_label": "Long-Horizon Next-Action Forecasting",
2229
  "series_id": "metadata128_neural_mlp",
2230
  "method": "128ep Metadata NN",
2231
- "status": "not_supported_by_metadata_only_package",
2232
- "status_label": "not supported",
2233
- "scored": false,
2234
  "proxy_scored": false,
2235
- "raw": null,
2236
- "raw_text": "n/a",
2237
- "normalized_score": null,
2238
  "metric_key": "macro_f1",
2239
- "source": null,
2240
  "scope": "multi_episode_128_metadata_baseline",
2241
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2242
  },
2243
  {
2244
  "task_number": 13,
@@ -2372,17 +2371,17 @@
2372
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2373
  "series_id": "metadata128_simple",
2374
  "method": "128ep Metadata Simple",
2375
- "status": "not_supported_by_metadata_only_package",
2376
- "status_label": "not supported",
2377
- "scored": false,
2378
  "proxy_scored": false,
2379
- "raw": null,
2380
- "raw_text": "n/a",
2381
- "normalized_score": null,
2382
  "metric_key": "macro_f1",
2383
- "source": null,
2384
  "scope": "multi_episode_128_metadata_baseline",
2385
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2386
  },
2387
  {
2388
  "task_number": 14,
@@ -2390,17 +2389,17 @@
2390
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2391
  "series_id": "metadata128_neural_mlp",
2392
  "method": "128ep Metadata NN",
2393
- "status": "not_supported_by_metadata_only_package",
2394
- "status_label": "not supported",
2395
- "scored": false,
2396
  "proxy_scored": false,
2397
- "raw": null,
2398
- "raw_text": "n/a",
2399
- "normalized_score": null,
2400
  "metric_key": "macro_f1",
2401
- "source": null,
2402
  "scope": "multi_episode_128_metadata_baseline",
2403
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2404
  },
2405
  {
2406
  "task_number": 14,
@@ -2534,17 +2533,17 @@
2534
  "task_label": "Interaction Text Prediction",
2535
  "series_id": "metadata128_simple",
2536
  "method": "128ep Metadata Simple",
2537
- "status": "not_supported_by_metadata_only_package",
2538
- "status_label": "not supported",
2539
  "scored": false,
2540
  "proxy_scored": false,
2541
  "raw": null,
2542
  "raw_text": "n/a",
2543
  "normalized_score": null,
2544
  "metric_key": "macro_f1",
2545
- "source": null,
2546
  "scope": "multi_episode_128_metadata_baseline",
2547
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2548
  },
2549
  {
2550
  "task_number": 15,
@@ -2696,17 +2695,17 @@
2696
  "task_label": "Action-Object Relation Prediction",
2697
  "series_id": "metadata128_simple",
2698
  "method": "128ep Metadata Simple",
2699
- "status": "not_supported_by_metadata_only_package",
2700
- "status_label": "not supported",
2701
- "scored": false,
2702
  "proxy_scored": false,
2703
- "raw": null,
2704
- "raw_text": "n/a",
2705
- "normalized_score": null,
2706
  "metric_key": "macro_f1",
2707
- "source": null,
2708
  "scope": "multi_episode_128_metadata_baseline",
2709
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2710
  },
2711
  {
2712
  "task_number": 16,
@@ -2714,17 +2713,17 @@
2714
  "task_label": "Action-Object Relation Prediction",
2715
  "series_id": "metadata128_neural_mlp",
2716
  "method": "128ep Metadata NN",
2717
- "status": "not_supported_by_metadata_only_package",
2718
- "status_label": "not supported",
2719
- "scored": false,
2720
  "proxy_scored": false,
2721
- "raw": null,
2722
- "raw_text": "n/a",
2723
- "normalized_score": null,
2724
  "metric_key": "macro_f1",
2725
- "source": null,
2726
  "scope": "multi_episode_128_metadata_baseline",
2727
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2728
  },
2729
  {
2730
  "task_number": 16,
@@ -2858,17 +2857,17 @@
2858
  "task_label": "Future Object-Set Forecasting",
2859
  "series_id": "metadata128_simple",
2860
  "method": "128ep Metadata Simple",
2861
- "status": "not_supported_by_metadata_only_package",
2862
- "status_label": "not supported",
2863
- "scored": false,
2864
  "proxy_scored": false,
2865
- "raw": null,
2866
- "raw_text": "n/a",
2867
- "normalized_score": null,
2868
  "metric_key": "micro_f1",
2869
- "source": null,
2870
  "scope": "multi_episode_128_metadata_baseline",
2871
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2872
  },
2873
  {
2874
  "task_number": 17,
@@ -2876,17 +2875,17 @@
2876
  "task_label": "Future Object-Set Forecasting",
2877
  "series_id": "metadata128_neural_mlp",
2878
  "method": "128ep Metadata NN",
2879
- "status": "not_supported_by_metadata_only_package",
2880
- "status_label": "not supported",
2881
- "scored": false,
2882
  "proxy_scored": false,
2883
- "raw": null,
2884
- "raw_text": "n/a",
2885
- "normalized_score": null,
2886
  "metric_key": "micro_f1",
2887
- "source": null,
2888
  "scope": "multi_episode_128_metadata_baseline",
2889
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2890
  },
2891
  {
2892
  "task_number": 17,
@@ -3020,17 +3019,17 @@
3020
  "task_label": "IMU-to-Hand Pose Reconstruction",
3021
  "series_id": "metadata128_simple",
3022
  "method": "128ep Metadata Simple",
3023
- "status": "not_supported_by_metadata_only_package",
3024
- "status_label": "not supported",
3025
  "scored": false,
3026
  "proxy_scored": false,
3027
  "raw": null,
3028
  "raw_text": "n/a",
3029
  "normalized_score": null,
3030
  "metric_key": "mae",
3031
- "source": null,
3032
  "scope": "multi_episode_128_metadata_baseline",
3033
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3034
  },
3035
  {
3036
  "task_number": 18,
@@ -3182,17 +3181,17 @@
3182
  "task_label": "Camera-View Synchronization Retrieval",
3183
  "series_id": "metadata128_simple",
3184
  "method": "128ep Metadata Simple",
3185
- "status": "not_supported_by_metadata_only_package",
3186
- "status_label": "not supported",
3187
  "scored": false,
3188
  "proxy_scored": false,
3189
  "raw": null,
3190
  "raw_text": "n/a",
3191
  "normalized_score": null,
3192
  "metric_key": "mrr",
3193
- "source": null,
3194
  "scope": "multi_episode_128_metadata_baseline",
3195
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3196
  },
3197
  {
3198
  "task_number": 19,
@@ -3344,17 +3343,17 @@
3344
  "task_label": "Time-to-Next-Transition Regression",
3345
  "series_id": "metadata128_simple",
3346
  "method": "128ep Metadata Simple",
3347
- "status": "not_supported_by_metadata_only_package",
3348
- "status_label": "not supported",
3349
- "scored": false,
3350
  "proxy_scored": false,
3351
- "raw": null,
3352
- "raw_text": "n/a",
3353
- "normalized_score": null,
3354
  "metric_key": "mae",
3355
- "source": null,
3356
  "scope": "multi_episode_128_metadata_baseline",
3357
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3358
  },
3359
  {
3360
  "task_number": 20,
@@ -3362,17 +3361,17 @@
3362
  "task_label": "Time-to-Next-Transition Regression",
3363
  "series_id": "metadata128_neural_mlp",
3364
  "method": "128ep Metadata NN",
3365
- "status": "not_supported_by_metadata_only_package",
3366
- "status_label": "not supported",
3367
- "scored": false,
3368
  "proxy_scored": false,
3369
- "raw": null,
3370
- "raw_text": "n/a",
3371
- "normalized_score": null,
3372
  "metric_key": "mae",
3373
- "source": null,
3374
  "scope": "multi_episode_128_metadata_baseline",
3375
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3376
  },
3377
  {
3378
  "task_number": 20,
 
1
  {
2
  "title": "Task Method 20-Result Matrix",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
+ "scored_method_task_count": 133,
9
  "series": [
10
  {
11
  "id": "minimal",
 
64
  "method_detail": "128-episode JSONL metadata/text simple baselines.",
65
  "plotted_as": "colored point overlay",
66
  "result_record_count": 20,
67
+ "scored_task_count": 13,
68
+ "covered_task_count": 13,
69
  "proxy_scored_task_count": 0,
70
+ "scoreless_task_count": 7,
71
+ "unsupported_task_count": 7,
72
  "not_evaluated_task_count": 0,
73
  "status_counts": {
74
+ "scored": 13,
75
+ "unsupported_without_required_target": 7
 
76
  },
77
+ "coverage_fraction": 0.65,
78
  "result_record_fraction": 1.0
79
  },
80
  {
 
88
  "method_detail": "128-episode JSONL metadata/text MLP baselines.",
89
  "plotted_as": "colored point overlay",
90
  "result_record_count": 20,
91
+ "scored_task_count": 13,
92
+ "covered_task_count": 13,
93
  "proxy_scored_task_count": 0,
94
+ "scoreless_task_count": 7,
95
+ "unsupported_task_count": 7,
96
  "not_evaluated_task_count": 0,
97
  "status_counts": {
98
+ "not_supported_by_metadata_only_package": 7,
99
+ "scored": 13
100
  },
101
+ "coverage_fraction": 0.65,
102
  "result_record_fraction": 1.0
103
  },
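The per-series counters in the two series entries above appear to be derived from the 20 method-task records of each series: 13 scored out of 20 gives `coverage_fraction` 0.65, where the earlier revision had 8 out of 20 (0.4). The following is a minimal sketch of that bookkeeping, assuming each record carries the `status` and `scored` fields shown in this diff. The function name `summarize_series` is hypothetical and is not taken from the repository's generator.

```python
from collections import Counter


def summarize_series(records):
    """Recompute coverage counters for one series from its method-task records.

    Assumption: coverage_fraction = scored records / all records, which matches
    the values visible in the diff (13 / 20 = 0.65).
    """
    total = len(records)
    scored = sum(1 for record in records if record["scored"])
    status_counts = Counter(record["status"] for record in records)
    return {
        "result_record_count": total,
        "scored_task_count": scored,
        "scoreless_task_count": total - scored,
        "status_counts": dict(sorted(status_counts.items())),
        "coverage_fraction": scored / total if total else 0.0,
    }


# Synthetic records mirroring the counts shown for "128ep Metadata Simple".
records = (
    [{"status": "scored", "scored": True}] * 13
    + [{"status": "unsupported_without_required_target", "scored": False}] * 7
)
summary = summarize_series(records)
print(summary["coverage_fraction"])
```

Recomputing the counters from the records, rather than trusting the stored summary, is one way to catch a stale `status_counts` block after a rerun changes which tasks are scored.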
104
  {
 
2209
  "task_label": "Long-Horizon Next-Action Forecasting",
2210
  "series_id": "metadata128_simple",
2211
  "method": "128ep Metadata Simple",
2212
+ "status": "scored",
2213
+ "status_label": "scored",
2214
+ "scored": true,
2215
  "proxy_scored": false,
2216
+ "raw": 0.004579592783699693,
2217
+ "raw_text": "0.0046",
2218
+ "normalized_score": 0.004579592783699693,
2219
  "metric_key": "macro_f1",
2220
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
2221
  "scope": "multi_episode_128_metadata_baseline",
2222
+ "reason": null
2223
  },
2224
  {
2225
  "task_number": 13,
 
2227
  "task_label": "Long-Horizon Next-Action Forecasting",
2228
  "series_id": "metadata128_neural_mlp",
2229
  "method": "128ep Metadata NN",
2230
+ "status": "scored",
2231
+ "status_label": "scored",
2232
+ "scored": true,
2233
  "proxy_scored": false,
2234
+ "raw": 0.0029821307969142615,
2235
+ "raw_text": "0.0030",
2236
+ "normalized_score": 0.0029821307969142615,
2237
  "metric_key": "macro_f1",
2238
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
2239
  "scope": "multi_episode_128_metadata_baseline",
2240
+ "reason": null
2241
  },
2242
  {
2243
  "task_number": 13,
 
2371
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2372
  "series_id": "metadata128_simple",
2373
  "method": "128ep Metadata Simple",
2374
+ "status": "scored",
2375
+ "status_label": "scored",
2376
+ "scored": true,
2377
  "proxy_scored": false,
2378
+ "raw": 0.0001206030150753769,
2379
+ "raw_text": "0.0001",
2380
+ "normalized_score": 0.0001206030150753769,
2381
  "metric_key": "macro_f1",
2382
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
2383
  "scope": "multi_episode_128_metadata_baseline",
2384
+ "reason": null
2385
  },
2386
  {
2387
  "task_number": 14,
 
2389
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2390
  "series_id": "metadata128_neural_mlp",
2391
  "method": "128ep Metadata NN",
2392
+ "status": "scored",
2393
+ "status_label": "scored",
2394
+ "scored": true,
2395
  "proxy_scored": false,
2396
+ "raw": 2.086049543676662e-05,
2397
+ "raw_text": "0.0000",
2398
+ "normalized_score": 2.086049543676662e-05,
2399
  "metric_key": "macro_f1",
2400
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
2401
  "scope": "multi_episode_128_metadata_baseline",
2402
+ "reason": null
2403
  },
2404
  {
2405
  "task_number": 14,
 
2533
  "task_label": "Interaction Text Prediction",
2534
  "series_id": "metadata128_simple",
2535
  "method": "128ep Metadata Simple",
2536
+ "status": "unsupported_without_required_target",
2537
+ "status_label": "unsupported",
2538
  "scored": false,
2539
  "proxy_scored": false,
2540
  "raw": null,
2541
  "raw_text": "n/a",
2542
  "normalized_score": null,
2543
  "metric_key": "macro_f1",
2544
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
2545
  "scope": "multi_episode_128_metadata_baseline",
2546
+ "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
2547
  },
2548
  {
2549
  "task_number": 15,
 
2695
  "task_label": "Action-Object Relation Prediction",
2696
  "series_id": "metadata128_simple",
2697
  "method": "128ep Metadata Simple",
2698
+ "status": "scored",
2699
+ "status_label": "scored",
2700
+ "scored": true,
2701
  "proxy_scored": false,
2702
+ "raw": 0.0,
2703
+ "raw_text": "0.0000",
2704
+ "normalized_score": 0.0,
2705
  "metric_key": "macro_f1",
2706
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
2707
  "scope": "multi_episode_128_metadata_baseline",
2708
+ "reason": null
2709
  },
2710
  {
2711
  "task_number": 16,
 
2713
  "task_label": "Action-Object Relation Prediction",
2714
  "series_id": "metadata128_neural_mlp",
2715
  "method": "128ep Metadata NN",
2716
+ "status": "scored",
2717
+ "status_label": "scored",
2718
+ "scored": true,
2719
  "proxy_scored": false,
2720
+ "raw": 0.0,
2721
+ "raw_text": "0.0000",
2722
+ "normalized_score": 0.0,
2723
  "metric_key": "macro_f1",
2724
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
2725
  "scope": "multi_episode_128_metadata_baseline",
2726
+ "reason": null
2727
  },
2728
  {
2729
  "task_number": 16,
 
2857
  "task_label": "Future Object-Set Forecasting",
2858
  "series_id": "metadata128_simple",
2859
  "method": "128ep Metadata Simple",
2860
+ "status": "scored",
2861
+ "status_label": "scored",
2862
+ "scored": true,
2863
  "proxy_scored": false,
2864
+ "raw": 0.17656983343047333,
2865
+ "raw_text": "0.1766",
2866
+ "normalized_score": 0.17656983343047333,
2867
  "metric_key": "micro_f1",
2868
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
2869
  "scope": "multi_episode_128_metadata_baseline",
2870
+ "reason": null
2871
  },
2872
  {
2873
  "task_number": 17,
 
2875
  "task_label": "Future Object-Set Forecasting",
2876
  "series_id": "metadata128_neural_mlp",
2877
  "method": "128ep Metadata NN",
2878
+ "status": "scored",
2879
+ "status_label": "scored",
2880
+ "scored": true,
2881
  "proxy_scored": false,
2882
+ "raw": 0.17418550827844048,
2883
+ "raw_text": "0.1742",
2884
+ "normalized_score": 0.17418550827844048,
2885
  "metric_key": "micro_f1",
2886
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
2887
  "scope": "multi_episode_128_metadata_baseline",
2888
+ "reason": null
2889
  },
2890
  {
2891
  "task_number": 17,
 
3019
  "task_label": "IMU-to-Hand Pose Reconstruction",
3020
  "series_id": "metadata128_simple",
3021
  "method": "128ep Metadata Simple",
3022
+ "status": "unsupported_without_required_target",
3023
+ "status_label": "unsupported",
3024
  "scored": false,
3025
  "proxy_scored": false,
3026
  "raw": null,
3027
  "raw_text": "n/a",
3028
  "normalized_score": null,
3029
  "metric_key": "mae",
3030
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
3031
  "scope": "multi_episode_128_metadata_baseline",
3032
+ "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package"
3033
  },
3034
  {
3035
  "task_number": 18,
 
3181
  "task_label": "Camera-View Synchronization Retrieval",
3182
  "series_id": "metadata128_simple",
3183
  "method": "128ep Metadata Simple",
3184
+ "status": "unsupported_without_required_target",
3185
+ "status_label": "unsupported",
3186
  "scored": false,
3187
  "proxy_scored": false,
3188
  "raw": null,
3189
  "raw_text": "n/a",
3190
  "normalized_score": null,
3191
  "metric_key": "mrr",
3192
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
3193
  "scope": "multi_episode_128_metadata_baseline",
3194
+ "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
3195
  },
3196
  {
3197
  "task_number": 19,
 
3343
  "task_label": "Time-to-Next-Transition Regression",
3344
  "series_id": "metadata128_simple",
3345
  "method": "128ep Metadata Simple",
3346
+ "status": "scored",
3347
+ "status_label": "scored",
3348
+ "scored": true,
3349
  "proxy_scored": false,
3350
+ "raw": 624.8108520507812,
3351
+ "raw_text": "624.81",
3352
+ "normalized_score": 0.016864874132806403,
3353
  "metric_key": "mae",
3354
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
3355
  "scope": "multi_episode_128_metadata_baseline",
3356
+ "reason": null
3357
  },
3358
  {
3359
  "task_number": 20,
 
3361
  "task_label": "Time-to-Next-Transition Regression",
3362
  "series_id": "metadata128_neural_mlp",
3363
  "method": "128ep Metadata NN",
3364
+ "status": "scored",
3365
+ "status_label": "scored",
3366
+ "scored": true,
3367
  "proxy_scored": false,
3368
+ "raw": 41.4664421081543,
3369
+ "raw_text": "41.47",
3370
+ "normalized_score": 0.25411768748242325,
3371
  "metric_key": "mae",
3372
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
3373
  "scope": "multi_episode_128_metadata_baseline",
3374
+ "reason": null
3375
  },
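The records above follow a visible pattern: a `scored` record has numeric `raw` and `normalized_score` values and a null `reason`, while an unsupported record has null scores, `raw_text` of "n/a", and a non-empty `reason`. A minimal consistency check along those lines might look like the sketch below. The function `check_record` is hypothetical, and the rules are inferred from the records in this diff rather than from a published schema.

```python
def check_record(record):
    """Return a list of inconsistencies found in one method-task record.

    The rules are assumptions inferred from the records shown in the diff.
    """
    problems = []
    if record["scored"]:
        if record["raw"] is None or record["normalized_score"] is None:
            problems.append("scored record is missing raw or normalized_score")
        if record["reason"] is not None:
            problems.append("scored record should not carry a reason")
        if not record["source"]:
            problems.append("scored record should point at a metrics source")
    else:
        if record["raw"] is not None or record["normalized_score"] is not None:
            problems.append("unscored record should have null scores")
        if not record["reason"]:
            problems.append("unscored record should explain why it is unscored")
    return problems


# Two records copied from the diff (fields trimmed to those the check uses).
scored_record = {
    "status": "scored",
    "scored": True,
    "raw": 41.4664421081543,
    "normalized_score": 0.25411768748242325,
    "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
    "reason": None,
}
unsupported_record = {
    "status": "unsupported_without_required_target",
    "scored": False,
    "raw": None,
    "normalized_score": None,
    "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
    "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package",
}
print(check_record(scored_record), check_record(unsupported_record))
```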
3376
  {
3377
  "task_number": 20,
docs/data/task_surface_integrity.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-18T11:18:04+00:00",
4
  "summary": {
5
  "task_count": 12,
6
  "expected_task_count": 12,
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-18T12:09:25+00:00",
4
  "summary": {
5
  "task_count": 12,
6
  "expected_task_count": 12,
docs/data/unified_task_model_radar.json CHANGED
@@ -1,11 +1,11 @@
1
  {
2
  "title": "Unified 20-Task Model Radar",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:15:02+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
- "scored_method_task_count": 123,
9
  "normalization_policy": {
10
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
11
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
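The two `normalization_policy` rules above can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, and the handling of a zero raw error (treated here as a perfect score) is an assumption that the policy text does not address.

```python
def normalize_higher_is_better(raw):
    """Bounded metrics (e.g. macro_f1, mrr): clip to [0, 1] and plot directly."""
    if raw is None:
        return None
    return min(1.0, max(0.0, raw))


def normalize_lower_is_better(raw, best_observed):
    """Error metrics (e.g. mae): best_observed_value / raw_value within one task."""
    if raw is None or best_observed is None:
        return None
    if raw == 0:
        # Assumption: a zero error is treated as a perfect score.
        return 1.0
    return min(1.0, max(0.0, best_observed / raw))


# Hypothetical values: the best method in a task has MAE 10.0.
print(normalize_lower_is_better(40.0, 10.0))
print(normalize_higher_is_better(1.7))
```

The Time-to-Next-Transition records elsewhere in this diff look consistent with the lower-is-better rule: raw MAE 624.81 with normalized score 0.0169 and raw MAE 41.47 with normalized score 0.2541 both imply a best observed MAE of roughly 10.54 for that task. That value is inferred from the two pairs and is not stated in the diff.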
@@ -73,18 +73,17 @@
73
  "method_detail": "128-episode JSONL metadata/text simple baselines.",
74
  "plotted_as": "colored point overlay",
75
  "result_record_count": 20,
76
- "scored_task_count": 8,
77
- "covered_task_count": 8,
78
  "proxy_scored_task_count": 0,
79
- "scoreless_task_count": 12,
80
- "unsupported_task_count": 12,
81
  "not_evaluated_task_count": 0,
82
  "status_counts": {
83
- "not_supported_by_metadata_only_package": 8,
84
- "scored": 8,
85
- "unsupported_without_required_target": 4
86
  },
87
- "coverage_fraction": 0.4,
88
  "result_record_fraction": 1.0
89
  },
90
  {
@@ -98,17 +97,17 @@
98
  "method_detail": "128-episode JSONL metadata/text MLP baselines.",
99
  "plotted_as": "colored point overlay",
100
  "result_record_count": 20,
101
- "scored_task_count": 8,
102
- "covered_task_count": 8,
103
  "proxy_scored_task_count": 0,
104
- "scoreless_task_count": 12,
105
- "unsupported_task_count": 12,
106
  "not_evaluated_task_count": 0,
107
  "status_counts": {
108
- "not_supported_by_metadata_only_package": 12,
109
- "scored": 8
110
  },
111
- "coverage_fraction": 0.4,
112
  "result_record_fraction": 1.0
113
  },
114
  {
@@ -1608,6 +1607,28 @@
1608
  "raw_text": "0.0023",
1609
  "status_label": "scored"
1610
  },
1611
  "raw128_simple": {
1612
  "raw": 0.0024280172369056294,
1613
  "metric_key": "macro_f1",
@@ -1630,28 +1651,6 @@
1630
  "raw_text": "0.0011",
1631
  "status_label": "scored"
1632
  },
1633
- "metadata128_simple": {
1634
- "raw": null,
1635
- "metric_key": "macro_f1",
1636
- "source": null,
1637
- "scope": "multi_episode_128_metadata_baseline",
1638
- "status": "not_supported_by_metadata_only_package",
1639
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1640
- "normalized_score": null,
1641
- "raw_text": "n/a",
1642
- "status_label": "not supported"
1643
- },
1644
- "metadata128_neural_mlp": {
1645
- "raw": null,
1646
- "metric_key": "macro_f1",
1647
- "source": null,
1648
- "scope": "multi_episode_128_metadata_baseline",
1649
- "status": "not_supported_by_metadata_only_package",
1650
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1651
- "normalized_score": null,
1652
- "raw_text": "n/a",
1653
- "status_label": "not supported"
1654
- },
1655
  "cosmos3_super_reasoner": {
1656
  "raw": null,
1657
  "metric_key": "macro_f1",
@@ -1719,6 +1718,28 @@
1719
  "raw_text": "0.0042",
1720
  "status_label": "scored"
1721
  },
1722
  "raw128_simple": {
1723
  "raw": 0.0,
1724
  "metric_key": "macro_f1",
@@ -1741,28 +1762,6 @@
1741
  "raw_text": "0.0000",
1742
  "status_label": "scored"
1743
  },
1744
- "metadata128_simple": {
1745
- "raw": null,
1746
- "metric_key": "macro_f1",
1747
- "source": null,
1748
- "scope": "multi_episode_128_metadata_baseline",
1749
- "status": "not_supported_by_metadata_only_package",
1750
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1751
- "normalized_score": null,
1752
- "raw_text": "n/a",
1753
- "status_label": "not supported"
1754
- },
1755
- "metadata128_neural_mlp": {
1756
- "raw": null,
1757
- "metric_key": "macro_f1",
1758
- "source": null,
1759
- "scope": "multi_episode_128_metadata_baseline",
1760
- "status": "not_supported_by_metadata_only_package",
1761
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1762
- "normalized_score": null,
1763
- "raw_text": "n/a",
1764
- "status_label": "not supported"
1765
- },
1766
  "cosmos3_super_reasoner": {
1767
  "raw": null,
1768
  "metric_key": "macro_f1",
@@ -1819,6 +1818,17 @@
1819
  "raw_text": "0.0381",
1820
  "status_label": "scored"
1821
  },
1822
  "raw128_simple": {
1823
  "raw": 0.012611998261547169,
1824
  "metric_key": "macro_f1",
@@ -1841,17 +1851,6 @@
1841
  "raw_text": "0.0098",
1842
  "status_label": "proxy scored"
1843
  },
1844
- "metadata128_simple": {
1845
- "raw": null,
1846
- "metric_key": "macro_f1",
1847
- "source": null,
1848
- "scope": "multi_episode_128_metadata_baseline",
1849
- "status": "not_supported_by_metadata_only_package",
1850
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1851
- "normalized_score": null,
1852
- "raw_text": "n/a",
1853
- "status_label": "not supported"
1854
- },
1855
  "metadata128_neural_mlp": {
1856
  "raw": null,
1857
  "metric_key": "macro_f1",
@@ -1952,6 +1951,28 @@
1952
  "raw_text": "0.0000",
1953
  "status_label": "scored"
1954
  },
1955
  "raw128_simple": {
1956
  "raw": 0.0,
1957
  "metric_key": "macro_f1",
@@ -1974,28 +1995,6 @@
1974
  "raw_text": "0.0000",
1975
  "status_label": "scored"
1976
  },
1977
- "metadata128_simple": {
1978
- "raw": null,
1979
- "metric_key": "macro_f1",
1980
- "source": null,
1981
- "scope": "multi_episode_128_metadata_baseline",
1982
- "status": "not_supported_by_metadata_only_package",
1983
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1984
- "normalized_score": null,
1985
- "raw_text": "n/a",
1986
- "status_label": "not supported"
1987
- },
1988
- "metadata128_neural_mlp": {
1989
- "raw": null,
1990
- "metric_key": "macro_f1",
1991
- "source": null,
1992
- "scope": "multi_episode_128_metadata_baseline",
1993
- "status": "not_supported_by_metadata_only_package",
1994
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1995
- "normalized_score": null,
1996
- "raw_text": "n/a",
1997
- "status_label": "not supported"
1998
- },
1999
  "cosmos3_nano_future_window": {
2000
  "raw": null,
2001
  "metric_key": "macro_f1",
@@ -2052,6 +2051,28 @@
2052
  "raw_text": "0.1659",
2053
  "status_label": "scored"
2054
  },
2055
  "raw128_simple": {
2056
  "raw": 0.06469493412657774,
2057
  "metric_key": "micro_f1",
@@ -2074,28 +2095,6 @@
2074
  "raw_text": "0.1752",
2075
  "status_label": "scored"
2076
  },
2077
- "metadata128_simple": {
2078
- "raw": null,
2079
- "metric_key": "micro_f1",
2080
- "source": null,
2081
- "scope": "multi_episode_128_metadata_baseline",
2082
- "status": "not_supported_by_metadata_only_package",
2083
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2084
- "normalized_score": null,
2085
- "raw_text": "n/a",
2086
- "status_label": "not supported"
2087
- },
2088
- "metadata128_neural_mlp": {
2089
- "raw": null,
2090
- "metric_key": "micro_f1",
2091
- "source": null,
2092
- "scope": "multi_episode_128_metadata_baseline",
2093
- "status": "not_supported_by_metadata_only_package",
2094
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2095
- "normalized_score": null,
2096
- "raw_text": "n/a",
2097
- "status_label": "not supported"
2098
- },
2099
  "cosmos3_super_reasoner": {
2100
  "raw": null,
2101
  "metric_key": "micro_f1",
@@ -2152,6 +2151,17 @@
2152
  "raw_text": "0.0426",
2153
  "status_label": "scored"
2154
  },
2155
  "raw128_simple": {
2156
  "raw": 0.22941437363624573,
2157
  "metric_key": "mae",
@@ -2174,17 +2184,6 @@
2174
  "raw_text": "0.2530",
2175
  "status_label": "scored"
2176
  },
2177
- "metadata128_simple": {
2178
- "raw": null,
2179
- "metric_key": "mae",
2180
- "source": null,
2181
- "scope": "multi_episode_128_metadata_baseline",
2182
- "status": "not_supported_by_metadata_only_package",
2183
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2184
- "normalized_score": null,
2185
- "raw_text": "n/a",
2186
- "status_label": "not supported"
2187
- },
2188
  "metadata128_neural_mlp": {
2189
  "raw": null,
2190
  "metric_key": "mae",
@@ -2263,6 +2262,17 @@
2263
  "raw_text": "0.2409",
2264
  "status_label": "scored"
2265
  },
2266
  "raw128_simple": {
2267
  "raw": 0.0026625150348991156,
2268
  "metric_key": "mrr",
@@ -2285,17 +2295,6 @@
2285
  "raw_text": "0.0025",
2286
  "status_label": "proxy scored"
2287
  },
2288
- "metadata128_simple": {
2289
- "raw": null,
2290
- "metric_key": "mrr",
2291
- "source": null,
2292
- "scope": "multi_episode_128_metadata_baseline",
2293
- "status": "not_supported_by_metadata_only_package",
2294
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2295
- "normalized_score": null,
2296
- "raw_text": "n/a",
2297
- "status_label": "not supported"
2298
- },
2299
  "metadata128_neural_mlp": {
2300
  "raw": null,
2301
  "metric_key": "mrr",
@@ -2385,6 +2384,28 @@
2385
  "raw_text": "134.07",
2386
  "status_label": "scored"
2387
  },
2388
  "raw128_simple": {
2389
  "raw": 52.32759475708008,
2390
  "metric_key": "mae",
@@ -2407,28 +2428,6 @@
2407
  "raw_text": "42.37",
2408
  "status_label": "scored"
2409
  },
2410
- "metadata128_simple": {
2411
- "raw": null,
2412
- "metric_key": "mae",
2413
- "source": null,
2414
- "scope": "multi_episode_128_metadata_baseline",
2415
- "status": "not_supported_by_metadata_only_package",
2416
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2417
- "normalized_score": null,
2418
- "raw_text": "n/a",
2419
- "status_label": "not supported"
2420
- },
2421
- "metadata128_neural_mlp": {
2422
- "raw": null,
2423
- "metric_key": "mae",
2424
- "source": null,
2425
- "scope": "multi_episode_128_metadata_baseline",
2426
- "status": "not_supported_by_metadata_only_package",
2427
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2428
- "normalized_score": null,
2429
- "raw_text": "n/a",
2430
- "status_label": "not supported"
2431
- },
2432
  "cosmos3_super_reasoner": {
2433
  "raw": null,
2434
  "metric_key": "mae",
@@ -2459,7 +2458,7 @@
2459
  "id": "metadata128_simple",
2460
  "title": "128ep Metadata Simple",
2461
  "status": "a100_rerun_pass",
2462
- "coverage": "20 records / 8 scored JSONL-supported axes",
2463
  "headline": "34,269 rows; train/val/test 25,629/4,608/4,032",
2464
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2465
  },
@@ -2467,7 +2466,7 @@
2467
  "id": "metadata128_neural_mlp",
2468
  "title": "128ep Metadata NN",
2469
  "status": "a100_rerun_pass",
2470
- "coverage": "20 records / 8 scored JSONL-supported axes",
2471
  "headline": "compact MLP heads over metadata/text features",
2472
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2473
  },
@@ -4508,17 +4507,17 @@
4508
  "task_label": "Long-Horizon Next-Action Forecasting",
4509
  "series_id": "metadata128_simple",
4510
  "method": "128ep Metadata Simple",
4511
- "status": "not_supported_by_metadata_only_package",
4512
- "status_label": "not supported",
4513
- "scored": false,
4514
  "proxy_scored": false,
4515
- "raw": null,
4516
- "raw_text": "n/a",
4517
- "normalized_score": null,
4518
  "metric_key": "macro_f1",
4519
- "source": null,
4520
  "scope": "multi_episode_128_metadata_baseline",
4521
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4522
  },
4523
  {
4524
  "task_number": 13,
@@ -4526,17 +4525,17 @@
4526
  "task_label": "Long-Horizon Next-Action Forecasting",
4527
  "series_id": "metadata128_neural_mlp",
4528
  "method": "128ep Metadata NN",
4529
- "status": "not_supported_by_metadata_only_package",
4530
- "status_label": "not supported",
4531
- "scored": false,
4532
  "proxy_scored": false,
4533
- "raw": null,
4534
- "raw_text": "n/a",
4535
- "normalized_score": null,
4536
  "metric_key": "macro_f1",
4537
- "source": null,
4538
  "scope": "multi_episode_128_metadata_baseline",
4539
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4540
  },
4541
  {
4542
  "task_number": 13,
@@ -4670,17 +4669,17 @@
4670
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4671
  "series_id": "metadata128_simple",
4672
  "method": "128ep Metadata Simple",
4673
- "status": "not_supported_by_metadata_only_package",
4674
- "status_label": "not supported",
4675
- "scored": false,
4676
  "proxy_scored": false,
4677
- "raw": null,
4678
- "raw_text": "n/a",
4679
- "normalized_score": null,
4680
  "metric_key": "macro_f1",
4681
- "source": null,
4682
  "scope": "multi_episode_128_metadata_baseline",
4683
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4684
  },
4685
  {
4686
  "task_number": 14,
@@ -4688,17 +4687,17 @@
4688
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4689
  "series_id": "metadata128_neural_mlp",
4690
  "method": "128ep Metadata NN",
4691
- "status": "not_supported_by_metadata_only_package",
4692
- "status_label": "not supported",
4693
- "scored": false,
4694
  "proxy_scored": false,
4695
- "raw": null,
4696
- "raw_text": "n/a",
4697
- "normalized_score": null,
4698
  "metric_key": "macro_f1",
4699
- "source": null,
4700
  "scope": "multi_episode_128_metadata_baseline",
4701
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4702
  },
4703
  {
4704
  "task_number": 14,
@@ -4832,17 +4831,17 @@
4832
  "task_label": "Interaction Text Prediction",
4833
  "series_id": "metadata128_simple",
4834
  "method": "128ep Metadata Simple",
4835
- "status": "not_supported_by_metadata_only_package",
4836
- "status_label": "not supported",
4837
  "scored": false,
4838
  "proxy_scored": false,
4839
  "raw": null,
4840
  "raw_text": "n/a",
4841
  "normalized_score": null,
4842
  "metric_key": "macro_f1",
4843
- "source": null,
4844
  "scope": "multi_episode_128_metadata_baseline",
4845
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4846
  },
4847
  {
4848
  "task_number": 15,
@@ -4994,17 +4993,17 @@
4994
  "task_label": "Action-Object Relation Prediction",
4995
  "series_id": "metadata128_simple",
4996
  "method": "128ep Metadata Simple",
4997
- "status": "not_supported_by_metadata_only_package",
4998
- "status_label": "not supported",
4999
- "scored": false,
5000
  "proxy_scored": false,
5001
- "raw": null,
5002
- "raw_text": "n/a",
5003
- "normalized_score": null,
5004
  "metric_key": "macro_f1",
5005
- "source": null,
5006
  "scope": "multi_episode_128_metadata_baseline",
5007
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5008
  },
5009
  {
5010
  "task_number": 16,
@@ -5012,17 +5011,17 @@
5012
  "task_label": "Action-Object Relation Prediction",
5013
  "series_id": "metadata128_neural_mlp",
5014
  "method": "128ep Metadata NN",
5015
- "status": "not_supported_by_metadata_only_package",
5016
- "status_label": "not supported",
5017
- "scored": false,
5018
  "proxy_scored": false,
5019
- "raw": null,
5020
- "raw_text": "n/a",
5021
- "normalized_score": null,
5022
  "metric_key": "macro_f1",
5023
- "source": null,
5024
  "scope": "multi_episode_128_metadata_baseline",
5025
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5026
  },
5027
  {
5028
  "task_number": 16,
@@ -5156,17 +5155,17 @@
5156
  "task_label": "Future Object-Set Forecasting",
5157
  "series_id": "metadata128_simple",
5158
  "method": "128ep Metadata Simple",
5159
- "status": "not_supported_by_metadata_only_package",
5160
- "status_label": "not supported",
5161
- "scored": false,
5162
  "proxy_scored": false,
5163
- "raw": null,
5164
- "raw_text": "n/a",
5165
- "normalized_score": null,
5166
  "metric_key": "micro_f1",
5167
- "source": null,
5168
  "scope": "multi_episode_128_metadata_baseline",
5169
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5170
  },
5171
  {
5172
  "task_number": 17,
@@ -5174,17 +5173,17 @@
5174
  "task_label": "Future Object-Set Forecasting",
5175
  "series_id": "metadata128_neural_mlp",
5176
  "method": "128ep Metadata NN",
5177
- "status": "not_supported_by_metadata_only_package",
5178
- "status_label": "not supported",
5179
- "scored": false,
5180
  "proxy_scored": false,
5181
- "raw": null,
5182
- "raw_text": "n/a",
5183
- "normalized_score": null,
5184
  "metric_key": "micro_f1",
5185
- "source": null,
5186
  "scope": "multi_episode_128_metadata_baseline",
5187
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5188
  },
5189
  {
5190
  "task_number": 17,
@@ -5318,17 +5317,17 @@
5318
  "task_label": "IMU-to-Hand Pose Reconstruction",
5319
  "series_id": "metadata128_simple",
5320
  "method": "128ep Metadata Simple",
5321
- "status": "not_supported_by_metadata_only_package",
5322
- "status_label": "not supported",
5323
  "scored": false,
5324
  "proxy_scored": false,
5325
  "raw": null,
5326
  "raw_text": "n/a",
5327
  "normalized_score": null,
5328
  "metric_key": "mae",
5329
- "source": null,
5330
  "scope": "multi_episode_128_metadata_baseline",
5331
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5332
  },
5333
  {
5334
  "task_number": 18,
@@ -5480,17 +5479,17 @@
5480
  "task_label": "Camera-View Synchronization Retrieval",
5481
  "series_id": "metadata128_simple",
5482
  "method": "128ep Metadata Simple",
5483
- "status": "not_supported_by_metadata_only_package",
5484
- "status_label": "not supported",
5485
  "scored": false,
5486
  "proxy_scored": false,
5487
  "raw": null,
5488
  "raw_text": "n/a",
5489
  "normalized_score": null,
5490
  "metric_key": "mrr",
5491
- "source": null,
5492
  "scope": "multi_episode_128_metadata_baseline",
5493
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5494
  },
5495
  {
5496
  "task_number": 19,
@@ -5642,17 +5641,17 @@
5642
  "task_label": "Time-to-Next-Transition Regression",
5643
  "series_id": "metadata128_simple",
5644
  "method": "128ep Metadata Simple",
5645
- "status": "not_supported_by_metadata_only_package",
5646
- "status_label": "not supported",
5647
- "scored": false,
5648
  "proxy_scored": false,
5649
- "raw": null,
5650
- "raw_text": "n/a",
5651
- "normalized_score": null,
5652
  "metric_key": "mae",
5653
- "source": null,
5654
  "scope": "multi_episode_128_metadata_baseline",
5655
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5656
  },
5657
  {
5658
  "task_number": 20,
@@ -5660,17 +5659,17 @@
5660
  "task_label": "Time-to-Next-Transition Regression",
5661
  "series_id": "metadata128_neural_mlp",
5662
  "method": "128ep Metadata NN",
5663
- "status": "not_supported_by_metadata_only_package",
5664
- "status_label": "not supported",
5665
- "scored": false,
5666
  "proxy_scored": false,
5667
- "raw": null,
5668
- "raw_text": "n/a",
5669
- "normalized_score": null,
5670
  "metric_key": "mae",
5671
- "source": null,
5672
  "scope": "multi_episode_128_metadata_baseline",
5673
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5674
  },
5675
  {
5676
  "task_number": 20,
 
1
  {
2
  "title": "Unified 20-Task Model Radar",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
+ "scored_method_task_count": 133,
9
  "normalization_policy": {
10
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
11
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
 
73
  "method_detail": "128-episode JSONL metadata/text simple baselines.",
74
  "plotted_as": "colored point overlay",
75
  "result_record_count": 20,
76
+ "scored_task_count": 13,
77
+ "covered_task_count": 13,
78
  "proxy_scored_task_count": 0,
79
+ "scoreless_task_count": 7,
80
+ "unsupported_task_count": 7,
81
  "not_evaluated_task_count": 0,
82
  "status_counts": {
83
+ "scored": 13,
84
+ "unsupported_without_required_target": 7
85
  },
86
+ "coverage_fraction": 0.65,
87
  "result_record_fraction": 1.0
88
  },
89
  {
 
97
  "method_detail": "128-episode JSONL metadata/text MLP baselines.",
98
  "plotted_as": "colored point overlay",
99
  "result_record_count": 20,
100
+ "scored_task_count": 13,
101
+ "covered_task_count": 13,
102
  "proxy_scored_task_count": 0,
103
+ "scoreless_task_count": 7,
104
+ "unsupported_task_count": 7,
105
  "not_evaluated_task_count": 0,
106
  "status_counts": {
107
+ "not_supported_by_metadata_only_package": 7,
108
+ "scored": 13
109
  },
110
+ "coverage_fraction": 0.65,
111
  "result_record_fraction": 1.0
112
  },
113
  {
 
1607
  "raw_text": "0.0023",
1608
  "status_label": "scored"
1609
  },
1610
+ "metadata128_simple": {
1611
+ "raw": 0.004579592783699693,
1612
+ "metric_key": "macro_f1",
1613
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
1614
+ "scope": "multi_episode_128_metadata_baseline",
1615
+ "status": "scored",
1616
+ "reason": null,
1617
+ "normalized_score": 0.004579592783699693,
1618
+ "raw_text": "0.0046",
1619
+ "status_label": "scored"
1620
+ },
1621
+ "metadata128_neural_mlp": {
1622
+ "raw": 0.0029821307969142615,
1623
+ "metric_key": "macro_f1",
1624
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
1625
+ "scope": "multi_episode_128_metadata_baseline",
1626
+ "status": "scored",
1627
+ "reason": null,
1628
+ "normalized_score": 0.0029821307969142615,
1629
+ "raw_text": "0.0030",
1630
+ "status_label": "scored"
1631
+ },
1632
  "raw128_simple": {
1633
  "raw": 0.0024280172369056294,
1634
  "metric_key": "macro_f1",
 
1651
  "raw_text": "0.0011",
1652
  "status_label": "scored"
1653
  },
1654
  "cosmos3_super_reasoner": {
1655
  "raw": null,
1656
  "metric_key": "macro_f1",
 
1718
  "raw_text": "0.0042",
1719
  "status_label": "scored"
1720
  },
1721
+ "metadata128_simple": {
1722
+ "raw": 0.0001206030150753769,
1723
+ "metric_key": "macro_f1",
1724
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
1725
+ "scope": "multi_episode_128_metadata_baseline",
1726
+ "status": "scored",
1727
+ "reason": null,
1728
+ "normalized_score": 0.0001206030150753769,
1729
+ "raw_text": "0.0001",
1730
+ "status_label": "scored"
1731
+ },
1732
+ "metadata128_neural_mlp": {
1733
+ "raw": 2.086049543676662e-05,
1734
+ "metric_key": "macro_f1",
1735
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
1736
+ "scope": "multi_episode_128_metadata_baseline",
1737
+ "status": "scored",
1738
+ "reason": null,
1739
+ "normalized_score": 2.086049543676662e-05,
1740
+ "raw_text": "0.0000",
1741
+ "status_label": "scored"
1742
+ },
1743
  "raw128_simple": {
1744
  "raw": 0.0,
1745
  "metric_key": "macro_f1",
 
1762
  "raw_text": "0.0000",
1763
  "status_label": "scored"
1764
  },
1765
  "cosmos3_super_reasoner": {
1766
  "raw": null,
1767
  "metric_key": "macro_f1",
 
1818
  "raw_text": "0.0381",
1819
  "status_label": "scored"
1820
  },
1821
+ "metadata128_simple": {
1822
+ "raw": null,
1823
+ "metric_key": "macro_f1",
1824
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
1825
+ "scope": "multi_episode_128_metadata_baseline",
1826
+ "status": "unsupported_without_required_target",
1827
+ "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
1828
+ "normalized_score": null,
1829
+ "raw_text": "n/a",
1830
+ "status_label": "unsupported"
1831
+ },
1832
  "raw128_simple": {
1833
  "raw": 0.012611998261547169,
1834
  "metric_key": "macro_f1",
 
1851
  "raw_text": "0.0098",
1852
  "status_label": "proxy scored"
1853
  },
1854
  "metadata128_neural_mlp": {
1855
  "raw": null,
1856
  "metric_key": "macro_f1",
 
1951
  "raw_text": "0.0000",
1952
  "status_label": "scored"
1953
  },
1954
+ "metadata128_simple": {
1955
+ "raw": 0.0,
1956
+ "metric_key": "macro_f1",
1957
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
1958
+ "scope": "multi_episode_128_metadata_baseline",
1959
+ "status": "scored",
1960
+ "reason": null,
1961
+ "normalized_score": 0.0,
1962
+ "raw_text": "0.0000",
1963
+ "status_label": "scored"
1964
+ },
1965
+ "metadata128_neural_mlp": {
1966
+ "raw": 0.0,
1967
+ "metric_key": "macro_f1",
1968
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
1969
+ "scope": "multi_episode_128_metadata_baseline",
1970
+ "status": "scored",
1971
+ "reason": null,
1972
+ "normalized_score": 0.0,
1973
+ "raw_text": "0.0000",
1974
+ "status_label": "scored"
1975
+ },
1976
  "raw128_simple": {
1977
  "raw": 0.0,
1978
  "metric_key": "macro_f1",
 
1995
  "raw_text": "0.0000",
1996
  "status_label": "scored"
1997
  },
1998
  "cosmos3_nano_future_window": {
1999
  "raw": null,
2000
  "metric_key": "macro_f1",
 
2051
  "raw_text": "0.1659",
2052
  "status_label": "scored"
2053
  },
2054
+ "metadata128_simple": {
2055
+ "raw": 0.17656983343047333,
2056
+ "metric_key": "micro_f1",
2057
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
2058
+ "scope": "multi_episode_128_metadata_baseline",
2059
+ "status": "scored",
2060
+ "reason": null,
2061
+ "normalized_score": 0.17656983343047333,
2062
+ "raw_text": "0.1766",
2063
+ "status_label": "scored"
2064
+ },
2065
+ "metadata128_neural_mlp": {
2066
+ "raw": 0.17418550827844048,
2067
+ "metric_key": "micro_f1",
2068
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
2069
+ "scope": "multi_episode_128_metadata_baseline",
2070
+ "status": "scored",
2071
+ "reason": null,
2072
+ "normalized_score": 0.17418550827844048,
2073
+ "raw_text": "0.1742",
2074
+ "status_label": "scored"
2075
+ },
2076
  "raw128_simple": {
2077
  "raw": 0.06469493412657774,
2078
  "metric_key": "micro_f1",
 
2095
  "raw_text": "0.1752",
2096
  "status_label": "scored"
2097
  },
2098
  "cosmos3_super_reasoner": {
2099
  "raw": null,
2100
  "metric_key": "micro_f1",
 
2151
  "raw_text": "0.0426",
2152
  "status_label": "scored"
2153
  },
2154
+ "metadata128_simple": {
2155
+ "raw": null,
2156
+ "metric_key": "mae",
2157
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
2158
+ "scope": "multi_episode_128_metadata_baseline",
2159
+ "status": "unsupported_without_required_target",
2160
+ "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package",
2161
+ "normalized_score": null,
2162
+ "raw_text": "n/a",
2163
+ "status_label": "unsupported"
2164
+ },
2165
  "raw128_simple": {
2166
  "raw": 0.22941437363624573,
2167
  "metric_key": "mae",
 
2184
  "raw_text": "0.2530",
2185
  "status_label": "scored"
2186
  },
2187
  "metadata128_neural_mlp": {
2188
  "raw": null,
2189
  "metric_key": "mae",
 
2262
  "raw_text": "0.2409",
2263
  "status_label": "scored"
2264
  },
2265
+ "metadata128_simple": {
2266
+ "raw": null,
2267
+ "metric_key": "mrr",
2268
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
2269
+ "scope": "multi_episode_128_metadata_baseline",
2270
+ "status": "unsupported_without_required_target",
2271
+ "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
2272
+ "normalized_score": null,
2273
+ "raw_text": "n/a",
2274
+ "status_label": "unsupported"
2275
+ },
2276
  "raw128_simple": {
2277
  "raw": 0.0026625150348991156,
2278
  "metric_key": "mrr",
 
2295
  "raw_text": "0.0025",
2296
  "status_label": "proxy scored"
2297
  },
2298
  "metadata128_neural_mlp": {
2299
  "raw": null,
2300
  "metric_key": "mrr",
 
2384
  "raw_text": "134.07",
2385
  "status_label": "scored"
2386
  },
2387
+ "metadata128_simple": {
2388
+ "raw": 624.8108520507812,
2389
+ "metric_key": "mae",
2390
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
2391
+ "scope": "multi_episode_128_metadata_baseline",
2392
+ "status": "scored",
2393
+ "reason": null,
2394
+ "normalized_score": 0.016864874132806403,
2395
+ "raw_text": "624.81",
2396
+ "status_label": "scored"
2397
+ },
2398
+ "metadata128_neural_mlp": {
2399
+ "raw": 41.4664421081543,
2400
+ "metric_key": "mae",
2401
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
2402
+ "scope": "multi_episode_128_metadata_baseline",
2403
+ "status": "scored",
2404
+ "reason": null,
2405
+ "normalized_score": 0.25411768748242325,
2406
+ "raw_text": "41.47",
2407
+ "status_label": "scored"
2408
+ },
2409
  "raw128_simple": {
2410
  "raw": 52.32759475708008,
2411
  "metric_key": "mae",
 
2428
  "raw_text": "42.37",
2429
  "status_label": "scored"
2430
  },
2431
  "cosmos3_super_reasoner": {
2432
  "raw": null,
2433
  "metric_key": "mae",
 
2458
  "id": "metadata128_simple",
2459
  "title": "128ep Metadata Simple",
2460
  "status": "a100_rerun_pass",
2461
+ "coverage": "20 records / 13 scored JSONL-supported axes",
2462
  "headline": "34,269 rows; train/val/test 25,629/4,608/4,032",
2463
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2464
  },
 
2466
  "id": "metadata128_neural_mlp",
2467
  "title": "128ep Metadata NN",
2468
  "status": "a100_rerun_pass",
2469
+ "coverage": "20 records / 13 scored JSONL-supported axes",
2470
  "headline": "compact MLP heads over metadata/text features",
2471
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2472
  },
 
4507
  "task_label": "Long-Horizon Next-Action Forecasting",
4508
  "series_id": "metadata128_simple",
4509
  "method": "128ep Metadata Simple",
4510
+ "status": "scored",
4511
+ "status_label": "scored",
4512
+ "scored": true,
4513
  "proxy_scored": false,
4514
+ "raw": 0.004579592783699693,
4515
+ "raw_text": "0.0046",
4516
+ "normalized_score": 0.004579592783699693,
4517
  "metric_key": "macro_f1",
4518
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
4519
  "scope": "multi_episode_128_metadata_baseline",
4520
+ "reason": null
4521
  },
4522
  {
4523
  "task_number": 13,
 
4525
  "task_label": "Long-Horizon Next-Action Forecasting",
4526
  "series_id": "metadata128_neural_mlp",
4527
  "method": "128ep Metadata NN",
4528
+ "status": "scored",
4529
+ "status_label": "scored",
4530
+ "scored": true,
4531
  "proxy_scored": false,
4532
+ "raw": 0.0029821307969142615,
4533
+ "raw_text": "0.0030",
4534
+ "normalized_score": 0.0029821307969142615,
4535
  "metric_key": "macro_f1",
4536
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
4537
  "scope": "multi_episode_128_metadata_baseline",
4538
+ "reason": null
4539
  },
4540
  {
4541
  "task_number": 13,
 
4669
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4670
  "series_id": "metadata128_simple",
4671
  "method": "128ep Metadata Simple",
4672
+ "status": "scored",
4673
+ "status_label": "scored",
4674
+ "scored": true,
4675
  "proxy_scored": false,
4676
+ "raw": 0.0001206030150753769,
4677
+ "raw_text": "0.0001",
4678
+ "normalized_score": 0.0001206030150753769,
4679
  "metric_key": "macro_f1",
4680
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
4681
  "scope": "multi_episode_128_metadata_baseline",
4682
+ "reason": null
4683
  },
4684
  {
4685
  "task_number": 14,
 
4687
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4688
  "series_id": "metadata128_neural_mlp",
4689
  "method": "128ep Metadata NN",
4690
+ "status": "scored",
4691
+ "status_label": "scored",
4692
+ "scored": true,
4693
  "proxy_scored": false,
4694
+ "raw": 2.086049543676662e-05,
4695
+ "raw_text": "0.0000",
4696
+ "normalized_score": 2.086049543676662e-05,
4697
  "metric_key": "macro_f1",
4698
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
4699
  "scope": "multi_episode_128_metadata_baseline",
4700
+ "reason": null
4701
  },
4702
  {
4703
  "task_number": 14,
 
4831
  "task_label": "Interaction Text Prediction",
4832
  "series_id": "metadata128_simple",
4833
  "method": "128ep Metadata Simple",
4834
+ "status": "unsupported_without_required_target",
4835
+ "status_label": "unsupported",
4836
  "scored": false,
4837
  "proxy_scored": false,
4838
  "raw": null,
4839
  "raw_text": "n/a",
4840
  "normalized_score": null,
4841
  "metric_key": "macro_f1",
4842
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
4843
  "scope": "multi_episode_128_metadata_baseline",
4844
+ "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
4845
  },
4846
  {
4847
  "task_number": 15,
 
4993
  "task_label": "Action-Object Relation Prediction",
4994
  "series_id": "metadata128_simple",
4995
  "method": "128ep Metadata Simple",
4996
+ "status": "scored",
4997
+ "status_label": "scored",
4998
+ "scored": true,
4999
  "proxy_scored": false,
5000
+ "raw": 0.0,
5001
+ "raw_text": "0.0000",
5002
+ "normalized_score": 0.0,
5003
  "metric_key": "macro_f1",
5004
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
5005
  "scope": "multi_episode_128_metadata_baseline",
5006
+ "reason": null
5007
  },
5008
  {
5009
  "task_number": 16,
 
5011
  "task_label": "Action-Object Relation Prediction",
5012
  "series_id": "metadata128_neural_mlp",
5013
  "method": "128ep Metadata NN",
5014
+ "status": "scored",
5015
+ "status_label": "scored",
5016
+ "scored": true,
5017
  "proxy_scored": false,
5018
+ "raw": 0.0,
5019
+ "raw_text": "0.0000",
5020
+ "normalized_score": 0.0,
5021
  "metric_key": "macro_f1",
5022
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
5023
  "scope": "multi_episode_128_metadata_baseline",
5024
+ "reason": null
5025
  },
5026
  {
5027
  "task_number": 16,
 
5155
  "task_label": "Future Object-Set Forecasting",
5156
  "series_id": "metadata128_simple",
5157
  "method": "128ep Metadata Simple",
5158
+ "status": "scored",
5159
+ "status_label": "scored",
5160
+ "scored": true,
5161
  "proxy_scored": false,
5162
+ "raw": 0.17656983343047333,
5163
+ "raw_text": "0.1766",
5164
+ "normalized_score": 0.17656983343047333,
5165
  "metric_key": "micro_f1",
5166
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
5167
  "scope": "multi_episode_128_metadata_baseline",
5168
+ "reason": null
5169
  },
5170
  {
5171
  "task_number": 17,
 
5173
  "task_label": "Future Object-Set Forecasting",
5174
  "series_id": "metadata128_neural_mlp",
5175
  "method": "128ep Metadata NN",
5176
+ "status": "scored",
5177
+ "status_label": "scored",
5178
+ "scored": true,
5179
  "proxy_scored": false,
5180
+ "raw": 0.17418550827844048,
5181
+ "raw_text": "0.1742",
5182
+ "normalized_score": 0.17418550827844048,
5183
  "metric_key": "micro_f1",
5184
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
5185
  "scope": "multi_episode_128_metadata_baseline",
5186
+ "reason": null
5187
  },
5188
  {
5189
  "task_number": 17,
 
5317
  "task_label": "IMU-to-Hand Pose Reconstruction",
5318
  "series_id": "metadata128_simple",
5319
  "method": "128ep Metadata Simple",
5320
+ "status": "unsupported_without_required_target",
5321
+ "status_label": "unsupported",
5322
  "scored": false,
5323
  "proxy_scored": false,
5324
  "raw": null,
5325
  "raw_text": "n/a",
5326
  "normalized_score": null,
5327
  "metric_key": "mae",
5328
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
5329
  "scope": "multi_episode_128_metadata_baseline",
5330
+ "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package"
5331
  },
5332
  {
5333
  "task_number": 18,
 
5479
  "task_label": "Camera-View Synchronization Retrieval",
5480
  "series_id": "metadata128_simple",
5481
  "method": "128ep Metadata Simple",
5482
+ "status": "unsupported_without_required_target",
5483
+ "status_label": "unsupported",
5484
  "scored": false,
5485
  "proxy_scored": false,
5486
  "raw": null,
5487
  "raw_text": "n/a",
5488
  "normalized_score": null,
5489
  "metric_key": "mrr",
5490
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
5491
  "scope": "multi_episode_128_metadata_baseline",
5492
+ "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
5493
  },
5494
  {
5495
  "task_number": 19,
 
5641
  "task_label": "Time-to-Next-Transition Regression",
5642
  "series_id": "metadata128_simple",
5643
  "method": "128ep Metadata Simple",
5644
+ "status": "scored",
5645
+ "status_label": "scored",
5646
+ "scored": true,
5647
  "proxy_scored": false,
5648
+ "raw": 624.8108520507812,
5649
+ "raw_text": "624.81",
5650
+ "normalized_score": 0.016864874132806403,
5651
  "metric_key": "mae",
5652
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
5653
  "scope": "multi_episode_128_metadata_baseline",
5654
+ "reason": null
5655
  },
5656
  {
5657
  "task_number": 20,
 
5659
  "task_label": "Time-to-Next-Transition Regression",
5660
  "series_id": "metadata128_neural_mlp",
5661
  "method": "128ep Metadata NN",
5662
+ "status": "scored",
5663
+ "status_label": "scored",
5664
+ "scored": true,
5665
  "proxy_scored": false,
5666
+ "raw": 41.4664421081543,
5667
+ "raw_text": "41.47",
5668
+ "normalized_score": 0.25411768748242325,
5669
  "metric_key": "mae",
5670
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
5671
  "scope": "multi_episode_128_metadata_baseline",
5672
+ "reason": null
5673
  },
5674
  {
5675
  "task_number": 20,
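Note on the radar values above: the `normalization_policy` says bounded higher-is-better metrics are clipped to [0, 1] and lower-error metrics become `best_observed_value / raw_value` within the same task. The sketch below is one possible reading of that policy, not the builder's actual code. The function names are hypothetical, and the zero-error guard is an assumption because the policy text shown here does not cover that case. It back-derives the best observed MAE for Time-to-Next-Transition Regression from the two new metadata records and checks that they agree, then reproduces the `coverage_fraction` of 0.65 from 13 scored tasks out of 20 records.

```python
import math


def normalize_higher_is_better(raw):
    """Clip a bounded metric such as macro_f1 or micro_f1 to [0, 1]."""
    if raw is None:
        return None  # scoreless cells stay null
    return min(1.0, max(0.0, raw))


def normalize_lower_is_better(raw, best_observed):
    """Map an error metric such as MAE to best_observed / raw."""
    if raw is None:
        return None
    if raw == 0:
        # Assumption: the excerpt does not say how a zero error is handled.
        raise ValueError("zero error needs an explicit policy")
    return best_observed / raw


def coverage_fraction(scored_task_count, result_record_count):
    """Share of a method's task records that carry a numeric score."""
    return scored_task_count / result_record_count


# (raw MAE, normalized_score) pairs copied from the diff above.
TIME_TO_TRANSITION = {
    "metadata128_simple": (624.8108520507812, 0.016864874132806403),
    "metadata128_neural_mlp": (41.4664421081543, 0.25411768748242325),
}

# If normalized = best / raw, then raw * normalized recovers the same
# best observed MAE for every method scored on this task.
implied_best = {
    series_id: raw * normalized
    for series_id, (raw, normalized) in TIME_TO_TRANSITION.items()
}

if __name__ == "__main__":
    for series_id, best in implied_best.items():
        print(series_id, round(best, 4))
    print("coverage", coverage_fraction(13, 20))
```

Both records imply the same best observed MAE (about 10.54), which is consistent with the stated policy. The method holding that best value is not shown in this part of the diff.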
docs/data/website_integrity.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-18T11:41:43+00:00",
4
  "docs_root": "docs",
5
  "site_base": "/ropedia-xperience-10m-task-suite/",
6
  "summary": {
@@ -301,7 +301,7 @@
301
  },
302
  {
303
  "path": "data/artifact_index.json",
304
- "bytes": 116109,
305
  "top_level_type": "dict"
306
  },
307
  {
@@ -316,7 +316,7 @@
316
  },
317
  {
318
  "path": "data/episode128_task_model_radar.json",
319
- "bytes": 187099,
320
  "top_level_type": "dict"
321
  },
322
  {
@@ -486,12 +486,12 @@
486
  },
487
  {
488
  "path": "data/task_method_20_gap_audit.json",
489
- "bytes": 50687,
490
  "top_level_type": "dict"
491
  },
492
  {
493
  "path": "data/task_method_20_result_matrix.json",
494
- "bytes": 129600,
495
  "top_level_type": "dict"
496
  },
497
  {
@@ -526,7 +526,7 @@
526
  },
527
  {
528
  "path": "data/unified_task_model_radar.json",
529
- "bytes": 230951,
530
  "top_level_type": "dict"
531
  },
532
  {
@@ -571,7 +571,7 @@
571
  {
572
  "path": "assets/charts/episode128_task_model_radar.svg",
573
  "exists": true,
574
- "bytes": 44825,
575
  "format": "SVG",
576
  "has_viewbox": true
577
  },
@@ -641,7 +641,7 @@
641
  {
642
  "path": "assets/charts/unified_task_model_radar.svg",
643
  "exists": true,
644
- "bytes": 50841,
645
  "format": "SVG",
646
  "has_viewbox": true
647
  },
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-18T12:09:46+00:00",
4
  "docs_root": "docs",
5
  "site_base": "/ropedia-xperience-10m-task-suite/",
6
  "summary": {
 
301
  },
302
  {
303
  "path": "data/artifact_index.json",
304
+ "bytes": 116110,
305
  "top_level_type": "dict"
306
  },
307
  {
 
316
  },
317
  {
318
  "path": "data/episode128_task_model_radar.json",
319
+ "bytes": 186443,
320
  "top_level_type": "dict"
321
  },
322
  {
 
486
  },
487
  {
488
  "path": "data/task_method_20_gap_audit.json",
489
+ "bytes": 46902,
490
  "top_level_type": "dict"
491
  },
492
  {
493
  "path": "data/task_method_20_result_matrix.json",
494
+ "bytes": 129242,
495
  "top_level_type": "dict"
496
  },
497
  {
 
526
  },
527
  {
528
  "path": "data/unified_task_model_radar.json",
529
+ "bytes": 230297,
530
  "top_level_type": "dict"
531
  },
532
  {
 
571
  {
572
  "path": "assets/charts/episode128_task_model_radar.svg",
573
  "exists": true,
574
+ "bytes": 45937,
575
  "format": "SVG",
576
  "has_viewbox": true
577
  },
 
641
  {
642
  "path": "assets/charts/unified_task_model_radar.svg",
643
  "exists": true,
644
+ "bytes": 51953,
645
  "format": "SVG",
646
  "has_viewbox": true
647
  },
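Note on the integrity records above: each entry pairs a `path` with a `bytes` count and a `top_level_type`, and the artifact index further down adds a `sha256` for non-volatile files. The sketch below shows one way such a record could be recomputed and compared. It is an illustration under assumptions, not the project's validator: `describe_json_artifact` and `matches_recorded` are hypothetical names, and `top_level_type` is assumed to be the Python type name of the parsed JSON root.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def describe_json_artifact(path):
    """Recompute size, hash, and root type for one JSON artifact."""
    data = Path(path).read_bytes()
    parsed = json.loads(data.decode("utf-8"))
    return {
        "path": str(path),
        "bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        # Assumption: the recorded value is the Python type name of the root.
        "top_level_type": type(parsed).__name__,
    }


def matches_recorded(entry, actual):
    """Compare only the checkable fields that the recorded entry carries."""
    checked = ("bytes", "sha256", "top_level_type")
    return all(actual[key] == entry[key] for key in checked if key in entry)


# Small self-contained demo on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "quality_gates.json"
    demo_path.write_bytes(b'{"status": "pass"}')
    demo = describe_json_artifact(demo_path)

# An entry without a sha256 is checked on size and root type only.
size_only_entry = {"bytes": 18, "top_level_type": "dict"}
stale_entry = {"bytes": 17, "top_level_type": "dict"}

if __name__ == "__main__":
    print(matches_recorded(size_only_entry, demo))
    print(matches_recorded(stale_entry, demo))
```

Because only the fields present in the recorded entry are compared, entries that omit a hash (such as those marked `"hash_policy": "existence_and_size_only"`) are checked on size alone. A one-byte change such as 116109 to 116110 for `data/artifact_index.json` would still be caught by the size comparison.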
metrics/artifact_index.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
- "generated_at_utc": "2026-06-18T11:16:44+00:00",
4
  "status": "pass",
5
  "artifact_count": 213,
6
  "missing": [],
@@ -290,8 +290,8 @@
290
  "surface": "repo_hf",
291
  "shows": "Runs simple metadata and neural MLP baselines on the same selected 96/16/16 episode split used by the Qwen3-Omni diagnostic pilot.",
292
  "exists": true,
293
- "bytes": 58012,
294
- "sha256": "a95cdde097b11f83023c758c807f031c3d4cb3fde20d42ed314565440cc68374"
295
  },
296
  {
297
  "id": "task_suite_enhancement_128",
@@ -599,7 +599,7 @@
599
  "shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
600
  "exists": true,
601
  "bytes": 4432,
602
- "sha256": "8494b6983100acdfde9b5929e871b27120897af8ec7b5a3031aa142b598a09ae"
603
  },
604
  {
605
  "id": "source_alignment_validator",
@@ -719,8 +719,8 @@
719
  "surface": "website_hf",
720
  "shows": "Stores normalized 20-axis radar values, raw task metrics, Qwen3/Cosmos overlay mappings, branch-card caveats, and explicit scoreless status records.",
721
  "exists": true,
722
- "bytes": 230951,
723
- "sha256": "8aaed21d08943f2dc53c5160e27872bc4f7f8a405d7289cdaaf7b00d867b84d8"
724
  },
725
  {
726
  "id": "single_episode_task_model_radar_json",
@@ -731,7 +731,7 @@
731
  "shows": "Machine-readable split radar for the one-episode Minimal and Neural MLP baselines, both scored on all 20 task contracts.",
732
  "exists": true,
733
  "bytes": 50973,
734
- "sha256": "d20637e6a17390f7fd44589ff37cb1889318bc39c2259dca6bb7f1a43d8ea26b"
735
  },
736
  {
737
  "id": "episode128_task_model_radar_json",
@@ -741,8 +741,8 @@
741
  "surface": "website_hf",
742
  "shows": "Machine-readable split radar for selected 128-episode metadata/raw baselines and verified Qwen3/Cosmos branches, preserving explicit scoreless cells.",
743
  "exists": true,
744
- "bytes": 187099,
745
- "sha256": "bf2b3fdeb9713a9d4cba0e8645c24c325b88e939cb94f4718a9d3c2db03e2bb3"
746
  },
747
  {
748
  "id": "task_method_20_result_matrix_json",
@@ -752,8 +752,8 @@
752
  "surface": "website_hf",
753
  "shows": "Machine-readable 9-method by 20-task matrix where every method has 20 records and scoreless cells carry unsupported/not-evaluated reasons.",
754
  "exists": true,
755
- "bytes": 129600,
756
- "sha256": "30fd572521991fd7f5741411d91a40d3d442032f001841f9fd1a4e7381eb73d2"
757
  },
758
  {
759
  "id": "task_method_20_result_matrix",
@@ -763,8 +763,8 @@
763
  "surface": "repo_hf",
764
  "shows": "Reader-facing table that separates 20 records per method from numeric scored axes, documented raw128 proxy scores, unsupported metadata targets, and model targets not evaluated in verified packages.",
765
  "exists": true,
766
- "bytes": 4128,
767
- "sha256": "89c73da7db81d2c5f6eb4a16c828531a589ac44cabba2c0c95b171b6ad2060d6"
768
  },
769
  {
770
  "id": "task_method_20_gap_audit_json",
@@ -774,8 +774,8 @@
774
  "surface": "website_hf",
775
  "shows": "Machine-readable 180-record gap ledger with numeric scores, scoreless cells, explicit status reasons, and next evidence needed before new scores can be published.",
776
  "exists": true,
777
- "bytes": 50687,
778
- "sha256": "2cdaa06f9c140a2e194675a3383be341acb1f6e07ddecfa7017cdbe34d704282"
779
  },
780
  {
781
  "id": "task_method_20_gap_audit",
@@ -785,8 +785,8 @@
785
  "surface": "repo_hf",
786
  "shows": "Reader-facing ledger that lists every scoreless method-task cell and the concrete target or model-output evidence required before it can become numeric.",
787
  "exists": true,
788
- "bytes": 14421,
789
- "sha256": "125e658010284dc48570fa7c6a7676e4013d30dd1f22deb24d369e7085a7b700"
790
  },
791
  {
792
  "id": "unified_task_model_radar_chart",
@@ -796,8 +796,8 @@
796
  "surface": "website_hf",
797
  "shows": "Compares minimal and neural MLP baselines across all 20 tasks, with Qwen3/Cosmos task-aligned model overlays.",
798
  "exists": true,
799
- "bytes": 50841,
800
- "sha256": "e5fa2420fc5ed905953e71ef8978ad1ee794c0daf06a7f0ff10374db7f291c72"
801
  },
802
  {
803
  "id": "single_episode_task_model_radar_chart",
@@ -818,8 +818,8 @@
818
  "surface": "website_hf",
819
  "shows": "Separates the selected 128-episode methods: raw-feature simple/NN as complete 20/20 scored polygons and metadata/Qwen/Cosmos as task-aligned overlays.",
820
  "exists": true,
821
- "bytes": 44825,
822
- "sha256": "50b5d87fca4aba303a8440f5ef53470ed493e9f1251cb5edeb16bac90038a11b"
823
  },
824
  {
825
  "id": "unified_task_model_radar_builder",
@@ -906,8 +906,8 @@
906
  "surface": "repo_hf",
907
  "shows": "Rerun of JSONL metadata/text simple and neural baselines over the selected 128-episode multiscale dataset; supports radar overlays on JSONL-supported task axes.",
908
  "exists": true,
909
- "bytes": 50297,
910
- "sha256": "1c1710bcf340ece479e321f19d4cb8302fe369a1103b4584a15853fe73dc226c"
911
  },
912
  {
913
  "id": "a100_128_raw20_task_baselines",
@@ -1105,7 +1105,7 @@
1105
  "shows": "Machine-readable release-check summary for validators, mirrors, and public project surfaces.",
1106
  "exists": true,
1107
  "bytes": 8100,
1108
- "sha256": "6549b0f8da6c3742c72b12b71900db1b89455cd34d5befcdf9d249b4adebbd1a"
1109
  },
1110
  {
1111
  "id": "public_surface_qa",
@@ -1310,7 +1310,7 @@
1310
  "volatile": true,
1311
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
1312
  "exists": true,
1313
- "bytes": 983979,
1314
  "hash_policy": "existence_and_size_only"
1315
  },
1316
  {
@@ -1322,7 +1322,7 @@
1322
  "volatile": true,
1323
  "shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
1324
  "exists": true,
1325
- "bytes": 20022,
1326
  "hash_policy": "existence_and_size_only"
1327
  },
1328
  {
 
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
+ "generated_at_utc": "2026-06-18T12:09:24+00:00",
4
  "status": "pass",
5
  "artifact_count": 213,
6
  "missing": [],
 
290
  "surface": "repo_hf",
291
  "shows": "Runs simple metadata and neural MLP baselines on the same selected 96/16/16 episode split used by the Qwen3-Omni diagnostic pilot.",
292
  "exists": true,
293
+ "bytes": 73236,
294
+ "sha256": "76acae0de25d51413e7e6f11021163e7d9909cfe95d65bf6b02e74043d429e2d"
295
  },
296
  {
297
  "id": "task_suite_enhancement_128",
 
599
  "shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
600
  "exists": true,
601
  "bytes": 4432,
602
+ "sha256": "ae089cc0df132b63365e03b2157a488b5d1569567c0374d7621bcd347da62c9e"
603
  },
604
  {
605
  "id": "source_alignment_validator",
 
719
  "surface": "website_hf",
720
  "shows": "Stores normalized 20-axis radar values, raw task metrics, Qwen3/Cosmos overlay mappings, branch-card caveats, and explicit scoreless status records.",
721
  "exists": true,
722
+ "bytes": 230297,
723
+ "sha256": "437874b1633e73165e3300f55580394663a44759c848288e696859b98f8aad32"
724
  },
725
  {
726
  "id": "single_episode_task_model_radar_json",
 
731
  "shows": "Machine-readable split radar for the one-episode Minimal and Neural MLP baselines, both scored on all 20 task contracts.",
732
  "exists": true,
733
  "bytes": 50973,
734
+ "sha256": "38cb43512f2ac40feeb62333bdea89b3a55e5b48468beb8982cf22536f794ecf"
735
  },
736
  {
737
  "id": "episode128_task_model_radar_json",
 
741
  "surface": "website_hf",
742
  "shows": "Machine-readable split radar for selected 128-episode metadata/raw baselines and verified Qwen3/Cosmos branches, preserving explicit scoreless cells.",
743
  "exists": true,
744
+ "bytes": 186443,
745
+ "sha256": "55e758e8703f406889022976d0ba055181212305c9a7246e899463e0c3c3b554"
746
  },
747
  {
748
  "id": "task_method_20_result_matrix_json",
 
752
  "surface": "website_hf",
753
  "shows": "Machine-readable 9-method by 20-task matrix where every method has 20 records and scoreless cells carry unsupported/not-evaluated reasons.",
754
  "exists": true,
755
+ "bytes": 129242,
756
+ "sha256": "64fb700d51f536edf11291799b6173cf9ae8dd7a41178aac348b8207ed4b1e42"
757
  },
758
  {
759
  "id": "task_method_20_result_matrix",
 
763
  "surface": "repo_hf",
764
  "shows": "Reader-facing table that separates 20 records per method from numeric scored axes, documented raw128 proxy scores, unsupported metadata targets, and model targets not evaluated in verified packages.",
765
  "exists": true,
766
+ "bytes": 4026,
767
+ "sha256": "55e949fc30419a52f7f5ec4dd9544a11b253b076f8e3637ec3e92b3d61a89aab"
768
  },
769
  {
770
  "id": "task_method_20_gap_audit_json",
 
774
  "surface": "website_hf",
775
  "shows": "Machine-readable 180-record gap ledger with numeric scores, scoreless cells, explicit status reasons, and next evidence needed before new scores can be published.",
776
  "exists": true,
777
+ "bytes": 46902,
778
+ "sha256": "2b64dbd013625852679f9b91d25c48d1ed197fec727883b4fe37088b2d594784"
779
  },
780
  {
781
  "id": "task_method_20_gap_audit",
 
785
  "surface": "repo_hf",
786
  "shows": "Reader-facing ledger that lists every scoreless method-task cell and the concrete target or model-output evidence required before it can become numeric.",
787
  "exists": true,
788
+ "bytes": 13387,
789
+ "sha256": "d33461eb704f8e92545b6b54d9fc509e617fbacc9ca9894ac851ca9c3dec0fec"
790
  },
791
  {
792
  "id": "unified_task_model_radar_chart",
 
796
  "surface": "website_hf",
797
  "shows": "Compares minimal and neural MLP baselines across all 20 tasks, with Qwen3/Cosmos task-aligned model overlays.",
798
  "exists": true,
799
+ "bytes": 51953,
800
+ "sha256": "19c001f10319946ef0e4921064f8a012836f29e7c8b272f900c257169faf46a1"
801
  },
802
  {
803
  "id": "single_episode_task_model_radar_chart",
 
818
  "surface": "website_hf",
819
  "shows": "Separates the selected 128-episode methods: raw-feature simple/NN as complete 20/20 scored polygons and metadata/Qwen/Cosmos as task-aligned overlays.",
820
  "exists": true,
821
+ "bytes": 45937,
822
+ "sha256": "b504b1b9c5cad0caa8c822d5bb2971c1b708251cf7b9ef587a92db2c12751e97"
823
  },
824
  {
825
  "id": "unified_task_model_radar_builder",
 
906
  "surface": "repo_hf",
907
  "shows": "Rerun of JSONL metadata/text simple and neural baselines over the selected 128-episode multiscale dataset; supports radar overlays on JSONL-supported task axes.",
908
  "exists": true,
909
+ "bytes": 109248,
910
+ "sha256": "5e7f3085be5012eb3dda46f9c7b5b7c0ae22d6a0fbce71d6e99dd317fecc12af"
911
  },
912
  {
913
  "id": "a100_128_raw20_task_baselines",
 
1105
  "shows": "Machine-readable release-check summary for validators, mirrors, and public project surfaces.",
1106
  "exists": true,
1107
  "bytes": 8100,
1108
+ "sha256": "7800195093b8b81b49c87cdcbcebe601de8141c0c9d8b4490b98f539cb132725"
1109
  },
1110
  {
1111
  "id": "public_surface_qa",
 
1310
  "volatile": true,
1311
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
1312
  "exists": true,
1313
+ "bytes": 994053,
1314
  "hash_policy": "existence_and_size_only"
1315
  },
1316
  {
 
1322
  "volatile": true,
1323
  "shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
1324
  "exists": true,
1325
+ "bytes": 20021,
1326
  "hash_policy": "existence_and_size_only"
1327
  },
1328
  {
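The index entries in this diff carry `exists`, `bytes`, and either a `sha256` digest or, on entries marked `volatile`, a `hash_policy` of `existence_and_size_only`. The sketch below shows one way such an entry could be checked against a file on disk. The function name `check_entry`, the problem strings, and the reading of `existence_and_size_only` as "compare size but skip the digest" are assumptions made for illustration; this is not the project's own validator.

```python
import hashlib
import os
import tempfile


def check_entry(entry, path):
    """Return a list of problems for one artifact-index entry (empty list = pass)."""
    if not os.path.isfile(path):
        # A missing file is only a problem when the index claims it exists.
        return ["missing file"] if entry.get("exists") else []
    problems = []
    if not entry.get("exists"):
        problems.append("index says exists=false but file is present")
    if "bytes" in entry and os.path.getsize(path) != entry["bytes"]:
        problems.append("size mismatch")
    # Assumed meaning of the policy: volatile files are not hash-pinned.
    if entry.get("hash_policy") == "existence_and_size_only":
        return problems
    if "sha256" in entry:
        with open(path, "rb") as fh:
            digest = hashlib.sha256(fh.read()).hexdigest()
        if digest != entry["sha256"]:
            problems.append("sha256 mismatch")
    return problems


# Demo on a throwaway file; the payload and entries are hypothetical.
payload = b'{"status": "pass"}\n'
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "quality_gates.json")
    with open(path, "wb") as fh:
        fh.write(payload)
    good = {"exists": True, "bytes": len(payload),
            "sha256": hashlib.sha256(payload).hexdigest()}
    stale = dict(good, sha256="0" * 64)
    volatile = {"exists": True, "bytes": len(payload),
                "hash_policy": "existence_and_size_only"}
    results = {name: check_entry(entry, path)
               for name, entry in [("good", good), ("stale", stale), ("volatile", volatile)]}
print(results)
```

A stale digest of the kind this commit replaces would surface as a `sha256 mismatch`, while a volatile entry is only held to its recorded size.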
metrics/episode128_task_model_radar.json CHANGED
@@ -1,12 +1,12 @@
1
  {
2
  "title": "128-Episode 20-Task Radar",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:15:02+00:00",
5
  "description": "Selected 128-episode metadata/raw baselines plus verified Qwen3/Cosmos branches. Every method has 20 records; numeric scores appear only where the public artifact produced that task target.",
6
  "task_count": 20,
7
  "method_count": 7,
8
  "method_task_record_count": 140,
9
- "scored_method_task_count": 83,
10
  "normalization_policy": {
11
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
12
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
@@ -30,18 +30,17 @@
30
  "method_detail": "128-episode JSONL metadata/text simple baselines.",
31
  "plotted_as": "colored point overlay",
32
  "result_record_count": 20,
33
- "scored_task_count": 8,
34
- "covered_task_count": 8,
35
  "proxy_scored_task_count": 0,
36
- "scoreless_task_count": 12,
37
- "unsupported_task_count": 12,
38
  "not_evaluated_task_count": 0,
39
  "status_counts": {
40
- "not_supported_by_metadata_only_package": 8,
41
- "scored": 8,
42
- "unsupported_without_required_target": 4
43
  },
44
- "coverage_fraction": 0.4,
45
  "result_record_fraction": 1.0
46
  },
47
  {
@@ -55,17 +54,17 @@
55
  "method_detail": "128-episode JSONL metadata/text MLP baselines.",
56
  "plotted_as": "colored point overlay",
57
  "result_record_count": 20,
58
- "scored_task_count": 8,
59
- "covered_task_count": 8,
60
  "proxy_scored_task_count": 0,
61
- "scoreless_task_count": 12,
62
- "unsupported_task_count": 12,
63
  "not_evaluated_task_count": 0,
64
  "status_counts": {
65
- "not_supported_by_metadata_only_package": 12,
66
- "scored": 8
67
  },
68
- "coverage_fraction": 0.4,
69
  "result_record_fraction": 1.0
70
  },
71
  {
@@ -1295,26 +1294,26 @@
1295
  "raw128_proxy_axis": false,
1296
  "values": {
1297
  "metadata128_simple": {
1298
- "raw": null,
1299
  "metric_key": "macro_f1",
1300
- "source": null,
1301
  "scope": "multi_episode_128_metadata_baseline",
1302
- "status": "not_supported_by_metadata_only_package",
1303
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1304
- "normalized_score": null,
1305
- "raw_text": "n/a",
1306
- "status_label": "not supported"
1307
  },
1308
  "metadata128_neural_mlp": {
1309
- "raw": null,
1310
  "metric_key": "macro_f1",
1311
- "source": null,
1312
  "scope": "multi_episode_128_metadata_baseline",
1313
- "status": "not_supported_by_metadata_only_package",
1314
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1315
- "normalized_score": null,
1316
- "raw_text": "n/a",
1317
- "status_label": "not supported"
1318
  },
1319
  "raw128_simple": {
1320
  "raw": 0.0024280172369056294,
@@ -1386,26 +1385,26 @@
1386
  "raw128_proxy_axis": false,
1387
  "values": {
1388
  "metadata128_simple": {
1389
- "raw": null,
1390
  "metric_key": "macro_f1",
1391
- "source": null,
1392
  "scope": "multi_episode_128_metadata_baseline",
1393
- "status": "not_supported_by_metadata_only_package",
1394
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1395
- "normalized_score": null,
1396
- "raw_text": "n/a",
1397
- "status_label": "not supported"
1398
  },
1399
  "metadata128_neural_mlp": {
1400
- "raw": null,
1401
  "metric_key": "macro_f1",
1402
- "source": null,
1403
  "scope": "multi_episode_128_metadata_baseline",
1404
- "status": "not_supported_by_metadata_only_package",
1405
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1406
- "normalized_score": null,
1407
- "raw_text": "n/a",
1408
- "status_label": "not supported"
1409
  },
1410
  "raw128_simple": {
1411
  "raw": 0.0,
@@ -1479,13 +1478,13 @@
1479
  "metadata128_simple": {
1480
  "raw": null,
1481
  "metric_key": "macro_f1",
1482
- "source": null,
1483
  "scope": "multi_episode_128_metadata_baseline",
1484
- "status": "not_supported_by_metadata_only_package",
1485
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1486
  "normalized_score": null,
1487
  "raw_text": "n/a",
1488
- "status_label": "not supported"
1489
  },
1490
  "metadata128_neural_mlp": {
1491
  "raw": null,
@@ -1568,26 +1567,26 @@
1568
  "raw128_proxy_axis": false,
1569
  "values": {
1570
  "metadata128_simple": {
1571
- "raw": null,
1572
  "metric_key": "macro_f1",
1573
- "source": null,
1574
  "scope": "multi_episode_128_metadata_baseline",
1575
- "status": "not_supported_by_metadata_only_package",
1576
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1577
- "normalized_score": null,
1578
- "raw_text": "n/a",
1579
- "status_label": "not supported"
1580
  },
1581
  "metadata128_neural_mlp": {
1582
- "raw": null,
1583
  "metric_key": "macro_f1",
1584
- "source": null,
1585
  "scope": "multi_episode_128_metadata_baseline",
1586
- "status": "not_supported_by_metadata_only_package",
1587
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1588
- "normalized_score": null,
1589
- "raw_text": "n/a",
1590
- "status_label": "not supported"
1591
  },
1592
  "raw128_simple": {
1593
  "raw": 0.0,
@@ -1659,26 +1658,26 @@
1659
  "raw128_proxy_axis": false,
1660
  "values": {
1661
  "metadata128_simple": {
1662
- "raw": null,
1663
  "metric_key": "micro_f1",
1664
- "source": null,
1665
  "scope": "multi_episode_128_metadata_baseline",
1666
- "status": "not_supported_by_metadata_only_package",
1667
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1668
- "normalized_score": null,
1669
- "raw_text": "n/a",
1670
- "status_label": "not supported"
1671
  },
1672
  "metadata128_neural_mlp": {
1673
- "raw": null,
1674
  "metric_key": "micro_f1",
1675
- "source": null,
1676
  "scope": "multi_episode_128_metadata_baseline",
1677
- "status": "not_supported_by_metadata_only_package",
1678
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1679
- "normalized_score": null,
1680
- "raw_text": "n/a",
1681
- "status_label": "not supported"
1682
  },
1683
  "raw128_simple": {
1684
  "raw": 0.06469493412657774,
@@ -1752,13 +1751,13 @@
1752
  "metadata128_simple": {
1753
  "raw": null,
1754
  "metric_key": "mae",
1755
- "source": null,
1756
  "scope": "multi_episode_128_metadata_baseline",
1757
- "status": "not_supported_by_metadata_only_package",
1758
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1759
  "normalized_score": null,
1760
  "raw_text": "n/a",
1761
- "status_label": "not supported"
1762
  },
1763
  "metadata128_neural_mlp": {
1764
  "raw": null,
@@ -1843,13 +1842,13 @@
1843
  "metadata128_simple": {
1844
  "raw": null,
1845
  "metric_key": "mrr",
1846
- "source": null,
1847
  "scope": "multi_episode_128_metadata_baseline",
1848
- "status": "not_supported_by_metadata_only_package",
1849
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1850
  "normalized_score": null,
1851
  "raw_text": "n/a",
1852
- "status_label": "not supported"
1853
  },
1854
  "metadata128_neural_mlp": {
1855
  "raw": null,
@@ -1932,26 +1931,26 @@
1932
  "raw128_proxy_axis": false,
1933
  "values": {
1934
  "metadata128_simple": {
1935
- "raw": null,
1936
  "metric_key": "mae",
1937
- "source": null,
1938
  "scope": "multi_episode_128_metadata_baseline",
1939
- "status": "not_supported_by_metadata_only_package",
1940
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1941
- "normalized_score": null,
1942
- "raw_text": "n/a",
1943
- "status_label": "not supported"
1944
  },
1945
  "metadata128_neural_mlp": {
1946
- "raw": null,
1947
  "metric_key": "mae",
1948
- "source": null,
1949
  "scope": "multi_episode_128_metadata_baseline",
1950
- "status": "not_supported_by_metadata_only_package",
1951
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1952
- "normalized_score": null,
1953
- "raw_text": "n/a",
1954
- "status_label": "not supported"
1955
  },
1956
  "raw128_simple": {
1957
  "raw": 52.32759475708008,
@@ -3530,17 +3529,17 @@
3530
  "task_label": "Long-Horizon Next-Action Forecasting",
3531
  "series_id": "metadata128_simple",
3532
  "method": "128ep Metadata Simple",
3533
- "status": "not_supported_by_metadata_only_package",
3534
- "status_label": "not supported",
3535
- "scored": false,
3536
  "proxy_scored": false,
3537
- "raw": null,
3538
- "raw_text": "n/a",
3539
- "normalized_score": null,
3540
  "metric_key": "macro_f1",
3541
- "source": null,
3542
  "scope": "multi_episode_128_metadata_baseline",
3543
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3544
  },
3545
  {
3546
  "task_number": 13,
@@ -3548,17 +3547,17 @@
3548
  "task_label": "Long-Horizon Next-Action Forecasting",
3549
  "series_id": "metadata128_neural_mlp",
3550
  "method": "128ep Metadata NN",
3551
- "status": "not_supported_by_metadata_only_package",
3552
- "status_label": "not supported",
3553
- "scored": false,
3554
  "proxy_scored": false,
3555
- "raw": null,
3556
- "raw_text": "n/a",
3557
- "normalized_score": null,
3558
  "metric_key": "macro_f1",
3559
- "source": null,
3560
  "scope": "multi_episode_128_metadata_baseline",
3561
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3562
  },
3563
  {
3564
  "task_number": 13,
@@ -3656,17 +3655,17 @@
3656
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3657
  "series_id": "metadata128_simple",
3658
  "method": "128ep Metadata Simple",
3659
- "status": "not_supported_by_metadata_only_package",
3660
- "status_label": "not supported",
3661
- "scored": false,
3662
  "proxy_scored": false,
3663
- "raw": null,
3664
- "raw_text": "n/a",
3665
- "normalized_score": null,
3666
  "metric_key": "macro_f1",
3667
- "source": null,
3668
  "scope": "multi_episode_128_metadata_baseline",
3669
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3670
  },
3671
  {
3672
  "task_number": 14,
@@ -3674,17 +3673,17 @@
3674
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3675
  "series_id": "metadata128_neural_mlp",
3676
  "method": "128ep Metadata NN",
3677
- "status": "not_supported_by_metadata_only_package",
3678
- "status_label": "not supported",
3679
- "scored": false,
3680
  "proxy_scored": false,
3681
- "raw": null,
3682
- "raw_text": "n/a",
3683
- "normalized_score": null,
3684
  "metric_key": "macro_f1",
3685
- "source": null,
3686
  "scope": "multi_episode_128_metadata_baseline",
3687
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3688
  },
3689
  {
3690
  "task_number": 14,
@@ -3782,17 +3781,17 @@
3782
  "task_label": "Interaction Text Prediction",
3783
  "series_id": "metadata128_simple",
3784
  "method": "128ep Metadata Simple",
3785
- "status": "not_supported_by_metadata_only_package",
3786
- "status_label": "not supported",
3787
  "scored": false,
3788
  "proxy_scored": false,
3789
  "raw": null,
3790
  "raw_text": "n/a",
3791
  "normalized_score": null,
3792
  "metric_key": "macro_f1",
3793
- "source": null,
3794
  "scope": "multi_episode_128_metadata_baseline",
3795
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3796
  },
3797
  {
3798
  "task_number": 15,
@@ -3908,17 +3907,17 @@
3908
  "task_label": "Action-Object Relation Prediction",
3909
  "series_id": "metadata128_simple",
3910
  "method": "128ep Metadata Simple",
3911
- "status": "not_supported_by_metadata_only_package",
3912
- "status_label": "not supported",
3913
- "scored": false,
3914
  "proxy_scored": false,
3915
- "raw": null,
3916
- "raw_text": "n/a",
3917
- "normalized_score": null,
3918
  "metric_key": "macro_f1",
3919
- "source": null,
3920
  "scope": "multi_episode_128_metadata_baseline",
3921
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3922
  },
3923
  {
3924
  "task_number": 16,
@@ -3926,17 +3925,17 @@
3926
  "task_label": "Action-Object Relation Prediction",
3927
  "series_id": "metadata128_neural_mlp",
3928
  "method": "128ep Metadata NN",
3929
- "status": "not_supported_by_metadata_only_package",
3930
- "status_label": "not supported",
3931
- "scored": false,
3932
  "proxy_scored": false,
3933
- "raw": null,
3934
- "raw_text": "n/a",
3935
- "normalized_score": null,
3936
  "metric_key": "macro_f1",
3937
- "source": null,
3938
  "scope": "multi_episode_128_metadata_baseline",
3939
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3940
  },
3941
  {
3942
  "task_number": 16,
@@ -4034,17 +4033,17 @@
4034
  "task_label": "Future Object-Set Forecasting",
4035
  "series_id": "metadata128_simple",
4036
  "method": "128ep Metadata Simple",
4037
- "status": "not_supported_by_metadata_only_package",
4038
- "status_label": "not supported",
4039
- "scored": false,
4040
  "proxy_scored": false,
4041
- "raw": null,
4042
- "raw_text": "n/a",
4043
- "normalized_score": null,
4044
  "metric_key": "micro_f1",
4045
- "source": null,
4046
  "scope": "multi_episode_128_metadata_baseline",
4047
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4048
  },
4049
  {
4050
  "task_number": 17,
@@ -4052,17 +4051,17 @@
4052
  "task_label": "Future Object-Set Forecasting",
4053
  "series_id": "metadata128_neural_mlp",
4054
  "method": "128ep Metadata NN",
4055
- "status": "not_supported_by_metadata_only_package",
4056
- "status_label": "not supported",
4057
- "scored": false,
4058
  "proxy_scored": false,
4059
- "raw": null,
4060
- "raw_text": "n/a",
4061
- "normalized_score": null,
4062
  "metric_key": "micro_f1",
4063
- "source": null,
4064
  "scope": "multi_episode_128_metadata_baseline",
4065
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4066
  },
4067
  {
4068
  "task_number": 17,
@@ -4160,17 +4159,17 @@
4160
  "task_label": "IMU-to-Hand Pose Reconstruction",
4161
  "series_id": "metadata128_simple",
4162
  "method": "128ep Metadata Simple",
4163
- "status": "not_supported_by_metadata_only_package",
4164
- "status_label": "not supported",
4165
  "scored": false,
4166
  "proxy_scored": false,
4167
  "raw": null,
4168
  "raw_text": "n/a",
4169
  "normalized_score": null,
4170
  "metric_key": "mae",
4171
- "source": null,
4172
  "scope": "multi_episode_128_metadata_baseline",
4173
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4174
  },
4175
  {
4176
  "task_number": 18,
@@ -4286,17 +4285,17 @@
4286
  "task_label": "Camera-View Synchronization Retrieval",
4287
  "series_id": "metadata128_simple",
4288
  "method": "128ep Metadata Simple",
4289
- "status": "not_supported_by_metadata_only_package",
4290
- "status_label": "not supported",
4291
  "scored": false,
4292
  "proxy_scored": false,
4293
  "raw": null,
4294
  "raw_text": "n/a",
4295
  "normalized_score": null,
4296
  "metric_key": "mrr",
4297
- "source": null,
4298
  "scope": "multi_episode_128_metadata_baseline",
4299
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4300
  },
4301
  {
4302
  "task_number": 19,
@@ -4412,17 +4411,17 @@
4412
  "task_label": "Time-to-Next-Transition Regression",
4413
  "series_id": "metadata128_simple",
4414
  "method": "128ep Metadata Simple",
4415
- "status": "not_supported_by_metadata_only_package",
4416
- "status_label": "not supported",
4417
- "scored": false,
4418
  "proxy_scored": false,
4419
- "raw": null,
4420
- "raw_text": "n/a",
4421
- "normalized_score": null,
4422
  "metric_key": "mae",
4423
- "source": null,
4424
  "scope": "multi_episode_128_metadata_baseline",
4425
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4426
  },
4427
  {
4428
  "task_number": 20,
@@ -4430,17 +4429,17 @@
4430
  "task_label": "Time-to-Next-Transition Regression",
4431
  "series_id": "metadata128_neural_mlp",
4432
  "method": "128ep Metadata NN",
4433
- "status": "not_supported_by_metadata_only_package",
4434
- "status_label": "not supported",
4435
- "scored": false,
4436
  "proxy_scored": false,
4437
- "raw": null,
4438
- "raw_text": "n/a",
4439
- "normalized_score": null,
4440
  "metric_key": "mae",
4441
- "source": null,
4442
  "scope": "multi_episode_128_metadata_baseline",
4443
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4444
  },
4445
  {
4446
  "task_number": 20,
 
1
  {
2
  "title": "128-Episode 20-Task Radar",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "description": "Selected 128-episode metadata/raw baselines plus verified Qwen3/Cosmos branches. Every method has 20 records; numeric scores appear only where the public artifact produced that task target.",
6
  "task_count": 20,
7
  "method_count": 7,
8
  "method_task_record_count": 140,
9
+ "scored_method_task_count": 93,
10
  "normalization_policy": {
11
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
12
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
 
30
  "method_detail": "128-episode JSONL metadata/text simple baselines.",
31
  "plotted_as": "colored point overlay",
32
  "result_record_count": 20,
33
+ "scored_task_count": 13,
34
+ "covered_task_count": 13,
35
  "proxy_scored_task_count": 0,
36
+ "scoreless_task_count": 7,
37
+ "unsupported_task_count": 7,
38
  "not_evaluated_task_count": 0,
39
  "status_counts": {
40
+ "scored": 13,
41
+ "unsupported_without_required_target": 7
42
  },
43
+ "coverage_fraction": 0.65,
44
  "result_record_fraction": 1.0
45
  },
46
  {
 
54
  "method_detail": "128-episode JSONL metadata/text MLP baselines.",
55
  "plotted_as": "colored point overlay",
56
  "result_record_count": 20,
57
+ "scored_task_count": 13,
58
+ "covered_task_count": 13,
59
  "proxy_scored_task_count": 0,
60
+ "scoreless_task_count": 7,
61
+ "unsupported_task_count": 7,
62
  "not_evaluated_task_count": 0,
63
  "status_counts": {
64
+ "not_supported_by_metadata_only_package": 7,
65
+ "scored": 13
66
  },
67
+ "coverage_fraction": 0.65,
68
  "result_record_fraction": 1.0
69
  },
70
  {
 
1294
  "raw128_proxy_axis": false,
1295
  "values": {
1296
  "metadata128_simple": {
1297
+ "raw": 0.004579592783699693,
1298
  "metric_key": "macro_f1",
1299
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
1300
  "scope": "multi_episode_128_metadata_baseline",
1301
+ "status": "scored",
1302
+ "reason": null,
1303
+ "normalized_score": 0.004579592783699693,
1304
+ "raw_text": "0.0046",
1305
+ "status_label": "scored"
1306
  },
1307
  "metadata128_neural_mlp": {
1308
+ "raw": 0.0029821307969142615,
1309
  "metric_key": "macro_f1",
1310
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
1311
  "scope": "multi_episode_128_metadata_baseline",
1312
+ "status": "scored",
1313
+ "reason": null,
1314
+ "normalized_score": 0.0029821307969142615,
1315
+ "raw_text": "0.0030",
1316
+ "status_label": "scored"
1317
  },
1318
  "raw128_simple": {
1319
  "raw": 0.0024280172369056294,
 
1385
  "raw128_proxy_axis": false,
1386
  "values": {
1387
  "metadata128_simple": {
1388
+ "raw": 0.0001206030150753769,
1389
  "metric_key": "macro_f1",
1390
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
1391
  "scope": "multi_episode_128_metadata_baseline",
1392
+ "status": "scored",
1393
+ "reason": null,
1394
+ "normalized_score": 0.0001206030150753769,
1395
+ "raw_text": "0.0001",
1396
+ "status_label": "scored"
1397
  },
1398
  "metadata128_neural_mlp": {
1399
+ "raw": 2.086049543676662e-05,
1400
  "metric_key": "macro_f1",
1401
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
1402
  "scope": "multi_episode_128_metadata_baseline",
1403
+ "status": "scored",
1404
+ "reason": null,
1405
+ "normalized_score": 2.086049543676662e-05,
1406
+ "raw_text": "0.0000",
1407
+ "status_label": "scored"
1408
  },
1409
  "raw128_simple": {
1410
  "raw": 0.0,
 
1478
  "metadata128_simple": {
1479
  "raw": null,
1480
  "metric_key": "macro_f1",
1481
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
1482
  "scope": "multi_episode_128_metadata_baseline",
1483
+ "status": "unsupported_without_required_target",
1484
+ "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
1485
  "normalized_score": null,
1486
  "raw_text": "n/a",
1487
+ "status_label": "unsupported"
1488
  },
1489
  "metadata128_neural_mlp": {
1490
  "raw": null,
 
1567
  "raw128_proxy_axis": false,
1568
  "values": {
1569
  "metadata128_simple": {
1570
+ "raw": 0.0,
1571
  "metric_key": "macro_f1",
1572
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
1573
  "scope": "multi_episode_128_metadata_baseline",
1574
+ "status": "scored",
1575
+ "reason": null,
1576
+ "normalized_score": 0.0,
1577
+ "raw_text": "0.0000",
1578
+ "status_label": "scored"
1579
  },
1580
  "metadata128_neural_mlp": {
1581
+ "raw": 0.0,
1582
  "metric_key": "macro_f1",
1583
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
1584
  "scope": "multi_episode_128_metadata_baseline",
1585
+ "status": "scored",
1586
+ "reason": null,
1587
+ "normalized_score": 0.0,
1588
+ "raw_text": "0.0000",
1589
+ "status_label": "scored"
1590
  },
1591
  "raw128_simple": {
1592
  "raw": 0.0,
 
1658
  "raw128_proxy_axis": false,
1659
  "values": {
1660
  "metadata128_simple": {
1661
+ "raw": 0.17656983343047333,
1662
  "metric_key": "micro_f1",
1663
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
1664
  "scope": "multi_episode_128_metadata_baseline",
1665
+ "status": "scored",
1666
+ "reason": null,
1667
+ "normalized_score": 0.17656983343047333,
1668
+ "raw_text": "0.1766",
1669
+ "status_label": "scored"
1670
  },
1671
  "metadata128_neural_mlp": {
1672
+ "raw": 0.17418550827844048,
1673
  "metric_key": "micro_f1",
1674
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
1675
  "scope": "multi_episode_128_metadata_baseline",
1676
+ "status": "scored",
1677
+ "reason": null,
1678
+ "normalized_score": 0.17418550827844048,
1679
+ "raw_text": "0.1742",
1680
+ "status_label": "scored"
1681
  },
1682
  "raw128_simple": {
1683
  "raw": 0.06469493412657774,
 
1751
  "metadata128_simple": {
1752
  "raw": null,
1753
  "metric_key": "mae",
1754
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
1755
  "scope": "multi_episode_128_metadata_baseline",
1756
+ "status": "unsupported_without_required_target",
1757
+ "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package",
1758
  "normalized_score": null,
1759
  "raw_text": "n/a",
1760
+ "status_label": "unsupported"
1761
  },
1762
  "metadata128_neural_mlp": {
1763
  "raw": null,
 
1842
  "metadata128_simple": {
1843
  "raw": null,
1844
  "metric_key": "mrr",
1845
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
1846
  "scope": "multi_episode_128_metadata_baseline",
1847
+ "status": "unsupported_without_required_target",
1848
+ "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
1849
  "normalized_score": null,
1850
  "raw_text": "n/a",
1851
+ "status_label": "unsupported"
1852
  },
1853
  "metadata128_neural_mlp": {
1854
  "raw": null,
 
1931
  "raw128_proxy_axis": false,
1932
  "values": {
1933
  "metadata128_simple": {
1934
+ "raw": 624.8108520507812,
1935
  "metric_key": "mae",
1936
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
1937
  "scope": "multi_episode_128_metadata_baseline",
1938
+ "status": "scored",
1939
+ "reason": null,
1940
+ "normalized_score": 0.016864874132806403,
1941
+ "raw_text": "624.81",
1942
+ "status_label": "scored"
1943
  },
1944
  "metadata128_neural_mlp": {
1945
+ "raw": 41.4664421081543,
1946
  "metric_key": "mae",
1947
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
1948
  "scope": "multi_episode_128_metadata_baseline",
1949
+ "status": "scored",
1950
+ "reason": null,
1951
+ "normalized_score": 0.25411768748242325,
1952
+ "raw_text": "41.47",
1953
+ "status_label": "scored"
1954
  },
1955
  "raw128_simple": {
1956
  "raw": 52.32759475708008,
 
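The `normalization_policy` in this file says bounded higher-is-better metrics are clipped to [0, 1] and lower-error metrics become `best_observed_value / raw_value` within the same task. The sketch below applies that rule to the time-to-transition MAE values shown above. The task's `best_observed_value` is not visible in this excerpt, so it is back-computed here from the neural MLP pair. The function `normalize` and its handling of a zero error are assumptions, not the project's builder code.

```python
def normalize(raw, higher_is_better, best_observed=None):
    """Map a raw metric onto a 0-1 radar axis; None stays None (scoreless cell)."""
    if raw is None:
        return None
    if higher_is_better:
        # Bounded metrics such as F1 or MRR are plotted directly after clipping.
        return min(1.0, max(0.0, raw))
    if raw == 0:
        # Assumption: a zero error cannot be beaten, so it maps to the axis maximum.
        return 1.0
    # Lower-error metrics: ratio of the best error seen for this task to this error.
    return min(1.0, max(0.0, best_observed / raw))


# Published (raw, normalized_score) pairs for time_to_transition, copied from the diff.
mlp_raw, mlp_norm = 41.4664421081543, 0.25411768748242325
simple_raw, simple_norm = 624.8108520507812, 0.016864874132806403

# Back-compute the task's best observed MAE from one pair (about 10.5374).
best_mae = mlp_norm * mlp_raw

# The other pair should then be reproduced by the same rule.
simple_check = normalize(simple_raw, higher_is_better=False, best_observed=best_mae)
print(best_mae, simple_check, simple_norm)

# Higher-is-better example: the object-set micro-F1 passes through unchanged.
f1_check = normalize(0.17656983343047333, higher_is_better=True)
print(f1_check)
```

Both metadata baselines reproduce under the same back-computed best value, which is what the ratio rule predicts when that best value belongs to another method on the same task.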
3529
  "task_label": "Long-Horizon Next-Action Forecasting",
3530
  "series_id": "metadata128_simple",
3531
  "method": "128ep Metadata Simple",
3532
+ "status": "scored",
3533
+ "status_label": "scored",
3534
+ "scored": true,
3535
  "proxy_scored": false,
3536
+ "raw": 0.004579592783699693,
3537
+ "raw_text": "0.0046",
3538
+ "normalized_score": 0.004579592783699693,
3539
  "metric_key": "macro_f1",
3540
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
3541
  "scope": "multi_episode_128_metadata_baseline",
3542
+ "reason": null
3543
  },
3544
  {
3545
  "task_number": 13,
 
3547
  "task_label": "Long-Horizon Next-Action Forecasting",
3548
  "series_id": "metadata128_neural_mlp",
3549
  "method": "128ep Metadata NN",
3550
+ "status": "scored",
3551
+ "status_label": "scored",
3552
+ "scored": true,
3553
  "proxy_scored": false,
3554
+ "raw": 0.0029821307969142615,
3555
+ "raw_text": "0.0030",
3556
+ "normalized_score": 0.0029821307969142615,
3557
  "metric_key": "macro_f1",
3558
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
3559
  "scope": "multi_episode_128_metadata_baseline",
3560
+ "reason": null
3561
  },
3562
  {
3563
  "task_number": 13,
 
3655
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3656
  "series_id": "metadata128_simple",
3657
  "method": "128ep Metadata Simple",
3658
+ "status": "scored",
3659
+ "status_label": "scored",
3660
+ "scored": true,
3661
  "proxy_scored": false,
3662
+ "raw": 0.0001206030150753769,
3663
+ "raw_text": "0.0001",
3664
+ "normalized_score": 0.0001206030150753769,
3665
  "metric_key": "macro_f1",
3666
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
3667
  "scope": "multi_episode_128_metadata_baseline",
3668
+ "reason": null
3669
  },
3670
  {
3671
  "task_number": 14,
 
3673
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3674
  "series_id": "metadata128_neural_mlp",
3675
  "method": "128ep Metadata NN",
3676
+ "status": "scored",
3677
+ "status_label": "scored",
3678
+ "scored": true,
3679
  "proxy_scored": false,
3680
+ "raw": 2.086049543676662e-05,
3681
+ "raw_text": "0.0000",
3682
+ "normalized_score": 2.086049543676662e-05,
3683
  "metric_key": "macro_f1",
3684
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
3685
  "scope": "multi_episode_128_metadata_baseline",
3686
+ "reason": null
3687
  },
3688
  {
3689
  "task_number": 14,
 
3781
  "task_label": "Interaction Text Prediction",
3782
  "series_id": "metadata128_simple",
3783
  "method": "128ep Metadata Simple",
3784
+ "status": "unsupported_without_required_target",
3785
+ "status_label": "unsupported",
3786
  "scored": false,
3787
  "proxy_scored": false,
3788
  "raw": null,
3789
  "raw_text": "n/a",
3790
  "normalized_score": null,
3791
  "metric_key": "macro_f1",
3792
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
3793
  "scope": "multi_episode_128_metadata_baseline",
3794
+ "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
3795
  },
3796
  {
3797
  "task_number": 15,
 
3907
  "task_label": "Action-Object Relation Prediction",
3908
  "series_id": "metadata128_simple",
3909
  "method": "128ep Metadata Simple",
3910
+ "status": "scored",
3911
+ "status_label": "scored",
3912
+ "scored": true,
3913
  "proxy_scored": false,
3914
+ "raw": 0.0,
3915
+ "raw_text": "0.0000",
3916
+ "normalized_score": 0.0,
3917
  "metric_key": "macro_f1",
3918
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
3919
  "scope": "multi_episode_128_metadata_baseline",
3920
+ "reason": null
3921
  },
3922
  {
3923
  "task_number": 16,
 
3925
  "task_label": "Action-Object Relation Prediction",
3926
  "series_id": "metadata128_neural_mlp",
3927
  "method": "128ep Metadata NN",
3928
+ "status": "scored",
3929
+ "status_label": "scored",
3930
+ "scored": true,
3931
  "proxy_scored": false,
3932
+ "raw": 0.0,
3933
+ "raw_text": "0.0000",
3934
+ "normalized_score": 0.0,
3935
  "metric_key": "macro_f1",
3936
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
3937
  "scope": "multi_episode_128_metadata_baseline",
3938
+ "reason": null
3939
  },
3940
  {
3941
  "task_number": 16,
 
4033
  "task_label": "Future Object-Set Forecasting",
4034
  "series_id": "metadata128_simple",
4035
  "method": "128ep Metadata Simple",
4036
+ "status": "scored",
4037
+ "status_label": "scored",
4038
+ "scored": true,
4039
  "proxy_scored": false,
4040
+ "raw": 0.17656983343047333,
4041
+ "raw_text": "0.1766",
4042
+ "normalized_score": 0.17656983343047333,
4043
  "metric_key": "micro_f1",
4044
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
4045
  "scope": "multi_episode_128_metadata_baseline",
4046
+ "reason": null
4047
  },
4048
  {
4049
  "task_number": 17,
 
4051
  "task_label": "Future Object-Set Forecasting",
4052
  "series_id": "metadata128_neural_mlp",
4053
  "method": "128ep Metadata NN",
4054
+ "status": "scored",
4055
+ "status_label": "scored",
4056
+ "scored": true,
4057
  "proxy_scored": false,
4058
+ "raw": 0.17418550827844048,
4059
+ "raw_text": "0.1742",
4060
+ "normalized_score": 0.17418550827844048,
4061
  "metric_key": "micro_f1",
4062
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
4063
  "scope": "multi_episode_128_metadata_baseline",
4064
+ "reason": null
4065
  },
4066
  {
4067
  "task_number": 17,
 
4159
  "task_label": "IMU-to-Hand Pose Reconstruction",
4160
  "series_id": "metadata128_simple",
4161
  "method": "128ep Metadata Simple",
4162
+ "status": "unsupported_without_required_target",
4163
+ "status_label": "unsupported",
4164
  "scored": false,
4165
  "proxy_scored": false,
4166
  "raw": null,
4167
  "raw_text": "n/a",
4168
  "normalized_score": null,
4169
  "metric_key": "mae",
4170
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
4171
  "scope": "multi_episode_128_metadata_baseline",
4172
+ "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package"
4173
  },
4174
  {
4175
  "task_number": 18,
 
4285
  "task_label": "Camera-View Synchronization Retrieval",
4286
  "series_id": "metadata128_simple",
4287
  "method": "128ep Metadata Simple",
4288
+ "status": "unsupported_without_required_target",
4289
+ "status_label": "unsupported",
4290
  "scored": false,
4291
  "proxy_scored": false,
4292
  "raw": null,
4293
  "raw_text": "n/a",
4294
  "normalized_score": null,
4295
  "metric_key": "mrr",
4296
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
4297
  "scope": "multi_episode_128_metadata_baseline",
4298
+ "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
4299
  },
4300
  {
4301
  "task_number": 19,
 
4411
  "task_label": "Time-to-Next-Transition Regression",
4412
  "series_id": "metadata128_simple",
4413
  "method": "128ep Metadata Simple",
4414
+ "status": "scored",
4415
+ "status_label": "scored",
4416
+ "scored": true,
4417
  "proxy_scored": false,
4418
+ "raw": 624.8108520507812,
4419
+ "raw_text": "624.81",
4420
+ "normalized_score": 0.016864874132806403,
4421
  "metric_key": "mae",
4422
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
4423
  "scope": "multi_episode_128_metadata_baseline",
4424
+ "reason": null
4425
  },
4426
  {
4427
  "task_number": 20,
 
4429
  "task_label": "Time-to-Next-Transition Regression",
4430
  "series_id": "metadata128_neural_mlp",
4431
  "method": "128ep Metadata NN",
4432
+ "status": "scored",
4433
+ "status_label": "scored",
4434
+ "scored": true,
4435
  "proxy_scored": false,
4436
+ "raw": 41.4664421081543,
4437
+ "raw_text": "41.47",
4438
+ "normalized_score": 0.25411768748242325,
4439
  "metric_key": "mae",
4440
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
4441
  "scope": "multi_episode_128_metadata_baseline",
4442
+ "reason": null
4443
  },
4444
  {
4445
  "task_number": 20,
metrics/mirror_parity.json CHANGED
The diff for this file is too large to render. See raw diff
 
metrics/public_surface_qa.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Public Project Surface",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:41:42+00:00",
5
  "scope": "Repo README, GitHub Pages HTML, Hugging Face Space card, artifact dataset card, and model card.",
6
  "checks": [
7
  {
@@ -18,7 +18,7 @@
18
  "website_integrity": {
19
  "exists": true,
20
  "status": "pass",
21
- "generated_at_utc": "2026-06-18T11:18:05+00:00"
22
  },
23
  "rendered_site_check": {
24
  "exists": true,
@@ -43,12 +43,12 @@
43
  "publication_package": {
44
  "exists": true,
45
  "status": "pass",
46
- "generated_at_utc": "2026-06-18T11:18:57+00:00"
47
  },
48
  "mirror_parity": {
49
  "exists": true,
50
  "status": "pass",
51
- "generated_at_utc": "2026-06-18T11:21:54+00:00"
52
  }
53
  },
54
  "failures": {}
 
1
  {
2
  "title": "Ropedia Xperience-10M Public Project Surface",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:09:24+00:00",
5
  "scope": "Repo README, GitHub Pages HTML, Hugging Face Space card, artifact dataset card, and model card.",
6
  "checks": [
7
  {
 
18
  "website_integrity": {
19
  "exists": true,
20
  "status": "pass",
21
+ "generated_at_utc": "2026-06-18T11:41:43+00:00"
22
  },
23
  "rendered_site_check": {
24
  "exists": true,
 
43
  "publication_package": {
44
  "exists": true,
45
  "status": "pass",
46
+ "generated_at_utc": "2026-06-18T11:42:48+00:00"
47
  },
48
  "mirror_parity": {
49
  "exists": true,
50
  "status": "pass",
51
+ "generated_at_utc": "2026-06-18T11:43:59+00:00"
52
  }
53
  },
54
  "failures": {}
metrics/publication_audit.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-18T11:42:48+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
@@ -215,8 +215,8 @@
215
  "github_repo": {
216
  "root": "repo",
217
  "exists": true,
218
- "file_count": 1276,
219
- "text_file_count": 1072,
220
  "largest_file": {
221
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
222
  "bytes": 55702978
@@ -226,8 +226,8 @@
226
  "hf_space_bundle": {
227
  "root": "hf_publish/space",
228
  "exists": true,
229
- "file_count": 1058,
230
- "text_file_count": 879,
231
  "largest_file": {
232
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
233
  "bytes": 135591061
@@ -237,8 +237,8 @@
237
  "hf_artifact_bundle": {
238
  "root": "hf_publish/artifacts",
239
  "exists": true,
240
- "file_count": 2537,
241
- "text_file_count": 1085,
242
  "largest_file": {
243
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
244
  "bytes": 135591061
@@ -248,8 +248,8 @@
248
  "hf_model_bundle": {
249
  "root": "hf_publish/model",
250
  "exists": true,
251
- "file_count": 2956,
252
- "text_file_count": 1247,
253
  "largest_file": {
254
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
255
  "bytes": 135591061
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-18T12:10:47+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
 
215
  "github_repo": {
216
  "root": "repo",
217
  "exists": true,
218
+ "file_count": 1321,
219
+ "text_file_count": 1108,
220
  "largest_file": {
221
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
222
  "bytes": 55702978
 
226
  "hf_space_bundle": {
227
  "root": "hf_publish/space",
228
  "exists": true,
229
+ "file_count": 1103,
230
+ "text_file_count": 915,
231
  "largest_file": {
232
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
233
  "bytes": 135591061
 
237
  "hf_artifact_bundle": {
238
  "root": "hf_publish/artifacts",
239
  "exists": true,
240
+ "file_count": 2582,
241
+ "text_file_count": 1121,
242
  "largest_file": {
243
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
244
  "bytes": 135591061
 
248
  "hf_model_bundle": {
249
  "root": "hf_publish/model",
250
  "exists": true,
251
+ "file_count": 3001,
252
+ "text_file_count": 1283,
253
  "largest_file": {
254
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
255
  "bytes": 135591061
metrics/quality_gates.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Release Checks",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:20:56+00:00",
5
  "rule": "A release is current when the automated reports pass and the live GitHub/Hugging Face mirrors are verified after publishing.",
6
  "automated_gates": [
7
  {
 
1
  {
2
  "title": "Ropedia Xperience-10M Release Checks",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:09:24+00:00",
5
  "rule": "A release is current when the automated reports pass and the live GitHub/Hugging Face mirrors are verified after publishing.",
6
  "automated_gates": [
7
  {
metrics/scope_claims_audit.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-18T11:18:06+00:00",
4
  "summary": {
5
  "qwen3_omni_verified_diagnostic_pilot": true,
6
  "dataset_manifest_num_episodes": 119,
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-18T12:09:48+00:00",
4
  "summary": {
5
  "qwen3_omni_verified_diagnostic_pilot": true,
6
  "dataset_manifest_num_episodes": 119,
metrics/single_episode_task_model_radar.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Single-Episode 20-Task Radar",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:15:02+00:00",
5
  "description": "Minimal and Neural MLP baselines on the one public sample episode, both scored on all 20 task contracts.",
6
  "task_count": 20,
7
  "method_count": 2,
 
1
  {
2
  "title": "Single-Episode 20-Task Radar",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "description": "Minimal and Neural MLP baselines on the one public sample episode, both scored on all 20 task contracts.",
6
  "task_count": 20,
7
  "method_count": 2,
metrics/source_alignment_audit.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Source Alignment Note",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:18:04+00:00",
5
  "alignment_json": "docs/data/xperience10m_dataset_card_alignment.json",
6
  "alignment_summary": {
7
  "full_dataset_repo": "ropedia-ai/xperience-10m",
 
1
  {
2
  "title": "Ropedia Xperience-10M Source Alignment Note",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:09:45+00:00",
5
  "alignment_json": "docs/data/xperience10m_dataset_card_alignment.json",
6
  "alignment_summary": {
7
  "full_dataset_repo": "ropedia-ai/xperience-10m",
metrics/task_method_20_gap_audit.json CHANGED
@@ -1,10 +1,10 @@
1
  {
2
- "generated_at_utc": "2026-06-18T11:15:34+00:00",
3
  "immediate_actions": [
4
  {
5
  "artifact": "docs/data/task_method_20_gap_audit.json",
6
  "id": "gap_audit",
7
- "purpose": "Keep the 57 scoreless cells visible and reproducible."
8
  },
9
  {
10
  "artifact": "scripts/omni/score_model_output_probes.py",
@@ -50,11 +50,12 @@
50
  "proxy_scored_task_count": 0,
51
  "result_record_count": 20,
52
  "scope": "128 selected episodes, JSONL metadata/text only",
53
- "scored_task_count": 8,
54
- "scoreless_task_count": 12,
55
  "status_counts": {
56
- "not_supported_by_metadata_only_package": 12,
57
- "scored": 8
 
58
  }
59
  },
60
  "metadata128_simple": {
@@ -63,12 +64,11 @@
63
  "proxy_scored_task_count": 0,
64
  "result_record_count": 20,
65
  "scope": "128 selected episodes, JSONL metadata/text only",
66
- "scored_task_count": 8,
67
- "scoreless_task_count": 12,
68
  "status_counts": {
69
- "not_supported_by_metadata_only_package": 8,
70
- "scored": 8,
71
- "unsupported_without_required_target": 4
72
  }
73
  },
74
  "minimal": {
@@ -138,18 +138,25 @@
138
  "missing_by_method": {
139
  "cosmos3_nano_future_window": 15,
140
  "cosmos3_super_reasoner": 13,
141
- "metadata128_neural_mlp": 12,
142
- "metadata128_simple": 12,
143
  "qwen3_omni_v6_lora": 5
144
  },
145
  "missing_by_status": {
146
  "not_evaluated_in_verified_package": 33,
147
- "not_supported_by_metadata_only_package": 20,
148
- "unsupported_without_required_target": 4
149
  },
150
  "missing_by_task": {
151
  "02 Procedure Step Recognition": [
152
- "cosmos3_nano_future_window"
153
  ],
154
  "05 Hand Trajectory Forecasting": [
155
  "cosmos3_nano_future_window",
@@ -190,14 +197,12 @@
190
  "13 Long-Horizon Next-Action Forecasting": [
191
  "cosmos3_nano_future_window",
192
  "cosmos3_super_reasoner",
193
- "metadata128_neural_mlp",
194
- "metadata128_simple"
195
  ],
196
  "14 Long-Horizon Next-Subtask Forecasting": [
197
  "cosmos3_nano_future_window",
198
  "cosmos3_super_reasoner",
199
- "metadata128_neural_mlp",
200
- "metadata128_simple"
201
  ],
202
  "15 Interaction Text Prediction": [
203
  "cosmos3_nano_future_window",
@@ -208,14 +213,11 @@
208
  ],
209
  "16 Action-Object Relation Prediction": [
210
  "cosmos3_nano_future_window",
211
- "metadata128_neural_mlp",
212
- "metadata128_simple"
213
  ],
214
  "17 Future Object-Set Forecasting": [
215
  "cosmos3_nano_future_window",
216
- "cosmos3_super_reasoner",
217
- "metadata128_neural_mlp",
218
- "metadata128_simple"
219
  ],
220
  "18 IMU-to-Hand Pose Reconstruction": [
221
  "cosmos3_nano_future_window",
@@ -233,12 +235,36 @@
233
  ],
234
  "20 Time-to-Next-Transition Regression": [
235
  "cosmos3_nano_future_window",
236
- "cosmos3_super_reasoner",
237
- "metadata128_neural_mlp",
238
- "metadata128_simple"
239
  ]
240
  },
241
  "missing_records": [
242
  {
243
  "method": "Cosmos3-Nano Future Window",
244
  "metric_key": "macro_f1",
@@ -252,6 +278,19 @@
252
  "task_label": "Procedure Step Recognition",
253
  "task_number": 2
254
  },
255
  {
256
  "method": "128ep Metadata Simple",
257
  "metric_key": "mpjpe",
@@ -538,28 +577,15 @@
538
  "task_label": "Multimodal Synchronization Detection",
539
  "task_number": 12
540
  },
541
- {
542
- "method": "128ep Metadata Simple",
543
- "metric_key": "macro_f1",
544
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
545
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
546
- "scope": "multi_episode_128_metadata_baseline",
547
- "series_id": "metadata128_simple",
548
- "status": "not_supported_by_metadata_only_package",
549
- "status_label": "not supported",
550
- "task_id": "long_horizon_next_action",
551
- "task_label": "Long-Horizon Next-Action Forecasting",
552
- "task_number": 13
553
- },
554
  {
555
  "method": "128ep Metadata NN",
556
  "metric_key": "macro_f1",
557
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
558
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
559
  "scope": "multi_episode_128_metadata_baseline",
560
  "series_id": "metadata128_neural_mlp",
561
- "status": "not_supported_by_metadata_only_package",
562
- "status_label": "not supported",
563
  "task_id": "long_horizon_next_action",
564
  "task_label": "Long-Horizon Next-Action Forecasting",
565
  "task_number": 13
@@ -590,28 +616,15 @@
590
  "task_label": "Long-Horizon Next-Action Forecasting",
591
  "task_number": 13
592
  },
593
- {
594
- "method": "128ep Metadata Simple",
595
- "metric_key": "macro_f1",
596
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
597
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
598
- "scope": "multi_episode_128_metadata_baseline",
599
- "series_id": "metadata128_simple",
600
- "status": "not_supported_by_metadata_only_package",
601
- "status_label": "not supported",
602
- "task_id": "next_subtask_forecast",
603
- "task_label": "Long-Horizon Next-Subtask Forecasting",
604
- "task_number": 14
605
- },
606
  {
607
  "method": "128ep Metadata NN",
608
  "metric_key": "macro_f1",
609
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
610
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
611
  "scope": "multi_episode_128_metadata_baseline",
612
  "series_id": "metadata128_neural_mlp",
613
- "status": "not_supported_by_metadata_only_package",
614
- "status_label": "not supported",
615
  "task_id": "next_subtask_forecast",
616
  "task_label": "Long-Horizon Next-Subtask Forecasting",
617
  "task_number": 14
@@ -645,12 +658,12 @@
645
  {
646
  "method": "128ep Metadata Simple",
647
  "metric_key": "macro_f1",
648
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
649
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
650
  "scope": "multi_episode_128_metadata_baseline",
651
  "series_id": "metadata128_simple",
652
- "status": "not_supported_by_metadata_only_package",
653
- "status_label": "not supported",
654
  "task_id": "interaction_text_prediction",
655
  "task_label": "Interaction Text Prediction",
656
  "task_number": 15
@@ -707,28 +720,15 @@
707
  "task_label": "Interaction Text Prediction",
708
  "task_number": 15
709
  },
710
- {
711
- "method": "128ep Metadata Simple",
712
- "metric_key": "macro_f1",
713
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
714
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
715
- "scope": "multi_episode_128_metadata_baseline",
716
- "series_id": "metadata128_simple",
717
- "status": "not_supported_by_metadata_only_package",
718
- "status_label": "not supported",
719
- "task_id": "action_object_relation",
720
- "task_label": "Action-Object Relation Prediction",
721
- "task_number": 16
722
- },
723
  {
724
  "method": "128ep Metadata NN",
725
  "metric_key": "macro_f1",
726
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
727
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
728
  "scope": "multi_episode_128_metadata_baseline",
729
  "series_id": "metadata128_neural_mlp",
730
- "status": "not_supported_by_metadata_only_package",
731
- "status_label": "not supported",
732
  "task_id": "action_object_relation",
733
  "task_label": "Action-Object Relation Prediction",
734
  "task_number": 16
@@ -746,32 +746,6 @@
746
  "task_label": "Action-Object Relation Prediction",
747
  "task_number": 16
748
  },
749
- {
750
- "method": "128ep Metadata Simple",
751
- "metric_key": "micro_f1",
752
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
753
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
754
- "scope": "multi_episode_128_metadata_baseline",
755
- "series_id": "metadata128_simple",
756
- "status": "not_supported_by_metadata_only_package",
757
- "status_label": "not supported",
758
- "task_id": "object_set_forecast",
759
- "task_label": "Future Object-Set Forecasting",
760
- "task_number": 17
761
- },
762
- {
763
- "method": "128ep Metadata NN",
764
- "metric_key": "micro_f1",
765
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
766
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
767
- "scope": "multi_episode_128_metadata_baseline",
768
- "series_id": "metadata128_neural_mlp",
769
- "status": "not_supported_by_metadata_only_package",
770
- "status_label": "not supported",
771
- "task_id": "object_set_forecast",
772
- "task_label": "Future Object-Set Forecasting",
773
- "task_number": 17
774
- },
775
  {
776
  "method": "Cosmos3-Super Reasoner",
777
  "metric_key": "micro_f1",
@@ -801,12 +775,12 @@
801
  {
802
  "method": "128ep Metadata Simple",
803
  "metric_key": "mae",
804
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
805
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
806
  "scope": "multi_episode_128_metadata_baseline",
807
  "series_id": "metadata128_simple",
808
- "status": "not_supported_by_metadata_only_package",
809
- "status_label": "not supported",
810
  "task_id": "imu_to_hand_pose",
811
  "task_label": "IMU-to-Hand Pose Reconstruction",
812
  "task_number": 18
@@ -866,12 +840,12 @@
866
  {
867
  "method": "128ep Metadata Simple",
868
  "metric_key": "mrr",
869
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
870
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
871
  "scope": "multi_episode_128_metadata_baseline",
872
  "series_id": "metadata128_simple",
873
- "status": "not_supported_by_metadata_only_package",
874
- "status_label": "not supported",
875
  "task_id": "camera_view_sync_retrieval",
876
  "task_label": "Camera-View Synchronization Retrieval",
877
  "task_number": 19
@@ -928,32 +902,6 @@
928
  "task_label": "Camera-View Synchronization Retrieval",
929
  "task_number": 19
930
  },
931
- {
932
- "method": "128ep Metadata Simple",
933
- "metric_key": "mae",
934
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
935
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
936
- "scope": "multi_episode_128_metadata_baseline",
937
- "series_id": "metadata128_simple",
938
- "status": "not_supported_by_metadata_only_package",
939
- "status_label": "not supported",
940
- "task_id": "time_to_transition",
941
- "task_label": "Time-to-Next-Transition Regression",
942
- "task_number": 20
943
- },
944
- {
945
- "method": "128ep Metadata NN",
946
- "metric_key": "mae",
947
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
948
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
949
- "scope": "multi_episode_128_metadata_baseline",
950
- "series_id": "metadata128_neural_mlp",
951
- "status": "not_supported_by_metadata_only_package",
952
- "status_label": "not supported",
953
- "task_id": "time_to_transition",
954
- "task_label": "Time-to-Next-Transition Regression",
955
- "task_number": 20
956
- },
957
  {
958
  "method": "Cosmos3-Super Reasoner",
959
  "metric_key": "mae",
@@ -1027,8 +975,8 @@
1027
  "method_count": 9,
1028
  "method_task_record_count": 180,
1029
  "proxy_scored_method_task_count": 4,
1030
- "scored_method_task_count": 123,
1031
- "scoreless_method_task_count": 57,
1032
  "task_count": 20
1033
  },
1034
  "source_matrix": "docs/data/task_method_20_result_matrix.json",
 
1
  {
2
+ "generated_at_utc": "2026-06-18T12:07:14+00:00",
3
  "immediate_actions": [
4
  {
5
  "artifact": "docs/data/task_method_20_gap_audit.json",
6
  "id": "gap_audit",
7
+ "purpose": "Keep the 53 scoreless cells visible and reproducible."
8
  },
9
  {
10
  "artifact": "scripts/omni/score_model_output_probes.py",
 
50
  "proxy_scored_task_count": 0,
51
  "result_record_count": 20,
52
  "scope": "128 selected episodes, JSONL metadata/text only",
53
+ "scored_task_count": 7,
54
+ "scoreless_task_count": 13,
55
  "status_counts": {
56
+ "not_supported_by_metadata_only_package": 7,
57
+ "scored": 7,
58
+ "unsupported_without_required_target": 6
59
  }
60
  },
61
  "metadata128_simple": {
 
64
  "proxy_scored_task_count": 0,
65
  "result_record_count": 20,
66
  "scope": "128 selected episodes, JSONL metadata/text only",
67
+ "scored_task_count": 13,
68
+ "scoreless_task_count": 7,
69
  "status_counts": {
70
+ "scored": 13,
71
+ "unsupported_without_required_target": 7
 
72
  }
73
  },
74
  "minimal": {
 
138
  "missing_by_method": {
139
  "cosmos3_nano_future_window": 15,
140
  "cosmos3_super_reasoner": 13,
141
+ "metadata128_neural_mlp": 13,
142
+ "metadata128_simple": 7,
143
  "qwen3_omni_v6_lora": 5
144
  },
145
  "missing_by_status": {
146
  "not_evaluated_in_verified_package": 33,
147
+ "not_supported_by_metadata_only_package": 7,
148
+ "unsupported_without_required_target": 13
149
  },
150
  "missing_by_task": {
151
+ "01 Action Recognition": [
152
+ "metadata128_neural_mlp"
153
+ ],
154
  "02 Procedure Step Recognition": [
155
+ "cosmos3_nano_future_window",
156
+ "metadata128_neural_mlp"
157
+ ],
158
+ "04 Next-Action Prediction": [
159
+ "metadata128_neural_mlp"
160
  ],
161
  "05 Hand Trajectory Forecasting": [
162
  "cosmos3_nano_future_window",
 
197
  "13 Long-Horizon Next-Action Forecasting": [
198
  "cosmos3_nano_future_window",
199
  "cosmos3_super_reasoner",
200
+ "metadata128_neural_mlp"
 
201
  ],
202
  "14 Long-Horizon Next-Subtask Forecasting": [
203
  "cosmos3_nano_future_window",
204
  "cosmos3_super_reasoner",
205
+ "metadata128_neural_mlp"
 
206
  ],
207
  "15 Interaction Text Prediction": [
208
  "cosmos3_nano_future_window",
 
213
  ],
214
  "16 Action-Object Relation Prediction": [
215
  "cosmos3_nano_future_window",
216
+ "metadata128_neural_mlp"
 
217
  ],
218
  "17 Future Object-Set Forecasting": [
219
  "cosmos3_nano_future_window",
220
+ "cosmos3_super_reasoner"
 
 
221
  ],
222
  "18 IMU-to-Hand Pose Reconstruction": [
223
  "cosmos3_nano_future_window",
 
235
  ],
236
  "20 Time-to-Next-Transition Regression": [
237
  "cosmos3_nano_future_window",
238
+ "cosmos3_super_reasoner"
 
 
239
  ]
240
  },
241
  "missing_records": [
242
+ {
243
+ "method": "128ep Metadata NN",
244
+ "metric_key": "macro_f1",
245
+ "reason": "train class count 896 exceeds --max-neural-classes 512",
246
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
247
+ "scope": "multi_episode_128_metadata_baseline",
248
+ "series_id": "metadata128_neural_mlp",
249
+ "status": "unsupported_without_required_target",
250
+ "status_label": "unsupported",
251
+ "task_id": "timeline_action",
252
+ "task_label": "Action Recognition",
253
+ "task_number": 1
254
+ },
255
+ {
256
+ "method": "128ep Metadata NN",
257
+ "metric_key": "macro_f1",
258
+ "reason": "train class count 652 exceeds --max-neural-classes 512",
259
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
260
+ "scope": "multi_episode_128_metadata_baseline",
261
+ "series_id": "metadata128_neural_mlp",
262
+ "status": "unsupported_without_required_target",
263
+ "status_label": "unsupported",
264
+ "task_id": "timeline_subtask",
265
+ "task_label": "Procedure Step Recognition",
266
+ "task_number": 2
267
+ },
268
  {
269
  "method": "Cosmos3-Nano Future Window",
270
  "metric_key": "macro_f1",
 
278
  "task_label": "Procedure Step Recognition",
279
  "task_number": 2
280
  },
281
+ {
282
+ "method": "128ep Metadata NN",
283
+ "metric_key": "macro_f1",
284
+ "reason": "train class count 891 exceeds --max-neural-classes 512",
285
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
286
+ "scope": "multi_episode_128_metadata_baseline",
287
+ "series_id": "metadata128_neural_mlp",
288
+ "status": "unsupported_without_required_target",
289
+ "status_label": "unsupported",
290
+ "task_id": "next_action",
291
+ "task_label": "Next-Action Prediction",
292
+ "task_number": 4
293
+ },
294
  {
295
  "method": "128ep Metadata Simple",
296
  "metric_key": "mpjpe",
 
577
  "task_label": "Multimodal Synchronization Detection",
578
  "task_number": 12
579
  },
580
  {
581
  "method": "128ep Metadata NN",
582
  "metric_key": "macro_f1",
583
+ "reason": "train class count 887 exceeds --max-neural-classes 512",
584
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
585
  "scope": "multi_episode_128_metadata_baseline",
586
  "series_id": "metadata128_neural_mlp",
587
+ "status": "unsupported_without_required_target",
588
+ "status_label": "unsupported",
589
  "task_id": "long_horizon_next_action",
590
  "task_label": "Long-Horizon Next-Action Forecasting",
591
  "task_number": 13
 
616
  "task_label": "Long-Horizon Next-Action Forecasting",
617
  "task_number": 13
618
  },
619
  {
620
  "method": "128ep Metadata NN",
621
  "metric_key": "macro_f1",
622
+ "reason": "train class count 651 exceeds --max-neural-classes 512",
623
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
624
  "scope": "multi_episode_128_metadata_baseline",
625
  "series_id": "metadata128_neural_mlp",
626
+ "status": "unsupported_without_required_target",
627
+ "status_label": "unsupported",
628
  "task_id": "next_subtask_forecast",
629
  "task_label": "Long-Horizon Next-Subtask Forecasting",
630
  "task_number": 14
 
658
  {
659
  "method": "128ep Metadata Simple",
660
  "metric_key": "macro_f1",
661
+ "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
662
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
663
  "scope": "multi_episode_128_metadata_baseline",
664
  "series_id": "metadata128_simple",
665
+ "status": "unsupported_without_required_target",
666
+ "status_label": "unsupported",
667
  "task_id": "interaction_text_prediction",
668
  "task_label": "Interaction Text Prediction",
669
  "task_number": 15
 
720
  "task_label": "Interaction Text Prediction",
721
  "task_number": 15
722
  },
723
  {
724
  "method": "128ep Metadata NN",
725
  "metric_key": "macro_f1",
726
+ "reason": "train class count 3058 exceeds --max-neural-classes 512",
727
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
728
  "scope": "multi_episode_128_metadata_baseline",
729
  "series_id": "metadata128_neural_mlp",
730
+ "status": "unsupported_without_required_target",
731
+ "status_label": "unsupported",
732
  "task_id": "action_object_relation",
733
  "task_label": "Action-Object Relation Prediction",
734
  "task_number": 16
 
746
  "task_label": "Action-Object Relation Prediction",
747
  "task_number": 16
748
  },
749
  {
750
  "method": "Cosmos3-Super Reasoner",
751
  "metric_key": "micro_f1",
 
775
  {
776
  "method": "128ep Metadata Simple",
777
  "metric_key": "mae",
778
+ "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package",
779
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
780
  "scope": "multi_episode_128_metadata_baseline",
781
  "series_id": "metadata128_simple",
782
+ "status": "unsupported_without_required_target",
783
+ "status_label": "unsupported",
784
  "task_id": "imu_to_hand_pose",
785
  "task_label": "IMU-to-Hand Pose Reconstruction",
786
  "task_number": 18
 
840
  {
841
  "method": "128ep Metadata Simple",
842
  "metric_key": "mrr",
843
+ "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
844
+ "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
845
  "scope": "multi_episode_128_metadata_baseline",
846
  "series_id": "metadata128_simple",
847
+ "status": "unsupported_without_required_target",
848
+ "status_label": "unsupported",
849
  "task_id": "camera_view_sync_retrieval",
850
  "task_label": "Camera-View Synchronization Retrieval",
851
  "task_number": 19
 
902
  "task_label": "Camera-View Synchronization Retrieval",
903
  "task_number": 19
904
  },
905
  {
906
  "method": "Cosmos3-Super Reasoner",
907
  "metric_key": "mae",
 
975
  "method_count": 9,
976
  "method_task_record_count": 180,
977
  "proxy_scored_method_task_count": 4,
978
+ "scored_method_task_count": 127,
979
+ "scoreless_method_task_count": 53,
980
  "task_count": 20
981
  },
982
  "source_matrix": "docs/data/task_method_20_result_matrix.json",
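Reviewer note: the `missing_by_method` and `missing_by_status` tallies in this audit can presumably be recomputed from `missing_records` by counting the `series_id` and `status` fields. The following is a minimal Python sketch under that assumption; the helper name `tally_missing` and the three sample records are illustrative and are not taken from the repository's tooling.

```python
from collections import Counter


def tally_missing(records):
    """Count scoreless records per method and per status (hypothetical helper)."""
    by_method = Counter(r["series_id"] for r in records)
    by_status = Counter(r["status"] for r in records)
    return dict(by_method), dict(by_status)


# Illustrative records shaped like the entries in "missing_records".
sample = [
    {"series_id": "metadata128_neural_mlp",
     "status": "unsupported_without_required_target"},
    {"series_id": "metadata128_neural_mlp",
     "status": "unsupported_without_required_target"},
    {"series_id": "metadata128_simple",
     "status": "unsupported_without_required_target"},
]

by_method, by_status = tally_missing(sample)
print(by_method)
print(by_status)
```

In the audit above, both tallies sum to 53 (15 + 13 + 13 + 7 + 5 by method, 33 + 7 + 13 by status), which matches `scoreless_method_task_count`.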
metrics/task_method_20_result_matrix.json CHANGED
@@ -1,11 +1,11 @@
1
  {
2
  "title": "Task Method 20-Result Matrix",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:15:02+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
- "scored_method_task_count": 123,
9
  "series": [
10
  {
11
  "id": "minimal",
@@ -64,18 +64,17 @@
64
  "method_detail": "128-episode JSONL metadata/text simple baselines.",
65
  "plotted_as": "colored point overlay",
66
  "result_record_count": 20,
67
- "scored_task_count": 8,
68
- "covered_task_count": 8,
69
  "proxy_scored_task_count": 0,
70
- "scoreless_task_count": 12,
71
- "unsupported_task_count": 12,
72
  "not_evaluated_task_count": 0,
73
  "status_counts": {
74
- "not_supported_by_metadata_only_package": 8,
75
- "scored": 8,
76
- "unsupported_without_required_target": 4
77
  },
78
- "coverage_fraction": 0.4,
79
  "result_record_fraction": 1.0
80
  },
81
  {
@@ -89,17 +88,17 @@
89
  "method_detail": "128-episode JSONL metadata/text MLP baselines.",
90
  "plotted_as": "colored point overlay",
91
  "result_record_count": 20,
92
- "scored_task_count": 8,
93
- "covered_task_count": 8,
94
  "proxy_scored_task_count": 0,
95
- "scoreless_task_count": 12,
96
- "unsupported_task_count": 12,
97
  "not_evaluated_task_count": 0,
98
  "status_counts": {
99
- "not_supported_by_metadata_only_package": 12,
100
- "scored": 8
101
  },
102
- "coverage_fraction": 0.4,
103
  "result_record_fraction": 1.0
104
  },
105
  {
@@ -2210,17 +2209,17 @@
2210
  "task_label": "Long-Horizon Next-Action Forecasting",
2211
  "series_id": "metadata128_simple",
2212
  "method": "128ep Metadata Simple",
2213
- "status": "not_supported_by_metadata_only_package",
2214
- "status_label": "not supported",
2215
- "scored": false,
2216
  "proxy_scored": false,
2217
- "raw": null,
2218
- "raw_text": "n/a",
2219
- "normalized_score": null,
2220
  "metric_key": "macro_f1",
2221
- "source": null,
2222
  "scope": "multi_episode_128_metadata_baseline",
2223
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2224
  },
2225
  {
2226
  "task_number": 13,
@@ -2228,17 +2227,17 @@
2228
  "task_label": "Long-Horizon Next-Action Forecasting",
2229
  "series_id": "metadata128_neural_mlp",
2230
  "method": "128ep Metadata NN",
2231
- "status": "not_supported_by_metadata_only_package",
2232
- "status_label": "not supported",
2233
- "scored": false,
2234
  "proxy_scored": false,
2235
- "raw": null,
2236
- "raw_text": "n/a",
2237
- "normalized_score": null,
2238
  "metric_key": "macro_f1",
2239
- "source": null,
2240
  "scope": "multi_episode_128_metadata_baseline",
2241
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2242
  },
2243
  {
2244
  "task_number": 13,
@@ -2372,17 +2371,17 @@
2372
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2373
  "series_id": "metadata128_simple",
2374
  "method": "128ep Metadata Simple",
2375
- "status": "not_supported_by_metadata_only_package",
2376
- "status_label": "not supported",
2377
- "scored": false,
2378
  "proxy_scored": false,
2379
- "raw": null,
2380
- "raw_text": "n/a",
2381
- "normalized_score": null,
2382
  "metric_key": "macro_f1",
2383
- "source": null,
2384
  "scope": "multi_episode_128_metadata_baseline",
2385
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2386
  },
2387
  {
2388
  "task_number": 14,
@@ -2390,17 +2389,17 @@
2390
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2391
  "series_id": "metadata128_neural_mlp",
2392
  "method": "128ep Metadata NN",
2393
- "status": "not_supported_by_metadata_only_package",
2394
- "status_label": "not supported",
2395
- "scored": false,
2396
  "proxy_scored": false,
2397
- "raw": null,
2398
- "raw_text": "n/a",
2399
- "normalized_score": null,
2400
  "metric_key": "macro_f1",
2401
- "source": null,
2402
  "scope": "multi_episode_128_metadata_baseline",
2403
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2404
  },
2405
  {
2406
  "task_number": 14,
@@ -2534,17 +2533,17 @@
2534
  "task_label": "Interaction Text Prediction",
2535
  "series_id": "metadata128_simple",
2536
  "method": "128ep Metadata Simple",
2537
- "status": "not_supported_by_metadata_only_package",
2538
- "status_label": "not supported",
2539
  "scored": false,
2540
  "proxy_scored": false,
2541
  "raw": null,
2542
  "raw_text": "n/a",
2543
  "normalized_score": null,
2544
  "metric_key": "macro_f1",
2545
- "source": null,
2546
  "scope": "multi_episode_128_metadata_baseline",
2547
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2548
  },
2549
  {
2550
  "task_number": 15,
@@ -2696,17 +2695,17 @@
2696
  "task_label": "Action-Object Relation Prediction",
2697
  "series_id": "metadata128_simple",
2698
  "method": "128ep Metadata Simple",
2699
- "status": "not_supported_by_metadata_only_package",
2700
- "status_label": "not supported",
2701
- "scored": false,
2702
  "proxy_scored": false,
2703
- "raw": null,
2704
- "raw_text": "n/a",
2705
- "normalized_score": null,
2706
  "metric_key": "macro_f1",
2707
- "source": null,
2708
  "scope": "multi_episode_128_metadata_baseline",
2709
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2710
  },
2711
  {
2712
  "task_number": 16,
@@ -2714,17 +2713,17 @@
2714
  "task_label": "Action-Object Relation Prediction",
2715
  "series_id": "metadata128_neural_mlp",
2716
  "method": "128ep Metadata NN",
2717
- "status": "not_supported_by_metadata_only_package",
2718
- "status_label": "not supported",
2719
- "scored": false,
2720
  "proxy_scored": false,
2721
- "raw": null,
2722
- "raw_text": "n/a",
2723
- "normalized_score": null,
2724
  "metric_key": "macro_f1",
2725
- "source": null,
2726
  "scope": "multi_episode_128_metadata_baseline",
2727
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2728
  },
2729
  {
2730
  "task_number": 16,
@@ -2858,17 +2857,17 @@
2858
  "task_label": "Future Object-Set Forecasting",
2859
  "series_id": "metadata128_simple",
2860
  "method": "128ep Metadata Simple",
2861
- "status": "not_supported_by_metadata_only_package",
2862
- "status_label": "not supported",
2863
- "scored": false,
2864
  "proxy_scored": false,
2865
- "raw": null,
2866
- "raw_text": "n/a",
2867
- "normalized_score": null,
2868
  "metric_key": "micro_f1",
2869
- "source": null,
2870
  "scope": "multi_episode_128_metadata_baseline",
2871
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2872
  },
2873
  {
2874
  "task_number": 17,
@@ -2876,17 +2875,17 @@
2876
  "task_label": "Future Object-Set Forecasting",
2877
  "series_id": "metadata128_neural_mlp",
2878
  "method": "128ep Metadata NN",
2879
- "status": "not_supported_by_metadata_only_package",
2880
- "status_label": "not supported",
2881
- "scored": false,
2882
  "proxy_scored": false,
2883
- "raw": null,
2884
- "raw_text": "n/a",
2885
- "normalized_score": null,
2886
  "metric_key": "micro_f1",
2887
- "source": null,
2888
  "scope": "multi_episode_128_metadata_baseline",
2889
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2890
  },
2891
  {
2892
  "task_number": 17,
@@ -3020,17 +3019,17 @@
3020
  "task_label": "IMU-to-Hand Pose Reconstruction",
3021
  "series_id": "metadata128_simple",
3022
  "method": "128ep Metadata Simple",
3023
- "status": "not_supported_by_metadata_only_package",
3024
- "status_label": "not supported",
3025
  "scored": false,
3026
  "proxy_scored": false,
3027
  "raw": null,
3028
  "raw_text": "n/a",
3029
  "normalized_score": null,
3030
  "metric_key": "mae",
3031
- "source": null,
3032
  "scope": "multi_episode_128_metadata_baseline",
3033
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3034
  },
3035
  {
3036
  "task_number": 18,
@@ -3182,17 +3181,17 @@
3182
  "task_label": "Camera-View Synchronization Retrieval",
3183
  "series_id": "metadata128_simple",
3184
  "method": "128ep Metadata Simple",
3185
- "status": "not_supported_by_metadata_only_package",
3186
- "status_label": "not supported",
3187
  "scored": false,
3188
  "proxy_scored": false,
3189
  "raw": null,
3190
  "raw_text": "n/a",
3191
  "normalized_score": null,
3192
  "metric_key": "mrr",
3193
- "source": null,
3194
  "scope": "multi_episode_128_metadata_baseline",
3195
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3196
  },
3197
  {
3198
  "task_number": 19,
@@ -3344,17 +3343,17 @@
3344
  "task_label": "Time-to-Next-Transition Regression",
3345
  "series_id": "metadata128_simple",
3346
  "method": "128ep Metadata Simple",
3347
- "status": "not_supported_by_metadata_only_package",
3348
- "status_label": "not supported",
3349
- "scored": false,
3350
  "proxy_scored": false,
3351
- "raw": null,
3352
- "raw_text": "n/a",
3353
- "normalized_score": null,
3354
  "metric_key": "mae",
3355
- "source": null,
3356
  "scope": "multi_episode_128_metadata_baseline",
3357
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3358
  },
3359
  {
3360
  "task_number": 20,
@@ -3362,17 +3361,17 @@
3362
  "task_label": "Time-to-Next-Transition Regression",
3363
  "series_id": "metadata128_neural_mlp",
3364
  "method": "128ep Metadata NN",
3365
- "status": "not_supported_by_metadata_only_package",
3366
- "status_label": "not supported",
3367
- "scored": false,
3368
  "proxy_scored": false,
3369
- "raw": null,
3370
- "raw_text": "n/a",
3371
- "normalized_score": null,
3372
  "metric_key": "mae",
3373
- "source": null,
3374
  "scope": "multi_episode_128_metadata_baseline",
3375
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3376
  },
3377
  {
3378
  "task_number": 20,
 
1
  {
2
  "title": "Task Method 20-Result Matrix",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
+ "scored_method_task_count": 133,
9
  "series": [
10
  {
11
  "id": "minimal",
 
64
  "method_detail": "128-episode JSONL metadata/text simple baselines.",
65
  "plotted_as": "colored point overlay",
66
  "result_record_count": 20,
67
+ "scored_task_count": 13,
68
+ "covered_task_count": 13,
69
  "proxy_scored_task_count": 0,
70
+ "scoreless_task_count": 7,
71
+ "unsupported_task_count": 7,
72
  "not_evaluated_task_count": 0,
73
  "status_counts": {
74
+ "scored": 13,
75
+ "unsupported_without_required_target": 7
 
76
  },
77
+ "coverage_fraction": 0.65,
78
  "result_record_fraction": 1.0
79
  },
80
  {
 
88
  "method_detail": "128-episode JSONL metadata/text MLP baselines.",
89
  "plotted_as": "colored point overlay",
90
  "result_record_count": 20,
91
+ "scored_task_count": 13,
92
+ "covered_task_count": 13,
93
  "proxy_scored_task_count": 0,
94
+ "scoreless_task_count": 7,
95
+ "unsupported_task_count": 7,
96
  "not_evaluated_task_count": 0,
97
  "status_counts": {
98
+ "not_supported_by_metadata_only_package": 7,
99
+ "scored": 13
100
  },
101
+ "coverage_fraction": 0.65,
102
  "result_record_fraction": 1.0
103
  },
104
  {
 
2209
  "task_label": "Long-Horizon Next-Action Forecasting",
2210
  "series_id": "metadata128_simple",
2211
  "method": "128ep Metadata Simple",
2212
+ "status": "scored",
2213
+ "status_label": "scored",
2214
+ "scored": true,
2215
  "proxy_scored": false,
2216
+ "raw": 0.004579592783699693,
2217
+ "raw_text": "0.0046",
2218
+ "normalized_score": 0.004579592783699693,
2219
  "metric_key": "macro_f1",
2220
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
2221
  "scope": "multi_episode_128_metadata_baseline",
2222
+ "reason": null
2223
  },
2224
  {
2225
  "task_number": 13,
 
2227
  "task_label": "Long-Horizon Next-Action Forecasting",
2228
  "series_id": "metadata128_neural_mlp",
2229
  "method": "128ep Metadata NN",
2230
+ "status": "scored",
2231
+ "status_label": "scored",
2232
+ "scored": true,
2233
  "proxy_scored": false,
2234
+ "raw": 0.0029821307969142615,
2235
+ "raw_text": "0.0030",
2236
+ "normalized_score": 0.0029821307969142615,
2237
  "metric_key": "macro_f1",
2238
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
2239
  "scope": "multi_episode_128_metadata_baseline",
2240
+ "reason": null
2241
  },
2242
  {
2243
  "task_number": 13,
 
2371
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2372
  "series_id": "metadata128_simple",
2373
  "method": "128ep Metadata Simple",
2374
+ "status": "scored",
2375
+ "status_label": "scored",
2376
+ "scored": true,
2377
  "proxy_scored": false,
2378
+ "raw": 0.0001206030150753769,
2379
+ "raw_text": "0.0001",
2380
+ "normalized_score": 0.0001206030150753769,
2381
  "metric_key": "macro_f1",
2382
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
2383
  "scope": "multi_episode_128_metadata_baseline",
2384
+ "reason": null
2385
  },
2386
  {
2387
  "task_number": 14,
 
2389
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2390
  "series_id": "metadata128_neural_mlp",
2391
  "method": "128ep Metadata NN",
2392
+ "status": "scored",
2393
+ "status_label": "scored",
2394
+ "scored": true,
2395
  "proxy_scored": false,
2396
+ "raw": 2.086049543676662e-05,
2397
+ "raw_text": "0.0000",
2398
+ "normalized_score": 2.086049543676662e-05,
2399
  "metric_key": "macro_f1",
2400
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
2401
  "scope": "multi_episode_128_metadata_baseline",
2402
+ "reason": null
2403
  },
2404
  {
2405
  "task_number": 14,
 
2533
  "task_label": "Interaction Text Prediction",
2534
  "series_id": "metadata128_simple",
2535
  "method": "128ep Metadata Simple",
2536
+ "status": "unsupported_without_required_target",
2537
+ "status_label": "unsupported",
2538
  "scored": false,
2539
  "proxy_scored": false,
2540
  "raw": null,
2541
  "raw_text": "n/a",
2542
  "normalized_score": null,
2543
  "metric_key": "macro_f1",
2544
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
2545
  "scope": "multi_episode_128_metadata_baseline",
2546
+ "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
2547
  },
2548
  {
2549
  "task_number": 15,
 
2695
  "task_label": "Action-Object Relation Prediction",
2696
  "series_id": "metadata128_simple",
2697
  "method": "128ep Metadata Simple",
2698
+ "status": "scored",
2699
+ "status_label": "scored",
2700
+ "scored": true,
2701
  "proxy_scored": false,
2702
+ "raw": 0.0,
2703
+ "raw_text": "0.0000",
2704
+ "normalized_score": 0.0,
2705
  "metric_key": "macro_f1",
2706
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
2707
  "scope": "multi_episode_128_metadata_baseline",
2708
+ "reason": null
2709
  },
2710
  {
2711
  "task_number": 16,
 
2713
  "task_label": "Action-Object Relation Prediction",
2714
  "series_id": "metadata128_neural_mlp",
2715
  "method": "128ep Metadata NN",
2716
+ "status": "scored",
2717
+ "status_label": "scored",
2718
+ "scored": true,
2719
  "proxy_scored": false,
2720
+ "raw": 0.0,
2721
+ "raw_text": "0.0000",
2722
+ "normalized_score": 0.0,
2723
  "metric_key": "macro_f1",
2724
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
2725
  "scope": "multi_episode_128_metadata_baseline",
2726
+ "reason": null
2727
  },
2728
  {
2729
  "task_number": 16,
 
2857
  "task_label": "Future Object-Set Forecasting",
2858
  "series_id": "metadata128_simple",
2859
  "method": "128ep Metadata Simple",
2860
+ "status": "scored",
2861
+ "status_label": "scored",
2862
+ "scored": true,
2863
  "proxy_scored": false,
2864
+ "raw": 0.17656983343047333,
2865
+ "raw_text": "0.1766",
2866
+ "normalized_score": 0.17656983343047333,
2867
  "metric_key": "micro_f1",
2868
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
2869
  "scope": "multi_episode_128_metadata_baseline",
2870
+ "reason": null
2871
  },
2872
  {
2873
  "task_number": 17,
 
2875
  "task_label": "Future Object-Set Forecasting",
2876
  "series_id": "metadata128_neural_mlp",
2877
  "method": "128ep Metadata NN",
2878
+ "status": "scored",
2879
+ "status_label": "scored",
2880
+ "scored": true,
2881
  "proxy_scored": false,
2882
+ "raw": 0.17418550827844048,
2883
+ "raw_text": "0.1742",
2884
+ "normalized_score": 0.17418550827844048,
2885
  "metric_key": "micro_f1",
2886
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
2887
  "scope": "multi_episode_128_metadata_baseline",
2888
+ "reason": null
2889
  },
2890
  {
2891
  "task_number": 17,
 
3019
  "task_label": "IMU-to-Hand Pose Reconstruction",
3020
  "series_id": "metadata128_simple",
3021
  "method": "128ep Metadata Simple",
3022
+ "status": "unsupported_without_required_target",
3023
+ "status_label": "unsupported",
3024
  "scored": false,
3025
  "proxy_scored": false,
3026
  "raw": null,
3027
  "raw_text": "n/a",
3028
  "normalized_score": null,
3029
  "metric_key": "mae",
3030
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
3031
  "scope": "multi_episode_128_metadata_baseline",
3032
+ "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package"
3033
  },
3034
  {
3035
  "task_number": 18,
 
3181
  "task_label": "Camera-View Synchronization Retrieval",
3182
  "series_id": "metadata128_simple",
3183
  "method": "128ep Metadata Simple",
3184
+ "status": "unsupported_without_required_target",
3185
+ "status_label": "unsupported",
3186
  "scored": false,
3187
  "proxy_scored": false,
3188
  "raw": null,
3189
  "raw_text": "n/a",
3190
  "normalized_score": null,
3191
  "metric_key": "mrr",
3192
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
3193
  "scope": "multi_episode_128_metadata_baseline",
3194
+ "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
3195
  },
3196
  {
3197
  "task_number": 19,
 
3343
  "task_label": "Time-to-Next-Transition Regression",
3344
  "series_id": "metadata128_simple",
3345
  "method": "128ep Metadata Simple",
3346
+ "status": "scored",
3347
+ "status_label": "scored",
3348
+ "scored": true,
3349
  "proxy_scored": false,
3350
+ "raw": 624.8108520507812,
3351
+ "raw_text": "624.81",
3352
+ "normalized_score": 0.016864874132806403,
3353
  "metric_key": "mae",
3354
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
3355
  "scope": "multi_episode_128_metadata_baseline",
3356
+ "reason": null
3357
  },
3358
  {
3359
  "task_number": 20,
 
3361
  "task_label": "Time-to-Next-Transition Regression",
3362
  "series_id": "metadata128_neural_mlp",
3363
  "method": "128ep Metadata NN",
3364
+ "status": "scored",
3365
+ "status_label": "scored",
3366
+ "scored": true,
3367
  "proxy_scored": false,
3368
+ "raw": 41.4664421081543,
3369
+ "raw_text": "41.47",
3370
+ "normalized_score": 0.25411768748242325,
3371
  "metric_key": "mae",
3372
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
3373
  "scope": "multi_episode_128_metadata_baseline",
3374
+ "reason": null
3375
  },
3376
  {
3377
  "task_number": 20,
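Reviewer note: the per-series summary fields changed in this file appear to be tied together by simple arithmetic (8 of 20 scored gives 0.4, 13 of 20 gives 0.65). The Python sketch below checks one series summary under that reading. It assumes `coverage_fraction` is `scored_task_count / result_record_count`, which fits the values shown here but is not confirmed by the generator code; `check_series_summary` is a hypothetical name.

```python
def check_series_summary(series):
    """Return a list of inconsistencies in one series summary (hypothetical check)."""
    problems = []
    records = series["result_record_count"]
    scored = series["scored_task_count"]
    scoreless = series["scoreless_task_count"]
    # Every method/task record should be either scored or scoreless.
    if scored + scoreless != records:
        problems.append("scored + scoreless != result_record_count")
    # Assumed definition: coverage is the scored share of all records.
    if abs(series["coverage_fraction"] - scored / records) > 1e-9:
        problems.append("coverage_fraction != scored / result_record_count")
    # The status breakdown should account for every record.
    if sum(series["status_counts"].values()) != records:
        problems.append("status_counts do not sum to result_record_count")
    return problems


# Values copied from the updated "128ep Metadata Simple" summary in this diff.
updated_simple = {
    "result_record_count": 20,
    "scored_task_count": 13,
    "scoreless_task_count": 7,
    "coverage_fraction": 0.65,
    "status_counts": {"scored": 13, "unsupported_without_required_target": 7},
}
print(check_series_summary(updated_simple))
```

A check of this kind could run as a quality gate after the matrix is regenerated, so a count that is updated without its dependent fields is caught before publication.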
metrics/task_surface_integrity.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-18T11:18:04+00:00",
4
  "summary": {
5
  "task_count": 12,
6
  "expected_task_count": 12,
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-18T12:09:25+00:00",
4
  "summary": {
5
  "task_count": 12,
6
  "expected_task_count": 12,
metrics/unified_task_model_radar.json CHANGED
@@ -1,11 +1,11 @@
1
  {
2
  "title": "Unified 20-Task Model Radar",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T11:15:02+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
- "scored_method_task_count": 123,
9
  "normalization_policy": {
10
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
11
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
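Reviewer note: the two `normalization_policy` rules above can be sketched in a few lines of Python. This is an illustration of the stated policy, not the project's implementation; the function name `normalize`, its parameters, and the sample inputs are assumptions.

```python
def normalize(raw, higher_is_better, best_observed=None):
    """Map a raw metric onto a 0-1 radar axis following the stated policy (sketch)."""
    if raw is None:
        # Scoreless records stay unplotted.
        return None
    if higher_is_better:
        # Bounded metrics (e.g. macro_f1) are clipped to [0, 1] and plotted directly.
        return min(1.0, max(0.0, raw))
    if best_observed is None:
        raise ValueError("best_observed is required for lower-is-better metrics")
    if raw == best_observed:
        # The best method in a task scores 1.0; this also avoids 0 / 0.
        return 1.0
    # Lower-error metrics (e.g. mae): best observed value in the task over raw value.
    return best_observed / raw


best_mae = 10.0  # hypothetical best observed MAE within one task
print(normalize(40.0, higher_is_better=False, best_observed=best_mae))
print(normalize(1.3, higher_is_better=True))
```

For the Time-to-Next-Transition rows in this commit, both MAE entries (raw 624.81 with normalized score 0.0169, and raw 41.47 with normalized score 0.2541) appear consistent with a single best observed MAE of about 10.54 under this rule.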
@@ -73,18 +73,17 @@
73
  "method_detail": "128-episode JSONL metadata/text simple baselines.",
74
  "plotted_as": "colored point overlay",
75
  "result_record_count": 20,
76
- "scored_task_count": 8,
77
- "covered_task_count": 8,
78
  "proxy_scored_task_count": 0,
79
- "scoreless_task_count": 12,
80
- "unsupported_task_count": 12,
81
  "not_evaluated_task_count": 0,
82
  "status_counts": {
83
- "not_supported_by_metadata_only_package": 8,
84
- "scored": 8,
85
- "unsupported_without_required_target": 4
86
  },
87
- "coverage_fraction": 0.4,
88
  "result_record_fraction": 1.0
89
  },
90
  {
@@ -98,17 +97,17 @@
98
  "method_detail": "128-episode JSONL metadata/text MLP baselines.",
99
  "plotted_as": "colored point overlay",
100
  "result_record_count": 20,
101
- "scored_task_count": 8,
102
- "covered_task_count": 8,
103
  "proxy_scored_task_count": 0,
104
- "scoreless_task_count": 12,
105
- "unsupported_task_count": 12,
106
  "not_evaluated_task_count": 0,
107
  "status_counts": {
108
- "not_supported_by_metadata_only_package": 12,
109
- "scored": 8
110
  },
111
- "coverage_fraction": 0.4,
112
  "result_record_fraction": 1.0
113
  },
114
  {
@@ -1608,6 +1607,28 @@
1608
  "raw_text": "0.0023",
1609
  "status_label": "scored"
1610
  },
1611
  "raw128_simple": {
1612
  "raw": 0.0024280172369056294,
1613
  "metric_key": "macro_f1",
@@ -1630,28 +1651,6 @@
1630
  "raw_text": "0.0011",
1631
  "status_label": "scored"
1632
  },
1633
- "metadata128_simple": {
1634
- "raw": null,
1635
- "metric_key": "macro_f1",
1636
- "source": null,
1637
- "scope": "multi_episode_128_metadata_baseline",
1638
- "status": "not_supported_by_metadata_only_package",
1639
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1640
- "normalized_score": null,
1641
- "raw_text": "n/a",
1642
- "status_label": "not supported"
1643
- },
1644
- "metadata128_neural_mlp": {
1645
- "raw": null,
1646
- "metric_key": "macro_f1",
1647
- "source": null,
1648
- "scope": "multi_episode_128_metadata_baseline",
1649
- "status": "not_supported_by_metadata_only_package",
1650
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1651
- "normalized_score": null,
1652
- "raw_text": "n/a",
1653
- "status_label": "not supported"
1654
- },
1655
  "cosmos3_super_reasoner": {
1656
  "raw": null,
1657
  "metric_key": "macro_f1",
@@ -1719,6 +1718,28 @@
1719
  "raw_text": "0.0042",
1720
  "status_label": "scored"
1721
  },
1722
  "raw128_simple": {
1723
  "raw": 0.0,
1724
  "metric_key": "macro_f1",
@@ -1741,28 +1762,6 @@
1741
  "raw_text": "0.0000",
1742
  "status_label": "scored"
1743
  },
1744
- "metadata128_simple": {
1745
- "raw": null,
1746
- "metric_key": "macro_f1",
1747
- "source": null,
1748
- "scope": "multi_episode_128_metadata_baseline",
1749
- "status": "not_supported_by_metadata_only_package",
1750
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1751
- "normalized_score": null,
1752
- "raw_text": "n/a",
1753
- "status_label": "not supported"
1754
- },
1755
- "metadata128_neural_mlp": {
1756
- "raw": null,
1757
- "metric_key": "macro_f1",
1758
- "source": null,
1759
- "scope": "multi_episode_128_metadata_baseline",
1760
- "status": "not_supported_by_metadata_only_package",
1761
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1762
- "normalized_score": null,
1763
- "raw_text": "n/a",
1764
- "status_label": "not supported"
1765
- },
1766
  "cosmos3_super_reasoner": {
1767
  "raw": null,
1768
  "metric_key": "macro_f1",
@@ -1819,6 +1818,17 @@
1819
  "raw_text": "0.0381",
1820
  "status_label": "scored"
1821
  },
1822
  "raw128_simple": {
1823
  "raw": 0.012611998261547169,
1824
  "metric_key": "macro_f1",
@@ -1841,17 +1851,6 @@
1841
  "raw_text": "0.0098",
1842
  "status_label": "proxy scored"
1843
  },
1844
- "metadata128_simple": {
1845
- "raw": null,
1846
- "metric_key": "macro_f1",
1847
- "source": null,
1848
- "scope": "multi_episode_128_metadata_baseline",
1849
- "status": "not_supported_by_metadata_only_package",
1850
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1851
- "normalized_score": null,
1852
- "raw_text": "n/a",
1853
- "status_label": "not supported"
1854
- },
1855
  "metadata128_neural_mlp": {
1856
  "raw": null,
1857
  "metric_key": "macro_f1",
@@ -1952,6 +1951,28 @@
1952
  "raw_text": "0.0000",
1953
  "status_label": "scored"
1954
  },
1955
  "raw128_simple": {
1956
  "raw": 0.0,
1957
  "metric_key": "macro_f1",
@@ -1974,28 +1995,6 @@
1974
  "raw_text": "0.0000",
1975
  "status_label": "scored"
1976
  },
1977
- "metadata128_simple": {
1978
- "raw": null,
1979
- "metric_key": "macro_f1",
1980
- "source": null,
1981
- "scope": "multi_episode_128_metadata_baseline",
1982
- "status": "not_supported_by_metadata_only_package",
1983
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1984
- "normalized_score": null,
1985
- "raw_text": "n/a",
1986
- "status_label": "not supported"
1987
- },
1988
- "metadata128_neural_mlp": {
1989
- "raw": null,
1990
- "metric_key": "macro_f1",
1991
- "source": null,
1992
- "scope": "multi_episode_128_metadata_baseline",
1993
- "status": "not_supported_by_metadata_only_package",
1994
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1995
- "normalized_score": null,
1996
- "raw_text": "n/a",
1997
- "status_label": "not supported"
1998
- },
1999
  "cosmos3_nano_future_window": {
2000
  "raw": null,
2001
  "metric_key": "macro_f1",
@@ -2052,6 +2051,28 @@
2052
  "raw_text": "0.1659",
2053
  "status_label": "scored"
2054
  },
2055
  "raw128_simple": {
2056
  "raw": 0.06469493412657774,
2057
  "metric_key": "micro_f1",
@@ -2074,28 +2095,6 @@
2074
  "raw_text": "0.1752",
2075
  "status_label": "scored"
2076
  },
2077
- "metadata128_simple": {
2078
- "raw": null,
2079
- "metric_key": "micro_f1",
2080
- "source": null,
2081
- "scope": "multi_episode_128_metadata_baseline",
2082
- "status": "not_supported_by_metadata_only_package",
2083
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2084
- "normalized_score": null,
2085
- "raw_text": "n/a",
2086
- "status_label": "not supported"
2087
- },
2088
- "metadata128_neural_mlp": {
2089
- "raw": null,
2090
- "metric_key": "micro_f1",
2091
- "source": null,
2092
- "scope": "multi_episode_128_metadata_baseline",
2093
- "status": "not_supported_by_metadata_only_package",
2094
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2095
- "normalized_score": null,
2096
- "raw_text": "n/a",
2097
- "status_label": "not supported"
2098
- },
2099
  "cosmos3_super_reasoner": {
2100
  "raw": null,
2101
  "metric_key": "micro_f1",
@@ -2152,6 +2151,17 @@
2152
  "raw_text": "0.0426",
2153
  "status_label": "scored"
2154
  },
2155
  "raw128_simple": {
2156
  "raw": 0.22941437363624573,
2157
  "metric_key": "mae",
@@ -2174,17 +2184,6 @@
2174
  "raw_text": "0.2530",
2175
  "status_label": "scored"
2176
  },
2177
- "metadata128_simple": {
2178
- "raw": null,
2179
- "metric_key": "mae",
2180
- "source": null,
2181
- "scope": "multi_episode_128_metadata_baseline",
2182
- "status": "not_supported_by_metadata_only_package",
2183
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2184
- "normalized_score": null,
2185
- "raw_text": "n/a",
2186
- "status_label": "not supported"
2187
- },
2188
  "metadata128_neural_mlp": {
2189
  "raw": null,
2190
  "metric_key": "mae",
@@ -2263,6 +2262,17 @@
2263
  "raw_text": "0.2409",
2264
  "status_label": "scored"
2265
  },
2266
  "raw128_simple": {
2267
  "raw": 0.0026625150348991156,
2268
  "metric_key": "mrr",
@@ -2285,17 +2295,6 @@
2285
  "raw_text": "0.0025",
2286
  "status_label": "proxy scored"
2287
  },
2288
- "metadata128_simple": {
2289
- "raw": null,
2290
- "metric_key": "mrr",
2291
- "source": null,
2292
- "scope": "multi_episode_128_metadata_baseline",
2293
- "status": "not_supported_by_metadata_only_package",
2294
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2295
- "normalized_score": null,
2296
- "raw_text": "n/a",
2297
- "status_label": "not supported"
2298
- },
2299
  "metadata128_neural_mlp": {
2300
  "raw": null,
2301
  "metric_key": "mrr",
@@ -2385,6 +2384,28 @@
2385
  "raw_text": "134.07",
2386
  "status_label": "scored"
2387
  },
2388
  "raw128_simple": {
2389
  "raw": 52.32759475708008,
2390
  "metric_key": "mae",
@@ -2407,28 +2428,6 @@
2407
  "raw_text": "42.37",
2408
  "status_label": "scored"
2409
  },
2410
- "metadata128_simple": {
2411
- "raw": null,
2412
- "metric_key": "mae",
2413
- "source": null,
2414
- "scope": "multi_episode_128_metadata_baseline",
2415
- "status": "not_supported_by_metadata_only_package",
2416
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2417
- "normalized_score": null,
2418
- "raw_text": "n/a",
2419
- "status_label": "not supported"
2420
- },
2421
- "metadata128_neural_mlp": {
2422
- "raw": null,
2423
- "metric_key": "mae",
2424
- "source": null,
2425
- "scope": "multi_episode_128_metadata_baseline",
2426
- "status": "not_supported_by_metadata_only_package",
2427
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2428
- "normalized_score": null,
2429
- "raw_text": "n/a",
2430
- "status_label": "not supported"
2431
- },
2432
  "cosmos3_super_reasoner": {
2433
  "raw": null,
2434
  "metric_key": "mae",
@@ -2459,7 +2458,7 @@
2459
  "id": "metadata128_simple",
2460
  "title": "128ep Metadata Simple",
2461
  "status": "a100_rerun_pass",
2462
- "coverage": "20 records / 8 scored JSONL-supported axes",
2463
  "headline": "34,269 rows; train/val/test 25,629/4,608/4,032",
2464
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2465
  },
@@ -2467,7 +2466,7 @@
2467
  "id": "metadata128_neural_mlp",
2468
  "title": "128ep Metadata NN",
2469
  "status": "a100_rerun_pass",
2470
- "coverage": "20 records / 8 scored JSONL-supported axes",
2471
  "headline": "compact MLP heads over metadata/text features",
2472
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2473
  },
@@ -4508,17 +4507,17 @@
4508
  "task_label": "Long-Horizon Next-Action Forecasting",
4509
  "series_id": "metadata128_simple",
4510
  "method": "128ep Metadata Simple",
4511
- "status": "not_supported_by_metadata_only_package",
4512
- "status_label": "not supported",
4513
- "scored": false,
4514
  "proxy_scored": false,
4515
- "raw": null,
4516
- "raw_text": "n/a",
4517
- "normalized_score": null,
4518
  "metric_key": "macro_f1",
4519
- "source": null,
4520
  "scope": "multi_episode_128_metadata_baseline",
4521
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4522
  },
4523
  {
4524
  "task_number": 13,
@@ -4526,17 +4525,17 @@
4526
  "task_label": "Long-Horizon Next-Action Forecasting",
4527
  "series_id": "metadata128_neural_mlp",
4528
  "method": "128ep Metadata NN",
4529
- "status": "not_supported_by_metadata_only_package",
4530
- "status_label": "not supported",
4531
- "scored": false,
4532
  "proxy_scored": false,
4533
- "raw": null,
4534
- "raw_text": "n/a",
4535
- "normalized_score": null,
4536
  "metric_key": "macro_f1",
4537
- "source": null,
4538
  "scope": "multi_episode_128_metadata_baseline",
4539
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4540
  },
4541
  {
4542
  "task_number": 13,
@@ -4670,17 +4669,17 @@
4670
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4671
  "series_id": "metadata128_simple",
4672
  "method": "128ep Metadata Simple",
4673
- "status": "not_supported_by_metadata_only_package",
4674
- "status_label": "not supported",
4675
- "scored": false,
4676
  "proxy_scored": false,
4677
- "raw": null,
4678
- "raw_text": "n/a",
4679
- "normalized_score": null,
4680
  "metric_key": "macro_f1",
4681
- "source": null,
4682
  "scope": "multi_episode_128_metadata_baseline",
4683
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4684
  },
4685
  {
4686
  "task_number": 14,
@@ -4688,17 +4687,17 @@
4688
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4689
  "series_id": "metadata128_neural_mlp",
4690
  "method": "128ep Metadata NN",
4691
- "status": "not_supported_by_metadata_only_package",
4692
- "status_label": "not supported",
4693
- "scored": false,
4694
  "proxy_scored": false,
4695
- "raw": null,
4696
- "raw_text": "n/a",
4697
- "normalized_score": null,
4698
  "metric_key": "macro_f1",
4699
- "source": null,
4700
  "scope": "multi_episode_128_metadata_baseline",
4701
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4702
  },
4703
  {
4704
  "task_number": 14,
@@ -4832,17 +4831,17 @@
4832
  "task_label": "Interaction Text Prediction",
4833
  "series_id": "metadata128_simple",
4834
  "method": "128ep Metadata Simple",
4835
- "status": "not_supported_by_metadata_only_package",
4836
- "status_label": "not supported",
4837
  "scored": false,
4838
  "proxy_scored": false,
4839
  "raw": null,
4840
  "raw_text": "n/a",
4841
  "normalized_score": null,
4842
  "metric_key": "macro_f1",
4843
- "source": null,
4844
  "scope": "multi_episode_128_metadata_baseline",
4845
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4846
  },
4847
  {
4848
  "task_number": 15,
@@ -4994,17 +4993,17 @@
4994
  "task_label": "Action-Object Relation Prediction",
4995
  "series_id": "metadata128_simple",
4996
  "method": "128ep Metadata Simple",
4997
- "status": "not_supported_by_metadata_only_package",
4998
- "status_label": "not supported",
4999
- "scored": false,
5000
  "proxy_scored": false,
5001
- "raw": null,
5002
- "raw_text": "n/a",
5003
- "normalized_score": null,
5004
  "metric_key": "macro_f1",
5005
- "source": null,
5006
  "scope": "multi_episode_128_metadata_baseline",
5007
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5008
  },
5009
  {
5010
  "task_number": 16,
@@ -5012,17 +5011,17 @@
5012
  "task_label": "Action-Object Relation Prediction",
5013
  "series_id": "metadata128_neural_mlp",
5014
  "method": "128ep Metadata NN",
5015
- "status": "not_supported_by_metadata_only_package",
5016
- "status_label": "not supported",
5017
- "scored": false,
5018
  "proxy_scored": false,
5019
- "raw": null,
5020
- "raw_text": "n/a",
5021
- "normalized_score": null,
5022
  "metric_key": "macro_f1",
5023
- "source": null,
5024
  "scope": "multi_episode_128_metadata_baseline",
5025
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5026
  },
5027
  {
5028
  "task_number": 16,
@@ -5156,17 +5155,17 @@
5156
  "task_label": "Future Object-Set Forecasting",
5157
  "series_id": "metadata128_simple",
5158
  "method": "128ep Metadata Simple",
5159
- "status": "not_supported_by_metadata_only_package",
5160
- "status_label": "not supported",
5161
- "scored": false,
5162
  "proxy_scored": false,
5163
- "raw": null,
5164
- "raw_text": "n/a",
5165
- "normalized_score": null,
5166
  "metric_key": "micro_f1",
5167
- "source": null,
5168
  "scope": "multi_episode_128_metadata_baseline",
5169
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5170
  },
5171
  {
5172
  "task_number": 17,
@@ -5174,17 +5173,17 @@
5174
  "task_label": "Future Object-Set Forecasting",
5175
  "series_id": "metadata128_neural_mlp",
5176
  "method": "128ep Metadata NN",
5177
- "status": "not_supported_by_metadata_only_package",
5178
- "status_label": "not supported",
5179
- "scored": false,
5180
  "proxy_scored": false,
5181
- "raw": null,
5182
- "raw_text": "n/a",
5183
- "normalized_score": null,
5184
  "metric_key": "micro_f1",
5185
- "source": null,
5186
  "scope": "multi_episode_128_metadata_baseline",
5187
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5188
  },
5189
  {
5190
  "task_number": 17,
@@ -5318,17 +5317,17 @@
5318
  "task_label": "IMU-to-Hand Pose Reconstruction",
5319
  "series_id": "metadata128_simple",
5320
  "method": "128ep Metadata Simple",
5321
- "status": "not_supported_by_metadata_only_package",
5322
- "status_label": "not supported",
5323
  "scored": false,
5324
  "proxy_scored": false,
5325
  "raw": null,
5326
  "raw_text": "n/a",
5327
  "normalized_score": null,
5328
  "metric_key": "mae",
5329
- "source": null,
5330
  "scope": "multi_episode_128_metadata_baseline",
5331
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5332
  },
5333
  {
5334
  "task_number": 18,
@@ -5480,17 +5479,17 @@
5480
  "task_label": "Camera-View Synchronization Retrieval",
5481
  "series_id": "metadata128_simple",
5482
  "method": "128ep Metadata Simple",
5483
- "status": "not_supported_by_metadata_only_package",
5484
- "status_label": "not supported",
5485
  "scored": false,
5486
  "proxy_scored": false,
5487
  "raw": null,
5488
  "raw_text": "n/a",
5489
  "normalized_score": null,
5490
  "metric_key": "mrr",
5491
- "source": null,
5492
  "scope": "multi_episode_128_metadata_baseline",
5493
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5494
  },
5495
  {
5496
  "task_number": 19,
@@ -5642,17 +5641,17 @@
5642
  "task_label": "Time-to-Next-Transition Regression",
5643
  "series_id": "metadata128_simple",
5644
  "method": "128ep Metadata Simple",
5645
- "status": "not_supported_by_metadata_only_package",
5646
- "status_label": "not supported",
5647
- "scored": false,
5648
  "proxy_scored": false,
5649
- "raw": null,
5650
- "raw_text": "n/a",
5651
- "normalized_score": null,
5652
  "metric_key": "mae",
5653
- "source": null,
5654
  "scope": "multi_episode_128_metadata_baseline",
5655
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5656
  },
5657
  {
5658
  "task_number": 20,
@@ -5660,17 +5659,17 @@
5660
  "task_label": "Time-to-Next-Transition Regression",
5661
  "series_id": "metadata128_neural_mlp",
5662
  "method": "128ep Metadata NN",
5663
- "status": "not_supported_by_metadata_only_package",
5664
- "status_label": "not supported",
5665
- "scored": false,
5666
  "proxy_scored": false,
5667
- "raw": null,
5668
- "raw_text": "n/a",
5669
- "normalized_score": null,
5670
  "metric_key": "mae",
5671
- "source": null,
5672
  "scope": "multi_episode_128_metadata_baseline",
5673
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5674
  },
5675
  {
5676
  "task_number": 20,
 
1
  {
2
  "title": "Unified 20-Task Model Radar",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
+ "scored_method_task_count": 133,
9
  "normalization_policy": {
10
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
11
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
 
73
  "method_detail": "128-episode JSONL metadata/text simple baselines.",
74
  "plotted_as": "colored point overlay",
75
  "result_record_count": 20,
76
+ "scored_task_count": 13,
77
+ "covered_task_count": 13,
78
  "proxy_scored_task_count": 0,
79
+ "scoreless_task_count": 7,
80
+ "unsupported_task_count": 7,
81
  "not_evaluated_task_count": 0,
82
  "status_counts": {
83
+ "scored": 13,
84
+ "unsupported_without_required_target": 7
 
85
  },
86
+ "coverage_fraction": 0.65,
87
  "result_record_fraction": 1.0
88
  },
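The summary fields in the record above look mutually consistent: 13 scored tasks out of 20 result records gives the reported `coverage_fraction` of 0.65, and the 7 remaining tasks match the unsupported count. A minimal Python sketch of that arithmetic follows. It assumes coverage is simply scored records divided by total records; the helper name `summarize_coverage` is hypothetical and not taken from the repository, and how proxy-scored tasks would be counted is not visible in this diff.

```python
def summarize_coverage(status_counts, result_record_count):
    """Derive summary counts from a status_counts mapping.

    Assumption (not confirmed by the source): coverage_fraction is the
    number of records with status "scored" divided by the record count.
    """
    scored = status_counts.get("scored", 0)
    scoreless = result_record_count - scored
    return {
        "scored_task_count": scored,
        "scoreless_task_count": scoreless,
        "coverage_fraction": scored / result_record_count,
    }


# Values copied from the metadata128_simple record in this diff.
summary = summarize_coverage(
    {"scored": 13, "unsupported_without_required_target": 7},
    result_record_count=20,
)
print(summary)
```

Under that assumption the sketch reproduces the three values shown for both metadata baselines (13, 7, and 0.65).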
89
  {
 
97
  "method_detail": "128-episode JSONL metadata/text MLP baselines.",
98
  "plotted_as": "colored point overlay",
99
  "result_record_count": 20,
100
+ "scored_task_count": 13,
101
+ "covered_task_count": 13,
102
  "proxy_scored_task_count": 0,
103
+ "scoreless_task_count": 7,
104
+ "unsupported_task_count": 7,
105
  "not_evaluated_task_count": 0,
106
  "status_counts": {
107
+ "not_supported_by_metadata_only_package": 7,
108
+ "scored": 13
109
  },
110
+ "coverage_fraction": 0.65,
111
  "result_record_fraction": 1.0
112
  },
113
  {
 
1607
  "raw_text": "0.0023",
1608
  "status_label": "scored"
1609
  },
1610
+ "metadata128_simple": {
1611
+ "raw": 0.004579592783699693,
1612
+ "metric_key": "macro_f1",
1613
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
1614
+ "scope": "multi_episode_128_metadata_baseline",
1615
+ "status": "scored",
1616
+ "reason": null,
1617
+ "normalized_score": 0.004579592783699693,
1618
+ "raw_text": "0.0046",
1619
+ "status_label": "scored"
1620
+ },
1621
+ "metadata128_neural_mlp": {
1622
+ "raw": 0.0029821307969142615,
1623
+ "metric_key": "macro_f1",
1624
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
1625
+ "scope": "multi_episode_128_metadata_baseline",
1626
+ "status": "scored",
1627
+ "reason": null,
1628
+ "normalized_score": 0.0029821307969142615,
1629
+ "raw_text": "0.0030",
1630
+ "status_label": "scored"
1631
+ },
1632
  "raw128_simple": {
1633
  "raw": 0.0024280172369056294,
1634
  "metric_key": "macro_f1",
 
1651
  "raw_text": "0.0011",
1652
  "status_label": "scored"
1653
  },
1654
  "cosmos3_super_reasoner": {
1655
  "raw": null,
1656
  "metric_key": "macro_f1",
 
1718
  "raw_text": "0.0042",
1719
  "status_label": "scored"
1720
  },
1721
+ "metadata128_simple": {
1722
+ "raw": 0.0001206030150753769,
1723
+ "metric_key": "macro_f1",
1724
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
1725
+ "scope": "multi_episode_128_metadata_baseline",
1726
+ "status": "scored",
1727
+ "reason": null,
1728
+ "normalized_score": 0.0001206030150753769,
1729
+ "raw_text": "0.0001",
1730
+ "status_label": "scored"
1731
+ },
1732
+ "metadata128_neural_mlp": {
1733
+ "raw": 2.086049543676662e-05,
1734
+ "metric_key": "macro_f1",
1735
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
1736
+ "scope": "multi_episode_128_metadata_baseline",
1737
+ "status": "scored",
1738
+ "reason": null,
1739
+ "normalized_score": 2.086049543676662e-05,
1740
+ "raw_text": "0.0000",
1741
+ "status_label": "scored"
1742
+ },
1743
  "raw128_simple": {
1744
  "raw": 0.0,
1745
  "metric_key": "macro_f1",
 
1762
  "raw_text": "0.0000",
1763
  "status_label": "scored"
1764
  },
1765
  "cosmos3_super_reasoner": {
1766
  "raw": null,
1767
  "metric_key": "macro_f1",
 
1818
  "raw_text": "0.0381",
1819
  "status_label": "scored"
1820
  },
1821
+ "metadata128_simple": {
1822
+ "raw": null,
1823
+ "metric_key": "macro_f1",
1824
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
1825
+ "scope": "multi_episode_128_metadata_baseline",
1826
+ "status": "unsupported_without_required_target",
1827
+ "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
1828
+ "normalized_score": null,
1829
+ "raw_text": "n/a",
1830
+ "status_label": "unsupported"
1831
+ },
1832
  "raw128_simple": {
1833
  "raw": 0.012611998261547169,
1834
  "metric_key": "macro_f1",
 
1851
  "raw_text": "0.0098",
1852
  "status_label": "proxy scored"
1853
  },
1854
  "metadata128_neural_mlp": {
1855
  "raw": null,
1856
  "metric_key": "macro_f1",
 
1951
  "raw_text": "0.0000",
1952
  "status_label": "scored"
1953
  },
1954
+ "metadata128_simple": {
1955
+ "raw": 0.0,
1956
+ "metric_key": "macro_f1",
1957
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
1958
+ "scope": "multi_episode_128_metadata_baseline",
1959
+ "status": "scored",
1960
+ "reason": null,
1961
+ "normalized_score": 0.0,
1962
+ "raw_text": "0.0000",
1963
+ "status_label": "scored"
1964
+ },
1965
+ "metadata128_neural_mlp": {
1966
+ "raw": 0.0,
1967
+ "metric_key": "macro_f1",
1968
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
1969
+ "scope": "multi_episode_128_metadata_baseline",
1970
+ "status": "scored",
1971
+ "reason": null,
1972
+ "normalized_score": 0.0,
1973
+ "raw_text": "0.0000",
1974
+ "status_label": "scored"
1975
+ },
1976
  "raw128_simple": {
1977
  "raw": 0.0,
1978
  "metric_key": "macro_f1",
 
1995
  "raw_text": "0.0000",
1996
  "status_label": "scored"
1997
  },
1998
  "cosmos3_nano_future_window": {
1999
  "raw": null,
2000
  "metric_key": "macro_f1",
 
2051
  "raw_text": "0.1659",
2052
  "status_label": "scored"
2053
  },
2054
+ "metadata128_simple": {
2055
+ "raw": 0.17656983343047333,
2056
+ "metric_key": "micro_f1",
2057
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
2058
+ "scope": "multi_episode_128_metadata_baseline",
2059
+ "status": "scored",
2060
+ "reason": null,
2061
+ "normalized_score": 0.17656983343047333,
2062
+ "raw_text": "0.1766",
2063
+ "status_label": "scored"
2064
+ },
2065
+ "metadata128_neural_mlp": {
2066
+ "raw": 0.17418550827844048,
2067
+ "metric_key": "micro_f1",
2068
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
2069
+ "scope": "multi_episode_128_metadata_baseline",
2070
+ "status": "scored",
2071
+ "reason": null,
2072
+ "normalized_score": 0.17418550827844048,
2073
+ "raw_text": "0.1742",
2074
+ "status_label": "scored"
2075
+ },
2076
  "raw128_simple": {
2077
  "raw": 0.06469493412657774,
2078
  "metric_key": "micro_f1",
 
2095
  "raw_text": "0.1752",
2096
  "status_label": "scored"
2097
  },
2098
  "cosmos3_super_reasoner": {
2099
  "raw": null,
2100
  "metric_key": "micro_f1",
 
2151
  "raw_text": "0.0426",
2152
  "status_label": "scored"
2153
  },
2154
+ "metadata128_simple": {
2155
+ "raw": null,
2156
+ "metric_key": "mae",
2157
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
2158
+ "scope": "multi_episode_128_metadata_baseline",
2159
+ "status": "unsupported_without_required_target",
2160
+ "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package",
2161
+ "normalized_score": null,
2162
+ "raw_text": "n/a",
2163
+ "status_label": "unsupported"
2164
+ },
2165
  "raw128_simple": {
2166
  "raw": 0.22941437363624573,
2167
  "metric_key": "mae",
 
2184
  "raw_text": "0.2530",
2185
  "status_label": "scored"
2186
  },
2187
  "metadata128_neural_mlp": {
2188
  "raw": null,
2189
  "metric_key": "mae",
 
2262
  "raw_text": "0.2409",
2263
  "status_label": "scored"
2264
  },
2265
+ "metadata128_simple": {
2266
+ "raw": null,
2267
+ "metric_key": "mrr",
2268
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
2269
+ "scope": "multi_episode_128_metadata_baseline",
2270
+ "status": "unsupported_without_required_target",
2271
+ "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
2272
+ "normalized_score": null,
2273
+ "raw_text": "n/a",
2274
+ "status_label": "unsupported"
2275
+ },
2276
  "raw128_simple": {
2277
  "raw": 0.0026625150348991156,
2278
  "metric_key": "mrr",
 
2295
  "raw_text": "0.0025",
2296
  "status_label": "proxy scored"
2297
  },
2298
  "metadata128_neural_mlp": {
2299
  "raw": null,
2300
  "metric_key": "mrr",
 
2384
  "raw_text": "134.07",
2385
  "status_label": "scored"
2386
  },
2387
+ "metadata128_simple": {
2388
+ "raw": 624.8108520507812,
2389
+ "metric_key": "mae",
2390
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
2391
+ "scope": "multi_episode_128_metadata_baseline",
2392
+ "status": "scored",
2393
+ "reason": null,
2394
+ "normalized_score": 0.016864874132806403,
2395
+ "raw_text": "624.81",
2396
+ "status_label": "scored"
2397
+ },
2398
+ "metadata128_neural_mlp": {
2399
+ "raw": 41.4664421081543,
2400
+ "metric_key": "mae",
2401
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
2402
+ "scope": "multi_episode_128_metadata_baseline",
2403
+ "status": "scored",
2404
+ "reason": null,
2405
+ "normalized_score": 0.25411768748242325,
2406
+ "raw_text": "41.47",
2407
+ "status_label": "scored"
2408
+ },
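The two `mae` records above can be checked against the file's stated `normalization_policy`, which says lower-error metrics are converted to `best_observed_value / raw_value` within the same task and bounded metrics are clipped to [0, 1]. The Python sketch below is an illustration under that reading, not the repository's implementation; the function names are hypothetical. It recovers the best observed MAE implied by each record and confirms that both records imply the same value (about 10.54).

```python
def normalize_higher_is_better(raw):
    """Clip a bounded metric (e.g. macro_f1) to the [0, 1] plotting axis."""
    return min(1.0, max(0.0, raw))


def normalize_lower_is_better(raw, best_observed):
    """Map an error metric (e.g. mae) to best_observed / raw."""
    return best_observed / raw


# Raw MAE and normalized score copied from the two records in this diff.
simple_raw, simple_norm = 624.8108520507812, 0.016864874132806403
mlp_raw, mlp_norm = 41.4664421081543, 0.25411768748242325

# Invert the policy: best_observed = normalized_score * raw.
implied_best_simple = simple_norm * simple_raw
implied_best_mlp = mlp_norm * mlp_raw

print(round(implied_best_simple, 4), round(implied_best_mlp, 4))
```

Because both records imply the same best observed value, the normalized scores are comparable within this task; the best-scoring method itself is not shown in this hunk.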
2409
  "raw128_simple": {
2410
  "raw": 52.32759475708008,
2411
  "metric_key": "mae",
 
2428
  "raw_text": "42.37",
2429
  "status_label": "scored"
2430
  },
2431
  "cosmos3_super_reasoner": {
2432
  "raw": null,
2433
  "metric_key": "mae",
 
2458
  "id": "metadata128_simple",
2459
  "title": "128ep Metadata Simple",
2460
  "status": "a100_rerun_pass",
2461
+ "coverage": "20 records / 13 scored JSONL-supported axes",
2462
  "headline": "34,269 rows; train/val/test 25,629/4,608/4,032",
2463
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2464
  },
 
2466
  "id": "metadata128_neural_mlp",
2467
  "title": "128ep Metadata NN",
2468
  "status": "a100_rerun_pass",
2469
+ "coverage": "20 records / 13 scored JSONL-supported axes",
2470
  "headline": "compact MLP heads over metadata/text features",
2471
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2472
  },
 
4507
  "task_label": "Long-Horizon Next-Action Forecasting",
4508
  "series_id": "metadata128_simple",
4509
  "method": "128ep Metadata Simple",
4510
+ "status": "scored",
4511
+ "status_label": "scored",
4512
+ "scored": true,
4513
  "proxy_scored": false,
4514
+ "raw": 0.004579592783699693,
4515
+ "raw_text": "0.0046",
4516
+ "normalized_score": 0.004579592783699693,
4517
  "metric_key": "macro_f1",
4518
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
4519
  "scope": "multi_episode_128_metadata_baseline",
4520
+ "reason": null
4521
  },
4522
  {
4523
  "task_number": 13,
 
4525
  "task_label": "Long-Horizon Next-Action Forecasting",
4526
  "series_id": "metadata128_neural_mlp",
4527
  "method": "128ep Metadata NN",
4528
+ "status": "scored",
4529
+ "status_label": "scored",
4530
+ "scored": true,
4531
  "proxy_scored": false,
4532
+ "raw": 0.0029821307969142615,
4533
+ "raw_text": "0.0030",
4534
+ "normalized_score": 0.0029821307969142615,
4535
  "metric_key": "macro_f1",
4536
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
4537
  "scope": "multi_episode_128_metadata_baseline",
4538
+ "reason": null
4539
  },
4540
  {
4541
  "task_number": 13,
 
4669
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4670
  "series_id": "metadata128_simple",
4671
  "method": "128ep Metadata Simple",
4672
+ "status": "scored",
4673
+ "status_label": "scored",
4674
+ "scored": true,
4675
  "proxy_scored": false,
4676
+ "raw": 0.0001206030150753769,
4677
+ "raw_text": "0.0001",
4678
+ "normalized_score": 0.0001206030150753769,
4679
  "metric_key": "macro_f1",
4680
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
4681
  "scope": "multi_episode_128_metadata_baseline",
4682
+ "reason": null
4683
  },
4684
  {
4685
  "task_number": 14,
 
4687
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4688
  "series_id": "metadata128_neural_mlp",
4689
  "method": "128ep Metadata NN",
4690
+ "status": "scored",
4691
+ "status_label": "scored",
4692
+ "scored": true,
4693
  "proxy_scored": false,
4694
+ "raw": 2.086049543676662e-05,
4695
+ "raw_text": "0.0000",
4696
+ "normalized_score": 2.086049543676662e-05,
4697
  "metric_key": "macro_f1",
4698
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
4699
  "scope": "multi_episode_128_metadata_baseline",
4700
+ "reason": null
4701
  },
4702
  {
4703
  "task_number": 14,
 
4831
  "task_label": "Interaction Text Prediction",
4832
  "series_id": "metadata128_simple",
4833
  "method": "128ep Metadata Simple",
4834
+ "status": "unsupported_without_required_target",
4835
+ "status_label": "unsupported",
4836
  "scored": false,
4837
  "proxy_scored": false,
4838
  "raw": null,
4839
  "raw_text": "n/a",
4840
  "normalized_score": null,
4841
  "metric_key": "macro_f1",
4842
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
4843
  "scope": "multi_episode_128_metadata_baseline",
4844
+ "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
4845
  },
4846
  {
4847
  "task_number": 15,
 
4993
  "task_label": "Action-Object Relation Prediction",
4994
  "series_id": "metadata128_simple",
4995
  "method": "128ep Metadata Simple",
4996
+ "status": "scored",
4997
+ "status_label": "scored",
4998
+ "scored": true,
4999
  "proxy_scored": false,
5000
+ "raw": 0.0,
5001
+ "raw_text": "0.0000",
5002
+ "normalized_score": 0.0,
5003
  "metric_key": "macro_f1",
5004
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
5005
  "scope": "multi_episode_128_metadata_baseline",
5006
+ "reason": null
5007
  },
5008
  {
5009
  "task_number": 16,
 
5011
  "task_label": "Action-Object Relation Prediction",
5012
  "series_id": "metadata128_neural_mlp",
5013
  "method": "128ep Metadata NN",
5014
+ "status": "scored",
5015
+ "status_label": "scored",
5016
+ "scored": true,
5017
  "proxy_scored": false,
5018
+ "raw": 0.0,
5019
+ "raw_text": "0.0000",
5020
+ "normalized_score": 0.0,
5021
  "metric_key": "macro_f1",
5022
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
5023
  "scope": "multi_episode_128_metadata_baseline",
5024
+ "reason": null
5025
  },
5026
  {
5027
  "task_number": 16,
 
5155
  "task_label": "Future Object-Set Forecasting",
5156
  "series_id": "metadata128_simple",
5157
  "method": "128ep Metadata Simple",
5158
+ "status": "scored",
5159
+ "status_label": "scored",
5160
+ "scored": true,
5161
  "proxy_scored": false,
5162
+ "raw": 0.17656983343047333,
5163
+ "raw_text": "0.1766",
5164
+ "normalized_score": 0.17656983343047333,
5165
  "metric_key": "micro_f1",
5166
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
5167
  "scope": "multi_episode_128_metadata_baseline",
5168
+ "reason": null
5169
  },
5170
  {
5171
  "task_number": 17,
 
5173
  "task_label": "Future Object-Set Forecasting",
5174
  "series_id": "metadata128_neural_mlp",
5175
  "method": "128ep Metadata NN",
5176
+ "status": "scored",
5177
+ "status_label": "scored",
5178
+ "scored": true,
5179
  "proxy_scored": false,
5180
+ "raw": 0.17418550827844048,
5181
+ "raw_text": "0.1742",
5182
+ "normalized_score": 0.17418550827844048,
5183
  "metric_key": "micro_f1",
5184
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
5185
  "scope": "multi_episode_128_metadata_baseline",
5186
+ "reason": null
5187
  },
5188
  {
5189
  "task_number": 17,
 
5317
  "task_label": "IMU-to-Hand Pose Reconstruction",
5318
  "series_id": "metadata128_simple",
5319
  "method": "128ep Metadata Simple",
5320
+ "status": "unsupported_without_required_target",
5321
+ "status_label": "unsupported",
5322
  "scored": false,
5323
  "proxy_scored": false,
5324
  "raw": null,
5325
  "raw_text": "n/a",
5326
  "normalized_score": null,
5327
  "metric_key": "mae",
5328
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
5329
  "scope": "multi_episode_128_metadata_baseline",
5330
+ "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package"
5331
  },
5332
  {
5333
  "task_number": 18,
 
5479
  "task_label": "Camera-View Synchronization Retrieval",
5480
  "series_id": "metadata128_simple",
5481
  "method": "128ep Metadata Simple",
5482
+ "status": "unsupported_without_required_target",
5483
+ "status_label": "unsupported",
5484
  "scored": false,
5485
  "proxy_scored": false,
5486
  "raw": null,
5487
  "raw_text": "n/a",
5488
  "normalized_score": null,
5489
  "metric_key": "mrr",
5490
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
5491
  "scope": "multi_episode_128_metadata_baseline",
5492
+ "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
5493
  },
5494
  {
5495
  "task_number": 19,
 
5641
  "task_label": "Time-to-Next-Transition Regression",
5642
  "series_id": "metadata128_simple",
5643
  "method": "128ep Metadata Simple",
5644
+ "status": "scored",
5645
+ "status_label": "scored",
5646
+ "scored": true,
5647
  "proxy_scored": false,
5648
+ "raw": 624.8108520507812,
5649
+ "raw_text": "624.81",
5650
+ "normalized_score": 0.016864874132806403,
5651
  "metric_key": "mae",
5652
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
5653
  "scope": "multi_episode_128_metadata_baseline",
5654
+ "reason": null
5655
  },
5656
  {
5657
  "task_number": 20,
 
5659
  "task_label": "Time-to-Next-Transition Regression",
5660
  "series_id": "metadata128_neural_mlp",
5661
  "method": "128ep Metadata NN",
5662
+ "status": "scored",
5663
+ "status_label": "scored",
5664
+ "scored": true,
5665
  "proxy_scored": false,
5666
+ "raw": 41.4664421081543,
5667
+ "raw_text": "41.47",
5668
+ "normalized_score": 0.25411768748242325,
5669
  "metric_key": "mae",
5670
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
5671
  "scope": "multi_episode_128_metadata_baseline",
5672
+ "reason": null
5673
  },
5674
  {
5675
  "task_number": 20,
metrics/website_integrity.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-18T11:41:43+00:00",
4
  "docs_root": "docs",
5
  "site_base": "/ropedia-xperience-10m-task-suite/",
6
  "summary": {
@@ -301,7 +301,7 @@
301
  },
302
  {
303
  "path": "data/artifact_index.json",
304
- "bytes": 116109,
305
  "top_level_type": "dict"
306
  },
307
  {
@@ -316,7 +316,7 @@
316
  },
317
  {
318
  "path": "data/episode128_task_model_radar.json",
319
- "bytes": 187099,
320
  "top_level_type": "dict"
321
  },
322
  {
@@ -486,12 +486,12 @@
486
  },
487
  {
488
  "path": "data/task_method_20_gap_audit.json",
489
- "bytes": 50687,
490
  "top_level_type": "dict"
491
  },
492
  {
493
  "path": "data/task_method_20_result_matrix.json",
494
- "bytes": 129600,
495
  "top_level_type": "dict"
496
  },
497
  {
@@ -526,7 +526,7 @@
526
  },
527
  {
528
  "path": "data/unified_task_model_radar.json",
529
- "bytes": 230951,
530
  "top_level_type": "dict"
531
  },
532
  {
@@ -571,7 +571,7 @@
571
  {
572
  "path": "assets/charts/episode128_task_model_radar.svg",
573
  "exists": true,
574
- "bytes": 44825,
575
  "format": "SVG",
576
  "has_viewbox": true
577
  },
@@ -641,7 +641,7 @@
641
  {
642
  "path": "assets/charts/unified_task_model_radar.svg",
643
  "exists": true,
644
- "bytes": 50841,
645
  "format": "SVG",
646
  "has_viewbox": true
647
  },
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-18T12:09:46+00:00",
4
  "docs_root": "docs",
5
  "site_base": "/ropedia-xperience-10m-task-suite/",
6
  "summary": {
 
301
  },
302
  {
303
  "path": "data/artifact_index.json",
304
+ "bytes": 116110,
305
  "top_level_type": "dict"
306
  },
307
  {
 
316
  },
317
  {
318
  "path": "data/episode128_task_model_radar.json",
319
+ "bytes": 186443,
320
  "top_level_type": "dict"
321
  },
322
  {
 
486
  },
487
  {
488
  "path": "data/task_method_20_gap_audit.json",
489
+ "bytes": 46902,
490
  "top_level_type": "dict"
491
  },
492
  {
493
  "path": "data/task_method_20_result_matrix.json",
494
+ "bytes": 129242,
495
  "top_level_type": "dict"
496
  },
497
  {
 
526
  },
527
  {
528
  "path": "data/unified_task_model_radar.json",
529
+ "bytes": 230297,
530
  "top_level_type": "dict"
531
  },
532
  {
 
571
  {
572
  "path": "assets/charts/episode128_task_model_radar.svg",
573
  "exists": true,
574
+ "bytes": 45937,
575
  "format": "SVG",
576
  "has_viewbox": true
577
  },
 
641
  {
642
  "path": "assets/charts/unified_task_model_radar.svg",
643
  "exists": true,
644
+ "bytes": 51953,
645
  "format": "SVG",
646
  "has_viewbox": true
647
  },
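Each entry in `website_integrity.json` records a `path`, a `bytes` count, and a `top_level_type` for a published JSON file. The audit script that produces these entries is not part of this diff, so the following is only a minimal sketch of how such an entry could be computed. The helper name `describe_json` and the example file are hypothetical; only the three field names come from the diff.

```python
import json
import os
import tempfile


def describe_json(root, relative_path):
    """Build one integrity entry (hypothetical helper, not the repo's auditor)."""
    full_path = os.path.join(root, relative_path)
    with open(full_path, "rb") as handle:
        raw = handle.read()
    parsed = json.loads(raw.decode("utf-8"))
    return {
        "path": relative_path,                     # path relative to the docs root
        "bytes": len(raw),                         # on-disk size in bytes
        "top_level_type": type(parsed).__name__,   # "dict" or "list" for typical JSON
    }


# Demonstrate on a throwaway file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "data"))
    payload = b'{"status": "pass"}'
    with open(os.path.join(root, "data", "example.json"), "wb") as handle:
        handle.write(payload)
    entry = describe_json(root, "data/example.json")
```

Because `bytes` is the raw file size, any change to a regenerated artifact, even a single character such as the `116109` to `116110` change for `data/artifact_index.json`, shows up as a changed line in this audit file.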
results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/BASELINE_ALIGNMENT_REPORT.md CHANGED
@@ -27,6 +27,14 @@ The runner uses the derived Qwen JSONL export and public-safe metadata. It does
27
  | Cross-Modal Reconstruction | `modality_reconstruction` | unsupported_without_raw_128_feature_blocks | | not_run | |
28
  | Temporal Order Verification | `temporal_order` | pass | 0.4199 | pass | 0.8252 |
29
  | Multimodal Synchronization Detection | `misalignment_detection` | unsupported_without_raw_128_feature_blocks | | not_run | |
30
 
31
  ## Interpretation
32
 
 
27
  | Cross-Modal Reconstruction | `modality_reconstruction` | unsupported_without_raw_128_feature_blocks | | not_run | |
28
  | Temporal Order Verification | `temporal_order` | pass | 0.4199 | pass | 0.8252 |
29
  | Multimodal Synchronization Detection | `misalignment_detection` | unsupported_without_raw_128_feature_blocks | | not_run | |
30
+ | Long Horizon Next Action | `long_horizon_next_action` | pass | 0.0046 | pass | 0.0030 |
31
+ | Next Subtask Forecast | `next_subtask_forecast` | pass | 0.0001 | pass | 0.0000 |
32
+ | Interaction Text Prediction | `interaction_text_prediction` | unsupported_without_raw_128_feature_blocks | | not_run | |
33
+ | Action Object Relation | `action_object_relation` | pass | 0.0000 | pass | 0.0000 |
34
+ | Object Set Forecast | `object_set_forecast` | pass | 0.1766 | pass | 0.1742 |
35
+ | Imu To Hand Pose | `imu_to_hand_pose` | unsupported_without_raw_128_feature_blocks | | not_run | |
36
+ | Camera View Sync Retrieval | `camera_view_sync_retrieval` | unsupported_without_raw_128_feature_blocks | | not_run | |
37
+ | Time To Transition | `time_to_transition` | pass | 624.8109 | pass | 41.4664 |
38
 
39
  ## Interpretation
40
 
results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "status": "unsupported_without_raw_128_feature_blocks",
3
+ "task": "camera_view_sync_retrieval",
4
+ "task_display_name": "Camera View Sync Retrieval",
5
+ "primary_metric": "mrr",
6
+ "primary_score": null,
7
+ "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
8
+ "source": "128_episode_qwen_jsonl_metadata"
9
+ }
results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/confusion_matrix.csv ADDED
The diff for this file is too large to render. See raw diff
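The confusion matrix itself is not rendered here, but the columns of the accompanying `per_class_metrics.csv` (`support`, `predicted`, `precision`, `recall`, `f1`) can be derived from one. The sketch below assumes rows are true classes and columns are predicted classes; that orientation, and the helper names, are assumptions rather than details confirmed by the source.

```python
def scores_from_counts(true_positives, predicted, support):
    """Return (precision, recall, f1), using 0.0 whenever a denominator is zero."""
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / support if support else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def per_class_rows(confusion):
    """Derive (support, predicted, precision, recall, f1) per class.

    Assumes confusion[i][j] counts windows of true class i predicted as class j.
    """
    rows = []
    for i, row in enumerate(confusion):
        support = sum(row)                           # windows whose true class is i
        predicted = sum(r[i] for r in confusion)     # windows predicted as class i
        rows.append((support, predicted) + scores_from_counts(row[i], predicted, support))
    return rows


# Toy 3-class matrix; the third class never occurs as a true label.
toy = [
    [2, 1, 0],
    [0, 0, 3],
    [0, 0, 0],
]
rows = per_class_rows(toy)
macro_f1 = sum(r[4] for r in rows) / len(rows)

# The "Cut cardboard" row (support 174, predicted 231) is consistent with
# 5 true positives; that count is inferred from the ratios, not stated in the CSV.
check = scores_from_counts(5, 231, 174)
```

With 5 true positives the helper gives precision 5/231, recall 5/174 and F1 10/405, which matches the values listed for `Cut cardboard`. Whether the runner's macro average includes classes with zero support is not stated in the diff; the toy `macro_f1` simply averages every row.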
 
results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json ADDED
@@ -0,0 +1,188 @@
1
+ {
2
+ "status": "pass",
3
+ "task": "long_horizon_next_action",
4
+ "task_display_name": "Long Horizon Next Action",
5
+ "model_family": "simple_centroid_metadata",
6
+ "source": "128_episode_qwen_jsonl_metadata",
7
+ "input_features": "frame/context metadata plus hashed prompt/options/main_task text; answer_json fields are excluded from inputs",
8
+ "split_policy": "train on train split, report val and held-out test split",
9
+ "num_train_windows": 25068,
10
+ "num_val_windows": 4496,
11
+ "num_test_windows": 3951,
12
+ "num_classes": 1211,
13
+ "num_train_classes": 887,
14
+ "majority_baseline_accuracy": 0.0,
15
+ "history": [
16
+ {
17
+ "method": "train_class_centroid_cosine",
18
+ "reason": "train_class_count=887 exceeds softmax_max_train_classes=256"
19
+ }
20
+ ],
21
+ "splits": {
22
+ "val": {
23
+ "accuracy": 0.027135231316725978,
24
+ "balanced_accuracy": 0.007303546460754275,
25
+ "macro_f1": 0.003918205667693489,
26
+ "weighted_f1": 0.015520680430211261,
27
+ "num_eval_windows": 4496,
28
+ "num_classes": 1211
29
+ },
30
+ "test": {
31
+ "accuracy": 0.008605416350291066,
32
+ "balanced_accuracy": 0.008329048558933617,
33
+ "macro_f1": 0.004579592783699693,
34
+ "weighted_f1": 0.007358915849162803,
35
+ "num_eval_windows": 3951,
36
+ "num_classes": 1211
37
+ }
38
+ },
39
+ "primary_metric": "macro_f1",
40
+ "primary_score": 0.004579592783699693,
41
+ "unseen_test_class_count": 144,
42
+ "unseen_test_classes": [
43
+ "Pick up dustpan",
44
+ "Hold container lid",
45
+ "Move towards the stove",
46
+ "Open stove pot lid",
47
+ "Closing the door",
48
+ "Picking up bottle",
49
+ "Wipe kitchen counter",
50
+ "Move towards kitchen area",
51
+ "Place cloth on floor",
52
+ "Reach for cleaning supplies",
53
+ "Remove cleaning bottle",
54
+ "Washing hands in sink",
55
+ "Grasping cleaning cloth",
56
+ "Wiping countertop",
57
+ "Lift pot lid",
58
+ "Stir contents",
59
+ "Place lid back",
60
+ "Adjust pot position",
61
+ "Move pot",
62
+ "Place towel",
63
+ "Start cutting",
64
+ "Cut along the marked line",
65
+ "Observe and walk through store",
66
+ "Inspect shelf condition",
67
+ "Approach boxes",
68
+ "Reach for wire hangers",
69
+ "Extract wire hangers from box",
70
+ "Bundle display hooks",
71
+ "Release hook",
72
+ "Move through aisle",
73
+ "Pick up items from the shopping bag",
74
+ "Place items on the shelf",
75
+ "Release cardboard piece and gesture",
76
+ "Move marker and adjust hand",
77
+ "Identify next cardboard piece",
78
+ "Observe and pause",
79
+ "Resume observation",
80
+ "Reach for next can",
81
+ "Hold canned food",
82
+ "Retrieve next canned food item",
83
+ "Align canned food on shelf",
84
+ "Retrieve canned food from box",
85
+ "Place another canned food on shelf",
86
+ "Adjust canned food on shelf",
87
+ "Move hand away from shelf",
88
+ "Hold earbud case",
89
+ "sort craft materials",
90
+ "Manipulate craft piece",
91
+ "Manipulate craft paper strips",
92
+ "Operate smartphone",
93
+ "Release smartphone",
94
+ "Sort small craft pieces",
95
+ "Open paper lantern",
96
+ "Fold paper lantern",
97
+ "Grasp lantern",
98
+ "Grasp lantern component",
99
+ "Align paper lantern edges",
100
+ "Release lantern",
101
+ "Pick up packaged paper lantern component",
102
+ "Handle paper lantern component",
103
+ "Open folded paper lantern",
104
+ "Hold paper lantern",
105
+ "Apply adhesive tape to lantern",
106
+ "Remove paper lantern part from packaging",
107
+ "Remove plastic packaging",
108
+ "Open paper lantern component",
109
+ "Expand paper lantern",
110
+ "Align edges of paper lantern",
111
+ "Reach for craft items",
112
+ "Place hand on table",
113
+ "Browse smartphone screen",
114
+ "Scroll smartphone screen",
115
+ "Put down smartphone",
116
+ "Place smartphone down",
117
+ "Pick up puzzle piece",
118
+ "Place piece into puzzle",
119
+ "Manipulate puzzle piece",
120
+ "Observe puzzle progress",
121
+ "Reach for puzzle piece",
122
+ "Attempt to fit puzzle piece",
123
+ "Sort puzzle pieces",
124
+ "Walking across the room",
125
+ "Approaching the table",
126
+ "Preparing to craft",
127
+ "Picking up crafting material",
128
+ "Manipulate material",
129
+ "Place material",
130
+ "Manipulate yellow strip",
131
+ "Manipulating paper strips",
132
+ "Manipulate bead",
133
+ "Manipulate beads",
134
+ "Hold and manipulate paper strip",
135
+ "Sort buttons",
136
+ "Arrange buttons in a line",
137
+ "Sort and arrange buttons",
138
+ "Sort button",
139
+ "Sort and adjust button line",
140
+ "Sort and place buttons",
141
+ "Walking in the hallway",
142
+ "Approaching and pressing the door switch",
143
+ "Entering the VR training room",
144
+ "Greeting/acknowledging participants",
145
+ "Move through the training room",
146
+ "Manipulate plastic strips",
147
+ "Manipulate plastic strip",
148
+ "Hold and bend plastic strip",
149
+ "Bend and manipulate plastic strip",
150
+ "Fold plastic strip",
151
+ "Manipulate paper decoration",
152
+ "Manipulate paper edge",
153
+ "Placing paper strip",
154
+ "Securing paper structure",
155
+ "Manipulate adhesive strip",
156
+ "Secure paper edges with adhesive",
157
+ "Record count",
158
+ "Sort beads and write count",
159
+ "Counting and organizing beads",
160
+ "Pick up star bead",
161
+ "Place and count bead",
162
+ "Arrange star beads",
163
+ "Counting star beads",
164
+ "Adjust paper",
165
+ "Gather star beads",
166
+ "Arrange star beads for counting",
167
+ "Sort and count beads",
168
+ "Rinse cloth in sink",
169
+ "Reposition hand",
170
+ "Walk towards other aisles",
171
+ "Place marked piece down",
172
+ "Gesturing",
173
+ "Reach for next canned food",
174
+ "Move hand away",
175
+ "Sort craft items",
176
+ "Retrieving more beads",
177
+ "Place smartphone on stand",
178
+ "Move dustpan to side",
179
+ "Walking towards door",
180
+ "Grasp cleaning bottle",
181
+ "Observe colleague and workspace",
182
+ "Open earbud case",
183
+ "Adjust lantern string",
184
+ "Adjust lantern shape",
185
+ "Pick up small piece of material",
186
+ "Use phone while crafting"
187
+ ]
188
+ }
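The `history` entry above names the method `train_class_centroid_cosine`, chosen because 887 training classes exceed `softmax_max_train_classes=256`. The runner's code is not included in this diff, so the following is a minimal sketch of a nearest-centroid classifier under cosine similarity, assuming dense feature vectors. The function names and toy features are hypothetical.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a vector to unit length; leave all-zero vectors unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fit_centroids(features, labels):
    """Average the unit-normalized training vectors of each class."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        unit = normalize(vec)
        if label not in sums:
            sums[label] = [0.0] * len(unit)
        sums[label] = [s + u for s, u in zip(sums[label], unit)]
        counts[label] += 1
    return {
        label: normalize([s / counts[label] for s in total])
        for label, total in sums.items()
    }


def predict(centroids, vec):
    """Return the class whose centroid has the highest cosine similarity."""
    unit = normalize(vec)
    return max(
        centroids,
        key=lambda label: sum(c * u for c, u in zip(centroids[label], unit)),
    )


# Toy two-dimensional features standing in for the hashed metadata features.
train_x = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.2, 0.9]]
train_y = ["Cut cardboard", "Cut cardboard", "Sort beads", "Sort beads"]
centroids = fit_centroids(train_x, train_y)
prediction = predict(centroids, [0.8, 0.2])
```

In a classifier of this kind, a class with no training windows has no centroid and can never be predicted. Under that assumption, the 144 classes listed in `unseen_test_classes` would each have zero recall.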
results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/per_class_metrics.csv ADDED
@@ -0,0 +1,1212 @@
1
+ class_id,class_name,support,predicted,precision,recall,f1
2
+ 0,Place jar on shelf,0,0,0.0,0.0,0.0
3
+ 1,Reach for item in box,0,1,0.0,0.0,0.0
4
+ 2,Pick up product from box,0,1,0.0,0.0,0.0
5
+ 3,Place product on shelf,0,151,0.0,0.0,0.0
6
+ 4,Pick up product from bin,0,6,0.0,0.0,0.0
7
+ 5,Reach for next product,0,0,0.0,0.0,0.0
8
+ 6,Place canned product on shelf,0,0,0.0,0.0,0.0
9
+ 7,Pick up canned product,0,0,0.0,0.0,0.0
10
+ 8,Arrange canned products on shelf,0,0,0.0,0.0,0.0
11
+ 9,Move bin to shelf area,0,0,0.0,0.0,0.0
12
+ 10,Place can on shelf,20,0,0.0,0.0,0.0
13
+ 11,Retract hand,0,1,0.0,0.0,0.0
14
+ 12,Hold and wipe product,0,0,0.0,0.0,0.0
15
+ 13,Wipe down shelf,0,0,0.0,0.0,0.0
16
+ 14,Pick up product,0,6,0.0,0.0,0.0
17
+ 15,Wipe product,0,7,0.0,0.0,0.0
18
+ 16,Wipe food product,0,4,0.0,0.0,0.0
19
+ 17,Reach into box,9,2,0.0,0.0,0.0
20
+ 18,Wipe jar,0,0,0.0,0.0,0.0
21
+ 19,Wipe shelf,0,1,0.0,0.0,0.0
22
+ 20,Hold pickle jar,0,0,0.0,0.0,0.0
23
+ 21,Release pickle jar,0,0,0.0,0.0,0.0
24
+ 22,Hold cleaning cloth,0,1,0.0,0.0,0.0
25
+ 23,Wipe the shelf,0,0,0.0,0.0,0.0
26
+ 24,Wipe the product jar,0,0,0.0,0.0,0.0
27
+ 25,Move to next section,0,10,0.0,0.0,0.0
28
+ 26,Place product in box,0,0,0.0,0.0,0.0
29
+ 27,Hold product,0,2,0.0,0.0,0.0
30
+ 28,Hold item and adjust posture,0,1,0.0,0.0,0.0
31
+ 29,Grasp product from box,0,0,0.0,0.0,0.0
32
+ 30,Prepare to place product,0,1,0.0,0.0,0.0
33
+ 31,Move product to shelf,0,0,0.0,0.0,0.0
34
+ 32,Move product to box,0,1,0.0,0.0,0.0
35
+ 33,Grasp next item,0,0,0.0,0.0,0.0
36
+ 34,Align button in row,0,1,0.0,0.0,0.0
37
+ 35,Pick up button,9,1,0.0,0.0,0.0
38
+ 36,Place button in row,0,0,0.0,0.0,0.0
39
+ 37,Align button,0,0,0.0,0.0,0.0
40
+ 38,Arrange button cluster,0,0,0.0,0.0,0.0
41
+ 39,Align button row,0,1,0.0,0.0,0.0
42
+ 40,Arrange buttons on table,0,0,0.0,0.0,0.0
43
+ 41,Adjust red button in row,0,1,0.0,0.0,0.0
44
+ 42,Select button from pile,0,5,0.0,0.0,0.0
45
+ 43,Adjust red button,0,0,0.0,0.0,0.0
46
+ 44,Pull back hand,0,0,0.0,0.0,0.0
47
+ 45,Survey the table,0,2,0.0,0.0,0.0
48
+ 46,Arrange black buttons,0,3,0.0,0.0,0.0
49
+ 47,Pick up black button,0,0,0.0,0.0,0.0
50
+ 48,Move black button,0,0,0.0,0.0,0.0
51
+ 49,Pick up red button,0,0,0.0,0.0,0.0
52
+ 50,Reach for multicolored buttons,0,3,0.0,0.0,0.0
53
+ 51,Arrange buttons in row,0,0,0.0,0.0,0.0
54
+ 52,Touch device,0,0,0.0,0.0,0.0
55
+ 53,Align red button in row,0,0,0.0,0.0,0.0
56
+ 54,Reach and sort buttons,0,0,0.0,0.0,0.0
57
+ 55,Reach for button,0,0,0.0,0.0,0.0
58
+ 56,Place button,31,0,0.0,0.0,0.0
59
+ 57,Cut cardboard piece,40,0,0.0,0.0,0.0
60
+ 58,Manipulate cardboard piece,0,1,0.0,0.0,0.0
61
+ 59,Position scissors to cut cardboard,0,0,0.0,0.0,0.0
62
+ 60,Cut cardboard,174,231,0.021645021645021644,0.028735632183908046,0.024691358024691357
63
+ 61,Release scissors,4,1,0.0,0.0,0.0
64
+ 62,Position scissors,0,0,0.0,0.0,0.0
65
+ 63,Pick up scissors,0,0,0.0,0.0,0.0
66
+ 64,Move away from workstation,0,0,0.0,0.0,0.0
67
+ 65,Walk through corridor,0,0,0.0,0.0,0.0
68
+ 66,Mark cardboard with ruler and pen,0,0,0.0,0.0,0.0
69
+ 67,Mark cardboard with pen and ruler,0,0,0.0,0.0,0.0
70
+ 68,Cut cardboard with scissors,0,0,0.0,0.0,0.0
71
+ 69,Place down scissors,0,0,0.0,0.0,0.0
72
+ 70,Position ruler on cardboard,0,1,0.0,0.0,0.0
73
+ 71,Hold ruler steady,0,0,0.0,0.0,0.0
74
+ 72,Mark line on cardboard,0,2,0.0,0.0,0.0
75
+ 73,Move marker and ruler,0,0,0.0,0.0,0.0
76
+ 74,Pick up smartphone,6,4,0.0,0.0,0.0
77
+ 75,Mark cardboard,0,0,0.0,0.0,0.0
78
+ 76,Hold cardboard pieces,0,0,0.0,0.0,0.0
79
+ 77,Place cardboard,0,1,0.0,0.0,0.0
80
+ 78,Align ruler on cardboard,0,9,0.0,0.0,0.0
81
+ 79,Mark cardboard with pen,0,0,0.0,0.0,0.0
82
+ 80,Plug cable into portable charger,0,0,0.0,0.0,0.0
83
+ 81,Pick up portable charger,0,0,0.0,0.0,0.0
84
+ 82,Walk towards room,0,0,0.0,0.0,0.0
85
+ 83,Hold portable charger,0,0,0.0,0.0,0.0
86
+ 84,Place charger on table,0,0,0.0,0.0,0.0
87
+ 85,Hold charger and cable,0,0,0.0,0.0,0.0
88
+ 86,Manipulate power cable plug,0,0,0.0,0.0,0.0
89
+ 87,Insert plug into power adapter,0,0,0.0,0.0,0.0
90
+ 88,Hold power adapter,0,0,0.0,0.0,0.0
91
+ 89,Align charging cable,0,0,0.0,0.0,0.0
92
+ 90,Insert charging cable,0,0,0.0,0.0,0.0
93
+ 91,Observe desktop layout,0,0,0.0,0.0,0.0
94
+ 92,Retract camera/reposition view,0,0,0.0,0.0,0.0
95
+ 93,Manipulate paper strip,152,0,0.0,0.0,0.0
96
+ 94,Fold paper strip,0,313,0.0,0.0,0.0
97
+ 95,Pick up yellow paper strip,0,0,0.0,0.0,0.0
98
+ 96,Pick up phone,0,4,0.0,0.0,0.0
99
+ 97,Use phone,31,12,0.0,0.0,0.0
100
+ 98,Place phone on desk,0,2,0.0,0.0,0.0
101
+ 99,Interact with phone,0,1,0.0,0.0,0.0
102
+ 100,Place phone down,11,4,0.0,0.0,0.0
103
+ 101,Adjust paper strip,0,10,0.0,0.0,0.0
104
+ 102,Release folded paper,0,4,0.0,0.0,0.0
105
+ 103,Fold paper strip into knot,0,0,0.0,0.0,0.0
106
+ 104,Fold paper strip into lucky star,0,0,0.0,0.0,0.0
107
+ 105,Inflate paper star,0,0,0.0,0.0,0.0
108
+ 106,Fold purple paper strip,0,0,0.0,0.0,0.0
109
+ 107,Fold purple paper,0,0,0.0,0.0,0.0
110
+ 108,Hold and crease purple paper,0,1,0.0,0.0,0.0
111
+ 109,Release paper,0,0,0.0,0.0,0.0
112
+ 110,Reach for phone,0,0,0.0,0.0,0.0
113
+ 111,Retrieve paper strips,0,0,0.0,0.0,0.0
114
+ 112,Fold and organize paper strips,0,2,0.0,0.0,0.0
115
+ 113,Hold charger,0,0,0.0,0.0,0.0
116
+ 114,Separate cardboard piece,0,0,0.0,0.0,0.0
117
+ 115,Cut cardboard with utility knife,0,0,0.0,0.0,0.0
118
+ 116,Mark cardboard piece,85,0,0.0,0.0,0.0
119
+ 117,Retract hand from bag,0,5,0.0,0.0,0.0
120
+ 118,Open small case,0,2,0.0,0.0,0.0
121
+ 119,Measure cardboard with ruler,0,2,0.0,0.0,0.0
122
+ 120,Move smartphone,28,1,0.0,0.0,0.0
123
+ 121,Hold ruler on cardboard,0,3,0.0,0.0,0.0
124
+ 122,Remove ruler,0,0,0.0,0.0,0.0
125
+ 123,Walk towards table,0,0,0.0,0.0,0.0
126
+ 124,Approach desk,0,0,0.0,0.0,0.0
127
+ 125,Position hands for work,0,1,0.0,0.0,0.0
128
+ 126,Manipulate quilling strip,0,0,0.0,0.0,0.0
129
+ 127,Release quilling strip,0,0,0.0,0.0,0.0
130
+ 128,Release paper strip,35,0,0.0,0.0,0.0
131
+ 129,Begin rolling quilling strip,0,1,0.0,0.0,0.0
132
+ 130,Type on smartphone,0,26,0.0,0.0,0.0
133
+ 131,Use phone to check stock,0,0,0.0,0.0,0.0
134
+ 132,Grasp package,0,1,0.0,0.0,0.0
135
+ 133,Place item on shelf,23,109,0.0,0.0,0.0
136
+ 134,Observe shelf status,0,2,0.0,0.0,0.0
137
+ 135,Walk towards aisle,0,3,0.0,0.0,0.0
138
+ 136,Move towards shelf,0,1,0.0,0.0,0.0
139
+ 137,Adjust item on shelf,26,0,0.0,0.0,0.0
140
+ 138,Reach for another item,18,1,0.0,0.0,0.0
141
+ 139,Remove item from shelf,0,0,0.0,0.0,0.0
142
+ 140,Discard item into bin,0,3,0.0,0.0,0.0
143
+ 141,Sweep debris,0,8,0.0,0.0,0.0
144
+ 142,Touch shelf edge,0,1,0.0,0.0,0.0
145
+ 143,Reach for product,0,0,0.0,0.0,0.0
146
+ 144,Release label,0,0,0.0,0.0,0.0
147
+ 145,Remove shelf label,0,1,0.0,0.0,0.0
148
+ 146,Move along shelf,0,0,0.0,0.0,0.0
149
+ 147,Carry stool to next shelf,0,0,0.0,0.0,0.0
150
+ 148,Place stool on floor,0,1,0.0,0.0,0.0
151
+ 149,Observe shelf,0,2,0.0,0.0,0.0
152
+ 150,Walk towards next aisle,0,0,0.0,0.0,0.0
153
+ 151,Reach for product labels,0,0,0.0,0.0,0.0
154
+ 152,Hold product labels,0,3,0.0,0.0,0.0
155
+ 153,Examine labels,0,0,0.0,0.0,0.0
156
+ 154,Place sauce in container,0,0,0.0,0.0,0.0
157
+ 155,Pick up supplement bottle,0,0,0.0,0.0,0.0
158
+ 156,Hold supplement bottle,0,0,0.0,0.0,0.0
159
+ 157,Open supplement bottle,0,0,0.0,0.0,0.0
160
+ 158,Walk through store,0,0,0.0,0.0,0.0
161
+ 159,Reach for item on shelf,0,0,0.0,0.0,0.0
162
+ 160,Examine item,0,0,0.0,0.0,0.0
163
+ 161,Place item in container,0,0,0.0,0.0,0.0
164
+ 162,Pick up another item,0,1,0.0,0.0,0.0
165
+ 163,Pick up oil bottle,0,0,0.0,0.0,0.0
166
+ 164,Place oil in container,0,0,0.0,0.0,0.0
167
+ 165,Inspect supplement bottle,0,3,0.0,0.0,0.0
168
+ 166,Place supplement bottle in container,0,0,0.0,0.0,0.0
169
+ 167,Pick up spice jar,0,0,0.0,0.0,0.0
170
+ 168,Place spice jar in container,0,0,0.0,0.0,0.0
171
+ 169,Sort beads,16,74,0.0,0.0,0.0
172
+ 170,Adjust hand position,0,0,0.0,0.0,0.0
173
+ 171,Arrange star-shaped beads,0,0,0.0,0.0,0.0
174
+ 172,Move pen,0,1,0.0,0.0,0.0
175
+ 173,Sort beads by color,0,12,0.0,0.0,0.0
176
+ 174,Move towards table,0,0,0.0,0.0,0.0
177
+ 175,Observe room,0,1,0.0,0.0,0.0
178
+ 176,Check watch,0,2,0.0,0.0,0.0
179
+ 177,Prepare to sort beads,0,1,0.0,0.0,0.0
180
+ 178,Align ruler,0,0,0.0,0.0,0.0
181
+ 179,Adjust grip,0,1,0.0,0.0,0.0
182
+ 180,Move ruler,0,0,0.0,0.0,0.0
183
+ 181,Adjust ruler position,0,0,0.0,0.0,0.0
184
+ 182,Mark cardboard with marker,0,0,0.0,0.0,0.0
185
+ 183,Draw lines on cardboard,0,1,0.0,0.0,0.0
186
+ 184,Drawing lines on cardboard,0,0,0.0,0.0,0.0
187
+ 185,Reposition marker,0,0,0.0,0.0,0.0
188
+ 186,Mark lines with marker,0,0,0.0,0.0,0.0
189
+ 187,Position the ruler,0,0,0.0,0.0,0.0
190
+ 188,Stack cardboard pieces,0,0,0.0,0.0,0.0
191
+ 189,Walking in the workspace,0,0,0.0,0.0,0.0
192
+ 190,Insert charging cable into power bank,0,0,0.0,0.0,0.0
193
+ 191,Manipulate and inspect colorful pieces,0,0,0.0,0.0,0.0
194
+ 192,Manipulate colorful pieces,0,0,0.0,0.0,0.0
195
+ 193,Sort colorful pieces,0,0,0.0,0.0,0.0
196
+ 194,Hold power bank and cable,0,0,0.0,0.0,0.0
197
+ 195,Touch pieces in box,0,0,0.0,0.0,0.0
198
+ 196,Hold small white box,0,1,0.0,0.0,0.0
199
+ 197,Place white box on table,0,0,0.0,0.0,0.0
200
+ 198,Adjust smartphone and sort pieces,0,0,0.0,0.0,0.0
201
+ 199,Sort small colorful pieces,0,1,0.0,0.0,0.0
202
+ 200,Sorting colorful paper pieces,0,0,0.0,0.0,0.0
203
+ 201,Use phone to check instructions,0,31,0.0,0.0,0.0
204
+ 202,Trace pattern on cardboard,0,4,0.0,0.0,0.0
205
+ 203,Remove cardboard pattern,0,7,0.0,0.0,0.0
206
+ 204,Remove cardboard pattern piece,0,2,0.0,0.0,0.0
207
+ 205,Cut out cardboard pattern,0,9,0.0,0.0,0.0
208
+ 206,Cut cardboard pattern,0,12,0.0,0.0,0.0
209
+ 207,Adjust cardboard position,0,4,0.0,0.0,0.0
210
+ 208,Interact with smartphone screen,0,1,0.0,0.0,0.0
211
+ 209,Pick up metal ruler,0,0,0.0,0.0,0.0
212
+ 210,Pick up pen,8,0,0.0,0.0,0.0
213
+ 211,Move pen aside,0,9,0.0,0.0,0.0
214
+ 212,Reposition and cut,0,0,0.0,0.0,0.0
215
+ 213,Hold quilling paper,0,0,0.0,0.0,0.0
216
+ 214,Roll quilling paper,0,0,0.0,0.0,0.0
217
+ 215,Release paper coil,0,3,0.0,0.0,0.0
218
+ 216,Pick up paper strip,0,1,0.0,0.0,0.0
219
+ 217,Manipulate quilled paper strip,0,0,0.0,0.0,0.0
220
+ 218,Release and prepare new strip,0,0,0.0,0.0,0.0
221
+ 219,Manipulate small paper segment,0,1,0.0,0.0,0.0
222
+ 220,Place down paper segment,0,2,0.0,0.0,0.0
223
+ 221,Reach for paper strips,0,1,0.0,0.0,0.0
224
+ 222,Browse and interact with phone interface,0,4,0.0,0.0,0.0
225
+ 223,Interacting with phone screen,0,0,0.0,0.0,0.0
226
+ 224,Pick up light blue strip,0,8,0.0,0.0,0.0
227
+ 225,Inspect strip,0,0,0.0,0.0,0.0
228
+ 226,Manipulate light blue strip,0,1,0.0,0.0,0.0
229
+ 227,Cut cardboard tube,0,2,0.0,0.0,0.0
230
+ 228,Stacking cardboard pieces,0,14,0.0,0.0,0.0
231
+ 229,Moving hand,0,0,0.0,0.0,0.0
232
+ 230,Position cardboard for cutting,0,0,0.0,0.0,0.0
233
+ 231,Place cardboard piece,0,0,0.0,0.0,0.0
234
+ 232,Pick up cardboard piece,0,0,0.0,0.0,0.0
235
+ 233,Cut cardboard piece with scissors,0,0,0.0,0.0,0.0
236
+ 234,Release cardboard piece,0,4,0.0,0.0,0.0
237
+ 235,Walk across office,0,0,0.0,0.0,0.0
238
+ 236,Cut cardboard into triangles,0,0,0.0,0.0,0.0
239
+ 237,Cut cardboard shape,0,53,0.0,0.0,0.0
240
+ 238,Pick up cardboard cutout,0,9,0.0,0.0,0.0
241
+ 239,Walk with cardboard cutout,0,0,0.0,0.0,0.0
242
+ 240,Approach workstation,0,4,0.0,0.0,0.0
243
+ 241,Organize tools and materials,0,0,0.0,0.0,0.0
244
+ 242,Cut cardboard triangle,0,0,0.0,0.0,0.0
245
+ 243,Holding marker,0,32,0.0,0.0,0.0
246
+ 244,Lift utility knife,0,0,0.0,0.0,0.0
247
+ 245,Inspect cardboard piece,0,0,0.0,0.0,0.0
248
+ 246,Position cardboard piece,0,0,0.0,0.0,0.0
249
+ 247,Align scissors,0,0,0.0,0.0,0.0
250
+ 248,Cut cardboard strip,0,0,0.0,0.0,0.0
251
+ 249,Position cardboard strip,0,4,0.0,0.0,0.0
252
+ 250,Inspect cardboard strip,0,0,0.0,0.0,0.0
253
+ 251,Align cardboard piece,0,0,0.0,0.0,0.0
254
+ 252,Complete the cut,0,0,0.0,0.0,0.0
255
+ 253,Put down utility knife,0,1,0.0,0.0,0.0
256
+ 254,Fold cardboard,0,10,0.0,0.0,0.0
257
+ 255,Pick up utility knife,18,0,0.0,0.0,0.0
258
+ 256,Hold utility knife,0,1,0.0,0.0,0.0
259
+ 257,Pick up cardboard strip,0,0,0.0,0.0,0.0
260
+ 258,Place cardboard strip,0,2,0.0,0.0,0.0
261
+ 259,Place cans into box,0,0,0.0,0.0,0.0
262
+ 260,Arrange cans in box,0,0,0.0,0.0,0.0
263
+ 261,Reach for can,0,0,0.0,0.0,0.0
264
+ 262,Arrange cans on shelf,0,3,0.0,0.0,0.0
265
+ 263,Reach for additional items,0,0,0.0,0.0,0.0
266
+ 264,Reach for container,0,3,0.0,0.0,0.0
267
+ 265,Adjust position,0,1,0.0,0.0,0.0
268
+ 266,Prepare to pick up item,0,0,0.0,0.0,0.0
269
+ 267,Place container in bin,0,0,0.0,0.0,0.0
270
+ 268,Adjust cans in bin,0,0,0.0,0.0,0.0
271
+ 269,Hold and inspect can,0,2,0.0,0.0,0.0
272
+ 270,Adjust perspective,0,1,0.0,0.0,0.0
273
+ 271,Inspect shelf and organize stock,0,0,0.0,0.0,0.0
274
+ 272,Placing stock on shelf,0,10,0.0,0.0,0.0
275
+ 273,Hold small product bag,0,12,0.0,0.0,0.0
276
+ 274,Position shelving divider,0,2,0.0,0.0,0.0
277
+ 275,Move away from shelf,0,0,0.0,0.0,0.0
278
+ 276,Pick up container,0,4,0.0,0.0,0.0
279
+ 277,Pick up cleaning cloth,0,1,0.0,0.0,0.0
280
+ 278,Pick up product box,0,4,0.0,0.0,0.0
281
+ 279,Place box on shelf,0,8,0.0,0.0,0.0
282
+ 280,Reach for next item,9,3,0.0,0.0,0.0
283
+ 281,Place plush toy on shelf,0,2,0.0,0.0,0.0
284
+ 282,Adjust placement on shelf,0,2,0.0,0.0,0.0
285
+ 283,Move plush toy,0,1,0.0,0.0,0.0
286
+ 284,Reach for product on shelf,0,0,0.0,0.0,0.0
287
+ 285,Hold cardboard,0,0,0.0,0.0,0.0
288
+ 286,Arrange cardboard,0,0,0.0,0.0,0.0
289
+ 287,Walk with marker,0,0,0.0,0.0,0.0
290
+ 288,Pick up small object,0,0,0.0,0.0,0.0
291
+ 289,Walk across room,0,0,0.0,0.0,0.0
292
+ 290,Place cardboard square on stack,0,0,0.0,0.0,0.0
293
+ 291,Arrange cardboard squares,0,0,0.0,0.0,0.0
294
+ 292,Stacking cardboard squares,0,0,0.0,0.0,0.0
295
+ 293,Stacking cardboard square,0,0,0.0,0.0,0.0
296
+ 294,Stack cardboard square,0,0,0.0,0.0,0.0
297
+ 295,Stack cardboard squares,0,0,0.0,0.0,0.0
298
+ 296,Sorting paper stars,0,0,0.0,0.0,0.0
299
+ 297,Place star,0,0,0.0,0.0,0.0
300
+ 298,Sort paper star,0,24,0.0,0.0,0.0
301
+ 299,Sort paper stars,0,2,0.0,0.0,0.0
302
+ 300,Place paper star,0,0,0.0,0.0,0.0
303
+ 301,Walk away,0,1,0.0,0.0,0.0
304
+ 302,Open door,0,1,0.0,0.0,0.0
305
+ 303,Walk through doorway,0,2,0.0,0.0,0.0
306
+ 304,Pick up object,0,0,0.0,0.0,0.0
307
+ 305,Place item on table,0,4,0.0,0.0,0.0
308
+ 306,Move phone,24,1,0.0,0.0,0.0
309
+ 307,Sort and place paper star,0,0,0.0,0.0,0.0
310
+ 308,Hold cardboard strip,0,0,0.0,0.0,0.0
311
+ 309,Align cardboard strip,0,1,0.0,0.0,0.0
312
+ 310,Hold cardboard with ruler,0,0,0.0,0.0,0.0
313
+ 311,Move utility knife along ruler,0,0,0.0,0.0,0.0
314
+ 312,Slide utility knife along ruler,0,0,0.0,0.0,0.0
315
+ 313,Guide utility knife along ruler,0,0,0.0,0.0,0.0
316
+ 314,Draw line on cardboard,0,0,0.0,0.0,0.0
317
+ 315,Marking lines on cardboard,0,0,0.0,0.0,0.0
318
+ 316,Hold craft tool,0,9,0.0,0.0,0.0
319
+ 317,Approach table,0,12,0.0,0.0,0.0
320
+ 318,Place tool on table,0,3,0.0,0.0,0.0
321
+ 319,Move hand toward craft materials,0,17,0.0,0.0,0.0
322
+ 320,Manipulate paper strips,0,58,0.0,0.0,0.0
323
+ 321,Pick up blue paper strip,0,7,0.0,0.0,0.0
324
+ 322,Hold and bend paper strip,0,8,0.0,0.0,0.0
325
+ 323,Hold small object,0,8,0.0,0.0,0.0
326
+ 324,Move hand away from workspace,0,6,0.0,0.0,0.0
327
+ 325,Observe workspace,13,53,0.0,0.0,0.0
328
+ 326,Place puzzle piece,21,69,0.0,0.0,0.0
329
+ 327,Release puzzle piece,4,3,0.0,0.0,0.0
330
+ 328,Scan for next piece,0,21,0.0,0.0,0.0
331
+ 329,Positioning puzzle piece,0,6,0.0,0.0,0.0
332
+ 330,Manipulate puzzle pieces,35,24,0.0,0.0,0.0
333
+ 331,Adjust puzzle piece,11,152,0.07236842105263158,1.0,0.13496932515337423
334
+ 332,Adjusting puzzle piece,0,2,0.0,0.0,0.0
335
+ 333,Adjusting a puzzle piece,0,0,0.0,0.0,0.0
336
+ 334,Draw line along ruler,0,0,0.0,0.0,0.0
337
+ 335,Reposition ruler,0,0,0.0,0.0,0.0
338
+ 336,Hold ruler and pen steady,0,0,0.0,0.0,0.0
339
+ 337,Mark lines on cardboard,0,0,0.0,0.0,0.0
340
+ 338,Place marker down,0,0,0.0,0.0,0.0
341
+ 339,Walk across the room,0,0,0.0,0.0,0.0
342
+ 340,Approach packing area,0,0,0.0,0.0,0.0
343
+ 341,Pack beads into box,0,0,0.0,0.0,0.0
344
+ 342,Pick up beads,0,0,0.0,0.0,0.0
345
+ 343,Deposit beads into box,0,0,0.0,0.0,0.0
346
+ 344,Pick up cardboard tray,0,0,0.0,0.0,0.0
347
+ 345,Move tray towards packing area,0,3,0.0,0.0,0.0
348
+ 346,Position cardboard tray,0,0,0.0,0.0,0.0
349
+ 347,Cut light green fabric,0,0,0.0,0.0,0.0
350
+ 348,Continue cutting fabric,0,1,0.0,0.0,0.0
351
+ 349,Cut fabric with scissors,0,1,0.0,0.0,0.0
352
+ 350,Adjusting fabric for cutting,0,0,0.0,0.0,0.0
353
+ 351,Cutting fabric,0,0,0.0,0.0,0.0
354
+ 352,Mark fabric with pen,0,0,0.0,0.0,0.0
355
+ 353,Mark fabric,0,4,0.0,0.0,0.0
356
+ 354,Mark fabric with pen and ruler,0,0,0.0,0.0,0.0
357
+ 355,Carry cardboard piece,0,0,0.0,0.0,0.0
358
+ 356,Pick up electronic accessory from box,0,0,0.0,0.0,0.0
359
+ 357,Place accessory on shelf,0,0,0.0,0.0,0.0
360
+ 358,Pick up accessory,0,0,0.0,0.0,0.0
361
+ 359,Reach towards shelf,0,0,0.0,0.0,0.0
362
+ 360,Place accessory into box,0,0,0.0,0.0,0.0
363
+ 361,Pick up new electronic product,0,0,0.0,0.0,0.0
364
+ 362,Pick up electronic product,0,0,0.0,0.0,0.0
365
+ 363,Move hand back to box,0,0,0.0,0.0,0.0
366
+ 364,Move product towards shelf,0,0,0.0,0.0,0.0
367
+ 365,Walk with shopping bag,0,0,0.0,0.0,0.0
368
+ 366,Pick up item from box,0,0,0.0,0.0,0.0
369
+ 367,Move towards box,0,9,0.0,0.0,0.0
370
+ 368,Hold items,0,4,0.0,0.0,0.0
371
+ 369,Place items on shelf,0,8,0.0,0.0,0.0
372
+ 370,Move to box,0,4,0.0,0.0,0.0
373
+ 371,Move box to next position,0,2,0.0,0.0,0.0
374
+ 372,Hold snack package,0,0,0.0,0.0,0.0
375
+ 373,Place snack package on shelf,0,0,0.0,0.0,0.0
376
+ 374,Place snack package in box,0,0,0.0,0.0,0.0
377
+ 375,Hold snack packages,0,2,0.0,0.0,0.0
378
+ 376,Pick up snack packages,0,13,0.0,0.0,0.0
379
+ 377,Walk towards shelves,9,1,0.0,0.0,0.0
380
+ 378,Pick up snack package,0,8,0.0,0.0,0.0
381
+ 379,Organize snacks in box,0,0,0.0,0.0,0.0
382
+ 380,Adjust snack package,0,6,0.0,0.0,0.0
383
+ 381,Open cardboard box,0,5,0.0,0.0,0.0
384
+ 382,Remove cardboard flap,0,2,0.0,0.0,0.0
385
+ 383,Align plastic containers,0,0,0.0,0.0,0.0
386
+ 384,Reach for items,0,0,0.0,0.0,0.0
387
+ 385,Adjust containers on shelf,0,0,0.0,0.0,0.0
388
+ 386,Adjust container position,0,0,0.0,0.0,0.0
389
+ 387,Withdraw hand,0,0,0.0,0.0,0.0
390
+ 388,Place container on shelf,0,0,0.0,0.0,0.0
391
+ 389,Place item in shopping bag,0,1,0.0,0.0,0.0
392
+ 390,Grasp item,0,0,0.0,0.0,0.0
393
+ 391,Move item to bag,0,0,0.0,0.0,0.0
394
+ 392,Pick up plush toy,0,0,0.0,0.0,0.0
395
+ 393,Place plush toy into bag,0,0,0.0,0.0,0.0
396
+ 394,Grasp shopping bag,0,2,0.0,0.0,0.0
397
+ 395,Prepare to place item in bag,0,0,0.0,0.0,0.0
398
+ 396,Organize bag contents,0,0,0.0,0.0,0.0
399
+ 397,Grasp and retrieve item,0,0,0.0,0.0,0.0
400
+ 398,Place item into shopping bag,0,0,0.0,0.0,0.0
401
+ 399,Sort star-shaped beads,16,0,0.0,0.0,0.0
402
+ 400,Sort beads on the table,0,0,0.0,0.0,0.0
403
+ 401,Sort beads on table,0,0,0.0,0.0,0.0
404
+ 402,Hold instructional sign,0,1,0.0,0.0,0.0
405
+ 403,Pick up star-shaped bead,0,17,0.0,0.0,0.0
406
+ 404,Place bead on table,0,1,0.0,0.0,0.0
407
+ 405,Reposition sign and organize beads,0,9,0.0,0.0,0.0
408
+ 406,Reposition ruler and pen,0,0,0.0,0.0,0.0
409
+ 407,Reposition pen and prepare for next line,0,0,0.0,0.0,0.0
410
+ 408,Draw straight lines on cardboard,0,0,0.0,0.0,0.0
411
+ 409,Draw lines with ruler,0,0,0.0,0.0,0.0
412
+ 410,Sort origami stars,0,0,0.0,0.0,0.0
413
+ 411,Walk in hallway,0,0,0.0,0.0,0.0
414
+ 412,Reach for stars,0,0,0.0,0.0,0.0
415
+ 413,Walk towards desk,0,0,0.0,0.0,0.0
416
+ 414,Grasp origami stars,0,0,0.0,0.0,0.0
417
+ 415,Place stars in container,0,1,0.0,0.0,0.0
418
+ 416,Sort light blue origami stars,0,0,0.0,0.0,0.0
419
+ 417,Sort origami stars by color,0,0,0.0,0.0,0.0
420
+ 418,Move origami stars,0,0,0.0,0.0,0.0
421
+ 419,Put down scissors,0,0,0.0,0.0,0.0
422
+ 420,Use smartphone,70,0,0.0,0.0,0.0
423
+ 421,Pick up water bottle,0,4,0.0,0.0,0.0
424
+ 422,Hold water bottle,0,0,0.0,0.0,0.0
425
+ 423,Place water bottle on table,0,0,0.0,0.0,0.0
426
+ 424,Hold phone,0,0,0.0,0.0,0.0
427
+ 425,Hold and view phone,0,0,0.0,0.0,0.0
428
+ 426,Cut cardboard pieces with scissors,0,30,0.0,0.0,0.0
429
+ 427,Vacuum the carpet,0,0,0.0,0.0,0.0
430
+ 428,Push vacuum cleaner,0,0,0.0,0.0,0.0
431
+ 429,Adjust vacuum cleaner position,0,0,0.0,0.0,0.0
432
+ 430,Vacuuming carpet edge,0,1,0.0,0.0,0.0
433
+ 431,Vacuum edge of carpet,0,0,0.0,0.0,0.0
434
+ 432,Vacuuming carpet corner,0,2,0.0,0.0,0.0
435
+ 433,Vacuuming the carpet edge,0,0,0.0,0.0,0.0
436
+ 434,Move vacuum cleaner,0,0,0.0,0.0,0.0
437
+ 435,Vacuuming along the wall edge,0,0,0.0,0.0,0.0
438
+ 436,Fold paper strip into star,0,0,0.0,0.0,0.0
439
+ 437,Arrange Mahjong tiles,0,0,0.0,0.0,0.0
440
+ 438,Rearrange Mahjong tiles,0,0,0.0,0.0,0.0
441
+ 439,Adjust Mahjong tiles,0,0,0.0,0.0,0.0
442
+ 440,Reach for Mahjong tiles,0,0,0.0,0.0,0.0
443
+ 441,Rearrange Mahjong tile,0,2,0.0,0.0,0.0
444
+ 442,Adjust Mahjong tile,0,0,0.0,0.0,0.0
445
+ 443,Align Mahjong tiles,0,0,0.0,0.0,0.0
446
+ 444,Move Mahjong tile,0,0,0.0,0.0,0.0
447
+ 445,Realign Mahjong tiles,0,2,0.0,0.0,0.0
448
+ 446,Cut cardboard square,0,0,0.0,0.0,0.0
449
+ 447,Trim cardboard piece,0,0,0.0,0.0,0.0
450
+ 448,Pick up cereal boxes,0,0,0.0,0.0,0.0
451
+ 449,Carry cereal boxes,0,6,0.0,0.0,0.0
452
+ 450,Carry cereal towards aisle,0,6,0.0,0.0,0.0
453
+ 451,Carry pasta box towards aisle,0,2,0.0,0.0,0.0
454
+ 452,Pick up container from box,0,3,0.0,0.0,0.0
455
+ 453,Hold container,0,144,0.0,0.0,0.0
456
+ 454,Reach for items in box,0,3,0.0,0.0,0.0
457
+ 455,Pick up grocery item,0,8,0.0,0.0,0.0
458
+ 456,Carry item to shelf,0,3,0.0,0.0,0.0
459
+ 457,Move to stock products,0,1,0.0,0.0,0.0
460
+ 458,Wipe shelf surface,0,0,0.0,0.0,0.0
461
+ 459,Move cardboard box,0,0,0.0,0.0,0.0
462
+ 460,Place snack on shelf,0,0,0.0,0.0,0.0
463
+ 461,Retrieve snack from container,0,0,0.0,0.0,0.0
464
+ 462,Pick up gift box,0,0,0.0,0.0,0.0
465
+ 463,Pick up next gift box,0,0,0.0,0.0,0.0
466
+ 464,Pick up snack pouch,0,1,0.0,0.0,0.0
467
+ 465,Place snack pouch in container,0,0,0.0,0.0,0.0
468
+ 466,Reach for snack pouch,0,0,0.0,0.0,0.0
469
+ 467,Move storage bin,0,0,0.0,0.0,0.0
470
+ 468,Reach for shelf,0,0,0.0,0.0,0.0
471
+ 469,Hold bin and move through aisle,0,0,0.0,0.0,0.0
472
+ 470,Remove storage bin from shelf,0,1,0.0,0.0,0.0
473
+ 471,Reach for empty shelf space,0,0,0.0,0.0,0.0
474
+ 472,Grasp plastic bag on shelf,0,0,0.0,0.0,0.0
475
+ 473,Remove plastic container from shelf,0,261,0.0,0.0,0.0
476
+ 474,Arrange plastic containers,0,0,0.0,0.0,0.0
477
+ 475,Retrieve another container,0,0,0.0,0.0,0.0
478
+ 476,Arrange container on shelf,0,0,0.0,0.0,0.0
479
+ 477,Hold smartphone,42,0,0.0,0.0,0.0
480
+ 478,Place smartphone on desk,0,0,0.0,0.0,0.0
481
+ 479,Reach for water bottle,0,0,0.0,0.0,0.0
482
+ 480,Hold scissors,0,0,0.0,0.0,0.0
483
+ 481,Cut newspaper,0,0,0.0,0.0,0.0
484
+ 482,Continue cutting newspaper,0,0,0.0,0.0,0.0
485
+ 483,Place scissors on table,0,0,0.0,0.0,0.0
486
+ 484,Move scissors away,0,0,0.0,0.0,0.0
487
+ 485,Place scissors down,0,0,0.0,0.0,0.0
488
+ 486,Arrange tiles into row,0,0,0.0,0.0,0.0
489
+ 487,Adjust tile row alignment,0,0,0.0,0.0,0.0
490
+ 488,Adjust Mahjong tile alignment,0,0,0.0,0.0,0.0
491
+ 489,Adjust Mahjong tile on the stack,0,0,0.0,0.0,0.0
492
+ 490,Pick up Mahjong tile,0,0,0.0,0.0,0.0
493
+ 491,Place Mahjong tile on the stack,0,0,0.0,0.0,0.0
494
+ 492,Place Mahjong tile on stack,0,0,0.0,0.0,0.0
495
+ 493,Hold ruler and draw line,0,0,0.0,0.0,0.0
496
+ 494,Draw line,0,0,0.0,0.0,0.0
497
+ 495,Mark lines with pen along ruler,0,0,0.0,0.0,0.0
498
+ 496,Hold ruler and mark cardboard,0,0,0.0,0.0,0.0
499
+ 497,Hold ruler and marker,0,0,0.0,0.0,0.0
500
+ 498,Inspect charging case,0,14,0.0,0.0,0.0
501
+ 499,Place charging case down,0,0,0.0,0.0,0.0
502
+ 500,Hold paper strip,0,0,0.0,0.0,0.0
503
+ 501,Measure and mark cardboard,0,0,0.0,0.0,0.0
504
+ 502,Hold and align cardboard,0,0,0.0,0.0,0.0
505
+ 503,Position cardboard tube,0,0,0.0,0.0,0.0
506
+ 504,Cut cardboard strip with scissors,0,0,0.0,0.0,0.0
507
+ 505,Scroll on smartphone,0,0,0.0,0.0,0.0
508
+ 506,Tap smartphone screen,0,0,0.0,0.0,0.0
509
+ 507,Scroll through photo gallery,0,0,0.0,0.0,0.0
510
+ 508,Typing message on smartphone,0,0,0.0,0.0,0.0
511
+ 509,Typing on smartphone,0,0,0.0,0.0,0.0
512
+ 510,Tapping smartphone screen,0,0,0.0,0.0,0.0
513
+ 511,Putting away smartphone,0,0,0.0,0.0,0.0
514
+ 512,Stop measuring and put down tools,0,0,0.0,0.0,0.0
515
+ 513,Draw line with pen,0,0,0.0,0.0,0.0
516
+ 514,Prepare to draw lines,0,1,0.0,0.0,0.0
517
+ 515,Remove ruler and marker,0,1,0.0,0.0,0.0
518
+ 516,Align ruler and mark cardboard,0,0,0.0,0.0,0.0
519
+ 517,Walking through classroom,0,0,0.0,0.0,0.0
520
+ 518,Assemble cardboard pieces,0,0,0.0,0.0,0.0
521
+ 519,Move marker away,0,0,0.0,0.0,0.0
522
+ 520,Arrange cardboard piece,0,0,0.0,0.0,0.0
523
+ 521,Position ruler and mark cardboard,0,0,0.0,0.0,0.0
524
+ 522,Place canned good on shelf,0,0,0.0,0.0,0.0
525
+ 523,Move canned goods container,0,0,0.0,0.0,0.0
526
+ 524,Position container near shelf,0,0,0.0,0.0,0.0
527
+ 525,Adjust container on shelf,0,0,0.0,0.0,0.0
528
+ 526,Pick up canned food,13,0,0.0,0.0,0.0
529
+ 527,Place canned food in container,0,0,0.0,0.0,0.0
530
+ 528,Adjust cans in container,0,0,0.0,0.0,0.0
531
+ 529,Adjust cans in tray,0,0,0.0,0.0,0.0
532
+ 530,Adjusting canned goods on shelf,0,0,0.0,0.0,0.0
533
+ 531,Align canned goods on shelf,0,0,0.0,0.0,0.0
534
+ 532,Place canned food on shelf,69,0,0.0,0.0,0.0
535
+ 533,Reach for next canned food item,0,0,0.0,0.0,0.0
536
+ 534,Wipe the plastic jar,0,195,0.0,0.0,0.0
537
+ 535,Finish wiping and inspect jar,0,1,0.0,0.0,0.0
538
+ 536,Inspect jar,0,18,0.0,0.0,0.0
539
+ 537,Pick up tin can,0,0,0.0,0.0,0.0
540
+ 538,Hold items and inspect shelf,0,58,0.0,0.0,0.0
541
+ 539,Move cardboard,0,3,0.0,0.0,0.0
542
+ 540,Stabilize cardboard,0,6,0.0,0.0,0.0
543
+ 541,Stabilize ruler,0,3,0.0,0.0,0.0
544
+ 542,Labeling cardboard squares,0,15,0.0,0.0,0.0
545
+ 543,Moving cardboard square,0,1,0.0,0.0,0.0
546
+ 544,Labeling cardboard square,0,46,0.0,0.0,0.0
547
+ 545,Starting to label next square,0,6,0.0,0.0,0.0
548
+ 546,Placing labeled cardboard square,0,2,0.0,0.0,0.0
549
+ 547,Labeling cardboard piece,0,13,0.0,0.0,0.0
550
+ 548,Move cardboard piece,0,2,0.0,0.0,0.0
551
+ 549,Reach for next piece,0,1,0.0,0.0,0.0
552
+ 550,Marking cardboard with pen,0,2,0.0,0.0,0.0
553
+ 551,Repositioning ruler and cardboard,0,1,0.0,0.0,0.0
554
+ 552,Folding cardboard,0,1,0.0,0.0,0.0
555
+ 553,Place cardboard piece on stack,0,32,0.0,0.0,0.0
556
+ 554,Arrange buttons on the table,0,11,0.0,0.0,0.0
557
+ 555,Arrange buttons,33,18,1.0,0.5454545454545454,0.7058823529411764
558
+ 556,Sorting buttons,0,3,0.0,0.0,0.0
559
+ 557,Sort orange buttons,0,85,0.0,0.0,0.0
560
+ 558,Sort orange button,0,14,0.0,0.0,0.0
561
+ 559,Move hand over button pile,0,1,0.0,0.0,0.0
562
+ 560,Move orange buttons,0,17,0.0,0.0,0.0
563
+ 561,Arrange orange buttons,0,83,0.0,0.0,0.0
564
+ 562,Pick up stapler,0,9,0.0,0.0,0.0
565
+ 563,Drawing grid line with ruler,0,0,0.0,0.0,0.0
566
+ 564,Drawing grid line with pen and ruler,0,0,0.0,0.0,0.0
567
+ 565,Draw grid line with pen,0,2,0.0,0.0,0.0
568
+ 566,Pick up cardboard,0,2,0.0,0.0,0.0
569
+ 567,Draw grid line,0,0,0.0,0.0,0.0
570
+ 568,Drawing grid line,0,0,0.0,0.0,0.0
571
+ 569,Manipulate paper star,0,16,0.0,0.0,0.0
572
+ 570,Fold paper star,0,18,0.0,0.0,0.0
573
+ 571,Reach for beads,0,49,0.0,0.0,0.0
574
+ 572,Sort purple beads,0,0,0.0,0.0,0.0
575
+ 573,Write on paper,13,0,0.0,0.0,0.0
576
+ 574,Gathering star beads,0,1,0.0,0.0,0.0
577
+ 575,Sort beads by hand,0,8,0.0,0.0,0.0
578
+ 576,Pick up can,3,0,0.0,0.0,0.0
579
+ 577,Hold tray of canned goods,0,0,0.0,0.0,0.0
580
+ 578,Position tray,0,0,0.0,0.0,0.0
581
+ 579,Sort canned goods in tray,0,0,0.0,0.0,0.0
582
+ 580,Carry crate of cans,0,0,0.0,0.0,0.0
583
+ 581,Move can towards shelf,0,0,0.0,0.0,0.0
584
+ 582,Wipe item,0,0,0.0,0.0,0.0
585
+ 583,Place item back,0,0,0.0,0.0,0.0
586
+ 584,Wipe retail item,0,0,0.0,0.0,0.0
587
+ 585,Reach for retail item,0,0,0.0,0.0,0.0
588
+ 586,Grasp retail item,0,4,0.0,0.0,0.0
589
+ 587,Adjust retail items on shelf,0,0,0.0,0.0,0.0
590
+ 588,Align and place retail item,0,0,0.0,0.0,0.0
591
+ 589,Arrange items on shelf,0,0,0.0,0.0,0.0
592
+ 590,Pick up pink water bottle,0,3,0.0,0.0,0.0
593
+ 591,Place down pink water bottle,0,2,0.0,0.0,0.0
594
+ 592,Place star in row,0,0,0.0,0.0,0.0
595
+ 593,Reach for star,0,4,0.0,0.0,0.0
596
+ 594,Retrieve star,0,1,0.0,0.0,0.0
597
+ 595,Pick up star,0,1,0.0,0.0,0.0
598
+ 596,Hold recording sheet and pen,0,0,0.0,0.0,0.0
599
+ 597,Record star count,0,0,0.0,0.0,0.0
600
+ 598,Hold pen and paper,0,0,0.0,0.0,0.0
601
+ 599,Observe paper and count objects,0,0,0.0,0.0,0.0
602
+ 600,Write count on paper,17,139,0.0,0.0,0.0
603
+ 601,Place pen on table,0,0,0.0,0.0,0.0
604
+ 602,View content on smartphone,0,1,0.0,0.0,0.0
605
+ 603,Resume writing on paper,0,14,0.0,0.0,0.0
606
+ 604,Pick up paper star,0,1,0.0,0.0,0.0
607
+ 605,Place paper star in row,0,0,0.0,0.0,0.0
608
+ 606,Manipulate star,0,0,0.0,0.0,0.0
609
+ 607,Arrange paper stars,0,0,0.0,0.0,0.0
610
+ 608,Cut cardboard grid,0,0,0.0,0.0,0.0
611
+ 609,Pick up small item,0,0,0.0,0.0,0.0
612
+ 610,Walking to sink,0,0,0.0,0.0,0.0
613
+ 611,Washing hands,0,0,0.0,0.0,0.0
614
+ 612,Finish washing hands,0,0,0.0,0.0,0.0
615
+ 613,Pick up paper towel,0,0,0.0,0.0,0.0
616
+ 614,Dry hands,0,0,0.0,0.0,0.0
617
+ 615,Begin folding paper strip,0,0,0.0,0.0,0.0
618
+ 616,Fold paper strip into a star,0,8,0.0,0.0,0.0
619
+ 617,Prepare paper strip,0,0,0.0,0.0,0.0
620
+ 618,Continue folding paper strip,0,19,0.0,0.0,0.0
621
+ 619,Fold lucky star,0,0,0.0,0.0,0.0
622
+ 620,Manipulate folded paper star,0,1,0.0,0.0,0.0
623
+ 621,Grasp paper strip,0,0,0.0,0.0,0.0
624
+ 622,Sort colored tiles,0,0,0.0,0.0,0.0
625
+ 623,Pick up colored tile,0,0,0.0,0.0,0.0
626
+ 624,Place colored tile,0,0,0.0,0.0,0.0
627
+ 625,Sort tiles,0,0,0.0,0.0,0.0
628
+ 626,Sort tiles by color,0,0,0.0,0.0,0.0
629
+ 627,Write on notepad,0,49,0.0,0.0,0.0
630
+ 628,Writing on notepad,0,18,0.0,0.0,0.0
631
+ 629,Reaching for beads,0,2,0.0,0.0,0.0
632
+ 630,Cut section from newspaper,0,0,0.0,0.0,0.0
633
+ 631,Tear newspaper,0,0,0.0,0.0,0.0
634
+ 632,Hold newspaper,0,0,0.0,0.0,0.0
635
+ 633,Hold and align newspaper,0,0,0.0,0.0,0.0
636
+ 634,Fold newspaper,0,0,0.0,0.0,0.0
637
+ 635,Reposition newspaper,0,0,0.0,0.0,0.0
638
+ 636,Cut along the edge of the newspaper,0,0,0.0,0.0,0.0
639
+ 637,Cut along the newspaper edge,0,0,0.0,0.0,0.0
640
+ 638,Browsing mobile phone,0,0,0.0,0.0,0.0
641
+ 639,Browse mobile phone,0,0,0.0,0.0,0.0
642
+ 640,Cut newspaper with scissors,0,0,0.0,0.0,0.0
643
+ 641,Sort blue star-shaped pieces,0,3,0.0,0.0,0.0
644
+ 642,Sort small plastic pieces,0,0,0.0,0.0,0.0
645
+ 643,Reach for more pieces,0,0,0.0,0.0,0.0
646
+ 644,Sort plastic pieces,0,0,0.0,0.0,0.0
647
+ 645,Move pieces into box,0,0,0.0,0.0,0.0
648
+ 646,Gather pieces into box,0,0,0.0,0.0,0.0
649
+ 647,Typing on phone,0,0,0.0,0.0,0.0
650
+ 648,Scrolling and viewing content on phone,0,0,0.0,0.0,0.0
651
+ 649,Pick up item from shelf,0,0,0.0,0.0,0.0
652
+ 650,Pick up charging cable,0,0,0.0,0.0,0.0
653
+ 651,Pick up electronic item,0,0,0.0,0.0,0.0
654
+ 652,Wipe electronic item,0,1,0.0,0.0,0.0
655
+ 653,Place item in bag,0,1,0.0,0.0,0.0
656
+ 654,Inspect smartphone box,0,2,0.0,0.0,0.0
657
+ 655,Hold smartphone box,0,0,0.0,0.0,0.0
658
+ 656,Examine product,0,0,0.0,0.0,0.0
659
+ 657,Pick up another canned item,0,6,0.0,0.0,0.0
660
+ 658,Carry plastic container,0,5,0.0,0.0,0.0
661
+ 659,Reach for another container,0,7,0.0,0.0,0.0
662
+ 660,Release container,0,15,0.0,0.0,0.0
663
+ 661,Pick up storage container,0,7,0.0,0.0,0.0
664
+ 662,Move container toward shelf,0,8,0.0,0.0,0.0
665
+ 663,Position container on shelf,0,8,0.0,0.0,0.0
666
+ 664,Remove lid from container,0,7,0.0,0.0,0.0
667
+ 665,Pick up canned goods,0,7,0.0,0.0,0.0
668
+ 666,Place canned goods in container,0,5,0.0,0.0,0.0
669
+ 667,Pick up next product from bin,0,4,0.0,0.0,0.0
670
+ 668,Move bin,0,32,0.0,0.0,0.0
671
+ 669,Walking along the aisle,0,0,0.0,0.0,0.0
672
+ 670,Move plastic storage bin,0,0,0.0,0.0,0.0
673
+ 671,Place canned food in bin,0,0,0.0,0.0,0.0
674
+ 672,Hold container of canned food,0,0,0.0,0.0,0.0
675
+ 673,Move towards aisle,0,0,0.0,0.0,0.0
676
+ 674,Approach restocking supplies,0,0,0.0,0.0,0.0
677
+ 675,Pick up plastic container,0,0,0.0,0.0,0.0
678
+ 676,Move along the shelves,0,0,0.0,0.0,0.0
679
+ 677,Forming quilled paper shape,0,0,0.0,0.0,0.0
680
+ 678,Manipulate quilled paper shape,0,0,0.0,0.0,0.0
681
+ 679,Place quilled paper shape,0,0,0.0,0.0,0.0
682
+ 680,Retrieve paper strip,0,0,0.0,0.0,0.0
683
+ 681,Select paper strip,0,0,0.0,0.0,0.0
684
+ 682,Transition to standing position,0,0,0.0,0.0,0.0
685
+ 683,Observe paper quilling station,0,0,0.0,0.0,0.0
686
+ 684,Sort quilled paper pieces,0,1,0.0,0.0,0.0
687
+ 685,Walk towards storage area,0,2,0.0,0.0,0.0
688
+ 686,Hold device and cable,0,0,0.0,0.0,0.0
689
+ 687,Move piece to pile,0,0,0.0,0.0,0.0
690
+ 688,Manipulate quilled paper,0,1,0.0,0.0,0.0
691
+ 689,Pick up and sort cardboard,0,0,0.0,0.0,0.0
692
+ 690,Sort and arrange cardboard pieces,0,0,0.0,0.0,0.0
693
+ 691,Move camera over surface,0,0,0.0,0.0,0.0
694
+ 692,Observe sorting progress,0,5,0.0,0.0,0.0
695
+ 693,Reach for cardboard piece,0,1,0.0,0.0,0.0
696
+ 694,Lock phone,0,0,0.0,0.0,0.0
697
+ 695,Sort and stack cardboard pieces,0,0,0.0,0.0,0.0
698
+ 696,Mark list with pen,0,0,0.0,0.0,0.0
699
+ 697,Adjust bead piles,0,0,0.0,0.0,0.0
700
+ 698,Sort blue beads,0,2,0.0,0.0,0.0
701
+ 699,Place down pen,0,1,0.0,0.0,0.0
702
+ 700,Move away from desk,0,0,0.0,0.0,0.0
703
+ 701,Walking through the office,0,3,0.0,0.0,0.0
704
+ 702,Resume sorting blue beads,0,0,0.0,0.0,0.0
705
+ 703,Fold cardboard shape,0,4,0.0,0.0,0.0
706
+ 704,Reach for cardboard box,0,9,0.0,0.0,0.0
707
+ 705,Reach for object,0,5,0.0,0.0,0.0
708
+ 706,Release cardboard shape,0,2,0.0,0.0,0.0
709
+ 707,Reposition hands,0,0,0.0,0.0,0.0
710
+ 708,Rolling paper strip,0,0,0.0,0.0,0.0
711
+ 709,Finishing coil,0,2,0.0,0.0,0.0
712
+ 710,Start folding paper strip,0,0,0.0,0.0,0.0
713
+ 711,Folding paper strip,0,4,0.0,0.0,0.0
714
+ 712,Positioning paper strip,0,5,0.0,0.0,0.0
715
+ 713,Manipulate quilling paper,0,0,0.0,0.0,0.0
716
+ 714,Walk towards workspace,0,0,0.0,0.0,0.0
717
+ 715,Interaction with coworker,0,0,0.0,0.0,0.0
718
+ 716,Walk through workspace,0,0,0.0,0.0,0.0
719
+ 717,Manipulate small object,0,0,0.0,0.0,0.0
720
+ 718,Manipulate paper quilling piece,0,3,0.0,0.0,0.0
721
+ 719,Hold quilled paper piece,0,4,0.0,0.0,0.0
722
+ 720,Pull paper strip,0,10,0.0,0.0,0.0
723
+ 721,Hold and align paper strip,0,1,0.0,0.0,0.0
724
+ 722,Hold and rotate paper strip,0,1,0.0,0.0,0.0
725
+ 723,Marking cardboard piece,30,0,0.0,0.0,0.0
726
+ 724,Hold and mark cardboard piece,0,0,0.0,0.0,0.0
727
+ 725,Organize cardboard pieces,15,2,0.0,0.0,0.0
728
+ 726,Walking towards workstation,0,0,0.0,0.0,0.0
729
+ 727,Move to desk,0,0,0.0,0.0,0.0
730
+ 728,Sort small objects,0,0,0.0,0.0,0.0
731
+ 729,Gathering items,0,0,0.0,0.0,0.0
732
+ 730,Place items on table,0,0,0.0,0.0,0.0
733
+ 731,Gathering colored beads,0,0,0.0,0.0,0.0
734
+ 732,Arrange beads by color,0,0,0.0,0.0,0.0
735
+ 733,Sort star-shaped objects by color,0,0,0.0,0.0,0.0
736
+ 734,Sort star-shaped objects,0,0,0.0,0.0,0.0
737
+ 735,Sort yellow star-shaped objects,0,0,0.0,0.0,0.0
738
+ 736,Sort purple star-shaped objects,0,0,0.0,0.0,0.0
739
+ 737,View phone screen,0,0,0.0,0.0,0.0
740
+ 738,Viewing phone screen,0,0,0.0,0.0,0.0
741
+ 739,Initiate star folding,0,0,0.0,0.0,0.0
742
+ 740,Reach for next canned product,0,0,0.0,0.0,0.0
743
+ 741,Place jar in box,0,11,0.0,0.0,0.0
744
+ 742,Place pickle jar in box,0,6,0.0,0.0,0.0
745
+ 743,Grasp product from shelf,0,0,0.0,0.0,0.0
746
+ 744,Place red button,0,0,0.0,0.0,0.0
747
+ 745,Move and place black buttons,0,9,0.0,0.0,0.0
748
+ 746,Arrange red buttons,0,1,0.0,0.0,0.0
749
+ 747,Adjust red button position,0,0,0.0,0.0,0.0
750
+ 748,Withdraw hand from buttons,0,1,0.0,0.0,0.0
751
+ 749,Arrive at a different workstation,0,0,0.0,0.0,0.0
752
+ 750,Move vacuum cleaner hose,0,2,0.0,0.0,0.0
753
+ 751,Place smartphone on cardboard,0,6,0.0,0.0,0.0
754
+ 752,Reach into bag,0,0,0.0,0.0,0.0
755
+ 753,Organize products,0,1,0.0,0.0,0.0
756
+ 754,Close cardboard box,0,3,0.0,0.0,0.0
757
+ 755,Pick up item,0,0,0.0,0.0,0.0
758
+ 756,Stand up and walk away,0,6,0.0,0.0,0.0
759
+ 757,Interact with colleagues,0,0,0.0,0.0,0.0
760
+ 758,Moving hand towards cardboard stack,0,3,0.0,0.0,0.0
761
+ 759,Put down water bottle,0,0,0.0,0.0,0.0
762
+ 760,Placing piece on stack,0,0,0.0,0.0,0.0
763
+ 761,Reach for and pick up smartphone,0,2,0.0,0.0,0.0
764
+ 762,Move cardboard to pile,0,0,0.0,0.0,0.0
765
+ 763,Fold cardboard sheet,0,0,0.0,0.0,0.0
766
+ 764,Reach for shelving divider,0,2,0.0,0.0,0.0
767
+ 765,Rearrange shelf item,0,2,0.0,0.0,0.0
768
+ 766,Arrange paper strips,0,7,0.0,0.0,0.0
769
+ 767,Place down strip,0,10,0.0,0.0,0.0
770
+ 768,Move puzzle piece,0,2,0.0,0.0,0.0
771
+ 769,Cap marker,0,1,0.0,0.0,0.0
772
+ 770,Combine bead piles,0,0,0.0,0.0,0.0
773
+ 771,Draw lines with pen and ruler,0,0,0.0,0.0,0.0
774
+ 772,Put down phone,0,6,0.0,0.0,0.0
775
+ 773,Pick up pasta box,0,1,0.0,0.0,0.0
776
+ 774,Place gift box into bin,0,0,0.0,0.0,0.0
777
+ 775,Remove plastic container from storage box,0,3,0.0,0.0,0.0
778
+ 776,Hold ruler,0,0,0.0,0.0,0.0
779
+ 777,Move pen away,0,3,0.0,0.0,0.0
780
+ 778,Place crate on floor,0,0,0.0,0.0,0.0
781
+ 779,Place smartphone on table,0,3,0.0,0.0,0.0
782
+ 780,Discard paper towel,0,0,0.0,0.0,0.0
783
+ 781,Release paper star,0,2,0.0,0.0,0.0
784
+ 782,Place phone on table,0,1,0.0,0.0,0.0
785
+ 783,Scrolling or navigating on phone,0,1,0.0,0.0,0.0
786
+ 784,Hold electronic item,0,0,0.0,0.0,0.0
787
+ 785,Inspect electronic item,0,0,0.0,0.0,0.0
788
+ 786,Move pineapple chips,0,0,0.0,0.0,0.0
789
+ 787,Mark paper list,0,0,0.0,0.0,0.0
790
+ 788,Placing phone down,0,4,0.0,0.0,0.0
791
+ 789,Pick up nut bar box,0,0,0.0,0.0,0.0
792
+ 790,Pick up plastic bin,0,0,0.0,0.0,0.0
793
+ 791,Pick up pickle jar,0,4,0.0,0.0,0.0
794
+ 792,Pick up product from shelf,0,2,0.0,0.0,0.0
795
+ 793,Place jar into shelf box,0,7,0.0,0.0,0.0
796
+ 794,Wipe grocery shelf,0,0,0.0,0.0,0.0
797
+ 795,Rearrange buttons,0,1,0.0,0.0,0.0
798
+ 796,Release button,0,0,0.0,0.0,0.0
799
+ 797,Pick up orange button,0,1,0.0,0.0,0.0
800
+ 798,Arrange small buttons,0,7,0.0,0.0,0.0
801
+ 799,Align buttons,0,1,0.0,0.0,0.0
802
+ 800,Look around the table,0,0,0.0,0.0,0.0
803
+ 801,Align red buttons,0,0,0.0,0.0,0.0
804
+ 802,Reach for black button,0,6,0.0,0.0,0.0
805
+ 803,Reach for buttons,0,8,0.0,0.0,0.0
806
+ 804,Place and align button,0,0,0.0,0.0,0.0
807
+ 805,Move hand,0,0,0.0,0.0,0.0
808
+ 806,Move button to line,0,0,0.0,0.0,0.0
809
+ 807,Reach for utility knife,0,5,0.0,0.0,0.0
810
+ 808,Place down paper pieces,0,0,0.0,0.0,0.0
811
+ 809,Switch to scissors,0,0,0.0,0.0,0.0
812
+ 810,Place phone on shelf,0,1,0.0,0.0,0.0
813
+ 811,Inspect product lid,0,7,0.0,0.0,0.0
814
+ 812,Sweep floor debris,0,1,0.0,0.0,0.0
815
+ 813,Adjust grip on container,0,0,0.0,0.0,0.0
816
+ 814,Manipulate paper piece,0,1,0.0,0.0,0.0
817
+ 815,Hold quilled paper coil,0,5,0.0,0.0,0.0
818
+ 816,Place scissors aside,0,0,0.0,0.0,0.0
819
+ 817,Finish placing cardboard cutouts,0,4,0.0,0.0,0.0
820
+ 818,Fold cut cardboard,0,7,0.0,0.0,0.0
821
+ 819,Look away,0,0,0.0,0.0,0.0
822
+ 820,Pick up cut cardboard piece,0,1,0.0,0.0,0.0
823
+ 821,Reposition scissors,0,2,0.0,0.0,0.0
824
+ 822,Hold cardboard piece,7,4,0.0,0.0,0.0
825
+ 823,Picking up stock,0,0,0.0,0.0,0.0
826
+ 824,Carry container,0,3,0.0,0.0,0.0
827
+ 825,Positioning cardboard on workspace,0,1,0.0,0.0,0.0
828
+ 826,Stop sorting stars,0,0,0.0,0.0,0.0
829
+ 827,Place knife down,0,0,0.0,0.0,0.0
830
+ 828,Search for puzzle piece,20,3,0.0,0.0,0.0
831
+ 829,Lift pen and shift ruler,0,0,0.0,0.0,0.0
832
+ 830,Moving ruler,0,0,0.0,0.0,0.0
833
+ 831,Hold beads,19,0,0.0,0.0,0.0
834
+ 832,Adjusting fabric position,0,0,0.0,0.0,0.0
835
+ 833,Pick up new cardboard piece,24,0,0.0,0.0,0.0
836
+ 834,Gather cardboard pieces,0,0,0.0,0.0,0.0
837
+ 835,Hold electronic accessory,0,0,0.0,0.0,0.0
838
+ 836,Pick up electronic accessory,0,0,0.0,0.0,0.0
839
+ 837,Place accessory box,0,0,0.0,0.0,0.0
840
+ 838,Release product on shelf,0,0,0.0,0.0,0.0
841
+ 839,Pick up new product from box,0,0,0.0,0.0,0.0
842
+ 840,Pick up shopping bag,0,0,0.0,0.0,0.0
843
+ 841,Move to shelf,3,0,0.0,0.0,0.0
844
+ 842,Grasp snack package,0,0,0.0,0.0,0.0
845
+ 843,Place snack in box,0,7,0.0,0.0,0.0
846
+ 844,Place snack packages on shelf,0,7,0.0,0.0,0.0
847
+ 845,Reach for snack package,0,0,0.0,0.0,0.0
848
+ 846,Reach for item,0,0,0.0,0.0,0.0
849
+ 847,Organize item on shelf,0,0,0.0,0.0,0.0
850
+ 848,Place pen on cardboard,0,0,0.0,0.0,0.0
851
+ 849,Adjust cardboard divider,0,2,0.0,0.0,0.0
852
+ 850,Place finished star on table,0,4,0.0,0.0,0.0
853
+ 851,Inspect shelf,0,2,0.0,0.0,0.0
854
+ 852,Pick up snack packs,0,4,0.0,0.0,0.0
855
+ 853,Move to shelf base,0,3,0.0,0.0,0.0
856
+ 854,Place gift box on shelf,0,0,0.0,0.0,0.0
857
+ 855,Place snack pouch on shelf,0,0,0.0,0.0,0.0
858
+ 856,Sort Mahjong tiles,0,1,0.0,0.0,0.0
859
+ 857,Pick up charging case,0,8,0.0,0.0,0.0
860
+ 858,Place ruler on cardboard,0,0,0.0,0.0,0.0
861
+ 859,Reposition tools,0,0,0.0,0.0,0.0
862
+ 860,Position scissors for next cut,0,0,0.0,0.0,0.0
863
+ 861,Tapping on smartphone screen,0,0,0.0,0.0,0.0
864
+ 862,Positioning ruler on cardboard,0,5,0.0,0.0,0.0
865
+ 863,Placing labeled square,0,3,0.0,0.0,0.0
866
+ 864,Switching marker,0,4,0.0,0.0,0.0
867
+ 865,Placing pen on table,0,3,0.0,0.0,0.0
868
+ 866,Manipulate cardboard sheet,0,0,0.0,0.0,0.0
869
+ 867,Interact with smartphone,21,2,0.0,0.0,0.0
870
+ 868,Pick up retail item,0,0,0.0,0.0,0.0
871
+ 869,Adjust retail item position,0,0,0.0,0.0,0.0
872
+ 870,Observe surroundings,0,4,0.0,0.0,0.0
873
+ 871,Manipulate paper stars,0,1,0.0,0.0,0.0
874
+ 872,Pick up power bank,0,0,0.0,0.0,0.0
875
+ 873,Rub hands together,0,1,0.0,0.0,0.0
876
+ 874,Place star on table,0,2,0.0,0.0,0.0
877
+ 875,Gather pieces,0,0,0.0,0.0,0.0
878
+ 876,Select another item,0,3,0.0,0.0,0.0
879
+ 877,Place container on floor,0,3,0.0,0.0,0.0
880
+ 878,Place storage container on floor,0,3,0.0,0.0,0.0
+ 879,Reorganize bin contents,0,2,0.0,0.0,0.0
+ 880,Observe stocking,0,1,0.0,0.0,0.0
+ 881,Manipulate quilled paper strips,0,0,0.0,0.0,0.0
+ 882,Move blue beads,0,6,0.0,0.0,0.0
+ 883,Place controller on table,0,2,0.0,0.0,0.0
+ 884,Selecting new paper strip,0,3,0.0,0.0,0.0
+ 885,Grasp electronic object,0,1,0.0,0.0,0.0
+ 886,Reach for paper strip,0,5,0.0,0.0,0.0
+ 887,Reach for canned food,0,0,0.0,0.0,0.0
+ 888,Hold blue product box,0,0,0.0,0.0,0.0
+ 889,Inspect product,0,0,0.0,0.0,0.0
+ 890,Clean shelf,0,0,0.0,0.0,0.0
+ 891,Walk towards shelf,0,0,0.0,0.0,0.0
+ 892,Select product from box,0,0,0.0,0.0,0.0
+ 893,Wipe ketchup bottle,0,0,0.0,0.0,0.0
+ 894,Place ketchup bottle on shelf,0,0,0.0,0.0,0.0
+ 895,Draw line with marker,0,0,0.0,0.0,0.0
+ 896,Draw straight line,0,0,0.0,0.0,0.0
+ 897,Mark straight line,0,0,0.0,0.0,0.0
+ 898,Pick up small cardboard piece,0,0,0.0,0.0,0.0
+ 899,Walk through office,0,0,0.0,0.0,0.0
+ 900,Cut cardboard along line,0,0,0.0,0.0,0.0
+ 901,Reposition hands and ruler,0,0,0.0,0.0,0.0
+ 902,Align ruler with crease,0,0,0.0,0.0,0.0
+ 903,Press fold,0,0,0.0,0.0,0.0
+ 904,Cut cardboard strip with utility knife,0,0,0.0,0.0,0.0
+ 905,Pick up dustpan,17,0,0.0,0.0,0.0
+ 906,Hold container lid,25,0,0.0,0.0,0.0
+ 907,Move towards the stove,9,0,0.0,0.0,0.0
+ 908,Open stove pot lid,20,0,0.0,0.0,0.0
+ 909,Closing the door,8,0,0.0,0.0,0.0
+ 910,Picking up bottle,11,0,0.0,0.0,0.0
+ 911,Wipe kitchen counter,16,0,0.0,0.0,0.0
+ 912,Move towards kitchen area,15,0,0.0,0.0,0.0
+ 913,Place cloth on floor,6,0,0.0,0.0,0.0
+ 914,Reach for cleaning supplies,18,0,0.0,0.0,0.0
+ 915,Remove cleaning bottle,11,0,0.0,0.0,0.0
+ 916,Washing hands in sink,10,0,0.0,0.0,0.0
+ 917,Grasping cleaning cloth,7,0,0.0,0.0,0.0
+ 918,Wiping countertop,11,0,0.0,0.0,0.0
+ 919,Lift pot lid,9,0,0.0,0.0,0.0
+ 920,Stir contents,8,0,0.0,0.0,0.0
+ 921,Place lid back,9,0,0.0,0.0,0.0
+ 922,Adjust pot position,6,0,0.0,0.0,0.0
+ 923,Move pot,7,0,0.0,0.0,0.0
+ 924,Place towel,16,0,0.0,0.0,0.0
+ 925,Start cutting,7,0,0.0,0.0,0.0
+ 926,Cut along the marked line,51,0,0.0,0.0,0.0
+ 927,Pick up item from bin,0,0,0.0,0.0,0.0
+ 928,Hold item,0,0,0.0,0.0,0.0
+ 929,Check smart watch,0,0,0.0,0.0,0.0
+ 930,Pick up jar,0,0,0.0,0.0,0.0
+ 931,Pick up sauce bottle,0,0,0.0,0.0,0.0
+ 932,Place sauce bottle on shelf,0,0,0.0,0.0,0.0
+ 933,Hold empty container,0,0,0.0,0.0,0.0
+ 934,Assess shelf arrangement,0,0,0.0,0.0,0.0
+ 935,Pick up bottle,0,0,0.0,0.0,0.0
+ 936,Release foam strip,0,0,0.0,0.0,0.0
+ 937,Observe craft layout,0,0,0.0,0.0,0.0
+ 938,Reach for foam strips,0,0,0.0,0.0,0.0
+ 939,Adjust foam strip,0,0,0.0,0.0,0.0
+ 940,Align foam strip,0,0,0.0,0.0,0.0
+ 941,Attach foam strip,0,0,0.0,0.0,0.0
+ 942,Curve foam strip into loop,0,0,0.0,0.0,0.0
+ 943,Press ends of foam strip together,0,0,0.0,0.0,0.0
+ 944,Position yellow foam piece on strip,0,0,0.0,0.0,0.0
+ 945,Press foam strip,0,0,0.0,0.0,0.0
+ 946,Fold foam piece,0,0,0.0,0.0,0.0
+ 947,Pinch foam strips,0,0,0.0,0.0,0.0
+ 948,Pull blue foam strip,0,0,0.0,0.0,0.0
+ 949,Tear blue foam strip,0,0,0.0,0.0,0.0
+ 950,Pick up blue foam piece,0,0,0.0,0.0,0.0
+ 951,Tear blue foam piece,0,0,0.0,0.0,0.0
+ 952,Tear off blue foam piece,0,0,0.0,0.0,0.0
+ 953,Peel foam strip,0,0,0.0,0.0,0.0
+ 954,Move small blue foam piece towards the strip,0,0,0.0,0.0,0.0
+ 955,Align blue strip,0,0,0.0,0.0,0.0
+ 956,Press blue strip,0,0,0.0,0.0,0.0
+ 957,Position blue strip,0,0,0.0,0.0,0.0
+ 958,Lift blue strip,0,0,0.0,0.0,0.0
+ 959,Hold blue strip,0,0,0.0,0.0,0.0
+ 960,Peel blue strip,0,0,0.0,0.0,0.0
+ 961,Align paper strip,0,0,0.0,0.0,0.0
+ 962,Interlock paper strips,0,0,0.0,0.0,0.0
+ 963,Turn away from table,0,0,0.0,0.0,0.0
+ 964,Touch phone and paper strip,0,0,0.0,0.0,0.0
+ 965,Attach material to paper strip,0,0,0.0,0.0,0.0
+ 966,Pick up tool,0,0,0.0,0.0,0.0
+ 967,Walk through the room,0,0,0.0,0.0,0.0
+ 968,Walk down hallway,0,0,0.0,0.0,0.0
+ 969,Reach for door handle,0,0,0.0,0.0,0.0
+ 970,Grasp door handle,0,0,0.0,0.0,0.0
+ 971,Walk to table,0,0,0.0,0.0,0.0
+ 972,Pick up supplies from box,0,0,0.0,0.0,0.0
+ 973,Approach work table,0,0,0.0,0.0,0.0
+ 974,Touch colleague's back,0,0,0.0,0.0,0.0
+ 975,Position the chair,0,0,0.0,0.0,0.0
+ 976,Observe and walk through store,15,0,0.0,0.0,0.0
+ 977,Inspect shelf condition,27,0,0.0,0.0,0.0
+ 978,Approach boxes,12,0,0.0,0.0,0.0
+ 979,Reach for wire hangers,13,0,0.0,0.0,0.0
+ 980,Extract wire hangers from box,30,0,0.0,0.0,0.0
+ 981,Bundle display hooks,22,0,0.0,0.0,0.0
+ 982,Release hook,14,0,0.0,0.0,0.0
+ 983,Move through aisle,10,0,0.0,0.0,0.0
+ 984,Pick up items from the shopping bag,23,0,0.0,0.0,0.0
+ 985,Place items on the shelf,6,0,0.0,0.0,0.0
+ 986,Release cardboard piece and gesture,16,0,0.0,0.0,0.0
+ 987,Move marker and adjust hand,8,0,0.0,0.0,0.0
+ 988,Identify next cardboard piece,21,0,0.0,0.0,0.0
+ 989,Observe and pause,11,0,0.0,0.0,0.0
+ 990,Resume observation,4,0,0.0,0.0,0.0
+ 991,Reach for and examine canned goods,0,0,0.0,0.0,0.0
+ 992,Examine canned goods,0,0,0.0,0.0,0.0
+ 993,Select and pick up a canned item,0,0,0.0,0.0,0.0
+ 994,Place item back on shelf,0,0,0.0,0.0,0.0
+ 995,Inspect Dior gift box,0,0,0.0,0.0,0.0
+ 996,Move along the shelf,0,0,0.0,0.0,0.0
+ 997,Select a bottle,0,0,0.0,0.0,0.0
+ 998,Place bottle back on shelf,0,0,0.0,0.0,0.0
+ 999,Pick up another bottle,0,0,0.0,0.0,0.0
+ 1000,Release bottle,0,0,0.0,0.0,0.0
+ 1001,Inspect bottle,0,0,0.0,0.0,0.0
+ 1002,Inspect almond package,0,0,0.0,0.0,0.0
+ 1003,Scan supermarket shelves,0,0,0.0,0.0,0.0
+ 1004,Move along the supermarket aisle,0,0,0.0,0.0,0.0
+ 1005,Reach for canned goods,0,0,0.0,0.0,0.0
+ 1006,Touch canned goods,0,0,0.0,0.0,0.0
+ 1007,Manipulate cardboard shape,0,0,0.0,0.0,0.0
+ 1008,Hold small cardboard pieces,0,0,0.0,0.0,0.0
+ 1009,Prepare to place cardboard,0,0,0.0,0.0,0.0
+ 1010,Reach for next can,18,0,0.0,0.0,0.0
+ 1011,Hold canned food,24,0,0.0,0.0,0.0
+ 1012,Retrieve next canned food item,17,0,0.0,0.0,0.0
+ 1013,Align canned food on shelf,9,0,0.0,0.0,0.0
+ 1014,Retrieve canned food from box,12,0,0.0,0.0,0.0
+ 1015,Place another canned food on shelf,11,0,0.0,0.0,0.0
+ 1016,Adjust canned food on shelf,9,0,0.0,0.0,0.0
+ 1017,Move hand away from shelf,8,0,0.0,0.0,0.0
+ 1018,Hold earbud case,21,0,0.0,0.0,0.0
+ 1019,sort craft materials,36,0,0.0,0.0,0.0
+ 1020,Manipulate craft piece,38,0,0.0,0.0,0.0
+ 1021,Manipulate craft paper strips,33,0,0.0,0.0,0.0
+ 1022,Operate smartphone,40,0,0.0,0.0,0.0
+ 1023,Release smartphone,7,0,0.0,0.0,0.0
+ 1024,Sort small craft pieces,39,0,0.0,0.0,0.0
+ 1025,Hold product package,0,0,0.0,0.0,0.0
+ 1026,Check phone,0,0,0.0,0.0,0.0
+ 1027,Hold charging cable,0,0,0.0,0.0,0.0
+ 1028,Hold items in hand,0,0,0.0,0.0,0.0
+ 1029,Hold and examine item,0,0,0.0,0.0,0.0
+ 1030,Remove item from bag,0,0,0.0,0.0,0.0
+ 1031,Pick up pack from shelf,0,0,0.0,0.0,0.0
+ 1032,fold purple ribbon,0,0,0.0,0.0,0.0
+ 1033,Fold ribbon,0,0,0.0,0.0,0.0
+ 1034,Hold small piece of ribbon,0,0,0.0,0.0,0.0
+ 1035,Position ribbon piece,0,0,0.0,0.0,0.0
+ 1036,Manipulate ribbon piece,0,0,0.0,0.0,0.0
+ 1037,Place ribbon onto project,0,0,0.0,0.0,0.0
+ 1038,Fold and manipulate ribbon,0,0,0.0,0.0,0.0
+ 1039,Manipulate ribbon knot,0,0,0.0,0.0,0.0
+ 1040,Secure ribbon with needle,0,0,0.0,0.0,0.0
+ 1041,Open paper lantern,29,0,0.0,0.0,0.0
+ 1042,Fold paper lantern,9,0,0.0,0.0,0.0
+ 1043,Grasp lantern,15,0,0.0,0.0,0.0
+ 1044,Grasp lantern component,15,0,0.0,0.0,0.0
+ 1045,Align paper lantern edges,29,0,0.0,0.0,0.0
+ 1046,Release lantern,13,0,0.0,0.0,0.0
+ 1047,Pick up packaged paper lantern component,12,0,0.0,0.0,0.0
+ 1048,Handle paper lantern component,19,0,0.0,0.0,0.0
+ 1049,Open folded paper lantern,21,0,0.0,0.0,0.0
+ 1050,Hold paper lantern,19,0,0.0,0.0,0.0
+ 1051,Apply adhesive tape to lantern,14,0,0.0,0.0,0.0
+ 1052,Remove paper lantern part from packaging,16,0,0.0,0.0,0.0
+ 1053,Remove plastic packaging,8,0,0.0,0.0,0.0
+ 1054,Open paper lantern component,24,0,0.0,0.0,0.0
+ 1055,Expand paper lantern,22,0,0.0,0.0,0.0
+ 1056,Align edges of paper lantern,6,0,0.0,0.0,0.0
+ 1057,Mark cardboard with ruler,0,0,0.0,0.0,0.0
+ 1058,Cut along the line,0,0,0.0,0.0,0.0
+ 1059,Release cardboard,0,0,0.0,0.0,0.0
+ 1060,Reposition utility knife,0,0,0.0,0.0,0.0
+ 1061,Tear off cardboard segment,0,0,0.0,0.0,0.0
+ 1062,Browsing smartphone content,0,0,0.0,0.0,0.0
+ 1063,Manipulate small component,0,0,0.0,0.0,0.0
+ 1064,Manipulate component on strip,0,0,0.0,0.0,0.0
+ 1065,Place strip on table,0,0,0.0,0.0,0.0
+ 1066,Manipulate component,0,0,0.0,0.0,0.0
+ 1067,Reach for craft items,18,0,0.0,0.0,0.0
+ 1068,Place hand on table,33,0,0.0,0.0,0.0
+ 1069,Browse smartphone screen,33,0,0.0,0.0,0.0
+ 1070,Scroll smartphone screen,31,0,0.0,0.0,0.0
+ 1071,Put down smartphone,26,0,0.0,0.0,0.0
+ 1072,Place smartphone down,24,0,0.0,0.0,0.0
+ 1073,Record count on notepad,0,0,0.0,0.0,0.0
+ 1074,Count and record paper stars,0,0,0.0,0.0,0.0
+ 1075,Record star count on paper,0,0,0.0,0.0,0.0
+ 1076,Connect cable to device,0,0,0.0,0.0,0.0
+ 1077,Place device on lap,0,0,0.0,0.0,0.0
+ 1078,Count and arrange paper stars,0,0,0.0,0.0,0.0
+ 1079,Count paper stars,0,0,0.0,0.0,0.0
+ 1080,Move hand to paper stars,0,0,0.0,0.0,0.0
+ 1081,Resume counting stars,0,0,0.0,0.0,0.0
+ 1082,Reviewing count record,0,0,0.0,0.0,0.0
+ 1083,Write on paper record,0,0,0.0,0.0,0.0
+ 1084,Update paper record,0,0,0.0,0.0,0.0
+ 1085,Adjust cardboard,0,0,0.0,0.0,0.0
+ 1086,Set down scissors and pick up power bank,0,0,0.0,0.0,0.0
+ 1087,Reposition cardboard for cutting,0,0,0.0,0.0,0.0
+ 1088,Arrange cardboard pieces,0,0,0.0,0.0,0.0
+ 1089,Mark cardboard strip with pen,0,0,0.0,0.0,0.0
+ 1090,Pick up puzzle piece,18,0,0.0,0.0,0.0
+ 1091,Place piece into puzzle,25,0,0.0,0.0,0.0
+ 1092,Manipulate puzzle piece,38,0,0.0,0.0,0.0
+ 1093,Observe puzzle progress,32,0,0.0,0.0,0.0
+ 1094,Reach for puzzle piece,16,0,0.0,0.0,0.0
+ 1095,Attempt to fit puzzle piece,31,0,0.0,0.0,0.0
+ 1096,Sort puzzle pieces,34,0,0.0,0.0,0.0
+ 1097,Walking across the room,17,0,0.0,0.0,0.0
+ 1098,Approaching the table,9,0,0.0,0.0,0.0
+ 1099,Preparing to craft,10,0,0.0,0.0,0.0
+ 1100,Picking up crafting material,12,0,0.0,0.0,0.0
+ 1101,Manipulate material,16,0,0.0,0.0,0.0
+ 1102,Place material,13,0,0.0,0.0,0.0
+ 1103,Manipulate yellow strip,31,0,0.0,0.0,0.0
+ 1104,Manipulating paper strips,22,0,0.0,0.0,0.0
+ 1105,Manipulate bead,23,0,0.0,0.0,0.0
+ 1106,Manipulate beads,22,0,0.0,0.0,0.0
+ 1107,Hold and manipulate paper strip,31,0,0.0,0.0,0.0
+ 1108,Repositioning ruler,0,0,0.0,0.0,0.0
+ 1109,Place down ruler and pen,0,0,0.0,0.0,0.0
+ 1110,Walk through hallway,0,0,0.0,0.0,0.0
+ 1111,Fold cardboard edge,0,0,0.0,0.0,0.0
+ 1112,Pick up marker,0,0,0.0,0.0,0.0
+ 1113,Drop cardboard square into box,0,0,0.0,0.0,0.0
+ 1114,Retrieve hand to table,0,0,0.0,0.0,0.0
+ 1115,Pick up cardboard stack,0,0,0.0,0.0,0.0
+ 1116,Walk with cardboard,0,0,0.0,0.0,0.0
+ 1117,Deposit cardboard squares,0,0,0.0,0.0,0.0
+ 1118,Move away from collection box,0,0,0.0,0.0,0.0
+ 1119,Walking through office hallway,0,0,0.0,0.0,0.0
+ 1120,Grasp cardboard sheet,0,0,0.0,0.0,0.0
+ 1121,Cut cardboard sheet with scissors,0,0,0.0,0.0,0.0
+ 1122,Sort cut cardboard,0,0,0.0,0.0,0.0
+ 1123,Cut cardboard sheet,0,0,0.0,0.0,0.0
+ 1124,Place cardboard square,0,0,0.0,0.0,0.0
+ 1125,Sort buttons,25,0,0.0,0.0,0.0
+ 1126,Arrange buttons in a line,29,0,0.0,0.0,0.0
+ 1127,Sort and arrange buttons,32,0,0.0,0.0,0.0
+ 1128,Sort button,36,0,0.0,0.0,0.0
+ 1129,Sort and adjust button line,29,0,0.0,0.0,0.0
+ 1130,Sort and place buttons,31,0,0.0,0.0,0.0
+ 1131,Walking in the hallway,13,0,0.0,0.0,0.0
+ 1132,Approaching and pressing the door switch,22,0,0.0,0.0,0.0
+ 1133,Entering the VR training room,16,0,0.0,0.0,0.0
+ 1134,Greeting/acknowledging participants,33,0,0.0,0.0,0.0
+ 1135,Move through the training room,20,0,0.0,0.0,0.0
+ 1136,Manipulate plastic strips,34,0,0.0,0.0,0.0
+ 1137,Manipulate plastic strip,37,0,0.0,0.0,0.0
+ 1138,Hold and bend plastic strip,16,0,0.0,0.0,0.0
+ 1139,Bend and manipulate plastic strip,37,0,0.0,0.0,0.0
+ 1140,Fold plastic strip,57,0,0.0,0.0,0.0
+ 1141,Sort buttons by color,0,0,0.0,0.0,0.0
+ 1142,Sort button by color,0,0,0.0,0.0,0.0
+ 1143,Place button in group,0,0,0.0,0.0,0.0
+ 1144,Move away from table,0,0,0.0,0.0,0.0
+ 1145,Return to sorting,0,0,0.0,0.0,0.0
+ 1146,Manipulate paper decoration,41,0,0.0,0.0,0.0
+ 1147,Manipulate paper edge,35,0,0.0,0.0,0.0
+ 1148,Placing paper strip,44,0,0.0,0.0,0.0
+ 1149,Securing paper structure,37,0,0.0,0.0,0.0
+ 1150,Manipulate adhesive strip,44,0,0.0,0.0,0.0
+ 1151,Secure paper edges with adhesive,40,0,0.0,0.0,0.0
+ 1152,Record count,18,0,0.0,0.0,0.0
+ 1153,Sort beads and write count,18,0,0.0,0.0,0.0
+ 1154,Counting and organizing beads,19,0,0.0,0.0,0.0
+ 1155,Pick up star bead,6,0,0.0,0.0,0.0
+ 1156,Place and count bead,27,0,0.0,0.0,0.0
+ 1157,Arrange star beads,15,0,0.0,0.0,0.0
+ 1158,Counting star beads,23,0,0.0,0.0,0.0
+ 1159,Adjust paper,9,0,0.0,0.0,0.0
+ 1160,Gather star beads,13,0,0.0,0.0,0.0
+ 1161,Arrange star beads for counting,16,0,0.0,0.0,0.0
+ 1162,Sort and count beads,27,0,0.0,0.0,0.0
+ 1163,Rinse cloth in sink,4,0,0.0,0.0,0.0
+ 1164,Reposition hand,7,0,0.0,0.0,0.0
+ 1165,Touch foam strip,0,0,0.0,0.0,0.0
+ 1166,Assemble foam strips,0,0,0.0,0.0,0.0
+ 1167,Press foam piece to strip,0,0,0.0,0.0,0.0
+ 1168,Walk towards other aisles,4,0,0.0,0.0,0.0
+ 1169,Place marked piece down,14,0,0.0,0.0,0.0
+ 1170,Gesturing,2,0,0.0,0.0,0.0
+ 1171,Prepare to resume cutting,0,0,0.0,0.0,0.0
+ 1172,Reach for next canned food,3,0,0.0,0.0,0.0
+ 1173,Move hand away,5,0,0.0,0.0,0.0
+ 1174,Sort craft items,6,0,0.0,0.0,0.0
+ 1175,Retrieving more beads,5,0,0.0,0.0,0.0
+ 1176,Pick up yellow item,0,0,0.0,0.0,0.0
+ 1177,Prepare to place bottle on shelf,0,0,0.0,0.0,0.0
+ 1178,Move ruler and tools,0,0,0.0,0.0,0.0
+ 1179,Transition to cutting,0,0,0.0,0.0,0.0
+ 1180,Position utility knife on cardboard,0,0,0.0,0.0,0.0
+ 1181,Place smartphone on stand,3,0,0.0,0.0,0.0
+ 1182,Move dustpan to side,3,0,0.0,0.0,0.0
+ 1183,Walking towards door,3,0,0.0,0.0,0.0
+ 1184,Grasp cleaning bottle,3,0,0.0,0.0,0.0
+ 1185,Pick up next item from bin,0,0,0.0,0.0,0.0
+ 1186,Inspect and place item on shelf,0,0,0.0,0.0,0.0
+ 1187,Place blue foam piece,0,0,0.0,0.0,0.0
+ 1188,Hold foam pieces,0,0,0.0,0.0,0.0
+ 1189,Fold blue strip,0,0,0.0,0.0,0.0
+ 1190,Pick up craft material,0,0,0.0,0.0,0.0
+ 1191,Enter workspace,0,0,0.0,0.0,0.0
+ 1192,Enter the room,0,0,0.0,0.0,0.0
+ 1193,Pull chair,0,0,0.0,0.0,0.0
+ 1194,Observe colleague and workspace,3,0,0.0,0.0,0.0
+ 1195,Pick up Dior gift box,0,0,0.0,0.0,0.0
+ 1196,Place back Dior gift box,0,0,0.0,0.0,0.0
+ 1197,Pick up canned good,0,0,0.0,0.0,0.0
+ 1198,Open earbud case,3,0,0.0,0.0,0.0
+ 1199,Retrieve items from bag,0,0,0.0,0.0,0.0
+ 1200,Adjust lantern string,3,0,0.0,0.0,0.0
+ 1201,Adjust lantern shape,3,0,0.0,0.0,0.0
+ 1202,Pick up electronic device,0,0,0.0,0.0,0.0
+ 1203,Pick up small piece of material,3,0,0.0,0.0,0.0
+ 1204,Use phone while crafting,3,0,0.0,0.0,0.0
+ 1205,Approaching work table,0,0,0.0,0.0,0.0
+ 1206,Set down utility knife,0,0,0.0,0.0,0.0
+ 1207,Prepare to cut cardboard,0,0,0.0,0.0,0.0
+ 1208,Score cardboard,0,0,0.0,0.0,0.0
+ 1209,Move cardboard sheet,0,0,0.0,0.0,0.0
+ 1210,Trim cardboard,0,0,0.0,0.0,0.0
results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/predictions.csv ADDED
The diff for this file is too large to render.