cy0307 committed on
Commit 69865f3 · verified · 1 Parent(s): 3cff18b

Add files using upload-large-folder tool

Files changed (45)
  1. TASK_METHOD_20_GAP_AUDIT.md +11 -27
  2. TASK_METHOD_20_RESULT_MATRIX.md +7 -7
  3. assets/charts/episode128_task_model_radar.svg +24 -12
  4. assets/charts/unified_task_model_radar.svg +19 -7
  5. data/artifact_index.json +29 -29
  6. data/episode128_task_model_radar.json +289 -289
  7. data/mirror_parity.json +0 -0
  8. data/omni_model_comparison.json +1 -1
  9. data/public_surface_qa.json +7 -7
  10. data/publication_audit.json +9 -9
  11. data/quality_gates.json +1 -1
  12. data/qwen3_full_parameter_gates.json +1 -1
  13. data/scope_claims_audit.json +1 -1
  14. data/single_episode_task_model_radar.json +2 -2
  15. data/source_alignment_audit.json +1 -1
  16. data/task_method_20_gap_audit.json +38 -267
  17. data/task_method_20_result_matrix.json +181 -181
  18. data/task_surface_integrity.json +1 -1
  19. data/unified_task_model_radar.json +309 -309
  20. data/website_integrity.json +11 -11
  21. docs/data/episode128_task_model_radar.json +289 -289
  22. docs/data/mirror_parity.json +0 -0
  23. docs/data/omni_model_comparison.json +1 -1
  24. docs/data/public_surface_qa.json +7 -7
  25. docs/data/task_surface_integrity.json +1 -1
  26. metrics/artifact_index.json +29 -29
  27. metrics/episode128_task_model_radar.json +289 -289
  28. metrics/mirror_parity.json +0 -0
  29. metrics/omni_model_comparison.json +1 -1
  30. metrics/public_surface_qa.json +7 -7
  31. metrics/publication_audit.json +9 -9
  32. metrics/quality_gates.json +1 -1
  33. metrics/qwen3_full_parameter_gates.json +1 -1
  34. metrics/scope_claims_audit.json +1 -1
  35. metrics/single_episode_task_model_radar.json +2 -2
  36. metrics/source_alignment_audit.json +1 -1
  37. metrics/task_method_20_gap_audit.json +38 -267
  38. metrics/task_method_20_result_matrix.json +181 -181
  39. metrics/task_surface_integrity.json +1 -1
  40. metrics/unified_task_model_radar.json +309 -309
  41. metrics/website_integrity.json +11 -11
  42. results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/cross_modal_retrieval/ranks.csv +0 -0
  43. results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/imu_to_hand_pose/predictions.csv +0 -0
  44. scripts/build_unified_task_model_radar.py +19 -19
  45. scripts/omni/run_128_task_baselines.py +19 -3
TASK_METHOD_20_GAP_AUDIT.md CHANGED
@@ -1,6 +1,6 @@
1
  # Task Method 20-Result Gap Audit
2
 
3
- Generated: `2026-06-18T12:07:14+00:00`
4
 
5
  This audit is the explicit gap ledger for the 9-method x 20-task result matrix.
6
  It keeps missing cells visible while preserving the rule that a numeric score
@@ -9,8 +9,8 @@ requires a real task target and source artifact.
9
  ## Score Summary
10
 
11
  - Method-task records: `180`
12
- - Numeric scored records: `127`
13
- - Scoreless records: `53`
14
  - Proxy-scored records: `4`
15
  - Source matrix: [`docs/data/task_method_20_result_matrix.json`](docs/data/task_method_20_result_matrix.json)
16
 
@@ -20,8 +20,8 @@ requires a real task target and source artifact.
20
  | --- | --- | --- | --- | --- | --- |
21
  | Minimal | minimal | 20/20 | 0 | 0 | scored: 20 |
22
  | Neural MLP | neural_mlp | 20/20 | 0 | 0 | scored: 20 |
23
- | 128ep Metadata Simple | metadata128_simple | 13/20 | 7 | 0 | scored: 13, unsupported_without_required_target: 7 |
24
- | 128ep Metadata NN | metadata128_neural_mlp | 7/20 | 13 | 0 | not_supported_by_metadata_only_package: 7, scored: 7, unsupported_without_required_target: 6 |
25
  | 128ep Raw Simple | raw128_simple | 20/20 | 0 | 2 | proxy_scored: 2, scored: 18 |
26
  | 128ep Raw NN | raw128_neural_mlp | 20/20 | 0 | 2 | proxy_scored: 2, scored: 18 |
27
  | Qwen3-Omni v6 LoRA | qwen3_omni_v6_lora | 15/20 | 5 | 0 | not_evaluated_in_verified_package: 5, scored: 15 |
@@ -33,61 +33,45 @@ requires a real task target and source artifact.
33
  | Status | Count | Next step |
34
  | --- | --- | --- |
35
  | not_evaluated_in_verified_package | 33 | Generate verified model outputs for this task contract and score them against the held-out labels. |
36
- | not_supported_by_metadata_only_package | 7 | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
37
- | unsupported_without_required_target | 13 | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
38
 
39
  ## Scoreless Records
40
 
41
  | Task | Task label | Method | Status | Required evidence |
42
  | --- | --- | --- | --- | --- |
43
- | 01 | Action Recognition | 128ep Metadata NN | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
44
- | 02 | Procedure Step Recognition | 128ep Metadata NN | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
45
  | 02 | Procedure Step Recognition | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
46
- | 04 | Next-Action Prediction | 128ep Metadata NN | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
47
- | 05 | Hand Trajectory Forecasting | 128ep Metadata Simple | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
48
- | 05 | Hand Trajectory Forecasting | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
49
  | 05 | Hand Trajectory Forecasting | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
50
  | 05 | Hand Trajectory Forecasting | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
51
  | 05 | Hand Trajectory Forecasting | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
52
  | 07 | Object Relevance Prediction | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
53
  | 08 | Language Grounding | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
54
  | 08 | Language Grounding | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
55
- | 09 | Cross-Modal Retrieval | 128ep Metadata Simple | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
56
- | 09 | Cross-Modal Retrieval | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
57
  | 09 | Cross-Modal Retrieval | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
58
- | 10 | Cross-Modal Reconstruction | 128ep Metadata Simple | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
59
- | 10 | Cross-Modal Reconstruction | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
60
  | 10 | Cross-Modal Reconstruction | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
61
  | 10 | Cross-Modal Reconstruction | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
62
  | 10 | Cross-Modal Reconstruction | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
63
  | 11 | Temporal Order Verification | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
64
  | 11 | Temporal Order Verification | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
65
- | 12 | Multimodal Synchronization Detection | 128ep Metadata Simple | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
66
- | 12 | Multimodal Synchronization Detection | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
67
  | 12 | Multimodal Synchronization Detection | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
68
  | 12 | Multimodal Synchronization Detection | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
69
- | 13 | Long-Horizon Next-Action Forecasting | 128ep Metadata NN | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
70
  | 13 | Long-Horizon Next-Action Forecasting | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
71
  | 13 | Long-Horizon Next-Action Forecasting | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
72
- | 14 | Long-Horizon Next-Subtask Forecasting | 128ep Metadata NN | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
73
  | 14 | Long-Horizon Next-Subtask Forecasting | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
74
  | 14 | Long-Horizon Next-Subtask Forecasting | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
75
- | 15 | Interaction Text Prediction | 128ep Metadata Simple | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
76
- | 15 | Interaction Text Prediction | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
77
  | 15 | Interaction Text Prediction | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
78
  | 15 | Interaction Text Prediction | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
79
  | 15 | Interaction Text Prediction | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
80
- | 16 | Action-Object Relation Prediction | 128ep Metadata NN | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
81
  | 16 | Action-Object Relation Prediction | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
82
  | 17 | Future Object-Set Forecasting | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
83
  | 17 | Future Object-Set Forecasting | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
84
- | 18 | IMU-to-Hand Pose Reconstruction | 128ep Metadata Simple | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
85
- | 18 | IMU-to-Hand Pose Reconstruction | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
86
  | 18 | IMU-to-Hand Pose Reconstruction | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
87
  | 18 | IMU-to-Hand Pose Reconstruction | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
88
  | 18 | IMU-to-Hand Pose Reconstruction | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
89
- | 19 | Camera-View Synchronization Retrieval | 128ep Metadata Simple | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
90
- | 19 | Camera-View Synchronization Retrieval | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
91
  | 19 | Camera-View Synchronization Retrieval | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
92
  | 19 | Camera-View Synchronization Retrieval | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
93
  | 19 | Camera-View Synchronization Retrieval | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 
1
  # Task Method 20-Result Gap Audit
2
 
3
+ Generated: `2026-06-18T12:52:47+00:00`
4
 
5
  This audit is the explicit gap ledger for the 9-method x 20-task result matrix.
6
  It keeps missing cells visible while preserving the rule that a numeric score
 
9
  ## Score Summary
10
 
11
  - Method-task records: `180`
12
+ - Numeric scored records: `143`
13
+ - Scoreless records: `37`
14
  - Proxy-scored records: `4`
15
  - Source matrix: [`docs/data/task_method_20_result_matrix.json`](docs/data/task_method_20_result_matrix.json)
16
 
 
20
  | --- | --- | --- | --- | --- | --- |
21
  | Minimal | minimal | 20/20 | 0 | 0 | scored: 20 |
22
  | Neural MLP | neural_mlp | 20/20 | 0 | 0 | scored: 20 |
23
+ | 128ep Aligned Simple | metadata128_simple | 18/20 | 2 | 0 | scored: 18, unsupported_without_required_target: 2 |
24
+ | 128ep Aligned NN | metadata128_neural_mlp | 18/20 | 2 | 0 | not_supported_by_metadata_only_package: 2, scored: 18 |
25
  | 128ep Raw Simple | raw128_simple | 20/20 | 0 | 2 | proxy_scored: 2, scored: 18 |
26
  | 128ep Raw NN | raw128_neural_mlp | 20/20 | 0 | 2 | proxy_scored: 2, scored: 18 |
27
  | Qwen3-Omni v6 LoRA | qwen3_omni_v6_lora | 15/20 | 5 | 0 | not_evaluated_in_verified_package: 5, scored: 15 |
 
33
  | Status | Count | Next step |
34
  | --- | --- | --- |
35
  | not_evaluated_in_verified_package | 33 | Generate verified model outputs for this task contract and score them against the held-out labels. |
36
+ | not_supported_by_metadata_only_package | 2 | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
37
+ | unsupported_without_required_target | 2 | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
38
 
39
  ## Scoreless Records
40
 
41
  | Task | Task label | Method | Status | Required evidence |
42
  | --- | --- | --- | --- | --- |
 
 
43
  | 02 | Procedure Step Recognition | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 
 
 
44
  | 05 | Hand Trajectory Forecasting | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
45
  | 05 | Hand Trajectory Forecasting | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
46
  | 05 | Hand Trajectory Forecasting | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
47
  | 07 | Object Relevance Prediction | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
48
  | 08 | Language Grounding | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
49
  | 08 | Language Grounding | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 
 
50
  | 09 | Cross-Modal Retrieval | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 
 
51
  | 10 | Cross-Modal Reconstruction | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
52
  | 10 | Cross-Modal Reconstruction | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
53
  | 10 | Cross-Modal Reconstruction | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
54
  | 11 | Temporal Order Verification | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
55
  | 11 | Temporal Order Verification | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 
 
56
  | 12 | Multimodal Synchronization Detection | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
57
  | 12 | Multimodal Synchronization Detection | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 
58
  | 13 | Long-Horizon Next-Action Forecasting | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
59
  | 13 | Long-Horizon Next-Action Forecasting | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 
60
  | 14 | Long-Horizon Next-Subtask Forecasting | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
61
  | 14 | Long-Horizon Next-Subtask Forecasting | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
62
+ | 15 | Interaction Text Prediction | 128ep Aligned Simple | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
63
+ | 15 | Interaction Text Prediction | 128ep Aligned NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
64
  | 15 | Interaction Text Prediction | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
65
  | 15 | Interaction Text Prediction | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
66
  | 15 | Interaction Text Prediction | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 
67
  | 16 | Action-Object Relation Prediction | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
68
  | 17 | Future Object-Set Forecasting | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
69
  | 17 | Future Object-Set Forecasting | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 
 
70
  | 18 | IMU-to-Hand Pose Reconstruction | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
71
  | 18 | IMU-to-Hand Pose Reconstruction | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
72
  | 18 | IMU-to-Hand Pose Reconstruction | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
73
+ | 19 | Camera-View Synchronization Retrieval | 128ep Aligned Simple | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
74
+ | 19 | Camera-View Synchronization Retrieval | 128ep Aligned NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
75
  | 19 | Camera-View Synchronization Retrieval | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
76
  | 19 | Camera-View Synchronization Retrieval | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
77
  | 19 | Camera-View Synchronization Retrieval | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
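
The updated Score Summary and status table in this audit can be cross-checked with simple arithmetic: 9 methods x 20 tasks gives the record count, the three status counts give the scoreless total, and the remainder is the numeric scored total. The sketch below is only an illustration of that check. The counts are copied from the updated audit text, while the dictionary layout and the `summarize` helper are hypothetical and do not reflect the actual schema of `task_method_20_gap_audit.json`.

```python
# Counts copied from the updated gap audit; the data layout here is an
# illustrative assumption, not the schema of the real JSON ledger.
METHODS = 9
TASKS = 20

scoreless_by_status = {
    "not_evaluated_in_verified_package": 33,
    "not_supported_by_metadata_only_package": 2,
    "unsupported_without_required_target": 2,
}


def summarize(methods, tasks, status_counts):
    """Derive record, scoreless, and scored totals from the status counts."""
    records = methods * tasks
    scoreless = sum(status_counts.values())
    return {
        "records": records,
        "scoreless": scoreless,
        "scored": records - scoreless,
    }


summary = summarize(METHODS, TASKS, scoreless_by_status)
print(summary)
```

With the updated counts this yields 180 records, 37 scoreless, and 143 scored, which matches the Score Summary in the new revision.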
TASK_METHOD_20_RESULT_MATRIX.md CHANGED
@@ -8,8 +8,8 @@ Legend: `score` = numeric task score, `proxy` = documented raw128 compact proxy
8
  | --- | ---: | ---: | ---: | ---: | --- |
9
  | Minimal | 20 | 20 | 0 | 0 | scored 20 |
10
  | Neural MLP | 20 | 20 | 0 | 0 | scored 20 |
11
- | 128ep Metadata Simple | 20 | 13 | 0 | 7 | scored 13, unsupported 7 |
12
- | 128ep Metadata NN | 20 | 13 | 0 | 7 | not supported 7, scored 13 |
13
  | 128ep Raw Simple | 20 | 20 | 2 | 0 | proxy scored 2, scored 18 |
14
  | 128ep Raw NN | 20 | 20 | 2 | 0 | proxy scored 2, scored 18 |
15
  | Qwen3-Omni v6 LoRA | 20 | 15 | 0 | 5 | not evaluated 5, scored 15 |
@@ -22,20 +22,20 @@ Legend: `score` = numeric task score, `proxy` = documented raw128 compact proxy
22
  | 02 | Procedure Step Recognition | score | score | score | score | score | score | score | score | not evaluated |
23
  | 03 | Action Boundary Detection | score | score | score | score | score | score | score | score | score |
24
  | 04 | Next-Action Prediction | score | score | score | score | score | score | score | score | score |
25
- | 05 | Hand Trajectory Forecasting | score | score | unsupported | not supported | score | score | not evaluated | not evaluated | not evaluated |
26
  | 06 | Contact State Prediction | score | score | score | score | score | score | score | score | score |
27
  | 07 | Object Relevance Prediction | score | score | score | score | score | score | score | score | not evaluated |
28
  | 08 | Language Grounding | score | score | score | score | score | score | score | not evaluated | not evaluated |
29
- | 09 | Cross-Modal Retrieval | score | score | unsupported | not supported | score | score | score | not evaluated | score |
30
- | 10 | Cross-Modal Reconstruction | score | score | unsupported | not supported | score | score | not evaluated | not evaluated | not evaluated |
31
  | 11 | Temporal Order Verification | score | score | score | score | score | score | score | not evaluated | not evaluated |
32
- | 12 | Multimodal Synchronization Detection | score | score | unsupported | not supported | score | score | score | not evaluated | not evaluated |
33
  | 13 | Long-Horizon Next-Action Forecasting | score | score | score | score | score | score | score | not evaluated | not evaluated |
34
  | 14 | Long-Horizon Next-Subtask Forecasting | score | score | score | score | score | score | score | not evaluated | not evaluated |
35
  | 15 | Interaction Text Prediction | score | score | unsupported | not supported | proxy | proxy | not evaluated | not evaluated | not evaluated |
36
  | 16 | Action-Object Relation Prediction | score | score | score | score | score | score | score | score | not evaluated |
37
  | 17 | Future Object-Set Forecasting | score | score | score | score | score | score | score | not evaluated | not evaluated |
38
- | 18 | IMU-to-Hand Pose Reconstruction | score | score | unsupported | not supported | score | score | not evaluated | not evaluated | not evaluated |
39
  | 19 | Camera-View Synchronization Retrieval | score | score | unsupported | not supported | proxy | proxy | not evaluated | not evaluated | not evaluated |
40
  | 20 | Time-to-Next-Transition Regression | score | score | score | score | score | score | score | not evaluated | not evaluated |
41
 
 
8
  | --- | ---: | ---: | ---: | ---: | --- |
9
  | Minimal | 20 | 20 | 0 | 0 | scored 20 |
10
  | Neural MLP | 20 | 20 | 0 | 0 | scored 20 |
11
+ | 128ep Aligned Simple | 20 | 18 | 0 | 2 | scored 18, unsupported 2 |
12
+ | 128ep Aligned NN | 20 | 18 | 0 | 2 | not supported 2, scored 18 |
13
  | 128ep Raw Simple | 20 | 20 | 2 | 0 | proxy scored 2, scored 18 |
14
  | 128ep Raw NN | 20 | 20 | 2 | 0 | proxy scored 2, scored 18 |
15
  | Qwen3-Omni v6 LoRA | 20 | 15 | 0 | 5 | not evaluated 5, scored 15 |
 
22
  | 02 | Procedure Step Recognition | score | score | score | score | score | score | score | score | not evaluated |
23
  | 03 | Action Boundary Detection | score | score | score | score | score | score | score | score | score |
24
  | 04 | Next-Action Prediction | score | score | score | score | score | score | score | score | score |
25
+ | 05 | Hand Trajectory Forecasting | score | score | score | score | score | score | not evaluated | not evaluated | not evaluated |
26
  | 06 | Contact State Prediction | score | score | score | score | score | score | score | score | score |
27
  | 07 | Object Relevance Prediction | score | score | score | score | score | score | score | score | not evaluated |
28
  | 08 | Language Grounding | score | score | score | score | score | score | score | not evaluated | not evaluated |
29
+ | 09 | Cross-Modal Retrieval | score | score | score | score | score | score | score | not evaluated | score |
30
+ | 10 | Cross-Modal Reconstruction | score | score | score | score | score | score | not evaluated | not evaluated | not evaluated |
31
  | 11 | Temporal Order Verification | score | score | score | score | score | score | score | not evaluated | not evaluated |
32
+ | 12 | Multimodal Synchronization Detection | score | score | score | score | score | score | score | not evaluated | not evaluated |
33
  | 13 | Long-Horizon Next-Action Forecasting | score | score | score | score | score | score | score | not evaluated | not evaluated |
34
  | 14 | Long-Horizon Next-Subtask Forecasting | score | score | score | score | score | score | score | not evaluated | not evaluated |
35
  | 15 | Interaction Text Prediction | score | score | unsupported | not supported | proxy | proxy | not evaluated | not evaluated | not evaluated |
36
  | 16 | Action-Object Relation Prediction | score | score | score | score | score | score | score | score | not evaluated |
37
  | 17 | Future Object-Set Forecasting | score | score | score | score | score | score | score | not evaluated | not evaluated |
38
+ | 18 | IMU-to-Hand Pose Reconstruction | score | score | score | score | score | score | not evaluated | not evaluated | not evaluated |
39
  | 19 | Camera-View Synchronization Retrieval | score | score | unsupported | not supported | proxy | proxy | not evaluated | not evaluated | not evaluated |
40
  | 20 | Time-to-Next-Transition Regression | score | score | score | score | score | score | score | not evaluated | not evaluated |
41
 
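The per-task rows of the result matrix use a fixed cell vocabulary (`score`, `proxy`, `unsupported`, `not supported`, `not evaluated`), so a row can be tallied mechanically. The following is a minimal sketch, assuming the Markdown row layout shown in this diff (task id, task label, then one status cell per method). The helper name `parse_matrix_row` is hypothetical and is not part of the repository's scripts.

```python
from collections import Counter


def parse_matrix_row(line):
    """Split one Markdown table row into task id, task label, and status cells."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    return cells[0], cells[1], cells[2:]


# Row copied verbatim from the matrix in this diff.
row = (
    "| 15 | Interaction Text Prediction | score | score | unsupported | "
    "not supported | proxy | proxy | not evaluated | not evaluated | not evaluated |"
)

task_id, label, statuses = parse_matrix_row(row)
status_counts = Counter(statuses)
print(task_id, label, dict(status_counts))
```

For task 15 this counts nine method cells: two `score`, two `proxy`, one `unsupported`, one `not supported`, and three `not evaluated`.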
assets/charts/episode128_task_model_radar.svg CHANGED
assets/charts/unified_task_model_radar.svg CHANGED
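The artifact index changed in this commit stores a `bytes` size and a `sha256` digest for each artifact, and both values change whenever a tracked file is regenerated. A minimal sketch of how such an entry could be re-checked against a local file is shown below. It uses only the standard library; the function names and the idea of passing the file path separately are assumptions for illustration, since the index's path field is not visible in this diff.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path, chunk_size=1 << 20):
    """Return the byte size and SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    size = 0
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return {"bytes": size, "sha256": digest.hexdigest()}


def matches_index_entry(path, entry):
    """Compare a local file against the `bytes` and `sha256` of an index entry."""
    actual = file_fingerprint(path)
    return actual["bytes"] == entry["bytes"] and actual["sha256"] == entry["sha256"]
```

Reading in chunks keeps memory use flat for the larger JSON artifacts; a mismatch on either field indicates the index entry is stale relative to the file on disk.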
data/artifact_index.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
- "generated_at_utc": "2026-06-18T12:09:24+00:00",
4
  "status": "pass",
5
  "artifact_count": 213,
6
  "missing": [],
@@ -290,8 +290,8 @@
290
  "surface": "repo_hf",
291
  "shows": "Runs simple metadata and neural MLP baselines on the same selected 96/16/16 episode split used by the Qwen3-Omni diagnostic pilot.",
292
  "exists": true,
293
- "bytes": 73236,
294
- "sha256": "76acae0de25d51413e7e6f11021163e7d9909cfe95d65bf6b02e74043d429e2d"
295
  },
296
  {
297
  "id": "task_suite_enhancement_128",
@@ -599,7 +599,7 @@
599
  "shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
600
  "exists": true,
601
  "bytes": 4432,
602
- "sha256": "ae089cc0df132b63365e03b2157a488b5d1569567c0374d7621bcd347da62c9e"
603
  },
604
  {
605
  "id": "source_alignment_validator",
@@ -719,8 +719,8 @@
719
  "surface": "website_hf",
720
  "shows": "Stores normalized 20-axis radar values, raw task metrics, Qwen3/Cosmos overlay mappings, branch-card caveats, and explicit scoreless status records.",
721
  "exists": true,
722
- "bytes": 230297,
723
- "sha256": "437874b1633e73165e3300f55580394663a44759c848288e696859b98f8aad32"
724
  },
725
  {
726
  "id": "single_episode_task_model_radar_json",
@@ -730,8 +730,8 @@
730
  "surface": "website_hf",
731
  "shows": "Machine-readable split radar for the one-episode Minimal and Neural MLP baselines, both scored on all 20 task contracts.",
732
  "exists": true,
733
- "bytes": 50973,
734
- "sha256": "38cb43512f2ac40feeb62333bdea89b3a55e5b48468beb8982cf22536f794ecf"
735
  },
736
  {
737
  "id": "episode128_task_model_radar_json",
@@ -741,8 +741,8 @@
741
  "surface": "website_hf",
742
  "shows": "Machine-readable split radar for selected 128-episode metadata/raw baselines and verified Qwen3/Cosmos branches, preserving explicit scoreless cells.",
743
  "exists": true,
744
- "bytes": 186443,
745
- "sha256": "55e758e8703f406889022976d0ba055181212305c9a7246e899463e0c3c3b554"
746
  },
747
  {
748
  "id": "task_method_20_result_matrix_json",
@@ -752,8 +752,8 @@
752
  "surface": "website_hf",
753
  "shows": "Machine-readable 9-method by 20-task matrix where every method has 20 records and scoreless cells carry unsupported/not-evaluated reasons.",
754
  "exists": true,
755
- "bytes": 129242,
756
- "sha256": "64fb700d51f536edf11291799b6173cf9ae8dd7a41178aac348b8207ed4b1e42"
757
  },
758
  {
759
  "id": "task_method_20_result_matrix",
@@ -763,8 +763,8 @@
763
  "surface": "repo_hf",
764
  "shows": "Reader-facing table that separates 20 records per method from numeric scored axes, documented raw128 proxy scores, unsupported metadata targets, and model targets not evaluated in verified packages.",
765
  "exists": true,
766
- "bytes": 4026,
767
- "sha256": "55e949fc30419a52f7f5ec4dd9544a11b253b076f8e3637ec3e92b3d61a89aab"
768
  },
769
  {
770
  "id": "task_method_20_gap_audit_json",
@@ -774,8 +774,8 @@
774
  "surface": "website_hf",
775
  "shows": "Machine-readable 180-record gap ledger with numeric scores, scoreless cells, explicit status reasons, and next evidence needed before new scores can be published.",
776
  "exists": true,
777
- "bytes": 46902,
778
- "sha256": "2b64dbd013625852679f9b91d25c48d1ed197fec727883b4fe37088b2d594784"
779
  },
780
  {
781
  "id": "task_method_20_gap_audit",
@@ -785,8 +785,8 @@
785
  "surface": "repo_hf",
786
  "shows": "Reader-facing ledger that lists every scoreless method-task cell and the concrete target or model-output evidence required before it can become numeric.",
787
  "exists": true,
788
- "bytes": 13387,
789
- "sha256": "d33461eb704f8e92545b6b54d9fc509e617fbacc9ca9894ac851ca9c3dec0fec"
790
  },
791
  {
792
  "id": "unified_task_model_radar_chart",
@@ -796,8 +796,8 @@
796
  "surface": "website_hf",
797
  "shows": "Compares minimal and neural MLP baselines across all 20 tasks, with Qwen3/Cosmos task-aligned model overlays.",
798
  "exists": true,
799
- "bytes": 51953,
800
- "sha256": "19c001f10319946ef0e4921064f8a012836f29e7c8b272f900c257169faf46a1"
801
  },
802
  {
803
  "id": "single_episode_task_model_radar_chart",
@@ -818,8 +818,8 @@
818
  "surface": "website_hf",
819
  "shows": "Separates the selected 128-episode methods: raw-feature simple/NN as complete 20/20 scored polygons and metadata/Qwen/Cosmos as task-aligned overlays.",
820
  "exists": true,
821
- "bytes": 45937,
822
- "sha256": "b504b1b9c5cad0caa8c822d5bb2971c1b708251cf7b9ef587a92db2c12751e97"
823
  },
824
  {
825
  "id": "unified_task_model_radar_builder",
@@ -829,8 +829,8 @@
829
  "surface": "repo_hf",
830
  "shows": "Regenerates the direction-aware radar chart and machine-readable metric overlay JSON.",
831
  "exists": true,
832
- "bytes": 52388,
833
- "sha256": "f4803360cfd02383a1942a93a5845308db936b479a5b906719e46e192f3ef142"
834
  },
835
  {
836
  "id": "task_method_20_gap_audit_builder",
@@ -906,8 +906,8 @@
906
  "surface": "repo_hf",
907
  "shows": "Rerun of JSONL metadata/text simple and neural baselines over the selected 128-episode multiscale dataset; supports radar overlays on JSONL-supported task axes.",
908
  "exists": true,
909
- "bytes": 109248,
910
- "sha256": "5e7f3085be5012eb3dda46f9c7b5b7c0ae22d6a0fbce71d6e99dd317fecc12af"
911
  },
912
  {
913
  "id": "a100_128_raw20_task_baselines",
@@ -1310,7 +1310,7 @@
1310
  "volatile": true,
1311
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
1312
  "exists": true,
1313
- "bytes": 994053,
1314
  "hash_policy": "existence_and_size_only"
1315
  },
1316
  {
@@ -1620,7 +1620,7 @@
1620
  "shows": "Reader-facing comparison of the single-episode task suite, 128-episode aligned baselines, Qwen3-Omni packages, and Cosmos3 future-window branch.",
1621
  "exists": true,
1622
  "bytes": 15999,
1623
- "sha256": "30053bdea6c417ab02f98d99d8e80cd7e304bc3a9dfacbf599139d3221c02c8f"
1624
  },
1625
  {
1626
  "id": "omni_model_comparison_json",
@@ -1631,7 +1631,7 @@
1631
  "shows": "Machine-readable comparison of the current result versions, per-task aligned baselines, verified Qwen3 packages, and Cosmos3 package.",
1632
  "exists": true,
1633
  "bytes": 81866,
1634
- "sha256": "1c9d4ba370661b0e0cb7104e9a51abdc3fe91a440ae86e748b10b719d1d613cc"
1635
  },
1636
  {
1637
  "id": "cosmos3_nano_verified_summary",
 
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
+ "generated_at_utc": "2026-06-18T12:52:48+00:00",
4
  "status": "pass",
5
  "artifact_count": 213,
6
  "missing": [],
 
290
  "surface": "repo_hf",
291
  "shows": "Runs simple metadata and neural MLP baselines on the same selected 96/16/16 episode split used by the Qwen3-Omni diagnostic pilot.",
292
  "exists": true,
293
+ "bytes": 74368,
294
+ "sha256": "6f54bfb963d5102ebd61eb8f8b6d8f6919db673378c9d5940d89ec5ea6f3d4b2"
295
  },
296
  {
297
  "id": "task_suite_enhancement_128",
 
599
  "shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
600
  "exists": true,
601
  "bytes": 4432,
602
+ "sha256": "8ddadfe15ba8779e82879f965ff50bceb9c573bc942c3ecf176fbf20e5faeaea"
603
  },
604
  {
605
  "id": "source_alignment_validator",
 
719
  "surface": "website_hf",
720
  "shows": "Stores normalized 20-axis radar values, raw task metrics, Qwen3/Cosmos overlay mappings, branch-card caveats, and explicit scoreless status records.",
721
  "exists": true,
722
+ "bytes": 229299,
723
+ "sha256": "30f338139df391c36941da0b759cc237366ee43d006bfff2d2e43481cc2d2a63"
724
  },
725
  {
726
  "id": "single_episode_task_model_radar_json",
 
730
  "surface": "website_hf",
731
  "shows": "Machine-readable split radar for the one-episode Minimal and Neural MLP baselines, both scored on all 20 task contracts.",
732
  "exists": true,
733
+ "bytes": 51064,
734
+ "sha256": "52001c8ac081b14827a8a55cae21da8fd32516f81365d7dda1047ef68096eef8"
735
  },
736
  {
737
  "id": "episode128_task_model_radar_json",
 
741
  "surface": "website_hf",
742
  "shows": "Machine-readable split radar for selected 128-episode metadata/raw baselines and verified Qwen3/Cosmos branches, preserving explicit scoreless cells.",
743
  "exists": true,
744
+ "bytes": 185447,
745
+ "sha256": "e9994f42a1e086411748e1233761c84a8dcd564898c216454a8872c2f4d4f213"
746
  },
747
  {
748
  "id": "task_method_20_result_matrix_json",
 
752
  "surface": "website_hf",
753
  "shows": "Machine-readable 9-method by 20-task matrix where every method has 20 records and scoreless cells carry unsupported/not-evaluated reasons.",
754
  "exists": true,
755
+ "bytes": 128794,
756
+ "sha256": "1bce6001518b314fc8a5e86eab56521aa9718d09d787765d10caee4d791e9809"
757
  },
758
  {
759
  "id": "task_method_20_result_matrix",
 
763
  "surface": "repo_hf",
764
  "shows": "Reader-facing table that separates 20 records per method from numeric scored axes, documented raw128 proxy scores, unsupported metadata targets, and model targets not evaluated in verified packages.",
765
  "exists": true,
766
+ "bytes": 3954,
767
+ "sha256": "01b21d83954f700e4b061e96b1f58c6af474d79a2caaff1bfcff4854b66722ca"
768
  },
769
  {
770
  "id": "task_method_20_gap_audit_json",
 
774
  "surface": "website_hf",
775
  "shows": "Machine-readable 180-record gap ledger with numeric scores, scoreless cells, explicit status reasons, and next evidence needed before new scores can be published.",
776
  "exists": true,
777
+ "bytes": 35883,
778
+ "sha256": "9336756d67d2488a28c4bb9c282f65230031eeb8dddd087a11fd441d8e61539b"
779
  },
780
  {
781
  "id": "task_method_20_gap_audit",
 
785
  "surface": "repo_hf",
786
  "shows": "Reader-facing ledger that lists every scoreless method-task cell and the concrete target or model-output evidence required before it can become numeric.",
787
  "exists": true,
788
+ "bytes": 10286,
789
+ "sha256": "45969b72e9a3ff8c40d958ea819e725fd4df5d90424ccdffd1c64fd1a5152063"
790
  },
791
  {
792
  "id": "unified_task_model_radar_chart",
 
796
  "surface": "website_hf",
797
  "shows": "Compares minimal and neural MLP baselines across all 20 tasks, with Qwen3/Cosmos task-aligned model overlays.",
798
  "exists": true,
799
+ "bytes": 53553,
800
+ "sha256": "ec9a8bf0f5814106ddb8e62d0941c7cc07d1b8a29323a61a400319ffe6bd3485"
801
  },
802
  {
803
  "id": "single_episode_task_model_radar_chart",
 
818
  "surface": "website_hf",
819
  "shows": "Separates the selected 128-episode methods: raw-feature simple/NN as complete 20/20 scored polygons and metadata/Qwen/Cosmos as task-aligned overlays.",
820
  "exists": true,
821
+ "bytes": 47540,
822
+ "sha256": "0c2283a04fe401851b8b313de3ba383d24185262f4c6500d12fa0a3b8c0c4443"
823
  },
824
  {
825
  "id": "unified_task_model_radar_builder",
 
829
  "surface": "repo_hf",
830
  "shows": "Regenerates the direction-aware radar chart and machine-readable metric overlay JSON.",
831
  "exists": true,
832
+ "bytes": 52743,
833
+ "sha256": "e081f88e9f31934b24820c5cbffb957bb235a3275f553e573ab44e5c3d03c99a"
834
  },
835
  {
836
  "id": "task_method_20_gap_audit_builder",
 
906
  "surface": "repo_hf",
907
  "shows": "Rerun of JSONL metadata/text simple and neural baselines over the selected 128-episode multiscale dataset; supports radar overlays on JSONL-supported task axes.",
908
  "exists": true,
909
+ "bytes": 124232,
910
+ "sha256": "dba221a6ed8a6a84602dc21a1055cbb4444c03775f74b55e5d72861941820ac8"
911
  },
912
  {
913
  "id": "a100_128_raw20_task_baselines",
 
1310
  "volatile": true,
1311
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
1312
  "exists": true,
1313
+ "bytes": 1059014,
1314
  "hash_policy": "existence_and_size_only"
1315
  },
1316
  {
 
1620
  "shows": "Reader-facing comparison of the single-episode task suite, 128-episode aligned baselines, Qwen3-Omni packages, and Cosmos3 future-window branch.",
1621
  "exists": true,
1622
  "bytes": 15999,
1623
+ "sha256": "dd65ae9077acbce91870b182d701db367a9c79eb287aeee2a1e165ec4915e5f3"
1624
  },
1625
  {
1626
  "id": "omni_model_comparison_json",
 
1631
  "shows": "Machine-readable comparison of the current result versions, per-task aligned baselines, verified Qwen3 packages, and Cosmos3 package.",
1632
  "exists": true,
1633
  "bytes": 81866,
1634
+ "sha256": "dd7a599117defcc1fd783c3134b6b3fc92f2ec2190ea517624cb215b931bd87a"
1635
  },
1636
  {
1637
  "id": "cosmos3_nano_verified_summary",
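Reviewer note on the index hunks above: each entry pairs `exists`, `bytes`, and usually `sha256`, while the volatile mirror-parity entry carries `"hash_policy": "existence_and_size_only"` with no digest. A minimal sketch of how one entry could be re-checked against a local file follows. The helper name `check_artifact` and the `path` argument are hypothetical; the diff does not show how the index maps an entry to a file path or how the project's own validator works.

```python
import hashlib
import os
import tempfile


def check_artifact(entry: dict, path: str) -> list:
    """Return mismatch messages for one index entry (empty list means OK).

    `entry` uses the field names visible in the diff: exists, bytes,
    sha256, hash_policy. `path` is an assumed local file location.
    """
    problems = []
    if not os.path.isfile(path):
        if entry.get("exists", False):
            problems.append("index says the file exists, but it is missing")
        return problems

    size = os.path.getsize(path)
    if "bytes" in entry and entry["bytes"] != size:
        problems.append(f"bytes: index {entry['bytes']} != actual {size}")

    # Entries with this policy are compared on existence and size only.
    if entry.get("hash_policy") == "existence_and_size_only":
        return problems

    if "sha256" in entry:
        with open(path, "rb") as fh:
            digest = hashlib.sha256(fh.read()).hexdigest()
        if digest != entry["sha256"]:
            problems.append("sha256: digest does not match the index")
    return problems


if __name__ == "__main__":
    # Self-contained demo on a throwaway file, not on a real artifact.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"radar")
        demo_path = tmp.name
    demo_entry = {
        "exists": True,
        "bytes": 5,
        "sha256": hashlib.sha256(b"radar").hexdigest(),
    }
    print(check_artifact(demo_entry, demo_path))
    os.remove(demo_path)
```

Under this reading, an entry whose file was regenerated without refreshing the index shows up as a `bytes` or `sha256` mismatch, which is the kind of drift the digest updates in this commit resolve.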
data/episode128_task_model_radar.json CHANGED
@@ -1,19 +1,19 @@
1
  {
2
  "title": "128-Episode 20-Task Radar",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "description": "Selected 128-episode metadata/raw baselines plus verified Qwen3/Cosmos branches. Every method has 20 records; numeric scores appear only where the public artifact produced that task target.",
6
  "task_count": 20,
7
  "method_count": 7,
8
  "method_task_record_count": 140,
9
- "scored_method_task_count": 93,
10
  "normalization_policy": {
11
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
12
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
13
  "raw_values": "raw metric values, metric keys, and sources are retained in this JSON; the SVG is an overview, not a replacement for the metric table",
14
  "result_record_policy": "every method has 20 task records; records without a numeric score carry explicit unsupported/not-evaluated status and reason fields",
15
  "foundation_model_overlay": "Qwen3/Cosmos points are plotted only on task-aligned axes. Scoreless records mean the public result does not evaluate that task contract.",
16
- "metadata_128_overlay": "128-episode metadata baselines have 20 records, but numeric scores only where the public JSONL contains enough task labels without raw feature blocks.",
17
  "raw_128_overlay": "128-episode raw-feature baselines use staged sensor NPZ features. Eighteen axes use direct task targets; interaction text and camera-view sync are completed with documented compact proxies because raw interaction strings and paired video-view embeddings are absent from the 128 export."
18
  },
19
  "source_unified_radar": "docs/data/unified_task_model_radar.json",
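Reviewer note on `normalization_policy` in the hunk above: the two direction rules can be sketched as below. This is an illustration of the stated policy, not the builder's actual code. The function name `normalize` and its parameters are hypothetical, and the best observed error for `time_to_transition` is not shown in this diff, so the demo infers it from the neural MLP record (raw `mae` 41.4664421081543, `normalized_score` 0.25411768748242325).

```python
import math
from typing import Optional


def normalize(raw: Optional[float], direction: str,
              best_observed: Optional[float] = None) -> Optional[float]:
    """Map a raw metric to a radar axis value under the stated policy.

    higher_is_better: clip the bounded metric to [0, 1].
    lower_is_better:  best_observed / raw within the same task.
    Scoreless records (raw is None) stay None.
    """
    if raw is None:
        return None
    if direction == "higher_is_better":
        return min(1.0, max(0.0, raw))
    if direction == "lower_is_better":
        if best_observed is None or raw <= 0:
            raise ValueError("lower_is_better needs best_observed and raw > 0")
        return best_observed / raw
    raise ValueError(f"unknown direction: {direction}")


if __name__ == "__main__":
    # macro_f1 is bounded, so an in-range value passes through unchanged.
    print(normalize(0.29652162550029315, "higher_is_better"))

    # Assumption: best observed MAE inferred from the neural MLP record.
    best_mae = 0.25411768748242325 * 41.4664421081543
    simple = normalize(624.8108520507812, "lower_is_better", best_mae)
    # The simple baseline's published normalized_score is 0.016864874132806403.
    print(math.isclose(simple, 0.016864874132806403, rel_tol=1e-4))
```

The two `time_to_transition` records are consistent with a single shared best value, which is what the `lower_is_better` rule implies: the ratio of the normalized scores equals the inverse ratio of the raw errors.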
@@ -21,50 +21,50 @@
21
  "series": [
22
  {
23
  "id": "metadata128_simple",
24
- "label": "128ep Metadata Simple",
25
  "short_label": "128-S",
26
  "color": "#ffd166",
27
- "kind": "partial_128_episode_metadata_baseline",
28
- "scope": "128 selected episodes, JSONL metadata/text only",
29
  "stroke_dasharray": "9 6",
30
- "method_detail": "128-episode JSONL metadata/text simple baselines.",
31
  "plotted_as": "colored point overlay",
32
  "result_record_count": 20,
33
- "scored_task_count": 13,
34
- "covered_task_count": 13,
35
  "proxy_scored_task_count": 0,
36
- "scoreless_task_count": 7,
37
- "unsupported_task_count": 7,
38
  "not_evaluated_task_count": 0,
39
  "status_counts": {
40
- "scored": 13,
41
- "unsupported_without_required_target": 7
42
  },
43
- "coverage_fraction": 0.65,
44
  "result_record_fraction": 1.0
45
  },
46
  {
47
  "id": "metadata128_neural_mlp",
48
- "label": "128ep Metadata NN",
49
  "short_label": "128-NN",
50
  "color": "#f472b6",
51
- "kind": "partial_128_episode_metadata_baseline",
52
- "scope": "128 selected episodes, JSONL metadata/text only",
53
  "stroke_dasharray": "3 6",
54
- "method_detail": "128-episode JSONL metadata/text MLP baselines.",
55
  "plotted_as": "colored point overlay",
56
  "result_record_count": 20,
57
- "scored_task_count": 13,
58
- "covered_task_count": 13,
59
  "proxy_scored_task_count": 0,
60
- "scoreless_task_count": 7,
61
- "unsupported_task_count": 7,
62
  "not_evaluated_task_count": 0,
63
  "status_counts": {
64
- "not_supported_by_metadata_only_package": 7,
65
- "scored": 13
66
  },
67
- "coverage_fraction": 0.65,
68
  "result_record_fraction": 1.0
69
  },
70
  {
@@ -205,7 +205,7 @@
205
  "raw": 0.008252821966746326,
206
  "metric_key": "macro_f1",
207
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
208
- "scope": "multi_episode_128_metadata_baseline",
209
  "status": "scored",
210
  "reason": null,
211
  "normalized_score": 0.008252821966746326,
@@ -216,7 +216,7 @@
216
  "raw": 0.004175793689174209,
217
  "metric_key": "macro_f1",
218
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
219
- "scope": "multi_episode_128_metadata_baseline",
220
  "status": "scored",
221
  "reason": null,
222
  "normalized_score": 0.004175793689174209,
@@ -296,7 +296,7 @@
296
  "raw": 0.00019512195121951218,
297
  "metric_key": "macro_f1",
298
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
299
- "scope": "multi_episode_128_metadata_baseline",
300
  "status": "scored",
301
  "reason": null,
302
  "normalized_score": 0.00019512195121951218,
@@ -307,7 +307,7 @@
307
  "raw": 7.207207207207208e-05,
308
  "metric_key": "macro_f1",
309
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
310
- "scope": "multi_episode_128_metadata_baseline",
311
  "status": "scored",
312
  "reason": null,
313
  "normalized_score": 7.207207207207208e-05,
@@ -387,7 +387,7 @@
387
  "raw": 0.29652162550029315,
388
  "metric_key": "macro_f1",
389
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
390
- "scope": "multi_episode_128_metadata_baseline",
391
  "status": "scored",
392
  "reason": null,
393
  "normalized_score": 0.29652162550029315,
@@ -398,7 +398,7 @@
398
  "raw": 0.4841733292368365,
399
  "metric_key": "macro_f1",
400
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
401
- "scope": "multi_episode_128_metadata_baseline",
402
  "status": "scored",
403
  "reason": null,
404
  "normalized_score": 0.4841733292368365,
@@ -478,7 +478,7 @@
478
  "raw": 0.006514774539765508,
479
  "metric_key": "macro_f1",
480
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
481
- "scope": "multi_episode_128_metadata_baseline",
482
  "status": "scored",
483
  "reason": null,
484
  "normalized_score": 0.006514774539765508,
@@ -489,7 +489,7 @@
489
  "raw": 0.004910507980164745,
490
  "metric_key": "macro_f1",
491
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
492
- "scope": "multi_episode_128_metadata_baseline",
493
  "status": "scored",
494
  "reason": null,
495
  "normalized_score": 0.004910507980164745,
@@ -566,26 +566,26 @@
566
  "raw128_proxy_axis": false,
567
  "values": {
568
  "metadata128_simple": {
569
- "raw": null,
570
  "metric_key": "mpjpe",
571
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
572
- "scope": "multi_episode_128_metadata_baseline",
573
- "status": "unsupported_without_required_target",
574
- "reason": "requires future hand-joint trajectories from raw sensor feature NPZ blocks, which are not in the public 128 package",
575
- "normalized_score": null,
576
- "raw_text": "n/a",
577
- "status_label": "unsupported"
578
  },
579
  "metadata128_neural_mlp": {
580
- "raw": null,
581
  "metric_key": "mpjpe",
582
- "source": null,
583
- "scope": "multi_episode_128_metadata_baseline",
584
- "status": "not_supported_by_metadata_only_package",
585
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
586
- "normalized_score": null,
587
- "raw_text": "n/a",
588
- "status_label": "not supported"
589
  },
590
  "raw128_simple": {
591
  "raw": 0.2729249894618988,
@@ -660,7 +660,7 @@
660
  "raw": 0.4381481308057444,
661
  "metric_key": "macro_f1",
662
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
663
- "scope": "multi_episode_128_metadata_baseline",
664
  "status": "scored",
665
  "reason": null,
666
  "normalized_score": 0.4381481308057444,
@@ -671,7 +671,7 @@
671
  "raw": 0.5682695682695682,
672
  "metric_key": "macro_f1",
673
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
674
- "scope": "multi_episode_128_metadata_baseline",
675
  "status": "scored",
676
  "reason": null,
677
  "normalized_score": 0.5682695682695682,
@@ -751,7 +751,7 @@
751
  "raw": 0.17764578833693304,
752
  "metric_key": "micro_f1",
753
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
754
- "scope": "multi_episode_128_metadata_baseline",
755
  "status": "scored",
756
  "reason": null,
757
  "normalized_score": 0.17764578833693304,
@@ -762,7 +762,7 @@
762
  "raw": 0.18662723837686876,
763
  "metric_key": "micro_f1",
764
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
765
- "scope": "multi_episode_128_metadata_baseline",
766
  "status": "scored",
767
  "reason": null,
768
  "normalized_score": 0.18662723837686876,
@@ -842,7 +842,7 @@
842
  "raw": 0.002332374220713973,
843
  "metric_key": "mrr",
844
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
845
- "scope": "multi_episode_128_metadata_baseline",
846
  "status": "scored",
847
  "reason": null,
848
  "normalized_score": 0.002332374220713973,
@@ -853,7 +853,7 @@
853
  "raw": 0.008236799389123917,
854
  "metric_key": "mrr",
855
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
856
- "scope": "multi_episode_128_metadata_baseline",
857
  "status": "scored",
858
  "reason": null,
859
  "normalized_score": 0.008236799389123917,
@@ -930,26 +930,26 @@
930
  "raw128_proxy_axis": false,
931
  "values": {
932
  "metadata128_simple": {
933
- "raw": null,
934
  "metric_key": "mrr",
935
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
936
- "scope": "multi_episode_128_metadata_baseline",
937
- "status": "unsupported_without_required_target",
938
- "reason": "requires paired motion/IMU/camera/audio/depth feature blocks, which are not in the public 128 package",
939
- "normalized_score": null,
940
- "raw_text": "n/a",
941
- "status_label": "unsupported"
942
  },
943
  "metadata128_neural_mlp": {
944
- "raw": null,
945
  "metric_key": "mrr",
946
- "source": null,
947
- "scope": "multi_episode_128_metadata_baseline",
948
- "status": "not_supported_by_metadata_only_package",
949
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
950
- "normalized_score": null,
951
- "raw_text": "n/a",
952
- "status_label": "not supported"
953
  },
954
  "raw128_simple": {
955
  "raw": 0.003459817497059703,
@@ -1021,26 +1021,26 @@
1021
  "raw128_proxy_axis": false,
1022
  "values": {
1023
  "metadata128_simple": {
1024
- "raw": null,
1025
  "metric_key": "r2",
1026
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
1027
- "scope": "multi_episode_128_metadata_baseline",
1028
- "status": "unsupported_without_required_target",
1029
- "reason": "requires source and target modality feature blocks such as depth/video vectors, which are not in the public 128 package",
1030
- "normalized_score": null,
1031
- "raw_text": "n/a",
1032
- "status_label": "unsupported"
1033
  },
1034
  "metadata128_neural_mlp": {
1035
- "raw": null,
1036
  "metric_key": "r2",
1037
- "source": null,
1038
- "scope": "multi_episode_128_metadata_baseline",
1039
- "status": "not_supported_by_metadata_only_package",
1040
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1041
- "normalized_score": null,
1042
- "raw_text": "n/a",
1043
- "status_label": "not supported"
1044
  },
1045
  "raw128_simple": {
1046
  "raw": -1.3450960391924882,
@@ -1115,7 +1115,7 @@
1115
  "raw": 0.4198864140782312,
1116
  "metric_key": "f1",
1117
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
1118
- "scope": "multi_episode_128_metadata_baseline",
1119
  "status": "scored",
1120
  "reason": null,
1121
  "normalized_score": 0.4198864140782312,
@@ -1126,7 +1126,7 @@
1126
  "raw": 0.8252408266656923,
1127
  "metric_key": "f1",
1128
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
1129
- "scope": "multi_episode_128_metadata_baseline",
1130
  "status": "scored",
1131
  "reason": null,
1132
  "normalized_score": 0.8252408266656923,
@@ -1203,26 +1203,26 @@
1203
  "raw128_proxy_axis": false,
1204
  "values": {
1205
  "metadata128_simple": {
1206
- "raw": null,
1207
  "metric_key": "f1",
1208
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
1209
- "scope": "multi_episode_128_metadata_baseline",
1210
- "status": "unsupported_without_required_target",
1211
- "reason": "requires deliberately shifted cross-modal feature pairs, which cannot be reconstructed from the public JSONL labels alone",
1212
- "normalized_score": null,
1213
- "raw_text": "n/a",
1214
- "status_label": "unsupported"
1215
  },
1216
  "metadata128_neural_mlp": {
1217
- "raw": null,
1218
  "metric_key": "f1",
1219
- "source": null,
1220
- "scope": "multi_episode_128_metadata_baseline",
1221
- "status": "not_supported_by_metadata_only_package",
1222
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1223
- "normalized_score": null,
1224
- "raw_text": "n/a",
1225
- "status_label": "not supported"
1226
  },
1227
  "raw128_simple": {
1228
  "raw": 0.4958867673901769,
@@ -1297,7 +1297,7 @@
1297
  "raw": 0.004579592783699693,
1298
  "metric_key": "macro_f1",
1299
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
1300
- "scope": "multi_episode_128_metadata_baseline",
1301
  "status": "scored",
1302
  "reason": null,
1303
  "normalized_score": 0.004579592783699693,
@@ -1308,7 +1308,7 @@
1308
  "raw": 0.0029821307969142615,
1309
  "metric_key": "macro_f1",
1310
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
1311
- "scope": "multi_episode_128_metadata_baseline",
1312
  "status": "scored",
1313
  "reason": null,
1314
  "normalized_score": 0.0029821307969142615,
@@ -1388,7 +1388,7 @@
1388
  "raw": 0.0001206030150753769,
1389
  "metric_key": "macro_f1",
1390
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
1391
- "scope": "multi_episode_128_metadata_baseline",
1392
  "status": "scored",
1393
  "reason": null,
1394
  "normalized_score": 0.0001206030150753769,
@@ -1399,7 +1399,7 @@
1399
  "raw": 2.086049543676662e-05,
1400
  "metric_key": "macro_f1",
1401
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
1402
- "scope": "multi_episode_128_metadata_baseline",
1403
  "status": "scored",
1404
  "reason": null,
1405
  "normalized_score": 2.086049543676662e-05,
@@ -1479,7 +1479,7 @@
1479
  "raw": null,
1480
  "metric_key": "macro_f1",
1481
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
1482
- "scope": "multi_episode_128_metadata_baseline",
1483
  "status": "unsupported_without_required_target",
1484
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
1485
  "normalized_score": null,
@@ -1490,9 +1490,9 @@
1490
  "raw": null,
1491
  "metric_key": "macro_f1",
1492
  "source": null,
1493
- "scope": "multi_episode_128_metadata_baseline",
1494
  "status": "not_supported_by_metadata_only_package",
1495
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1496
  "normalized_score": null,
1497
  "raw_text": "n/a",
1498
  "status_label": "not supported"
@@ -1570,7 +1570,7 @@
1570
  "raw": 0.0,
1571
  "metric_key": "macro_f1",
1572
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
1573
- "scope": "multi_episode_128_metadata_baseline",
1574
  "status": "scored",
1575
  "reason": null,
1576
  "normalized_score": 0.0,
@@ -1581,7 +1581,7 @@
1581
  "raw": 0.0,
1582
  "metric_key": "macro_f1",
1583
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
1584
- "scope": "multi_episode_128_metadata_baseline",
1585
  "status": "scored",
1586
  "reason": null,
1587
  "normalized_score": 0.0,
@@ -1661,7 +1661,7 @@
1661
  "raw": 0.17656983343047333,
1662
  "metric_key": "micro_f1",
1663
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
1664
- "scope": "multi_episode_128_metadata_baseline",
1665
  "status": "scored",
1666
  "reason": null,
1667
  "normalized_score": 0.17656983343047333,
@@ -1672,7 +1672,7 @@
1672
  "raw": 0.17418550827844048,
1673
  "metric_key": "micro_f1",
1674
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
1675
- "scope": "multi_episode_128_metadata_baseline",
1676
  "status": "scored",
1677
  "reason": null,
1678
  "normalized_score": 0.17418550827844048,
@@ -1749,26 +1749,26 @@
1749
  "raw128_proxy_axis": false,
1750
  "values": {
1751
  "metadata128_simple": {
1752
- "raw": null,
1753
  "metric_key": "mae",
1754
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
1755
- "scope": "multi_episode_128_metadata_baseline",
1756
- "status": "unsupported_without_required_target",
1757
- "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package",
1758
- "normalized_score": null,
1759
- "raw_text": "n/a",
1760
- "status_label": "unsupported"
1761
  },
1762
  "metadata128_neural_mlp": {
1763
- "raw": null,
1764
  "metric_key": "mae",
1765
- "source": null,
1766
- "scope": "multi_episode_128_metadata_baseline",
1767
- "status": "not_supported_by_metadata_only_package",
1768
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1769
- "normalized_score": null,
1770
- "raw_text": "n/a",
1771
- "status_label": "not supported"
1772
  },
1773
  "raw128_simple": {
1774
  "raw": 0.22941437363624573,
@@ -1843,7 +1843,7 @@
1843
  "raw": null,
1844
  "metric_key": "mrr",
1845
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
1846
- "scope": "multi_episode_128_metadata_baseline",
1847
  "status": "unsupported_without_required_target",
1848
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
1849
  "normalized_score": null,
@@ -1854,9 +1854,9 @@
1854
  "raw": null,
1855
  "metric_key": "mrr",
1856
  "source": null,
1857
- "scope": "multi_episode_128_metadata_baseline",
1858
  "status": "not_supported_by_metadata_only_package",
1859
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1860
  "normalized_score": null,
1861
  "raw_text": "n/a",
1862
  "status_label": "not supported"
@@ -1934,7 +1934,7 @@
1934
  "raw": 624.8108520507812,
1935
  "metric_key": "mae",
1936
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
1937
- "scope": "multi_episode_128_metadata_baseline",
1938
  "status": "scored",
1939
  "reason": null,
1940
  "normalized_score": 0.016864874132806403,
@@ -1945,7 +1945,7 @@
1945
  "raw": 41.4664421081543,
1946
  "metric_key": "mae",
1947
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
1948
- "scope": "multi_episode_128_metadata_baseline",
1949
  "status": "scored",
1950
  "reason": null,
1951
  "normalized_score": 0.25411768748242325,
@@ -2016,7 +2016,7 @@
2016
  "task_id": "timeline_action",
2017
  "task_label": "Action Recognition",
2018
  "series_id": "metadata128_simple",
2019
- "method": "128ep Metadata Simple",
2020
  "status": "scored",
2021
  "status_label": "scored",
2022
  "scored": true,
@@ -2026,7 +2026,7 @@
2026
  "normalized_score": 0.008252821966746326,
2027
  "metric_key": "macro_f1",
2028
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
2029
- "scope": "multi_episode_128_metadata_baseline",
2030
  "reason": null
2031
  },
2032
  {
@@ -2034,7 +2034,7 @@
2034
  "task_id": "timeline_action",
2035
  "task_label": "Action Recognition",
2036
  "series_id": "metadata128_neural_mlp",
2037
- "method": "128ep Metadata NN",
2038
  "status": "scored",
2039
  "status_label": "scored",
2040
  "scored": true,
@@ -2044,7 +2044,7 @@
2044
  "normalized_score": 0.004175793689174209,
2045
  "metric_key": "macro_f1",
2046
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
2047
- "scope": "multi_episode_128_metadata_baseline",
2048
  "reason": null
2049
  },
2050
  {
@@ -2142,7 +2142,7 @@
2142
  "task_id": "timeline_subtask",
2143
  "task_label": "Procedure Step Recognition",
2144
  "series_id": "metadata128_simple",
2145
- "method": "128ep Metadata Simple",
2146
  "status": "scored",
2147
  "status_label": "scored",
2148
  "scored": true,
@@ -2152,7 +2152,7 @@
2152
  "normalized_score": 0.00019512195121951218,
2153
  "metric_key": "macro_f1",
2154
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
2155
- "scope": "multi_episode_128_metadata_baseline",
2156
  "reason": null
2157
  },
2158
  {
@@ -2160,7 +2160,7 @@
2160
  "task_id": "timeline_subtask",
2161
  "task_label": "Procedure Step Recognition",
2162
  "series_id": "metadata128_neural_mlp",
2163
- "method": "128ep Metadata NN",
2164
  "status": "scored",
2165
  "status_label": "scored",
2166
  "scored": true,
@@ -2170,7 +2170,7 @@
2170
  "normalized_score": 7.207207207207208e-05,
2171
  "metric_key": "macro_f1",
2172
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
2173
- "scope": "multi_episode_128_metadata_baseline",
2174
  "reason": null
2175
  },
2176
  {
@@ -2268,7 +2268,7 @@
2268
  "task_id": "transition_detection",
2269
  "task_label": "Action Boundary Detection",
2270
  "series_id": "metadata128_simple",
2271
- "method": "128ep Metadata Simple",
2272
  "status": "scored",
2273
  "status_label": "scored",
2274
  "scored": true,
@@ -2278,7 +2278,7 @@
2278
  "normalized_score": 0.29652162550029315,
2279
  "metric_key": "macro_f1",
2280
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
2281
- "scope": "multi_episode_128_metadata_baseline",
2282
  "reason": null
2283
  },
2284
  {
@@ -2286,7 +2286,7 @@
2286
  "task_id": "transition_detection",
2287
  "task_label": "Action Boundary Detection",
2288
  "series_id": "metadata128_neural_mlp",
2289
- "method": "128ep Metadata NN",
2290
  "status": "scored",
2291
  "status_label": "scored",
2292
  "scored": true,
@@ -2296,7 +2296,7 @@
2296
  "normalized_score": 0.4841733292368365,
2297
  "metric_key": "macro_f1",
2298
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
2299
- "scope": "multi_episode_128_metadata_baseline",
2300
  "reason": null
2301
  },
2302
  {
@@ -2394,7 +2394,7 @@
2394
  "task_id": "next_action",
2395
  "task_label": "Next-Action Prediction",
2396
  "series_id": "metadata128_simple",
2397
- "method": "128ep Metadata Simple",
2398
  "status": "scored",
2399
  "status_label": "scored",
2400
  "scored": true,
@@ -2404,7 +2404,7 @@
2404
  "normalized_score": 0.006514774539765508,
2405
  "metric_key": "macro_f1",
2406
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
2407
- "scope": "multi_episode_128_metadata_baseline",
2408
  "reason": null
2409
  },
2410
  {
@@ -2412,7 +2412,7 @@
2412
  "task_id": "next_action",
2413
  "task_label": "Next-Action Prediction",
2414
  "series_id": "metadata128_neural_mlp",
2415
- "method": "128ep Metadata NN",
2416
  "status": "scored",
2417
  "status_label": "scored",
2418
  "scored": true,
@@ -2422,7 +2422,7 @@
2422
  "normalized_score": 0.004910507980164745,
2423
  "metric_key": "macro_f1",
2424
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
2425
- "scope": "multi_episode_128_metadata_baseline",
2426
  "reason": null
2427
  },
2428
  {
@@ -2520,36 +2520,36 @@
2520
  "task_id": "hand_trajectory_forecast",
2521
  "task_label": "Hand Trajectory Forecasting",
2522
  "series_id": "metadata128_simple",
2523
- "method": "128ep Metadata Simple",
2524
- "status": "unsupported_without_required_target",
2525
- "status_label": "unsupported",
2526
- "scored": false,
2527
  "proxy_scored": false,
2528
- "raw": null,
2529
- "raw_text": "n/a",
2530
- "normalized_score": null,
2531
  "metric_key": "mpjpe",
2532
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
2533
- "scope": "multi_episode_128_metadata_baseline",
2534
- "reason": "requires future hand-joint trajectories from raw sensor feature NPZ blocks, which are not in the public 128 package"
2535
  },
2536
  {
2537
  "task_number": 5,
2538
  "task_id": "hand_trajectory_forecast",
2539
  "task_label": "Hand Trajectory Forecasting",
2540
  "series_id": "metadata128_neural_mlp",
2541
- "method": "128ep Metadata NN",
2542
- "status": "not_supported_by_metadata_only_package",
2543
- "status_label": "not supported",
2544
- "scored": false,
2545
  "proxy_scored": false,
2546
- "raw": null,
2547
- "raw_text": "n/a",
2548
- "normalized_score": null,
2549
  "metric_key": "mpjpe",
2550
- "source": null,
2551
- "scope": "multi_episode_128_metadata_baseline",
2552
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2553
  },
2554
  {
2555
  "task_number": 5,
@@ -2646,7 +2646,7 @@
2646
  "task_id": "contact_prediction",
2647
  "task_label": "Contact State Prediction",
2648
  "series_id": "metadata128_simple",
2649
- "method": "128ep Metadata Simple",
2650
  "status": "scored",
2651
  "status_label": "scored",
2652
  "scored": true,
@@ -2656,7 +2656,7 @@
2656
  "normalized_score": 0.4381481308057444,
2657
  "metric_key": "macro_f1",
2658
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
2659
- "scope": "multi_episode_128_metadata_baseline",
2660
  "reason": null
2661
  },
2662
  {
@@ -2664,7 +2664,7 @@
2664
  "task_id": "contact_prediction",
2665
  "task_label": "Contact State Prediction",
2666
  "series_id": "metadata128_neural_mlp",
2667
- "method": "128ep Metadata NN",
2668
  "status": "scored",
2669
  "status_label": "scored",
2670
  "scored": true,
@@ -2674,7 +2674,7 @@
2674
  "normalized_score": 0.5682695682695682,
2675
  "metric_key": "macro_f1",
2676
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
2677
- "scope": "multi_episode_128_metadata_baseline",
2678
  "reason": null
2679
  },
2680
  {
@@ -2772,7 +2772,7 @@
2772
  "task_id": "object_relevance",
2773
  "task_label": "Object Relevance Prediction",
2774
  "series_id": "metadata128_simple",
2775
- "method": "128ep Metadata Simple",
2776
  "status": "scored",
2777
  "status_label": "scored",
2778
  "scored": true,
@@ -2782,7 +2782,7 @@
2782
  "normalized_score": 0.17764578833693304,
2783
  "metric_key": "micro_f1",
2784
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
2785
- "scope": "multi_episode_128_metadata_baseline",
2786
  "reason": null
2787
  },
2788
  {
@@ -2790,7 +2790,7 @@
2790
  "task_id": "object_relevance",
2791
  "task_label": "Object Relevance Prediction",
2792
  "series_id": "metadata128_neural_mlp",
2793
- "method": "128ep Metadata NN",
2794
  "status": "scored",
2795
  "status_label": "scored",
2796
  "scored": true,
@@ -2800,7 +2800,7 @@
2800
  "normalized_score": 0.18662723837686876,
2801
  "metric_key": "micro_f1",
2802
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
2803
- "scope": "multi_episode_128_metadata_baseline",
2804
  "reason": null
2805
  },
2806
  {
@@ -2898,7 +2898,7 @@
2898
  "task_id": "caption_grounding",
2899
  "task_label": "Language Grounding",
2900
  "series_id": "metadata128_simple",
2901
- "method": "128ep Metadata Simple",
2902
  "status": "scored",
2903
  "status_label": "scored",
2904
  "scored": true,
@@ -2908,7 +2908,7 @@
2908
  "normalized_score": 0.002332374220713973,
2909
  "metric_key": "mrr",
2910
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
2911
- "scope": "multi_episode_128_metadata_baseline",
2912
  "reason": null
2913
  },
2914
  {
@@ -2916,7 +2916,7 @@
2916
  "task_id": "caption_grounding",
2917
  "task_label": "Language Grounding",
2918
  "series_id": "metadata128_neural_mlp",
2919
- "method": "128ep Metadata NN",
2920
  "status": "scored",
2921
  "status_label": "scored",
2922
  "scored": true,
@@ -2926,7 +2926,7 @@
2926
  "normalized_score": 0.008236799389123917,
2927
  "metric_key": "mrr",
2928
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
2929
- "scope": "multi_episode_128_metadata_baseline",
2930
  "reason": null
2931
  },
2932
  {
@@ -3024,36 +3024,36 @@
3024
  "task_id": "cross_modal_retrieval",
3025
  "task_label": "Cross-Modal Retrieval",
3026
  "series_id": "metadata128_simple",
3027
- "method": "128ep Metadata Simple",
3028
- "status": "unsupported_without_required_target",
3029
- "status_label": "unsupported",
3030
- "scored": false,
3031
  "proxy_scored": false,
3032
- "raw": null,
3033
- "raw_text": "n/a",
3034
- "normalized_score": null,
3035
  "metric_key": "mrr",
3036
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
3037
- "scope": "multi_episode_128_metadata_baseline",
3038
- "reason": "requires paired motion/IMU/camera/audio/depth feature blocks, which are not in the public 128 package"
3039
  },
3040
  {
3041
  "task_number": 9,
3042
  "task_id": "cross_modal_retrieval",
3043
  "task_label": "Cross-Modal Retrieval",
3044
  "series_id": "metadata128_neural_mlp",
3045
- "method": "128ep Metadata NN",
3046
- "status": "not_supported_by_metadata_only_package",
3047
- "status_label": "not supported",
3048
- "scored": false,
3049
  "proxy_scored": false,
3050
- "raw": null,
3051
- "raw_text": "n/a",
3052
- "normalized_score": null,
3053
  "metric_key": "mrr",
3054
- "source": null,
3055
- "scope": "multi_episode_128_metadata_baseline",
3056
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3057
  },
3058
  {
3059
  "task_number": 9,
@@ -3150,36 +3150,36 @@
3150
  "task_id": "modality_reconstruction",
3151
  "task_label": "Cross-Modal Reconstruction",
3152
  "series_id": "metadata128_simple",
3153
- "method": "128ep Metadata Simple",
3154
- "status": "unsupported_without_required_target",
3155
- "status_label": "unsupported",
3156
- "scored": false,
3157
  "proxy_scored": false,
3158
- "raw": null,
3159
- "raw_text": "n/a",
3160
- "normalized_score": null,
3161
  "metric_key": "r2",
3162
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
3163
- "scope": "multi_episode_128_metadata_baseline",
3164
- "reason": "requires source and target modality feature blocks such as depth/video vectors, which are not in the public 128 package"
3165
  },
3166
  {
3167
  "task_number": 10,
3168
  "task_id": "modality_reconstruction",
3169
  "task_label": "Cross-Modal Reconstruction",
3170
  "series_id": "metadata128_neural_mlp",
3171
- "method": "128ep Metadata NN",
3172
- "status": "not_supported_by_metadata_only_package",
3173
- "status_label": "not supported",
3174
- "scored": false,
3175
  "proxy_scored": false,
3176
- "raw": null,
3177
- "raw_text": "n/a",
3178
- "normalized_score": null,
3179
  "metric_key": "r2",
3180
- "source": null,
3181
- "scope": "multi_episode_128_metadata_baseline",
3182
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3183
  },
3184
  {
3185
  "task_number": 10,
@@ -3276,7 +3276,7 @@
3276
  "task_id": "temporal_order",
3277
  "task_label": "Temporal Order Verification",
3278
  "series_id": "metadata128_simple",
3279
- "method": "128ep Metadata Simple",
3280
  "status": "scored",
3281
  "status_label": "scored",
3282
  "scored": true,
@@ -3286,7 +3286,7 @@
3286
  "normalized_score": 0.4198864140782312,
3287
  "metric_key": "f1",
3288
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
3289
- "scope": "multi_episode_128_metadata_baseline",
3290
  "reason": null
3291
  },
3292
  {
@@ -3294,7 +3294,7 @@
3294
  "task_id": "temporal_order",
3295
  "task_label": "Temporal Order Verification",
3296
  "series_id": "metadata128_neural_mlp",
3297
- "method": "128ep Metadata NN",
3298
  "status": "scored",
3299
  "status_label": "scored",
3300
  "scored": true,
@@ -3304,7 +3304,7 @@
3304
  "normalized_score": 0.8252408266656923,
3305
  "metric_key": "f1",
3306
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
3307
- "scope": "multi_episode_128_metadata_baseline",
3308
  "reason": null
3309
  },
3310
  {
@@ -3402,36 +3402,36 @@
3402
  "task_id": "misalignment_detection",
3403
  "task_label": "Multimodal Synchronization Detection",
3404
  "series_id": "metadata128_simple",
3405
- "method": "128ep Metadata Simple",
3406
- "status": "unsupported_without_required_target",
3407
- "status_label": "unsupported",
3408
- "scored": false,
3409
  "proxy_scored": false,
3410
- "raw": null,
3411
- "raw_text": "n/a",
3412
- "normalized_score": null,
3413
  "metric_key": "f1",
3414
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
3415
- "scope": "multi_episode_128_metadata_baseline",
3416
- "reason": "requires deliberately shifted cross-modal feature pairs, which cannot be reconstructed from the public JSONL labels alone"
3417
  },
3418
  {
3419
  "task_number": 12,
3420
  "task_id": "misalignment_detection",
3421
  "task_label": "Multimodal Synchronization Detection",
3422
  "series_id": "metadata128_neural_mlp",
3423
- "method": "128ep Metadata NN",
3424
- "status": "not_supported_by_metadata_only_package",
3425
- "status_label": "not supported",
3426
- "scored": false,
3427
  "proxy_scored": false,
3428
- "raw": null,
3429
- "raw_text": "n/a",
3430
- "normalized_score": null,
3431
  "metric_key": "f1",
3432
- "source": null,
3433
- "scope": "multi_episode_128_metadata_baseline",
3434
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3435
  },
3436
  {
3437
  "task_number": 12,
@@ -3528,7 +3528,7 @@
3528
  "task_id": "long_horizon_next_action",
3529
  "task_label": "Long-Horizon Next-Action Forecasting",
3530
  "series_id": "metadata128_simple",
3531
- "method": "128ep Metadata Simple",
3532
  "status": "scored",
3533
  "status_label": "scored",
3534
  "scored": true,
@@ -3538,7 +3538,7 @@
3538
  "normalized_score": 0.004579592783699693,
3539
  "metric_key": "macro_f1",
3540
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
3541
- "scope": "multi_episode_128_metadata_baseline",
3542
  "reason": null
3543
  },
3544
  {
@@ -3546,7 +3546,7 @@
3546
  "task_id": "long_horizon_next_action",
3547
  "task_label": "Long-Horizon Next-Action Forecasting",
3548
  "series_id": "metadata128_neural_mlp",
3549
- "method": "128ep Metadata NN",
3550
  "status": "scored",
3551
  "status_label": "scored",
3552
  "scored": true,
@@ -3556,7 +3556,7 @@
3556
  "normalized_score": 0.0029821307969142615,
3557
  "metric_key": "macro_f1",
3558
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
3559
- "scope": "multi_episode_128_metadata_baseline",
3560
  "reason": null
3561
  },
3562
  {
@@ -3654,7 +3654,7 @@
3654
  "task_id": "next_subtask_forecast",
3655
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3656
  "series_id": "metadata128_simple",
3657
- "method": "128ep Metadata Simple",
3658
  "status": "scored",
3659
  "status_label": "scored",
3660
  "scored": true,
@@ -3664,7 +3664,7 @@
3664
  "normalized_score": 0.0001206030150753769,
3665
  "metric_key": "macro_f1",
3666
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
3667
- "scope": "multi_episode_128_metadata_baseline",
3668
  "reason": null
3669
  },
3670
  {
@@ -3672,7 +3672,7 @@
3672
  "task_id": "next_subtask_forecast",
3673
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3674
  "series_id": "metadata128_neural_mlp",
3675
- "method": "128ep Metadata NN",
3676
  "status": "scored",
3677
  "status_label": "scored",
3678
  "scored": true,
@@ -3682,7 +3682,7 @@
3682
  "normalized_score": 2.086049543676662e-05,
3683
  "metric_key": "macro_f1",
3684
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
3685
- "scope": "multi_episode_128_metadata_baseline",
3686
  "reason": null
3687
  },
3688
  {
@@ -3780,7 +3780,7 @@
3780
  "task_id": "interaction_text_prediction",
3781
  "task_label": "Interaction Text Prediction",
3782
  "series_id": "metadata128_simple",
3783
- "method": "128ep Metadata Simple",
3784
  "status": "unsupported_without_required_target",
3785
  "status_label": "unsupported",
3786
  "scored": false,
@@ -3790,7 +3790,7 @@
3790
  "normalized_score": null,
3791
  "metric_key": "macro_f1",
3792
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
3793
- "scope": "multi_episode_128_metadata_baseline",
3794
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
3795
  },
3796
  {
@@ -3798,7 +3798,7 @@
3798
  "task_id": "interaction_text_prediction",
3799
  "task_label": "Interaction Text Prediction",
3800
  "series_id": "metadata128_neural_mlp",
3801
- "method": "128ep Metadata NN",
3802
  "status": "not_supported_by_metadata_only_package",
3803
  "status_label": "not supported",
3804
  "scored": false,
@@ -3808,8 +3808,8 @@
3808
  "normalized_score": null,
3809
  "metric_key": "macro_f1",
3810
  "source": null,
3811
- "scope": "multi_episode_128_metadata_baseline",
3812
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3813
  },
3814
  {
3815
  "task_number": 15,
@@ -3906,7 +3906,7 @@
3906
  "task_id": "action_object_relation",
3907
  "task_label": "Action-Object Relation Prediction",
3908
  "series_id": "metadata128_simple",
3909
- "method": "128ep Metadata Simple",
3910
  "status": "scored",
3911
  "status_label": "scored",
3912
  "scored": true,
@@ -3916,7 +3916,7 @@
3916
  "normalized_score": 0.0,
3917
  "metric_key": "macro_f1",
3918
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
3919
- "scope": "multi_episode_128_metadata_baseline",
3920
  "reason": null
3921
  },
3922
  {
@@ -3924,7 +3924,7 @@
3924
  "task_id": "action_object_relation",
3925
  "task_label": "Action-Object Relation Prediction",
3926
  "series_id": "metadata128_neural_mlp",
3927
- "method": "128ep Metadata NN",
3928
  "status": "scored",
3929
  "status_label": "scored",
3930
  "scored": true,
@@ -3934,7 +3934,7 @@
3934
  "normalized_score": 0.0,
3935
  "metric_key": "macro_f1",
3936
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
3937
- "scope": "multi_episode_128_metadata_baseline",
3938
  "reason": null
3939
  },
3940
  {
@@ -4032,7 +4032,7 @@
4032
  "task_id": "object_set_forecast",
4033
  "task_label": "Future Object-Set Forecasting",
4034
  "series_id": "metadata128_simple",
4035
- "method": "128ep Metadata Simple",
4036
  "status": "scored",
4037
  "status_label": "scored",
4038
  "scored": true,
@@ -4042,7 +4042,7 @@
4042
  "normalized_score": 0.17656983343047333,
4043
  "metric_key": "micro_f1",
4044
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
4045
- "scope": "multi_episode_128_metadata_baseline",
4046
  "reason": null
4047
  },
4048
  {
@@ -4050,7 +4050,7 @@
4050
  "task_id": "object_set_forecast",
4051
  "task_label": "Future Object-Set Forecasting",
4052
  "series_id": "metadata128_neural_mlp",
4053
- "method": "128ep Metadata NN",
4054
  "status": "scored",
4055
  "status_label": "scored",
4056
  "scored": true,
@@ -4060,7 +4060,7 @@
4060
  "normalized_score": 0.17418550827844048,
4061
  "metric_key": "micro_f1",
4062
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
4063
- "scope": "multi_episode_128_metadata_baseline",
4064
  "reason": null
4065
  },
4066
  {
@@ -4158,36 +4158,36 @@
4158
  "task_id": "imu_to_hand_pose",
4159
  "task_label": "IMU-to-Hand Pose Reconstruction",
4160
  "series_id": "metadata128_simple",
4161
- "method": "128ep Metadata Simple",
4162
- "status": "unsupported_without_required_target",
4163
- "status_label": "unsupported",
4164
- "scored": false,
4165
  "proxy_scored": false,
4166
- "raw": null,
4167
- "raw_text": "n/a",
4168
- "normalized_score": null,
4169
  "metric_key": "mae",
4170
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
4171
- "scope": "multi_episode_128_metadata_baseline",
4172
- "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package"
4173
  },
4174
  {
4175
  "task_number": 18,
4176
  "task_id": "imu_to_hand_pose",
4177
  "task_label": "IMU-to-Hand Pose Reconstruction",
4178
  "series_id": "metadata128_neural_mlp",
4179
- "method": "128ep Metadata NN",
4180
- "status": "not_supported_by_metadata_only_package",
4181
- "status_label": "not supported",
4182
- "scored": false,
4183
  "proxy_scored": false,
4184
- "raw": null,
4185
- "raw_text": "n/a",
4186
- "normalized_score": null,
4187
  "metric_key": "mae",
4188
- "source": null,
4189
- "scope": "multi_episode_128_metadata_baseline",
4190
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4191
  },
4192
  {
4193
  "task_number": 18,
@@ -4284,7 +4284,7 @@
4284
  "task_id": "camera_view_sync_retrieval",
4285
  "task_label": "Camera-View Synchronization Retrieval",
4286
  "series_id": "metadata128_simple",
4287
- "method": "128ep Metadata Simple",
4288
  "status": "unsupported_without_required_target",
4289
  "status_label": "unsupported",
4290
  "scored": false,
@@ -4294,7 +4294,7 @@
4294
  "normalized_score": null,
4295
  "metric_key": "mrr",
4296
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
4297
- "scope": "multi_episode_128_metadata_baseline",
4298
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
4299
  },
4300
  {
@@ -4302,7 +4302,7 @@
4302
  "task_id": "camera_view_sync_retrieval",
4303
  "task_label": "Camera-View Synchronization Retrieval",
4304
  "series_id": "metadata128_neural_mlp",
4305
- "method": "128ep Metadata NN",
4306
  "status": "not_supported_by_metadata_only_package",
4307
  "status_label": "not supported",
4308
  "scored": false,
@@ -4312,8 +4312,8 @@
4312
  "normalized_score": null,
4313
  "metric_key": "mrr",
4314
  "source": null,
4315
- "scope": "multi_episode_128_metadata_baseline",
4316
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4317
  },
4318
  {
4319
  "task_number": 19,
@@ -4410,7 +4410,7 @@
4410
  "task_id": "time_to_transition",
4411
  "task_label": "Time-to-Next-Transition Regression",
4412
  "series_id": "metadata128_simple",
4413
- "method": "128ep Metadata Simple",
4414
  "status": "scored",
4415
  "status_label": "scored",
4416
  "scored": true,
@@ -4420,7 +4420,7 @@
4420
  "normalized_score": 0.016864874132806403,
4421
  "metric_key": "mae",
4422
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
4423
- "scope": "multi_episode_128_metadata_baseline",
4424
  "reason": null
4425
  },
4426
  {
@@ -4428,7 +4428,7 @@
4428
  "task_id": "time_to_transition",
4429
  "task_label": "Time-to-Next-Transition Regression",
4430
  "series_id": "metadata128_neural_mlp",
4431
- "method": "128ep Metadata NN",
4432
  "status": "scored",
4433
  "status_label": "scored",
4434
  "scored": true,
@@ -4438,7 +4438,7 @@
4438
  "normalized_score": 0.25411768748242325,
4439
  "metric_key": "mae",
4440
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
4441
- "scope": "multi_episode_128_metadata_baseline",
4442
  "reason": null
4443
  },
4444
  {
 
1
  {
2
  "title": "128-Episode 20-Task Radar",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:52:26+00:00",
5
  "description": "Selected 128-episode metadata/raw baselines plus verified Qwen3/Cosmos branches. Every method has 20 records; numeric scores appear only where the public artifact produced that task target.",
6
  "task_count": 20,
7
  "method_count": 7,
8
  "method_task_record_count": 140,
9
+ "scored_method_task_count": 103,
10
  "normalization_policy": {
11
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
12
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
13
  "raw_values": "raw metric values, metric keys, and sources are retained in this JSON; the SVG is an overview, not a replacement for the metric table",
14
  "result_record_policy": "every method has 20 task records; records without a numeric score carry explicit unsupported/not-evaluated status and reason fields",
15
  "foundation_model_overlay": "Qwen3/Cosmos points are plotted only on task-aligned axes. Scoreless records mean the public result does not evaluate that task contract.",
16
+ "metadata_128_overlay": "128-episode aligned baselines have 20 records. Numeric scores come from JSONL metadata/text tasks plus staged sensor-block targets when the processed target exists; raw interaction text and paired camera-view embeddings remain explicit gaps.",
17
  "raw_128_overlay": "128-episode raw-feature baselines use staged sensor NPZ features. Eighteen axes use direct task targets; interaction text and camera-view sync are completed with documented compact proxies because raw interaction strings and paired video-view embeddings are absent from the 128 export."
18
  },
19
  "source_unified_radar": "docs/data/unified_task_model_radar.json",
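Reviewer note: the `normalization_policy` entries above can be spot-checked against the records in this diff. The following is a minimal sketch, not the generator used for this commit; the function names are hypothetical, and the best observed value for `hand_trajectory_forecast` is not shown in this excerpt, so it is inferred here by inverting `best_observed_value / raw_value` for one record.

```python
def normalize_higher_is_better(raw: float) -> float:
    """Clip a bounded metric (e.g. macro_f1, mrr) to the [0, 1] radar axis."""
    return min(1.0, max(0.0, raw))


def normalize_lower_is_better(raw: float, best_observed: float) -> float:
    """Map a lower-is-better error (e.g. mae, mpjpe) to best_observed / raw."""
    if raw <= 0:
        raise ValueError("raw error must be positive")
    return best_observed / raw


# hand_trajectory_forecast (mpjpe) values copied from the added records.
simple_raw, simple_norm = 8.817333221435547, 0.012231610603598841
mlp_raw, mlp_norm = 0.429434210062027, 0.25114484128127007

# Assumption: both records share one best observed value within the task,
# so inverting either record should give the same number.
implied_best_from_mlp = mlp_raw * mlp_norm
implied_best_from_simple = simple_raw * simple_norm

# Recompute the simple baseline's normalized score from the inferred best.
recomputed_simple_norm = normalize_lower_is_better(simple_raw, implied_best_from_mlp)

# timeline_action (macro_f1) is already bounded, so clipping leaves it unchanged.
timeline_action_norm = normalize_higher_is_better(0.008252821966746326)
```

If the two implied best values disagree, the records were probably normalized against different reference values and the radar axis for that task should be rechecked.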
 
21
  "series": [
22
  {
23
  "id": "metadata128_simple",
24
+ "label": "128ep Aligned Simple",
25
  "short_label": "128-S",
26
  "color": "#ffd166",
27
+ "kind": "partial_128_episode_aligned_baseline",
28
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
29
  "stroke_dasharray": "9 6",
30
+ "method_detail": "128-episode aligned simple baselines: JSONL metadata/text tasks plus staged sensor-block tasks where the processed target exists.",
31
  "plotted_as": "colored point overlay",
32
  "result_record_count": 20,
33
+ "scored_task_count": 18,
34
+ "covered_task_count": 18,
35
  "proxy_scored_task_count": 0,
36
+ "scoreless_task_count": 2,
37
+ "unsupported_task_count": 2,
38
  "not_evaluated_task_count": 0,
39
  "status_counts": {
40
+ "scored": 18,
41
+ "unsupported_without_required_target": 2
42
  },
43
+ "coverage_fraction": 0.9,
44
  "result_record_fraction": 1.0
45
  },
46
  {
47
  "id": "metadata128_neural_mlp",
48
+ "label": "128ep Aligned NN",
49
  "short_label": "128-NN",
50
  "color": "#f472b6",
51
+ "kind": "partial_128_episode_aligned_baseline",
52
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
53
  "stroke_dasharray": "3 6",
54
+ "method_detail": "128-episode aligned MLP baselines: JSONL metadata/text tasks plus staged sensor-block tasks where the processed target exists.",
55
  "plotted_as": "colored point overlay",
56
  "result_record_count": 20,
57
+ "scored_task_count": 18,
58
+ "covered_task_count": 18,
59
  "proxy_scored_task_count": 0,
60
+ "scoreless_task_count": 2,
61
+ "unsupported_task_count": 2,
62
  "not_evaluated_task_count": 0,
63
  "status_counts": {
64
+ "not_supported_by_metadata_only_package": 2,
65
+ "scored": 18
66
  },
67
+ "coverage_fraction": 0.9,
68
  "result_record_fraction": 1.0
69
  },
70
  {
 
205
  "raw": 0.008252821966746326,
206
  "metric_key": "macro_f1",
207
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
208
+ "scope": "multi_episode_128_aligned_baseline",
209
  "status": "scored",
210
  "reason": null,
211
  "normalized_score": 0.008252821966746326,
 
216
  "raw": 0.004175793689174209,
217
  "metric_key": "macro_f1",
218
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
219
+ "scope": "multi_episode_128_aligned_baseline",
220
  "status": "scored",
221
  "reason": null,
222
  "normalized_score": 0.004175793689174209,
 
296
  "raw": 0.00019512195121951218,
297
  "metric_key": "macro_f1",
298
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
299
+ "scope": "multi_episode_128_aligned_baseline",
300
  "status": "scored",
301
  "reason": null,
302
  "normalized_score": 0.00019512195121951218,
 
307
  "raw": 7.207207207207208e-05,
308
  "metric_key": "macro_f1",
309
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
310
+ "scope": "multi_episode_128_aligned_baseline",
311
  "status": "scored",
312
  "reason": null,
313
  "normalized_score": 7.207207207207208e-05,
 
387
  "raw": 0.29652162550029315,
388
  "metric_key": "macro_f1",
389
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
390
+ "scope": "multi_episode_128_aligned_baseline",
391
  "status": "scored",
392
  "reason": null,
393
  "normalized_score": 0.29652162550029315,
 
398
  "raw": 0.4841733292368365,
399
  "metric_key": "macro_f1",
400
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
401
+ "scope": "multi_episode_128_aligned_baseline",
402
  "status": "scored",
403
  "reason": null,
404
  "normalized_score": 0.4841733292368365,
 
478
  "raw": 0.006514774539765508,
479
  "metric_key": "macro_f1",
480
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
481
+ "scope": "multi_episode_128_aligned_baseline",
482
  "status": "scored",
483
  "reason": null,
484
  "normalized_score": 0.006514774539765508,
 
489
  "raw": 0.004910507980164745,
490
  "metric_key": "macro_f1",
491
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
492
+ "scope": "multi_episode_128_aligned_baseline",
493
  "status": "scored",
494
  "reason": null,
495
  "normalized_score": 0.004910507980164745,
 
566
  "raw128_proxy_axis": false,
567
  "values": {
568
  "metadata128_simple": {
569
+ "raw": 8.817333221435547,
570
  "metric_key": "mpjpe",
571
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
572
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
573
+ "status": "scored",
574
+ "reason": null,
575
+ "normalized_score": 0.012231610603598841,
576
+ "raw_text": "8.817",
577
+ "status_label": "scored"
578
  },
579
  "metadata128_neural_mlp": {
580
+ "raw": 0.429434210062027,
581
  "metric_key": "mpjpe",
582
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/hand_trajectory_forecast/metrics.json",
583
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
584
+ "status": "scored",
585
+ "reason": null,
586
+ "normalized_score": 0.25114484128127007,
587
+ "raw_text": "0.4294",
588
+ "status_label": "scored"
589
  },
590
  "raw128_simple": {
591
  "raw": 0.2729249894618988,
 
660
  "raw": 0.4381481308057444,
661
  "metric_key": "macro_f1",
662
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
663
+ "scope": "multi_episode_128_aligned_baseline",
664
  "status": "scored",
665
  "reason": null,
666
  "normalized_score": 0.4381481308057444,
 
671
  "raw": 0.5682695682695682,
672
  "metric_key": "macro_f1",
673
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
674
+ "scope": "multi_episode_128_aligned_baseline",
675
  "status": "scored",
676
  "reason": null,
677
  "normalized_score": 0.5682695682695682,
 
751
  "raw": 0.17764578833693304,
752
  "metric_key": "micro_f1",
753
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
754
+ "scope": "multi_episode_128_aligned_baseline",
755
  "status": "scored",
756
  "reason": null,
757
  "normalized_score": 0.17764578833693304,
 
762
  "raw": 0.18662723837686876,
763
  "metric_key": "micro_f1",
764
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
765
+ "scope": "multi_episode_128_aligned_baseline",
766
  "status": "scored",
767
  "reason": null,
768
  "normalized_score": 0.18662723837686876,
 
842
  "raw": 0.002332374220713973,
843
  "metric_key": "mrr",
844
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
845
+ "scope": "multi_episode_128_aligned_baseline",
846
  "status": "scored",
847
  "reason": null,
848
  "normalized_score": 0.002332374220713973,
 
853
  "raw": 0.008236799389123917,
854
  "metric_key": "mrr",
855
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
856
+ "scope": "multi_episode_128_aligned_baseline",
857
  "status": "scored",
858
  "reason": null,
859
  "normalized_score": 0.008236799389123917,
 
930
  "raw128_proxy_axis": false,
931
  "values": {
932
  "metadata128_simple": {
933
+ "raw": 0.002587692579254508,
934
  "metric_key": "mrr",
935
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
936
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
937
+ "status": "scored",
938
+ "reason": null,
939
+ "normalized_score": 0.002587692579254508,
940
+ "raw_text": "0.0026",
941
+ "status_label": "scored"
942
  },
943
  "metadata128_neural_mlp": {
944
+ "raw": 0.0026067993603646755,
945
  "metric_key": "mrr",
946
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/cross_modal_retrieval/metrics.json",
947
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
948
+ "status": "scored",
949
+ "reason": null,
950
+ "normalized_score": 0.0026067993603646755,
951
+ "raw_text": "0.0026",
952
+ "status_label": "scored"
953
  },
954
  "raw128_simple": {
955
  "raw": 0.003459817497059703,
 
1021
  "raw128_proxy_axis": false,
1022
  "values": {
1023
  "metadata128_simple": {
1024
+ "raw": -190.66106203944798,
1025
  "metric_key": "r2",
1026
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
1027
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1028
+ "status": "scored",
1029
+ "reason": null,
1030
+ "normalized_score": 0.0,
1031
+ "raw_text": "-190.66",
1032
+ "status_label": "scored"
1033
  },
1034
  "metadata128_neural_mlp": {
1035
+ "raw": -0.43481132003942147,
1036
  "metric_key": "r2",
1037
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/modality_reconstruction/metrics.json",
1038
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1039
+ "status": "scored",
1040
+ "reason": null,
1041
+ "normalized_score": 0.0,
1042
+ "raw_text": "-0.4348",
1043
+ "status_label": "scored"
1044
  },
1045
  "raw128_simple": {
1046
  "raw": -1.3450960391924882,
 
1115
  "raw": 0.4198864140782312,
1116
  "metric_key": "f1",
1117
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
1118
+ "scope": "multi_episode_128_aligned_baseline",
1119
  "status": "scored",
1120
  "reason": null,
1121
  "normalized_score": 0.4198864140782312,
 
1126
  "raw": 0.8252408266656923,
1127
  "metric_key": "f1",
1128
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
1129
+ "scope": "multi_episode_128_aligned_baseline",
1130
  "status": "scored",
1131
  "reason": null,
1132
  "normalized_score": 0.8252408266656923,
 
1203
  "raw128_proxy_axis": false,
1204
  "values": {
1205
  "metadata128_simple": {
1206
+ "raw": 0.49980060227663614,
1207
  "metric_key": "f1",
1208
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
1209
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1210
+ "status": "scored",
1211
+ "reason": null,
1212
+ "normalized_score": 0.49980060227663614,
1213
+ "raw_text": "0.4998",
1214
+ "status_label": "scored"
1215
  },
1216
  "metadata128_neural_mlp": {
1217
+ "raw": 0.7773773780941162,
1218
  "metric_key": "f1",
1219
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/misalignment_detection/metrics.json",
1220
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1221
+ "status": "scored",
1222
+ "reason": null,
1223
+ "normalized_score": 0.7773773780941162,
1224
+ "raw_text": "0.7774",
1225
+ "status_label": "scored"
1226
  },
1227
  "raw128_simple": {
1228
  "raw": 0.4958867673901769,
 
1297
  "raw": 0.004579592783699693,
1298
  "metric_key": "macro_f1",
1299
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
1300
+ "scope": "multi_episode_128_aligned_baseline",
1301
  "status": "scored",
1302
  "reason": null,
1303
  "normalized_score": 0.004579592783699693,
 
1308
  "raw": 0.0029821307969142615,
1309
  "metric_key": "macro_f1",
1310
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
1311
+ "scope": "multi_episode_128_aligned_baseline",
1312
  "status": "scored",
1313
  "reason": null,
1314
  "normalized_score": 0.0029821307969142615,
 
1388
  "raw": 0.0001206030150753769,
1389
  "metric_key": "macro_f1",
1390
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
1391
+ "scope": "multi_episode_128_aligned_baseline",
1392
  "status": "scored",
1393
  "reason": null,
1394
  "normalized_score": 0.0001206030150753769,
 
1399
  "raw": 2.086049543676662e-05,
1400
  "metric_key": "macro_f1",
1401
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
1402
+ "scope": "multi_episode_128_aligned_baseline",
1403
  "status": "scored",
1404
  "reason": null,
1405
  "normalized_score": 2.086049543676662e-05,
 
1479
  "raw": null,
1480
  "metric_key": "macro_f1",
1481
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
1482
+ "scope": "multi_episode_128_aligned_baseline",
1483
  "status": "unsupported_without_required_target",
1484
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
1485
  "normalized_score": null,
 
1490
  "raw": null,
1491
  "metric_key": "macro_f1",
1492
  "source": null,
1493
+ "scope": "multi_episode_128_aligned_baseline",
1494
  "status": "not_supported_by_metadata_only_package",
1495
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required",
1496
  "normalized_score": null,
1497
  "raw_text": "n/a",
1498
  "status_label": "not supported"
 
1570
  "raw": 0.0,
1571
  "metric_key": "macro_f1",
1572
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
1573
+ "scope": "multi_episode_128_aligned_baseline",
1574
  "status": "scored",
1575
  "reason": null,
1576
  "normalized_score": 0.0,
 
1581
  "raw": 0.0,
1582
  "metric_key": "macro_f1",
1583
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
1584
+ "scope": "multi_episode_128_aligned_baseline",
1585
  "status": "scored",
1586
  "reason": null,
1587
  "normalized_score": 0.0,
 
1661
  "raw": 0.17656983343047333,
1662
  "metric_key": "micro_f1",
1663
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
1664
+ "scope": "multi_episode_128_aligned_baseline",
1665
  "status": "scored",
1666
  "reason": null,
1667
  "normalized_score": 0.17656983343047333,
 
1672
  "raw": 0.17418550827844048,
1673
  "metric_key": "micro_f1",
1674
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
1675
+ "scope": "multi_episode_128_aligned_baseline",
1676
  "status": "scored",
1677
  "reason": null,
1678
  "normalized_score": 0.17418550827844048,
 
1749
  "raw128_proxy_axis": false,
1750
  "values": {
1751
  "metadata128_simple": {
1752
+ "raw": 0.2294670194387436,
1753
  "metric_key": "mae",
1754
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
1755
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1756
+ "status": "scored",
1757
+ "reason": null,
1758
+ "normalized_score": 0.18324815505876868,
1759
+ "raw_text": "0.2295",
1760
+ "status_label": "scored"
1761
  },
1762
  "metadata128_neural_mlp": {
1763
+ "raw": 0.2555866539478302,
1764
  "metric_key": "mae",
1765
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/imu_to_hand_pose/metrics.json",
1766
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1767
+ "status": "scored",
1768
+ "reason": null,
1769
+ "normalized_score": 0.16452114110609004,
1770
+ "raw_text": "0.2556",
1771
+ "status_label": "scored"
1772
  },
1773
  "raw128_simple": {
1774
  "raw": 0.22941437363624573,
 
1843
  "raw": null,
1844
  "metric_key": "mrr",
1845
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
1846
+ "scope": "multi_episode_128_aligned_baseline",
1847
  "status": "unsupported_without_required_target",
1848
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
1849
  "normalized_score": null,
 
1854
  "raw": null,
1855
  "metric_key": "mrr",
1856
  "source": null,
1857
+ "scope": "multi_episode_128_aligned_baseline",
1858
  "status": "not_supported_by_metadata_only_package",
1859
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required",
1860
  "normalized_score": null,
1861
  "raw_text": "n/a",
1862
  "status_label": "not supported"
 
1934
  "raw": 624.8108520507812,
1935
  "metric_key": "mae",
1936
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
1937
+ "scope": "multi_episode_128_aligned_baseline",
1938
  "status": "scored",
1939
  "reason": null,
1940
  "normalized_score": 0.016864874132806403,
 
1945
  "raw": 41.4664421081543,
1946
  "metric_key": "mae",
1947
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
1948
+ "scope": "multi_episode_128_aligned_baseline",
1949
  "status": "scored",
1950
  "reason": null,
1951
  "normalized_score": 0.25411768748242325,
 
2016
  "task_id": "timeline_action",
2017
  "task_label": "Action Recognition",
2018
  "series_id": "metadata128_simple",
2019
+ "method": "128ep Aligned Simple",
2020
  "status": "scored",
2021
  "status_label": "scored",
2022
  "scored": true,
 
2026
  "normalized_score": 0.008252821966746326,
2027
  "metric_key": "macro_f1",
2028
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
2029
+ "scope": "multi_episode_128_aligned_baseline",
2030
  "reason": null
2031
  },
2032
  {
 
2034
  "task_id": "timeline_action",
2035
  "task_label": "Action Recognition",
2036
  "series_id": "metadata128_neural_mlp",
2037
+ "method": "128ep Aligned NN",
2038
  "status": "scored",
2039
  "status_label": "scored",
2040
  "scored": true,
 
2044
  "normalized_score": 0.004175793689174209,
2045
  "metric_key": "macro_f1",
2046
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
2047
+ "scope": "multi_episode_128_aligned_baseline",
2048
  "reason": null
2049
  },
2050
  {
 
2142
  "task_id": "timeline_subtask",
2143
  "task_label": "Procedure Step Recognition",
2144
  "series_id": "metadata128_simple",
2145
+ "method": "128ep Aligned Simple",
2146
  "status": "scored",
2147
  "status_label": "scored",
2148
  "scored": true,
 
2152
  "normalized_score": 0.00019512195121951218,
2153
  "metric_key": "macro_f1",
2154
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
2155
+ "scope": "multi_episode_128_aligned_baseline",
2156
  "reason": null
2157
  },
2158
  {
 
2160
  "task_id": "timeline_subtask",
2161
  "task_label": "Procedure Step Recognition",
2162
  "series_id": "metadata128_neural_mlp",
2163
+ "method": "128ep Aligned NN",
2164
  "status": "scored",
2165
  "status_label": "scored",
2166
  "scored": true,
 
2170
  "normalized_score": 7.207207207207208e-05,
2171
  "metric_key": "macro_f1",
2172
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
2173
+ "scope": "multi_episode_128_aligned_baseline",
2174
  "reason": null
2175
  },
2176
  {
 
2268
  "task_id": "transition_detection",
2269
  "task_label": "Action Boundary Detection",
2270
  "series_id": "metadata128_simple",
2271
+ "method": "128ep Aligned Simple",
2272
  "status": "scored",
2273
  "status_label": "scored",
2274
  "scored": true,
 
2278
  "normalized_score": 0.29652162550029315,
2279
  "metric_key": "macro_f1",
2280
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
2281
+ "scope": "multi_episode_128_aligned_baseline",
2282
  "reason": null
2283
  },
2284
  {
 
2286
  "task_id": "transition_detection",
2287
  "task_label": "Action Boundary Detection",
2288
  "series_id": "metadata128_neural_mlp",
2289
+ "method": "128ep Aligned NN",
2290
  "status": "scored",
2291
  "status_label": "scored",
2292
  "scored": true,
 
2296
  "normalized_score": 0.4841733292368365,
2297
  "metric_key": "macro_f1",
2298
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
2299
+ "scope": "multi_episode_128_aligned_baseline",
2300
  "reason": null
2301
  },
2302
  {
 
2394
  "task_id": "next_action",
2395
  "task_label": "Next-Action Prediction",
2396
  "series_id": "metadata128_simple",
2397
+ "method": "128ep Aligned Simple",
2398
  "status": "scored",
2399
  "status_label": "scored",
2400
  "scored": true,
 
2404
  "normalized_score": 0.006514774539765508,
2405
  "metric_key": "macro_f1",
2406
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
2407
+ "scope": "multi_episode_128_aligned_baseline",
2408
  "reason": null
2409
  },
2410
  {
 
2412
  "task_id": "next_action",
2413
  "task_label": "Next-Action Prediction",
2414
  "series_id": "metadata128_neural_mlp",
2415
+ "method": "128ep Aligned NN",
2416
  "status": "scored",
2417
  "status_label": "scored",
2418
  "scored": true,
 
2422
  "normalized_score": 0.004910507980164745,
2423
  "metric_key": "macro_f1",
2424
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
2425
+ "scope": "multi_episode_128_aligned_baseline",
2426
  "reason": null
2427
  },
2428
  {
 
2520
  "task_id": "hand_trajectory_forecast",
2521
  "task_label": "Hand Trajectory Forecasting",
2522
  "series_id": "metadata128_simple",
2523
+ "method": "128ep Aligned Simple",
2524
+ "status": "scored",
2525
+ "status_label": "scored",
2526
+ "scored": true,
2527
  "proxy_scored": false,
2528
+ "raw": 8.817333221435547,
2529
+ "raw_text": "8.817",
2530
+ "normalized_score": 0.012231610603598841,
2531
  "metric_key": "mpjpe",
2532
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
2533
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
2534
+ "reason": null
2535
  },
2536
  {
2537
  "task_number": 5,
2538
  "task_id": "hand_trajectory_forecast",
2539
  "task_label": "Hand Trajectory Forecasting",
2540
  "series_id": "metadata128_neural_mlp",
2541
+ "method": "128ep Aligned NN",
2542
+ "status": "scored",
2543
+ "status_label": "scored",
2544
+ "scored": true,
2545
  "proxy_scored": false,
2546
+ "raw": 0.429434210062027,
2547
+ "raw_text": "0.4294",
2548
+ "normalized_score": 0.25114484128127007,
2549
  "metric_key": "mpjpe",
2550
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/hand_trajectory_forecast/metrics.json",
2551
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
2552
+ "reason": null
2553
  },
2554
  {
2555
  "task_number": 5,
 
2646
  "task_id": "contact_prediction",
2647
  "task_label": "Contact State Prediction",
2648
  "series_id": "metadata128_simple",
2649
+ "method": "128ep Aligned Simple",
2650
  "status": "scored",
2651
  "status_label": "scored",
2652
  "scored": true,
 
2656
  "normalized_score": 0.4381481308057444,
2657
  "metric_key": "macro_f1",
2658
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
2659
+ "scope": "multi_episode_128_aligned_baseline",
2660
  "reason": null
2661
  },
2662
  {
 
2664
  "task_id": "contact_prediction",
2665
  "task_label": "Contact State Prediction",
2666
  "series_id": "metadata128_neural_mlp",
2667
+ "method": "128ep Aligned NN",
2668
  "status": "scored",
2669
  "status_label": "scored",
2670
  "scored": true,
 
2674
  "normalized_score": 0.5682695682695682,
2675
  "metric_key": "macro_f1",
2676
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
2677
+ "scope": "multi_episode_128_aligned_baseline",
2678
  "reason": null
2679
  },
2680
  {
 
2772
  "task_id": "object_relevance",
2773
  "task_label": "Object Relevance Prediction",
2774
  "series_id": "metadata128_simple",
2775
+ "method": "128ep Aligned Simple",
2776
  "status": "scored",
2777
  "status_label": "scored",
2778
  "scored": true,
 
2782
  "normalized_score": 0.17764578833693304,
2783
  "metric_key": "micro_f1",
2784
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
2785
+ "scope": "multi_episode_128_aligned_baseline",
2786
  "reason": null
2787
  },
2788
  {
 
2790
  "task_id": "object_relevance",
2791
  "task_label": "Object Relevance Prediction",
2792
  "series_id": "metadata128_neural_mlp",
2793
+ "method": "128ep Aligned NN",
2794
  "status": "scored",
2795
  "status_label": "scored",
2796
  "scored": true,
 
2800
  "normalized_score": 0.18662723837686876,
2801
  "metric_key": "micro_f1",
2802
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
2803
+ "scope": "multi_episode_128_aligned_baseline",
2804
  "reason": null
2805
  },
2806
  {
 
2898
  "task_id": "caption_grounding",
2899
  "task_label": "Language Grounding",
2900
  "series_id": "metadata128_simple",
2901
+ "method": "128ep Aligned Simple",
2902
  "status": "scored",
2903
  "status_label": "scored",
2904
  "scored": true,
 
2908
  "normalized_score": 0.002332374220713973,
2909
  "metric_key": "mrr",
2910
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
2911
+ "scope": "multi_episode_128_aligned_baseline",
2912
  "reason": null
2913
  },
2914
  {
 
2916
  "task_id": "caption_grounding",
2917
  "task_label": "Language Grounding",
2918
  "series_id": "metadata128_neural_mlp",
2919
+ "method": "128ep Aligned NN",
2920
  "status": "scored",
2921
  "status_label": "scored",
2922
  "scored": true,
 
2926
  "normalized_score": 0.008236799389123917,
2927
  "metric_key": "mrr",
2928
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
2929
+ "scope": "multi_episode_128_aligned_baseline",
2930
  "reason": null
2931
  },
2932
  {
 
3024
  "task_id": "cross_modal_retrieval",
3025
  "task_label": "Cross-Modal Retrieval",
3026
  "series_id": "metadata128_simple",
3027
+ "method": "128ep Aligned Simple",
3028
+ "status": "scored",
3029
+ "status_label": "scored",
3030
+ "scored": true,
3031
  "proxy_scored": false,
3032
+ "raw": 0.002587692579254508,
3033
+ "raw_text": "0.0026",
3034
+ "normalized_score": 0.002587692579254508,
3035
  "metric_key": "mrr",
3036
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
3037
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3038
+ "reason": null
3039
  },
3040
  {
3041
  "task_number": 9,
3042
  "task_id": "cross_modal_retrieval",
3043
  "task_label": "Cross-Modal Retrieval",
3044
  "series_id": "metadata128_neural_mlp",
3045
+ "method": "128ep Aligned NN",
3046
+ "status": "scored",
3047
+ "status_label": "scored",
3048
+ "scored": true,
3049
  "proxy_scored": false,
3050
+ "raw": 0.0026067993603646755,
3051
+ "raw_text": "0.0026",
3052
+ "normalized_score": 0.0026067993603646755,
3053
  "metric_key": "mrr",
3054
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/cross_modal_retrieval/metrics.json",
3055
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3056
+ "reason": null
3057
  },
3058
  {
3059
  "task_number": 9,
 
3150
  "task_id": "modality_reconstruction",
3151
  "task_label": "Cross-Modal Reconstruction",
3152
  "series_id": "metadata128_simple",
3153
+ "method": "128ep Aligned Simple",
3154
+ "status": "scored",
3155
+ "status_label": "scored",
3156
+ "scored": true,
3157
  "proxy_scored": false,
3158
+ "raw": -190.66106203944798,
3159
+ "raw_text": "-190.66",
3160
+ "normalized_score": 0.0,
3161
  "metric_key": "r2",
3162
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
3163
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3164
+ "reason": null
3165
  },
3166
  {
3167
  "task_number": 10,
3168
  "task_id": "modality_reconstruction",
3169
  "task_label": "Cross-Modal Reconstruction",
3170
  "series_id": "metadata128_neural_mlp",
3171
+ "method": "128ep Aligned NN",
3172
+ "status": "scored",
3173
+ "status_label": "scored",
3174
+ "scored": true,
3175
  "proxy_scored": false,
3176
+ "raw": -0.43481132003942147,
3177
+ "raw_text": "-0.4348",
3178
+ "normalized_score": 0.0,
3179
  "metric_key": "r2",
3180
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/modality_reconstruction/metrics.json",
3181
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3182
+ "reason": null
3183
  },
3184
  {
3185
  "task_number": 10,
 
3276
  "task_id": "temporal_order",
3277
  "task_label": "Temporal Order Verification",
3278
  "series_id": "metadata128_simple",
3279
+ "method": "128ep Aligned Simple",
3280
  "status": "scored",
3281
  "status_label": "scored",
3282
  "scored": true,
 
3286
  "normalized_score": 0.4198864140782312,
3287
  "metric_key": "f1",
3288
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
3289
+ "scope": "multi_episode_128_aligned_baseline",
3290
  "reason": null
3291
  },
3292
  {
 
3294
  "task_id": "temporal_order",
3295
  "task_label": "Temporal Order Verification",
3296
  "series_id": "metadata128_neural_mlp",
3297
+ "method": "128ep Aligned NN",
3298
  "status": "scored",
3299
  "status_label": "scored",
3300
  "scored": true,
 
3304
  "normalized_score": 0.8252408266656923,
3305
  "metric_key": "f1",
3306
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
3307
+ "scope": "multi_episode_128_aligned_baseline",
3308
  "reason": null
3309
  },
3310
  {
 
3402
  "task_id": "misalignment_detection",
3403
  "task_label": "Multimodal Synchronization Detection",
3404
  "series_id": "metadata128_simple",
3405
+ "method": "128ep Aligned Simple",
3406
+ "status": "scored",
3407
+ "status_label": "scored",
3408
+ "scored": true,
3409
  "proxy_scored": false,
3410
+ "raw": 0.49980060227663614,
3411
+ "raw_text": "0.4998",
3412
+ "normalized_score": 0.49980060227663614,
3413
  "metric_key": "f1",
3414
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
3415
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3416
+ "reason": null
3417
  },
3418
  {
3419
  "task_number": 12,
3420
  "task_id": "misalignment_detection",
3421
  "task_label": "Multimodal Synchronization Detection",
3422
  "series_id": "metadata128_neural_mlp",
3423
+ "method": "128ep Aligned NN",
3424
+ "status": "scored",
3425
+ "status_label": "scored",
3426
+ "scored": true,
3427
  "proxy_scored": false,
3428
+ "raw": 0.7773773780941162,
3429
+ "raw_text": "0.7774",
3430
+ "normalized_score": 0.7773773780941162,
3431
  "metric_key": "f1",
3432
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/misalignment_detection/metrics.json",
3433
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3434
+ "reason": null
3435
  },
3436
  {
3437
  "task_number": 12,
 
3528
  "task_id": "long_horizon_next_action",
3529
  "task_label": "Long-Horizon Next-Action Forecasting",
3530
  "series_id": "metadata128_simple",
3531
+ "method": "128ep Aligned Simple",
3532
  "status": "scored",
3533
  "status_label": "scored",
3534
  "scored": true,
 
3538
  "normalized_score": 0.004579592783699693,
3539
  "metric_key": "macro_f1",
3540
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
3541
+ "scope": "multi_episode_128_aligned_baseline",
3542
  "reason": null
3543
  },
3544
  {
 
3546
  "task_id": "long_horizon_next_action",
3547
  "task_label": "Long-Horizon Next-Action Forecasting",
3548
  "series_id": "metadata128_neural_mlp",
3549
+ "method": "128ep Aligned NN",
3550
  "status": "scored",
3551
  "status_label": "scored",
3552
  "scored": true,
 
3556
  "normalized_score": 0.0029821307969142615,
3557
  "metric_key": "macro_f1",
3558
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
3559
+ "scope": "multi_episode_128_aligned_baseline",
3560
  "reason": null
3561
  },
3562
  {
 
3654
  "task_id": "next_subtask_forecast",
3655
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3656
  "series_id": "metadata128_simple",
3657
+ "method": "128ep Aligned Simple",
3658
  "status": "scored",
3659
  "status_label": "scored",
3660
  "scored": true,
 
3664
  "normalized_score": 0.0001206030150753769,
3665
  "metric_key": "macro_f1",
3666
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
3667
+ "scope": "multi_episode_128_aligned_baseline",
3668
  "reason": null
3669
  },
3670
  {
 
3672
  "task_id": "next_subtask_forecast",
3673
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3674
  "series_id": "metadata128_neural_mlp",
3675
+ "method": "128ep Aligned NN",
3676
  "status": "scored",
3677
  "status_label": "scored",
3678
  "scored": true,
 
3682
  "normalized_score": 2.086049543676662e-05,
3683
  "metric_key": "macro_f1",
3684
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
3685
+ "scope": "multi_episode_128_aligned_baseline",
3686
  "reason": null
3687
  },
3688
  {
 
3780
  "task_id": "interaction_text_prediction",
3781
  "task_label": "Interaction Text Prediction",
3782
  "series_id": "metadata128_simple",
3783
+ "method": "128ep Aligned Simple",
3784
  "status": "unsupported_without_required_target",
3785
  "status_label": "unsupported",
3786
  "scored": false,
 
3790
  "normalized_score": null,
3791
  "metric_key": "macro_f1",
3792
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
3793
+ "scope": "multi_episode_128_aligned_baseline",
3794
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
3795
  },
3796
  {
 
3798
  "task_id": "interaction_text_prediction",
3799
  "task_label": "Interaction Text Prediction",
3800
  "series_id": "metadata128_neural_mlp",
3801
+ "method": "128ep Aligned NN",
3802
  "status": "not_supported_by_metadata_only_package",
3803
  "status_label": "not supported",
3804
  "scored": false,
 
3808
  "normalized_score": null,
3809
  "metric_key": "macro_f1",
3810
  "source": null,
3811
+ "scope": "multi_episode_128_aligned_baseline",
3812
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required"
3813
  },
3814
  {
3815
  "task_number": 15,
 
3906
  "task_id": "action_object_relation",
3907
  "task_label": "Action-Object Relation Prediction",
3908
  "series_id": "metadata128_simple",
3909
+ "method": "128ep Aligned Simple",
3910
  "status": "scored",
3911
  "status_label": "scored",
3912
  "scored": true,
 
3916
  "normalized_score": 0.0,
3917
  "metric_key": "macro_f1",
3918
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
3919
+ "scope": "multi_episode_128_aligned_baseline",
3920
  "reason": null
3921
  },
3922
  {
 
3924
  "task_id": "action_object_relation",
3925
  "task_label": "Action-Object Relation Prediction",
3926
  "series_id": "metadata128_neural_mlp",
3927
+ "method": "128ep Aligned NN",
3928
  "status": "scored",
3929
  "status_label": "scored",
3930
  "scored": true,
 
3934
  "normalized_score": 0.0,
3935
  "metric_key": "macro_f1",
3936
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
3937
+ "scope": "multi_episode_128_aligned_baseline",
3938
  "reason": null
3939
  },
3940
  {
 
4032
  "task_id": "object_set_forecast",
4033
  "task_label": "Future Object-Set Forecasting",
4034
  "series_id": "metadata128_simple",
4035
+ "method": "128ep Aligned Simple",
4036
  "status": "scored",
4037
  "status_label": "scored",
4038
  "scored": true,
 
4042
  "normalized_score": 0.17656983343047333,
4043
  "metric_key": "micro_f1",
4044
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
4045
+ "scope": "multi_episode_128_aligned_baseline",
4046
  "reason": null
4047
  },
4048
  {
 
4050
  "task_id": "object_set_forecast",
4051
  "task_label": "Future Object-Set Forecasting",
4052
  "series_id": "metadata128_neural_mlp",
4053
+ "method": "128ep Aligned NN",
4054
  "status": "scored",
4055
  "status_label": "scored",
4056
  "scored": true,
 
4060
  "normalized_score": 0.17418550827844048,
4061
  "metric_key": "micro_f1",
4062
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
4063
+ "scope": "multi_episode_128_aligned_baseline",
4064
  "reason": null
4065
  },
4066
  {
 
4158
  "task_id": "imu_to_hand_pose",
4159
  "task_label": "IMU-to-Hand Pose Reconstruction",
4160
  "series_id": "metadata128_simple",
4161
+ "method": "128ep Aligned Simple",
4162
+ "status": "scored",
4163
+ "status_label": "scored",
4164
+ "scored": true,
4165
  "proxy_scored": false,
4166
+ "raw": 0.2294670194387436,
4167
+ "raw_text": "0.2295",
4168
+ "normalized_score": 0.18324815505876868,
4169
  "metric_key": "mae",
4170
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
4171
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
4172
+ "reason": null
4173
  },
4174
  {
4175
  "task_number": 18,
4176
  "task_id": "imu_to_hand_pose",
4177
  "task_label": "IMU-to-Hand Pose Reconstruction",
4178
  "series_id": "metadata128_neural_mlp",
4179
+ "method": "128ep Aligned NN",
4180
+ "status": "scored",
4181
+ "status_label": "scored",
4182
+ "scored": true,
4183
  "proxy_scored": false,
4184
+ "raw": 0.2555866539478302,
4185
+ "raw_text": "0.2556",
4186
+ "normalized_score": 0.16452114110609004,
4187
  "metric_key": "mae",
4188
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/imu_to_hand_pose/metrics.json",
4189
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
4190
+ "reason": null
4191
  },
4192
  {
4193
  "task_number": 18,
 
4284
  "task_id": "camera_view_sync_retrieval",
4285
  "task_label": "Camera-View Synchronization Retrieval",
4286
  "series_id": "metadata128_simple",
4287
+ "method": "128ep Aligned Simple",
4288
  "status": "unsupported_without_required_target",
4289
  "status_label": "unsupported",
4290
  "scored": false,
 
4294
  "normalized_score": null,
4295
  "metric_key": "mrr",
4296
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
4297
+ "scope": "multi_episode_128_aligned_baseline",
4298
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
4299
  },
4300
  {
 
4302
  "task_id": "camera_view_sync_retrieval",
4303
  "task_label": "Camera-View Synchronization Retrieval",
4304
  "series_id": "metadata128_neural_mlp",
4305
+ "method": "128ep Aligned NN",
4306
  "status": "not_supported_by_metadata_only_package",
4307
  "status_label": "not supported",
4308
  "scored": false,
 
4312
  "normalized_score": null,
4313
  "metric_key": "mrr",
4314
  "source": null,
4315
+ "scope": "multi_episode_128_aligned_baseline",
4316
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required"
4317
  },
4318
  {
4319
  "task_number": 19,
 
4410
  "task_id": "time_to_transition",
4411
  "task_label": "Time-to-Next-Transition Regression",
4412
  "series_id": "metadata128_simple",
4413
+ "method": "128ep Aligned Simple",
4414
  "status": "scored",
4415
  "status_label": "scored",
4416
  "scored": true,
 
4420
  "normalized_score": 0.016864874132806403,
4421
  "metric_key": "mae",
4422
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
4423
+ "scope": "multi_episode_128_aligned_baseline",
4424
  "reason": null
4425
  },
4426
  {
 
4428
  "task_id": "time_to_transition",
4429
  "task_label": "Time-to-Next-Transition Regression",
4430
  "series_id": "metadata128_neural_mlp",
4431
+ "method": "128ep Aligned NN",
4432
  "status": "scored",
4433
  "status_label": "scored",
4434
  "scored": true,
 
4438
  "normalized_score": 0.25411768748242325,
4439
  "metric_key": "mae",
4440
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
4441
+ "scope": "multi_episode_128_aligned_baseline",
4442
  "reason": null
4443
  },
4444
  {
data/mirror_parity.json CHANGED
The diff for this file is too large to render.
 
data/omni_model_comparison.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "title": "Ropedia Xperience-10M Current Result Versions and Model Groups",
3
- "generated_at_utc": "2026-06-13T18:14:42+00:00",
4
  "status": "pass",
5
  "version_count": 3,
6
  "model_group_count": 5,
 
1
  {
2
  "title": "Ropedia Xperience-10M Current Result Versions and Model Groups",
3
+ "generated_at_utc": "2026-06-18T12:52:47+00:00",
4
  "status": "pass",
5
  "version_count": 3,
6
  "model_group_count": 5,
data/public_surface_qa.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Public Project Surface",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T12:09:24+00:00",
5
  "scope": "Repo README, GitHub Pages HTML, Hugging Face Space card, artifact dataset card, and model card.",
6
  "checks": [
7
  {
@@ -18,7 +18,7 @@
18
  "website_integrity": {
19
  "exists": true,
20
  "status": "pass",
21
- "generated_at_utc": "2026-06-18T11:41:43+00:00"
22
  },
23
  "rendered_site_check": {
24
  "exists": true,
@@ -28,27 +28,27 @@
28
  "task_surface_integrity": {
29
  "exists": true,
30
  "status": "pass",
31
- "generated_at_utc": "2026-06-18T11:18:04+00:00"
32
  },
33
  "source_alignment": {
34
  "exists": true,
35
  "status": "pass",
36
- "generated_at_utc": "2026-06-18T11:18:04+00:00"
37
  },
38
  "scale_up_status": {
39
  "exists": true,
40
  "status": "pass",
41
- "generated_at_utc": "2026-06-18T11:18:06+00:00"
42
  },
43
  "publication_package": {
44
  "exists": true,
45
  "status": "pass",
46
- "generated_at_utc": "2026-06-18T11:42:48+00:00"
47
  },
48
  "mirror_parity": {
49
  "exists": true,
50
  "status": "pass",
51
- "generated_at_utc": "2026-06-18T11:43:59+00:00"
52
  }
53
  },
54
  "failures": {}
 
1
  {
2
  "title": "Ropedia Xperience-10M Public Project Surface",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:53:13+00:00",
5
  "scope": "Repo README, GitHub Pages HTML, Hugging Face Space card, artifact dataset card, and model card.",
6
  "checks": [
7
  {
 
18
  "website_integrity": {
19
  "exists": true,
20
  "status": "pass",
21
+ "generated_at_utc": "2026-06-18T12:09:46+00:00"
22
  },
23
  "rendered_site_check": {
24
  "exists": true,
 
28
  "task_surface_integrity": {
29
  "exists": true,
30
  "status": "pass",
31
+ "generated_at_utc": "2026-06-18T12:09:25+00:00"
32
  },
33
  "source_alignment": {
34
  "exists": true,
35
  "status": "pass",
36
+ "generated_at_utc": "2026-06-18T12:09:45+00:00"
37
  },
38
  "scale_up_status": {
39
  "exists": true,
40
  "status": "pass",
41
+ "generated_at_utc": "2026-06-18T12:09:48+00:00"
42
  },
43
  "publication_package": {
44
  "exists": true,
45
  "status": "pass",
46
+ "generated_at_utc": "2026-06-18T12:24:04+00:00"
47
  },
48
  "mirror_parity": {
49
  "exists": true,
50
  "status": "pass",
51
+ "generated_at_utc": "2026-06-18T12:24:00+00:00"
52
  }
53
  },
54
  "failures": {}
data/publication_audit.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-18T12:10:47+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
@@ -215,8 +215,8 @@
215
  "github_repo": {
216
  "root": "repo",
217
  "exists": true,
218
- "file_count": 1321,
219
- "text_file_count": 1108,
220
  "largest_file": {
221
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
222
  "bytes": 55702978
@@ -226,8 +226,8 @@
226
  "hf_space_bundle": {
227
  "root": "hf_publish/space",
228
  "exists": true,
229
- "file_count": 1103,
230
- "text_file_count": 915,
231
  "largest_file": {
232
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
233
  "bytes": 135591061
@@ -237,8 +237,8 @@
237
  "hf_artifact_bundle": {
238
  "root": "hf_publish/artifacts",
239
  "exists": true,
240
- "file_count": 2582,
241
- "text_file_count": 1121,
242
  "largest_file": {
243
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
244
  "bytes": 135591061
@@ -248,8 +248,8 @@
248
  "hf_model_bundle": {
249
  "root": "hf_publish/model",
250
  "exists": true,
251
- "file_count": 3001,
252
- "text_file_count": 1283,
253
  "largest_file": {
254
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
255
  "bytes": 135591061
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-18T13:02:10+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
 
215
  "github_repo": {
216
  "root": "repo",
217
  "exists": true,
218
+ "file_count": 1352,
219
+ "text_file_count": 1129,
220
  "largest_file": {
221
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
222
  "bytes": 55702978
 
226
  "hf_space_bundle": {
227
  "root": "hf_publish/space",
228
  "exists": true,
229
+ "file_count": 1221,
230
+ "text_file_count": 992,
231
  "largest_file": {
232
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
233
  "bytes": 135591061
 
237
  "hf_artifact_bundle": {
238
  "root": "hf_publish/artifacts",
239
  "exists": true,
240
+ "file_count": 2648,
241
+ "text_file_count": 1141,
242
  "largest_file": {
243
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
244
  "bytes": 135591061
 
248
  "hf_model_bundle": {
249
  "root": "hf_publish/model",
250
  "exists": true,
251
+ "file_count": 3112,
252
+ "text_file_count": 1309,
253
  "largest_file": {
254
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
255
  "bytes": 135591061
data/quality_gates.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Release Checks",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T12:09:24+00:00",
5
  "rule": "A release is current when the automated reports pass and the live GitHub/Hugging Face mirrors are verified after publishing.",
6
  "automated_gates": [
7
  {
 
1
  {
2
  "title": "Ropedia Xperience-10M Release Checks",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:53:13+00:00",
5
  "rule": "A release is current when the automated reports pass and the live GitHub/Hugging Face mirrors are verified after publishing.",
6
  "automated_gates": [
7
  {
data/qwen3_full_parameter_gates.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "title": "Qwen3-Omni Full-Parameter Feasibility Gates",
3
- "generated_at_utc": "2026-06-13T18:14:32+00:00",
4
  "status": "pass",
5
  "decision": "full_parameter_feasible_for_guarded_short_runs_not_promoted",
6
  "interpretation": "The full-parameter gates prove that Qwen3-Omni full-parameter FSDP can load, prepare, run backward/optimizer steps, and complete guarded pilots up to 256 optimizer steps on an 8-GPU remote worker. They do not prove a production full-parameter fine-tune, and they intentionally save no full checkpoints or public weights.",
 
1
  {
2
  "title": "Qwen3-Omni Full-Parameter Feasibility Gates",
3
+ "generated_at_utc": "2026-06-18T12:53:13+00:00",
4
  "status": "pass",
5
  "decision": "full_parameter_feasible_for_guarded_short_runs_not_promoted",
6
  "interpretation": "The full-parameter gates prove that Qwen3-Omni full-parameter FSDP can load, prepare, run backward/optimizer steps, and complete guarded pilots up to 256 optimizer steps on an 8-GPU remote worker. They do not prove a production full-parameter fine-tune, and they intentionally save no full checkpoints or public weights.",
data/scope_claims_audit.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-18T12:09:48+00:00",
4
  "summary": {
5
  "qwen3_omni_verified_diagnostic_pilot": true,
6
  "dataset_manifest_num_episodes": 119,
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-18T12:54:20+00:00",
4
  "summary": {
5
  "qwen3_omni_verified_diagnostic_pilot": true,
6
  "dataset_manifest_num_episodes": 119,
data/single_episode_task_model_radar.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Single-Episode 20-Task Radar",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "description": "Minimal and Neural MLP baselines on the one public sample episode, both scored on all 20 task contracts.",
6
  "task_count": 20,
7
  "method_count": 2,
@@ -13,7 +13,7 @@
13
  "raw_values": "raw metric values, metric keys, and sources are retained in this JSON; the SVG is an overview, not a replacement for the metric table",
14
  "result_record_policy": "every method has 20 task records; records without a numeric score carry explicit unsupported/not-evaluated status and reason fields",
15
  "foundation_model_overlay": "Qwen3/Cosmos points are plotted only on task-aligned axes. Scoreless records mean the public result does not evaluate that task contract.",
16
- "metadata_128_overlay": "128-episode metadata baselines have 20 records, but numeric scores only where the public JSONL contains enough task labels without raw feature blocks.",
17
  "raw_128_overlay": "128-episode raw-feature baselines use staged sensor NPZ features. Eighteen axes use direct task targets; interaction text and camera-view sync are completed with documented compact proxies because raw interaction strings and paired video-view embeddings are absent from the 128 export."
18
  },
19
  "source_unified_radar": "docs/data/unified_task_model_radar.json",
 
1
  {
2
  "title": "Single-Episode 20-Task Radar",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:52:26+00:00",
5
  "description": "Minimal and Neural MLP baselines on the one public sample episode, both scored on all 20 task contracts.",
6
  "task_count": 20,
7
  "method_count": 2,
 
13
  "raw_values": "raw metric values, metric keys, and sources are retained in this JSON; the SVG is an overview, not a replacement for the metric table",
14
  "result_record_policy": "every method has 20 task records; records without a numeric score carry explicit unsupported/not-evaluated status and reason fields",
15
  "foundation_model_overlay": "Qwen3/Cosmos points are plotted only on task-aligned axes. Scoreless records mean the public result does not evaluate that task contract.",
16
+ "metadata_128_overlay": "128-episode aligned baselines have 20 records. Numeric scores come from JSONL metadata/text tasks plus staged sensor-block targets when the processed target exists; raw interaction text and paired camera-view embeddings remain explicit gaps.",
17
  "raw_128_overlay": "128-episode raw-feature baselines use staged sensor NPZ features. Eighteen axes use direct task targets; interaction text and camera-view sync are completed with documented compact proxies because raw interaction strings and paired video-view embeddings are absent from the 128 export."
18
  },
19
  "source_unified_radar": "docs/data/unified_task_model_radar.json",
data/source_alignment_audit.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Source Alignment Note",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T12:09:45+00:00",
5
  "alignment_json": "docs/data/xperience10m_dataset_card_alignment.json",
6
  "alignment_summary": {
7
  "full_dataset_repo": "ropedia-ai/xperience-10m",
 
1
  {
2
  "title": "Ropedia Xperience-10M Source Alignment Note",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:54:18+00:00",
5
  "alignment_json": "docs/data/xperience10m_dataset_card_alignment.json",
6
  "alignment_summary": {
7
  "full_dataset_repo": "ropedia-ai/xperience-10m",
data/task_method_20_gap_audit.json CHANGED
@@ -1,10 +1,10 @@
1
  {
2
- "generated_at_utc": "2026-06-18T12:07:14+00:00",
3
  "immediate_actions": [
4
  {
5
  "artifact": "docs/data/task_method_20_gap_audit.json",
6
  "id": "gap_audit",
7
- "purpose": "Keep the 53 scoreless cells visible and reproducible."
8
  },
9
  {
10
  "artifact": "scripts/omni/score_model_output_probes.py",
@@ -45,30 +45,29 @@
45
  }
46
  },
47
  "metadata128_neural_mlp": {
48
- "kind": "partial_128_episode_metadata_baseline",
49
- "label": "128ep Metadata NN",
50
  "proxy_scored_task_count": 0,
51
  "result_record_count": 20,
52
- "scope": "128 selected episodes, JSONL metadata/text only",
53
- "scored_task_count": 7,
54
- "scoreless_task_count": 13,
55
  "status_counts": {
56
- "not_supported_by_metadata_only_package": 7,
57
- "scored": 7,
58
- "unsupported_without_required_target": 6
59
  }
60
  },
61
  "metadata128_simple": {
62
- "kind": "partial_128_episode_metadata_baseline",
63
- "label": "128ep Metadata Simple",
64
  "proxy_scored_task_count": 0,
65
  "result_record_count": 20,
66
- "scope": "128 selected episodes, JSONL metadata/text only",
67
- "scored_task_count": 13,
68
- "scoreless_task_count": 7,
69
  "status_counts": {
70
- "scored": 13,
71
- "unsupported_without_required_target": 7
72
  }
73
  },
74
  "minimal": {
@@ -138,31 +137,22 @@
138
  "missing_by_method": {
139
  "cosmos3_nano_future_window": 15,
140
  "cosmos3_super_reasoner": 13,
141
- "metadata128_neural_mlp": 13,
142
- "metadata128_simple": 7,
143
  "qwen3_omni_v6_lora": 5
144
  },
145
  "missing_by_status": {
146
  "not_evaluated_in_verified_package": 33,
147
- "not_supported_by_metadata_only_package": 7,
148
- "unsupported_without_required_target": 13
149
  },
150
  "missing_by_task": {
151
- "01 Action Recognition": [
152
- "metadata128_neural_mlp"
153
- ],
154
  "02 Procedure Step Recognition": [
155
- "cosmos3_nano_future_window",
156
- "metadata128_neural_mlp"
157
- ],
158
- "04 Next-Action Prediction": [
159
- "metadata128_neural_mlp"
160
  ],
161
  "05 Hand Trajectory Forecasting": [
162
  "cosmos3_nano_future_window",
163
  "cosmos3_super_reasoner",
164
- "metadata128_neural_mlp",
165
- "metadata128_simple",
166
  "qwen3_omni_v6_lora"
167
  ],
168
  "07 Object Relevance Prediction": [
@@ -173,15 +163,11 @@
173
  "cosmos3_super_reasoner"
174
  ],
175
  "09 Cross-Modal Retrieval": [
176
- "cosmos3_super_reasoner",
177
- "metadata128_neural_mlp",
178
- "metadata128_simple"
179
  ],
180
  "10 Cross-Modal Reconstruction": [
181
  "cosmos3_nano_future_window",
182
  "cosmos3_super_reasoner",
183
- "metadata128_neural_mlp",
184
- "metadata128_simple",
185
  "qwen3_omni_v6_lora"
186
  ],
187
  "11 Temporal Order Verification": [
@@ -190,19 +176,15 @@
190
  ],
191
  "12 Multimodal Synchronization Detection": [
192
  "cosmos3_nano_future_window",
193
- "cosmos3_super_reasoner",
194
- "metadata128_neural_mlp",
195
- "metadata128_simple"
196
  ],
197
  "13 Long-Horizon Next-Action Forecasting": [
198
  "cosmos3_nano_future_window",
199
- "cosmos3_super_reasoner",
200
- "metadata128_neural_mlp"
201
  ],
202
  "14 Long-Horizon Next-Subtask Forecasting": [
203
  "cosmos3_nano_future_window",
204
- "cosmos3_super_reasoner",
205
- "metadata128_neural_mlp"
206
  ],
207
  "15 Interaction Text Prediction": [
208
  "cosmos3_nano_future_window",
@@ -212,8 +194,7 @@
212
  "qwen3_omni_v6_lora"
213
  ],
214
  "16 Action-Object Relation Prediction": [
215
- "cosmos3_nano_future_window",
216
- "metadata128_neural_mlp"
217
  ],
218
  "17 Future Object-Set Forecasting": [
219
  "cosmos3_nano_future_window",
@@ -222,8 +203,6 @@
222
  "18 IMU-to-Hand Pose Reconstruction": [
223
  "cosmos3_nano_future_window",
224
  "cosmos3_super_reasoner",
225
- "metadata128_neural_mlp",
226
- "metadata128_simple",
227
  "qwen3_omni_v6_lora"
228
  ],
229
  "19 Camera-View Synchronization Retrieval": [
@@ -239,32 +218,6 @@
239
  ]
240
  },
241
  "missing_records": [
242
- {
243
- "method": "128ep Metadata NN",
244
- "metric_key": "macro_f1",
245
- "reason": "train class count 896 exceeds --max-neural-classes 512",
246
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
247
- "scope": "multi_episode_128_metadata_baseline",
248
- "series_id": "metadata128_neural_mlp",
249
- "status": "unsupported_without_required_target",
250
- "status_label": "unsupported",
251
- "task_id": "timeline_action",
252
- "task_label": "Action Recognition",
253
- "task_number": 1
254
- },
255
- {
256
- "method": "128ep Metadata NN",
257
- "metric_key": "macro_f1",
258
- "reason": "train class count 652 exceeds --max-neural-classes 512",
259
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
260
- "scope": "multi_episode_128_metadata_baseline",
261
- "series_id": "metadata128_neural_mlp",
262
- "status": "unsupported_without_required_target",
263
- "status_label": "unsupported",
264
- "task_id": "timeline_subtask",
265
- "task_label": "Procedure Step Recognition",
266
- "task_number": 2
267
- },
268
  {
269
  "method": "Cosmos3-Nano Future Window",
270
  "metric_key": "macro_f1",
@@ -278,45 +231,6 @@
278
  "task_label": "Procedure Step Recognition",
279
  "task_number": 2
280
  },
281
- {
282
- "method": "128ep Metadata NN",
283
- "metric_key": "macro_f1",
284
- "reason": "train class count 891 exceeds --max-neural-classes 512",
285
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
286
- "scope": "multi_episode_128_metadata_baseline",
287
- "series_id": "metadata128_neural_mlp",
288
- "status": "unsupported_without_required_target",
289
- "status_label": "unsupported",
290
- "task_id": "next_action",
291
- "task_label": "Next-Action Prediction",
292
- "task_number": 4
293
- },
294
- {
295
- "method": "128ep Metadata Simple",
296
- "metric_key": "mpjpe",
297
- "reason": "requires future hand-joint trajectories from raw sensor feature NPZ blocks, which are not in the public 128 package",
298
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
299
- "scope": "multi_episode_128_metadata_baseline",
300
- "series_id": "metadata128_simple",
301
- "status": "unsupported_without_required_target",
302
- "status_label": "unsupported",
303
- "task_id": "hand_trajectory_forecast",
304
- "task_label": "Hand Trajectory Forecasting",
305
- "task_number": 5
306
- },
307
- {
308
- "method": "128ep Metadata NN",
309
- "metric_key": "mpjpe",
310
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
311
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
312
- "scope": "multi_episode_128_metadata_baseline",
313
- "series_id": "metadata128_neural_mlp",
314
- "status": "not_supported_by_metadata_only_package",
315
- "status_label": "not supported",
316
- "task_id": "hand_trajectory_forecast",
317
- "task_label": "Hand Trajectory Forecasting",
318
- "task_number": 5
319
- },
320
  {
321
  "method": "Qwen3-Omni v6 LoRA",
322
  "metric_key": "mpjpe",
@@ -395,32 +309,6 @@
395
  "task_label": "Language Grounding",
396
  "task_number": 8
397
  },
398
- {
399
- "method": "128ep Metadata Simple",
400
- "metric_key": "mrr",
401
- "reason": "requires paired motion/IMU/camera/audio/depth feature blocks, which are not in the public 128 package",
402
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
403
- "scope": "multi_episode_128_metadata_baseline",
404
- "series_id": "metadata128_simple",
405
- "status": "unsupported_without_required_target",
406
- "status_label": "unsupported",
407
- "task_id": "cross_modal_retrieval",
408
- "task_label": "Cross-Modal Retrieval",
409
- "task_number": 9
410
- },
411
- {
412
- "method": "128ep Metadata NN",
413
- "metric_key": "mrr",
414
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
415
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
416
- "scope": "multi_episode_128_metadata_baseline",
417
- "series_id": "metadata128_neural_mlp",
418
- "status": "not_supported_by_metadata_only_package",
419
- "status_label": "not supported",
420
- "task_id": "cross_modal_retrieval",
421
- "task_label": "Cross-Modal Retrieval",
422
- "task_number": 9
423
- },
424
  {
425
  "method": "Cosmos3-Super Reasoner",
426
  "metric_key": "mrr",
@@ -434,32 +322,6 @@
434
  "task_label": "Cross-Modal Retrieval",
435
  "task_number": 9
436
  },
437
- {
438
- "method": "128ep Metadata Simple",
439
- "metric_key": "r2",
440
- "reason": "requires source and target modality feature blocks such as depth/video vectors, which are not in the public 128 package",
441
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
442
- "scope": "multi_episode_128_metadata_baseline",
443
- "series_id": "metadata128_simple",
444
- "status": "unsupported_without_required_target",
445
- "status_label": "unsupported",
446
- "task_id": "modality_reconstruction",
447
- "task_label": "Cross-Modal Reconstruction",
448
- "task_number": 10
449
- },
450
- {
451
- "method": "128ep Metadata NN",
452
- "metric_key": "r2",
453
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
454
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
455
- "scope": "multi_episode_128_metadata_baseline",
456
- "series_id": "metadata128_neural_mlp",
457
- "status": "not_supported_by_metadata_only_package",
458
- "status_label": "not supported",
459
- "task_id": "modality_reconstruction",
460
- "task_label": "Cross-Modal Reconstruction",
461
- "task_number": 10
462
- },
463
  {
464
  "method": "Qwen3-Omni v6 LoRA",
465
  "metric_key": "r2",
@@ -525,32 +387,6 @@
525
  "task_label": "Temporal Order Verification",
526
  "task_number": 11
527
  },
528
- {
529
- "method": "128ep Metadata Simple",
530
- "metric_key": "f1",
531
- "reason": "requires deliberately shifted cross-modal feature pairs, which cannot be reconstructed from the public JSONL labels alone",
532
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
533
- "scope": "multi_episode_128_metadata_baseline",
534
- "series_id": "metadata128_simple",
535
- "status": "unsupported_without_required_target",
536
- "status_label": "unsupported",
537
- "task_id": "misalignment_detection",
538
- "task_label": "Multimodal Synchronization Detection",
539
- "task_number": 12
540
- },
541
- {
542
- "method": "128ep Metadata NN",
543
- "metric_key": "f1",
544
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
545
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
546
- "scope": "multi_episode_128_metadata_baseline",
547
- "series_id": "metadata128_neural_mlp",
548
- "status": "not_supported_by_metadata_only_package",
549
- "status_label": "not supported",
550
- "task_id": "misalignment_detection",
551
- "task_label": "Multimodal Synchronization Detection",
552
- "task_number": 12
553
- },
554
  {
555
  "method": "Cosmos3-Super Reasoner",
556
  "metric_key": "f1",
@@ -577,19 +413,6 @@
577
  "task_label": "Multimodal Synchronization Detection",
578
  "task_number": 12
579
  },
580
- {
581
- "method": "128ep Metadata NN",
582
- "metric_key": "macro_f1",
583
- "reason": "train class count 887 exceeds --max-neural-classes 512",
584
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
585
- "scope": "multi_episode_128_metadata_baseline",
586
- "series_id": "metadata128_neural_mlp",
587
- "status": "unsupported_without_required_target",
588
- "status_label": "unsupported",
589
- "task_id": "long_horizon_next_action",
590
- "task_label": "Long-Horizon Next-Action Forecasting",
591
- "task_number": 13
592
- },
593
  {
594
  "method": "Cosmos3-Super Reasoner",
595
  "metric_key": "macro_f1",
@@ -616,19 +439,6 @@
616
  "task_label": "Long-Horizon Next-Action Forecasting",
617
  "task_number": 13
618
  },
619
- {
620
- "method": "128ep Metadata NN",
621
- "metric_key": "macro_f1",
622
- "reason": "train class count 651 exceeds --max-neural-classes 512",
623
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
624
- "scope": "multi_episode_128_metadata_baseline",
625
- "series_id": "metadata128_neural_mlp",
626
- "status": "unsupported_without_required_target",
627
- "status_label": "unsupported",
628
- "task_id": "next_subtask_forecast",
629
- "task_label": "Long-Horizon Next-Subtask Forecasting",
630
- "task_number": 14
631
- },
632
  {
633
  "method": "Cosmos3-Super Reasoner",
634
  "metric_key": "macro_f1",
@@ -656,11 +466,11 @@
656
  "task_number": 14
657
  },
658
  {
659
- "method": "128ep Metadata Simple",
660
  "metric_key": "macro_f1",
661
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
662
  "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
663
- "scope": "multi_episode_128_metadata_baseline",
664
  "series_id": "metadata128_simple",
665
  "status": "unsupported_without_required_target",
666
  "status_label": "unsupported",
@@ -669,11 +479,11 @@
669
  "task_number": 15
670
  },
671
  {
672
- "method": "128ep Metadata NN",
673
  "metric_key": "macro_f1",
674
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
675
  "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
676
- "scope": "multi_episode_128_metadata_baseline",
677
  "series_id": "metadata128_neural_mlp",
678
  "status": "not_supported_by_metadata_only_package",
679
  "status_label": "not supported",
@@ -720,19 +530,6 @@
720
  "task_label": "Interaction Text Prediction",
721
  "task_number": 15
722
  },
723
- {
724
- "method": "128ep Metadata NN",
725
- "metric_key": "macro_f1",
726
- "reason": "train class count 3058 exceeds --max-neural-classes 512",
727
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
728
- "scope": "multi_episode_128_metadata_baseline",
729
- "series_id": "metadata128_neural_mlp",
730
- "status": "unsupported_without_required_target",
731
- "status_label": "unsupported",
732
- "task_id": "action_object_relation",
733
- "task_label": "Action-Object Relation Prediction",
734
- "task_number": 16
735
- },
736
  {
737
  "method": "Cosmos3-Nano Future Window",
738
  "metric_key": "macro_f1",
@@ -772,32 +569,6 @@
772
  "task_label": "Future Object-Set Forecasting",
773
  "task_number": 17
774
  },
775
- {
776
- "method": "128ep Metadata Simple",
777
- "metric_key": "mae",
778
- "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package",
779
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
780
- "scope": "multi_episode_128_metadata_baseline",
781
- "series_id": "metadata128_simple",
782
- "status": "unsupported_without_required_target",
783
- "status_label": "unsupported",
784
- "task_id": "imu_to_hand_pose",
785
- "task_label": "IMU-to-Hand Pose Reconstruction",
786
- "task_number": 18
787
- },
788
- {
789
- "method": "128ep Metadata NN",
790
- "metric_key": "mae",
791
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
792
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
793
- "scope": "multi_episode_128_metadata_baseline",
794
- "series_id": "metadata128_neural_mlp",
795
- "status": "not_supported_by_metadata_only_package",
796
- "status_label": "not supported",
797
- "task_id": "imu_to_hand_pose",
798
- "task_label": "IMU-to-Hand Pose Reconstruction",
799
- "task_number": 18
800
- },
801
  {
802
  "method": "Qwen3-Omni v6 LoRA",
803
  "metric_key": "mae",
@@ -838,11 +609,11 @@
838
  "task_number": 18
839
  },
840
  {
841
- "method": "128ep Metadata Simple",
842
  "metric_key": "mrr",
843
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
844
  "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
845
- "scope": "multi_episode_128_metadata_baseline",
846
  "series_id": "metadata128_simple",
847
  "status": "unsupported_without_required_target",
848
  "status_label": "unsupported",
@@ -851,11 +622,11 @@
851
  "task_number": 19
852
  },
853
  {
854
- "method": "128ep Metadata NN",
855
  "metric_key": "mrr",
856
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
857
  "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
858
- "scope": "multi_episode_128_metadata_baseline",
859
  "series_id": "metadata128_neural_mlp",
860
  "status": "not_supported_by_metadata_only_package",
861
  "status_label": "not supported",
@@ -975,8 +746,8 @@
975
  "method_count": 9,
976
  "method_task_record_count": 180,
977
  "proxy_scored_method_task_count": 4,
978
- "scored_method_task_count": 127,
979
- "scoreless_method_task_count": 53,
980
  "task_count": 20
981
  },
982
  "source_matrix": "docs/data/task_method_20_result_matrix.json",
 
1
  {
2
+ "generated_at_utc": "2026-06-18T12:52:47+00:00",
3
  "immediate_actions": [
4
  {
5
  "artifact": "docs/data/task_method_20_gap_audit.json",
6
  "id": "gap_audit",
7
+ "purpose": "Keep the 37 scoreless cells visible and reproducible."
8
  },
9
  {
10
  "artifact": "scripts/omni/score_model_output_probes.py",
 
45
  }
46
  },
47
  "metadata128_neural_mlp": {
48
+ "kind": "partial_128_episode_aligned_baseline",
49
+ "label": "128ep Aligned NN",
50
  "proxy_scored_task_count": 0,
51
  "result_record_count": 20,
52
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
53
+ "scored_task_count": 18,
54
+ "scoreless_task_count": 2,
55
  "status_counts": {
56
+ "not_supported_by_metadata_only_package": 2,
57
+ "scored": 18
 
58
  }
59
  },
60
  "metadata128_simple": {
61
+ "kind": "partial_128_episode_aligned_baseline",
62
+ "label": "128ep Aligned Simple",
63
  "proxy_scored_task_count": 0,
64
  "result_record_count": 20,
65
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
66
+ "scored_task_count": 18,
67
+ "scoreless_task_count": 2,
68
  "status_counts": {
69
+ "scored": 18,
70
+ "unsupported_without_required_target": 2
71
  }
72
  },
73
  "minimal": {
 
137
  "missing_by_method": {
138
  "cosmos3_nano_future_window": 15,
139
  "cosmos3_super_reasoner": 13,
140
+ "metadata128_neural_mlp": 2,
141
+ "metadata128_simple": 2,
142
  "qwen3_omni_v6_lora": 5
143
  },
144
  "missing_by_status": {
145
  "not_evaluated_in_verified_package": 33,
146
+ "not_supported_by_metadata_only_package": 2,
147
+ "unsupported_without_required_target": 2
148
  },
149
  "missing_by_task": {
150
  "02 Procedure Step Recognition": [
151
+ "cosmos3_nano_future_window"
152
  ],
153
  "05 Hand Trajectory Forecasting": [
154
  "cosmos3_nano_future_window",
155
  "cosmos3_super_reasoner",
156
  "qwen3_omni_v6_lora"
157
  ],
158
  "07 Object Relevance Prediction": [
 
163
  "cosmos3_super_reasoner"
164
  ],
165
  "09 Cross-Modal Retrieval": [
166
+ "cosmos3_super_reasoner"
167
  ],
168
  "10 Cross-Modal Reconstruction": [
169
  "cosmos3_nano_future_window",
170
  "cosmos3_super_reasoner",
171
  "qwen3_omni_v6_lora"
172
  ],
173
  "11 Temporal Order Verification": [
 
176
  ],
177
  "12 Multimodal Synchronization Detection": [
178
  "cosmos3_nano_future_window",
179
+ "cosmos3_super_reasoner"
180
  ],
181
  "13 Long-Horizon Next-Action Forecasting": [
182
  "cosmos3_nano_future_window",
183
+ "cosmos3_super_reasoner"
 
184
  ],
185
  "14 Long-Horizon Next-Subtask Forecasting": [
186
  "cosmos3_nano_future_window",
187
+ "cosmos3_super_reasoner"
 
188
  ],
189
  "15 Interaction Text Prediction": [
190
  "cosmos3_nano_future_window",
 
194
  "qwen3_omni_v6_lora"
195
  ],
196
  "16 Action-Object Relation Prediction": [
197
+ "cosmos3_nano_future_window"
 
198
  ],
199
  "17 Future Object-Set Forecasting": [
200
  "cosmos3_nano_future_window",
 
203
  "18 IMU-to-Hand Pose Reconstruction": [
204
  "cosmos3_nano_future_window",
205
  "cosmos3_super_reasoner",
206
  "qwen3_omni_v6_lora"
207
  ],
208
  "19 Camera-View Synchronization Retrieval": [
 
218
  ]
219
  },
220
  "missing_records": [
221
  {
222
  "method": "Cosmos3-Nano Future Window",
223
  "metric_key": "macro_f1",
 
231
  "task_label": "Procedure Step Recognition",
232
  "task_number": 2
233
  },
234
  {
235
  "method": "Qwen3-Omni v6 LoRA",
236
  "metric_key": "mpjpe",
 
309
  "task_label": "Language Grounding",
310
  "task_number": 8
311
  },
312
  {
313
  "method": "Cosmos3-Super Reasoner",
314
  "metric_key": "mrr",
 
322
  "task_label": "Cross-Modal Retrieval",
323
  "task_number": 9
324
  },
325
  {
326
  "method": "Qwen3-Omni v6 LoRA",
327
  "metric_key": "r2",
 
387
  "task_label": "Temporal Order Verification",
388
  "task_number": 11
389
  },
390
  {
391
  "method": "Cosmos3-Super Reasoner",
392
  "metric_key": "f1",
 
413
  "task_label": "Multimodal Synchronization Detection",
414
  "task_number": 12
415
  },
416
  {
417
  "method": "Cosmos3-Super Reasoner",
418
  "metric_key": "macro_f1",
 
439
  "task_label": "Long-Horizon Next-Action Forecasting",
440
  "task_number": 13
441
  },
442
  {
443
  "method": "Cosmos3-Super Reasoner",
444
  "metric_key": "macro_f1",
 
466
  "task_number": 14
467
  },
468
  {
469
+ "method": "128ep Aligned Simple",
470
  "metric_key": "macro_f1",
471
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
472
  "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
473
+ "scope": "multi_episode_128_aligned_baseline",
474
  "series_id": "metadata128_simple",
475
  "status": "unsupported_without_required_target",
476
  "status_label": "unsupported",
 
479
  "task_number": 15
480
  },
481
  {
482
+ "method": "128ep Aligned NN",
483
  "metric_key": "macro_f1",
484
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required",
485
  "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
486
+ "scope": "multi_episode_128_aligned_baseline",
487
  "series_id": "metadata128_neural_mlp",
488
  "status": "not_supported_by_metadata_only_package",
489
  "status_label": "not supported",
 
530
  "task_label": "Interaction Text Prediction",
531
  "task_number": 15
532
  },
533
  {
534
  "method": "Cosmos3-Nano Future Window",
535
  "metric_key": "macro_f1",
 
569
  "task_label": "Future Object-Set Forecasting",
570
  "task_number": 17
571
  },
572
  {
573
  "method": "Qwen3-Omni v6 LoRA",
574
  "metric_key": "mae",
 
609
  "task_number": 18
610
  },
611
  {
612
+ "method": "128ep Aligned Simple",
613
  "metric_key": "mrr",
614
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
615
  "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
616
+ "scope": "multi_episode_128_aligned_baseline",
617
  "series_id": "metadata128_simple",
618
  "status": "unsupported_without_required_target",
619
  "status_label": "unsupported",
 
622
  "task_number": 19
623
  },
624
  {
625
+ "method": "128ep Aligned NN",
626
  "metric_key": "mrr",
627
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required",
628
  "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
629
+ "scope": "multi_episode_128_aligned_baseline",
630
  "series_id": "metadata128_neural_mlp",
631
  "status": "not_supported_by_metadata_only_package",
632
  "status_label": "not supported",
 
746
  "method_count": 9,
747
  "method_task_record_count": 180,
748
  "proxy_scored_method_task_count": 4,
749
+ "scored_method_task_count": 143,
750
+ "scoreless_method_task_count": 37,
751
  "task_count": 20
752
  },
753
  "source_matrix": "docs/data/task_method_20_result_matrix.json",
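
Review note: the updated totals in the gap-audit hunks above can be cross-checked with simple arithmetic. The sketch below is a minimal consistency check, not part of the repository: the helper name `audit_count_problems` is hypothetical, and the dictionaries are copied by hand from the `+` lines and unchanged context lines of this diff rather than loaded from `data/task_method_20_gap_audit.json`.

```python
# Hypothetical helper: cross-check the gap-audit totals quoted in the diff above.
# All numbers are copied by hand from the new side of the diff.

def audit_count_problems(summary, missing_by_method, missing_by_status):
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    records = summary["method_task_record_count"]
    scored = summary["scored_method_task_count"]
    scoreless = summary["scoreless_method_task_count"]

    # Every (method, task) pair should have exactly one record.
    if summary["method_count"] * summary["task_count"] != records:
        problems.append("method_count * task_count != record count")
    # Scored and scoreless cells should partition the matrix.
    if scored + scoreless != records:
        problems.append("scored + scoreless != record count")
    # Both breakdowns of the scoreless cells should sum to the same total.
    if sum(missing_by_method.values()) != scoreless:
        problems.append("missing_by_method does not sum to scoreless count")
    if sum(missing_by_status.values()) != scoreless:
        problems.append("missing_by_status does not sum to scoreless count")
    return problems


summary = {
    "method_count": 9,
    "task_count": 20,
    "method_task_record_count": 180,
    "scored_method_task_count": 143,
    "scoreless_method_task_count": 37,
}
missing_by_method = {
    "cosmos3_nano_future_window": 15,
    "cosmos3_super_reasoner": 13,
    "metadata128_neural_mlp": 2,
    "metadata128_simple": 2,
    "qwen3_omni_v6_lora": 5,
}
missing_by_status = {
    "not_evaluated_in_verified_package": 33,
    "not_supported_by_metadata_only_package": 2,
    "unsupported_without_required_target": 2,
}

problems = audit_count_problems(summary, missing_by_method, missing_by_status)
print(problems or "counts are consistent")
```

The same function could be pointed at the parsed JSON file instead of the hand-copied dictionaries, assuming the file exposes these keys at the paths shown in the diff.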
data/task_method_20_result_matrix.json CHANGED
@@ -1,11 +1,11 @@
1
  {
2
  "title": "Task Method 20-Result Matrix",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
- "scored_method_task_count": 133,
9
  "series": [
10
  {
11
  "id": "minimal",
@@ -55,50 +55,50 @@
55
  },
56
  {
57
  "id": "metadata128_simple",
58
- "label": "128ep Metadata Simple",
59
  "short_label": "128-S",
60
  "color": "#ffd166",
61
- "kind": "partial_128_episode_metadata_baseline",
62
- "scope": "128 selected episodes, JSONL metadata/text only",
63
  "stroke_dasharray": "9 6",
64
- "method_detail": "128-episode JSONL metadata/text simple baselines.",
65
  "plotted_as": "colored point overlay",
66
  "result_record_count": 20,
67
- "scored_task_count": 13,
68
- "covered_task_count": 13,
69
  "proxy_scored_task_count": 0,
70
- "scoreless_task_count": 7,
71
- "unsupported_task_count": 7,
72
  "not_evaluated_task_count": 0,
73
  "status_counts": {
74
- "scored": 13,
75
- "unsupported_without_required_target": 7
76
  },
77
- "coverage_fraction": 0.65,
78
  "result_record_fraction": 1.0
79
  },
80
  {
81
  "id": "metadata128_neural_mlp",
82
- "label": "128ep Metadata NN",
83
  "short_label": "128-NN",
84
  "color": "#f472b6",
85
- "kind": "partial_128_episode_metadata_baseline",
86
- "scope": "128 selected episodes, JSONL metadata/text only",
87
  "stroke_dasharray": "3 6",
88
- "method_detail": "128-episode JSONL metadata/text MLP baselines.",
89
  "plotted_as": "colored point overlay",
90
  "result_record_count": 20,
91
- "scored_task_count": 13,
92
- "covered_task_count": 13,
93
  "proxy_scored_task_count": 0,
94
- "scoreless_task_count": 7,
95
- "unsupported_task_count": 7,
96
  "not_evaluated_task_count": 0,
97
  "status_counts": {
98
- "not_supported_by_metadata_only_package": 7,
99
- "scored": 13
100
  },
101
- "coverage_fraction": 0.65,
102
  "result_record_fraction": 1.0
103
  },
104
  {
@@ -264,7 +264,7 @@
264
  "task_id": "timeline_action",
265
  "task_label": "Action Recognition",
266
  "series_id": "metadata128_simple",
267
- "method": "128ep Metadata Simple",
268
  "status": "scored",
269
  "status_label": "scored",
270
  "scored": true,
@@ -274,7 +274,7 @@
274
  "normalized_score": 0.008252821966746326,
275
  "metric_key": "macro_f1",
276
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
277
- "scope": "multi_episode_128_metadata_baseline",
278
  "reason": null
279
  },
280
  {
@@ -282,7 +282,7 @@
282
  "task_id": "timeline_action",
283
  "task_label": "Action Recognition",
284
  "series_id": "metadata128_neural_mlp",
285
- "method": "128ep Metadata NN",
286
  "status": "scored",
287
  "status_label": "scored",
288
  "scored": true,
@@ -292,7 +292,7 @@
292
  "normalized_score": 0.004175793689174209,
293
  "metric_key": "macro_f1",
294
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
295
- "scope": "multi_episode_128_metadata_baseline",
296
  "reason": null
297
  },
298
  {
@@ -426,7 +426,7 @@
426
  "task_id": "timeline_subtask",
427
  "task_label": "Procedure Step Recognition",
428
  "series_id": "metadata128_simple",
429
- "method": "128ep Metadata Simple",
430
  "status": "scored",
431
  "status_label": "scored",
432
  "scored": true,
@@ -436,7 +436,7 @@
436
  "normalized_score": 0.00019512195121951218,
437
  "metric_key": "macro_f1",
438
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
439
- "scope": "multi_episode_128_metadata_baseline",
440
  "reason": null
441
  },
442
  {
@@ -444,7 +444,7 @@
444
  "task_id": "timeline_subtask",
445
  "task_label": "Procedure Step Recognition",
446
  "series_id": "metadata128_neural_mlp",
447
- "method": "128ep Metadata NN",
448
  "status": "scored",
449
  "status_label": "scored",
450
  "scored": true,
@@ -454,7 +454,7 @@
454
  "normalized_score": 7.207207207207208e-05,
455
  "metric_key": "macro_f1",
456
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
457
- "scope": "multi_episode_128_metadata_baseline",
458
  "reason": null
459
  },
460
  {
@@ -588,7 +588,7 @@
588
  "task_id": "transition_detection",
589
  "task_label": "Action Boundary Detection",
590
  "series_id": "metadata128_simple",
591
- "method": "128ep Metadata Simple",
592
  "status": "scored",
593
  "status_label": "scored",
594
  "scored": true,
@@ -598,7 +598,7 @@
598
  "normalized_score": 0.29652162550029315,
599
  "metric_key": "macro_f1",
600
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
601
- "scope": "multi_episode_128_metadata_baseline",
602
  "reason": null
603
  },
604
  {
@@ -606,7 +606,7 @@
606
  "task_id": "transition_detection",
607
  "task_label": "Action Boundary Detection",
608
  "series_id": "metadata128_neural_mlp",
609
- "method": "128ep Metadata NN",
610
  "status": "scored",
611
  "status_label": "scored",
612
  "scored": true,
@@ -616,7 +616,7 @@
616
  "normalized_score": 0.4841733292368365,
617
  "metric_key": "macro_f1",
618
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
619
- "scope": "multi_episode_128_metadata_baseline",
620
  "reason": null
621
  },
622
  {
@@ -750,7 +750,7 @@
750
  "task_id": "next_action",
751
  "task_label": "Next-Action Prediction",
752
  "series_id": "metadata128_simple",
753
- "method": "128ep Metadata Simple",
754
  "status": "scored",
755
  "status_label": "scored",
756
  "scored": true,
@@ -760,7 +760,7 @@
760
  "normalized_score": 0.006514774539765508,
761
  "metric_key": "macro_f1",
762
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
763
- "scope": "multi_episode_128_metadata_baseline",
764
  "reason": null
765
  },
766
  {
@@ -768,7 +768,7 @@
768
  "task_id": "next_action",
769
  "task_label": "Next-Action Prediction",
770
  "series_id": "metadata128_neural_mlp",
771
- "method": "128ep Metadata NN",
772
  "status": "scored",
773
  "status_label": "scored",
774
  "scored": true,
@@ -778,7 +778,7 @@
778
  "normalized_score": 0.004910507980164745,
779
  "metric_key": "macro_f1",
780
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
781
- "scope": "multi_episode_128_metadata_baseline",
782
  "reason": null
783
  },
784
  {
@@ -912,36 +912,36 @@
912
  "task_id": "hand_trajectory_forecast",
913
  "task_label": "Hand Trajectory Forecasting",
914
  "series_id": "metadata128_simple",
915
- "method": "128ep Metadata Simple",
916
- "status": "unsupported_without_required_target",
917
- "status_label": "unsupported",
918
- "scored": false,
919
  "proxy_scored": false,
920
- "raw": null,
921
- "raw_text": "n/a",
922
- "normalized_score": null,
923
  "metric_key": "mpjpe",
924
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
925
- "scope": "multi_episode_128_metadata_baseline",
926
- "reason": "requires future hand-joint trajectories from raw sensor feature NPZ blocks, which are not in the public 128 package"
927
  },
928
  {
929
  "task_number": 5,
930
  "task_id": "hand_trajectory_forecast",
931
  "task_label": "Hand Trajectory Forecasting",
932
  "series_id": "metadata128_neural_mlp",
933
- "method": "128ep Metadata NN",
934
- "status": "not_supported_by_metadata_only_package",
935
- "status_label": "not supported",
936
- "scored": false,
937
  "proxy_scored": false,
938
- "raw": null,
939
- "raw_text": "n/a",
940
- "normalized_score": null,
941
  "metric_key": "mpjpe",
942
- "source": null,
943
- "scope": "multi_episode_128_metadata_baseline",
944
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
945
  },
946
  {
947
  "task_number": 5,
@@ -1074,7 +1074,7 @@
1074
  "task_id": "contact_prediction",
1075
  "task_label": "Contact State Prediction",
1076
  "series_id": "metadata128_simple",
1077
- "method": "128ep Metadata Simple",
1078
  "status": "scored",
1079
  "status_label": "scored",
1080
  "scored": true,
@@ -1084,7 +1084,7 @@
1084
  "normalized_score": 0.4381481308057444,
1085
  "metric_key": "macro_f1",
1086
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
1087
- "scope": "multi_episode_128_metadata_baseline",
1088
  "reason": null
1089
  },
1090
  {
@@ -1092,7 +1092,7 @@
1092
  "task_id": "contact_prediction",
1093
  "task_label": "Contact State Prediction",
1094
  "series_id": "metadata128_neural_mlp",
1095
- "method": "128ep Metadata NN",
1096
  "status": "scored",
1097
  "status_label": "scored",
1098
  "scored": true,
@@ -1102,7 +1102,7 @@
1102
  "normalized_score": 0.5682695682695682,
1103
  "metric_key": "macro_f1",
1104
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
1105
- "scope": "multi_episode_128_metadata_baseline",
1106
  "reason": null
1107
  },
1108
  {
@@ -1236,7 +1236,7 @@
1236
  "task_id": "object_relevance",
1237
  "task_label": "Object Relevance Prediction",
1238
  "series_id": "metadata128_simple",
1239
- "method": "128ep Metadata Simple",
1240
  "status": "scored",
1241
  "status_label": "scored",
1242
  "scored": true,
@@ -1246,7 +1246,7 @@
1246
  "normalized_score": 0.17764578833693304,
1247
  "metric_key": "micro_f1",
1248
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
1249
- "scope": "multi_episode_128_metadata_baseline",
1250
  "reason": null
1251
  },
1252
  {
@@ -1254,7 +1254,7 @@
1254
  "task_id": "object_relevance",
1255
  "task_label": "Object Relevance Prediction",
1256
  "series_id": "metadata128_neural_mlp",
1257
- "method": "128ep Metadata NN",
1258
  "status": "scored",
1259
  "status_label": "scored",
1260
  "scored": true,
@@ -1264,7 +1264,7 @@
1264
  "normalized_score": 0.18662723837686876,
1265
  "metric_key": "micro_f1",
1266
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
1267
- "scope": "multi_episode_128_metadata_baseline",
1268
  "reason": null
1269
  },
1270
  {
@@ -1398,7 +1398,7 @@
1398
  "task_id": "caption_grounding",
1399
  "task_label": "Language Grounding",
1400
  "series_id": "metadata128_simple",
1401
- "method": "128ep Metadata Simple",
1402
  "status": "scored",
1403
  "status_label": "scored",
1404
  "scored": true,
@@ -1408,7 +1408,7 @@
1408
  "normalized_score": 0.002332374220713973,
1409
  "metric_key": "mrr",
1410
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
1411
- "scope": "multi_episode_128_metadata_baseline",
1412
  "reason": null
1413
  },
1414
  {
@@ -1416,7 +1416,7 @@
1416
  "task_id": "caption_grounding",
1417
  "task_label": "Language Grounding",
1418
  "series_id": "metadata128_neural_mlp",
1419
- "method": "128ep Metadata NN",
1420
  "status": "scored",
1421
  "status_label": "scored",
1422
  "scored": true,
@@ -1426,7 +1426,7 @@
1426
  "normalized_score": 0.008236799389123917,
1427
  "metric_key": "mrr",
1428
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
1429
- "scope": "multi_episode_128_metadata_baseline",
1430
  "reason": null
1431
  },
1432
  {
@@ -1560,36 +1560,36 @@
1560
  "task_id": "cross_modal_retrieval",
1561
  "task_label": "Cross-Modal Retrieval",
1562
  "series_id": "metadata128_simple",
1563
- "method": "128ep Metadata Simple",
1564
- "status": "unsupported_without_required_target",
1565
- "status_label": "unsupported",
1566
- "scored": false,
1567
  "proxy_scored": false,
1568
- "raw": null,
1569
- "raw_text": "n/a",
1570
- "normalized_score": null,
1571
  "metric_key": "mrr",
1572
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
1573
- "scope": "multi_episode_128_metadata_baseline",
1574
- "reason": "requires paired motion/IMU/camera/audio/depth feature blocks, which are not in the public 128 package"
1575
  },
1576
  {
1577
  "task_number": 9,
1578
  "task_id": "cross_modal_retrieval",
1579
  "task_label": "Cross-Modal Retrieval",
1580
  "series_id": "metadata128_neural_mlp",
1581
- "method": "128ep Metadata NN",
1582
- "status": "not_supported_by_metadata_only_package",
1583
- "status_label": "not supported",
1584
- "scored": false,
1585
  "proxy_scored": false,
1586
- "raw": null,
1587
- "raw_text": "n/a",
1588
- "normalized_score": null,
1589
  "metric_key": "mrr",
1590
- "source": null,
1591
- "scope": "multi_episode_128_metadata_baseline",
1592
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
1593
  },
1594
  {
1595
  "task_number": 9,
@@ -1722,36 +1722,36 @@
1722
  "task_id": "modality_reconstruction",
1723
  "task_label": "Cross-Modal Reconstruction",
1724
  "series_id": "metadata128_simple",
1725
- "method": "128ep Metadata Simple",
1726
- "status": "unsupported_without_required_target",
1727
- "status_label": "unsupported",
1728
- "scored": false,
1729
  "proxy_scored": false,
1730
- "raw": null,
1731
- "raw_text": "n/a",
1732
- "normalized_score": null,
1733
  "metric_key": "r2",
1734
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
1735
- "scope": "multi_episode_128_metadata_baseline",
1736
- "reason": "requires source and target modality feature blocks such as depth/video vectors, which are not in the public 128 package"
1737
  },
1738
  {
1739
  "task_number": 10,
1740
  "task_id": "modality_reconstruction",
1741
  "task_label": "Cross-Modal Reconstruction",
1742
  "series_id": "metadata128_neural_mlp",
1743
- "method": "128ep Metadata NN",
1744
- "status": "not_supported_by_metadata_only_package",
1745
- "status_label": "not supported",
1746
- "scored": false,
1747
  "proxy_scored": false,
1748
- "raw": null,
1749
- "raw_text": "n/a",
1750
- "normalized_score": null,
1751
  "metric_key": "r2",
1752
- "source": null,
1753
- "scope": "multi_episode_128_metadata_baseline",
1754
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
1755
  },
1756
  {
1757
  "task_number": 10,
@@ -1884,7 +1884,7 @@
1884
  "task_id": "temporal_order",
1885
  "task_label": "Temporal Order Verification",
1886
  "series_id": "metadata128_simple",
1887
- "method": "128ep Metadata Simple",
1888
  "status": "scored",
1889
  "status_label": "scored",
1890
  "scored": true,
@@ -1894,7 +1894,7 @@
1894
  "normalized_score": 0.4198864140782312,
1895
  "metric_key": "f1",
1896
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
1897
- "scope": "multi_episode_128_metadata_baseline",
1898
  "reason": null
1899
  },
1900
  {
@@ -1902,7 +1902,7 @@
1902
  "task_id": "temporal_order",
1903
  "task_label": "Temporal Order Verification",
1904
  "series_id": "metadata128_neural_mlp",
1905
- "method": "128ep Metadata NN",
1906
  "status": "scored",
1907
  "status_label": "scored",
1908
  "scored": true,
@@ -1912,7 +1912,7 @@
1912
  "normalized_score": 0.8252408266656923,
1913
  "metric_key": "f1",
1914
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
1915
- "scope": "multi_episode_128_metadata_baseline",
1916
  "reason": null
1917
  },
1918
  {
@@ -2046,36 +2046,36 @@
2046
  "task_id": "misalignment_detection",
2047
  "task_label": "Multimodal Synchronization Detection",
2048
  "series_id": "metadata128_simple",
2049
- "method": "128ep Metadata Simple",
2050
- "status": "unsupported_without_required_target",
2051
- "status_label": "unsupported",
2052
- "scored": false,
2053
  "proxy_scored": false,
2054
- "raw": null,
2055
- "raw_text": "n/a",
2056
- "normalized_score": null,
2057
  "metric_key": "f1",
2058
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
2059
- "scope": "multi_episode_128_metadata_baseline",
2060
- "reason": "requires deliberately shifted cross-modal feature pairs, which cannot be reconstructed from the public JSONL labels alone"
2061
  },
2062
  {
2063
  "task_number": 12,
2064
  "task_id": "misalignment_detection",
2065
  "task_label": "Multimodal Synchronization Detection",
2066
  "series_id": "metadata128_neural_mlp",
2067
- "method": "128ep Metadata NN",
2068
- "status": "not_supported_by_metadata_only_package",
2069
- "status_label": "not supported",
2070
- "scored": false,
2071
  "proxy_scored": false,
2072
- "raw": null,
2073
- "raw_text": "n/a",
2074
- "normalized_score": null,
2075
  "metric_key": "f1",
2076
- "source": null,
2077
- "scope": "multi_episode_128_metadata_baseline",
2078
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2079
  },
2080
  {
2081
  "task_number": 12,
@@ -2208,7 +2208,7 @@
2208
  "task_id": "long_horizon_next_action",
2209
  "task_label": "Long-Horizon Next-Action Forecasting",
2210
  "series_id": "metadata128_simple",
2211
- "method": "128ep Metadata Simple",
2212
  "status": "scored",
2213
  "status_label": "scored",
2214
  "scored": true,
@@ -2218,7 +2218,7 @@
2218
  "normalized_score": 0.004579592783699693,
2219
  "metric_key": "macro_f1",
2220
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
2221
- "scope": "multi_episode_128_metadata_baseline",
2222
  "reason": null
2223
  },
2224
  {
@@ -2226,7 +2226,7 @@
2226
  "task_id": "long_horizon_next_action",
2227
  "task_label": "Long-Horizon Next-Action Forecasting",
2228
  "series_id": "metadata128_neural_mlp",
2229
- "method": "128ep Metadata NN",
2230
  "status": "scored",
2231
  "status_label": "scored",
2232
  "scored": true,
@@ -2236,7 +2236,7 @@
2236
  "normalized_score": 0.0029821307969142615,
2237
  "metric_key": "macro_f1",
2238
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
2239
- "scope": "multi_episode_128_metadata_baseline",
2240
  "reason": null
2241
  },
2242
  {
@@ -2370,7 +2370,7 @@
2370
  "task_id": "next_subtask_forecast",
2371
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2372
  "series_id": "metadata128_simple",
2373
- "method": "128ep Metadata Simple",
2374
  "status": "scored",
2375
  "status_label": "scored",
2376
  "scored": true,
@@ -2380,7 +2380,7 @@
2380
  "normalized_score": 0.0001206030150753769,
2381
  "metric_key": "macro_f1",
2382
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
2383
- "scope": "multi_episode_128_metadata_baseline",
2384
  "reason": null
2385
  },
2386
  {
@@ -2388,7 +2388,7 @@
2388
  "task_id": "next_subtask_forecast",
2389
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2390
  "series_id": "metadata128_neural_mlp",
2391
- "method": "128ep Metadata NN",
2392
  "status": "scored",
2393
  "status_label": "scored",
2394
  "scored": true,
@@ -2398,7 +2398,7 @@
2398
  "normalized_score": 2.086049543676662e-05,
2399
  "metric_key": "macro_f1",
2400
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
2401
- "scope": "multi_episode_128_metadata_baseline",
2402
  "reason": null
2403
  },
2404
  {
@@ -2532,7 +2532,7 @@
2532
  "task_id": "interaction_text_prediction",
2533
  "task_label": "Interaction Text Prediction",
2534
  "series_id": "metadata128_simple",
2535
- "method": "128ep Metadata Simple",
2536
  "status": "unsupported_without_required_target",
2537
  "status_label": "unsupported",
2538
  "scored": false,
@@ -2542,7 +2542,7 @@
2542
  "normalized_score": null,
2543
  "metric_key": "macro_f1",
2544
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
2545
- "scope": "multi_episode_128_metadata_baseline",
2546
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
2547
  },
2548
  {
@@ -2550,7 +2550,7 @@
2550
  "task_id": "interaction_text_prediction",
2551
  "task_label": "Interaction Text Prediction",
2552
  "series_id": "metadata128_neural_mlp",
2553
- "method": "128ep Metadata NN",
2554
  "status": "not_supported_by_metadata_only_package",
2555
  "status_label": "not supported",
2556
  "scored": false,
@@ -2560,8 +2560,8 @@
2560
  "normalized_score": null,
2561
  "metric_key": "macro_f1",
2562
  "source": null,
2563
- "scope": "multi_episode_128_metadata_baseline",
2564
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2565
  },
2566
  {
2567
  "task_number": 15,
@@ -2694,7 +2694,7 @@
2694
  "task_id": "action_object_relation",
2695
  "task_label": "Action-Object Relation Prediction",
2696
  "series_id": "metadata128_simple",
2697
- "method": "128ep Metadata Simple",
2698
  "status": "scored",
2699
  "status_label": "scored",
2700
  "scored": true,
@@ -2704,7 +2704,7 @@
2704
  "normalized_score": 0.0,
2705
  "metric_key": "macro_f1",
2706
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
2707
- "scope": "multi_episode_128_metadata_baseline",
2708
  "reason": null
2709
  },
2710
  {
@@ -2712,7 +2712,7 @@
2712
  "task_id": "action_object_relation",
2713
  "task_label": "Action-Object Relation Prediction",
2714
  "series_id": "metadata128_neural_mlp",
2715
- "method": "128ep Metadata NN",
2716
  "status": "scored",
2717
  "status_label": "scored",
2718
  "scored": true,
@@ -2722,7 +2722,7 @@
2722
  "normalized_score": 0.0,
2723
  "metric_key": "macro_f1",
2724
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
2725
- "scope": "multi_episode_128_metadata_baseline",
2726
  "reason": null
2727
  },
2728
  {
@@ -2856,7 +2856,7 @@
2856
  "task_id": "object_set_forecast",
2857
  "task_label": "Future Object-Set Forecasting",
2858
  "series_id": "metadata128_simple",
2859
- "method": "128ep Metadata Simple",
2860
  "status": "scored",
2861
  "status_label": "scored",
2862
  "scored": true,
@@ -2866,7 +2866,7 @@
2866
  "normalized_score": 0.17656983343047333,
2867
  "metric_key": "micro_f1",
2868
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
2869
- "scope": "multi_episode_128_metadata_baseline",
2870
  "reason": null
2871
  },
2872
  {
@@ -2874,7 +2874,7 @@
2874
  "task_id": "object_set_forecast",
2875
  "task_label": "Future Object-Set Forecasting",
2876
  "series_id": "metadata128_neural_mlp",
2877
- "method": "128ep Metadata NN",
2878
  "status": "scored",
2879
  "status_label": "scored",
2880
  "scored": true,
@@ -2884,7 +2884,7 @@
2884
  "normalized_score": 0.17418550827844048,
2885
  "metric_key": "micro_f1",
2886
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
2887
- "scope": "multi_episode_128_metadata_baseline",
2888
  "reason": null
2889
  },
2890
  {
@@ -3018,36 +3018,36 @@
3018
  "task_id": "imu_to_hand_pose",
3019
  "task_label": "IMU-to-Hand Pose Reconstruction",
3020
  "series_id": "metadata128_simple",
3021
- "method": "128ep Metadata Simple",
3022
- "status": "unsupported_without_required_target",
3023
- "status_label": "unsupported",
3024
- "scored": false,
3025
  "proxy_scored": false,
3026
- "raw": null,
3027
- "raw_text": "n/a",
3028
- "normalized_score": null,
3029
  "metric_key": "mae",
3030
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
3031
- "scope": "multi_episode_128_metadata_baseline",
3032
- "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package"
3033
  },
3034
  {
3035
  "task_number": 18,
3036
  "task_id": "imu_to_hand_pose",
3037
  "task_label": "IMU-to-Hand Pose Reconstruction",
3038
  "series_id": "metadata128_neural_mlp",
3039
- "method": "128ep Metadata NN",
3040
- "status": "not_supported_by_metadata_only_package",
3041
- "status_label": "not supported",
3042
- "scored": false,
3043
  "proxy_scored": false,
3044
- "raw": null,
3045
- "raw_text": "n/a",
3046
- "normalized_score": null,
3047
  "metric_key": "mae",
3048
- "source": null,
3049
- "scope": "multi_episode_128_metadata_baseline",
3050
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3051
  },
3052
  {
3053
  "task_number": 18,
@@ -3180,7 +3180,7 @@
3180
  "task_id": "camera_view_sync_retrieval",
3181
  "task_label": "Camera-View Synchronization Retrieval",
3182
  "series_id": "metadata128_simple",
3183
- "method": "128ep Metadata Simple",
3184
  "status": "unsupported_without_required_target",
3185
  "status_label": "unsupported",
3186
  "scored": false,
@@ -3190,7 +3190,7 @@
3190
  "normalized_score": null,
3191
  "metric_key": "mrr",
3192
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
3193
- "scope": "multi_episode_128_metadata_baseline",
3194
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
3195
  },
3196
  {
@@ -3198,7 +3198,7 @@
3198
  "task_id": "camera_view_sync_retrieval",
3199
  "task_label": "Camera-View Synchronization Retrieval",
3200
  "series_id": "metadata128_neural_mlp",
3201
- "method": "128ep Metadata NN",
3202
  "status": "not_supported_by_metadata_only_package",
3203
  "status_label": "not supported",
3204
  "scored": false,
@@ -3208,8 +3208,8 @@
3208
  "normalized_score": null,
3209
  "metric_key": "mrr",
3210
  "source": null,
3211
- "scope": "multi_episode_128_metadata_baseline",
3212
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3213
  },
3214
  {
3215
  "task_number": 19,
@@ -3342,7 +3342,7 @@
3342
  "task_id": "time_to_transition",
3343
  "task_label": "Time-to-Next-Transition Regression",
3344
  "series_id": "metadata128_simple",
3345
- "method": "128ep Metadata Simple",
3346
  "status": "scored",
3347
  "status_label": "scored",
3348
  "scored": true,
@@ -3352,7 +3352,7 @@
3352
  "normalized_score": 0.016864874132806403,
3353
  "metric_key": "mae",
3354
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
3355
- "scope": "multi_episode_128_metadata_baseline",
3356
  "reason": null
3357
  },
3358
  {
@@ -3360,7 +3360,7 @@
3360
  "task_id": "time_to_transition",
3361
  "task_label": "Time-to-Next-Transition Regression",
3362
  "series_id": "metadata128_neural_mlp",
3363
- "method": "128ep Metadata NN",
3364
  "status": "scored",
3365
  "status_label": "scored",
3366
  "scored": true,
@@ -3370,7 +3370,7 @@
3370
  "normalized_score": 0.25411768748242325,
3371
  "metric_key": "mae",
3372
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
3373
- "scope": "multi_episode_128_metadata_baseline",
3374
  "reason": null
3375
  },
3376
  {
 
1
  {
2
  "title": "Task Method 20-Result Matrix",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:52:26+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
+ "scored_method_task_count": 143,
9
  "series": [
10
  {
11
  "id": "minimal",
 
55
  },
56
  {
57
  "id": "metadata128_simple",
58
+ "label": "128ep Aligned Simple",
59
  "short_label": "128-S",
60
  "color": "#ffd166",
61
+ "kind": "partial_128_episode_aligned_baseline",
62
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
63
  "stroke_dasharray": "9 6",
64
+ "method_detail": "128-episode aligned simple baselines: JSONL metadata/text tasks plus staged sensor-block tasks where the processed target exists.",
65
  "plotted_as": "colored point overlay",
66
  "result_record_count": 20,
67
+ "scored_task_count": 18,
68
+ "covered_task_count": 18,
69
  "proxy_scored_task_count": 0,
70
+ "scoreless_task_count": 2,
71
+ "unsupported_task_count": 2,
72
  "not_evaluated_task_count": 0,
73
  "status_counts": {
74
+ "scored": 18,
75
+ "unsupported_without_required_target": 2
76
  },
77
+ "coverage_fraction": 0.9,
78
  "result_record_fraction": 1.0
79
  },
80
  {
81
  "id": "metadata128_neural_mlp",
82
+ "label": "128ep Aligned NN",
83
  "short_label": "128-NN",
84
  "color": "#f472b6",
85
+ "kind": "partial_128_episode_aligned_baseline",
86
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
87
  "stroke_dasharray": "3 6",
88
+ "method_detail": "128-episode aligned MLP baselines: JSONL metadata/text tasks plus staged sensor-block tasks where the processed target exists.",
89
  "plotted_as": "colored point overlay",
90
  "result_record_count": 20,
91
+ "scored_task_count": 18,
92
+ "covered_task_count": 18,
93
  "proxy_scored_task_count": 0,
94
+ "scoreless_task_count": 2,
95
+ "unsupported_task_count": 2,
96
  "not_evaluated_task_count": 0,
97
  "status_counts": {
98
+ "not_supported_by_metadata_only_package": 2,
99
+ "scored": 18
100
  },
101
+ "coverage_fraction": 0.9,
102
  "result_record_fraction": 1.0
103
  },
104
  {
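Note on the series summaries above: the counts are related by simple arithmetic. Each 128-episode series has 20 result records, 18 of them scored, so `coverage_fraction` is 18 / 20 = 0.9 and `result_record_fraction` is 20 / 20 = 1.0. The following is a minimal sketch of that relationship, not the repository's generator, which is not shown in this diff. The helper name `summarize_series` is hypothetical, and treating every non-`scored` status as scoreless is an assumption.

```python
from collections import Counter


def summarize_series(statuses, task_count=20):
    """Recompute summary counts from per-task status strings (hypothetical helper)."""
    counts = Counter(statuses)
    scored = counts.get("scored", 0)
    return {
        "result_record_count": len(statuses),
        "scored_task_count": scored,
        # Assumption: any status other than "scored" counts as scoreless.
        "scoreless_task_count": len(statuses) - scored,
        "status_counts": dict(sorted(counts.items())),
        "coverage_fraction": scored / task_count,
        "result_record_fraction": len(statuses) / task_count,
    }


# Status mix reported for metadata128_simple in the new version of the file.
statuses = ["scored"] * 18 + ["unsupported_without_required_target"] * 2
summary = summarize_series(statuses)
print(summary["coverage_fraction"], summary["result_record_fraction"])
```

The same call with `"not_supported_by_metadata_only_package"` in place of the second status reproduces the `metadata128_neural_mlp` counts, since only the status label differs between the two series.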
 
264
  "task_id": "timeline_action",
265
  "task_label": "Action Recognition",
266
  "series_id": "metadata128_simple",
267
+ "method": "128ep Aligned Simple",
268
  "status": "scored",
269
  "status_label": "scored",
270
  "scored": true,
 
274
  "normalized_score": 0.008252821966746326,
275
  "metric_key": "macro_f1",
276
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
277
+ "scope": "multi_episode_128_aligned_baseline",
278
  "reason": null
279
  },
280
  {
 
282
  "task_id": "timeline_action",
283
  "task_label": "Action Recognition",
284
  "series_id": "metadata128_neural_mlp",
285
+ "method": "128ep Aligned NN",
286
  "status": "scored",
287
  "status_label": "scored",
288
  "scored": true,
 
292
  "normalized_score": 0.004175793689174209,
293
  "metric_key": "macro_f1",
294
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
295
+ "scope": "multi_episode_128_aligned_baseline",
296
  "reason": null
297
  },
298
  {
 
426
  "task_id": "timeline_subtask",
427
  "task_label": "Procedure Step Recognition",
428
  "series_id": "metadata128_simple",
429
+ "method": "128ep Aligned Simple",
430
  "status": "scored",
431
  "status_label": "scored",
432
  "scored": true,
 
436
  "normalized_score": 0.00019512195121951218,
437
  "metric_key": "macro_f1",
438
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
439
+ "scope": "multi_episode_128_aligned_baseline",
440
  "reason": null
441
  },
442
  {
 
444
  "task_id": "timeline_subtask",
445
  "task_label": "Procedure Step Recognition",
446
  "series_id": "metadata128_neural_mlp",
447
+ "method": "128ep Aligned NN",
448
  "status": "scored",
449
  "status_label": "scored",
450
  "scored": true,
 
454
  "normalized_score": 7.207207207207208e-05,
455
  "metric_key": "macro_f1",
456
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
457
+ "scope": "multi_episode_128_aligned_baseline",
458
  "reason": null
459
  },
460
  {
 
588
  "task_id": "transition_detection",
589
  "task_label": "Action Boundary Detection",
590
  "series_id": "metadata128_simple",
591
+ "method": "128ep Aligned Simple",
592
  "status": "scored",
593
  "status_label": "scored",
594
  "scored": true,
 
598
  "normalized_score": 0.29652162550029315,
599
  "metric_key": "macro_f1",
600
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
601
+ "scope": "multi_episode_128_aligned_baseline",
602
  "reason": null
603
  },
604
  {
 
606
  "task_id": "transition_detection",
607
  "task_label": "Action Boundary Detection",
608
  "series_id": "metadata128_neural_mlp",
609
+ "method": "128ep Aligned NN",
610
  "status": "scored",
611
  "status_label": "scored",
612
  "scored": true,
 
616
  "normalized_score": 0.4841733292368365,
617
  "metric_key": "macro_f1",
618
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
619
+ "scope": "multi_episode_128_aligned_baseline",
620
  "reason": null
621
  },
622
  {
 
750
  "task_id": "next_action",
751
  "task_label": "Next-Action Prediction",
752
  "series_id": "metadata128_simple",
753
+ "method": "128ep Aligned Simple",
754
  "status": "scored",
755
  "status_label": "scored",
756
  "scored": true,
 
760
  "normalized_score": 0.006514774539765508,
761
  "metric_key": "macro_f1",
762
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
763
+ "scope": "multi_episode_128_aligned_baseline",
764
  "reason": null
765
  },
766
  {
 
768
  "task_id": "next_action",
769
  "task_label": "Next-Action Prediction",
770
  "series_id": "metadata128_neural_mlp",
771
+ "method": "128ep Aligned NN",
772
  "status": "scored",
773
  "status_label": "scored",
774
  "scored": true,
 
778
  "normalized_score": 0.004910507980164745,
779
  "metric_key": "macro_f1",
780
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
781
+ "scope": "multi_episode_128_aligned_baseline",
782
  "reason": null
783
  },
784
  {
 
912
  "task_id": "hand_trajectory_forecast",
913
  "task_label": "Hand Trajectory Forecasting",
914
  "series_id": "metadata128_simple",
915
+ "method": "128ep Aligned Simple",
916
+ "status": "scored",
917
+ "status_label": "scored",
918
+ "scored": true,
919
  "proxy_scored": false,
920
+ "raw": 8.817333221435547,
921
+ "raw_text": "8.817",
922
+ "normalized_score": 0.012231610603598841,
923
  "metric_key": "mpjpe",
924
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
925
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
926
+ "reason": null
927
  },
928
  {
929
  "task_number": 5,
930
  "task_id": "hand_trajectory_forecast",
931
  "task_label": "Hand Trajectory Forecasting",
932
  "series_id": "metadata128_neural_mlp",
933
+ "method": "128ep Aligned NN",
934
+ "status": "scored",
935
+ "status_label": "scored",
936
+ "scored": true,
937
  "proxy_scored": false,
938
+ "raw": 0.429434210062027,
939
+ "raw_text": "0.4294",
940
+ "normalized_score": 0.25114484128127007,
941
  "metric_key": "mpjpe",
942
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/hand_trajectory_forecast/metrics.json",
943
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
944
+ "reason": null
945
  },
946
  {
947
  "task_number": 5,
 
1074
  "task_id": "contact_prediction",
1075
  "task_label": "Contact State Prediction",
1076
  "series_id": "metadata128_simple",
1077
+ "method": "128ep Aligned Simple",
1078
  "status": "scored",
1079
  "status_label": "scored",
1080
  "scored": true,
 
1084
  "normalized_score": 0.4381481308057444,
1085
  "metric_key": "macro_f1",
1086
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
1087
+ "scope": "multi_episode_128_aligned_baseline",
1088
  "reason": null
1089
  },
1090
  {
 
1092
  "task_id": "contact_prediction",
1093
  "task_label": "Contact State Prediction",
1094
  "series_id": "metadata128_neural_mlp",
1095
+ "method": "128ep Aligned NN",
1096
  "status": "scored",
1097
  "status_label": "scored",
1098
  "scored": true,
 
1102
  "normalized_score": 0.5682695682695682,
1103
  "metric_key": "macro_f1",
1104
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
1105
+ "scope": "multi_episode_128_aligned_baseline",
1106
  "reason": null
1107
  },
1108
  {
 
1236
  "task_id": "object_relevance",
1237
  "task_label": "Object Relevance Prediction",
1238
  "series_id": "metadata128_simple",
1239
+ "method": "128ep Aligned Simple",
1240
  "status": "scored",
1241
  "status_label": "scored",
1242
  "scored": true,
 
1246
  "normalized_score": 0.17764578833693304,
1247
  "metric_key": "micro_f1",
1248
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
1249
+ "scope": "multi_episode_128_aligned_baseline",
1250
  "reason": null
1251
  },
1252
  {
 
1254
  "task_id": "object_relevance",
1255
  "task_label": "Object Relevance Prediction",
1256
  "series_id": "metadata128_neural_mlp",
1257
+ "method": "128ep Aligned NN",
1258
  "status": "scored",
1259
  "status_label": "scored",
1260
  "scored": true,
 
1264
  "normalized_score": 0.18662723837686876,
1265
  "metric_key": "micro_f1",
1266
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
1267
+ "scope": "multi_episode_128_aligned_baseline",
1268
  "reason": null
1269
  },
1270
  {
 
1398
  "task_id": "caption_grounding",
1399
  "task_label": "Language Grounding",
1400
  "series_id": "metadata128_simple",
1401
+ "method": "128ep Aligned Simple",
1402
  "status": "scored",
1403
  "status_label": "scored",
1404
  "scored": true,
 
1408
  "normalized_score": 0.002332374220713973,
1409
  "metric_key": "mrr",
1410
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
1411
+ "scope": "multi_episode_128_aligned_baseline",
1412
  "reason": null
1413
  },
1414
  {
 
1416
  "task_id": "caption_grounding",
1417
  "task_label": "Language Grounding",
1418
  "series_id": "metadata128_neural_mlp",
1419
+ "method": "128ep Aligned NN",
1420
  "status": "scored",
1421
  "status_label": "scored",
1422
  "scored": true,
 
1426
  "normalized_score": 0.008236799389123917,
1427
  "metric_key": "mrr",
1428
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
1429
+ "scope": "multi_episode_128_aligned_baseline",
1430
  "reason": null
1431
  },
1432
  {
 
1560
  "task_id": "cross_modal_retrieval",
1561
  "task_label": "Cross-Modal Retrieval",
1562
  "series_id": "metadata128_simple",
1563
+ "method": "128ep Aligned Simple",
1564
+ "status": "scored",
1565
+ "status_label": "scored",
1566
+ "scored": true,
1567
  "proxy_scored": false,
1568
+ "raw": 0.002587692579254508,
1569
+ "raw_text": "0.0026",
1570
+ "normalized_score": 0.002587692579254508,
1571
  "metric_key": "mrr",
1572
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
1573
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1574
+ "reason": null
1575
  },
1576
  {
1577
  "task_number": 9,
1578
  "task_id": "cross_modal_retrieval",
1579
  "task_label": "Cross-Modal Retrieval",
1580
  "series_id": "metadata128_neural_mlp",
1581
+ "method": "128ep Aligned NN",
1582
+ "status": "scored",
1583
+ "status_label": "scored",
1584
+ "scored": true,
1585
  "proxy_scored": false,
1586
+ "raw": 0.0026067993603646755,
1587
+ "raw_text": "0.0026",
1588
+ "normalized_score": 0.0026067993603646755,
1589
  "metric_key": "mrr",
1590
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/cross_modal_retrieval/metrics.json",
1591
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1592
+ "reason": null
1593
  },
1594
  {
1595
  "task_number": 9,
 
1722
  "task_id": "modality_reconstruction",
1723
  "task_label": "Cross-Modal Reconstruction",
1724
  "series_id": "metadata128_simple",
1725
+ "method": "128ep Aligned Simple",
1726
+ "status": "scored",
1727
+ "status_label": "scored",
1728
+ "scored": true,
1729
  "proxy_scored": false,
1730
+ "raw": -190.66106203944798,
1731
+ "raw_text": "-190.66",
1732
+ "normalized_score": 0.0,
1733
  "metric_key": "r2",
1734
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
1735
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1736
+ "reason": null
1737
  },
1738
  {
1739
  "task_number": 10,
1740
  "task_id": "modality_reconstruction",
1741
  "task_label": "Cross-Modal Reconstruction",
1742
  "series_id": "metadata128_neural_mlp",
1743
+ "method": "128ep Aligned NN",
1744
+ "status": "scored",
1745
+ "status_label": "scored",
1746
+ "scored": true,
1747
  "proxy_scored": false,
1748
+ "raw": -0.43481132003942147,
1749
+ "raw_text": "-0.4348",
1750
+ "normalized_score": 0.0,
1751
  "metric_key": "r2",
1752
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/modality_reconstruction/metrics.json",
1753
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1754
+ "reason": null
1755
  },
1756
  {
1757
  "task_number": 10,
 
1884
  "task_id": "temporal_order",
1885
  "task_label": "Temporal Order Verification",
1886
  "series_id": "metadata128_simple",
1887
+ "method": "128ep Aligned Simple",
1888
  "status": "scored",
1889
  "status_label": "scored",
1890
  "scored": true,
 
1894
  "normalized_score": 0.4198864140782312,
1895
  "metric_key": "f1",
1896
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
1897
+ "scope": "multi_episode_128_aligned_baseline",
1898
  "reason": null
1899
  },
1900
  {
 
1902
  "task_id": "temporal_order",
1903
  "task_label": "Temporal Order Verification",
1904
  "series_id": "metadata128_neural_mlp",
1905
+ "method": "128ep Aligned NN",
1906
  "status": "scored",
1907
  "status_label": "scored",
1908
  "scored": true,
 
1912
  "normalized_score": 0.8252408266656923,
1913
  "metric_key": "f1",
1914
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
1915
+ "scope": "multi_episode_128_aligned_baseline",
1916
  "reason": null
1917
  },
1918
  {
 
2046
  "task_id": "misalignment_detection",
2047
  "task_label": "Multimodal Synchronization Detection",
2048
  "series_id": "metadata128_simple",
2049
+ "method": "128ep Aligned Simple",
2050
+ "status": "scored",
2051
+ "status_label": "scored",
2052
+ "scored": true,
2053
  "proxy_scored": false,
2054
+ "raw": 0.49980060227663614,
2055
+ "raw_text": "0.4998",
2056
+ "normalized_score": 0.49980060227663614,
2057
  "metric_key": "f1",
2058
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
2059
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
2060
+ "reason": null
2061
  },
2062
  {
2063
  "task_number": 12,
2064
  "task_id": "misalignment_detection",
2065
  "task_label": "Multimodal Synchronization Detection",
2066
  "series_id": "metadata128_neural_mlp",
2067
+ "method": "128ep Aligned NN",
2068
+ "status": "scored",
2069
+ "status_label": "scored",
2070
+ "scored": true,
2071
  "proxy_scored": false,
2072
+ "raw": 0.7773773780941162,
2073
+ "raw_text": "0.7774",
2074
+ "normalized_score": 0.7773773780941162,
2075
  "metric_key": "f1",
2076
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/misalignment_detection/metrics.json",
2077
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
2078
+ "reason": null
2079
  },
2080
  {
2081
  "task_number": 12,
 
2208
  "task_id": "long_horizon_next_action",
2209
  "task_label": "Long-Horizon Next-Action Forecasting",
2210
  "series_id": "metadata128_simple",
2211
+ "method": "128ep Aligned Simple",
2212
  "status": "scored",
2213
  "status_label": "scored",
2214
  "scored": true,
 
2218
  "normalized_score": 0.004579592783699693,
2219
  "metric_key": "macro_f1",
2220
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
2221
+ "scope": "multi_episode_128_aligned_baseline",
2222
  "reason": null
2223
  },
2224
  {
 
2226
  "task_id": "long_horizon_next_action",
2227
  "task_label": "Long-Horizon Next-Action Forecasting",
2228
  "series_id": "metadata128_neural_mlp",
2229
+ "method": "128ep Aligned NN",
2230
  "status": "scored",
2231
  "status_label": "scored",
2232
  "scored": true,
 
2236
  "normalized_score": 0.0029821307969142615,
2237
  "metric_key": "macro_f1",
2238
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
2239
+ "scope": "multi_episode_128_aligned_baseline",
2240
  "reason": null
2241
  },
2242
  {
 
2370
  "task_id": "next_subtask_forecast",
2371
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2372
  "series_id": "metadata128_simple",
2373
+ "method": "128ep Aligned Simple",
2374
  "status": "scored",
2375
  "status_label": "scored",
2376
  "scored": true,
 
2380
  "normalized_score": 0.0001206030150753769,
2381
  "metric_key": "macro_f1",
2382
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
2383
+ "scope": "multi_episode_128_aligned_baseline",
2384
  "reason": null
2385
  },
2386
  {
 
2388
  "task_id": "next_subtask_forecast",
2389
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2390
  "series_id": "metadata128_neural_mlp",
2391
+ "method": "128ep Aligned NN",
2392
  "status": "scored",
2393
  "status_label": "scored",
2394
  "scored": true,
 
2398
  "normalized_score": 2.086049543676662e-05,
2399
  "metric_key": "macro_f1",
2400
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
2401
+ "scope": "multi_episode_128_aligned_baseline",
2402
  "reason": null
2403
  },
2404
  {
 
2532
  "task_id": "interaction_text_prediction",
2533
  "task_label": "Interaction Text Prediction",
2534
  "series_id": "metadata128_simple",
2535
+ "method": "128ep Aligned Simple",
2536
  "status": "unsupported_without_required_target",
2537
  "status_label": "unsupported",
2538
  "scored": false,
 
2542
  "normalized_score": null,
2543
  "metric_key": "macro_f1",
2544
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
2545
+ "scope": "multi_episode_128_aligned_baseline",
2546
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
2547
  },
2548
  {
 
2550
  "task_id": "interaction_text_prediction",
2551
  "task_label": "Interaction Text Prediction",
2552
  "series_id": "metadata128_neural_mlp",
2553
+ "method": "128ep Aligned NN",
2554
  "status": "not_supported_by_metadata_only_package",
2555
  "status_label": "not supported",
2556
  "scored": false,
 
2560
  "normalized_score": null,
2561
  "metric_key": "macro_f1",
2562
  "source": null,
2563
+ "scope": "multi_episode_128_aligned_baseline",
2564
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required"
2565
  },
2566
  {
2567
  "task_number": 15,
 
2694
  "task_id": "action_object_relation",
2695
  "task_label": "Action-Object Relation Prediction",
2696
  "series_id": "metadata128_simple",
2697
+ "method": "128ep Aligned Simple",
2698
  "status": "scored",
2699
  "status_label": "scored",
2700
  "scored": true,
 
2704
  "normalized_score": 0.0,
2705
  "metric_key": "macro_f1",
2706
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
2707
+ "scope": "multi_episode_128_aligned_baseline",
2708
  "reason": null
2709
  },
2710
  {
 
2712
  "task_id": "action_object_relation",
2713
  "task_label": "Action-Object Relation Prediction",
2714
  "series_id": "metadata128_neural_mlp",
2715
+ "method": "128ep Aligned NN",
2716
  "status": "scored",
2717
  "status_label": "scored",
2718
  "scored": true,
 
2722
  "normalized_score": 0.0,
2723
  "metric_key": "macro_f1",
2724
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
2725
+ "scope": "multi_episode_128_aligned_baseline",
2726
  "reason": null
2727
  },
2728
  {
 
2856
  "task_id": "object_set_forecast",
2857
  "task_label": "Future Object-Set Forecasting",
2858
  "series_id": "metadata128_simple",
2859
+ "method": "128ep Aligned Simple",
2860
  "status": "scored",
2861
  "status_label": "scored",
2862
  "scored": true,
 
2866
  "normalized_score": 0.17656983343047333,
2867
  "metric_key": "micro_f1",
2868
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
2869
+ "scope": "multi_episode_128_aligned_baseline",
2870
  "reason": null
2871
  },
2872
  {
 
2874
  "task_id": "object_set_forecast",
2875
  "task_label": "Future Object-Set Forecasting",
2876
  "series_id": "metadata128_neural_mlp",
2877
+ "method": "128ep Aligned NN",
2878
  "status": "scored",
2879
  "status_label": "scored",
2880
  "scored": true,
 
2884
  "normalized_score": 0.17418550827844048,
2885
  "metric_key": "micro_f1",
2886
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
2887
+ "scope": "multi_episode_128_aligned_baseline",
2888
  "reason": null
2889
  },
2890
  {
 
3018
  "task_id": "imu_to_hand_pose",
3019
  "task_label": "IMU-to-Hand Pose Reconstruction",
3020
  "series_id": "metadata128_simple",
3021
+ "method": "128ep Aligned Simple",
3022
+ "status": "scored",
3023
+ "status_label": "scored",
3024
+ "scored": true,
3025
  "proxy_scored": false,
3026
+ "raw": 0.2294670194387436,
3027
+ "raw_text": "0.2295",
3028
+ "normalized_score": 0.18324815505876868,
3029
  "metric_key": "mae",
3030
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
3031
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3032
+ "reason": null
3033
  },
3034
  {
3035
  "task_number": 18,
3036
  "task_id": "imu_to_hand_pose",
3037
  "task_label": "IMU-to-Hand Pose Reconstruction",
3038
  "series_id": "metadata128_neural_mlp",
3039
+ "method": "128ep Aligned NN",
3040
+ "status": "scored",
3041
+ "status_label": "scored",
3042
+ "scored": true,
3043
  "proxy_scored": false,
3044
+ "raw": 0.2555866539478302,
3045
+ "raw_text": "0.2556",
3046
+ "normalized_score": 0.16452114110609004,
3047
  "metric_key": "mae",
3048
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/imu_to_hand_pose/metrics.json",
3049
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3050
+ "reason": null
3051
  },
3052
  {
3053
  "task_number": 18,
 
3180
  "task_id": "camera_view_sync_retrieval",
3181
  "task_label": "Camera-View Synchronization Retrieval",
3182
  "series_id": "metadata128_simple",
3183
+ "method": "128ep Aligned Simple",
3184
  "status": "unsupported_without_required_target",
3185
  "status_label": "unsupported",
3186
  "scored": false,
 
3190
  "normalized_score": null,
3191
  "metric_key": "mrr",
3192
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
3193
+ "scope": "multi_episode_128_aligned_baseline",
3194
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
3195
  },
3196
  {
 
3198
  "task_id": "camera_view_sync_retrieval",
3199
  "task_label": "Camera-View Synchronization Retrieval",
3200
  "series_id": "metadata128_neural_mlp",
3201
+ "method": "128ep Aligned NN",
3202
  "status": "not_supported_by_metadata_only_package",
3203
  "status_label": "not supported",
3204
  "scored": false,
 
3208
  "normalized_score": null,
3209
  "metric_key": "mrr",
3210
  "source": null,
3211
+ "scope": "multi_episode_128_aligned_baseline",
3212
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required"
3213
  },
3214
  {
3215
  "task_number": 19,
 
3342
  "task_id": "time_to_transition",
3343
  "task_label": "Time-to-Next-Transition Regression",
3344
  "series_id": "metadata128_simple",
3345
+ "method": "128ep Aligned Simple",
3346
  "status": "scored",
3347
  "status_label": "scored",
3348
  "scored": true,
 
3352
  "normalized_score": 0.016864874132806403,
3353
  "metric_key": "mae",
3354
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
3355
+ "scope": "multi_episode_128_aligned_baseline",
3356
  "reason": null
3357
  },
3358
  {
 
3360
  "task_id": "time_to_transition",
3361
  "task_label": "Time-to-Next-Transition Regression",
3362
  "series_id": "metadata128_neural_mlp",
3363
+ "method": "128ep Aligned NN",
3364
  "status": "scored",
3365
  "status_label": "scored",
3366
  "scored": true,
 
3370
  "normalized_score": 0.25411768748242325,
3371
  "metric_key": "mae",
3372
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
3373
+ "scope": "multi_episode_128_aligned_baseline",
3374
  "reason": null
3375
  },
3376
  {
data/task_surface_integrity.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-18T12:09:25+00:00",
4
  "summary": {
5
  "task_count": 12,
6
  "expected_task_count": 12,
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-18T12:54:18+00:00",
4
  "summary": {
5
  "task_count": 12,
6
  "expected_task_count": 12,
data/unified_task_model_radar.json CHANGED
@@ -1,18 +1,18 @@
1
  {
2
  "title": "Unified 20-Task Model Radar",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
- "scored_method_task_count": 133,
9
  "normalization_policy": {
10
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
11
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
12
  "raw_values": "raw metric values, metric keys, and sources are retained in this JSON; the SVG is an overview, not a replacement for the metric table",
13
  "result_record_policy": "every method has 20 task records; records without a numeric score carry explicit unsupported/not-evaluated status and reason fields",
14
  "foundation_model_overlay": "Qwen3/Cosmos points are plotted only on task-aligned axes. Scoreless records mean the public result does not evaluate that task contract.",
15
- "metadata_128_overlay": "128-episode metadata baselines have 20 records, but numeric scores only where the public JSONL contains enough task labels without raw feature blocks.",
16
  "raw_128_overlay": "128-episode raw-feature baselines use staged sensor NPZ features. Eighteen axes use direct task targets; interaction text and camera-view sync are completed with documented compact proxies because raw interaction strings and paired video-view embeddings are absent from the 128 export."
17
  },
18
  "series": [
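Reviewer note on the `normalization_policy` block in the hunk above: the two rules it describes can be sketched as follows. This is a minimal illustration, not the generator that produced this JSON. The function names are hypothetical, and `best_observed` is back-computed here from the `time_to_transition` `mae` records in this diff (raw 624.8108520507812 with normalized score 0.016864874132806403), because the best raw value itself does not appear in this excerpt.

```python
from typing import Optional


def normalize_higher_is_better(raw: Optional[float]) -> Optional[float]:
    """Clip a bounded metric (for example macro_f1 or mrr) to [0, 1].

    Scoreless records carry raw = null in the JSON and stay None here.
    """
    if raw is None:
        return None
    return min(1.0, max(0.0, raw))


def normalize_lower_is_better(raw: Optional[float],
                              best_observed: float) -> Optional[float]:
    """Convert an error metric (for example mae) to best_observed / raw."""
    if raw is None:
        return None
    if raw <= 0:
        # Assumption: error metrics are strictly positive; the policy text
        # does not say how a zero error would be handled.
        raise ValueError("lower-is-better metrics must be positive")
    return best_observed / raw


# Back out the implied best value from one record, then check the other.
implied_best = 624.8108520507812 * 0.016864874132806403
nn_score = normalize_lower_is_better(41.4664421081543, implied_best)
print(round(nn_score, 4))
```

The second `time_to_transition` record (raw 41.4664421081543) should land on its recorded `normalized_score` of 0.25411768748242325 if both records share the same best observed value, which is what the "within the same task" wording suggests.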
@@ -64,50 +64,50 @@
64
  },
65
  {
66
  "id": "metadata128_simple",
67
- "label": "128ep Metadata Simple",
68
  "short_label": "128-S",
69
  "color": "#ffd166",
70
- "kind": "partial_128_episode_metadata_baseline",
71
- "scope": "128 selected episodes, JSONL metadata/text only",
72
  "stroke_dasharray": "9 6",
73
- "method_detail": "128-episode JSONL metadata/text simple baselines.",
74
  "plotted_as": "colored point overlay",
75
  "result_record_count": 20,
76
- "scored_task_count": 13,
77
- "covered_task_count": 13,
78
  "proxy_scored_task_count": 0,
79
- "scoreless_task_count": 7,
80
- "unsupported_task_count": 7,
81
  "not_evaluated_task_count": 0,
82
  "status_counts": {
83
- "scored": 13,
84
- "unsupported_without_required_target": 7
85
  },
86
- "coverage_fraction": 0.65,
87
  "result_record_fraction": 1.0
88
  },
89
  {
90
  "id": "metadata128_neural_mlp",
91
- "label": "128ep Metadata NN",
92
  "short_label": "128-NN",
93
  "color": "#f472b6",
94
- "kind": "partial_128_episode_metadata_baseline",
95
- "scope": "128 selected episodes, JSONL metadata/text only",
96
  "stroke_dasharray": "3 6",
97
- "method_detail": "128-episode JSONL metadata/text MLP baselines.",
98
  "plotted_as": "colored point overlay",
99
  "result_record_count": 20,
100
- "scored_task_count": 13,
101
- "covered_task_count": 13,
102
  "proxy_scored_task_count": 0,
103
- "scoreless_task_count": 7,
104
- "unsupported_task_count": 7,
105
  "not_evaluated_task_count": 0,
106
  "status_counts": {
107
- "not_supported_by_metadata_only_package": 7,
108
- "scored": 13
109
  },
110
- "coverage_fraction": 0.65,
111
  "result_record_fraction": 1.0
112
  },
113
  {
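Reviewer note on the per-series counters in the hunk above: the removed values for `metadata128_simple` (13 scored, 7 unsupported, `coverage_fraction` 0.65, `result_record_count` 20) are consistent with a simple tally over the 20 per-task status strings. The sketch below shows that relationship. It is an inference from the numbers in this diff, not the actual aggregation code, and `summarize_series` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, List


def summarize_series(statuses: List[str]) -> Dict[str, object]:
    """Tally per-task status strings into the summary fields seen in the JSON."""
    counts = Counter(statuses)
    total = len(statuses)
    scored = counts.get("scored", 0)
    return {
        "result_record_count": total,
        "scored_task_count": scored,
        "scoreless_task_count": total - scored,
        "status_counts": dict(counts),
        # Assumption: coverage is scored records over all task records.
        "coverage_fraction": scored / total if total else 0.0,
    }


# Pre-commit state of metadata128_simple as shown on the removed lines.
old_simple = ["scored"] * 13 + ["unsupported_without_required_target"] * 7
summary = summarize_series(old_simple)
print(summary["coverage_fraction"], summary["scoreless_task_count"])
```

Re-running the same tally against the post-commit status strings is a quick way to confirm that the updated counters on the added side of this diff agree with the individual task records.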
@@ -301,7 +301,7 @@
301
  "raw": 0.008252821966746326,
302
  "metric_key": "macro_f1",
303
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
304
- "scope": "multi_episode_128_metadata_baseline",
305
  "status": "scored",
306
  "reason": null,
307
  "normalized_score": 0.008252821966746326,
@@ -312,7 +312,7 @@
312
  "raw": 0.004175793689174209,
313
  "metric_key": "macro_f1",
314
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
315
- "scope": "multi_episode_128_metadata_baseline",
316
  "status": "scored",
317
  "reason": null,
318
  "normalized_score": 0.004175793689174209,
@@ -401,7 +401,7 @@
401
  "raw": 0.00019512195121951218,
402
  "metric_key": "macro_f1",
403
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
404
- "scope": "multi_episode_128_metadata_baseline",
405
  "status": "scored",
406
  "reason": null,
407
  "normalized_score": 0.00019512195121951218,
@@ -412,7 +412,7 @@
412
  "raw": 7.207207207207208e-05,
413
  "metric_key": "macro_f1",
414
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
415
- "scope": "multi_episode_128_metadata_baseline",
416
  "status": "scored",
417
  "reason": null,
418
  "normalized_score": 7.207207207207208e-05,
@@ -523,7 +523,7 @@
523
  "raw": 0.29652162550029315,
524
  "metric_key": "macro_f1",
525
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
526
- "scope": "multi_episode_128_metadata_baseline",
527
  "status": "scored",
528
  "reason": null,
529
  "normalized_score": 0.29652162550029315,
@@ -534,7 +534,7 @@
534
  "raw": 0.4841733292368365,
535
  "metric_key": "macro_f1",
536
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
537
- "scope": "multi_episode_128_metadata_baseline",
538
  "status": "scored",
539
  "reason": null,
540
  "normalized_score": 0.4841733292368365,
@@ -634,7 +634,7 @@
634
  "raw": 0.006514774539765508,
635
  "metric_key": "macro_f1",
636
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
637
- "scope": "multi_episode_128_metadata_baseline",
638
  "status": "scored",
639
  "reason": null,
640
  "normalized_score": 0.006514774539765508,
@@ -645,7 +645,7 @@
645
  "raw": 0.004910507980164745,
646
  "metric_key": "macro_f1",
647
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
648
- "scope": "multi_episode_128_metadata_baseline",
649
  "status": "scored",
650
  "reason": null,
651
  "normalized_score": 0.004910507980164745,
@@ -709,15 +709,26 @@
709
  "status_label": "scored"
710
  },
711
  "metadata128_simple": {
712
- "raw": null,
713
  "metric_key": "mpjpe",
714
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
715
- "scope": "multi_episode_128_metadata_baseline",
716
- "status": "unsupported_without_required_target",
717
- "reason": "requires future hand-joint trajectories from raw sensor feature NPZ blocks, which are not in the public 128 package",
718
- "normalized_score": null,
719
- "raw_text": "n/a",
720
- "status_label": "unsupported"
 
 
 
 
 
 
 
 
 
 
 
721
  },
722
  "raw128_simple": {
723
  "raw": 0.2729249894618988,
@@ -741,17 +752,6 @@
741
  "raw_text": "0.1848",
742
  "status_label": "scored"
743
  },
744
- "metadata128_neural_mlp": {
745
- "raw": null,
746
- "metric_key": "mpjpe",
747
- "source": null,
748
- "scope": "multi_episode_128_metadata_baseline",
749
- "status": "not_supported_by_metadata_only_package",
750
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
751
- "normalized_score": null,
752
- "raw_text": "n/a",
753
- "status_label": "not supported"
754
- },
755
  "qwen3_omni_v6_lora": {
756
  "raw": null,
757
  "metric_key": "mpjpe",
@@ -856,7 +856,7 @@
856
  "raw": 0.4381481308057444,
857
  "metric_key": "macro_f1",
858
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
859
- "scope": "multi_episode_128_metadata_baseline",
860
  "status": "scored",
861
  "reason": null,
862
  "normalized_score": 0.4381481308057444,
@@ -867,7 +867,7 @@
867
  "raw": 0.5682695682695682,
868
  "metric_key": "macro_f1",
869
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
870
- "scope": "multi_episode_128_metadata_baseline",
871
  "status": "scored",
872
  "reason": null,
873
  "normalized_score": 0.5682695682695682,
@@ -956,7 +956,7 @@
956
  "raw": 0.17764578833693304,
957
  "metric_key": "micro_f1",
958
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
959
- "scope": "multi_episode_128_metadata_baseline",
960
  "status": "scored",
961
  "reason": null,
962
  "normalized_score": 0.17764578833693304,
@@ -967,7 +967,7 @@
967
  "raw": 0.18662723837686876,
968
  "metric_key": "micro_f1",
969
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
970
- "scope": "multi_episode_128_metadata_baseline",
971
  "status": "scored",
972
  "reason": null,
973
  "normalized_score": 0.18662723837686876,
@@ -1056,7 +1056,7 @@
1056
  "raw": 0.002332374220713973,
1057
  "metric_key": "mrr",
1058
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
1059
- "scope": "multi_episode_128_metadata_baseline",
1060
  "status": "scored",
1061
  "reason": null,
1062
  "normalized_score": 0.002332374220713973,
@@ -1067,7 +1067,7 @@
1067
  "raw": 0.008236799389123917,
1068
  "metric_key": "mrr",
1069
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
1070
- "scope": "multi_episode_128_metadata_baseline",
1071
  "status": "scored",
1072
  "reason": null,
1073
  "normalized_score": 0.008236799389123917,
@@ -1175,15 +1175,26 @@
1175
  "status_label": "scored"
1176
  },
1177
  "metadata128_simple": {
1178
- "raw": null,
1179
  "metric_key": "mrr",
1180
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
1181
- "scope": "multi_episode_128_metadata_baseline",
1182
- "status": "unsupported_without_required_target",
1183
- "reason": "requires paired motion/IMU/camera/audio/depth feature blocks, which are not in the public 128 package",
1184
- "normalized_score": null,
1185
- "raw_text": "n/a",
1186
- "status_label": "unsupported"
 
 
 
 
 
 
 
 
 
 
 
1187
  },
1188
  "raw128_simple": {
1189
  "raw": 0.003459817497059703,
@@ -1207,17 +1218,6 @@
1207
  "raw_text": "0.0025",
1208
  "status_label": "scored"
1209
  },
1210
- "metadata128_neural_mlp": {
1211
- "raw": null,
1212
- "metric_key": "mrr",
1213
- "source": null,
1214
- "scope": "multi_episode_128_metadata_baseline",
1215
- "status": "not_supported_by_metadata_only_package",
1216
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1217
- "normalized_score": null,
1218
- "raw_text": "n/a",
1219
- "status_label": "not supported"
1220
- },
1221
  "cosmos3_super_reasoner": {
1222
  "raw": null,
1223
  "metric_key": "mrr",
@@ -1264,15 +1264,26 @@
1264
  "status_label": "scored"
1265
  },
1266
  "metadata128_simple": {
1267
- "raw": null,
1268
  "metric_key": "r2",
1269
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
1270
- "scope": "multi_episode_128_metadata_baseline",
1271
- "status": "unsupported_without_required_target",
1272
- "reason": "requires source and target modality feature blocks such as depth/video vectors, which are not in the public 128 package",
1273
- "normalized_score": null,
1274
- "raw_text": "n/a",
1275
- "status_label": "unsupported"
 
 
 
 
 
 
 
 
 
 
 
1276
  },
1277
  "raw128_simple": {
1278
  "raw": -1.3450960391924882,
@@ -1296,17 +1307,6 @@
1296
  "raw_text": "-1.397",
1297
  "status_label": "scored"
1298
  },
1299
- "metadata128_neural_mlp": {
1300
- "raw": null,
1301
- "metric_key": "r2",
1302
- "source": null,
1303
- "scope": "multi_episode_128_metadata_baseline",
1304
- "status": "not_supported_by_metadata_only_package",
1305
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1306
- "normalized_score": null,
1307
- "raw_text": "n/a",
1308
- "status_label": "not supported"
1309
- },
1310
  "qwen3_omni_v6_lora": {
1311
  "raw": null,
1312
  "metric_key": "r2",
@@ -1389,7 +1389,7 @@
1389
  "raw": 0.4198864140782312,
1390
  "metric_key": "f1",
1391
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
1392
- "scope": "multi_episode_128_metadata_baseline",
1393
  "status": "scored",
1394
  "reason": null,
1395
  "normalized_score": 0.4198864140782312,
@@ -1400,7 +1400,7 @@
1400
  "raw": 0.8252408266656923,
1401
  "metric_key": "f1",
1402
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
1403
- "scope": "multi_episode_128_metadata_baseline",
1404
  "status": "scored",
1405
  "reason": null,
1406
  "normalized_score": 0.8252408266656923,
@@ -1497,15 +1497,26 @@
1497
  "status_label": "scored"
1498
  },
1499
  "metadata128_simple": {
1500
- "raw": null,
1501
  "metric_key": "f1",
1502
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
1503
- "scope": "multi_episode_128_metadata_baseline",
1504
- "status": "unsupported_without_required_target",
1505
- "reason": "requires deliberately shifted cross-modal feature pairs, which cannot be reconstructed from the public JSONL labels alone",
1506
- "normalized_score": null,
1507
- "raw_text": "n/a",
1508
- "status_label": "unsupported"
 
 
 
 
 
 
 
 
 
 
 
1509
  },
1510
  "raw128_simple": {
1511
  "raw": 0.4958867673901769,
@@ -1529,17 +1540,6 @@
1529
  "raw_text": "0.8273",
1530
  "status_label": "scored"
1531
  },
1532
- "metadata128_neural_mlp": {
1533
- "raw": null,
1534
- "metric_key": "f1",
1535
- "source": null,
1536
- "scope": "multi_episode_128_metadata_baseline",
1537
- "status": "not_supported_by_metadata_only_package",
1538
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1539
- "normalized_score": null,
1540
- "raw_text": "n/a",
1541
- "status_label": "not supported"
1542
- },
1543
  "cosmos3_super_reasoner": {
1544
  "raw": null,
1545
  "metric_key": "f1",
@@ -1611,7 +1611,7 @@
1611
  "raw": 0.004579592783699693,
1612
  "metric_key": "macro_f1",
1613
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
1614
- "scope": "multi_episode_128_metadata_baseline",
1615
  "status": "scored",
1616
  "reason": null,
1617
  "normalized_score": 0.004579592783699693,
@@ -1622,7 +1622,7 @@
1622
  "raw": 0.0029821307969142615,
1623
  "metric_key": "macro_f1",
1624
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
1625
- "scope": "multi_episode_128_metadata_baseline",
1626
  "status": "scored",
1627
  "reason": null,
1628
  "normalized_score": 0.0029821307969142615,
@@ -1722,7 +1722,7 @@
1722
  "raw": 0.0001206030150753769,
1723
  "metric_key": "macro_f1",
1724
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
1725
- "scope": "multi_episode_128_metadata_baseline",
1726
  "status": "scored",
1727
  "reason": null,
1728
  "normalized_score": 0.0001206030150753769,
@@ -1733,7 +1733,7 @@
1733
  "raw": 2.086049543676662e-05,
1734
  "metric_key": "macro_f1",
1735
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
1736
- "scope": "multi_episode_128_metadata_baseline",
1737
  "status": "scored",
1738
  "reason": null,
1739
  "normalized_score": 2.086049543676662e-05,
@@ -1822,7 +1822,7 @@
1822
  "raw": null,
1823
  "metric_key": "macro_f1",
1824
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
1825
- "scope": "multi_episode_128_metadata_baseline",
1826
  "status": "unsupported_without_required_target",
1827
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
1828
  "normalized_score": null,
@@ -1855,9 +1855,9 @@
1855
  "raw": null,
1856
  "metric_key": "macro_f1",
1857
  "source": null,
1858
- "scope": "multi_episode_128_metadata_baseline",
1859
  "status": "not_supported_by_metadata_only_package",
1860
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1861
  "normalized_score": null,
1862
  "raw_text": "n/a",
1863
  "status_label": "not supported"
@@ -1955,7 +1955,7 @@
1955
  "raw": 0.0,
1956
  "metric_key": "macro_f1",
1957
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
1958
- "scope": "multi_episode_128_metadata_baseline",
1959
  "status": "scored",
1960
  "reason": null,
1961
  "normalized_score": 0.0,
@@ -1966,7 +1966,7 @@
1966
  "raw": 0.0,
1967
  "metric_key": "macro_f1",
1968
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
1969
- "scope": "multi_episode_128_metadata_baseline",
1970
  "status": "scored",
1971
  "reason": null,
1972
  "normalized_score": 0.0,
@@ -2055,7 +2055,7 @@
2055
  "raw": 0.17656983343047333,
2056
  "metric_key": "micro_f1",
2057
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
2058
- "scope": "multi_episode_128_metadata_baseline",
2059
  "status": "scored",
2060
  "reason": null,
2061
  "normalized_score": 0.17656983343047333,
@@ -2066,7 +2066,7 @@
2066
  "raw": 0.17418550827844048,
2067
  "metric_key": "micro_f1",
2068
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
2069
- "scope": "multi_episode_128_metadata_baseline",
2070
  "status": "scored",
2071
  "reason": null,
2072
  "normalized_score": 0.17418550827844048,
@@ -2152,15 +2152,26 @@
2152
  "status_label": "scored"
2153
  },
2154
  "metadata128_simple": {
2155
- "raw": null,
2156
  "metric_key": "mae",
2157
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
2158
- "scope": "multi_episode_128_metadata_baseline",
2159
- "status": "unsupported_without_required_target",
2160
- "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package",
2161
- "normalized_score": null,
2162
- "raw_text": "n/a",
2163
- "status_label": "unsupported"
 
 
 
 
 
 
 
 
 
 
 
2164
  },
2165
  "raw128_simple": {
2166
  "raw": 0.22941437363624573,
@@ -2184,17 +2195,6 @@
2184
  "raw_text": "0.2530",
2185
  "status_label": "scored"
2186
  },
2187
- "metadata128_neural_mlp": {
2188
- "raw": null,
2189
- "metric_key": "mae",
2190
- "source": null,
2191
- "scope": "multi_episode_128_metadata_baseline",
2192
- "status": "not_supported_by_metadata_only_package",
2193
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2194
- "normalized_score": null,
2195
- "raw_text": "n/a",
2196
- "status_label": "not supported"
2197
- },
2198
  "qwen3_omni_v6_lora": {
2199
  "raw": null,
2200
  "metric_key": "mae",
@@ -2266,7 +2266,7 @@
2266
  "raw": null,
2267
  "metric_key": "mrr",
2268
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
2269
- "scope": "multi_episode_128_metadata_baseline",
2270
  "status": "unsupported_without_required_target",
2271
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
2272
  "normalized_score": null,
@@ -2299,9 +2299,9 @@
2299
  "raw": null,
2300
  "metric_key": "mrr",
2301
  "source": null,
2302
- "scope": "multi_episode_128_metadata_baseline",
2303
  "status": "not_supported_by_metadata_only_package",
2304
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2305
  "normalized_score": null,
2306
  "raw_text": "n/a",
2307
  "status_label": "not supported"
@@ -2388,7 +2388,7 @@
2388
  "raw": 624.8108520507812,
2389
  "metric_key": "mae",
2390
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
2391
- "scope": "multi_episode_128_metadata_baseline",
2392
  "status": "scored",
2393
  "reason": null,
2394
  "normalized_score": 0.016864874132806403,
@@ -2399,7 +2399,7 @@
2399
  "raw": 41.4664421081543,
2400
  "metric_key": "mae",
2401
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
2402
- "scope": "multi_episode_128_metadata_baseline",
2403
  "status": "scored",
2404
  "reason": null,
2405
  "normalized_score": 0.25411768748242325,
@@ -2456,18 +2456,18 @@
2456
  "model_branch_cards": [
2457
  {
2458
  "id": "metadata128_simple",
2459
- "title": "128ep Metadata Simple",
2460
  "status": "a100_rerun_pass",
2461
- "coverage": "20 records / 13 scored JSONL-supported axes",
2462
  "headline": "34,269 rows; train/val/test 25,629/4,608/4,032",
2463
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2464
  },
2465
  {
2466
  "id": "metadata128_neural_mlp",
2467
- "title": "128ep Metadata NN",
2468
  "status": "a100_rerun_pass",
2469
- "coverage": "20 records / 13 scored JSONL-supported axes",
2470
- "headline": "compact MLP heads over metadata/text features",
2471
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2472
  },
2473
  {
@@ -2562,7 +2562,7 @@
2562
  "task_id": "timeline_action",
2563
  "task_label": "Action Recognition",
2564
  "series_id": "metadata128_simple",
2565
- "method": "128ep Metadata Simple",
2566
  "status": "scored",
2567
  "status_label": "scored",
2568
  "scored": true,
@@ -2572,7 +2572,7 @@
2572
  "normalized_score": 0.008252821966746326,
2573
  "metric_key": "macro_f1",
2574
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
2575
- "scope": "multi_episode_128_metadata_baseline",
2576
  "reason": null
2577
  },
2578
  {
@@ -2580,7 +2580,7 @@
2580
  "task_id": "timeline_action",
2581
  "task_label": "Action Recognition",
2582
  "series_id": "metadata128_neural_mlp",
2583
- "method": "128ep Metadata NN",
2584
  "status": "scored",
2585
  "status_label": "scored",
2586
  "scored": true,
@@ -2590,7 +2590,7 @@
2590
  "normalized_score": 0.004175793689174209,
2591
  "metric_key": "macro_f1",
2592
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
2593
- "scope": "multi_episode_128_metadata_baseline",
2594
  "reason": null
2595
  },
2596
  {
@@ -2724,7 +2724,7 @@
2724
  "task_id": "timeline_subtask",
2725
  "task_label": "Procedure Step Recognition",
2726
  "series_id": "metadata128_simple",
2727
- "method": "128ep Metadata Simple",
2728
  "status": "scored",
2729
  "status_label": "scored",
2730
  "scored": true,
@@ -2734,7 +2734,7 @@
2734
  "normalized_score": 0.00019512195121951218,
2735
  "metric_key": "macro_f1",
2736
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
2737
- "scope": "multi_episode_128_metadata_baseline",
2738
  "reason": null
2739
  },
2740
  {
@@ -2742,7 +2742,7 @@
2742
  "task_id": "timeline_subtask",
2743
  "task_label": "Procedure Step Recognition",
2744
  "series_id": "metadata128_neural_mlp",
2745
- "method": "128ep Metadata NN",
2746
  "status": "scored",
2747
  "status_label": "scored",
2748
  "scored": true,
@@ -2752,7 +2752,7 @@
2752
  "normalized_score": 7.207207207207208e-05,
2753
  "metric_key": "macro_f1",
2754
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
2755
- "scope": "multi_episode_128_metadata_baseline",
2756
  "reason": null
2757
  },
2758
  {
@@ -2886,7 +2886,7 @@
2886
  "task_id": "transition_detection",
2887
  "task_label": "Action Boundary Detection",
2888
  "series_id": "metadata128_simple",
2889
- "method": "128ep Metadata Simple",
2890
  "status": "scored",
2891
  "status_label": "scored",
2892
  "scored": true,
@@ -2896,7 +2896,7 @@
2896
  "normalized_score": 0.29652162550029315,
2897
  "metric_key": "macro_f1",
2898
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
2899
- "scope": "multi_episode_128_metadata_baseline",
2900
  "reason": null
2901
  },
2902
  {
@@ -2904,7 +2904,7 @@
2904
  "task_id": "transition_detection",
2905
  "task_label": "Action Boundary Detection",
2906
  "series_id": "metadata128_neural_mlp",
2907
- "method": "128ep Metadata NN",
2908
  "status": "scored",
2909
  "status_label": "scored",
2910
  "scored": true,
@@ -2914,7 +2914,7 @@
2914
  "normalized_score": 0.4841733292368365,
2915
  "metric_key": "macro_f1",
2916
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
2917
- "scope": "multi_episode_128_metadata_baseline",
2918
  "reason": null
2919
  },
2920
  {
@@ -3048,7 +3048,7 @@
3048
  "task_id": "next_action",
3049
  "task_label": "Next-Action Prediction",
3050
  "series_id": "metadata128_simple",
3051
- "method": "128ep Metadata Simple",
3052
  "status": "scored",
3053
  "status_label": "scored",
3054
  "scored": true,
@@ -3058,7 +3058,7 @@
3058
  "normalized_score": 0.006514774539765508,
3059
  "metric_key": "macro_f1",
3060
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
3061
- "scope": "multi_episode_128_metadata_baseline",
3062
  "reason": null
3063
  },
3064
  {
@@ -3066,7 +3066,7 @@
3066
  "task_id": "next_action",
3067
  "task_label": "Next-Action Prediction",
3068
  "series_id": "metadata128_neural_mlp",
3069
- "method": "128ep Metadata NN",
3070
  "status": "scored",
3071
  "status_label": "scored",
3072
  "scored": true,
@@ -3076,7 +3076,7 @@
3076
  "normalized_score": 0.004910507980164745,
3077
  "metric_key": "macro_f1",
3078
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
3079
- "scope": "multi_episode_128_metadata_baseline",
3080
  "reason": null
3081
  },
3082
  {
@@ -3210,36 +3210,36 @@
3210
  "task_id": "hand_trajectory_forecast",
3211
  "task_label": "Hand Trajectory Forecasting",
3212
  "series_id": "metadata128_simple",
3213
- "method": "128ep Metadata Simple",
3214
- "status": "unsupported_without_required_target",
3215
- "status_label": "unsupported",
3216
- "scored": false,
3217
  "proxy_scored": false,
3218
- "raw": null,
3219
- "raw_text": "n/a",
3220
- "normalized_score": null,
3221
  "metric_key": "mpjpe",
3222
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
3223
- "scope": "multi_episode_128_metadata_baseline",
3224
- "reason": "requires future hand-joint trajectories from raw sensor feature NPZ blocks, which are not in the public 128 package"
3225
  },
3226
  {
3227
  "task_number": 5,
3228
  "task_id": "hand_trajectory_forecast",
3229
  "task_label": "Hand Trajectory Forecasting",
3230
  "series_id": "metadata128_neural_mlp",
3231
- "method": "128ep Metadata NN",
3232
- "status": "not_supported_by_metadata_only_package",
3233
- "status_label": "not supported",
3234
- "scored": false,
3235
  "proxy_scored": false,
3236
- "raw": null,
3237
- "raw_text": "n/a",
3238
- "normalized_score": null,
3239
  "metric_key": "mpjpe",
3240
- "source": null,
3241
- "scope": "multi_episode_128_metadata_baseline",
3242
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3243
  },
3244
  {
3245
  "task_number": 5,
@@ -3372,7 +3372,7 @@
3372
  "task_id": "contact_prediction",
3373
  "task_label": "Contact State Prediction",
3374
  "series_id": "metadata128_simple",
3375
- "method": "128ep Metadata Simple",
3376
  "status": "scored",
3377
  "status_label": "scored",
3378
  "scored": true,
@@ -3382,7 +3382,7 @@
3382
  "normalized_score": 0.4381481308057444,
3383
  "metric_key": "macro_f1",
3384
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
3385
- "scope": "multi_episode_128_metadata_baseline",
3386
  "reason": null
3387
  },
3388
  {
@@ -3390,7 +3390,7 @@
3390
  "task_id": "contact_prediction",
3391
  "task_label": "Contact State Prediction",
3392
  "series_id": "metadata128_neural_mlp",
3393
- "method": "128ep Metadata NN",
3394
  "status": "scored",
3395
  "status_label": "scored",
3396
  "scored": true,
@@ -3400,7 +3400,7 @@
3400
  "normalized_score": 0.5682695682695682,
3401
  "metric_key": "macro_f1",
3402
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
3403
- "scope": "multi_episode_128_metadata_baseline",
3404
  "reason": null
3405
  },
3406
  {
@@ -3534,7 +3534,7 @@
3534
  "task_id": "object_relevance",
3535
  "task_label": "Object Relevance Prediction",
3536
  "series_id": "metadata128_simple",
3537
- "method": "128ep Metadata Simple",
3538
  "status": "scored",
3539
  "status_label": "scored",
3540
  "scored": true,
@@ -3544,7 +3544,7 @@
3544
  "normalized_score": 0.17764578833693304,
3545
  "metric_key": "micro_f1",
3546
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
3547
- "scope": "multi_episode_128_metadata_baseline",
3548
  "reason": null
3549
  },
3550
  {
@@ -3552,7 +3552,7 @@
3552
  "task_id": "object_relevance",
3553
  "task_label": "Object Relevance Prediction",
3554
  "series_id": "metadata128_neural_mlp",
3555
- "method": "128ep Metadata NN",
3556
  "status": "scored",
3557
  "status_label": "scored",
3558
  "scored": true,
@@ -3562,7 +3562,7 @@
3562
  "normalized_score": 0.18662723837686876,
3563
  "metric_key": "micro_f1",
3564
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
3565
- "scope": "multi_episode_128_metadata_baseline",
3566
  "reason": null
3567
  },
3568
  {
@@ -3696,7 +3696,7 @@
3696
  "task_id": "caption_grounding",
3697
  "task_label": "Language Grounding",
3698
  "series_id": "metadata128_simple",
3699
- "method": "128ep Metadata Simple",
3700
  "status": "scored",
3701
  "status_label": "scored",
3702
  "scored": true,
@@ -3706,7 +3706,7 @@
3706
  "normalized_score": 0.002332374220713973,
3707
  "metric_key": "mrr",
3708
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
3709
- "scope": "multi_episode_128_metadata_baseline",
3710
  "reason": null
3711
  },
3712
  {
@@ -3714,7 +3714,7 @@
3714
  "task_id": "caption_grounding",
3715
  "task_label": "Language Grounding",
3716
  "series_id": "metadata128_neural_mlp",
3717
- "method": "128ep Metadata NN",
3718
  "status": "scored",
3719
  "status_label": "scored",
3720
  "scored": true,
@@ -3724,7 +3724,7 @@
3724
  "normalized_score": 0.008236799389123917,
3725
  "metric_key": "mrr",
3726
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
3727
- "scope": "multi_episode_128_metadata_baseline",
3728
  "reason": null
3729
  },
3730
  {
@@ -3858,36 +3858,36 @@
3858
  "task_id": "cross_modal_retrieval",
3859
  "task_label": "Cross-Modal Retrieval",
3860
  "series_id": "metadata128_simple",
3861
- "method": "128ep Metadata Simple",
3862
- "status": "unsupported_without_required_target",
3863
- "status_label": "unsupported",
3864
- "scored": false,
3865
  "proxy_scored": false,
3866
- "raw": null,
3867
- "raw_text": "n/a",
3868
- "normalized_score": null,
3869
  "metric_key": "mrr",
3870
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
3871
- "scope": "multi_episode_128_metadata_baseline",
3872
- "reason": "requires paired motion/IMU/camera/audio/depth feature blocks, which are not in the public 128 package"
3873
  },
3874
  {
3875
  "task_number": 9,
3876
  "task_id": "cross_modal_retrieval",
3877
  "task_label": "Cross-Modal Retrieval",
3878
  "series_id": "metadata128_neural_mlp",
3879
- "method": "128ep Metadata NN",
3880
- "status": "not_supported_by_metadata_only_package",
3881
- "status_label": "not supported",
3882
- "scored": false,
3883
  "proxy_scored": false,
3884
- "raw": null,
3885
- "raw_text": "n/a",
3886
- "normalized_score": null,
3887
  "metric_key": "mrr",
3888
- "source": null,
3889
- "scope": "multi_episode_128_metadata_baseline",
3890
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3891
  },
3892
  {
3893
  "task_number": 9,
@@ -4020,36 +4020,36 @@
4020
  "task_id": "modality_reconstruction",
4021
  "task_label": "Cross-Modal Reconstruction",
4022
  "series_id": "metadata128_simple",
4023
- "method": "128ep Metadata Simple",
4024
- "status": "unsupported_without_required_target",
4025
- "status_label": "unsupported",
4026
- "scored": false,
4027
  "proxy_scored": false,
4028
- "raw": null,
4029
- "raw_text": "n/a",
4030
- "normalized_score": null,
4031
  "metric_key": "r2",
4032
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
4033
- "scope": "multi_episode_128_metadata_baseline",
4034
- "reason": "requires source and target modality feature blocks such as depth/video vectors, which are not in the public 128 package"
4035
  },
4036
  {
4037
  "task_number": 10,
4038
  "task_id": "modality_reconstruction",
4039
  "task_label": "Cross-Modal Reconstruction",
4040
  "series_id": "metadata128_neural_mlp",
4041
- "method": "128ep Metadata NN",
4042
- "status": "not_supported_by_metadata_only_package",
4043
- "status_label": "not supported",
4044
- "scored": false,
4045
  "proxy_scored": false,
4046
- "raw": null,
4047
- "raw_text": "n/a",
4048
- "normalized_score": null,
4049
  "metric_key": "r2",
4050
- "source": null,
4051
- "scope": "multi_episode_128_metadata_baseline",
4052
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4053
  },
4054
  {
4055
  "task_number": 10,
@@ -4182,7 +4182,7 @@
4182
  "task_id": "temporal_order",
4183
  "task_label": "Temporal Order Verification",
4184
  "series_id": "metadata128_simple",
4185
- "method": "128ep Metadata Simple",
4186
  "status": "scored",
4187
  "status_label": "scored",
4188
  "scored": true,
@@ -4192,7 +4192,7 @@
4192
  "normalized_score": 0.4198864140782312,
4193
  "metric_key": "f1",
4194
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
4195
- "scope": "multi_episode_128_metadata_baseline",
4196
  "reason": null
4197
  },
4198
  {
@@ -4200,7 +4200,7 @@
4200
  "task_id": "temporal_order",
4201
  "task_label": "Temporal Order Verification",
4202
  "series_id": "metadata128_neural_mlp",
4203
- "method": "128ep Metadata NN",
4204
  "status": "scored",
4205
  "status_label": "scored",
4206
  "scored": true,
@@ -4210,7 +4210,7 @@
4210
  "normalized_score": 0.8252408266656923,
4211
  "metric_key": "f1",
4212
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
4213
- "scope": "multi_episode_128_metadata_baseline",
4214
  "reason": null
4215
  },
4216
  {
@@ -4344,36 +4344,36 @@
4344
  "task_id": "misalignment_detection",
4345
  "task_label": "Multimodal Synchronization Detection",
4346
  "series_id": "metadata128_simple",
4347
- "method": "128ep Metadata Simple",
4348
- "status": "unsupported_without_required_target",
4349
- "status_label": "unsupported",
4350
- "scored": false,
4351
  "proxy_scored": false,
4352
- "raw": null,
4353
- "raw_text": "n/a",
4354
- "normalized_score": null,
4355
  "metric_key": "f1",
4356
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
4357
- "scope": "multi_episode_128_metadata_baseline",
4358
- "reason": "requires deliberately shifted cross-modal feature pairs, which cannot be reconstructed from the public JSONL labels alone"
4359
  },
4360
  {
4361
  "task_number": 12,
4362
  "task_id": "misalignment_detection",
4363
  "task_label": "Multimodal Synchronization Detection",
4364
  "series_id": "metadata128_neural_mlp",
4365
- "method": "128ep Metadata NN",
4366
- "status": "not_supported_by_metadata_only_package",
4367
- "status_label": "not supported",
4368
- "scored": false,
4369
  "proxy_scored": false,
4370
- "raw": null,
4371
- "raw_text": "n/a",
4372
- "normalized_score": null,
4373
  "metric_key": "f1",
4374
- "source": null,
4375
- "scope": "multi_episode_128_metadata_baseline",
4376
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4377
  },
4378
  {
4379
  "task_number": 12,
@@ -4506,7 +4506,7 @@
4506
  "task_id": "long_horizon_next_action",
4507
  "task_label": "Long-Horizon Next-Action Forecasting",
4508
  "series_id": "metadata128_simple",
4509
- "method": "128ep Metadata Simple",
4510
  "status": "scored",
4511
  "status_label": "scored",
4512
  "scored": true,
@@ -4516,7 +4516,7 @@
4516
  "normalized_score": 0.004579592783699693,
4517
  "metric_key": "macro_f1",
4518
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
4519
- "scope": "multi_episode_128_metadata_baseline",
4520
  "reason": null
4521
  },
4522
  {
@@ -4524,7 +4524,7 @@
4524
  "task_id": "long_horizon_next_action",
4525
  "task_label": "Long-Horizon Next-Action Forecasting",
4526
  "series_id": "metadata128_neural_mlp",
4527
- "method": "128ep Metadata NN",
4528
  "status": "scored",
4529
  "status_label": "scored",
4530
  "scored": true,
@@ -4534,7 +4534,7 @@
4534
  "normalized_score": 0.0029821307969142615,
4535
  "metric_key": "macro_f1",
4536
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
4537
- "scope": "multi_episode_128_metadata_baseline",
4538
  "reason": null
4539
  },
4540
  {
@@ -4668,7 +4668,7 @@
4668
  "task_id": "next_subtask_forecast",
4669
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4670
  "series_id": "metadata128_simple",
4671
- "method": "128ep Metadata Simple",
4672
  "status": "scored",
4673
  "status_label": "scored",
4674
  "scored": true,
@@ -4678,7 +4678,7 @@
4678
  "normalized_score": 0.0001206030150753769,
4679
  "metric_key": "macro_f1",
4680
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
4681
- "scope": "multi_episode_128_metadata_baseline",
4682
  "reason": null
4683
  },
4684
  {
@@ -4686,7 +4686,7 @@
4686
  "task_id": "next_subtask_forecast",
4687
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4688
  "series_id": "metadata128_neural_mlp",
4689
- "method": "128ep Metadata NN",
4690
  "status": "scored",
4691
  "status_label": "scored",
4692
  "scored": true,
@@ -4696,7 +4696,7 @@
4696
  "normalized_score": 2.086049543676662e-05,
4697
  "metric_key": "macro_f1",
4698
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
4699
- "scope": "multi_episode_128_metadata_baseline",
4700
  "reason": null
4701
  },
4702
  {
@@ -4830,7 +4830,7 @@
4830
  "task_id": "interaction_text_prediction",
4831
  "task_label": "Interaction Text Prediction",
4832
  "series_id": "metadata128_simple",
4833
- "method": "128ep Metadata Simple",
4834
  "status": "unsupported_without_required_target",
4835
  "status_label": "unsupported",
4836
  "scored": false,
@@ -4840,7 +4840,7 @@
4840
  "normalized_score": null,
4841
  "metric_key": "macro_f1",
4842
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
4843
- "scope": "multi_episode_128_metadata_baseline",
4844
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
4845
  },
4846
  {
@@ -4848,7 +4848,7 @@
4848
  "task_id": "interaction_text_prediction",
4849
  "task_label": "Interaction Text Prediction",
4850
  "series_id": "metadata128_neural_mlp",
4851
- "method": "128ep Metadata NN",
4852
  "status": "not_supported_by_metadata_only_package",
4853
  "status_label": "not supported",
4854
  "scored": false,
@@ -4858,8 +4858,8 @@
4858
  "normalized_score": null,
4859
  "metric_key": "macro_f1",
4860
  "source": null,
4861
- "scope": "multi_episode_128_metadata_baseline",
4862
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4863
  },
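Reviewer note: every scoreless record in these hunks keeps an explicit `status` and `reason`, while scored records carry `"reason": null`. A minimal Python sketch of a consistency check follows. The helper name `check_record` and the exact rules are assumptions for illustration, not part of the repository.

```python
def check_record(record):
    """Return a list of policy violations for one method-task record.

    Assumed rules (inferred from the records in this diff, not from repo code):
    - a record with status "scored" has a numeric normalized_score and no reason
    - any other status has a null normalized_score and a non-empty reason
    """
    problems = []
    has_score = record.get("normalized_score") is not None
    if record.get("status") == "scored":
        if not has_score:
            problems.append("scored record has no normalized_score")
        if record.get("reason") is not None:
            problems.append("scored record carries a reason")
    else:
        if has_score:
            problems.append("scoreless record has a normalized_score")
        if not record.get("reason"):
            problems.append("scoreless record has no reason")
    return problems


# Field values copied from the camera_view_sync_retrieval record in this diff.
unsupported = {
    "task_id": "camera_view_sync_retrieval",
    "series_id": "metadata128_simple",
    "status": "unsupported_without_required_target",
    "scored": False,
    "normalized_score": None,
    "metric_key": "mrr",
    "reason": "requires paired camera-view feature blocks, "
              "which are not in the public 128 JSONL metadata package",
}
print(check_record(unsupported))
```

Running the same check over all 180 method-task records would flag any record that lost its reason during a status change.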
4864
  {
4865
  "task_number": 15,
@@ -4992,7 +4992,7 @@
4992
  "task_id": "action_object_relation",
4993
  "task_label": "Action-Object Relation Prediction",
4994
  "series_id": "metadata128_simple",
4995
- "method": "128ep Metadata Simple",
4996
  "status": "scored",
4997
  "status_label": "scored",
4998
  "scored": true,
@@ -5002,7 +5002,7 @@
5002
  "normalized_score": 0.0,
5003
  "metric_key": "macro_f1",
5004
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
5005
- "scope": "multi_episode_128_metadata_baseline",
5006
  "reason": null
5007
  },
5008
  {
@@ -5010,7 +5010,7 @@
5010
  "task_id": "action_object_relation",
5011
  "task_label": "Action-Object Relation Prediction",
5012
  "series_id": "metadata128_neural_mlp",
5013
- "method": "128ep Metadata NN",
5014
  "status": "scored",
5015
  "status_label": "scored",
5016
  "scored": true,
@@ -5020,7 +5020,7 @@
5020
  "normalized_score": 0.0,
5021
  "metric_key": "macro_f1",
5022
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
5023
- "scope": "multi_episode_128_metadata_baseline",
5024
  "reason": null
5025
  },
5026
  {
@@ -5154,7 +5154,7 @@
5154
  "task_id": "object_set_forecast",
5155
  "task_label": "Future Object-Set Forecasting",
5156
  "series_id": "metadata128_simple",
5157
- "method": "128ep Metadata Simple",
5158
  "status": "scored",
5159
  "status_label": "scored",
5160
  "scored": true,
@@ -5164,7 +5164,7 @@
5164
  "normalized_score": 0.17656983343047333,
5165
  "metric_key": "micro_f1",
5166
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
5167
- "scope": "multi_episode_128_metadata_baseline",
5168
  "reason": null
5169
  },
5170
  {
@@ -5172,7 +5172,7 @@
5172
  "task_id": "object_set_forecast",
5173
  "task_label": "Future Object-Set Forecasting",
5174
  "series_id": "metadata128_neural_mlp",
5175
- "method": "128ep Metadata NN",
5176
  "status": "scored",
5177
  "status_label": "scored",
5178
  "scored": true,
@@ -5182,7 +5182,7 @@
5182
  "normalized_score": 0.17418550827844048,
5183
  "metric_key": "micro_f1",
5184
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
5185
- "scope": "multi_episode_128_metadata_baseline",
5186
  "reason": null
5187
  },
5188
  {
@@ -5316,36 +5316,36 @@
5316
  "task_id": "imu_to_hand_pose",
5317
  "task_label": "IMU-to-Hand Pose Reconstruction",
5318
  "series_id": "metadata128_simple",
5319
- "method": "128ep Metadata Simple",
5320
- "status": "unsupported_without_required_target",
5321
- "status_label": "unsupported",
5322
- "scored": false,
5323
  "proxy_scored": false,
5324
- "raw": null,
5325
- "raw_text": "n/a",
5326
- "normalized_score": null,
5327
  "metric_key": "mae",
5328
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
5329
- "scope": "multi_episode_128_metadata_baseline",
5330
- "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package"
5331
  },
5332
  {
5333
  "task_number": 18,
5334
  "task_id": "imu_to_hand_pose",
5335
  "task_label": "IMU-to-Hand Pose Reconstruction",
5336
  "series_id": "metadata128_neural_mlp",
5337
- "method": "128ep Metadata NN",
5338
- "status": "not_supported_by_metadata_only_package",
5339
- "status_label": "not supported",
5340
- "scored": false,
5341
  "proxy_scored": false,
5342
- "raw": null,
5343
- "raw_text": "n/a",
5344
- "normalized_score": null,
5345
  "metric_key": "mae",
5346
- "source": null,
5347
- "scope": "multi_episode_128_metadata_baseline",
5348
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5349
  },
5350
  {
5351
  "task_number": 18,
@@ -5478,7 +5478,7 @@
5478
  "task_id": "camera_view_sync_retrieval",
5479
  "task_label": "Camera-View Synchronization Retrieval",
5480
  "series_id": "metadata128_simple",
5481
- "method": "128ep Metadata Simple",
5482
  "status": "unsupported_without_required_target",
5483
  "status_label": "unsupported",
5484
  "scored": false,
@@ -5488,7 +5488,7 @@
5488
  "normalized_score": null,
5489
  "metric_key": "mrr",
5490
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
5491
- "scope": "multi_episode_128_metadata_baseline",
5492
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
5493
  },
5494
  {
@@ -5496,7 +5496,7 @@
5496
  "task_id": "camera_view_sync_retrieval",
5497
  "task_label": "Camera-View Synchronization Retrieval",
5498
  "series_id": "metadata128_neural_mlp",
5499
- "method": "128ep Metadata NN",
5500
  "status": "not_supported_by_metadata_only_package",
5501
  "status_label": "not supported",
5502
  "scored": false,
@@ -5506,8 +5506,8 @@
5506
  "normalized_score": null,
5507
  "metric_key": "mrr",
5508
  "source": null,
5509
- "scope": "multi_episode_128_metadata_baseline",
5510
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5511
  },
5512
  {
5513
  "task_number": 19,
@@ -5640,7 +5640,7 @@
5640
  "task_id": "time_to_transition",
5641
  "task_label": "Time-to-Next-Transition Regression",
5642
  "series_id": "metadata128_simple",
5643
- "method": "128ep Metadata Simple",
5644
  "status": "scored",
5645
  "status_label": "scored",
5646
  "scored": true,
@@ -5650,7 +5650,7 @@
5650
  "normalized_score": 0.016864874132806403,
5651
  "metric_key": "mae",
5652
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
5653
- "scope": "multi_episode_128_metadata_baseline",
5654
  "reason": null
5655
  },
5656
  {
@@ -5658,7 +5658,7 @@
5658
  "task_id": "time_to_transition",
5659
  "task_label": "Time-to-Next-Transition Regression",
5660
  "series_id": "metadata128_neural_mlp",
5661
- "method": "128ep Metadata NN",
5662
  "status": "scored",
5663
  "status_label": "scored",
5664
  "scored": true,
@@ -5668,7 +5668,7 @@
5668
  "normalized_score": 0.25411768748242325,
5669
  "metric_key": "mae",
5670
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
5671
- "scope": "multi_episode_128_metadata_baseline",
5672
  "reason": null
5673
  },
5674
  {
 
1
  {
2
  "title": "Unified 20-Task Model Radar",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:52:26+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
+ "scored_method_task_count": 143,
9
  "normalization_policy": {
10
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
11
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
12
  "raw_values": "raw metric values, metric keys, and sources are retained in this JSON; the SVG is an overview, not a replacement for the metric table",
13
  "result_record_policy": "every method has 20 task records; records without a numeric score carry explicit unsupported/not-evaluated status and reason fields",
14
  "foundation_model_overlay": "Qwen3/Cosmos points are plotted only on task-aligned axes. Scoreless records mean the public result does not evaluate that task contract.",
15
+ "metadata_128_overlay": "128-episode aligned baselines have 20 records. Numeric scores come from JSONL metadata/text tasks plus staged sensor-block targets when the processed target exists; raw interaction text and paired camera-view embeddings remain explicit gaps.",
16
  "raw_128_overlay": "128-episode raw-feature baselines use staged sensor NPZ features. Eighteen axes use direct task targets; interaction text and camera-view sync are completed with documented compact proxies because raw interaction strings and paired video-view embeddings are absent from the 128 export."
17
  },
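Reviewer note: the two rules in `normalization_policy` can be sketched in Python as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, the best observed value per task is passed in by the caller because it does not appear in this diff, and rejecting a non-positive error value is an illustrative choice rather than documented behavior.

```python
def normalize_higher_is_better(raw):
    """Clip a bounded metric (for example F1, MRR, R2) to the [0, 1] axis."""
    if raw is None:
        return None  # scoreless records stay scoreless
    return min(1.0, max(0.0, raw))


def normalize_lower_is_better(raw, best_observed):
    """Map an error metric (for example MAE, MPJPE) to best_observed / raw.

    best_observed is the lowest raw value seen for the same task, so the
    best method lands on 1.0 and worse methods shrink toward 0.
    """
    if raw is None:
        return None
    if raw <= 0:
        # Assumption: a zero or negative error is treated as invalid input.
        raise ValueError("lower-is-better metrics must be positive")
    return best_observed / raw


# A negative R2 clips to the bottom of the axis.
print(normalize_higher_is_better(-190.66106203944798))
# Hypothetical best_observed of 0.1 for an error metric.
print(normalize_lower_is_better(0.4, 0.1))
```

Because both lower-is-better scores in a task share the same numerator, the ratio of two normalized scores equals the inverse ratio of their raw errors, which gives a quick way to cross-check a pair of records.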
18
  "series": [
 
64
  },
65
  {
66
  "id": "metadata128_simple",
67
+ "label": "128ep Aligned Simple",
68
  "short_label": "128-S",
69
  "color": "#ffd166",
70
+ "kind": "partial_128_episode_aligned_baseline",
71
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
72
  "stroke_dasharray": "9 6",
73
+ "method_detail": "128-episode aligned simple baselines: JSONL metadata/text tasks plus staged sensor-block tasks where the processed target exists.",
74
  "plotted_as": "colored point overlay",
75
  "result_record_count": 20,
76
+ "scored_task_count": 18,
77
+ "covered_task_count": 18,
78
  "proxy_scored_task_count": 0,
79
+ "scoreless_task_count": 2,
80
+ "unsupported_task_count": 2,
81
  "not_evaluated_task_count": 0,
82
  "status_counts": {
83
+ "scored": 18,
84
+ "unsupported_without_required_target": 2
85
  },
86
+ "coverage_fraction": 0.9,
87
  "result_record_fraction": 1.0
88
  },
89
  {
90
  "id": "metadata128_neural_mlp",
91
+ "label": "128ep Aligned NN",
92
  "short_label": "128-NN",
93
  "color": "#f472b6",
94
+ "kind": "partial_128_episode_aligned_baseline",
95
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
96
  "stroke_dasharray": "3 6",
97
+ "method_detail": "128-episode aligned MLP baselines: JSONL metadata/text tasks plus staged sensor-block tasks where the processed target exists.",
98
  "plotted_as": "colored point overlay",
99
  "result_record_count": 20,
100
+ "scored_task_count": 18,
101
+ "covered_task_count": 18,
102
  "proxy_scored_task_count": 0,
103
+ "scoreless_task_count": 2,
104
+ "unsupported_task_count": 2,
105
  "not_evaluated_task_count": 0,
106
  "status_counts": {
107
+ "not_supported_by_metadata_only_package": 2,
108
+ "scored": 18
109
  },
110
+ "coverage_fraction": 0.9,
111
  "result_record_fraction": 1.0
112
  },
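Reviewer note: the per-series counts added in this hunk (`scored_task_count`, `scoreless_task_count`, `status_counts`, `coverage_fraction`) can be derived from the 20 task records of one method. The sketch below assumes a hypothetical helper `summarize_series` and computes the fraction from scored records only; in the two series shown here `covered_task_count` equals `scored_task_count`, so the result is the same.

```python
from collections import Counter


def summarize_series(records):
    """Summarize one method's task records into radar coverage counts."""
    total = len(records)
    scored = sum(1 for r in records if r.get("scored"))
    return {
        "result_record_count": total,
        "scored_task_count": scored,
        "scoreless_task_count": total - scored,
        "status_counts": dict(Counter(r["status"] for r in records)),
        "coverage_fraction": scored / total if total else 0.0,
    }


# 18 scored and 2 unsupported records, mirroring metadata128_simple above.
records = (
    [{"status": "scored", "scored": True}] * 18
    + [{"status": "unsupported_without_required_target", "scored": False}] * 2
)
summary = summarize_series(records)
print(summary)
```

The same helper applied to each of the 9 series should produce record counts that sum to `method_task_record_count` (180) and scored counts that sum to `scored_method_task_count`.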
113
  {
 
301
  "raw": 0.008252821966746326,
302
  "metric_key": "macro_f1",
303
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
304
+ "scope": "multi_episode_128_aligned_baseline",
305
  "status": "scored",
306
  "reason": null,
307
  "normalized_score": 0.008252821966746326,
 
312
  "raw": 0.004175793689174209,
313
  "metric_key": "macro_f1",
314
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
315
+ "scope": "multi_episode_128_aligned_baseline",
316
  "status": "scored",
317
  "reason": null,
318
  "normalized_score": 0.004175793689174209,
 
401
  "raw": 0.00019512195121951218,
402
  "metric_key": "macro_f1",
403
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
404
+ "scope": "multi_episode_128_aligned_baseline",
405
  "status": "scored",
406
  "reason": null,
407
  "normalized_score": 0.00019512195121951218,
 
412
  "raw": 7.207207207207208e-05,
413
  "metric_key": "macro_f1",
414
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
415
+ "scope": "multi_episode_128_aligned_baseline",
416
  "status": "scored",
417
  "reason": null,
418
  "normalized_score": 7.207207207207208e-05,
 
523
  "raw": 0.29652162550029315,
524
  "metric_key": "macro_f1",
525
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
526
+ "scope": "multi_episode_128_aligned_baseline",
527
  "status": "scored",
528
  "reason": null,
529
  "normalized_score": 0.29652162550029315,
 
534
  "raw": 0.4841733292368365,
535
  "metric_key": "macro_f1",
536
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
537
+ "scope": "multi_episode_128_aligned_baseline",
538
  "status": "scored",
539
  "reason": null,
540
  "normalized_score": 0.4841733292368365,
 
634
  "raw": 0.006514774539765508,
635
  "metric_key": "macro_f1",
636
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
637
+ "scope": "multi_episode_128_aligned_baseline",
638
  "status": "scored",
639
  "reason": null,
640
  "normalized_score": 0.006514774539765508,
 
645
  "raw": 0.004910507980164745,
646
  "metric_key": "macro_f1",
647
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
648
+ "scope": "multi_episode_128_aligned_baseline",
649
  "status": "scored",
650
  "reason": null,
651
  "normalized_score": 0.004910507980164745,
 
709
  "status_label": "scored"
710
  },
711
  "metadata128_simple": {
712
+ "raw": 8.817333221435547,
713
  "metric_key": "mpjpe",
714
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
715
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
716
+ "status": "scored",
717
+ "reason": null,
718
+ "normalized_score": 0.012231610603598841,
719
+ "raw_text": "8.817",
720
+ "status_label": "scored"
721
+ },
722
+ "metadata128_neural_mlp": {
723
+ "raw": 0.429434210062027,
724
+ "metric_key": "mpjpe",
725
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/hand_trajectory_forecast/metrics.json",
726
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
727
+ "status": "scored",
728
+ "reason": null,
729
+ "normalized_score": 0.25114484128127007,
730
+ "raw_text": "0.4294",
731
+ "status_label": "scored"
732
  },
733
  "raw128_simple": {
734
  "raw": 0.2729249894618988,
 
752
  "raw_text": "0.1848",
753
  "status_label": "scored"
754
  },
755
  "qwen3_omni_v6_lora": {
756
  "raw": null,
757
  "metric_key": "mpjpe",
 
856
  "raw": 0.4381481308057444,
857
  "metric_key": "macro_f1",
858
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
859
+ "scope": "multi_episode_128_aligned_baseline",
860
  "status": "scored",
861
  "reason": null,
862
  "normalized_score": 0.4381481308057444,
 
867
  "raw": 0.5682695682695682,
868
  "metric_key": "macro_f1",
869
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
870
+ "scope": "multi_episode_128_aligned_baseline",
871
  "status": "scored",
872
  "reason": null,
873
  "normalized_score": 0.5682695682695682,
 
956
  "raw": 0.17764578833693304,
957
  "metric_key": "micro_f1",
958
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
959
+ "scope": "multi_episode_128_aligned_baseline",
960
  "status": "scored",
961
  "reason": null,
962
  "normalized_score": 0.17764578833693304,
 
967
  "raw": 0.18662723837686876,
968
  "metric_key": "micro_f1",
969
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
970
+ "scope": "multi_episode_128_aligned_baseline",
971
  "status": "scored",
972
  "reason": null,
973
  "normalized_score": 0.18662723837686876,
 
1056
  "raw": 0.002332374220713973,
1057
  "metric_key": "mrr",
1058
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
1059
+ "scope": "multi_episode_128_aligned_baseline",
1060
  "status": "scored",
1061
  "reason": null,
1062
  "normalized_score": 0.002332374220713973,
 
1067
  "raw": 0.008236799389123917,
1068
  "metric_key": "mrr",
1069
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
1070
+ "scope": "multi_episode_128_aligned_baseline",
1071
  "status": "scored",
1072
  "reason": null,
1073
  "normalized_score": 0.008236799389123917,
 
1175
  "status_label": "scored"
1176
  },
1177
  "metadata128_simple": {
1178
+ "raw": 0.002587692579254508,
1179
  "metric_key": "mrr",
1180
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
1181
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1182
+ "status": "scored",
1183
+ "reason": null,
1184
+ "normalized_score": 0.002587692579254508,
1185
+ "raw_text": "0.0026",
1186
+ "status_label": "scored"
1187
+ },
1188
+ "metadata128_neural_mlp": {
1189
+ "raw": 0.0026067993603646755,
1190
+ "metric_key": "mrr",
1191
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/cross_modal_retrieval/metrics.json",
1192
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1193
+ "status": "scored",
1194
+ "reason": null,
1195
+ "normalized_score": 0.0026067993603646755,
1196
+ "raw_text": "0.0026",
1197
+ "status_label": "scored"
1198
  },
1199
  "raw128_simple": {
1200
  "raw": 0.003459817497059703,
 
1218
  "raw_text": "0.0025",
1219
  "status_label": "scored"
1220
  },
1221
  "cosmos3_super_reasoner": {
1222
  "raw": null,
1223
  "metric_key": "mrr",
 
1264
  "status_label": "scored"
1265
  },
1266
  "metadata128_simple": {
1267
+ "raw": -190.66106203944798,
1268
  "metric_key": "r2",
1269
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
1270
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1271
+ "status": "scored",
1272
+ "reason": null,
1273
+ "normalized_score": 0.0,
1274
+ "raw_text": "-190.66",
1275
+ "status_label": "scored"
1276
+ },
1277
+ "metadata128_neural_mlp": {
1278
+ "raw": -0.43481132003942147,
1279
+ "metric_key": "r2",
1280
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/modality_reconstruction/metrics.json",
1281
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1282
+ "status": "scored",
1283
+ "reason": null,
1284
+ "normalized_score": 0.0,
1285
+ "raw_text": "-0.4348",
1286
+ "status_label": "scored"
1287
  },
1288
  "raw128_simple": {
1289
  "raw": -1.3450960391924882,
 
1307
  "raw_text": "-1.397",
1308
  "status_label": "scored"
1309
  },
1310
  "qwen3_omni_v6_lora": {
1311
  "raw": null,
1312
  "metric_key": "r2",
 
1389
  "raw": 0.4198864140782312,
1390
  "metric_key": "f1",
1391
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
1392
+ "scope": "multi_episode_128_aligned_baseline",
1393
  "status": "scored",
1394
  "reason": null,
1395
  "normalized_score": 0.4198864140782312,
 
1400
  "raw": 0.8252408266656923,
1401
  "metric_key": "f1",
1402
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
1403
+ "scope": "multi_episode_128_aligned_baseline",
1404
  "status": "scored",
1405
  "reason": null,
1406
  "normalized_score": 0.8252408266656923,
 
1497
  "status_label": "scored"
1498
  },
1499
  "metadata128_simple": {
1500
+ "raw": 0.49980060227663614,
1501
  "metric_key": "f1",
1502
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
1503
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1504
+ "status": "scored",
1505
+ "reason": null,
1506
+ "normalized_score": 0.49980060227663614,
1507
+ "raw_text": "0.4998",
1508
+ "status_label": "scored"
1509
+ },
1510
+ "metadata128_neural_mlp": {
1511
+ "raw": 0.7773773780941162,
1512
+ "metric_key": "f1",
1513
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/misalignment_detection/metrics.json",
1514
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1515
+ "status": "scored",
1516
+ "reason": null,
1517
+ "normalized_score": 0.7773773780941162,
1518
+ "raw_text": "0.7774",
1519
+ "status_label": "scored"
1520
  },
1521
  "raw128_simple": {
1522
  "raw": 0.4958867673901769,
 
1540
  "raw_text": "0.8273",
1541
  "status_label": "scored"
1542
  },
1543
  "cosmos3_super_reasoner": {
1544
  "raw": null,
1545
  "metric_key": "f1",
 
1611
  "raw": 0.004579592783699693,
1612
  "metric_key": "macro_f1",
1613
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
1614
+ "scope": "multi_episode_128_aligned_baseline",
1615
  "status": "scored",
1616
  "reason": null,
1617
  "normalized_score": 0.004579592783699693,
 
1622
  "raw": 0.0029821307969142615,
1623
  "metric_key": "macro_f1",
1624
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
1625
+ "scope": "multi_episode_128_aligned_baseline",
1626
  "status": "scored",
1627
  "reason": null,
1628
  "normalized_score": 0.0029821307969142615,
 
1722
  "raw": 0.0001206030150753769,
1723
  "metric_key": "macro_f1",
1724
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
1725
+ "scope": "multi_episode_128_aligned_baseline",
1726
  "status": "scored",
1727
  "reason": null,
1728
  "normalized_score": 0.0001206030150753769,
 
1733
  "raw": 2.086049543676662e-05,
1734
  "metric_key": "macro_f1",
1735
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
1736
+ "scope": "multi_episode_128_aligned_baseline",
1737
  "status": "scored",
1738
  "reason": null,
1739
  "normalized_score": 2.086049543676662e-05,
 
1822
  "raw": null,
1823
  "metric_key": "macro_f1",
1824
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
1825
+ "scope": "multi_episode_128_aligned_baseline",
1826
  "status": "unsupported_without_required_target",
1827
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
1828
  "normalized_score": null,
 
1855
  "raw": null,
1856
  "metric_key": "macro_f1",
1857
  "source": null,
1858
+ "scope": "multi_episode_128_aligned_baseline",
1859
  "status": "not_supported_by_metadata_only_package",
1860
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required",
1861
  "normalized_score": null,
1862
  "raw_text": "n/a",
1863
  "status_label": "not supported"
 
1955
  "raw": 0.0,
1956
  "metric_key": "macro_f1",
1957
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
1958
+ "scope": "multi_episode_128_aligned_baseline",
1959
  "status": "scored",
1960
  "reason": null,
1961
  "normalized_score": 0.0,
 
1966
  "raw": 0.0,
1967
  "metric_key": "macro_f1",
1968
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
1969
+ "scope": "multi_episode_128_aligned_baseline",
1970
  "status": "scored",
1971
  "reason": null,
1972
  "normalized_score": 0.0,
 
2055
  "raw": 0.17656983343047333,
2056
  "metric_key": "micro_f1",
2057
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
2058
+ "scope": "multi_episode_128_aligned_baseline",
2059
  "status": "scored",
2060
  "reason": null,
2061
  "normalized_score": 0.17656983343047333,
 
2066
  "raw": 0.17418550827844048,
2067
  "metric_key": "micro_f1",
2068
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
2069
+ "scope": "multi_episode_128_aligned_baseline",
2070
  "status": "scored",
2071
  "reason": null,
2072
  "normalized_score": 0.17418550827844048,
 
2152
  "status_label": "scored"
2153
  },
2154
  "metadata128_simple": {
2155
+ "raw": 0.2294670194387436,
2156
  "metric_key": "mae",
2157
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
2158
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
2159
+ "status": "scored",
2160
+ "reason": null,
2161
+ "normalized_score": 0.18324815505876868,
2162
+ "raw_text": "0.2295",
2163
+ "status_label": "scored"
2164
+ },
2165
+ "metadata128_neural_mlp": {
2166
+ "raw": 0.2555866539478302,
2167
+ "metric_key": "mae",
2168
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/imu_to_hand_pose/metrics.json",
2169
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
2170
+ "status": "scored",
2171
+ "reason": null,
2172
+ "normalized_score": 0.16452114110609004,
2173
+ "raw_text": "0.2556",
2174
+ "status_label": "scored"
2175
  },
2176
  "raw128_simple": {
2177
  "raw": 0.22941437363624573,
 
2195
  "raw_text": "0.2530",
2196
  "status_label": "scored"
2197
  },
2198
  "qwen3_omni_v6_lora": {
2199
  "raw": null,
2200
  "metric_key": "mae",
 
2266
  "raw": null,
2267
  "metric_key": "mrr",
2268
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
2269
+ "scope": "multi_episode_128_aligned_baseline",
2270
  "status": "unsupported_without_required_target",
2271
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
2272
  "normalized_score": null,
 
2299
  "raw": null,
2300
  "metric_key": "mrr",
2301
  "source": null,
2302
+ "scope": "multi_episode_128_aligned_baseline",
2303
  "status": "not_supported_by_metadata_only_package",
2304
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required",
2305
  "normalized_score": null,
2306
  "raw_text": "n/a",
2307
  "status_label": "not supported"
 
2388
  "raw": 624.8108520507812,
2389
  "metric_key": "mae",
2390
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
2391
+ "scope": "multi_episode_128_aligned_baseline",
2392
  "status": "scored",
2393
  "reason": null,
2394
  "normalized_score": 0.016864874132806403,
 
2399
  "raw": 41.4664421081543,
2400
  "metric_key": "mae",
2401
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
2402
+ "scope": "multi_episode_128_aligned_baseline",
2403
  "status": "scored",
2404
  "reason": null,
2405
  "normalized_score": 0.25411768748242325,
 
2456
  "model_branch_cards": [
2457
  {
2458
  "id": "metadata128_simple",
2459
+ "title": "128ep Aligned Simple",
2460
  "status": "a100_rerun_pass",
2461
+ "coverage": "20 records / 18 scored aligned axes",
2462
  "headline": "34,269 rows; train/val/test 25,629/4,608/4,032",
2463
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2464
  },
2465
  {
2466
  "id": "metadata128_neural_mlp",
2467
+ "title": "128ep Aligned NN",
2468
  "status": "a100_rerun_pass",
2469
+ "coverage": "20 records / 18 scored aligned axes",
2470
+ "headline": "compact MLP heads over metadata/text and staged block features",
2471
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2472
  },
2473
  {
 
2562
  "task_id": "timeline_action",
2563
  "task_label": "Action Recognition",
2564
  "series_id": "metadata128_simple",
2565
+ "method": "128ep Aligned Simple",
2566
  "status": "scored",
2567
  "status_label": "scored",
2568
  "scored": true,
 
2572
  "normalized_score": 0.008252821966746326,
2573
  "metric_key": "macro_f1",
2574
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
2575
+ "scope": "multi_episode_128_aligned_baseline",
2576
  "reason": null
2577
  },
2578
  {
 
2580
  "task_id": "timeline_action",
2581
  "task_label": "Action Recognition",
2582
  "series_id": "metadata128_neural_mlp",
2583
+ "method": "128ep Aligned NN",
2584
  "status": "scored",
2585
  "status_label": "scored",
2586
  "scored": true,
 
2590
  "normalized_score": 0.004175793689174209,
2591
  "metric_key": "macro_f1",
2592
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
2593
+ "scope": "multi_episode_128_aligned_baseline",
2594
  "reason": null
2595
  },
2596
  {
 
2724
  "task_id": "timeline_subtask",
2725
  "task_label": "Procedure Step Recognition",
2726
  "series_id": "metadata128_simple",
2727
+ "method": "128ep Aligned Simple",
2728
  "status": "scored",
2729
  "status_label": "scored",
2730
  "scored": true,
 
2734
  "normalized_score": 0.00019512195121951218,
2735
  "metric_key": "macro_f1",
2736
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
2737
+ "scope": "multi_episode_128_aligned_baseline",
2738
  "reason": null
2739
  },
2740
  {
 
2742
  "task_id": "timeline_subtask",
2743
  "task_label": "Procedure Step Recognition",
2744
  "series_id": "metadata128_neural_mlp",
2745
+ "method": "128ep Aligned NN",
2746
  "status": "scored",
2747
  "status_label": "scored",
2748
  "scored": true,
 
2752
  "normalized_score": 7.207207207207208e-05,
2753
  "metric_key": "macro_f1",
2754
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
2755
+ "scope": "multi_episode_128_aligned_baseline",
2756
  "reason": null
2757
  },
2758
  {
 
2886
  "task_id": "transition_detection",
2887
  "task_label": "Action Boundary Detection",
2888
  "series_id": "metadata128_simple",
2889
+ "method": "128ep Aligned Simple",
2890
  "status": "scored",
2891
  "status_label": "scored",
2892
  "scored": true,
 
2896
  "normalized_score": 0.29652162550029315,
2897
  "metric_key": "macro_f1",
2898
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
2899
+ "scope": "multi_episode_128_aligned_baseline",
2900
  "reason": null
2901
  },
2902
  {
 
2904
  "task_id": "transition_detection",
2905
  "task_label": "Action Boundary Detection",
2906
  "series_id": "metadata128_neural_mlp",
2907
+ "method": "128ep Aligned NN",
2908
  "status": "scored",
2909
  "status_label": "scored",
2910
  "scored": true,
 
2914
  "normalized_score": 0.4841733292368365,
2915
  "metric_key": "macro_f1",
2916
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
2917
+ "scope": "multi_episode_128_aligned_baseline",
2918
  "reason": null
2919
  },
2920
  {
 
3048
  "task_id": "next_action",
3049
  "task_label": "Next-Action Prediction",
3050
  "series_id": "metadata128_simple",
3051
+ "method": "128ep Aligned Simple",
3052
  "status": "scored",
3053
  "status_label": "scored",
3054
  "scored": true,
 
3058
  "normalized_score": 0.006514774539765508,
3059
  "metric_key": "macro_f1",
3060
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
3061
+ "scope": "multi_episode_128_aligned_baseline",
3062
  "reason": null
3063
  },
3064
  {
 
3066
  "task_id": "next_action",
3067
  "task_label": "Next-Action Prediction",
3068
  "series_id": "metadata128_neural_mlp",
3069
+ "method": "128ep Aligned NN",
3070
  "status": "scored",
3071
  "status_label": "scored",
3072
  "scored": true,
 
3076
  "normalized_score": 0.004910507980164745,
3077
  "metric_key": "macro_f1",
3078
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
3079
+ "scope": "multi_episode_128_aligned_baseline",
3080
  "reason": null
3081
  },
3082
  {
 
3210
  "task_id": "hand_trajectory_forecast",
3211
  "task_label": "Hand Trajectory Forecasting",
3212
  "series_id": "metadata128_simple",
3213
+ "method": "128ep Aligned Simple",
3214
+ "status": "scored",
3215
+ "status_label": "scored",
3216
+ "scored": true,
3217
  "proxy_scored": false,
3218
+ "raw": 8.817333221435547,
3219
+ "raw_text": "8.817",
3220
+ "normalized_score": 0.012231610603598841,
3221
  "metric_key": "mpjpe",
3222
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
3223
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3224
+ "reason": null
3225
  },
3226
  {
3227
  "task_number": 5,
3228
  "task_id": "hand_trajectory_forecast",
3229
  "task_label": "Hand Trajectory Forecasting",
3230
  "series_id": "metadata128_neural_mlp",
3231
+ "method": "128ep Aligned NN",
3232
+ "status": "scored",
3233
+ "status_label": "scored",
3234
+ "scored": true,
3235
  "proxy_scored": false,
3236
+ "raw": 0.429434210062027,
3237
+ "raw_text": "0.4294",
3238
+ "normalized_score": 0.25114484128127007,
3239
  "metric_key": "mpjpe",
3240
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/hand_trajectory_forecast/metrics.json",
3241
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3242
+ "reason": null
3243
  },
3244
  {
3245
  "task_number": 5,
 
3372
  "task_id": "contact_prediction",
3373
  "task_label": "Contact State Prediction",
3374
  "series_id": "metadata128_simple",
3375
+ "method": "128ep Aligned Simple",
3376
  "status": "scored",
3377
  "status_label": "scored",
3378
  "scored": true,
 
3382
  "normalized_score": 0.4381481308057444,
3383
  "metric_key": "macro_f1",
3384
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
3385
+ "scope": "multi_episode_128_aligned_baseline",
3386
  "reason": null
3387
  },
3388
  {
 
3390
  "task_id": "contact_prediction",
3391
  "task_label": "Contact State Prediction",
3392
  "series_id": "metadata128_neural_mlp",
3393
+ "method": "128ep Aligned NN",
3394
  "status": "scored",
3395
  "status_label": "scored",
3396
  "scored": true,
 
3400
  "normalized_score": 0.5682695682695682,
3401
  "metric_key": "macro_f1",
3402
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
3403
+ "scope": "multi_episode_128_aligned_baseline",
3404
  "reason": null
3405
  },
3406
  {
 
3534
  "task_id": "object_relevance",
3535
  "task_label": "Object Relevance Prediction",
3536
  "series_id": "metadata128_simple",
3537
+ "method": "128ep Aligned Simple",
3538
  "status": "scored",
3539
  "status_label": "scored",
3540
  "scored": true,
 
3544
  "normalized_score": 0.17764578833693304,
3545
  "metric_key": "micro_f1",
3546
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
3547
+ "scope": "multi_episode_128_aligned_baseline",
3548
  "reason": null
3549
  },
3550
  {
 
3552
  "task_id": "object_relevance",
3553
  "task_label": "Object Relevance Prediction",
3554
  "series_id": "metadata128_neural_mlp",
3555
+ "method": "128ep Aligned NN",
3556
  "status": "scored",
3557
  "status_label": "scored",
3558
  "scored": true,
 
3562
  "normalized_score": 0.18662723837686876,
3563
  "metric_key": "micro_f1",
3564
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
3565
+ "scope": "multi_episode_128_aligned_baseline",
3566
  "reason": null
3567
  },
3568
  {
 
3696
  "task_id": "caption_grounding",
3697
  "task_label": "Language Grounding",
3698
  "series_id": "metadata128_simple",
3699
+ "method": "128ep Aligned Simple",
3700
  "status": "scored",
3701
  "status_label": "scored",
3702
  "scored": true,
 
3706
  "normalized_score": 0.002332374220713973,
3707
  "metric_key": "mrr",
3708
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
3709
+ "scope": "multi_episode_128_aligned_baseline",
3710
  "reason": null
3711
  },
3712
  {
 
3714
  "task_id": "caption_grounding",
3715
  "task_label": "Language Grounding",
3716
  "series_id": "metadata128_neural_mlp",
3717
+ "method": "128ep Aligned NN",
3718
  "status": "scored",
3719
  "status_label": "scored",
3720
  "scored": true,
 
3724
  "normalized_score": 0.008236799389123917,
3725
  "metric_key": "mrr",
3726
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
3727
+ "scope": "multi_episode_128_aligned_baseline",
3728
  "reason": null
3729
  },
3730
  {
 
3858
  "task_id": "cross_modal_retrieval",
3859
  "task_label": "Cross-Modal Retrieval",
3860
  "series_id": "metadata128_simple",
3861
+ "method": "128ep Aligned Simple",
3862
+ "status": "scored",
3863
+ "status_label": "scored",
3864
+ "scored": true,
3865
  "proxy_scored": false,
3866
+ "raw": 0.002587692579254508,
3867
+ "raw_text": "0.0026",
3868
+ "normalized_score": 0.002587692579254508,
3869
  "metric_key": "mrr",
3870
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
3871
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3872
+ "reason": null
3873
  },
3874
  {
3875
  "task_number": 9,
3876
  "task_id": "cross_modal_retrieval",
3877
  "task_label": "Cross-Modal Retrieval",
3878
  "series_id": "metadata128_neural_mlp",
3879
+ "method": "128ep Aligned NN",
3880
+ "status": "scored",
3881
+ "status_label": "scored",
3882
+ "scored": true,
3883
  "proxy_scored": false,
3884
+ "raw": 0.0026067993603646755,
3885
+ "raw_text": "0.0026",
3886
+ "normalized_score": 0.0026067993603646755,
3887
  "metric_key": "mrr",
3888
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/cross_modal_retrieval/metrics.json",
3889
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3890
+ "reason": null
3891
  },
3892
  {
3893
  "task_number": 9,
 
4020
  "task_id": "modality_reconstruction",
4021
  "task_label": "Cross-Modal Reconstruction",
4022
  "series_id": "metadata128_simple",
4023
+ "method": "128ep Aligned Simple",
4024
+ "status": "scored",
4025
+ "status_label": "scored",
4026
+ "scored": true,
4027
  "proxy_scored": false,
4028
+ "raw": -190.66106203944798,
4029
+ "raw_text": "-190.66",
4030
+ "normalized_score": 0.0,
4031
  "metric_key": "r2",
4032
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
4033
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
4034
+ "reason": null
4035
  },
4036
  {
4037
  "task_number": 10,
4038
  "task_id": "modality_reconstruction",
4039
  "task_label": "Cross-Modal Reconstruction",
4040
  "series_id": "metadata128_neural_mlp",
4041
+ "method": "128ep Aligned NN",
4042
+ "status": "scored",
4043
+ "status_label": "scored",
4044
+ "scored": true,
4045
  "proxy_scored": false,
4046
+ "raw": -0.43481132003942147,
4047
+ "raw_text": "-0.4348",
4048
+ "normalized_score": 0.0,
4049
  "metric_key": "r2",
4050
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/modality_reconstruction/metrics.json",
4051
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
4052
+ "reason": null
4053
  },
4054
  {
4055
  "task_number": 10,
 
4182
  "task_id": "temporal_order",
4183
  "task_label": "Temporal Order Verification",
4184
  "series_id": "metadata128_simple",
4185
+ "method": "128ep Aligned Simple",
4186
  "status": "scored",
4187
  "status_label": "scored",
4188
  "scored": true,
 
4192
  "normalized_score": 0.4198864140782312,
4193
  "metric_key": "f1",
4194
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
4195
+ "scope": "multi_episode_128_aligned_baseline",
4196
  "reason": null
4197
  },
4198
  {
 
4200
  "task_id": "temporal_order",
4201
  "task_label": "Temporal Order Verification",
4202
  "series_id": "metadata128_neural_mlp",
4203
+ "method": "128ep Aligned NN",
4204
  "status": "scored",
4205
  "status_label": "scored",
4206
  "scored": true,
 
4210
  "normalized_score": 0.8252408266656923,
4211
  "metric_key": "f1",
4212
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
4213
+ "scope": "multi_episode_128_aligned_baseline",
4214
  "reason": null
4215
  },
4216
  {
 
4344
  "task_id": "misalignment_detection",
4345
  "task_label": "Multimodal Synchronization Detection",
4346
  "series_id": "metadata128_simple",
4347
+ "method": "128ep Aligned Simple",
4348
+ "status": "scored",
4349
+ "status_label": "scored",
4350
+ "scored": true,
4351
  "proxy_scored": false,
4352
+ "raw": 0.49980060227663614,
4353
+ "raw_text": "0.4998",
4354
+ "normalized_score": 0.49980060227663614,
4355
  "metric_key": "f1",
4356
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
4357
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
4358
+ "reason": null
4359
  },
4360
  {
4361
  "task_number": 12,
4362
  "task_id": "misalignment_detection",
4363
  "task_label": "Multimodal Synchronization Detection",
4364
  "series_id": "metadata128_neural_mlp",
4365
+ "method": "128ep Aligned NN",
4366
+ "status": "scored",
4367
+ "status_label": "scored",
4368
+ "scored": true,
4369
  "proxy_scored": false,
4370
+ "raw": 0.7773773780941162,
4371
+ "raw_text": "0.7774",
4372
+ "normalized_score": 0.7773773780941162,
4373
  "metric_key": "f1",
4374
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/misalignment_detection/metrics.json",
4375
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
4376
+ "reason": null
4377
  },
4378
  {
4379
  "task_number": 12,
 
4506
  "task_id": "long_horizon_next_action",
4507
  "task_label": "Long-Horizon Next-Action Forecasting",
4508
  "series_id": "metadata128_simple",
4509
+ "method": "128ep Aligned Simple",
4510
  "status": "scored",
4511
  "status_label": "scored",
4512
  "scored": true,
 
4516
  "normalized_score": 0.004579592783699693,
4517
  "metric_key": "macro_f1",
4518
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
4519
+ "scope": "multi_episode_128_aligned_baseline",
4520
  "reason": null
4521
  },
4522
  {
 
4524
  "task_id": "long_horizon_next_action",
4525
  "task_label": "Long-Horizon Next-Action Forecasting",
4526
  "series_id": "metadata128_neural_mlp",
4527
+ "method": "128ep Aligned NN",
4528
  "status": "scored",
4529
  "status_label": "scored",
4530
  "scored": true,
 
4534
  "normalized_score": 0.0029821307969142615,
4535
  "metric_key": "macro_f1",
4536
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
4537
+ "scope": "multi_episode_128_aligned_baseline",
4538
  "reason": null
4539
  },
4540
  {
 
4668
  "task_id": "next_subtask_forecast",
4669
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4670
  "series_id": "metadata128_simple",
4671
+ "method": "128ep Aligned Simple",
4672
  "status": "scored",
4673
  "status_label": "scored",
4674
  "scored": true,
 
4678
  "normalized_score": 0.0001206030150753769,
4679
  "metric_key": "macro_f1",
4680
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
4681
+ "scope": "multi_episode_128_aligned_baseline",
4682
  "reason": null
4683
  },
4684
  {
 
4686
  "task_id": "next_subtask_forecast",
4687
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4688
  "series_id": "metadata128_neural_mlp",
4689
+ "method": "128ep Aligned NN",
4690
  "status": "scored",
4691
  "status_label": "scored",
4692
  "scored": true,
 
4696
  "normalized_score": 2.086049543676662e-05,
4697
  "metric_key": "macro_f1",
4698
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
4699
+ "scope": "multi_episode_128_aligned_baseline",
4700
  "reason": null
4701
  },
4702
  {
 
4830
  "task_id": "interaction_text_prediction",
4831
  "task_label": "Interaction Text Prediction",
4832
  "series_id": "metadata128_simple",
4833
+ "method": "128ep Aligned Simple",
4834
  "status": "unsupported_without_required_target",
4835
  "status_label": "unsupported",
4836
  "scored": false,
 
4840
  "normalized_score": null,
4841
  "metric_key": "macro_f1",
4842
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
4843
+ "scope": "multi_episode_128_aligned_baseline",
4844
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
4845
  },
4846
  {
 
4848
  "task_id": "interaction_text_prediction",
4849
  "task_label": "Interaction Text Prediction",
4850
  "series_id": "metadata128_neural_mlp",
4851
+ "method": "128ep Aligned NN",
4852
  "status": "not_supported_by_metadata_only_package",
4853
  "status_label": "not supported",
4854
  "scored": false,
 
4858
  "normalized_score": null,
4859
  "metric_key": "macro_f1",
4860
  "source": null,
4861
+ "scope": "multi_episode_128_aligned_baseline",
4862
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required"
4863
  },
4864
  {
4865
  "task_number": 15,
 
4992
  "task_id": "action_object_relation",
4993
  "task_label": "Action-Object Relation Prediction",
4994
  "series_id": "metadata128_simple",
4995
+ "method": "128ep Aligned Simple",
4996
  "status": "scored",
4997
  "status_label": "scored",
4998
  "scored": true,
 
5002
  "normalized_score": 0.0,
5003
  "metric_key": "macro_f1",
5004
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
5005
+ "scope": "multi_episode_128_aligned_baseline",
5006
  "reason": null
5007
  },
5008
  {
 
5010
  "task_id": "action_object_relation",
5011
  "task_label": "Action-Object Relation Prediction",
5012
  "series_id": "metadata128_neural_mlp",
5013
+ "method": "128ep Aligned NN",
5014
  "status": "scored",
5015
  "status_label": "scored",
5016
  "scored": true,
 
5020
  "normalized_score": 0.0,
5021
  "metric_key": "macro_f1",
5022
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
5023
+ "scope": "multi_episode_128_aligned_baseline",
5024
  "reason": null
5025
  },
5026
  {
 
5154
  "task_id": "object_set_forecast",
5155
  "task_label": "Future Object-Set Forecasting",
5156
  "series_id": "metadata128_simple",
5157
+ "method": "128ep Aligned Simple",
5158
  "status": "scored",
5159
  "status_label": "scored",
5160
  "scored": true,
 
5164
  "normalized_score": 0.17656983343047333,
5165
  "metric_key": "micro_f1",
5166
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
5167
+ "scope": "multi_episode_128_aligned_baseline",
5168
  "reason": null
5169
  },
5170
  {
 
5172
  "task_id": "object_set_forecast",
5173
  "task_label": "Future Object-Set Forecasting",
5174
  "series_id": "metadata128_neural_mlp",
5175
+ "method": "128ep Aligned NN",
5176
  "status": "scored",
5177
  "status_label": "scored",
5178
  "scored": true,
 
5182
  "normalized_score": 0.17418550827844048,
5183
  "metric_key": "micro_f1",
5184
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
5185
+ "scope": "multi_episode_128_aligned_baseline",
5186
  "reason": null
5187
  },
5188
  {
 
5316
  "task_id": "imu_to_hand_pose",
5317
  "task_label": "IMU-to-Hand Pose Reconstruction",
5318
  "series_id": "metadata128_simple",
5319
+ "method": "128ep Aligned Simple",
5320
+ "status": "scored",
5321
+ "status_label": "scored",
5322
+ "scored": true,
5323
  "proxy_scored": false,
5324
+ "raw": 0.2294670194387436,
5325
+ "raw_text": "0.2295",
5326
+ "normalized_score": 0.18324815505876868,
5327
  "metric_key": "mae",
5328
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
5329
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
5330
+ "reason": null
5331
  },
5332
  {
5333
  "task_number": 18,
5334
  "task_id": "imu_to_hand_pose",
5335
  "task_label": "IMU-to-Hand Pose Reconstruction",
5336
  "series_id": "metadata128_neural_mlp",
5337
+ "method": "128ep Aligned NN",
5338
+ "status": "scored",
5339
+ "status_label": "scored",
5340
+ "scored": true,
5341
  "proxy_scored": false,
5342
+ "raw": 0.2555866539478302,
5343
+ "raw_text": "0.2556",
5344
+ "normalized_score": 0.16452114110609004,
5345
  "metric_key": "mae",
5346
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/imu_to_hand_pose/metrics.json",
5347
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
5348
+ "reason": null
5349
  },
5350
  {
5351
  "task_number": 18,
 
5478
  "task_id": "camera_view_sync_retrieval",
5479
  "task_label": "Camera-View Synchronization Retrieval",
5480
  "series_id": "metadata128_simple",
5481
+ "method": "128ep Aligned Simple",
5482
  "status": "unsupported_without_required_target",
5483
  "status_label": "unsupported",
5484
  "scored": false,
 
5488
  "normalized_score": null,
5489
  "metric_key": "mrr",
5490
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
5491
+ "scope": "multi_episode_128_aligned_baseline",
5492
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
5493
  },
5494
  {
 
5496
  "task_id": "camera_view_sync_retrieval",
5497
  "task_label": "Camera-View Synchronization Retrieval",
5498
  "series_id": "metadata128_neural_mlp",
5499
+ "method": "128ep Aligned NN",
5500
  "status": "not_supported_by_metadata_only_package",
5501
  "status_label": "not supported",
5502
  "scored": false,
 
5506
  "normalized_score": null,
5507
  "metric_key": "mrr",
5508
  "source": null,
5509
+ "scope": "multi_episode_128_aligned_baseline",
5510
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required"
5511
  },
5512
  {
5513
  "task_number": 19,
 
5640
  "task_id": "time_to_transition",
5641
  "task_label": "Time-to-Next-Transition Regression",
5642
  "series_id": "metadata128_simple",
5643
+ "method": "128ep Aligned Simple",
5644
  "status": "scored",
5645
  "status_label": "scored",
5646
  "scored": true,
 
5650
  "normalized_score": 0.016864874132806403,
5651
  "metric_key": "mae",
5652
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
5653
+ "scope": "multi_episode_128_aligned_baseline",
5654
  "reason": null
5655
  },
5656
  {
 
5658
  "task_id": "time_to_transition",
5659
  "task_label": "Time-to-Next-Transition Regression",
5660
  "series_id": "metadata128_neural_mlp",
5661
+ "method": "128ep Aligned NN",
5662
  "status": "scored",
5663
  "status_label": "scored",
5664
  "scored": true,
 
5668
  "normalized_score": 0.25411768748242325,
5669
  "metric_key": "mae",
5670
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
5671
+ "scope": "multi_episode_128_aligned_baseline",
5672
  "reason": null
5673
  },
5674
  {
data/website_integrity.json CHANGED
@@ -1,6 +1,6 @@
  {
  "status": "pass",
- "generated_at_utc": "2026-06-18T12:09:46+00:00",
+ "generated_at_utc": "2026-06-18T12:54:19+00:00",
  "docs_root": "docs",
  "site_base": "/ropedia-xperience-10m-task-suite/",
  "summary": {
@@ -301,7 +301,7 @@
  },
  {
  "path": "data/artifact_index.json",
- "bytes": 116110,
+ "bytes": 116111,
  "top_level_type": "dict"
  },
  {
@@ -316,7 +316,7 @@
  },
  {
  "path": "data/episode128_task_model_radar.json",
- "bytes": 186443,
+ "bytes": 185447,
  "top_level_type": "dict"
  },
  {
@@ -351,7 +351,7 @@
  },
  {
  "path": "data/mirror_parity.json",
- "bytes": 994053,
+ "bytes": 1059014,
  "top_level_type": "dict"
  },
  {
@@ -471,7 +471,7 @@
  },
  {
  "path": "data/single_episode_task_model_radar.json",
- "bytes": 50973,
+ "bytes": 51064,
  "top_level_type": "dict"
  },
  {
@@ -486,12 +486,12 @@
  },
  {
  "path": "data/task_method_20_gap_audit.json",
- "bytes": 46902,
+ "bytes": 35883,
  "top_level_type": "dict"
  },
  {
  "path": "data/task_method_20_result_matrix.json",
- "bytes": 129242,
+ "bytes": 128794,
  "top_level_type": "dict"
  },
  {
@@ -526,7 +526,7 @@
  },
  {
  "path": "data/unified_task_model_radar.json",
- "bytes": 230297,
+ "bytes": 229299,
  "top_level_type": "dict"
  },
  {
@@ -571,7 +571,7 @@
  {
  "path": "assets/charts/episode128_task_model_radar.svg",
  "exists": true,
- "bytes": 45937,
+ "bytes": 47540,
  "format": "SVG",
  "has_viewbox": true
  },
@@ -641,7 +641,7 @@
  {
  "path": "assets/charts/unified_task_model_radar.svg",
  "exists": true,
- "bytes": 51953,
+ "bytes": 53553,
  "format": "SVG",
  "has_viewbox": true
  },
@@ -752,7 +752,7 @@
  {
  "path": "assets/task_suite_infographic.png",
  "exists": true,
- "bytes": 2627286,
+ "bytes": 1591194,
  "width": 1800,
  "height": 6600,
  "format": "PNG"
docs/data/episode128_task_model_radar.json CHANGED
@@ -1,19 +1,19 @@
1
  {
2
  "title": "128-Episode 20-Task Radar",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "description": "Selected 128-episode metadata/raw baselines plus verified Qwen3/Cosmos branches. Every method has 20 records; numeric scores appear only where the public artifact produced that task target.",
6
  "task_count": 20,
7
  "method_count": 7,
8
  "method_task_record_count": 140,
9
- "scored_method_task_count": 93,
10
  "normalization_policy": {
11
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
12
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
13
  "raw_values": "raw metric values, metric keys, and sources are retained in this JSON; the SVG is an overview, not a replacement for the metric table",
14
  "result_record_policy": "every method has 20 task records; records without a numeric score carry explicit unsupported/not-evaluated status and reason fields",
15
  "foundation_model_overlay": "Qwen3/Cosmos points are plotted only on task-aligned axes. Scoreless records mean the public result does not evaluate that task contract.",
16
- "metadata_128_overlay": "128-episode metadata baselines have 20 records, but numeric scores only where the public JSONL contains enough task labels without raw feature blocks.",
17
  "raw_128_overlay": "128-episode raw-feature baselines use staged sensor NPZ features. Eighteen axes use direct task targets; interaction text and camera-view sync are completed with documented compact proxies because raw interaction strings and paired video-view embeddings are absent from the 128 export."
18
  },
19
  "source_unified_radar": "docs/data/unified_task_model_radar.json",
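
Note on the `normalization_policy` fields in the hunk above: a minimal Python sketch of the two rules as they are worded there, not the repository's actual implementation. The function names are hypothetical, and returning `None` for a missing or non-positive raw value is an assumption that mirrors the `"normalized_score": null` records. The per-task best observed value is not shown in this diff, so the example infers it from one `time_to_transition` record (raw value times its normalized score).

```python
from typing import Optional


def normalize_higher_is_better(raw: Optional[float]) -> Optional[float]:
    """Bounded metric (e.g. macro_f1, mrr): clip to [0, 1] and plot directly."""
    if raw is None:
        return None  # scoreless record stays scoreless
    return min(1.0, max(0.0, raw))


def normalize_lower_is_better(
    raw: Optional[float], best_observed: float
) -> Optional[float]:
    """Lower-error metric (e.g. mae, mpjpe): best_observed_value / raw_value."""
    if raw is None or raw <= 0:
        return None  # assumption: undefined ratio is treated as scoreless
    return best_observed / raw


# time_to_transition MAE values taken from the records in this diff.
raw_simple = 624.8108520507812
raw_mlp = 41.4664421081543
norm_mlp = 0.25411768748242325

# Assumption: infer the task's best observed MAE from the MLP record.
best_mae = raw_mlp * norm_mlp

score_simple = normalize_lower_is_better(raw_simple, best_mae)
print(round(score_simple, 6))
```

Under these assumptions the simple baseline's score agrees with the `0.016864874132806403` recorded for it, which suggests both records were normalized against the same per-task best value.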
@@ -21,50 +21,50 @@
21
  "series": [
22
  {
23
  "id": "metadata128_simple",
24
- "label": "128ep Metadata Simple",
25
  "short_label": "128-S",
26
  "color": "#ffd166",
27
- "kind": "partial_128_episode_metadata_baseline",
28
- "scope": "128 selected episodes, JSONL metadata/text only",
29
  "stroke_dasharray": "9 6",
30
- "method_detail": "128-episode JSONL metadata/text simple baselines.",
31
  "plotted_as": "colored point overlay",
32
  "result_record_count": 20,
33
- "scored_task_count": 13,
34
- "covered_task_count": 13,
35
  "proxy_scored_task_count": 0,
36
- "scoreless_task_count": 7,
37
- "unsupported_task_count": 7,
38
  "not_evaluated_task_count": 0,
39
  "status_counts": {
40
- "scored": 13,
41
- "unsupported_without_required_target": 7
42
  },
43
- "coverage_fraction": 0.65,
44
  "result_record_fraction": 1.0
45
  },
46
  {
47
  "id": "metadata128_neural_mlp",
48
- "label": "128ep Metadata NN",
49
  "short_label": "128-NN",
50
  "color": "#f472b6",
51
- "kind": "partial_128_episode_metadata_baseline",
52
- "scope": "128 selected episodes, JSONL metadata/text only",
53
  "stroke_dasharray": "3 6",
54
- "method_detail": "128-episode JSONL metadata/text MLP baselines.",
55
  "plotted_as": "colored point overlay",
56
  "result_record_count": 20,
57
- "scored_task_count": 13,
58
- "covered_task_count": 13,
59
  "proxy_scored_task_count": 0,
60
- "scoreless_task_count": 7,
61
- "unsupported_task_count": 7,
62
  "not_evaluated_task_count": 0,
63
  "status_counts": {
64
- "not_supported_by_metadata_only_package": 7,
65
- "scored": 13
66
  },
67
- "coverage_fraction": 0.65,
68
  "result_record_fraction": 1.0
69
  },
70
  {
@@ -205,7 +205,7 @@
205
  "raw": 0.008252821966746326,
206
  "metric_key": "macro_f1",
207
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
208
- "scope": "multi_episode_128_metadata_baseline",
209
  "status": "scored",
210
  "reason": null,
211
  "normalized_score": 0.008252821966746326,
@@ -216,7 +216,7 @@
216
  "raw": 0.004175793689174209,
217
  "metric_key": "macro_f1",
218
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
219
- "scope": "multi_episode_128_metadata_baseline",
220
  "status": "scored",
221
  "reason": null,
222
  "normalized_score": 0.004175793689174209,
@@ -296,7 +296,7 @@
296
  "raw": 0.00019512195121951218,
297
  "metric_key": "macro_f1",
298
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
299
- "scope": "multi_episode_128_metadata_baseline",
300
  "status": "scored",
301
  "reason": null,
302
  "normalized_score": 0.00019512195121951218,
@@ -307,7 +307,7 @@
307
  "raw": 7.207207207207208e-05,
308
  "metric_key": "macro_f1",
309
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
310
- "scope": "multi_episode_128_metadata_baseline",
311
  "status": "scored",
312
  "reason": null,
313
  "normalized_score": 7.207207207207208e-05,
@@ -387,7 +387,7 @@
387
  "raw": 0.29652162550029315,
388
  "metric_key": "macro_f1",
389
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
390
- "scope": "multi_episode_128_metadata_baseline",
391
  "status": "scored",
392
  "reason": null,
393
  "normalized_score": 0.29652162550029315,
@@ -398,7 +398,7 @@
398
  "raw": 0.4841733292368365,
399
  "metric_key": "macro_f1",
400
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
401
- "scope": "multi_episode_128_metadata_baseline",
402
  "status": "scored",
403
  "reason": null,
404
  "normalized_score": 0.4841733292368365,
@@ -478,7 +478,7 @@
478
  "raw": 0.006514774539765508,
479
  "metric_key": "macro_f1",
480
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
481
- "scope": "multi_episode_128_metadata_baseline",
482
  "status": "scored",
483
  "reason": null,
484
  "normalized_score": 0.006514774539765508,
@@ -489,7 +489,7 @@
489
  "raw": 0.004910507980164745,
490
  "metric_key": "macro_f1",
491
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
492
- "scope": "multi_episode_128_metadata_baseline",
493
  "status": "scored",
494
  "reason": null,
495
  "normalized_score": 0.004910507980164745,
@@ -566,26 +566,26 @@
566
  "raw128_proxy_axis": false,
567
  "values": {
568
  "metadata128_simple": {
569
- "raw": null,
570
  "metric_key": "mpjpe",
571
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
572
- "scope": "multi_episode_128_metadata_baseline",
573
- "status": "unsupported_without_required_target",
574
- "reason": "requires future hand-joint trajectories from raw sensor feature NPZ blocks, which are not in the public 128 package",
575
- "normalized_score": null,
576
- "raw_text": "n/a",
577
- "status_label": "unsupported"
578
  },
579
  "metadata128_neural_mlp": {
580
- "raw": null,
581
  "metric_key": "mpjpe",
582
- "source": null,
583
- "scope": "multi_episode_128_metadata_baseline",
584
- "status": "not_supported_by_metadata_only_package",
585
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
586
- "normalized_score": null,
587
- "raw_text": "n/a",
588
- "status_label": "not supported"
589
  },
590
  "raw128_simple": {
591
  "raw": 0.2729249894618988,
@@ -660,7 +660,7 @@
660
  "raw": 0.4381481308057444,
661
  "metric_key": "macro_f1",
662
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
663
- "scope": "multi_episode_128_metadata_baseline",
664
  "status": "scored",
665
  "reason": null,
666
  "normalized_score": 0.4381481308057444,
@@ -671,7 +671,7 @@
671
  "raw": 0.5682695682695682,
672
  "metric_key": "macro_f1",
673
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
674
- "scope": "multi_episode_128_metadata_baseline",
675
  "status": "scored",
676
  "reason": null,
677
  "normalized_score": 0.5682695682695682,
@@ -751,7 +751,7 @@
751
  "raw": 0.17764578833693304,
752
  "metric_key": "micro_f1",
753
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
754
- "scope": "multi_episode_128_metadata_baseline",
755
  "status": "scored",
756
  "reason": null,
757
  "normalized_score": 0.17764578833693304,
@@ -762,7 +762,7 @@
762
  "raw": 0.18662723837686876,
763
  "metric_key": "micro_f1",
764
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
765
- "scope": "multi_episode_128_metadata_baseline",
766
  "status": "scored",
767
  "reason": null,
768
  "normalized_score": 0.18662723837686876,
@@ -842,7 +842,7 @@
842
  "raw": 0.002332374220713973,
843
  "metric_key": "mrr",
844
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
845
- "scope": "multi_episode_128_metadata_baseline",
846
  "status": "scored",
847
  "reason": null,
848
  "normalized_score": 0.002332374220713973,
@@ -853,7 +853,7 @@
853
  "raw": 0.008236799389123917,
854
  "metric_key": "mrr",
855
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
856
- "scope": "multi_episode_128_metadata_baseline",
857
  "status": "scored",
858
  "reason": null,
859
  "normalized_score": 0.008236799389123917,
@@ -930,26 +930,26 @@
930
  "raw128_proxy_axis": false,
931
  "values": {
932
  "metadata128_simple": {
933
- "raw": null,
934
  "metric_key": "mrr",
935
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
936
- "scope": "multi_episode_128_metadata_baseline",
937
- "status": "unsupported_without_required_target",
938
- "reason": "requires paired motion/IMU/camera/audio/depth feature blocks, which are not in the public 128 package",
939
- "normalized_score": null,
940
- "raw_text": "n/a",
941
- "status_label": "unsupported"
942
  },
943
  "metadata128_neural_mlp": {
944
- "raw": null,
945
  "metric_key": "mrr",
946
- "source": null,
947
- "scope": "multi_episode_128_metadata_baseline",
948
- "status": "not_supported_by_metadata_only_package",
949
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
950
- "normalized_score": null,
951
- "raw_text": "n/a",
952
- "status_label": "not supported"
953
  },
954
  "raw128_simple": {
955
  "raw": 0.003459817497059703,
@@ -1021,26 +1021,26 @@
1021
  "raw128_proxy_axis": false,
1022
  "values": {
1023
  "metadata128_simple": {
1024
- "raw": null,
1025
  "metric_key": "r2",
1026
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
1027
- "scope": "multi_episode_128_metadata_baseline",
1028
- "status": "unsupported_without_required_target",
1029
- "reason": "requires source and target modality feature blocks such as depth/video vectors, which are not in the public 128 package",
1030
- "normalized_score": null,
1031
- "raw_text": "n/a",
1032
- "status_label": "unsupported"
1033
  },
1034
  "metadata128_neural_mlp": {
1035
- "raw": null,
1036
  "metric_key": "r2",
1037
- "source": null,
1038
- "scope": "multi_episode_128_metadata_baseline",
1039
- "status": "not_supported_by_metadata_only_package",
1040
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1041
- "normalized_score": null,
1042
- "raw_text": "n/a",
1043
- "status_label": "not supported"
1044
  },
1045
  "raw128_simple": {
1046
  "raw": -1.3450960391924882,
@@ -1115,7 +1115,7 @@
1115
  "raw": 0.4198864140782312,
1116
  "metric_key": "f1",
1117
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
1118
- "scope": "multi_episode_128_metadata_baseline",
1119
  "status": "scored",
1120
  "reason": null,
1121
  "normalized_score": 0.4198864140782312,
@@ -1126,7 +1126,7 @@
1126
  "raw": 0.8252408266656923,
1127
  "metric_key": "f1",
1128
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
1129
- "scope": "multi_episode_128_metadata_baseline",
1130
  "status": "scored",
1131
  "reason": null,
1132
  "normalized_score": 0.8252408266656923,
@@ -1203,26 +1203,26 @@
1203
  "raw128_proxy_axis": false,
1204
  "values": {
1205
  "metadata128_simple": {
1206
- "raw": null,
1207
  "metric_key": "f1",
1208
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
1209
- "scope": "multi_episode_128_metadata_baseline",
1210
- "status": "unsupported_without_required_target",
1211
- "reason": "requires deliberately shifted cross-modal feature pairs, which cannot be reconstructed from the public JSONL labels alone",
1212
- "normalized_score": null,
1213
- "raw_text": "n/a",
1214
- "status_label": "unsupported"
1215
  },
1216
  "metadata128_neural_mlp": {
1217
- "raw": null,
1218
  "metric_key": "f1",
1219
- "source": null,
1220
- "scope": "multi_episode_128_metadata_baseline",
1221
- "status": "not_supported_by_metadata_only_package",
1222
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1223
- "normalized_score": null,
1224
- "raw_text": "n/a",
1225
- "status_label": "not supported"
1226
  },
1227
  "raw128_simple": {
1228
  "raw": 0.4958867673901769,
@@ -1297,7 +1297,7 @@
1297
  "raw": 0.004579592783699693,
1298
  "metric_key": "macro_f1",
1299
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
1300
- "scope": "multi_episode_128_metadata_baseline",
1301
  "status": "scored",
1302
  "reason": null,
1303
  "normalized_score": 0.004579592783699693,
@@ -1308,7 +1308,7 @@
1308
  "raw": 0.0029821307969142615,
1309
  "metric_key": "macro_f1",
1310
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
1311
- "scope": "multi_episode_128_metadata_baseline",
1312
  "status": "scored",
1313
  "reason": null,
1314
  "normalized_score": 0.0029821307969142615,
@@ -1388,7 +1388,7 @@
1388
  "raw": 0.0001206030150753769,
1389
  "metric_key": "macro_f1",
1390
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
1391
- "scope": "multi_episode_128_metadata_baseline",
1392
  "status": "scored",
1393
  "reason": null,
1394
  "normalized_score": 0.0001206030150753769,
@@ -1399,7 +1399,7 @@
1399
  "raw": 2.086049543676662e-05,
1400
  "metric_key": "macro_f1",
1401
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
1402
- "scope": "multi_episode_128_metadata_baseline",
1403
  "status": "scored",
1404
  "reason": null,
1405
  "normalized_score": 2.086049543676662e-05,
@@ -1479,7 +1479,7 @@
1479
  "raw": null,
1480
  "metric_key": "macro_f1",
1481
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
1482
- "scope": "multi_episode_128_metadata_baseline",
1483
  "status": "unsupported_without_required_target",
1484
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
1485
  "normalized_score": null,
@@ -1490,9 +1490,9 @@
1490
  "raw": null,
1491
  "metric_key": "macro_f1",
1492
  "source": null,
1493
- "scope": "multi_episode_128_metadata_baseline",
1494
  "status": "not_supported_by_metadata_only_package",
1495
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1496
  "normalized_score": null,
1497
  "raw_text": "n/a",
1498
  "status_label": "not supported"
@@ -1570,7 +1570,7 @@
1570
  "raw": 0.0,
1571
  "metric_key": "macro_f1",
1572
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
1573
- "scope": "multi_episode_128_metadata_baseline",
1574
  "status": "scored",
1575
  "reason": null,
1576
  "normalized_score": 0.0,
@@ -1581,7 +1581,7 @@
1581
  "raw": 0.0,
1582
  "metric_key": "macro_f1",
1583
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
1584
- "scope": "multi_episode_128_metadata_baseline",
1585
  "status": "scored",
1586
  "reason": null,
1587
  "normalized_score": 0.0,
@@ -1661,7 +1661,7 @@
1661
  "raw": 0.17656983343047333,
1662
  "metric_key": "micro_f1",
1663
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
1664
- "scope": "multi_episode_128_metadata_baseline",
1665
  "status": "scored",
1666
  "reason": null,
1667
  "normalized_score": 0.17656983343047333,
@@ -1672,7 +1672,7 @@
1672
  "raw": 0.17418550827844048,
1673
  "metric_key": "micro_f1",
1674
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
1675
- "scope": "multi_episode_128_metadata_baseline",
1676
  "status": "scored",
1677
  "reason": null,
1678
  "normalized_score": 0.17418550827844048,
@@ -1749,26 +1749,26 @@
1749
  "raw128_proxy_axis": false,
1750
  "values": {
1751
  "metadata128_simple": {
1752
- "raw": null,
1753
  "metric_key": "mae",
1754
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
1755
- "scope": "multi_episode_128_metadata_baseline",
1756
- "status": "unsupported_without_required_target",
1757
- "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package",
1758
- "normalized_score": null,
1759
- "raw_text": "n/a",
1760
- "status_label": "unsupported"
1761
  },
1762
  "metadata128_neural_mlp": {
1763
- "raw": null,
1764
  "metric_key": "mae",
1765
- "source": null,
1766
- "scope": "multi_episode_128_metadata_baseline",
1767
- "status": "not_supported_by_metadata_only_package",
1768
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1769
- "normalized_score": null,
1770
- "raw_text": "n/a",
1771
- "status_label": "not supported"
1772
  },
1773
  "raw128_simple": {
1774
  "raw": 0.22941437363624573,
@@ -1843,7 +1843,7 @@
1843
  "raw": null,
1844
  "metric_key": "mrr",
1845
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
1846
- "scope": "multi_episode_128_metadata_baseline",
1847
  "status": "unsupported_without_required_target",
1848
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
1849
  "normalized_score": null,
@@ -1854,9 +1854,9 @@
1854
  "raw": null,
1855
  "metric_key": "mrr",
1856
  "source": null,
1857
- "scope": "multi_episode_128_metadata_baseline",
1858
  "status": "not_supported_by_metadata_only_package",
1859
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1860
  "normalized_score": null,
1861
  "raw_text": "n/a",
1862
  "status_label": "not supported"
@@ -1934,7 +1934,7 @@
1934
  "raw": 624.8108520507812,
1935
  "metric_key": "mae",
1936
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
1937
- "scope": "multi_episode_128_metadata_baseline",
1938
  "status": "scored",
1939
  "reason": null,
1940
  "normalized_score": 0.016864874132806403,
@@ -1945,7 +1945,7 @@
1945
  "raw": 41.4664421081543,
1946
  "metric_key": "mae",
1947
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
1948
- "scope": "multi_episode_128_metadata_baseline",
1949
  "status": "scored",
1950
  "reason": null,
1951
  "normalized_score": 0.25411768748242325,
@@ -2016,7 +2016,7 @@
2016
  "task_id": "timeline_action",
2017
  "task_label": "Action Recognition",
2018
  "series_id": "metadata128_simple",
2019
- "method": "128ep Metadata Simple",
2020
  "status": "scored",
2021
  "status_label": "scored",
2022
  "scored": true,
@@ -2026,7 +2026,7 @@
2026
  "normalized_score": 0.008252821966746326,
2027
  "metric_key": "macro_f1",
2028
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
2029
- "scope": "multi_episode_128_metadata_baseline",
2030
  "reason": null
2031
  },
2032
  {
@@ -2034,7 +2034,7 @@
2034
  "task_id": "timeline_action",
2035
  "task_label": "Action Recognition",
2036
  "series_id": "metadata128_neural_mlp",
2037
- "method": "128ep Metadata NN",
2038
  "status": "scored",
2039
  "status_label": "scored",
2040
  "scored": true,
@@ -2044,7 +2044,7 @@
2044
  "normalized_score": 0.004175793689174209,
2045
  "metric_key": "macro_f1",
2046
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
2047
- "scope": "multi_episode_128_metadata_baseline",
2048
  "reason": null
2049
  },
2050
  {
@@ -2142,7 +2142,7 @@
2142
  "task_id": "timeline_subtask",
2143
  "task_label": "Procedure Step Recognition",
2144
  "series_id": "metadata128_simple",
2145
- "method": "128ep Metadata Simple",
2146
  "status": "scored",
2147
  "status_label": "scored",
2148
  "scored": true,
@@ -2152,7 +2152,7 @@
2152
  "normalized_score": 0.00019512195121951218,
2153
  "metric_key": "macro_f1",
2154
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
2155
- "scope": "multi_episode_128_metadata_baseline",
2156
  "reason": null
2157
  },
2158
  {
@@ -2160,7 +2160,7 @@
2160
  "task_id": "timeline_subtask",
2161
  "task_label": "Procedure Step Recognition",
2162
  "series_id": "metadata128_neural_mlp",
2163
- "method": "128ep Metadata NN",
2164
  "status": "scored",
2165
  "status_label": "scored",
2166
  "scored": true,
@@ -2170,7 +2170,7 @@
2170
  "normalized_score": 7.207207207207208e-05,
2171
  "metric_key": "macro_f1",
2172
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
2173
- "scope": "multi_episode_128_metadata_baseline",
2174
  "reason": null
2175
  },
2176
  {
@@ -2268,7 +2268,7 @@
2268
  "task_id": "transition_detection",
2269
  "task_label": "Action Boundary Detection",
2270
  "series_id": "metadata128_simple",
2271
- "method": "128ep Metadata Simple",
2272
  "status": "scored",
2273
  "status_label": "scored",
2274
  "scored": true,
@@ -2278,7 +2278,7 @@
2278
  "normalized_score": 0.29652162550029315,
2279
  "metric_key": "macro_f1",
2280
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
2281
- "scope": "multi_episode_128_metadata_baseline",
2282
  "reason": null
2283
  },
2284
  {
@@ -2286,7 +2286,7 @@
2286
  "task_id": "transition_detection",
2287
  "task_label": "Action Boundary Detection",
2288
  "series_id": "metadata128_neural_mlp",
2289
- "method": "128ep Metadata NN",
2290
  "status": "scored",
2291
  "status_label": "scored",
2292
  "scored": true,
@@ -2296,7 +2296,7 @@
2296
  "normalized_score": 0.4841733292368365,
2297
  "metric_key": "macro_f1",
2298
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
2299
- "scope": "multi_episode_128_metadata_baseline",
2300
  "reason": null
2301
  },
2302
  {
@@ -2394,7 +2394,7 @@
2394
  "task_id": "next_action",
2395
  "task_label": "Next-Action Prediction",
2396
  "series_id": "metadata128_simple",
2397
- "method": "128ep Metadata Simple",
2398
  "status": "scored",
2399
  "status_label": "scored",
2400
  "scored": true,
@@ -2404,7 +2404,7 @@
2404
  "normalized_score": 0.006514774539765508,
2405
  "metric_key": "macro_f1",
2406
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
2407
- "scope": "multi_episode_128_metadata_baseline",
2408
  "reason": null
2409
  },
2410
  {
@@ -2412,7 +2412,7 @@
2412
  "task_id": "next_action",
2413
  "task_label": "Next-Action Prediction",
2414
  "series_id": "metadata128_neural_mlp",
2415
- "method": "128ep Metadata NN",
2416
  "status": "scored",
2417
  "status_label": "scored",
2418
  "scored": true,
@@ -2422,7 +2422,7 @@
2422
  "normalized_score": 0.004910507980164745,
2423
  "metric_key": "macro_f1",
2424
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
2425
- "scope": "multi_episode_128_metadata_baseline",
2426
  "reason": null
2427
  },
2428
  {
@@ -2520,36 +2520,36 @@
2520
  "task_id": "hand_trajectory_forecast",
2521
  "task_label": "Hand Trajectory Forecasting",
2522
  "series_id": "metadata128_simple",
2523
- "method": "128ep Metadata Simple",
2524
- "status": "unsupported_without_required_target",
2525
- "status_label": "unsupported",
2526
- "scored": false,
2527
  "proxy_scored": false,
2528
- "raw": null,
2529
- "raw_text": "n/a",
2530
- "normalized_score": null,
2531
  "metric_key": "mpjpe",
2532
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
2533
- "scope": "multi_episode_128_metadata_baseline",
2534
- "reason": "requires future hand-joint trajectories from raw sensor feature NPZ blocks, which are not in the public 128 package"
2535
  },
2536
  {
2537
  "task_number": 5,
2538
  "task_id": "hand_trajectory_forecast",
2539
  "task_label": "Hand Trajectory Forecasting",
2540
  "series_id": "metadata128_neural_mlp",
2541
- "method": "128ep Metadata NN",
2542
- "status": "not_supported_by_metadata_only_package",
2543
- "status_label": "not supported",
2544
- "scored": false,
2545
  "proxy_scored": false,
2546
- "raw": null,
2547
- "raw_text": "n/a",
2548
- "normalized_score": null,
2549
  "metric_key": "mpjpe",
2550
- "source": null,
2551
- "scope": "multi_episode_128_metadata_baseline",
2552
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2553
  },
2554
  {
2555
  "task_number": 5,
@@ -2646,7 +2646,7 @@
2646
  "task_id": "contact_prediction",
2647
  "task_label": "Contact State Prediction",
2648
  "series_id": "metadata128_simple",
2649
- "method": "128ep Metadata Simple",
2650
  "status": "scored",
2651
  "status_label": "scored",
2652
  "scored": true,
@@ -2656,7 +2656,7 @@
2656
  "normalized_score": 0.4381481308057444,
2657
  "metric_key": "macro_f1",
2658
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
2659
- "scope": "multi_episode_128_metadata_baseline",
2660
  "reason": null
2661
  },
2662
  {
@@ -2664,7 +2664,7 @@
2664
  "task_id": "contact_prediction",
2665
  "task_label": "Contact State Prediction",
2666
  "series_id": "metadata128_neural_mlp",
2667
- "method": "128ep Metadata NN",
2668
  "status": "scored",
2669
  "status_label": "scored",
2670
  "scored": true,
@@ -2674,7 +2674,7 @@
2674
  "normalized_score": 0.5682695682695682,
2675
  "metric_key": "macro_f1",
2676
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
2677
- "scope": "multi_episode_128_metadata_baseline",
2678
  "reason": null
2679
  },
2680
  {
@@ -2772,7 +2772,7 @@
2772
  "task_id": "object_relevance",
2773
  "task_label": "Object Relevance Prediction",
2774
  "series_id": "metadata128_simple",
2775
- "method": "128ep Metadata Simple",
2776
  "status": "scored",
2777
  "status_label": "scored",
2778
  "scored": true,
@@ -2782,7 +2782,7 @@
2782
  "normalized_score": 0.17764578833693304,
2783
  "metric_key": "micro_f1",
2784
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
2785
- "scope": "multi_episode_128_metadata_baseline",
2786
  "reason": null
2787
  },
2788
  {
@@ -2790,7 +2790,7 @@
2790
  "task_id": "object_relevance",
2791
  "task_label": "Object Relevance Prediction",
2792
  "series_id": "metadata128_neural_mlp",
2793
- "method": "128ep Metadata NN",
2794
  "status": "scored",
2795
  "status_label": "scored",
2796
  "scored": true,
@@ -2800,7 +2800,7 @@
2800
  "normalized_score": 0.18662723837686876,
2801
  "metric_key": "micro_f1",
2802
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
2803
- "scope": "multi_episode_128_metadata_baseline",
2804
  "reason": null
2805
  },
2806
  {
@@ -2898,7 +2898,7 @@
2898
  "task_id": "caption_grounding",
2899
  "task_label": "Language Grounding",
2900
  "series_id": "metadata128_simple",
2901
- "method": "128ep Metadata Simple",
2902
  "status": "scored",
2903
  "status_label": "scored",
2904
  "scored": true,
@@ -2908,7 +2908,7 @@
2908
  "normalized_score": 0.002332374220713973,
2909
  "metric_key": "mrr",
2910
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
2911
- "scope": "multi_episode_128_metadata_baseline",
2912
  "reason": null
2913
  },
2914
  {
@@ -2916,7 +2916,7 @@
2916
  "task_id": "caption_grounding",
2917
  "task_label": "Language Grounding",
2918
  "series_id": "metadata128_neural_mlp",
2919
- "method": "128ep Metadata NN",
2920
  "status": "scored",
2921
  "status_label": "scored",
2922
  "scored": true,
@@ -2926,7 +2926,7 @@
2926
  "normalized_score": 0.008236799389123917,
2927
  "metric_key": "mrr",
2928
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
2929
- "scope": "multi_episode_128_metadata_baseline",
2930
  "reason": null
2931
  },
2932
  {
@@ -3024,36 +3024,36 @@
3024
  "task_id": "cross_modal_retrieval",
3025
  "task_label": "Cross-Modal Retrieval",
3026
  "series_id": "metadata128_simple",
3027
- "method": "128ep Metadata Simple",
3028
- "status": "unsupported_without_required_target",
3029
- "status_label": "unsupported",
3030
- "scored": false,
3031
  "proxy_scored": false,
3032
- "raw": null,
3033
- "raw_text": "n/a",
3034
- "normalized_score": null,
3035
  "metric_key": "mrr",
3036
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
3037
- "scope": "multi_episode_128_metadata_baseline",
3038
- "reason": "requires paired motion/IMU/camera/audio/depth feature blocks, which are not in the public 128 package"
3039
  },
3040
  {
3041
  "task_number": 9,
3042
  "task_id": "cross_modal_retrieval",
3043
  "task_label": "Cross-Modal Retrieval",
3044
  "series_id": "metadata128_neural_mlp",
3045
- "method": "128ep Metadata NN",
3046
- "status": "not_supported_by_metadata_only_package",
3047
- "status_label": "not supported",
3048
- "scored": false,
3049
  "proxy_scored": false,
3050
- "raw": null,
3051
- "raw_text": "n/a",
3052
- "normalized_score": null,
3053
  "metric_key": "mrr",
3054
- "source": null,
3055
- "scope": "multi_episode_128_metadata_baseline",
3056
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3057
  },
3058
  {
3059
  "task_number": 9,
@@ -3150,36 +3150,36 @@
3150
  "task_id": "modality_reconstruction",
3151
  "task_label": "Cross-Modal Reconstruction",
3152
  "series_id": "metadata128_simple",
3153
- "method": "128ep Metadata Simple",
3154
- "status": "unsupported_without_required_target",
3155
- "status_label": "unsupported",
3156
- "scored": false,
3157
  "proxy_scored": false,
3158
- "raw": null,
3159
- "raw_text": "n/a",
3160
- "normalized_score": null,
3161
  "metric_key": "r2",
3162
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
3163
- "scope": "multi_episode_128_metadata_baseline",
3164
- "reason": "requires source and target modality feature blocks such as depth/video vectors, which are not in the public 128 package"
3165
  },
3166
  {
3167
  "task_number": 10,
3168
  "task_id": "modality_reconstruction",
3169
  "task_label": "Cross-Modal Reconstruction",
3170
  "series_id": "metadata128_neural_mlp",
3171
- "method": "128ep Metadata NN",
3172
- "status": "not_supported_by_metadata_only_package",
3173
- "status_label": "not supported",
3174
- "scored": false,
3175
  "proxy_scored": false,
3176
- "raw": null,
3177
- "raw_text": "n/a",
3178
- "normalized_score": null,
3179
  "metric_key": "r2",
3180
- "source": null,
3181
- "scope": "multi_episode_128_metadata_baseline",
3182
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3183
  },
3184
  {
3185
  "task_number": 10,
@@ -3276,7 +3276,7 @@
3276
  "task_id": "temporal_order",
3277
  "task_label": "Temporal Order Verification",
3278
  "series_id": "metadata128_simple",
3279
- "method": "128ep Metadata Simple",
3280
  "status": "scored",
3281
  "status_label": "scored",
3282
  "scored": true,
@@ -3286,7 +3286,7 @@
3286
  "normalized_score": 0.4198864140782312,
3287
  "metric_key": "f1",
3288
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
3289
- "scope": "multi_episode_128_metadata_baseline",
3290
  "reason": null
3291
  },
3292
  {
@@ -3294,7 +3294,7 @@
3294
  "task_id": "temporal_order",
3295
  "task_label": "Temporal Order Verification",
3296
  "series_id": "metadata128_neural_mlp",
3297
- "method": "128ep Metadata NN",
3298
  "status": "scored",
3299
  "status_label": "scored",
3300
  "scored": true,
@@ -3304,7 +3304,7 @@
3304
  "normalized_score": 0.8252408266656923,
3305
  "metric_key": "f1",
3306
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
3307
- "scope": "multi_episode_128_metadata_baseline",
3308
  "reason": null
3309
  },
3310
  {
@@ -3402,36 +3402,36 @@
3402
  "task_id": "misalignment_detection",
3403
  "task_label": "Multimodal Synchronization Detection",
3404
  "series_id": "metadata128_simple",
3405
- "method": "128ep Metadata Simple",
3406
- "status": "unsupported_without_required_target",
3407
- "status_label": "unsupported",
3408
- "scored": false,
3409
  "proxy_scored": false,
3410
- "raw": null,
3411
- "raw_text": "n/a",
3412
- "normalized_score": null,
3413
  "metric_key": "f1",
3414
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
3415
- "scope": "multi_episode_128_metadata_baseline",
3416
- "reason": "requires deliberately shifted cross-modal feature pairs, which cannot be reconstructed from the public JSONL labels alone"
3417
  },
3418
  {
3419
  "task_number": 12,
3420
  "task_id": "misalignment_detection",
3421
  "task_label": "Multimodal Synchronization Detection",
3422
  "series_id": "metadata128_neural_mlp",
3423
- "method": "128ep Metadata NN",
3424
- "status": "not_supported_by_metadata_only_package",
3425
- "status_label": "not supported",
3426
- "scored": false,
3427
  "proxy_scored": false,
3428
- "raw": null,
3429
- "raw_text": "n/a",
3430
- "normalized_score": null,
3431
  "metric_key": "f1",
3432
- "source": null,
3433
- "scope": "multi_episode_128_metadata_baseline",
3434
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3435
  },
3436
  {
3437
  "task_number": 12,
@@ -3528,7 +3528,7 @@
3528
  "task_id": "long_horizon_next_action",
3529
  "task_label": "Long-Horizon Next-Action Forecasting",
3530
  "series_id": "metadata128_simple",
3531
- "method": "128ep Metadata Simple",
3532
  "status": "scored",
3533
  "status_label": "scored",
3534
  "scored": true,
@@ -3538,7 +3538,7 @@
3538
  "normalized_score": 0.004579592783699693,
3539
  "metric_key": "macro_f1",
3540
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
3541
- "scope": "multi_episode_128_metadata_baseline",
3542
  "reason": null
3543
  },
3544
  {
@@ -3546,7 +3546,7 @@
3546
  "task_id": "long_horizon_next_action",
3547
  "task_label": "Long-Horizon Next-Action Forecasting",
3548
  "series_id": "metadata128_neural_mlp",
3549
- "method": "128ep Metadata NN",
3550
  "status": "scored",
3551
  "status_label": "scored",
3552
  "scored": true,
@@ -3556,7 +3556,7 @@
3556
  "normalized_score": 0.0029821307969142615,
3557
  "metric_key": "macro_f1",
3558
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
3559
- "scope": "multi_episode_128_metadata_baseline",
3560
  "reason": null
3561
  },
3562
  {
@@ -3654,7 +3654,7 @@
3654
  "task_id": "next_subtask_forecast",
3655
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3656
  "series_id": "metadata128_simple",
3657
- "method": "128ep Metadata Simple",
3658
  "status": "scored",
3659
  "status_label": "scored",
3660
  "scored": true,
@@ -3664,7 +3664,7 @@
3664
  "normalized_score": 0.0001206030150753769,
3665
  "metric_key": "macro_f1",
3666
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
3667
- "scope": "multi_episode_128_metadata_baseline",
3668
  "reason": null
3669
  },
3670
  {
@@ -3672,7 +3672,7 @@
3672
  "task_id": "next_subtask_forecast",
3673
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3674
  "series_id": "metadata128_neural_mlp",
3675
- "method": "128ep Metadata NN",
3676
  "status": "scored",
3677
  "status_label": "scored",
3678
  "scored": true,
@@ -3682,7 +3682,7 @@
3682
  "normalized_score": 2.086049543676662e-05,
3683
  "metric_key": "macro_f1",
3684
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
3685
- "scope": "multi_episode_128_metadata_baseline",
3686
  "reason": null
3687
  },
3688
  {
@@ -3780,7 +3780,7 @@
3780
  "task_id": "interaction_text_prediction",
3781
  "task_label": "Interaction Text Prediction",
3782
  "series_id": "metadata128_simple",
3783
- "method": "128ep Metadata Simple",
3784
  "status": "unsupported_without_required_target",
3785
  "status_label": "unsupported",
3786
  "scored": false,
@@ -3790,7 +3790,7 @@
3790
  "normalized_score": null,
3791
  "metric_key": "macro_f1",
3792
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
3793
- "scope": "multi_episode_128_metadata_baseline",
3794
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
3795
  },
3796
  {
@@ -3798,7 +3798,7 @@
3798
  "task_id": "interaction_text_prediction",
3799
  "task_label": "Interaction Text Prediction",
3800
  "series_id": "metadata128_neural_mlp",
3801
- "method": "128ep Metadata NN",
3802
  "status": "not_supported_by_metadata_only_package",
3803
  "status_label": "not supported",
3804
  "scored": false,
@@ -3808,8 +3808,8 @@
3808
  "normalized_score": null,
3809
  "metric_key": "macro_f1",
3810
  "source": null,
3811
- "scope": "multi_episode_128_metadata_baseline",
3812
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3813
  },
3814
  {
3815
  "task_number": 15,
@@ -3906,7 +3906,7 @@
3906
  "task_id": "action_object_relation",
3907
  "task_label": "Action-Object Relation Prediction",
3908
  "series_id": "metadata128_simple",
3909
- "method": "128ep Metadata Simple",
3910
  "status": "scored",
3911
  "status_label": "scored",
3912
  "scored": true,
@@ -3916,7 +3916,7 @@
3916
  "normalized_score": 0.0,
3917
  "metric_key": "macro_f1",
3918
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
3919
- "scope": "multi_episode_128_metadata_baseline",
3920
  "reason": null
3921
  },
3922
  {
@@ -3924,7 +3924,7 @@
3924
  "task_id": "action_object_relation",
3925
  "task_label": "Action-Object Relation Prediction",
3926
  "series_id": "metadata128_neural_mlp",
3927
- "method": "128ep Metadata NN",
3928
  "status": "scored",
3929
  "status_label": "scored",
3930
  "scored": true,
@@ -3934,7 +3934,7 @@
3934
  "normalized_score": 0.0,
3935
  "metric_key": "macro_f1",
3936
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
3937
- "scope": "multi_episode_128_metadata_baseline",
3938
  "reason": null
3939
  },
3940
  {
@@ -4032,7 +4032,7 @@
4032
  "task_id": "object_set_forecast",
4033
  "task_label": "Future Object-Set Forecasting",
4034
  "series_id": "metadata128_simple",
4035
- "method": "128ep Metadata Simple",
4036
  "status": "scored",
4037
  "status_label": "scored",
4038
  "scored": true,
@@ -4042,7 +4042,7 @@
4042
  "normalized_score": 0.17656983343047333,
4043
  "metric_key": "micro_f1",
4044
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
4045
- "scope": "multi_episode_128_metadata_baseline",
4046
  "reason": null
4047
  },
4048
  {
@@ -4050,7 +4050,7 @@
4050
  "task_id": "object_set_forecast",
4051
  "task_label": "Future Object-Set Forecasting",
4052
  "series_id": "metadata128_neural_mlp",
4053
- "method": "128ep Metadata NN",
4054
  "status": "scored",
4055
  "status_label": "scored",
4056
  "scored": true,
@@ -4060,7 +4060,7 @@
4060
  "normalized_score": 0.17418550827844048,
4061
  "metric_key": "micro_f1",
4062
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
4063
- "scope": "multi_episode_128_metadata_baseline",
4064
  "reason": null
4065
  },
4066
  {
@@ -4158,36 +4158,36 @@
4158
  "task_id": "imu_to_hand_pose",
4159
  "task_label": "IMU-to-Hand Pose Reconstruction",
4160
  "series_id": "metadata128_simple",
4161
- "method": "128ep Metadata Simple",
4162
- "status": "unsupported_without_required_target",
4163
- "status_label": "unsupported",
4164
- "scored": false,
4165
  "proxy_scored": false,
4166
- "raw": null,
4167
- "raw_text": "n/a",
4168
- "normalized_score": null,
4169
  "metric_key": "mae",
4170
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
4171
- "scope": "multi_episode_128_metadata_baseline",
4172
- "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package"
4173
  },
4174
  {
4175
  "task_number": 18,
4176
  "task_id": "imu_to_hand_pose",
4177
  "task_label": "IMU-to-Hand Pose Reconstruction",
4178
  "series_id": "metadata128_neural_mlp",
4179
- "method": "128ep Metadata NN",
4180
- "status": "not_supported_by_metadata_only_package",
4181
- "status_label": "not supported",
4182
- "scored": false,
4183
  "proxy_scored": false,
4184
- "raw": null,
4185
- "raw_text": "n/a",
4186
- "normalized_score": null,
4187
  "metric_key": "mae",
4188
- "source": null,
4189
- "scope": "multi_episode_128_metadata_baseline",
4190
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4191
  },
4192
  {
4193
  "task_number": 18,
@@ -4284,7 +4284,7 @@
4284
  "task_id": "camera_view_sync_retrieval",
4285
  "task_label": "Camera-View Synchronization Retrieval",
4286
  "series_id": "metadata128_simple",
4287
- "method": "128ep Metadata Simple",
4288
  "status": "unsupported_without_required_target",
4289
  "status_label": "unsupported",
4290
  "scored": false,
@@ -4294,7 +4294,7 @@
4294
  "normalized_score": null,
4295
  "metric_key": "mrr",
4296
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
4297
- "scope": "multi_episode_128_metadata_baseline",
4298
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
4299
  },
4300
  {
@@ -4302,7 +4302,7 @@
4302
  "task_id": "camera_view_sync_retrieval",
4303
  "task_label": "Camera-View Synchronization Retrieval",
4304
  "series_id": "metadata128_neural_mlp",
4305
- "method": "128ep Metadata NN",
4306
  "status": "not_supported_by_metadata_only_package",
4307
  "status_label": "not supported",
4308
  "scored": false,
@@ -4312,8 +4312,8 @@
4312
  "normalized_score": null,
4313
  "metric_key": "mrr",
4314
  "source": null,
4315
- "scope": "multi_episode_128_metadata_baseline",
4316
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4317
  },
4318
  {
4319
  "task_number": 19,
@@ -4410,7 +4410,7 @@
4410
  "task_id": "time_to_transition",
4411
  "task_label": "Time-to-Next-Transition Regression",
4412
  "series_id": "metadata128_simple",
4413
- "method": "128ep Metadata Simple",
4414
  "status": "scored",
4415
  "status_label": "scored",
4416
  "scored": true,
@@ -4420,7 +4420,7 @@
4420
  "normalized_score": 0.016864874132806403,
4421
  "metric_key": "mae",
4422
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
4423
- "scope": "multi_episode_128_metadata_baseline",
4424
  "reason": null
4425
  },
4426
  {
@@ -4428,7 +4428,7 @@
4428
  "task_id": "time_to_transition",
4429
  "task_label": "Time-to-Next-Transition Regression",
4430
  "series_id": "metadata128_neural_mlp",
4431
- "method": "128ep Metadata NN",
4432
  "status": "scored",
4433
  "status_label": "scored",
4434
  "scored": true,
@@ -4438,7 +4438,7 @@
4438
  "normalized_score": 0.25411768748242325,
4439
  "metric_key": "mae",
4440
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
4441
- "scope": "multi_episode_128_metadata_baseline",
4442
  "reason": null
4443
  },
4444
  {
 
1
  {
2
  "title": "128-Episode 20-Task Radar",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:52:26+00:00",
5
  "description": "Selected 128-episode metadata/raw baselines plus verified Qwen3/Cosmos branches. Every method has 20 records; numeric scores appear only where the public artifact produced that task target.",
6
  "task_count": 20,
7
  "method_count": 7,
8
  "method_task_record_count": 140,
9
+ "scored_method_task_count": 103,
10
  "normalization_policy": {
11
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
12
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
13
  "raw_values": "raw metric values, metric keys, and sources are retained in this JSON; the SVG is an overview, not a replacement for the metric table",
14
  "result_record_policy": "every method has 20 task records; records without a numeric score carry explicit unsupported/not-evaluated status and reason fields",
15
  "foundation_model_overlay": "Qwen3/Cosmos points are plotted only on task-aligned axes. Scoreless records mean the public result does not evaluate that task contract.",
16
+ "metadata_128_overlay": "128-episode aligned baselines have 20 records. Numeric scores come from JSONL metadata/text tasks plus staged sensor-block targets when the processed target exists; raw interaction text and paired camera-view embeddings remain explicit gaps.",
17
  "raw_128_overlay": "128-episode raw-feature baselines use staged sensor NPZ features. Eighteen axes use direct task targets; interaction text and camera-view sync are completed with documented compact proxies because raw interaction strings and paired video-view embeddings are absent from the 128 export."
18
  },
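The `normalization_policy` entries above can be read as two small rules. The following is a minimal sketch of them, not the generator script that produced this JSON: the function names are hypothetical, and the handling of `None` and of non-positive error values is an assumption added for illustration.

```python
from typing import Optional


def normalize_higher_is_better(raw: Optional[float]) -> Optional[float]:
    """Clip a bounded higher-is-better metric (e.g. F1, MRR, R2) to [0, 1]."""
    if raw is None:
        # Assumption: scoreless records keep a null normalized score.
        return None
    return min(1.0, max(0.0, raw))


def normalize_lower_is_better(
    raw: Optional[float], best_observed: float
) -> Optional[float]:
    """Convert a lower-is-better error (e.g. MAE, MPJPE) to best_observed / raw.

    best_observed is the smallest raw error seen for the same task, so the
    best method on that task maps to 1.0 and worse methods map below it.
    """
    if raw is None:
        return None
    if raw <= 0:
        # Assumption: the policy does not say how zero or negative errors
        # are treated, so this sketch rejects them.
        raise ValueError("raw error must be positive")
    return best_observed / raw


# A strongly negative R2 is clipped to 0.0, as in the
# modality_reconstruction records of this diff.
r2_score_clipped = normalize_higher_is_better(-190.66106203944798)

# An in-range F1 value passes through unchanged.
f1_score_kept = normalize_higher_is_better(0.8252408266656923)

# Illustrative numbers only: an error twice the best observed value.
error_score = normalize_lower_is_better(0.5, best_observed=0.25)
```

Under this reading, the two `hand_trajectory_forecast` records in this diff (raw `mpjpe` 8.817 and 0.4294, normalized 0.0122 and 0.2511) both imply the same best observed error of roughly 0.108 for that task, which is consistent with the `lower_is_better` rule.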
19
  "source_unified_radar": "docs/data/unified_task_model_radar.json",
 
21
  "series": [
22
  {
23
  "id": "metadata128_simple",
24
+ "label": "128ep Aligned Simple",
25
  "short_label": "128-S",
26
  "color": "#ffd166",
27
+ "kind": "partial_128_episode_aligned_baseline",
28
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
29
  "stroke_dasharray": "9 6",
30
+ "method_detail": "128-episode aligned simple baselines: JSONL metadata/text tasks plus staged sensor-block tasks where the processed target exists.",
31
  "plotted_as": "colored point overlay",
32
  "result_record_count": 20,
33
+ "scored_task_count": 18,
34
+ "covered_task_count": 18,
35
  "proxy_scored_task_count": 0,
36
+ "scoreless_task_count": 2,
37
+ "unsupported_task_count": 2,
38
  "not_evaluated_task_count": 0,
39
  "status_counts": {
40
+ "scored": 18,
41
+ "unsupported_without_required_target": 2
42
  },
43
+ "coverage_fraction": 0.9,
44
  "result_record_fraction": 1.0
45
  },
46
  {
47
  "id": "metadata128_neural_mlp",
48
+ "label": "128ep Aligned NN",
49
  "short_label": "128-NN",
50
  "color": "#f472b6",
51
+ "kind": "partial_128_episode_aligned_baseline",
52
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
53
  "stroke_dasharray": "3 6",
54
+ "method_detail": "128-episode aligned MLP baselines: JSONL metadata/text tasks plus staged sensor-block tasks where the processed target exists.",
55
  "plotted_as": "colored point overlay",
56
  "result_record_count": 20,
57
+ "scored_task_count": 18,
58
+ "covered_task_count": 18,
59
  "proxy_scored_task_count": 0,
60
+ "scoreless_task_count": 2,
61
+ "unsupported_task_count": 2,
62
  "not_evaluated_task_count": 0,
63
  "status_counts": {
64
+ "not_supported_by_metadata_only_package": 2,
65
+ "scored": 18
66
  },
67
+ "coverage_fraction": 0.9,
68
  "result_record_fraction": 1.0
69
  },
70
  {
 
205
  "raw": 0.008252821966746326,
206
  "metric_key": "macro_f1",
207
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
208
+ "scope": "multi_episode_128_aligned_baseline",
209
  "status": "scored",
210
  "reason": null,
211
  "normalized_score": 0.008252821966746326,
 
216
  "raw": 0.004175793689174209,
217
  "metric_key": "macro_f1",
218
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
219
+ "scope": "multi_episode_128_aligned_baseline",
220
  "status": "scored",
221
  "reason": null,
222
  "normalized_score": 0.004175793689174209,
 
296
  "raw": 0.00019512195121951218,
297
  "metric_key": "macro_f1",
298
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
299
+ "scope": "multi_episode_128_aligned_baseline",
300
  "status": "scored",
301
  "reason": null,
302
  "normalized_score": 0.00019512195121951218,
 
307
  "raw": 7.207207207207208e-05,
308
  "metric_key": "macro_f1",
309
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
310
+ "scope": "multi_episode_128_aligned_baseline",
311
  "status": "scored",
312
  "reason": null,
313
  "normalized_score": 7.207207207207208e-05,
 
387
  "raw": 0.29652162550029315,
388
  "metric_key": "macro_f1",
389
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
390
+ "scope": "multi_episode_128_aligned_baseline",
391
  "status": "scored",
392
  "reason": null,
393
  "normalized_score": 0.29652162550029315,
 
398
  "raw": 0.4841733292368365,
399
  "metric_key": "macro_f1",
400
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
401
+ "scope": "multi_episode_128_aligned_baseline",
402
  "status": "scored",
403
  "reason": null,
404
  "normalized_score": 0.4841733292368365,
 
478
  "raw": 0.006514774539765508,
479
  "metric_key": "macro_f1",
480
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
481
+ "scope": "multi_episode_128_aligned_baseline",
482
  "status": "scored",
483
  "reason": null,
484
  "normalized_score": 0.006514774539765508,
 
489
  "raw": 0.004910507980164745,
490
  "metric_key": "macro_f1",
491
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
492
+ "scope": "multi_episode_128_aligned_baseline",
493
  "status": "scored",
494
  "reason": null,
495
  "normalized_score": 0.004910507980164745,
 
566
  "raw128_proxy_axis": false,
567
  "values": {
568
  "metadata128_simple": {
569
+ "raw": 8.817333221435547,
570
  "metric_key": "mpjpe",
571
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
572
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
573
+ "status": "scored",
574
+ "reason": null,
575
+ "normalized_score": 0.012231610603598841,
576
+ "raw_text": "8.817",
577
+ "status_label": "scored"
578
  },
579
  "metadata128_neural_mlp": {
580
+ "raw": 0.429434210062027,
581
  "metric_key": "mpjpe",
582
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/hand_trajectory_forecast/metrics.json",
583
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
584
+ "status": "scored",
585
+ "reason": null,
586
+ "normalized_score": 0.25114484128127007,
587
+ "raw_text": "0.4294",
588
+ "status_label": "scored"
589
  },
590
  "raw128_simple": {
591
  "raw": 0.2729249894618988,
 
660
  "raw": 0.4381481308057444,
661
  "metric_key": "macro_f1",
662
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
663
+ "scope": "multi_episode_128_aligned_baseline",
664
  "status": "scored",
665
  "reason": null,
666
  "normalized_score": 0.4381481308057444,
 
671
  "raw": 0.5682695682695682,
672
  "metric_key": "macro_f1",
673
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
674
+ "scope": "multi_episode_128_aligned_baseline",
675
  "status": "scored",
676
  "reason": null,
677
  "normalized_score": 0.5682695682695682,
 
751
  "raw": 0.17764578833693304,
752
  "metric_key": "micro_f1",
753
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
754
+ "scope": "multi_episode_128_aligned_baseline",
755
  "status": "scored",
756
  "reason": null,
757
  "normalized_score": 0.17764578833693304,
 
762
  "raw": 0.18662723837686876,
763
  "metric_key": "micro_f1",
764
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
765
+ "scope": "multi_episode_128_aligned_baseline",
766
  "status": "scored",
767
  "reason": null,
768
  "normalized_score": 0.18662723837686876,
 
842
  "raw": 0.002332374220713973,
843
  "metric_key": "mrr",
844
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
845
+ "scope": "multi_episode_128_aligned_baseline",
846
  "status": "scored",
847
  "reason": null,
848
  "normalized_score": 0.002332374220713973,
 
853
  "raw": 0.008236799389123917,
854
  "metric_key": "mrr",
855
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
856
+ "scope": "multi_episode_128_aligned_baseline",
857
  "status": "scored",
858
  "reason": null,
859
  "normalized_score": 0.008236799389123917,
 
930
  "raw128_proxy_axis": false,
931
  "values": {
932
  "metadata128_simple": {
933
+ "raw": 0.002587692579254508,
934
  "metric_key": "mrr",
935
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
936
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
937
+ "status": "scored",
938
+ "reason": null,
939
+ "normalized_score": 0.002587692579254508,
940
+ "raw_text": "0.0026",
941
+ "status_label": "scored"
942
  },
943
  "metadata128_neural_mlp": {
944
+ "raw": 0.0026067993603646755,
945
  "metric_key": "mrr",
946
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/cross_modal_retrieval/metrics.json",
947
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
948
+ "status": "scored",
949
+ "reason": null,
950
+ "normalized_score": 0.0026067993603646755,
951
+ "raw_text": "0.0026",
952
+ "status_label": "scored"
953
  },
954
  "raw128_simple": {
955
  "raw": 0.003459817497059703,
 
1021
  "raw128_proxy_axis": false,
1022
  "values": {
1023
  "metadata128_simple": {
1024
+ "raw": -190.66106203944798,
1025
  "metric_key": "r2",
1026
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
1027
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1028
+ "status": "scored",
1029
+ "reason": null,
1030
+ "normalized_score": 0.0,
1031
+ "raw_text": "-190.66",
1032
+ "status_label": "scored"
1033
  },
1034
  "metadata128_neural_mlp": {
1035
+ "raw": -0.43481132003942147,
1036
  "metric_key": "r2",
1037
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/modality_reconstruction/metrics.json",
1038
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1039
+ "status": "scored",
1040
+ "reason": null,
1041
+ "normalized_score": 0.0,
1042
+ "raw_text": "-0.4348",
1043
+ "status_label": "scored"
1044
  },
1045
  "raw128_simple": {
1046
  "raw": -1.3450960391924882,
 
1115
  "raw": 0.4198864140782312,
1116
  "metric_key": "f1",
1117
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
1118
+ "scope": "multi_episode_128_aligned_baseline",
1119
  "status": "scored",
1120
  "reason": null,
1121
  "normalized_score": 0.4198864140782312,
 
1126
  "raw": 0.8252408266656923,
1127
  "metric_key": "f1",
1128
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
1129
+ "scope": "multi_episode_128_aligned_baseline",
1130
  "status": "scored",
1131
  "reason": null,
1132
  "normalized_score": 0.8252408266656923,
 
1203
  "raw128_proxy_axis": false,
1204
  "values": {
1205
  "metadata128_simple": {
1206
+ "raw": 0.49980060227663614,
1207
  "metric_key": "f1",
1208
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
1209
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1210
+ "status": "scored",
1211
+ "reason": null,
1212
+ "normalized_score": 0.49980060227663614,
1213
+ "raw_text": "0.4998",
1214
+ "status_label": "scored"
1215
  },
1216
  "metadata128_neural_mlp": {
1217
+ "raw": 0.7773773780941162,
1218
  "metric_key": "f1",
1219
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/misalignment_detection/metrics.json",
1220
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1221
+ "status": "scored",
1222
+ "reason": null,
1223
+ "normalized_score": 0.7773773780941162,
1224
+ "raw_text": "0.7774",
1225
+ "status_label": "scored"
1226
  },
1227
  "raw128_simple": {
1228
  "raw": 0.4958867673901769,
 
1297
  "raw": 0.004579592783699693,
1298
  "metric_key": "macro_f1",
1299
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
1300
+ "scope": "multi_episode_128_aligned_baseline",
1301
  "status": "scored",
1302
  "reason": null,
1303
  "normalized_score": 0.004579592783699693,
 
1308
  "raw": 0.0029821307969142615,
1309
  "metric_key": "macro_f1",
1310
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
1311
+ "scope": "multi_episode_128_aligned_baseline",
1312
  "status": "scored",
1313
  "reason": null,
1314
  "normalized_score": 0.0029821307969142615,
 
1388
  "raw": 0.0001206030150753769,
1389
  "metric_key": "macro_f1",
1390
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
1391
+ "scope": "multi_episode_128_aligned_baseline",
1392
  "status": "scored",
1393
  "reason": null,
1394
  "normalized_score": 0.0001206030150753769,
 
1399
  "raw": 2.086049543676662e-05,
1400
  "metric_key": "macro_f1",
1401
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
1402
+ "scope": "multi_episode_128_aligned_baseline",
1403
  "status": "scored",
1404
  "reason": null,
1405
  "normalized_score": 2.086049543676662e-05,
 
1479
  "raw": null,
1480
  "metric_key": "macro_f1",
1481
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
1482
+ "scope": "multi_episode_128_aligned_baseline",
1483
  "status": "unsupported_without_required_target",
1484
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
1485
  "normalized_score": null,
 
1490
  "raw": null,
1491
  "metric_key": "macro_f1",
1492
  "source": null,
1493
+ "scope": "multi_episode_128_aligned_baseline",
1494
  "status": "not_supported_by_metadata_only_package",
1495
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required",
1496
  "normalized_score": null,
1497
  "raw_text": "n/a",
1498
  "status_label": "not supported"
 
1570
  "raw": 0.0,
1571
  "metric_key": "macro_f1",
1572
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
1573
+ "scope": "multi_episode_128_aligned_baseline",
1574
  "status": "scored",
1575
  "reason": null,
1576
  "normalized_score": 0.0,
 
1581
  "raw": 0.0,
1582
  "metric_key": "macro_f1",
1583
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
1584
+ "scope": "multi_episode_128_aligned_baseline",
1585
  "status": "scored",
1586
  "reason": null,
1587
  "normalized_score": 0.0,
 
1661
  "raw": 0.17656983343047333,
1662
  "metric_key": "micro_f1",
1663
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
1664
+ "scope": "multi_episode_128_aligned_baseline",
1665
  "status": "scored",
1666
  "reason": null,
1667
  "normalized_score": 0.17656983343047333,
 
1672
  "raw": 0.17418550827844048,
1673
  "metric_key": "micro_f1",
1674
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
1675
+ "scope": "multi_episode_128_aligned_baseline",
1676
  "status": "scored",
1677
  "reason": null,
1678
  "normalized_score": 0.17418550827844048,
 
1749
  "raw128_proxy_axis": false,
1750
  "values": {
1751
  "metadata128_simple": {
1752
+ "raw": 0.2294670194387436,
1753
  "metric_key": "mae",
1754
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
1755
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1756
+ "status": "scored",
1757
+ "reason": null,
1758
+ "normalized_score": 0.18324815505876868,
1759
+ "raw_text": "0.2295",
1760
+ "status_label": "scored"
1761
  },
1762
  "metadata128_neural_mlp": {
1763
+ "raw": 0.2555866539478302,
1764
  "metric_key": "mae",
1765
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/imu_to_hand_pose/metrics.json",
1766
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1767
+ "status": "scored",
1768
+ "reason": null,
1769
+ "normalized_score": 0.16452114110609004,
1770
+ "raw_text": "0.2556",
1771
+ "status_label": "scored"
1772
  },
1773
  "raw128_simple": {
1774
  "raw": 0.22941437363624573,
 
1843
  "raw": null,
1844
  "metric_key": "mrr",
1845
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
1846
+ "scope": "multi_episode_128_aligned_baseline",
1847
  "status": "unsupported_without_required_target",
1848
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
1849
  "normalized_score": null,
 
1854
  "raw": null,
1855
  "metric_key": "mrr",
1856
  "source": null,
1857
+ "scope": "multi_episode_128_aligned_baseline",
1858
  "status": "not_supported_by_metadata_only_package",
1859
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required",
1860
  "normalized_score": null,
1861
  "raw_text": "n/a",
1862
  "status_label": "not supported"
 
1934
  "raw": 624.8108520507812,
1935
  "metric_key": "mae",
1936
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
1937
+ "scope": "multi_episode_128_aligned_baseline",
1938
  "status": "scored",
1939
  "reason": null,
1940
  "normalized_score": 0.016864874132806403,
 
1945
  "raw": 41.4664421081543,
1946
  "metric_key": "mae",
1947
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
1948
+ "scope": "multi_episode_128_aligned_baseline",
1949
  "status": "scored",
1950
  "reason": null,
1951
  "normalized_score": 0.25411768748242325,
 
2016
  "task_id": "timeline_action",
2017
  "task_label": "Action Recognition",
2018
  "series_id": "metadata128_simple",
2019
+ "method": "128ep Aligned Simple",
2020
  "status": "scored",
2021
  "status_label": "scored",
2022
  "scored": true,
 
2026
  "normalized_score": 0.008252821966746326,
2027
  "metric_key": "macro_f1",
2028
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
2029
+ "scope": "multi_episode_128_aligned_baseline",
2030
  "reason": null
2031
  },
2032
  {
 
2034
  "task_id": "timeline_action",
2035
  "task_label": "Action Recognition",
2036
  "series_id": "metadata128_neural_mlp",
2037
+ "method": "128ep Aligned NN",
2038
  "status": "scored",
2039
  "status_label": "scored",
2040
  "scored": true,
 
2044
  "normalized_score": 0.004175793689174209,
2045
  "metric_key": "macro_f1",
2046
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
2047
+ "scope": "multi_episode_128_aligned_baseline",
2048
  "reason": null
2049
  },
2050
  {
 
2142
  "task_id": "timeline_subtask",
2143
  "task_label": "Procedure Step Recognition",
2144
  "series_id": "metadata128_simple",
2145
+ "method": "128ep Aligned Simple",
2146
  "status": "scored",
2147
  "status_label": "scored",
2148
  "scored": true,
 
2152
  "normalized_score": 0.00019512195121951218,
2153
  "metric_key": "macro_f1",
2154
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
2155
+ "scope": "multi_episode_128_aligned_baseline",
2156
  "reason": null
2157
  },
2158
  {
 
2160
  "task_id": "timeline_subtask",
2161
  "task_label": "Procedure Step Recognition",
2162
  "series_id": "metadata128_neural_mlp",
2163
+ "method": "128ep Aligned NN",
2164
  "status": "scored",
2165
  "status_label": "scored",
2166
  "scored": true,
 
2170
  "normalized_score": 7.207207207207208e-05,
2171
  "metric_key": "macro_f1",
2172
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
2173
+ "scope": "multi_episode_128_aligned_baseline",
2174
  "reason": null
2175
  },
2176
  {
 
2268
  "task_id": "transition_detection",
2269
  "task_label": "Action Boundary Detection",
2270
  "series_id": "metadata128_simple",
2271
+ "method": "128ep Aligned Simple",
2272
  "status": "scored",
2273
  "status_label": "scored",
2274
  "scored": true,
 
2278
  "normalized_score": 0.29652162550029315,
2279
  "metric_key": "macro_f1",
2280
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
2281
+ "scope": "multi_episode_128_aligned_baseline",
2282
  "reason": null
2283
  },
2284
  {
 
2286
  "task_id": "transition_detection",
2287
  "task_label": "Action Boundary Detection",
2288
  "series_id": "metadata128_neural_mlp",
2289
+ "method": "128ep Aligned NN",
2290
  "status": "scored",
2291
  "status_label": "scored",
2292
  "scored": true,
 
2296
  "normalized_score": 0.4841733292368365,
2297
  "metric_key": "macro_f1",
2298
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
2299
+ "scope": "multi_episode_128_aligned_baseline",
2300
  "reason": null
2301
  },
2302
  {
 
2394
  "task_id": "next_action",
2395
  "task_label": "Next-Action Prediction",
2396
  "series_id": "metadata128_simple",
2397
+ "method": "128ep Aligned Simple",
2398
  "status": "scored",
2399
  "status_label": "scored",
2400
  "scored": true,
 
2404
  "normalized_score": 0.006514774539765508,
2405
  "metric_key": "macro_f1",
2406
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
2407
+ "scope": "multi_episode_128_aligned_baseline",
2408
  "reason": null
2409
  },
2410
  {
 
2412
  "task_id": "next_action",
2413
  "task_label": "Next-Action Prediction",
2414
  "series_id": "metadata128_neural_mlp",
2415
+ "method": "128ep Aligned NN",
2416
  "status": "scored",
2417
  "status_label": "scored",
2418
  "scored": true,
 
2422
  "normalized_score": 0.004910507980164745,
2423
  "metric_key": "macro_f1",
2424
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
2425
+ "scope": "multi_episode_128_aligned_baseline",
2426
  "reason": null
2427
  },
2428
  {
 
2520
  "task_id": "hand_trajectory_forecast",
2521
  "task_label": "Hand Trajectory Forecasting",
2522
  "series_id": "metadata128_simple",
2523
+ "method": "128ep Aligned Simple",
2524
+ "status": "scored",
2525
+ "status_label": "scored",
2526
+ "scored": true,
2527
  "proxy_scored": false,
2528
+ "raw": 8.817333221435547,
2529
+ "raw_text": "8.817",
2530
+ "normalized_score": 0.012231610603598841,
2531
  "metric_key": "mpjpe",
2532
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
2533
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
2534
+ "reason": null
2535
  },
2536
  {
2537
  "task_number": 5,
2538
  "task_id": "hand_trajectory_forecast",
2539
  "task_label": "Hand Trajectory Forecasting",
2540
  "series_id": "metadata128_neural_mlp",
2541
+ "method": "128ep Aligned NN",
2542
+ "status": "scored",
2543
+ "status_label": "scored",
2544
+ "scored": true,
2545
  "proxy_scored": false,
2546
+ "raw": 0.429434210062027,
2547
+ "raw_text": "0.4294",
2548
+ "normalized_score": 0.25114484128127007,
2549
  "metric_key": "mpjpe",
2550
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/hand_trajectory_forecast/metrics.json",
2551
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
2552
+ "reason": null
2553
  },
2554
  {
2555
  "task_number": 5,
 
2646
  "task_id": "contact_prediction",
2647
  "task_label": "Contact State Prediction",
2648
  "series_id": "metadata128_simple",
2649
+ "method": "128ep Aligned Simple",
2650
  "status": "scored",
2651
  "status_label": "scored",
2652
  "scored": true,
 
2656
  "normalized_score": 0.4381481308057444,
2657
  "metric_key": "macro_f1",
2658
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
2659
+ "scope": "multi_episode_128_aligned_baseline",
2660
  "reason": null
2661
  },
2662
  {
 
2664
  "task_id": "contact_prediction",
2665
  "task_label": "Contact State Prediction",
2666
  "series_id": "metadata128_neural_mlp",
2667
+ "method": "128ep Aligned NN",
2668
  "status": "scored",
2669
  "status_label": "scored",
2670
  "scored": true,
 
2674
  "normalized_score": 0.5682695682695682,
2675
  "metric_key": "macro_f1",
2676
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
2677
+ "scope": "multi_episode_128_aligned_baseline",
2678
  "reason": null
2679
  },
2680
  {
 
2772
  "task_id": "object_relevance",
2773
  "task_label": "Object Relevance Prediction",
2774
  "series_id": "metadata128_simple",
2775
+ "method": "128ep Aligned Simple",
2776
  "status": "scored",
2777
  "status_label": "scored",
2778
  "scored": true,
 
2782
  "normalized_score": 0.17764578833693304,
2783
  "metric_key": "micro_f1",
2784
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
2785
+ "scope": "multi_episode_128_aligned_baseline",
2786
  "reason": null
2787
  },
2788
  {
 
2790
  "task_id": "object_relevance",
2791
  "task_label": "Object Relevance Prediction",
2792
  "series_id": "metadata128_neural_mlp",
2793
+ "method": "128ep Aligned NN",
2794
  "status": "scored",
2795
  "status_label": "scored",
2796
  "scored": true,
 
2800
  "normalized_score": 0.18662723837686876,
2801
  "metric_key": "micro_f1",
2802
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
2803
+ "scope": "multi_episode_128_aligned_baseline",
2804
  "reason": null
2805
  },
2806
  {
 
2898
  "task_id": "caption_grounding",
2899
  "task_label": "Language Grounding",
2900
  "series_id": "metadata128_simple",
2901
+ "method": "128ep Aligned Simple",
2902
  "status": "scored",
2903
  "status_label": "scored",
2904
  "scored": true,
 
2908
  "normalized_score": 0.002332374220713973,
2909
  "metric_key": "mrr",
2910
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
2911
+ "scope": "multi_episode_128_aligned_baseline",
2912
  "reason": null
2913
  },
2914
  {
 
2916
  "task_id": "caption_grounding",
2917
  "task_label": "Language Grounding",
2918
  "series_id": "metadata128_neural_mlp",
2919
+ "method": "128ep Aligned NN",
2920
  "status": "scored",
2921
  "status_label": "scored",
2922
  "scored": true,
 
2926
  "normalized_score": 0.008236799389123917,
2927
  "metric_key": "mrr",
2928
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
2929
+ "scope": "multi_episode_128_aligned_baseline",
2930
  "reason": null
2931
  },
2932
  {
 
3024
  "task_id": "cross_modal_retrieval",
3025
  "task_label": "Cross-Modal Retrieval",
3026
  "series_id": "metadata128_simple",
3027
+ "method": "128ep Aligned Simple",
3028
+ "status": "scored",
3029
+ "status_label": "scored",
3030
+ "scored": true,
3031
  "proxy_scored": false,
3032
+ "raw": 0.002587692579254508,
3033
+ "raw_text": "0.0026",
3034
+ "normalized_score": 0.002587692579254508,
3035
  "metric_key": "mrr",
3036
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
3037
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3038
+ "reason": null
3039
  },
3040
  {
3041
  "task_number": 9,
3042
  "task_id": "cross_modal_retrieval",
3043
  "task_label": "Cross-Modal Retrieval",
3044
  "series_id": "metadata128_neural_mlp",
3045
+ "method": "128ep Aligned NN",
3046
+ "status": "scored",
3047
+ "status_label": "scored",
3048
+ "scored": true,
3049
  "proxy_scored": false,
3050
+ "raw": 0.0026067993603646755,
3051
+ "raw_text": "0.0026",
3052
+ "normalized_score": 0.0026067993603646755,
3053
  "metric_key": "mrr",
3054
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/cross_modal_retrieval/metrics.json",
3055
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3056
+ "reason": null
3057
  },
3058
  {
3059
  "task_number": 9,
 
3150
  "task_id": "modality_reconstruction",
3151
  "task_label": "Cross-Modal Reconstruction",
3152
  "series_id": "metadata128_simple",
3153
+ "method": "128ep Aligned Simple",
3154
+ "status": "scored",
3155
+ "status_label": "scored",
3156
+ "scored": true,
3157
  "proxy_scored": false,
3158
+ "raw": -190.66106203944798,
3159
+ "raw_text": "-190.66",
3160
+ "normalized_score": 0.0,
3161
  "metric_key": "r2",
3162
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
3163
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3164
+ "reason": null
3165
  },
3166
  {
3167
  "task_number": 10,
3168
  "task_id": "modality_reconstruction",
3169
  "task_label": "Cross-Modal Reconstruction",
3170
  "series_id": "metadata128_neural_mlp",
3171
+ "method": "128ep Aligned NN",
3172
+ "status": "scored",
3173
+ "status_label": "scored",
3174
+ "scored": true,
3175
  "proxy_scored": false,
3176
+ "raw": -0.43481132003942147,
3177
+ "raw_text": "-0.4348",
3178
+ "normalized_score": 0.0,
3179
  "metric_key": "r2",
3180
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/modality_reconstruction/metrics.json",
3181
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3182
+ "reason": null
3183
  },
3184
  {
3185
  "task_number": 10,
 
3276
  "task_id": "temporal_order",
3277
  "task_label": "Temporal Order Verification",
3278
  "series_id": "metadata128_simple",
3279
+ "method": "128ep Aligned Simple",
3280
  "status": "scored",
3281
  "status_label": "scored",
3282
  "scored": true,
 
3286
  "normalized_score": 0.4198864140782312,
3287
  "metric_key": "f1",
3288
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
3289
+ "scope": "multi_episode_128_aligned_baseline",
3290
  "reason": null
3291
  },
3292
  {
 
3294
  "task_id": "temporal_order",
3295
  "task_label": "Temporal Order Verification",
3296
  "series_id": "metadata128_neural_mlp",
3297
+ "method": "128ep Aligned NN",
3298
  "status": "scored",
3299
  "status_label": "scored",
3300
  "scored": true,
 
3304
  "normalized_score": 0.8252408266656923,
3305
  "metric_key": "f1",
3306
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
3307
+ "scope": "multi_episode_128_aligned_baseline",
3308
  "reason": null
3309
  },
3310
  {
 
3402
  "task_id": "misalignment_detection",
3403
  "task_label": "Multimodal Synchronization Detection",
3404
  "series_id": "metadata128_simple",
3405
+ "method": "128ep Aligned Simple",
3406
+ "status": "scored",
3407
+ "status_label": "scored",
3408
+ "scored": true,
3409
  "proxy_scored": false,
3410
+ "raw": 0.49980060227663614,
3411
+ "raw_text": "0.4998",
3412
+ "normalized_score": 0.49980060227663614,
3413
  "metric_key": "f1",
3414
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
3415
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3416
+ "reason": null
3417
  },
3418
  {
3419
  "task_number": 12,
3420
  "task_id": "misalignment_detection",
3421
  "task_label": "Multimodal Synchronization Detection",
3422
  "series_id": "metadata128_neural_mlp",
3423
+ "method": "128ep Aligned NN",
3424
+ "status": "scored",
3425
+ "status_label": "scored",
3426
+ "scored": true,
3427
  "proxy_scored": false,
3428
+ "raw": 0.7773773780941162,
3429
+ "raw_text": "0.7774",
3430
+ "normalized_score": 0.7773773780941162,
3431
  "metric_key": "f1",
3432
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/misalignment_detection/metrics.json",
3433
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3434
+ "reason": null
3435
  },
3436
  {
3437
  "task_number": 12,
 
3528
  "task_id": "long_horizon_next_action",
3529
  "task_label": "Long-Horizon Next-Action Forecasting",
3530
  "series_id": "metadata128_simple",
3531
+ "method": "128ep Aligned Simple",
3532
  "status": "scored",
3533
  "status_label": "scored",
3534
  "scored": true,
 
3538
  "normalized_score": 0.004579592783699693,
3539
  "metric_key": "macro_f1",
3540
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
3541
+ "scope": "multi_episode_128_aligned_baseline",
3542
  "reason": null
3543
  },
3544
  {
 
3546
  "task_id": "long_horizon_next_action",
3547
  "task_label": "Long-Horizon Next-Action Forecasting",
3548
  "series_id": "metadata128_neural_mlp",
3549
+ "method": "128ep Aligned NN",
3550
  "status": "scored",
3551
  "status_label": "scored",
3552
  "scored": true,
 
3556
  "normalized_score": 0.0029821307969142615,
3557
  "metric_key": "macro_f1",
3558
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
3559
+ "scope": "multi_episode_128_aligned_baseline",
3560
  "reason": null
3561
  },
3562
  {
 
3654
  "task_id": "next_subtask_forecast",
3655
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3656
  "series_id": "metadata128_simple",
3657
+ "method": "128ep Aligned Simple",
3658
  "status": "scored",
3659
  "status_label": "scored",
3660
  "scored": true,
 
3664
  "normalized_score": 0.0001206030150753769,
3665
  "metric_key": "macro_f1",
3666
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
3667
+ "scope": "multi_episode_128_aligned_baseline",
3668
  "reason": null
3669
  },
3670
  {
 
3672
  "task_id": "next_subtask_forecast",
3673
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3674
  "series_id": "metadata128_neural_mlp",
3675
+ "method": "128ep Aligned NN",
3676
  "status": "scored",
3677
  "status_label": "scored",
3678
  "scored": true,
 
3682
  "normalized_score": 2.086049543676662e-05,
3683
  "metric_key": "macro_f1",
3684
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
3685
+ "scope": "multi_episode_128_aligned_baseline",
3686
  "reason": null
3687
  },
3688
  {
 
3780
  "task_id": "interaction_text_prediction",
3781
  "task_label": "Interaction Text Prediction",
3782
  "series_id": "metadata128_simple",
3783
+ "method": "128ep Aligned Simple",
3784
  "status": "unsupported_without_required_target",
3785
  "status_label": "unsupported",
3786
  "scored": false,
 
3790
  "normalized_score": null,
3791
  "metric_key": "macro_f1",
3792
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
3793
+ "scope": "multi_episode_128_aligned_baseline",
3794
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
3795
  },
3796
  {
 
3798
  "task_id": "interaction_text_prediction",
3799
  "task_label": "Interaction Text Prediction",
3800
  "series_id": "metadata128_neural_mlp",
3801
+ "method": "128ep Aligned NN",
3802
  "status": "not_supported_by_metadata_only_package",
3803
  "status_label": "not supported",
3804
  "scored": false,
 
3808
  "normalized_score": null,
3809
  "metric_key": "macro_f1",
3810
  "source": null,
3811
+ "scope": "multi_episode_128_aligned_baseline",
3812
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required"
3813
  },
3814
  {
3815
  "task_number": 15,
 
3906
  "task_id": "action_object_relation",
3907
  "task_label": "Action-Object Relation Prediction",
3908
  "series_id": "metadata128_simple",
3909
+ "method": "128ep Aligned Simple",
3910
  "status": "scored",
3911
  "status_label": "scored",
3912
  "scored": true,
 
3916
  "normalized_score": 0.0,
3917
  "metric_key": "macro_f1",
3918
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
3919
+ "scope": "multi_episode_128_aligned_baseline",
3920
  "reason": null
3921
  },
3922
  {
 
3924
  "task_id": "action_object_relation",
3925
  "task_label": "Action-Object Relation Prediction",
3926
  "series_id": "metadata128_neural_mlp",
3927
+ "method": "128ep Aligned NN",
3928
  "status": "scored",
3929
  "status_label": "scored",
3930
  "scored": true,
 
3934
  "normalized_score": 0.0,
3935
  "metric_key": "macro_f1",
3936
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
3937
+ "scope": "multi_episode_128_aligned_baseline",
3938
  "reason": null
3939
  },
3940
  {
 
4032
  "task_id": "object_set_forecast",
4033
  "task_label": "Future Object-Set Forecasting",
4034
  "series_id": "metadata128_simple",
4035
+ "method": "128ep Aligned Simple",
4036
  "status": "scored",
4037
  "status_label": "scored",
4038
  "scored": true,
 
4042
  "normalized_score": 0.17656983343047333,
4043
  "metric_key": "micro_f1",
4044
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
4045
+ "scope": "multi_episode_128_aligned_baseline",
4046
  "reason": null
4047
  },
4048
  {
 
4050
  "task_id": "object_set_forecast",
4051
  "task_label": "Future Object-Set Forecasting",
4052
  "series_id": "metadata128_neural_mlp",
4053
+ "method": "128ep Aligned NN",
4054
  "status": "scored",
4055
  "status_label": "scored",
4056
  "scored": true,
 
4060
  "normalized_score": 0.17418550827844048,
4061
  "metric_key": "micro_f1",
4062
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
4063
+ "scope": "multi_episode_128_aligned_baseline",
4064
  "reason": null
4065
  },
4066
  {
 
4158
  "task_id": "imu_to_hand_pose",
4159
  "task_label": "IMU-to-Hand Pose Reconstruction",
4160
  "series_id": "metadata128_simple",
4161
+ "method": "128ep Aligned Simple",
4162
+ "status": "scored",
4163
+ "status_label": "scored",
4164
+ "scored": true,
4165
  "proxy_scored": false,
4166
+ "raw": 0.2294670194387436,
4167
+ "raw_text": "0.2295",
4168
+ "normalized_score": 0.18324815505876868,
4169
  "metric_key": "mae",
4170
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
4171
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
4172
+ "reason": null
4173
  },
4174
  {
4175
  "task_number": 18,
4176
  "task_id": "imu_to_hand_pose",
4177
  "task_label": "IMU-to-Hand Pose Reconstruction",
4178
  "series_id": "metadata128_neural_mlp",
4179
+ "method": "128ep Aligned NN",
4180
+ "status": "scored",
4181
+ "status_label": "scored",
4182
+ "scored": true,
4183
  "proxy_scored": false,
4184
+ "raw": 0.2555866539478302,
4185
+ "raw_text": "0.2556",
4186
+ "normalized_score": 0.16452114110609004,
4187
  "metric_key": "mae",
4188
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/imu_to_hand_pose/metrics.json",
4189
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
4190
+ "reason": null
4191
  },
4192
  {
4193
  "task_number": 18,
 
4284
  "task_id": "camera_view_sync_retrieval",
4285
  "task_label": "Camera-View Synchronization Retrieval",
4286
  "series_id": "metadata128_simple",
4287
+ "method": "128ep Aligned Simple",
4288
  "status": "unsupported_without_required_target",
4289
  "status_label": "unsupported",
4290
  "scored": false,
 
4294
  "normalized_score": null,
4295
  "metric_key": "mrr",
4296
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
4297
+ "scope": "multi_episode_128_aligned_baseline",
4298
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
4299
  },
4300
  {
 
4302
  "task_id": "camera_view_sync_retrieval",
4303
  "task_label": "Camera-View Synchronization Retrieval",
4304
  "series_id": "metadata128_neural_mlp",
4305
+ "method": "128ep Aligned NN",
4306
  "status": "not_supported_by_metadata_only_package",
4307
  "status_label": "not supported",
4308
  "scored": false,
 
4312
  "normalized_score": null,
4313
  "metric_key": "mrr",
4314
  "source": null,
4315
+ "scope": "multi_episode_128_aligned_baseline",
4316
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required"
4317
  },
4318
  {
4319
  "task_number": 19,
 
4410
  "task_id": "time_to_transition",
4411
  "task_label": "Time-to-Next-Transition Regression",
4412
  "series_id": "metadata128_simple",
4413
+ "method": "128ep Aligned Simple",
4414
  "status": "scored",
4415
  "status_label": "scored",
4416
  "scored": true,
 
4420
  "normalized_score": 0.016864874132806403,
4421
  "metric_key": "mae",
4422
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
4423
+ "scope": "multi_episode_128_aligned_baseline",
4424
  "reason": null
4425
  },
4426
  {
 
4428
  "task_id": "time_to_transition",
4429
  "task_label": "Time-to-Next-Transition Regression",
4430
  "series_id": "metadata128_neural_mlp",
4431
+ "method": "128ep Aligned NN",
4432
  "status": "scored",
4433
  "status_label": "scored",
4434
  "scored": true,
 
4438
  "normalized_score": 0.25411768748242325,
4439
  "metric_key": "mae",
4440
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
4441
+ "scope": "multi_episode_128_aligned_baseline",
4442
  "reason": null
4443
  },
4444
  {
docs/data/mirror_parity.json CHANGED
The diff for this file is too large to render.
 
docs/data/omni_model_comparison.json CHANGED
@@ -1,6 +1,6 @@
  {
  "title": "Ropedia Xperience-10M Current Result Versions and Model Groups",
- "generated_at_utc": "2026-06-13T18:14:42+00:00",
+ "generated_at_utc": "2026-06-18T12:52:47+00:00",
  "status": "pass",
  "version_count": 3,
  "model_group_count": 5,
docs/data/public_surface_qa.json CHANGED
@@ -1,7 +1,7 @@
  {
  "title": "Ropedia Xperience-10M Public Project Surface",
  "status": "pass",
- "generated_at_utc": "2026-06-18T12:09:24+00:00",
+ "generated_at_utc": "2026-06-18T12:53:13+00:00",
  "scope": "Repo README, GitHub Pages HTML, Hugging Face Space card, artifact dataset card, and model card.",
  "checks": [
  {
@@ -18,7 +18,7 @@
  "website_integrity": {
  "exists": true,
  "status": "pass",
- "generated_at_utc": "2026-06-18T11:41:43+00:00"
+ "generated_at_utc": "2026-06-18T12:09:46+00:00"
  },
  "rendered_site_check": {
  "exists": true,
@@ -28,27 +28,27 @@
  "task_surface_integrity": {
  "exists": true,
  "status": "pass",
- "generated_at_utc": "2026-06-18T11:18:04+00:00"
+ "generated_at_utc": "2026-06-18T12:09:25+00:00"
  },
  "source_alignment": {
  "exists": true,
  "status": "pass",
- "generated_at_utc": "2026-06-18T11:18:04+00:00"
+ "generated_at_utc": "2026-06-18T12:09:45+00:00"
  },
  "scale_up_status": {
  "exists": true,
  "status": "pass",
- "generated_at_utc": "2026-06-18T11:18:06+00:00"
+ "generated_at_utc": "2026-06-18T12:09:48+00:00"
  },
  "publication_package": {
  "exists": true,
  "status": "pass",
- "generated_at_utc": "2026-06-18T11:42:48+00:00"
+ "generated_at_utc": "2026-06-18T12:24:04+00:00"
  },
  "mirror_parity": {
  "exists": true,
  "status": "pass",
- "generated_at_utc": "2026-06-18T11:43:59+00:00"
+ "generated_at_utc": "2026-06-18T12:24:00+00:00"
  }
  },
  "failures": {}
docs/data/task_surface_integrity.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-18T12:09:25+00:00",
4
  "summary": {
5
  "task_count": 12,
6
  "expected_task_count": 12,
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-18T12:54:18+00:00",
4
  "summary": {
5
  "task_count": 12,
6
  "expected_task_count": 12,
metrics/artifact_index.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
- "generated_at_utc": "2026-06-18T12:09:24+00:00",
4
  "status": "pass",
5
  "artifact_count": 213,
6
  "missing": [],
@@ -290,8 +290,8 @@
290
  "surface": "repo_hf",
291
  "shows": "Runs simple metadata and neural MLP baselines on the same selected 96/16/16 episode split used by the Qwen3-Omni diagnostic pilot.",
292
  "exists": true,
293
- "bytes": 73236,
294
- "sha256": "76acae0de25d51413e7e6f11021163e7d9909cfe95d65bf6b02e74043d429e2d"
295
  },
296
  {
297
  "id": "task_suite_enhancement_128",
@@ -599,7 +599,7 @@
599
  "shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
600
  "exists": true,
601
  "bytes": 4432,
602
- "sha256": "ae089cc0df132b63365e03b2157a488b5d1569567c0374d7621bcd347da62c9e"
603
  },
604
  {
605
  "id": "source_alignment_validator",
@@ -719,8 +719,8 @@
719
  "surface": "website_hf",
720
  "shows": "Stores normalized 20-axis radar values, raw task metrics, Qwen3/Cosmos overlay mappings, branch-card caveats, and explicit scoreless status records.",
721
  "exists": true,
722
- "bytes": 230297,
723
- "sha256": "437874b1633e73165e3300f55580394663a44759c848288e696859b98f8aad32"
724
  },
725
  {
726
  "id": "single_episode_task_model_radar_json",
@@ -730,8 +730,8 @@
730
  "surface": "website_hf",
731
  "shows": "Machine-readable split radar for the one-episode Minimal and Neural MLP baselines, both scored on all 20 task contracts.",
732
  "exists": true,
733
- "bytes": 50973,
734
- "sha256": "38cb43512f2ac40feeb62333bdea89b3a55e5b48468beb8982cf22536f794ecf"
735
  },
736
  {
737
  "id": "episode128_task_model_radar_json",
@@ -741,8 +741,8 @@
741
  "surface": "website_hf",
742
  "shows": "Machine-readable split radar for selected 128-episode metadata/raw baselines and verified Qwen3/Cosmos branches, preserving explicit scoreless cells.",
743
  "exists": true,
744
- "bytes": 186443,
745
- "sha256": "55e758e8703f406889022976d0ba055181212305c9a7246e899463e0c3c3b554"
746
  },
747
  {
748
  "id": "task_method_20_result_matrix_json",
@@ -752,8 +752,8 @@
752
  "surface": "website_hf",
753
  "shows": "Machine-readable 9-method by 20-task matrix where every method has 20 records and scoreless cells carry unsupported/not-evaluated reasons.",
754
  "exists": true,
755
- "bytes": 129242,
756
- "sha256": "64fb700d51f536edf11291799b6173cf9ae8dd7a41178aac348b8207ed4b1e42"
757
  },
758
  {
759
  "id": "task_method_20_result_matrix",
@@ -763,8 +763,8 @@
763
  "surface": "repo_hf",
764
  "shows": "Reader-facing table that separates 20 records per method from numeric scored axes, documented raw128 proxy scores, unsupported metadata targets, and model targets not evaluated in verified packages.",
765
  "exists": true,
766
- "bytes": 4026,
767
- "sha256": "55e949fc30419a52f7f5ec4dd9544a11b253b076f8e3637ec3e92b3d61a89aab"
768
  },
769
  {
770
  "id": "task_method_20_gap_audit_json",
@@ -774,8 +774,8 @@
774
  "surface": "website_hf",
775
  "shows": "Machine-readable 180-record gap ledger with numeric scores, scoreless cells, explicit status reasons, and next evidence needed before new scores can be published.",
776
  "exists": true,
777
- "bytes": 46902,
778
- "sha256": "2b64dbd013625852679f9b91d25c48d1ed197fec727883b4fe37088b2d594784"
779
  },
780
  {
781
  "id": "task_method_20_gap_audit",
@@ -785,8 +785,8 @@
785
  "surface": "repo_hf",
786
  "shows": "Reader-facing ledger that lists every scoreless method-task cell and the concrete target or model-output evidence required before it can become numeric.",
787
  "exists": true,
788
- "bytes": 13387,
789
- "sha256": "d33461eb704f8e92545b6b54d9fc509e617fbacc9ca9894ac851ca9c3dec0fec"
790
  },
791
  {
792
  "id": "unified_task_model_radar_chart",
@@ -796,8 +796,8 @@
796
  "surface": "website_hf",
797
  "shows": "Compares minimal and neural MLP baselines across all 20 tasks, with Qwen3/Cosmos task-aligned model overlays.",
798
  "exists": true,
799
- "bytes": 51953,
800
- "sha256": "19c001f10319946ef0e4921064f8a012836f29e7c8b272f900c257169faf46a1"
801
  },
802
  {
803
  "id": "single_episode_task_model_radar_chart",
@@ -818,8 +818,8 @@
818
  "surface": "website_hf",
819
  "shows": "Separates the selected 128-episode methods: raw-feature simple/NN as complete 20/20 scored polygons and metadata/Qwen/Cosmos as task-aligned overlays.",
820
  "exists": true,
821
- "bytes": 45937,
822
- "sha256": "b504b1b9c5cad0caa8c822d5bb2971c1b708251cf7b9ef587a92db2c12751e97"
823
  },
824
  {
825
  "id": "unified_task_model_radar_builder",
@@ -829,8 +829,8 @@
829
  "surface": "repo_hf",
830
  "shows": "Regenerates the direction-aware radar chart and machine-readable metric overlay JSON.",
831
  "exists": true,
832
- "bytes": 52388,
833
- "sha256": "f4803360cfd02383a1942a93a5845308db936b479a5b906719e46e192f3ef142"
834
  },
835
  {
836
  "id": "task_method_20_gap_audit_builder",
@@ -906,8 +906,8 @@
906
  "surface": "repo_hf",
907
  "shows": "Rerun of JSONL metadata/text simple and neural baselines over the selected 128-episode multiscale dataset; supports radar overlays on JSONL-supported task axes.",
908
  "exists": true,
909
- "bytes": 109248,
910
- "sha256": "5e7f3085be5012eb3dda46f9c7b5b7c0ae22d6a0fbce71d6e99dd317fecc12af"
911
  },
912
  {
913
  "id": "a100_128_raw20_task_baselines",
@@ -1310,7 +1310,7 @@
1310
  "volatile": true,
1311
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
1312
  "exists": true,
1313
- "bytes": 994053,
1314
  "hash_policy": "existence_and_size_only"
1315
  },
1316
  {
@@ -1620,7 +1620,7 @@
1620
  "shows": "Reader-facing comparison of the single-episode task suite, 128-episode aligned baselines, Qwen3-Omni packages, and Cosmos3 future-window branch.",
1621
  "exists": true,
1622
  "bytes": 15999,
1623
- "sha256": "30053bdea6c417ab02f98d99d8e80cd7e304bc3a9dfacbf599139d3221c02c8f"
1624
  },
1625
  {
1626
  "id": "omni_model_comparison_json",
@@ -1631,7 +1631,7 @@
1631
  "shows": "Machine-readable comparison of the current result versions, per-task aligned baselines, verified Qwen3 packages, and Cosmos3 package.",
1632
  "exists": true,
1633
  "bytes": 81866,
1634
- "sha256": "1c9d4ba370661b0e0cb7104e9a51abdc3fe91a440ae86e748b10b719d1d613cc"
1635
  },
1636
  {
1637
  "id": "cosmos3_nano_verified_summary",
 
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
+ "generated_at_utc": "2026-06-18T12:52:48+00:00",
4
  "status": "pass",
5
  "artifact_count": 213,
6
  "missing": [],
 
290
  "surface": "repo_hf",
291
  "shows": "Runs simple metadata and neural MLP baselines on the same selected 96/16/16 episode split used by the Qwen3-Omni diagnostic pilot.",
292
  "exists": true,
293
+ "bytes": 74368,
294
+ "sha256": "6f54bfb963d5102ebd61eb8f8b6d8f6919db673378c9d5940d89ec5ea6f3d4b2"
295
  },
296
  {
297
  "id": "task_suite_enhancement_128",
 
599
  "shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
600
  "exists": true,
601
  "bytes": 4432,
602
+ "sha256": "8ddadfe15ba8779e82879f965ff50bceb9c573bc942c3ecf176fbf20e5faeaea"
603
  },
604
  {
605
  "id": "source_alignment_validator",
 
719
  "surface": "website_hf",
720
  "shows": "Stores normalized 20-axis radar values, raw task metrics, Qwen3/Cosmos overlay mappings, branch-card caveats, and explicit scoreless status records.",
721
  "exists": true,
722
+ "bytes": 229299,
723
+ "sha256": "30f338139df391c36941da0b759cc237366ee43d006bfff2d2e43481cc2d2a63"
724
  },
725
  {
726
  "id": "single_episode_task_model_radar_json",
 
730
  "surface": "website_hf",
731
  "shows": "Machine-readable split radar for the one-episode Minimal and Neural MLP baselines, both scored on all 20 task contracts.",
732
  "exists": true,
733
+ "bytes": 51064,
734
+ "sha256": "52001c8ac081b14827a8a55cae21da8fd32516f81365d7dda1047ef68096eef8"
735
  },
736
  {
737
  "id": "episode128_task_model_radar_json",
 
741
  "surface": "website_hf",
742
  "shows": "Machine-readable split radar for selected 128-episode metadata/raw baselines and verified Qwen3/Cosmos branches, preserving explicit scoreless cells.",
743
  "exists": true,
744
+ "bytes": 185447,
745
+ "sha256": "e9994f42a1e086411748e1233761c84a8dcd564898c216454a8872c2f4d4f213"
746
  },
747
  {
748
  "id": "task_method_20_result_matrix_json",
 
752
  "surface": "website_hf",
753
  "shows": "Machine-readable 9-method by 20-task matrix where every method has 20 records and scoreless cells carry unsupported/not-evaluated reasons.",
754
  "exists": true,
755
+ "bytes": 128794,
756
+ "sha256": "1bce6001518b314fc8a5e86eab56521aa9718d09d787765d10caee4d791e9809"
757
  },
758
  {
759
  "id": "task_method_20_result_matrix",
 
763
  "surface": "repo_hf",
764
  "shows": "Reader-facing table that separates 20 records per method from numeric scored axes, documented raw128 proxy scores, unsupported metadata targets, and model targets not evaluated in verified packages.",
765
  "exists": true,
766
+ "bytes": 3954,
767
+ "sha256": "01b21d83954f700e4b061e96b1f58c6af474d79a2caaff1bfcff4854b66722ca"
768
  },
769
  {
770
  "id": "task_method_20_gap_audit_json",
 
774
  "surface": "website_hf",
775
  "shows": "Machine-readable 180-record gap ledger with numeric scores, scoreless cells, explicit status reasons, and next evidence needed before new scores can be published.",
776
  "exists": true,
777
+ "bytes": 35883,
778
+ "sha256": "9336756d67d2488a28c4bb9c282f65230031eeb8dddd087a11fd441d8e61539b"
779
  },
780
  {
781
  "id": "task_method_20_gap_audit",
 
785
  "surface": "repo_hf",
786
  "shows": "Reader-facing ledger that lists every scoreless method-task cell and the concrete target or model-output evidence required before it can become numeric.",
787
  "exists": true,
788
+ "bytes": 10286,
789
+ "sha256": "45969b72e9a3ff8c40d958ea819e725fd4df5d90424ccdffd1c64fd1a5152063"
790
  },
791
  {
792
  "id": "unified_task_model_radar_chart",
 
796
  "surface": "website_hf",
797
  "shows": "Compares minimal and neural MLP baselines across all 20 tasks, with Qwen3/Cosmos task-aligned model overlays.",
798
  "exists": true,
799
+ "bytes": 53553,
800
+ "sha256": "ec9a8bf0f5814106ddb8e62d0941c7cc07d1b8a29323a61a400319ffe6bd3485"
801
  },
802
  {
803
  "id": "single_episode_task_model_radar_chart",
 
818
  "surface": "website_hf",
819
  "shows": "Separates the selected 128-episode methods: raw-feature simple/NN as complete 20/20 scored polygons and metadata/Qwen/Cosmos as task-aligned overlays.",
820
  "exists": true,
821
+ "bytes": 47540,
822
+ "sha256": "0c2283a04fe401851b8b313de3ba383d24185262f4c6500d12fa0a3b8c0c4443"
823
  },
824
  {
825
  "id": "unified_task_model_radar_builder",
 
829
  "surface": "repo_hf",
830
  "shows": "Regenerates the direction-aware radar chart and machine-readable metric overlay JSON.",
831
  "exists": true,
832
+ "bytes": 52743,
833
+ "sha256": "e081f88e9f31934b24820c5cbffb957bb235a3275f553e573ab44e5c3d03c99a"
834
  },
835
  {
836
  "id": "task_method_20_gap_audit_builder",
 
906
  "surface": "repo_hf",
907
  "shows": "Rerun of JSONL metadata/text simple and neural baselines over the selected 128-episode multiscale dataset; supports radar overlays on JSONL-supported task axes.",
908
  "exists": true,
909
+ "bytes": 124232,
910
+ "sha256": "dba221a6ed8a6a84602dc21a1055cbb4444c03775f74b55e5d72861941820ac8"
911
  },
912
  {
913
  "id": "a100_128_raw20_task_baselines",
 
1310
  "volatile": true,
1311
  "shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
1312
  "exists": true,
1313
+ "bytes": 1059014,
1314
  "hash_policy": "existence_and_size_only"
1315
  },
1316
  {
 
1620
  "shows": "Reader-facing comparison of the single-episode task suite, 128-episode aligned baselines, Qwen3-Omni packages, and Cosmos3 future-window branch.",
1621
  "exists": true,
1622
  "bytes": 15999,
1623
+ "sha256": "dd65ae9077acbce91870b182d701db367a9c79eb287aeee2a1e165ec4915e5f3"
1624
  },
1625
  {
1626
  "id": "omni_model_comparison_json",
 
1631
  "shows": "Machine-readable comparison of the current result versions, per-task aligned baselines, verified Qwen3 packages, and Cosmos3 package.",
1632
  "exists": true,
1633
  "bytes": 81866,
1634
+ "sha256": "dd7a599117defcc1fd783c3134b6b3fc92f2ec2190ea517624cb215b931bd87a"
1635
  },
1636
  {
1637
  "id": "cosmos3_nano_verified_summary",
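Note on the artifact index entries above: each record carries a `bytes` size and, unless `hash_policy` is `existence_and_size_only`, a `sha256` digest. A minimal sketch of how one such entry could be re-checked locally follows. It assumes the file path is supplied by the caller, because the key that stores the path is not visible in this diff. The helper name `check_artifact` is hypothetical and is not the project's validator.

```python
import hashlib
import os


def check_artifact(path, entry, chunk_size=1 << 20):
    """Return a list of mismatch descriptions for one index entry (empty means OK).

    `entry` is a dict shaped like the records in the diff: "exists", "bytes",
    and either "sha256" or "hash_policy": "existence_and_size_only".
    `path` is passed separately (assumption: the path key is not shown here).
    """
    problems = []
    if not os.path.exists(path):
        if entry.get("exists", True):
            problems.append("missing file")
        return problems

    size = os.path.getsize(path)
    if size != entry.get("bytes"):
        problems.append(f"bytes: expected {entry.get('bytes')}, found {size}")

    # Volatile artifacts are compared by existence and size only.
    if entry.get("hash_policy") == "existence_and_size_only":
        return problems

    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Stream the file so large artifacts do not need to fit in memory.
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    if digest.hexdigest() != entry.get("sha256"):
        problems.append("sha256 mismatch")
    return problems
```

An empty list means the file matches the recorded size and digest. A changed `sha256` with an unchanged `bytes` value, as in several records above, would surface as a single "sha256 mismatch" entry against the old record.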
metrics/episode128_task_model_radar.json CHANGED
@@ -1,19 +1,19 @@
1
  {
2
  "title": "128-Episode 20-Task Radar",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "description": "Selected 128-episode metadata/raw baselines plus verified Qwen3/Cosmos branches. Every method has 20 records; numeric scores appear only where the public artifact produced that task target.",
6
  "task_count": 20,
7
  "method_count": 7,
8
  "method_task_record_count": 140,
9
- "scored_method_task_count": 93,
10
  "normalization_policy": {
11
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
12
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
13
  "raw_values": "raw metric values, metric keys, and sources are retained in this JSON; the SVG is an overview, not a replacement for the metric table",
14
  "result_record_policy": "every method has 20 task records; records without a numeric score carry explicit unsupported/not-evaluated status and reason fields",
15
  "foundation_model_overlay": "Qwen3/Cosmos points are plotted only on task-aligned axes. Scoreless records mean the public result does not evaluate that task contract.",
16
- "metadata_128_overlay": "128-episode metadata baselines have 20 records, but numeric scores only where the public JSONL contains enough task labels without raw feature blocks.",
17
  "raw_128_overlay": "128-episode raw-feature baselines use staged sensor NPZ features. Eighteen axes use direct task targets; interaction text and camera-view sync are completed with documented compact proxies because raw interaction strings and paired video-view embeddings are absent from the 128 export."
18
  },
19
  "source_unified_radar": "docs/data/unified_task_model_radar.json",
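Note on `normalization_policy`: the two rules quoted in this hunk can be sketched as below. This is an illustration under stated assumptions, not the project's radar builder: the function names are hypothetical, lower-is-better metrics are assumed to be non-negative errors, and a raw error of exactly zero is mapped to 1.0 to avoid dividing by zero. Scoreless cells (`raw` of `null`) stay scoreless.

```python
def normalize_higher_is_better(raw):
    """Clip a bounded metric (F1, MRR, R2, ...) to the [0, 1] radar axis."""
    if raw is None:  # scoreless record: keep it scoreless
        return None
    return min(1.0, max(0.0, raw))


def normalize_lower_is_better(raw_by_method):
    """Map error metrics (MAE, MPJPE, ...) to best_observed_value / raw_value.

    `raw_by_method` maps a series id to its raw error for one task, or None
    when that method has no numeric score. Assumes errors are non-negative.
    """
    scored = [value for value in raw_by_method.values() if value is not None]
    best = min(scored) if scored else None
    normalized = {}
    for method, value in raw_by_method.items():
        if value is None:
            normalized[method] = None
        elif value == 0:
            normalized[method] = 1.0  # assumption: a zero error is a perfect score
        else:
            normalized[method] = best / value
    return normalized
```

The `time_to_transition` records later in this file are consistent with the ratio rule: 624.81 × 0.016865 and 41.466 × 0.25412 both come to roughly 10.54, so both methods appear to be divided into the same best observed MAE for that task.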
@@ -21,50 +21,50 @@
21
  "series": [
22
  {
23
  "id": "metadata128_simple",
24
- "label": "128ep Metadata Simple",
25
  "short_label": "128-S",
26
  "color": "#ffd166",
27
- "kind": "partial_128_episode_metadata_baseline",
28
- "scope": "128 selected episodes, JSONL metadata/text only",
29
  "stroke_dasharray": "9 6",
30
- "method_detail": "128-episode JSONL metadata/text simple baselines.",
31
  "plotted_as": "colored point overlay",
32
  "result_record_count": 20,
33
- "scored_task_count": 13,
34
- "covered_task_count": 13,
35
  "proxy_scored_task_count": 0,
36
- "scoreless_task_count": 7,
37
- "unsupported_task_count": 7,
38
  "not_evaluated_task_count": 0,
39
  "status_counts": {
40
- "scored": 13,
41
- "unsupported_without_required_target": 7
42
  },
43
- "coverage_fraction": 0.65,
44
  "result_record_fraction": 1.0
45
  },
46
  {
47
  "id": "metadata128_neural_mlp",
48
- "label": "128ep Metadata NN",
49
  "short_label": "128-NN",
50
  "color": "#f472b6",
51
- "kind": "partial_128_episode_metadata_baseline",
52
- "scope": "128 selected episodes, JSONL metadata/text only",
53
  "stroke_dasharray": "3 6",
54
- "method_detail": "128-episode JSONL metadata/text MLP baselines.",
55
  "plotted_as": "colored point overlay",
56
  "result_record_count": 20,
57
- "scored_task_count": 13,
58
- "covered_task_count": 13,
59
  "proxy_scored_task_count": 0,
60
- "scoreless_task_count": 7,
61
- "unsupported_task_count": 7,
62
  "not_evaluated_task_count": 0,
63
  "status_counts": {
64
- "not_supported_by_metadata_only_package": 7,
65
- "scored": 13
66
  },
67
- "coverage_fraction": 0.65,
68
  "result_record_fraction": 1.0
69
  },
70
  {
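Note on the per-series counters in this hunk: `scored_task_count`, `scoreless_task_count`, `status_counts`, and `coverage_fraction` all follow from the 20 per-task status strings of a series. A minimal sketch of that bookkeeping is below. It assumes that only the literal status `scored` counts as a numeric score, and `summarize_series` is a hypothetical helper, not a function from the repository.

```python
from collections import Counter


def summarize_series(statuses):
    """Derive the per-series counters from a list of per-task status strings."""
    counts = Counter(statuses)
    total = len(statuses)
    scored = counts.get("scored", 0)  # assumption: only "scored" is numeric
    return {
        "result_record_count": total,
        "scored_task_count": scored,
        "scoreless_task_count": total - scored,
        "status_counts": dict(sorted(counts.items())),
        "coverage_fraction": scored / total if total else 0.0,
    }


if __name__ == "__main__":
    # Mirrors the removed metadata128_simple values: 13 scored, 7 unsupported.
    old = ["scored"] * 13 + ["unsupported_without_required_target"] * 7
    print(summarize_series(old)["coverage_fraction"])
```

Deriving the counters from the records, rather than storing them by hand, keeps `method_task_record_count` and `scored_method_task_count` in step with the per-task cells whenever a scoreless cell becomes numeric.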
@@ -205,7 +205,7 @@
205
  "raw": 0.008252821966746326,
206
  "metric_key": "macro_f1",
207
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
208
- "scope": "multi_episode_128_metadata_baseline",
209
  "status": "scored",
210
  "reason": null,
211
  "normalized_score": 0.008252821966746326,
@@ -216,7 +216,7 @@
216
  "raw": 0.004175793689174209,
217
  "metric_key": "macro_f1",
218
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
219
- "scope": "multi_episode_128_metadata_baseline",
220
  "status": "scored",
221
  "reason": null,
222
  "normalized_score": 0.004175793689174209,
@@ -296,7 +296,7 @@
296
  "raw": 0.00019512195121951218,
297
  "metric_key": "macro_f1",
298
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
299
- "scope": "multi_episode_128_metadata_baseline",
300
  "status": "scored",
301
  "reason": null,
302
  "normalized_score": 0.00019512195121951218,
@@ -307,7 +307,7 @@
307
  "raw": 7.207207207207208e-05,
308
  "metric_key": "macro_f1",
309
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
310
- "scope": "multi_episode_128_metadata_baseline",
311
  "status": "scored",
312
  "reason": null,
313
  "normalized_score": 7.207207207207208e-05,
@@ -387,7 +387,7 @@
387
  "raw": 0.29652162550029315,
388
  "metric_key": "macro_f1",
389
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
390
- "scope": "multi_episode_128_metadata_baseline",
391
  "status": "scored",
392
  "reason": null,
393
  "normalized_score": 0.29652162550029315,
@@ -398,7 +398,7 @@
398
  "raw": 0.4841733292368365,
399
  "metric_key": "macro_f1",
400
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
401
- "scope": "multi_episode_128_metadata_baseline",
402
  "status": "scored",
403
  "reason": null,
404
  "normalized_score": 0.4841733292368365,
@@ -478,7 +478,7 @@
478
  "raw": 0.006514774539765508,
479
  "metric_key": "macro_f1",
480
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
481
- "scope": "multi_episode_128_metadata_baseline",
482
  "status": "scored",
483
  "reason": null,
484
  "normalized_score": 0.006514774539765508,
@@ -489,7 +489,7 @@
489
  "raw": 0.004910507980164745,
490
  "metric_key": "macro_f1",
491
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
492
- "scope": "multi_episode_128_metadata_baseline",
493
  "status": "scored",
494
  "reason": null,
495
  "normalized_score": 0.004910507980164745,
@@ -566,26 +566,26 @@
566
  "raw128_proxy_axis": false,
567
  "values": {
568
  "metadata128_simple": {
569
- "raw": null,
570
  "metric_key": "mpjpe",
571
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
572
- "scope": "multi_episode_128_metadata_baseline",
573
- "status": "unsupported_without_required_target",
574
- "reason": "requires future hand-joint trajectories from raw sensor feature NPZ blocks, which are not in the public 128 package",
575
- "normalized_score": null,
576
- "raw_text": "n/a",
577
- "status_label": "unsupported"
578
  },
579
  "metadata128_neural_mlp": {
580
- "raw": null,
581
  "metric_key": "mpjpe",
582
- "source": null,
583
- "scope": "multi_episode_128_metadata_baseline",
584
- "status": "not_supported_by_metadata_only_package",
585
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
586
- "normalized_score": null,
587
- "raw_text": "n/a",
588
- "status_label": "not supported"
589
  },
590
  "raw128_simple": {
591
  "raw": 0.2729249894618988,
@@ -660,7 +660,7 @@
660
  "raw": 0.4381481308057444,
661
  "metric_key": "macro_f1",
662
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
663
- "scope": "multi_episode_128_metadata_baseline",
664
  "status": "scored",
665
  "reason": null,
666
  "normalized_score": 0.4381481308057444,
@@ -671,7 +671,7 @@
671
  "raw": 0.5682695682695682,
672
  "metric_key": "macro_f1",
673
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
674
- "scope": "multi_episode_128_metadata_baseline",
675
  "status": "scored",
676
  "reason": null,
677
  "normalized_score": 0.5682695682695682,
@@ -751,7 +751,7 @@
751
  "raw": 0.17764578833693304,
752
  "metric_key": "micro_f1",
753
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
754
- "scope": "multi_episode_128_metadata_baseline",
755
  "status": "scored",
756
  "reason": null,
757
  "normalized_score": 0.17764578833693304,
@@ -762,7 +762,7 @@
762
  "raw": 0.18662723837686876,
763
  "metric_key": "micro_f1",
764
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
765
- "scope": "multi_episode_128_metadata_baseline",
766
  "status": "scored",
767
  "reason": null,
768
  "normalized_score": 0.18662723837686876,
@@ -842,7 +842,7 @@
842
  "raw": 0.002332374220713973,
843
  "metric_key": "mrr",
844
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
845
- "scope": "multi_episode_128_metadata_baseline",
846
  "status": "scored",
847
  "reason": null,
848
  "normalized_score": 0.002332374220713973,
@@ -853,7 +853,7 @@
853
  "raw": 0.008236799389123917,
854
  "metric_key": "mrr",
855
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
856
- "scope": "multi_episode_128_metadata_baseline",
857
  "status": "scored",
858
  "reason": null,
859
  "normalized_score": 0.008236799389123917,
@@ -930,26 +930,26 @@
930
  "raw128_proxy_axis": false,
931
  "values": {
932
  "metadata128_simple": {
933
- "raw": null,
934
  "metric_key": "mrr",
935
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
936
- "scope": "multi_episode_128_metadata_baseline",
937
- "status": "unsupported_without_required_target",
938
- "reason": "requires paired motion/IMU/camera/audio/depth feature blocks, which are not in the public 128 package",
939
- "normalized_score": null,
940
- "raw_text": "n/a",
941
- "status_label": "unsupported"
942
  },
943
  "metadata128_neural_mlp": {
944
- "raw": null,
945
  "metric_key": "mrr",
946
- "source": null,
947
- "scope": "multi_episode_128_metadata_baseline",
948
- "status": "not_supported_by_metadata_only_package",
949
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
950
- "normalized_score": null,
951
- "raw_text": "n/a",
952
- "status_label": "not supported"
953
  },
954
  "raw128_simple": {
955
  "raw": 0.003459817497059703,
@@ -1021,26 +1021,26 @@
1021
  "raw128_proxy_axis": false,
1022
  "values": {
1023
  "metadata128_simple": {
1024
- "raw": null,
1025
  "metric_key": "r2",
1026
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
1027
- "scope": "multi_episode_128_metadata_baseline",
1028
- "status": "unsupported_without_required_target",
1029
- "reason": "requires source and target modality feature blocks such as depth/video vectors, which are not in the public 128 package",
1030
- "normalized_score": null,
1031
- "raw_text": "n/a",
1032
- "status_label": "unsupported"
1033
  },
1034
  "metadata128_neural_mlp": {
1035
- "raw": null,
1036
  "metric_key": "r2",
1037
- "source": null,
1038
- "scope": "multi_episode_128_metadata_baseline",
1039
- "status": "not_supported_by_metadata_only_package",
1040
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1041
- "normalized_score": null,
1042
- "raw_text": "n/a",
1043
- "status_label": "not supported"
1044
  },
1045
  "raw128_simple": {
1046
  "raw": -1.3450960391924882,
@@ -1115,7 +1115,7 @@
1115
  "raw": 0.4198864140782312,
1116
  "metric_key": "f1",
1117
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
1118
- "scope": "multi_episode_128_metadata_baseline",
1119
  "status": "scored",
1120
  "reason": null,
1121
  "normalized_score": 0.4198864140782312,
@@ -1126,7 +1126,7 @@
1126
  "raw": 0.8252408266656923,
1127
  "metric_key": "f1",
1128
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
1129
- "scope": "multi_episode_128_metadata_baseline",
1130
  "status": "scored",
1131
  "reason": null,
1132
  "normalized_score": 0.8252408266656923,
@@ -1203,26 +1203,26 @@
1203
  "raw128_proxy_axis": false,
1204
  "values": {
1205
  "metadata128_simple": {
1206
- "raw": null,
1207
  "metric_key": "f1",
1208
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
1209
- "scope": "multi_episode_128_metadata_baseline",
1210
- "status": "unsupported_without_required_target",
1211
- "reason": "requires deliberately shifted cross-modal feature pairs, which cannot be reconstructed from the public JSONL labels alone",
1212
- "normalized_score": null,
1213
- "raw_text": "n/a",
1214
- "status_label": "unsupported"
1215
  },
1216
  "metadata128_neural_mlp": {
1217
- "raw": null,
1218
  "metric_key": "f1",
1219
- "source": null,
1220
- "scope": "multi_episode_128_metadata_baseline",
1221
- "status": "not_supported_by_metadata_only_package",
1222
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1223
- "normalized_score": null,
1224
- "raw_text": "n/a",
1225
- "status_label": "not supported"
1226
  },
1227
  "raw128_simple": {
1228
  "raw": 0.4958867673901769,
@@ -1297,7 +1297,7 @@
1297
  "raw": 0.004579592783699693,
1298
  "metric_key": "macro_f1",
1299
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
1300
- "scope": "multi_episode_128_metadata_baseline",
1301
  "status": "scored",
1302
  "reason": null,
1303
  "normalized_score": 0.004579592783699693,
@@ -1308,7 +1308,7 @@
1308
  "raw": 0.0029821307969142615,
1309
  "metric_key": "macro_f1",
1310
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
1311
- "scope": "multi_episode_128_metadata_baseline",
1312
  "status": "scored",
1313
  "reason": null,
1314
  "normalized_score": 0.0029821307969142615,
@@ -1388,7 +1388,7 @@
1388
  "raw": 0.0001206030150753769,
1389
  "metric_key": "macro_f1",
1390
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
1391
- "scope": "multi_episode_128_metadata_baseline",
1392
  "status": "scored",
1393
  "reason": null,
1394
  "normalized_score": 0.0001206030150753769,
@@ -1399,7 +1399,7 @@
1399
  "raw": 2.086049543676662e-05,
1400
  "metric_key": "macro_f1",
1401
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
1402
- "scope": "multi_episode_128_metadata_baseline",
1403
  "status": "scored",
1404
  "reason": null,
1405
  "normalized_score": 2.086049543676662e-05,
@@ -1479,7 +1479,7 @@
1479
  "raw": null,
1480
  "metric_key": "macro_f1",
1481
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
1482
- "scope": "multi_episode_128_metadata_baseline",
1483
  "status": "unsupported_without_required_target",
1484
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
1485
  "normalized_score": null,
@@ -1490,9 +1490,9 @@
1490
  "raw": null,
1491
  "metric_key": "macro_f1",
1492
  "source": null,
1493
- "scope": "multi_episode_128_metadata_baseline",
1494
  "status": "not_supported_by_metadata_only_package",
1495
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1496
  "normalized_score": null,
1497
  "raw_text": "n/a",
1498
  "status_label": "not supported"
@@ -1570,7 +1570,7 @@
1570
  "raw": 0.0,
1571
  "metric_key": "macro_f1",
1572
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
1573
- "scope": "multi_episode_128_metadata_baseline",
1574
  "status": "scored",
1575
  "reason": null,
1576
  "normalized_score": 0.0,
@@ -1581,7 +1581,7 @@
1581
  "raw": 0.0,
1582
  "metric_key": "macro_f1",
1583
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
1584
- "scope": "multi_episode_128_metadata_baseline",
1585
  "status": "scored",
1586
  "reason": null,
1587
  "normalized_score": 0.0,
@@ -1661,7 +1661,7 @@
1661
  "raw": 0.17656983343047333,
1662
  "metric_key": "micro_f1",
1663
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
1664
- "scope": "multi_episode_128_metadata_baseline",
1665
  "status": "scored",
1666
  "reason": null,
1667
  "normalized_score": 0.17656983343047333,
@@ -1672,7 +1672,7 @@
1672
  "raw": 0.17418550827844048,
1673
  "metric_key": "micro_f1",
1674
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
1675
- "scope": "multi_episode_128_metadata_baseline",
1676
  "status": "scored",
1677
  "reason": null,
1678
  "normalized_score": 0.17418550827844048,
@@ -1749,26 +1749,26 @@
1749
  "raw128_proxy_axis": false,
1750
  "values": {
1751
  "metadata128_simple": {
1752
- "raw": null,
1753
  "metric_key": "mae",
1754
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
1755
- "scope": "multi_episode_128_metadata_baseline",
1756
- "status": "unsupported_without_required_target",
1757
- "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package",
1758
- "normalized_score": null,
1759
- "raw_text": "n/a",
1760
- "status_label": "unsupported"
1761
  },
1762
  "metadata128_neural_mlp": {
1763
- "raw": null,
1764
  "metric_key": "mae",
1765
- "source": null,
1766
- "scope": "multi_episode_128_metadata_baseline",
1767
- "status": "not_supported_by_metadata_only_package",
1768
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1769
- "normalized_score": null,
1770
- "raw_text": "n/a",
1771
- "status_label": "not supported"
1772
  },
1773
  "raw128_simple": {
1774
  "raw": 0.22941437363624573,
@@ -1843,7 +1843,7 @@
1843
  "raw": null,
1844
  "metric_key": "mrr",
1845
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
1846
- "scope": "multi_episode_128_metadata_baseline",
1847
  "status": "unsupported_without_required_target",
1848
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
1849
  "normalized_score": null,
@@ -1854,9 +1854,9 @@
1854
  "raw": null,
1855
  "metric_key": "mrr",
1856
  "source": null,
1857
- "scope": "multi_episode_128_metadata_baseline",
1858
  "status": "not_supported_by_metadata_only_package",
1859
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1860
  "normalized_score": null,
1861
  "raw_text": "n/a",
1862
  "status_label": "not supported"
@@ -1934,7 +1934,7 @@
1934
  "raw": 624.8108520507812,
1935
  "metric_key": "mae",
1936
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
1937
- "scope": "multi_episode_128_metadata_baseline",
1938
  "status": "scored",
1939
  "reason": null,
1940
  "normalized_score": 0.016864874132806403,
@@ -1945,7 +1945,7 @@
1945
  "raw": 41.4664421081543,
1946
  "metric_key": "mae",
1947
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
1948
- "scope": "multi_episode_128_metadata_baseline",
1949
  "status": "scored",
1950
  "reason": null,
1951
  "normalized_score": 0.25411768748242325,
@@ -2016,7 +2016,7 @@
2016
  "task_id": "timeline_action",
2017
  "task_label": "Action Recognition",
2018
  "series_id": "metadata128_simple",
2019
- "method": "128ep Metadata Simple",
2020
  "status": "scored",
2021
  "status_label": "scored",
2022
  "scored": true,
@@ -2026,7 +2026,7 @@
2026
  "normalized_score": 0.008252821966746326,
2027
  "metric_key": "macro_f1",
2028
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
2029
- "scope": "multi_episode_128_metadata_baseline",
2030
  "reason": null
2031
  },
2032
  {
@@ -2034,7 +2034,7 @@
2034
  "task_id": "timeline_action",
2035
  "task_label": "Action Recognition",
2036
  "series_id": "metadata128_neural_mlp",
2037
- "method": "128ep Metadata NN",
2038
  "status": "scored",
2039
  "status_label": "scored",
2040
  "scored": true,
@@ -2044,7 +2044,7 @@
2044
  "normalized_score": 0.004175793689174209,
2045
  "metric_key": "macro_f1",
2046
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
2047
- "scope": "multi_episode_128_metadata_baseline",
2048
  "reason": null
2049
  },
2050
  {
@@ -2142,7 +2142,7 @@
2142
  "task_id": "timeline_subtask",
2143
  "task_label": "Procedure Step Recognition",
2144
  "series_id": "metadata128_simple",
2145
- "method": "128ep Metadata Simple",
2146
  "status": "scored",
2147
  "status_label": "scored",
2148
  "scored": true,
@@ -2152,7 +2152,7 @@
2152
  "normalized_score": 0.00019512195121951218,
2153
  "metric_key": "macro_f1",
2154
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
2155
- "scope": "multi_episode_128_metadata_baseline",
2156
  "reason": null
2157
  },
2158
  {
@@ -2160,7 +2160,7 @@
2160
  "task_id": "timeline_subtask",
2161
  "task_label": "Procedure Step Recognition",
2162
  "series_id": "metadata128_neural_mlp",
2163
- "method": "128ep Metadata NN",
2164
  "status": "scored",
2165
  "status_label": "scored",
2166
  "scored": true,
@@ -2170,7 +2170,7 @@
2170
  "normalized_score": 7.207207207207208e-05,
2171
  "metric_key": "macro_f1",
2172
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
2173
- "scope": "multi_episode_128_metadata_baseline",
2174
  "reason": null
2175
  },
2176
  {
@@ -2268,7 +2268,7 @@
2268
  "task_id": "transition_detection",
2269
  "task_label": "Action Boundary Detection",
2270
  "series_id": "metadata128_simple",
2271
- "method": "128ep Metadata Simple",
2272
  "status": "scored",
2273
  "status_label": "scored",
2274
  "scored": true,
@@ -2278,7 +2278,7 @@
2278
  "normalized_score": 0.29652162550029315,
2279
  "metric_key": "macro_f1",
2280
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
2281
- "scope": "multi_episode_128_metadata_baseline",
2282
  "reason": null
2283
  },
2284
  {
@@ -2286,7 +2286,7 @@
2286
  "task_id": "transition_detection",
2287
  "task_label": "Action Boundary Detection",
2288
  "series_id": "metadata128_neural_mlp",
2289
- "method": "128ep Metadata NN",
2290
  "status": "scored",
2291
  "status_label": "scored",
2292
  "scored": true,
@@ -2296,7 +2296,7 @@
2296
  "normalized_score": 0.4841733292368365,
2297
  "metric_key": "macro_f1",
2298
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
2299
- "scope": "multi_episode_128_metadata_baseline",
2300
  "reason": null
2301
  },
2302
  {
@@ -2394,7 +2394,7 @@
2394
  "task_id": "next_action",
2395
  "task_label": "Next-Action Prediction",
2396
  "series_id": "metadata128_simple",
2397
- "method": "128ep Metadata Simple",
2398
  "status": "scored",
2399
  "status_label": "scored",
2400
  "scored": true,
@@ -2404,7 +2404,7 @@
2404
  "normalized_score": 0.006514774539765508,
2405
  "metric_key": "macro_f1",
2406
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
2407
- "scope": "multi_episode_128_metadata_baseline",
2408
  "reason": null
2409
  },
2410
  {
@@ -2412,7 +2412,7 @@
2412
  "task_id": "next_action",
2413
  "task_label": "Next-Action Prediction",
2414
  "series_id": "metadata128_neural_mlp",
2415
- "method": "128ep Metadata NN",
2416
  "status": "scored",
2417
  "status_label": "scored",
2418
  "scored": true,
@@ -2422,7 +2422,7 @@
2422
  "normalized_score": 0.004910507980164745,
2423
  "metric_key": "macro_f1",
2424
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
2425
- "scope": "multi_episode_128_metadata_baseline",
2426
  "reason": null
2427
  },
2428
  {
@@ -2520,36 +2520,36 @@
2520
  "task_id": "hand_trajectory_forecast",
2521
  "task_label": "Hand Trajectory Forecasting",
2522
  "series_id": "metadata128_simple",
2523
- "method": "128ep Metadata Simple",
2524
- "status": "unsupported_without_required_target",
2525
- "status_label": "unsupported",
2526
- "scored": false,
2527
  "proxy_scored": false,
2528
- "raw": null,
2529
- "raw_text": "n/a",
2530
- "normalized_score": null,
2531
  "metric_key": "mpjpe",
2532
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
2533
- "scope": "multi_episode_128_metadata_baseline",
2534
- "reason": "requires future hand-joint trajectories from raw sensor feature NPZ blocks, which are not in the public 128 package"
2535
  },
2536
  {
2537
  "task_number": 5,
2538
  "task_id": "hand_trajectory_forecast",
2539
  "task_label": "Hand Trajectory Forecasting",
2540
  "series_id": "metadata128_neural_mlp",
2541
- "method": "128ep Metadata NN",
2542
- "status": "not_supported_by_metadata_only_package",
2543
- "status_label": "not supported",
2544
- "scored": false,
2545
  "proxy_scored": false,
2546
- "raw": null,
2547
- "raw_text": "n/a",
2548
- "normalized_score": null,
2549
  "metric_key": "mpjpe",
2550
- "source": null,
2551
- "scope": "multi_episode_128_metadata_baseline",
2552
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2553
  },
2554
  {
2555
  "task_number": 5,
@@ -2646,7 +2646,7 @@
2646
  "task_id": "contact_prediction",
2647
  "task_label": "Contact State Prediction",
2648
  "series_id": "metadata128_simple",
2649
- "method": "128ep Metadata Simple",
2650
  "status": "scored",
2651
  "status_label": "scored",
2652
  "scored": true,
@@ -2656,7 +2656,7 @@
2656
  "normalized_score": 0.4381481308057444,
2657
  "metric_key": "macro_f1",
2658
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
2659
- "scope": "multi_episode_128_metadata_baseline",
2660
  "reason": null
2661
  },
2662
  {
@@ -2664,7 +2664,7 @@
2664
  "task_id": "contact_prediction",
2665
  "task_label": "Contact State Prediction",
2666
  "series_id": "metadata128_neural_mlp",
2667
- "method": "128ep Metadata NN",
2668
  "status": "scored",
2669
  "status_label": "scored",
2670
  "scored": true,
@@ -2674,7 +2674,7 @@
2674
  "normalized_score": 0.5682695682695682,
2675
  "metric_key": "macro_f1",
2676
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
2677
- "scope": "multi_episode_128_metadata_baseline",
2678
  "reason": null
2679
  },
2680
  {
@@ -2772,7 +2772,7 @@
2772
  "task_id": "object_relevance",
2773
  "task_label": "Object Relevance Prediction",
2774
  "series_id": "metadata128_simple",
2775
- "method": "128ep Metadata Simple",
2776
  "status": "scored",
2777
  "status_label": "scored",
2778
  "scored": true,
@@ -2782,7 +2782,7 @@
2782
  "normalized_score": 0.17764578833693304,
2783
  "metric_key": "micro_f1",
2784
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
2785
- "scope": "multi_episode_128_metadata_baseline",
2786
  "reason": null
2787
  },
2788
  {
@@ -2790,7 +2790,7 @@
2790
  "task_id": "object_relevance",
2791
  "task_label": "Object Relevance Prediction",
2792
  "series_id": "metadata128_neural_mlp",
2793
- "method": "128ep Metadata NN",
2794
  "status": "scored",
2795
  "status_label": "scored",
2796
  "scored": true,
@@ -2800,7 +2800,7 @@
2800
  "normalized_score": 0.18662723837686876,
2801
  "metric_key": "micro_f1",
2802
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
2803
- "scope": "multi_episode_128_metadata_baseline",
2804
  "reason": null
2805
  },
2806
  {
@@ -2898,7 +2898,7 @@
2898
  "task_id": "caption_grounding",
2899
  "task_label": "Language Grounding",
2900
  "series_id": "metadata128_simple",
2901
- "method": "128ep Metadata Simple",
2902
  "status": "scored",
2903
  "status_label": "scored",
2904
  "scored": true,
@@ -2908,7 +2908,7 @@
2908
  "normalized_score": 0.002332374220713973,
2909
  "metric_key": "mrr",
2910
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
2911
- "scope": "multi_episode_128_metadata_baseline",
2912
  "reason": null
2913
  },
2914
  {
@@ -2916,7 +2916,7 @@
2916
  "task_id": "caption_grounding",
2917
  "task_label": "Language Grounding",
2918
  "series_id": "metadata128_neural_mlp",
2919
- "method": "128ep Metadata NN",
2920
  "status": "scored",
2921
  "status_label": "scored",
2922
  "scored": true,
@@ -2926,7 +2926,7 @@
2926
  "normalized_score": 0.008236799389123917,
2927
  "metric_key": "mrr",
2928
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
2929
- "scope": "multi_episode_128_metadata_baseline",
2930
  "reason": null
2931
  },
2932
  {
@@ -3024,36 +3024,36 @@
3024
  "task_id": "cross_modal_retrieval",
3025
  "task_label": "Cross-Modal Retrieval",
3026
  "series_id": "metadata128_simple",
3027
- "method": "128ep Metadata Simple",
3028
- "status": "unsupported_without_required_target",
3029
- "status_label": "unsupported",
3030
- "scored": false,
3031
  "proxy_scored": false,
3032
- "raw": null,
3033
- "raw_text": "n/a",
3034
- "normalized_score": null,
3035
  "metric_key": "mrr",
3036
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
3037
- "scope": "multi_episode_128_metadata_baseline",
3038
- "reason": "requires paired motion/IMU/camera/audio/depth feature blocks, which are not in the public 128 package"
3039
  },
3040
  {
3041
  "task_number": 9,
3042
  "task_id": "cross_modal_retrieval",
3043
  "task_label": "Cross-Modal Retrieval",
3044
  "series_id": "metadata128_neural_mlp",
3045
- "method": "128ep Metadata NN",
3046
- "status": "not_supported_by_metadata_only_package",
3047
- "status_label": "not supported",
3048
- "scored": false,
3049
  "proxy_scored": false,
3050
- "raw": null,
3051
- "raw_text": "n/a",
3052
- "normalized_score": null,
3053
  "metric_key": "mrr",
3054
- "source": null,
3055
- "scope": "multi_episode_128_metadata_baseline",
3056
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3057
  },
3058
  {
3059
  "task_number": 9,
@@ -3150,36 +3150,36 @@
3150
  "task_id": "modality_reconstruction",
3151
  "task_label": "Cross-Modal Reconstruction",
3152
  "series_id": "metadata128_simple",
3153
- "method": "128ep Metadata Simple",
3154
- "status": "unsupported_without_required_target",
3155
- "status_label": "unsupported",
3156
- "scored": false,
3157
  "proxy_scored": false,
3158
- "raw": null,
3159
- "raw_text": "n/a",
3160
- "normalized_score": null,
3161
  "metric_key": "r2",
3162
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
3163
- "scope": "multi_episode_128_metadata_baseline",
3164
- "reason": "requires source and target modality feature blocks such as depth/video vectors, which are not in the public 128 package"
3165
  },
3166
  {
3167
  "task_number": 10,
3168
  "task_id": "modality_reconstruction",
3169
  "task_label": "Cross-Modal Reconstruction",
3170
  "series_id": "metadata128_neural_mlp",
3171
- "method": "128ep Metadata NN",
3172
- "status": "not_supported_by_metadata_only_package",
3173
- "status_label": "not supported",
3174
- "scored": false,
3175
  "proxy_scored": false,
3176
- "raw": null,
3177
- "raw_text": "n/a",
3178
- "normalized_score": null,
3179
  "metric_key": "r2",
3180
- "source": null,
3181
- "scope": "multi_episode_128_metadata_baseline",
3182
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3183
  },
3184
  {
3185
  "task_number": 10,
@@ -3276,7 +3276,7 @@
3276
  "task_id": "temporal_order",
3277
  "task_label": "Temporal Order Verification",
3278
  "series_id": "metadata128_simple",
3279
- "method": "128ep Metadata Simple",
3280
  "status": "scored",
3281
  "status_label": "scored",
3282
  "scored": true,
@@ -3286,7 +3286,7 @@
3286
  "normalized_score": 0.4198864140782312,
3287
  "metric_key": "f1",
3288
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
3289
- "scope": "multi_episode_128_metadata_baseline",
3290
  "reason": null
3291
  },
3292
  {
@@ -3294,7 +3294,7 @@
3294
  "task_id": "temporal_order",
3295
  "task_label": "Temporal Order Verification",
3296
  "series_id": "metadata128_neural_mlp",
3297
- "method": "128ep Metadata NN",
3298
  "status": "scored",
3299
  "status_label": "scored",
3300
  "scored": true,
@@ -3304,7 +3304,7 @@
3304
  "normalized_score": 0.8252408266656923,
3305
  "metric_key": "f1",
3306
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
3307
- "scope": "multi_episode_128_metadata_baseline",
3308
  "reason": null
3309
  },
3310
  {
@@ -3402,36 +3402,36 @@
3402
  "task_id": "misalignment_detection",
3403
  "task_label": "Multimodal Synchronization Detection",
3404
  "series_id": "metadata128_simple",
3405
- "method": "128ep Metadata Simple",
3406
- "status": "unsupported_without_required_target",
3407
- "status_label": "unsupported",
3408
- "scored": false,
3409
  "proxy_scored": false,
3410
- "raw": null,
3411
- "raw_text": "n/a",
3412
- "normalized_score": null,
3413
  "metric_key": "f1",
3414
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
3415
- "scope": "multi_episode_128_metadata_baseline",
3416
- "reason": "requires deliberately shifted cross-modal feature pairs, which cannot be reconstructed from the public JSONL labels alone"
3417
  },
3418
  {
3419
  "task_number": 12,
3420
  "task_id": "misalignment_detection",
3421
  "task_label": "Multimodal Synchronization Detection",
3422
  "series_id": "metadata128_neural_mlp",
3423
- "method": "128ep Metadata NN",
3424
- "status": "not_supported_by_metadata_only_package",
3425
- "status_label": "not supported",
3426
- "scored": false,
3427
  "proxy_scored": false,
3428
- "raw": null,
3429
- "raw_text": "n/a",
3430
- "normalized_score": null,
3431
  "metric_key": "f1",
3432
- "source": null,
3433
- "scope": "multi_episode_128_metadata_baseline",
3434
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3435
  },
3436
  {
3437
  "task_number": 12,
@@ -3528,7 +3528,7 @@
3528
  "task_id": "long_horizon_next_action",
3529
  "task_label": "Long-Horizon Next-Action Forecasting",
3530
  "series_id": "metadata128_simple",
3531
- "method": "128ep Metadata Simple",
3532
  "status": "scored",
3533
  "status_label": "scored",
3534
  "scored": true,
@@ -3538,7 +3538,7 @@
3538
  "normalized_score": 0.004579592783699693,
3539
  "metric_key": "macro_f1",
3540
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
3541
- "scope": "multi_episode_128_metadata_baseline",
3542
  "reason": null
3543
  },
3544
  {
@@ -3546,7 +3546,7 @@
3546
  "task_id": "long_horizon_next_action",
3547
  "task_label": "Long-Horizon Next-Action Forecasting",
3548
  "series_id": "metadata128_neural_mlp",
3549
- "method": "128ep Metadata NN",
3550
  "status": "scored",
3551
  "status_label": "scored",
3552
  "scored": true,
@@ -3556,7 +3556,7 @@
3556
  "normalized_score": 0.0029821307969142615,
3557
  "metric_key": "macro_f1",
3558
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
3559
- "scope": "multi_episode_128_metadata_baseline",
3560
  "reason": null
3561
  },
3562
  {
@@ -3654,7 +3654,7 @@
3654
  "task_id": "next_subtask_forecast",
3655
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3656
  "series_id": "metadata128_simple",
3657
- "method": "128ep Metadata Simple",
3658
  "status": "scored",
3659
  "status_label": "scored",
3660
  "scored": true,
@@ -3664,7 +3664,7 @@
3664
  "normalized_score": 0.0001206030150753769,
3665
  "metric_key": "macro_f1",
3666
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
3667
- "scope": "multi_episode_128_metadata_baseline",
3668
  "reason": null
3669
  },
3670
  {
@@ -3672,7 +3672,7 @@
3672
  "task_id": "next_subtask_forecast",
3673
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3674
  "series_id": "metadata128_neural_mlp",
3675
- "method": "128ep Metadata NN",
3676
  "status": "scored",
3677
  "status_label": "scored",
3678
  "scored": true,
@@ -3682,7 +3682,7 @@
3682
  "normalized_score": 2.086049543676662e-05,
3683
  "metric_key": "macro_f1",
3684
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
3685
- "scope": "multi_episode_128_metadata_baseline",
3686
  "reason": null
3687
  },
3688
  {
@@ -3780,7 +3780,7 @@
3780
  "task_id": "interaction_text_prediction",
3781
  "task_label": "Interaction Text Prediction",
3782
  "series_id": "metadata128_simple",
3783
- "method": "128ep Metadata Simple",
3784
  "status": "unsupported_without_required_target",
3785
  "status_label": "unsupported",
3786
  "scored": false,
@@ -3790,7 +3790,7 @@
3790
  "normalized_score": null,
3791
  "metric_key": "macro_f1",
3792
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
3793
- "scope": "multi_episode_128_metadata_baseline",
3794
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
3795
  },
3796
  {
@@ -3798,7 +3798,7 @@
3798
  "task_id": "interaction_text_prediction",
3799
  "task_label": "Interaction Text Prediction",
3800
  "series_id": "metadata128_neural_mlp",
3801
- "method": "128ep Metadata NN",
3802
  "status": "not_supported_by_metadata_only_package",
3803
  "status_label": "not supported",
3804
  "scored": false,
@@ -3808,8 +3808,8 @@
3808
  "normalized_score": null,
3809
  "metric_key": "macro_f1",
3810
  "source": null,
3811
- "scope": "multi_episode_128_metadata_baseline",
3812
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3813
  },
3814
  {
3815
  "task_number": 15,
@@ -3906,7 +3906,7 @@
3906
  "task_id": "action_object_relation",
3907
  "task_label": "Action-Object Relation Prediction",
3908
  "series_id": "metadata128_simple",
3909
- "method": "128ep Metadata Simple",
3910
  "status": "scored",
3911
  "status_label": "scored",
3912
  "scored": true,
@@ -3916,7 +3916,7 @@
3916
  "normalized_score": 0.0,
3917
  "metric_key": "macro_f1",
3918
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
3919
- "scope": "multi_episode_128_metadata_baseline",
3920
  "reason": null
3921
  },
3922
  {
@@ -3924,7 +3924,7 @@
3924
  "task_id": "action_object_relation",
3925
  "task_label": "Action-Object Relation Prediction",
3926
  "series_id": "metadata128_neural_mlp",
3927
- "method": "128ep Metadata NN",
3928
  "status": "scored",
3929
  "status_label": "scored",
3930
  "scored": true,
@@ -3934,7 +3934,7 @@
3934
  "normalized_score": 0.0,
3935
  "metric_key": "macro_f1",
3936
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
3937
- "scope": "multi_episode_128_metadata_baseline",
3938
  "reason": null
3939
  },
3940
  {
@@ -4032,7 +4032,7 @@
4032
  "task_id": "object_set_forecast",
4033
  "task_label": "Future Object-Set Forecasting",
4034
  "series_id": "metadata128_simple",
4035
- "method": "128ep Metadata Simple",
4036
  "status": "scored",
4037
  "status_label": "scored",
4038
  "scored": true,
@@ -4042,7 +4042,7 @@
4042
  "normalized_score": 0.17656983343047333,
4043
  "metric_key": "micro_f1",
4044
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
4045
- "scope": "multi_episode_128_metadata_baseline",
4046
  "reason": null
4047
  },
4048
  {
@@ -4050,7 +4050,7 @@
4050
  "task_id": "object_set_forecast",
4051
  "task_label": "Future Object-Set Forecasting",
4052
  "series_id": "metadata128_neural_mlp",
4053
- "method": "128ep Metadata NN",
4054
  "status": "scored",
4055
  "status_label": "scored",
4056
  "scored": true,
@@ -4060,7 +4060,7 @@
4060
  "normalized_score": 0.17418550827844048,
4061
  "metric_key": "micro_f1",
4062
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
4063
- "scope": "multi_episode_128_metadata_baseline",
4064
  "reason": null
4065
  },
4066
  {
@@ -4158,36 +4158,36 @@
4158
  "task_id": "imu_to_hand_pose",
4159
  "task_label": "IMU-to-Hand Pose Reconstruction",
4160
  "series_id": "metadata128_simple",
4161
- "method": "128ep Metadata Simple",
4162
- "status": "unsupported_without_required_target",
4163
- "status_label": "unsupported",
4164
- "scored": false,
4165
  "proxy_scored": false,
4166
- "raw": null,
4167
- "raw_text": "n/a",
4168
- "normalized_score": null,
4169
  "metric_key": "mae",
4170
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
4171
- "scope": "multi_episode_128_metadata_baseline",
4172
- "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package"
4173
  },
4174
  {
4175
  "task_number": 18,
4176
  "task_id": "imu_to_hand_pose",
4177
  "task_label": "IMU-to-Hand Pose Reconstruction",
4178
  "series_id": "metadata128_neural_mlp",
4179
- "method": "128ep Metadata NN",
4180
- "status": "not_supported_by_metadata_only_package",
4181
- "status_label": "not supported",
4182
- "scored": false,
4183
  "proxy_scored": false,
4184
- "raw": null,
4185
- "raw_text": "n/a",
4186
- "normalized_score": null,
4187
  "metric_key": "mae",
4188
- "source": null,
4189
- "scope": "multi_episode_128_metadata_baseline",
4190
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4191
  },
4192
  {
4193
  "task_number": 18,
@@ -4284,7 +4284,7 @@
4284
  "task_id": "camera_view_sync_retrieval",
4285
  "task_label": "Camera-View Synchronization Retrieval",
4286
  "series_id": "metadata128_simple",
4287
- "method": "128ep Metadata Simple",
4288
  "status": "unsupported_without_required_target",
4289
  "status_label": "unsupported",
4290
  "scored": false,
@@ -4294,7 +4294,7 @@
4294
  "normalized_score": null,
4295
  "metric_key": "mrr",
4296
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
4297
- "scope": "multi_episode_128_metadata_baseline",
4298
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
4299
  },
4300
  {
@@ -4302,7 +4302,7 @@
4302
  "task_id": "camera_view_sync_retrieval",
4303
  "task_label": "Camera-View Synchronization Retrieval",
4304
  "series_id": "metadata128_neural_mlp",
4305
- "method": "128ep Metadata NN",
4306
  "status": "not_supported_by_metadata_only_package",
4307
  "status_label": "not supported",
4308
  "scored": false,
@@ -4312,8 +4312,8 @@
4312
  "normalized_score": null,
4313
  "metric_key": "mrr",
4314
  "source": null,
4315
- "scope": "multi_episode_128_metadata_baseline",
4316
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4317
  },
4318
  {
4319
  "task_number": 19,
@@ -4410,7 +4410,7 @@
4410
  "task_id": "time_to_transition",
4411
  "task_label": "Time-to-Next-Transition Regression",
4412
  "series_id": "metadata128_simple",
4413
- "method": "128ep Metadata Simple",
4414
  "status": "scored",
4415
  "status_label": "scored",
4416
  "scored": true,
@@ -4420,7 +4420,7 @@
4420
  "normalized_score": 0.016864874132806403,
4421
  "metric_key": "mae",
4422
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
4423
- "scope": "multi_episode_128_metadata_baseline",
4424
  "reason": null
4425
  },
4426
  {
@@ -4428,7 +4428,7 @@
4428
  "task_id": "time_to_transition",
4429
  "task_label": "Time-to-Next-Transition Regression",
4430
  "series_id": "metadata128_neural_mlp",
4431
- "method": "128ep Metadata NN",
4432
  "status": "scored",
4433
  "status_label": "scored",
4434
  "scored": true,
@@ -4438,7 +4438,7 @@
4438
  "normalized_score": 0.25411768748242325,
4439
  "metric_key": "mae",
4440
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
4441
- "scope": "multi_episode_128_metadata_baseline",
4442
  "reason": null
4443
  },
4444
  {
 
1
  {
2
  "title": "128-Episode 20-Task Radar",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:52:26+00:00",
5
  "description": "Selected 128-episode metadata/raw baselines plus verified Qwen3/Cosmos branches. Every method has 20 records; numeric scores appear only where the public artifact produced that task target.",
6
  "task_count": 20,
7
  "method_count": 7,
8
  "method_task_record_count": 140,
9
+ "scored_method_task_count": 103,
10
  "normalization_policy": {
11
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
12
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
13
  "raw_values": "raw metric values, metric keys, and sources are retained in this JSON; the SVG is an overview, not a replacement for the metric table",
14
  "result_record_policy": "every method has 20 task records; records without a numeric score carry explicit unsupported/not-evaluated status and reason fields",
15
  "foundation_model_overlay": "Qwen3/Cosmos points are plotted only on task-aligned axes. Scoreless records mean the public result does not evaluate that task contract.",
16
+ "metadata_128_overlay": "128-episode aligned baselines have 20 records. Numeric scores come from JSONL metadata/text tasks plus staged sensor-block targets when the processed target exists; raw interaction text and paired camera-view embeddings remain explicit gaps.",
17
  "raw_128_overlay": "128-episode raw-feature baselines use staged sensor NPZ features. Eighteen axes use direct task targets; interaction text and camera-view sync are completed with documented compact proxies because raw interaction strings and paired video-view embeddings are absent from the 128 export."
18
  },
19
  "source_unified_radar": "docs/data/unified_task_model_radar.json",
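
Note on the `normalization_policy` entries above: a minimal Python sketch of the two rules as they are worded in this JSON, not the project's actual implementation. The function names are hypothetical. The best observed value for `time_to_transition` does not appear in this excerpt, so the sketch infers it from one scored record (`raw` times `normalized_score`) and checks that the other record is consistent with it.

```python
def normalize_higher_is_better(raw: float) -> float:
    """Bounded metrics (F1, MRR, ...) are plotted directly after clipping to [0, 1]."""
    return min(1.0, max(0.0, raw))


def normalize_lower_is_better(raw: float, best_observed: float) -> float:
    """Error metrics (MAE, ...) become best_observed / raw within the same task."""
    if raw <= 0:
        raise ValueError("raw error must be positive")
    return best_observed / raw


# time_to_transition (metric_key "mae") records from the diff above.
simple_raw, simple_norm = 624.8108520507812, 0.016864874132806403
mlp_raw, mlp_norm = 41.4664421081543, 0.25411768748242325

# Assumption: the best observed MAE is not shown in this excerpt, so it is
# inferred from the simple-baseline record as raw * normalized_score.
implied_best = simple_raw * simple_norm

# The MLP record should then be reproduced by the same best_observed value.
mlp_recomputed = normalize_lower_is_better(mlp_raw, implied_best)

# A bounded metric that is already inside [0, 1] passes through unchanged.
transition_f1 = normalize_higher_is_better(0.4841733292368365)
```

Both `time_to_transition` records imply the same best observed MAE (about 10.54), which is what the `lower_is_better` rule predicts when the best value comes from another method on the same task.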
 
21
  "series": [
22
  {
23
  "id": "metadata128_simple",
24
+ "label": "128ep Aligned Simple",
25
  "short_label": "128-S",
26
  "color": "#ffd166",
27
+ "kind": "partial_128_episode_aligned_baseline",
28
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
29
  "stroke_dasharray": "9 6",
30
+ "method_detail": "128-episode aligned simple baselines: JSONL metadata/text tasks plus staged sensor-block tasks where the processed target exists.",
31
  "plotted_as": "colored point overlay",
32
  "result_record_count": 20,
33
+ "scored_task_count": 18,
34
+ "covered_task_count": 18,
35
  "proxy_scored_task_count": 0,
36
+ "scoreless_task_count": 2,
37
+ "unsupported_task_count": 2,
38
  "not_evaluated_task_count": 0,
39
  "status_counts": {
40
+ "scored": 18,
41
+ "unsupported_without_required_target": 2
42
  },
43
+ "coverage_fraction": 0.9,
44
  "result_record_fraction": 1.0
45
  },
46
  {
47
  "id": "metadata128_neural_mlp",
48
+ "label": "128ep Aligned NN",
49
  "short_label": "128-NN",
50
  "color": "#f472b6",
51
+ "kind": "partial_128_episode_aligned_baseline",
52
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
53
  "stroke_dasharray": "3 6",
54
+ "method_detail": "128-episode aligned MLP baselines: JSONL metadata/text tasks plus staged sensor-block tasks where the processed target exists.",
55
  "plotted_as": "colored point overlay",
56
  "result_record_count": 20,
57
+ "scored_task_count": 18,
58
+ "covered_task_count": 18,
59
  "proxy_scored_task_count": 0,
60
+ "scoreless_task_count": 2,
61
+ "unsupported_task_count": 2,
62
  "not_evaluated_task_count": 0,
63
  "status_counts": {
64
+ "not_supported_by_metadata_only_package": 2,
65
+ "scored": 18
66
  },
67
+ "coverage_fraction": 0.9,
68
  "result_record_fraction": 1.0
69
  },
70
  {
 
205
  "raw": 0.008252821966746326,
206
  "metric_key": "macro_f1",
207
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
208
+ "scope": "multi_episode_128_aligned_baseline",
209
  "status": "scored",
210
  "reason": null,
211
  "normalized_score": 0.008252821966746326,
 
216
  "raw": 0.004175793689174209,
217
  "metric_key": "macro_f1",
218
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
219
+ "scope": "multi_episode_128_aligned_baseline",
220
  "status": "scored",
221
  "reason": null,
222
  "normalized_score": 0.004175793689174209,
 
296
  "raw": 0.00019512195121951218,
297
  "metric_key": "macro_f1",
298
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
299
+ "scope": "multi_episode_128_aligned_baseline",
300
  "status": "scored",
301
  "reason": null,
302
  "normalized_score": 0.00019512195121951218,
 
307
  "raw": 7.207207207207208e-05,
308
  "metric_key": "macro_f1",
309
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
310
+ "scope": "multi_episode_128_aligned_baseline",
311
  "status": "scored",
312
  "reason": null,
313
  "normalized_score": 7.207207207207208e-05,
 
387
  "raw": 0.29652162550029315,
388
  "metric_key": "macro_f1",
389
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
390
+ "scope": "multi_episode_128_aligned_baseline",
391
  "status": "scored",
392
  "reason": null,
393
  "normalized_score": 0.29652162550029315,
 
398
  "raw": 0.4841733292368365,
399
  "metric_key": "macro_f1",
400
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
401
+ "scope": "multi_episode_128_aligned_baseline",
402
  "status": "scored",
403
  "reason": null,
404
  "normalized_score": 0.4841733292368365,
 
478
  "raw": 0.006514774539765508,
479
  "metric_key": "macro_f1",
480
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
481
+ "scope": "multi_episode_128_aligned_baseline",
482
  "status": "scored",
483
  "reason": null,
484
  "normalized_score": 0.006514774539765508,
 
489
  "raw": 0.004910507980164745,
490
  "metric_key": "macro_f1",
491
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
492
+ "scope": "multi_episode_128_aligned_baseline",
493
  "status": "scored",
494
  "reason": null,
495
  "normalized_score": 0.004910507980164745,
 
566
  "raw128_proxy_axis": false,
567
  "values": {
568
  "metadata128_simple": {
569
+ "raw": 8.817333221435547,
570
  "metric_key": "mpjpe",
571
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
572
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
573
+ "status": "scored",
574
+ "reason": null,
575
+ "normalized_score": 0.012231610603598841,
576
+ "raw_text": "8.817",
577
+ "status_label": "scored"
578
  },
579
  "metadata128_neural_mlp": {
580
+ "raw": 0.429434210062027,
581
  "metric_key": "mpjpe",
582
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/hand_trajectory_forecast/metrics.json",
583
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
584
+ "status": "scored",
585
+ "reason": null,
586
+ "normalized_score": 0.25114484128127007,
587
+ "raw_text": "0.4294",
588
+ "status_label": "scored"
589
  },
590
  "raw128_simple": {
591
  "raw": 0.2729249894618988,
 
660
  "raw": 0.4381481308057444,
661
  "metric_key": "macro_f1",
662
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
663
+ "scope": "multi_episode_128_aligned_baseline",
664
  "status": "scored",
665
  "reason": null,
666
  "normalized_score": 0.4381481308057444,
 
671
  "raw": 0.5682695682695682,
672
  "metric_key": "macro_f1",
673
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
674
+ "scope": "multi_episode_128_aligned_baseline",
675
  "status": "scored",
676
  "reason": null,
677
  "normalized_score": 0.5682695682695682,
 
751
  "raw": 0.17764578833693304,
752
  "metric_key": "micro_f1",
753
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
754
+ "scope": "multi_episode_128_aligned_baseline",
755
  "status": "scored",
756
  "reason": null,
757
  "normalized_score": 0.17764578833693304,
 
762
  "raw": 0.18662723837686876,
763
  "metric_key": "micro_f1",
764
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
765
+ "scope": "multi_episode_128_aligned_baseline",
766
  "status": "scored",
767
  "reason": null,
768
  "normalized_score": 0.18662723837686876,
 
842
  "raw": 0.002332374220713973,
843
  "metric_key": "mrr",
844
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
845
+ "scope": "multi_episode_128_aligned_baseline",
846
  "status": "scored",
847
  "reason": null,
848
  "normalized_score": 0.002332374220713973,
 
853
  "raw": 0.008236799389123917,
854
  "metric_key": "mrr",
855
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
856
+ "scope": "multi_episode_128_aligned_baseline",
857
  "status": "scored",
858
  "reason": null,
859
  "normalized_score": 0.008236799389123917,
 
930
  "raw128_proxy_axis": false,
931
  "values": {
932
  "metadata128_simple": {
933
+ "raw": 0.002587692579254508,
934
  "metric_key": "mrr",
935
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
936
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
937
+ "status": "scored",
938
+ "reason": null,
939
+ "normalized_score": 0.002587692579254508,
940
+ "raw_text": "0.0026",
941
+ "status_label": "scored"
942
  },
943
  "metadata128_neural_mlp": {
944
+ "raw": 0.0026067993603646755,
945
  "metric_key": "mrr",
946
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/cross_modal_retrieval/metrics.json",
947
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
948
+ "status": "scored",
949
+ "reason": null,
950
+ "normalized_score": 0.0026067993603646755,
951
+ "raw_text": "0.0026",
952
+ "status_label": "scored"
953
  },
954
  "raw128_simple": {
955
  "raw": 0.003459817497059703,
 
1021
  "raw128_proxy_axis": false,
1022
  "values": {
1023
  "metadata128_simple": {
1024
+ "raw": -190.66106203944798,
1025
  "metric_key": "r2",
1026
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
1027
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1028
+ "status": "scored",
1029
+ "reason": null,
1030
+ "normalized_score": 0.0,
1031
+ "raw_text": "-190.66",
1032
+ "status_label": "scored"
1033
  },
1034
  "metadata128_neural_mlp": {
1035
+ "raw": -0.43481132003942147,
1036
  "metric_key": "r2",
1037
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/modality_reconstruction/metrics.json",
1038
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1039
+ "status": "scored",
1040
+ "reason": null,
1041
+ "normalized_score": 0.0,
1042
+ "raw_text": "-0.4348",
1043
+ "status_label": "scored"
1044
  },
1045
  "raw128_simple": {
1046
  "raw": -1.3450960391924882,
 
1115
  "raw": 0.4198864140782312,
1116
  "metric_key": "f1",
1117
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
1118
+ "scope": "multi_episode_128_aligned_baseline",
1119
  "status": "scored",
1120
  "reason": null,
1121
  "normalized_score": 0.4198864140782312,
 
1126
  "raw": 0.8252408266656923,
1127
  "metric_key": "f1",
1128
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
1129
+ "scope": "multi_episode_128_aligned_baseline",
1130
  "status": "scored",
1131
  "reason": null,
1132
  "normalized_score": 0.8252408266656923,
 
1203
  "raw128_proxy_axis": false,
1204
  "values": {
1205
  "metadata128_simple": {
1206
+ "raw": 0.49980060227663614,
1207
  "metric_key": "f1",
1208
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
1209
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1210
+ "status": "scored",
1211
+ "reason": null,
1212
+ "normalized_score": 0.49980060227663614,
1213
+ "raw_text": "0.4998",
1214
+ "status_label": "scored"
1215
  },
1216
  "metadata128_neural_mlp": {
1217
+ "raw": 0.7773773780941162,
1218
  "metric_key": "f1",
1219
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/misalignment_detection/metrics.json",
1220
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1221
+ "status": "scored",
1222
+ "reason": null,
1223
+ "normalized_score": 0.7773773780941162,
1224
+ "raw_text": "0.7774",
1225
+ "status_label": "scored"
1226
  },
1227
  "raw128_simple": {
1228
  "raw": 0.4958867673901769,
 
1297
  "raw": 0.004579592783699693,
1298
  "metric_key": "macro_f1",
1299
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
1300
+ "scope": "multi_episode_128_aligned_baseline",
1301
  "status": "scored",
1302
  "reason": null,
1303
  "normalized_score": 0.004579592783699693,
 
1308
  "raw": 0.0029821307969142615,
1309
  "metric_key": "macro_f1",
1310
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
1311
+ "scope": "multi_episode_128_aligned_baseline",
1312
  "status": "scored",
1313
  "reason": null,
1314
  "normalized_score": 0.0029821307969142615,
 
1388
  "raw": 0.0001206030150753769,
1389
  "metric_key": "macro_f1",
1390
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
1391
+ "scope": "multi_episode_128_aligned_baseline",
1392
  "status": "scored",
1393
  "reason": null,
1394
  "normalized_score": 0.0001206030150753769,
 
1399
  "raw": 2.086049543676662e-05,
1400
  "metric_key": "macro_f1",
1401
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
1402
+ "scope": "multi_episode_128_aligned_baseline",
1403
  "status": "scored",
1404
  "reason": null,
1405
  "normalized_score": 2.086049543676662e-05,
 
1479
  "raw": null,
1480
  "metric_key": "macro_f1",
1481
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
1482
+ "scope": "multi_episode_128_aligned_baseline",
1483
  "status": "unsupported_without_required_target",
1484
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
1485
  "normalized_score": null,
 
1490
  "raw": null,
1491
  "metric_key": "macro_f1",
1492
  "source": null,
1493
+ "scope": "multi_episode_128_aligned_baseline",
1494
  "status": "not_supported_by_metadata_only_package",
1495
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required",
1496
  "normalized_score": null,
1497
  "raw_text": "n/a",
1498
  "status_label": "not supported"
 
1570
  "raw": 0.0,
1571
  "metric_key": "macro_f1",
1572
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
1573
+ "scope": "multi_episode_128_aligned_baseline",
1574
  "status": "scored",
1575
  "reason": null,
1576
  "normalized_score": 0.0,
 
1581
  "raw": 0.0,
1582
  "metric_key": "macro_f1",
1583
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
1584
+ "scope": "multi_episode_128_aligned_baseline",
1585
  "status": "scored",
1586
  "reason": null,
1587
  "normalized_score": 0.0,
 
1661
  "raw": 0.17656983343047333,
1662
  "metric_key": "micro_f1",
1663
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
1664
+ "scope": "multi_episode_128_aligned_baseline",
1665
  "status": "scored",
1666
  "reason": null,
1667
  "normalized_score": 0.17656983343047333,
 
1672
  "raw": 0.17418550827844048,
1673
  "metric_key": "micro_f1",
1674
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
1675
+ "scope": "multi_episode_128_aligned_baseline",
1676
  "status": "scored",
1677
  "reason": null,
1678
  "normalized_score": 0.17418550827844048,
 
1749
  "raw128_proxy_axis": false,
1750
  "values": {
1751
  "metadata128_simple": {
1752
+ "raw": 0.2294670194387436,
1753
  "metric_key": "mae",
1754
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
1755
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1756
+ "status": "scored",
1757
+ "reason": null,
1758
+ "normalized_score": 0.18324815505876868,
1759
+ "raw_text": "0.2295",
1760
+ "status_label": "scored"
1761
  },
1762
  "metadata128_neural_mlp": {
1763
+ "raw": 0.2555866539478302,
1764
  "metric_key": "mae",
1765
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/imu_to_hand_pose/metrics.json",
1766
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1767
+ "status": "scored",
1768
+ "reason": null,
1769
+ "normalized_score": 0.16452114110609004,
1770
+ "raw_text": "0.2556",
1771
+ "status_label": "scored"
1772
  },
1773
  "raw128_simple": {
1774
  "raw": 0.22941437363624573,
 
1843
  "raw": null,
1844
  "metric_key": "mrr",
1845
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
1846
+ "scope": "multi_episode_128_aligned_baseline",
1847
  "status": "unsupported_without_required_target",
1848
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
1849
  "normalized_score": null,
 
1854
  "raw": null,
1855
  "metric_key": "mrr",
1856
  "source": null,
1857
+ "scope": "multi_episode_128_aligned_baseline",
1858
  "status": "not_supported_by_metadata_only_package",
1859
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required",
1860
  "normalized_score": null,
1861
  "raw_text": "n/a",
1862
  "status_label": "not supported"
 
1934
  "raw": 624.8108520507812,
1935
  "metric_key": "mae",
1936
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
1937
+ "scope": "multi_episode_128_aligned_baseline",
1938
  "status": "scored",
1939
  "reason": null,
1940
  "normalized_score": 0.016864874132806403,
 
1945
  "raw": 41.4664421081543,
1946
  "metric_key": "mae",
1947
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
1948
+ "scope": "multi_episode_128_aligned_baseline",
1949
  "status": "scored",
1950
  "reason": null,
1951
  "normalized_score": 0.25411768748242325,
 
2016
  "task_id": "timeline_action",
2017
  "task_label": "Action Recognition",
2018
  "series_id": "metadata128_simple",
2019
+ "method": "128ep Aligned Simple",
2020
  "status": "scored",
2021
  "status_label": "scored",
2022
  "scored": true,
 
2026
  "normalized_score": 0.008252821966746326,
2027
  "metric_key": "macro_f1",
2028
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
2029
+ "scope": "multi_episode_128_aligned_baseline",
2030
  "reason": null
2031
  },
2032
  {
 
2034
  "task_id": "timeline_action",
2035
  "task_label": "Action Recognition",
2036
  "series_id": "metadata128_neural_mlp",
2037
+ "method": "128ep Aligned NN",
2038
  "status": "scored",
2039
  "status_label": "scored",
2040
  "scored": true,
 
2044
  "normalized_score": 0.004175793689174209,
2045
  "metric_key": "macro_f1",
2046
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
2047
+ "scope": "multi_episode_128_aligned_baseline",
2048
  "reason": null
2049
  },
2050
  {
 
2142
  "task_id": "timeline_subtask",
2143
  "task_label": "Procedure Step Recognition",
2144
  "series_id": "metadata128_simple",
2145
+ "method": "128ep Aligned Simple",
2146
  "status": "scored",
2147
  "status_label": "scored",
2148
  "scored": true,
 
2152
  "normalized_score": 0.00019512195121951218,
2153
  "metric_key": "macro_f1",
2154
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
2155
+ "scope": "multi_episode_128_aligned_baseline",
2156
  "reason": null
2157
  },
2158
  {
 
2160
  "task_id": "timeline_subtask",
2161
  "task_label": "Procedure Step Recognition",
2162
  "series_id": "metadata128_neural_mlp",
2163
+ "method": "128ep Aligned NN",
2164
  "status": "scored",
2165
  "status_label": "scored",
2166
  "scored": true,
 
2170
  "normalized_score": 7.207207207207208e-05,
2171
  "metric_key": "macro_f1",
2172
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
2173
+ "scope": "multi_episode_128_aligned_baseline",
2174
  "reason": null
2175
  },
2176
  {
 
2268
  "task_id": "transition_detection",
2269
  "task_label": "Action Boundary Detection",
2270
  "series_id": "metadata128_simple",
2271
+ "method": "128ep Aligned Simple",
2272
  "status": "scored",
2273
  "status_label": "scored",
2274
  "scored": true,
 
2278
  "normalized_score": 0.29652162550029315,
2279
  "metric_key": "macro_f1",
2280
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
2281
+ "scope": "multi_episode_128_aligned_baseline",
2282
  "reason": null
2283
  },
2284
  {
 
2286
  "task_id": "transition_detection",
2287
  "task_label": "Action Boundary Detection",
2288
  "series_id": "metadata128_neural_mlp",
2289
+ "method": "128ep Aligned NN",
2290
  "status": "scored",
2291
  "status_label": "scored",
2292
  "scored": true,
 
2296
  "normalized_score": 0.4841733292368365,
2297
  "metric_key": "macro_f1",
2298
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
2299
+ "scope": "multi_episode_128_aligned_baseline",
2300
  "reason": null
2301
  },
2302
  {
 
2394
  "task_id": "next_action",
2395
  "task_label": "Next-Action Prediction",
2396
  "series_id": "metadata128_simple",
2397
+ "method": "128ep Aligned Simple",
2398
  "status": "scored",
2399
  "status_label": "scored",
2400
  "scored": true,
 
2404
  "normalized_score": 0.006514774539765508,
2405
  "metric_key": "macro_f1",
2406
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
2407
+ "scope": "multi_episode_128_aligned_baseline",
2408
  "reason": null
2409
  },
2410
  {
 
2412
  "task_id": "next_action",
2413
  "task_label": "Next-Action Prediction",
2414
  "series_id": "metadata128_neural_mlp",
2415
+ "method": "128ep Aligned NN",
2416
  "status": "scored",
2417
  "status_label": "scored",
2418
  "scored": true,
 
2422
  "normalized_score": 0.004910507980164745,
2423
  "metric_key": "macro_f1",
2424
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
2425
+ "scope": "multi_episode_128_aligned_baseline",
2426
  "reason": null
2427
  },
2428
  {
 
2520
  "task_id": "hand_trajectory_forecast",
2521
  "task_label": "Hand Trajectory Forecasting",
2522
  "series_id": "metadata128_simple",
2523
+ "method": "128ep Aligned Simple",
2524
+ "status": "scored",
2525
+ "status_label": "scored",
2526
+ "scored": true,
2527
  "proxy_scored": false,
2528
+ "raw": 8.817333221435547,
2529
+ "raw_text": "8.817",
2530
+ "normalized_score": 0.012231610603598841,
2531
  "metric_key": "mpjpe",
2532
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
2533
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
2534
+ "reason": null
2535
  },
2536
  {
2537
  "task_number": 5,
2538
  "task_id": "hand_trajectory_forecast",
2539
  "task_label": "Hand Trajectory Forecasting",
2540
  "series_id": "metadata128_neural_mlp",
2541
+ "method": "128ep Aligned NN",
2542
+ "status": "scored",
2543
+ "status_label": "scored",
2544
+ "scored": true,
2545
  "proxy_scored": false,
2546
+ "raw": 0.429434210062027,
2547
+ "raw_text": "0.4294",
2548
+ "normalized_score": 0.25114484128127007,
2549
  "metric_key": "mpjpe",
2550
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/hand_trajectory_forecast/metrics.json",
2551
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
2552
+ "reason": null
2553
  },
2554
  {
2555
  "task_number": 5,
 
2646
  "task_id": "contact_prediction",
2647
  "task_label": "Contact State Prediction",
2648
  "series_id": "metadata128_simple",
2649
+ "method": "128ep Aligned Simple",
2650
  "status": "scored",
2651
  "status_label": "scored",
2652
  "scored": true,
 
2656
  "normalized_score": 0.4381481308057444,
2657
  "metric_key": "macro_f1",
2658
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
2659
+ "scope": "multi_episode_128_aligned_baseline",
2660
  "reason": null
2661
  },
2662
  {
 
2664
  "task_id": "contact_prediction",
2665
  "task_label": "Contact State Prediction",
2666
  "series_id": "metadata128_neural_mlp",
2667
+ "method": "128ep Aligned NN",
2668
  "status": "scored",
2669
  "status_label": "scored",
2670
  "scored": true,
 
2674
  "normalized_score": 0.5682695682695682,
2675
  "metric_key": "macro_f1",
2676
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
2677
+ "scope": "multi_episode_128_aligned_baseline",
2678
  "reason": null
2679
  },
2680
  {
 
2772
  "task_id": "object_relevance",
2773
  "task_label": "Object Relevance Prediction",
2774
  "series_id": "metadata128_simple",
2775
+ "method": "128ep Aligned Simple",
2776
  "status": "scored",
2777
  "status_label": "scored",
2778
  "scored": true,
 
2782
  "normalized_score": 0.17764578833693304,
2783
  "metric_key": "micro_f1",
2784
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
2785
+ "scope": "multi_episode_128_aligned_baseline",
2786
  "reason": null
2787
  },
2788
  {
 
2790
  "task_id": "object_relevance",
2791
  "task_label": "Object Relevance Prediction",
2792
  "series_id": "metadata128_neural_mlp",
2793
+ "method": "128ep Aligned NN",
2794
  "status": "scored",
2795
  "status_label": "scored",
2796
  "scored": true,
 
2800
  "normalized_score": 0.18662723837686876,
2801
  "metric_key": "micro_f1",
2802
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
2803
+ "scope": "multi_episode_128_aligned_baseline",
2804
  "reason": null
2805
  },
2806
  {
 
2898
  "task_id": "caption_grounding",
2899
  "task_label": "Language Grounding",
2900
  "series_id": "metadata128_simple",
2901
+ "method": "128ep Aligned Simple",
2902
  "status": "scored",
2903
  "status_label": "scored",
2904
  "scored": true,
 
2908
  "normalized_score": 0.002332374220713973,
2909
  "metric_key": "mrr",
2910
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
2911
+ "scope": "multi_episode_128_aligned_baseline",
2912
  "reason": null
2913
  },
2914
  {
 
2916
  "task_id": "caption_grounding",
2917
  "task_label": "Language Grounding",
2918
  "series_id": "metadata128_neural_mlp",
2919
+ "method": "128ep Aligned NN",
2920
  "status": "scored",
2921
  "status_label": "scored",
2922
  "scored": true,
 
2926
  "normalized_score": 0.008236799389123917,
2927
  "metric_key": "mrr",
2928
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
2929
+ "scope": "multi_episode_128_aligned_baseline",
2930
  "reason": null
2931
  },
2932
  {
 
3024
  "task_id": "cross_modal_retrieval",
3025
  "task_label": "Cross-Modal Retrieval",
3026
  "series_id": "metadata128_simple",
3027
+ "method": "128ep Aligned Simple",
3028
+ "status": "scored",
3029
+ "status_label": "scored",
3030
+ "scored": true,
3031
  "proxy_scored": false,
3032
+ "raw": 0.002587692579254508,
3033
+ "raw_text": "0.0026",
3034
+ "normalized_score": 0.002587692579254508,
3035
  "metric_key": "mrr",
3036
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
3037
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3038
+ "reason": null
3039
  },
3040
  {
3041
  "task_number": 9,
3042
  "task_id": "cross_modal_retrieval",
3043
  "task_label": "Cross-Modal Retrieval",
3044
  "series_id": "metadata128_neural_mlp",
3045
+ "method": "128ep Aligned NN",
3046
+ "status": "scored",
3047
+ "status_label": "scored",
3048
+ "scored": true,
3049
  "proxy_scored": false,
3050
+ "raw": 0.0026067993603646755,
3051
+ "raw_text": "0.0026",
3052
+ "normalized_score": 0.0026067993603646755,
3053
  "metric_key": "mrr",
3054
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/cross_modal_retrieval/metrics.json",
3055
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3056
+ "reason": null
3057
  },
3058
  {
3059
  "task_number": 9,
 
3150
  "task_id": "modality_reconstruction",
3151
  "task_label": "Cross-Modal Reconstruction",
3152
  "series_id": "metadata128_simple",
3153
+ "method": "128ep Aligned Simple",
3154
+ "status": "scored",
3155
+ "status_label": "scored",
3156
+ "scored": true,
3157
  "proxy_scored": false,
3158
+ "raw": -190.66106203944798,
3159
+ "raw_text": "-190.66",
3160
+ "normalized_score": 0.0,
3161
  "metric_key": "r2",
3162
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
3163
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3164
+ "reason": null
3165
  },
3166
  {
3167
  "task_number": 10,
3168
  "task_id": "modality_reconstruction",
3169
  "task_label": "Cross-Modal Reconstruction",
3170
  "series_id": "metadata128_neural_mlp",
3171
+ "method": "128ep Aligned NN",
3172
+ "status": "scored",
3173
+ "status_label": "scored",
3174
+ "scored": true,
3175
  "proxy_scored": false,
3176
+ "raw": -0.43481132003942147,
3177
+ "raw_text": "-0.4348",
3178
+ "normalized_score": 0.0,
3179
  "metric_key": "r2",
3180
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/modality_reconstruction/metrics.json",
3181
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3182
+ "reason": null
3183
  },
3184
  {
3185
  "task_number": 10,
 
3276
  "task_id": "temporal_order",
3277
  "task_label": "Temporal Order Verification",
3278
  "series_id": "metadata128_simple",
3279
+ "method": "128ep Aligned Simple",
3280
  "status": "scored",
3281
  "status_label": "scored",
3282
  "scored": true,
 
3286
  "normalized_score": 0.4198864140782312,
3287
  "metric_key": "f1",
3288
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
3289
+ "scope": "multi_episode_128_aligned_baseline",
3290
  "reason": null
3291
  },
3292
  {
 
3294
  "task_id": "temporal_order",
3295
  "task_label": "Temporal Order Verification",
3296
  "series_id": "metadata128_neural_mlp",
3297
+ "method": "128ep Aligned NN",
3298
  "status": "scored",
3299
  "status_label": "scored",
3300
  "scored": true,
 
3304
  "normalized_score": 0.8252408266656923,
3305
  "metric_key": "f1",
3306
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
3307
+ "scope": "multi_episode_128_aligned_baseline",
3308
  "reason": null
3309
  },
3310
  {
 
3402
  "task_id": "misalignment_detection",
3403
  "task_label": "Multimodal Synchronization Detection",
3404
  "series_id": "metadata128_simple",
3405
+ "method": "128ep Aligned Simple",
3406
+ "status": "scored",
3407
+ "status_label": "scored",
3408
+ "scored": true,
3409
  "proxy_scored": false,
3410
+ "raw": 0.49980060227663614,
3411
+ "raw_text": "0.4998",
3412
+ "normalized_score": 0.49980060227663614,
3413
  "metric_key": "f1",
3414
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
3415
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3416
+ "reason": null
3417
  },
3418
  {
3419
  "task_number": 12,
3420
  "task_id": "misalignment_detection",
3421
  "task_label": "Multimodal Synchronization Detection",
3422
  "series_id": "metadata128_neural_mlp",
3423
+ "method": "128ep Aligned NN",
3424
+ "status": "scored",
3425
+ "status_label": "scored",
3426
+ "scored": true,
3427
  "proxy_scored": false,
3428
+ "raw": 0.7773773780941162,
3429
+ "raw_text": "0.7774",
3430
+ "normalized_score": 0.7773773780941162,
3431
  "metric_key": "f1",
3432
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/misalignment_detection/metrics.json",
3433
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3434
+ "reason": null
3435
  },
3436
  {
3437
  "task_number": 12,
 
3528
  "task_id": "long_horizon_next_action",
3529
  "task_label": "Long-Horizon Next-Action Forecasting",
3530
  "series_id": "metadata128_simple",
3531
+ "method": "128ep Aligned Simple",
3532
  "status": "scored",
3533
  "status_label": "scored",
3534
  "scored": true,
 
3538
  "normalized_score": 0.004579592783699693,
3539
  "metric_key": "macro_f1",
3540
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
3541
+ "scope": "multi_episode_128_aligned_baseline",
3542
  "reason": null
3543
  },
3544
  {
 
3546
  "task_id": "long_horizon_next_action",
3547
  "task_label": "Long-Horizon Next-Action Forecasting",
3548
  "series_id": "metadata128_neural_mlp",
3549
+ "method": "128ep Aligned NN",
3550
  "status": "scored",
3551
  "status_label": "scored",
3552
  "scored": true,
 
3556
  "normalized_score": 0.0029821307969142615,
3557
  "metric_key": "macro_f1",
3558
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
3559
+ "scope": "multi_episode_128_aligned_baseline",
3560
  "reason": null
3561
  },
3562
  {
 
3654
  "task_id": "next_subtask_forecast",
3655
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3656
  "series_id": "metadata128_simple",
3657
+ "method": "128ep Aligned Simple",
3658
  "status": "scored",
3659
  "status_label": "scored",
3660
  "scored": true,
 
3664
  "normalized_score": 0.0001206030150753769,
3665
  "metric_key": "macro_f1",
3666
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
3667
+ "scope": "multi_episode_128_aligned_baseline",
3668
  "reason": null
3669
  },
3670
  {
 
3672
  "task_id": "next_subtask_forecast",
3673
  "task_label": "Long-Horizon Next-Subtask Forecasting",
3674
  "series_id": "metadata128_neural_mlp",
3675
+ "method": "128ep Aligned NN",
3676
  "status": "scored",
3677
  "status_label": "scored",
3678
  "scored": true,
 
3682
  "normalized_score": 2.086049543676662e-05,
3683
  "metric_key": "macro_f1",
3684
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
3685
+ "scope": "multi_episode_128_aligned_baseline",
3686
  "reason": null
3687
  },
3688
  {
 
3780
  "task_id": "interaction_text_prediction",
3781
  "task_label": "Interaction Text Prediction",
3782
  "series_id": "metadata128_simple",
3783
+ "method": "128ep Aligned Simple",
3784
  "status": "unsupported_without_required_target",
3785
  "status_label": "unsupported",
3786
  "scored": false,
 
3790
  "normalized_score": null,
3791
  "metric_key": "macro_f1",
3792
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
3793
+ "scope": "multi_episode_128_aligned_baseline",
3794
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
3795
  },
3796
  {
 
3798
  "task_id": "interaction_text_prediction",
3799
  "task_label": "Interaction Text Prediction",
3800
  "series_id": "metadata128_neural_mlp",
3801
+ "method": "128ep Aligned NN",
3802
  "status": "not_supported_by_metadata_only_package",
3803
  "status_label": "not supported",
3804
  "scored": false,
 
3808
  "normalized_score": null,
3809
  "metric_key": "macro_f1",
3810
  "source": null,
3811
+ "scope": "multi_episode_128_aligned_baseline",
3812
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required"
3813
  },
3814
  {
3815
  "task_number": 15,
 
3906
  "task_id": "action_object_relation",
3907
  "task_label": "Action-Object Relation Prediction",
3908
  "series_id": "metadata128_simple",
3909
+ "method": "128ep Aligned Simple",
3910
  "status": "scored",
3911
  "status_label": "scored",
3912
  "scored": true,
 
3916
  "normalized_score": 0.0,
3917
  "metric_key": "macro_f1",
3918
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
3919
+ "scope": "multi_episode_128_aligned_baseline",
3920
  "reason": null
3921
  },
3922
  {
 
3924
  "task_id": "action_object_relation",
3925
  "task_label": "Action-Object Relation Prediction",
3926
  "series_id": "metadata128_neural_mlp",
3927
+ "method": "128ep Aligned NN",
3928
  "status": "scored",
3929
  "status_label": "scored",
3930
  "scored": true,
 
3934
  "normalized_score": 0.0,
3935
  "metric_key": "macro_f1",
3936
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
3937
+ "scope": "multi_episode_128_aligned_baseline",
3938
  "reason": null
3939
  },
3940
  {
 
4032
  "task_id": "object_set_forecast",
4033
  "task_label": "Future Object-Set Forecasting",
4034
  "series_id": "metadata128_simple",
4035
+ "method": "128ep Aligned Simple",
4036
  "status": "scored",
4037
  "status_label": "scored",
4038
  "scored": true,
 
4042
  "normalized_score": 0.17656983343047333,
4043
  "metric_key": "micro_f1",
4044
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
4045
+ "scope": "multi_episode_128_aligned_baseline",
4046
  "reason": null
4047
  },
4048
  {
 
4050
  "task_id": "object_set_forecast",
4051
  "task_label": "Future Object-Set Forecasting",
4052
  "series_id": "metadata128_neural_mlp",
4053
+ "method": "128ep Aligned NN",
4054
  "status": "scored",
4055
  "status_label": "scored",
4056
  "scored": true,
 
4060
  "normalized_score": 0.17418550827844048,
4061
  "metric_key": "micro_f1",
4062
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
4063
+ "scope": "multi_episode_128_aligned_baseline",
4064
  "reason": null
4065
  },
4066
  {
 
4158
  "task_id": "imu_to_hand_pose",
4159
  "task_label": "IMU-to-Hand Pose Reconstruction",
4160
  "series_id": "metadata128_simple",
4161
+ "method": "128ep Aligned Simple",
4162
+ "status": "scored",
4163
+ "status_label": "scored",
4164
+ "scored": true,
4165
  "proxy_scored": false,
4166
+ "raw": 0.2294670194387436,
4167
+ "raw_text": "0.2295",
4168
+ "normalized_score": 0.18324815505876868,
4169
  "metric_key": "mae",
4170
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
4171
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
4172
+ "reason": null
4173
  },
4174
  {
4175
  "task_number": 18,
4176
  "task_id": "imu_to_hand_pose",
4177
  "task_label": "IMU-to-Hand Pose Reconstruction",
4178
  "series_id": "metadata128_neural_mlp",
4179
+ "method": "128ep Aligned NN",
4180
+ "status": "scored",
4181
+ "status_label": "scored",
4182
+ "scored": true,
4183
  "proxy_scored": false,
4184
+ "raw": 0.2555866539478302,
4185
+ "raw_text": "0.2556",
4186
+ "normalized_score": 0.16452114110609004,
4187
  "metric_key": "mae",
4188
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/imu_to_hand_pose/metrics.json",
4189
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
4190
+ "reason": null
4191
  },
4192
  {
4193
  "task_number": 18,
 
4284
  "task_id": "camera_view_sync_retrieval",
4285
  "task_label": "Camera-View Synchronization Retrieval",
4286
  "series_id": "metadata128_simple",
4287
+ "method": "128ep Aligned Simple",
4288
  "status": "unsupported_without_required_target",
4289
  "status_label": "unsupported",
4290
  "scored": false,
 
4294
  "normalized_score": null,
4295
  "metric_key": "mrr",
4296
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
4297
+ "scope": "multi_episode_128_aligned_baseline",
4298
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
4299
  },
4300
  {
 
4302
  "task_id": "camera_view_sync_retrieval",
4303
  "task_label": "Camera-View Synchronization Retrieval",
4304
  "series_id": "metadata128_neural_mlp",
4305
+ "method": "128ep Aligned NN",
4306
  "status": "not_supported_by_metadata_only_package",
4307
  "status_label": "not supported",
4308
  "scored": false,
 
4312
  "normalized_score": null,
4313
  "metric_key": "mrr",
4314
  "source": null,
4315
+ "scope": "multi_episode_128_aligned_baseline",
4316
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required"
4317
  },
4318
  {
4319
  "task_number": 19,
 
4410
  "task_id": "time_to_transition",
4411
  "task_label": "Time-to-Next-Transition Regression",
4412
  "series_id": "metadata128_simple",
4413
+ "method": "128ep Aligned Simple",
4414
  "status": "scored",
4415
  "status_label": "scored",
4416
  "scored": true,
 
4420
  "normalized_score": 0.016864874132806403,
4421
  "metric_key": "mae",
4422
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
4423
+ "scope": "multi_episode_128_aligned_baseline",
4424
  "reason": null
4425
  },
4426
  {
 
4428
  "task_id": "time_to_transition",
4429
  "task_label": "Time-to-Next-Transition Regression",
4430
  "series_id": "metadata128_neural_mlp",
4431
+ "method": "128ep Aligned NN",
4432
  "status": "scored",
4433
  "status_label": "scored",
4434
  "scored": true,
 
4438
  "normalized_score": 0.25411768748242325,
4439
  "metric_key": "mae",
4440
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
4441
+ "scope": "multi_episode_128_aligned_baseline",
4442
  "reason": null
4443
  },
4444
  {
metrics/mirror_parity.json CHANGED
The diff for this file is too large to render.
 
metrics/omni_model_comparison.json CHANGED
@@ -1,6 +1,6 @@
  {
  "title": "Ropedia Xperience-10M Current Result Versions and Model Groups",
- "generated_at_utc": "2026-06-13T18:14:42+00:00",
+ "generated_at_utc": "2026-06-18T12:52:47+00:00",
  "status": "pass",
  "version_count": 3,
  "model_group_count": 5,
metrics/public_surface_qa.json CHANGED
@@ -1,7 +1,7 @@
  {
  "title": "Ropedia Xperience-10M Public Project Surface",
  "status": "pass",
- "generated_at_utc": "2026-06-18T12:09:24+00:00",
+ "generated_at_utc": "2026-06-18T12:53:13+00:00",
  "scope": "Repo README, GitHub Pages HTML, Hugging Face Space card, artifact dataset card, and model card.",
  "checks": [
  {
@@ -18,7 +18,7 @@
  "website_integrity": {
  "exists": true,
  "status": "pass",
- "generated_at_utc": "2026-06-18T11:41:43+00:00"
+ "generated_at_utc": "2026-06-18T12:09:46+00:00"
  },
  "rendered_site_check": {
  "exists": true,
@@ -28,27 +28,27 @@
  "task_surface_integrity": {
  "exists": true,
  "status": "pass",
- "generated_at_utc": "2026-06-18T11:18:04+00:00"
+ "generated_at_utc": "2026-06-18T12:09:25+00:00"
  },
  "source_alignment": {
  "exists": true,
  "status": "pass",
- "generated_at_utc": "2026-06-18T11:18:04+00:00"
+ "generated_at_utc": "2026-06-18T12:09:45+00:00"
  },
  "scale_up_status": {
  "exists": true,
  "status": "pass",
- "generated_at_utc": "2026-06-18T11:18:06+00:00"
+ "generated_at_utc": "2026-06-18T12:09:48+00:00"
  },
  "publication_package": {
  "exists": true,
  "status": "pass",
- "generated_at_utc": "2026-06-18T11:42:48+00:00"
+ "generated_at_utc": "2026-06-18T12:24:04+00:00"
  },
  "mirror_parity": {
  "exists": true,
  "status": "pass",
- "generated_at_utc": "2026-06-18T11:43:59+00:00"
+ "generated_at_utc": "2026-06-18T12:24:00+00:00"
  }
  },
  "failures": {}
metrics/publication_audit.json CHANGED
@@ -1,6 +1,6 @@
  {
  "status": "pass",
- "generated_at_utc": "2026-06-18T12:10:47+00:00",
+ "generated_at_utc": "2026-06-18T13:02:10+00:00",
  "checks": [
  {
  "name": "required_publication_assets_present",
@@ -215,8 +215,8 @@
  "github_repo": {
  "root": "repo",
  "exists": true,
- "file_count": 1321,
- "text_file_count": 1108,
+ "file_count": 1352,
+ "text_file_count": 1129,
  "largest_file": {
  "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
  "bytes": 55702978
@@ -226,8 +226,8 @@
  "hf_space_bundle": {
  "root": "hf_publish/space",
  "exists": true,
- "file_count": 1103,
- "text_file_count": 915,
+ "file_count": 1221,
+ "text_file_count": 992,
  "largest_file": {
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
  "bytes": 135591061
@@ -237,8 +237,8 @@
  "hf_artifact_bundle": {
  "root": "hf_publish/artifacts",
  "exists": true,
- "file_count": 2582,
- "text_file_count": 1121,
+ "file_count": 2648,
+ "text_file_count": 1141,
  "largest_file": {
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
  "bytes": 135591061
@@ -248,8 +248,8 @@
  "hf_model_bundle": {
  "root": "hf_publish/model",
  "exists": true,
- "file_count": 3001,
- "text_file_count": 1283,
+ "file_count": 3112,
+ "text_file_count": 1309,
  "largest_file": {
  "path": "results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl",
  "bytes": 135591061
metrics/quality_gates.json CHANGED
@@ -1,7 +1,7 @@
  {
  "title": "Ropedia Xperience-10M Release Checks",
  "status": "pass",
- "generated_at_utc": "2026-06-18T12:09:24+00:00",
+ "generated_at_utc": "2026-06-18T12:53:13+00:00",
  "rule": "A release is current when the automated reports pass and the live GitHub/Hugging Face mirrors are verified after publishing.",
  "automated_gates": [
  {
metrics/qwen3_full_parameter_gates.json CHANGED
@@ -1,6 +1,6 @@
  {
  "title": "Qwen3-Omni Full-Parameter Feasibility Gates",
- "generated_at_utc": "2026-06-13T18:14:32+00:00",
+ "generated_at_utc": "2026-06-18T12:53:13+00:00",
  "status": "pass",
  "decision": "full_parameter_feasible_for_guarded_short_runs_not_promoted",
  "interpretation": "The full-parameter gates prove that Qwen3-Omni full-parameter FSDP can load, prepare, run backward/optimizer steps, and complete guarded pilots up to 256 optimizer steps on an 8-GPU remote worker. They do not prove a production full-parameter fine-tune, and they intentionally save no full checkpoints or public weights.",
metrics/scope_claims_audit.json CHANGED
@@ -1,6 +1,6 @@
  {
  "status": "pass",
- "generated_at_utc": "2026-06-18T12:09:48+00:00",
+ "generated_at_utc": "2026-06-18T12:54:20+00:00",
  "summary": {
  "qwen3_omni_verified_diagnostic_pilot": true,
  "dataset_manifest_num_episodes": 119,
metrics/single_episode_task_model_radar.json CHANGED
@@ -1,7 +1,7 @@
  {
  "title": "Single-Episode 20-Task Radar",
  "status": "pass",
- "generated_at_utc": "2026-06-18T12:07:15+00:00",
+ "generated_at_utc": "2026-06-18T12:52:26+00:00",
  "description": "Minimal and Neural MLP baselines on the one public sample episode, both scored on all 20 task contracts.",
  "task_count": 20,
  "method_count": 2,
@@ -13,7 +13,7 @@
  "raw_values": "raw metric values, metric keys, and sources are retained in this JSON; the SVG is an overview, not a replacement for the metric table",
  "result_record_policy": "every method has 20 task records; records without a numeric score carry explicit unsupported/not-evaluated status and reason fields",
  "foundation_model_overlay": "Qwen3/Cosmos points are plotted only on task-aligned axes. Scoreless records mean the public result does not evaluate that task contract.",
- "metadata_128_overlay": "128-episode metadata baselines have 20 records, but numeric scores only where the public JSONL contains enough task labels without raw feature blocks.",
+ "metadata_128_overlay": "128-episode aligned baselines have 20 records. Numeric scores come from JSONL metadata/text tasks plus staged sensor-block targets when the processed target exists; raw interaction text and paired camera-view embeddings remain explicit gaps.",
  "raw_128_overlay": "128-episode raw-feature baselines use staged sensor NPZ features. Eighteen axes use direct task targets; interaction text and camera-view sync are completed with documented compact proxies because raw interaction strings and paired video-view embeddings are absent from the 128 export."
  },
  "source_unified_radar": "docs/data/unified_task_model_radar.json",
metrics/source_alignment_audit.json CHANGED
@@ -1,7 +1,7 @@
  {
  "title": "Ropedia Xperience-10M Source Alignment Note",
  "status": "pass",
- "generated_at_utc": "2026-06-18T12:09:45+00:00",
+ "generated_at_utc": "2026-06-18T12:54:18+00:00",
  "alignment_json": "docs/data/xperience10m_dataset_card_alignment.json",
  "alignment_summary": {
  "full_dataset_repo": "ropedia-ai/xperience-10m",
metrics/task_method_20_gap_audit.json CHANGED
@@ -1,10 +1,10 @@
1
  {
2
- "generated_at_utc": "2026-06-18T12:07:14+00:00",
3
  "immediate_actions": [
4
  {
5
  "artifact": "docs/data/task_method_20_gap_audit.json",
6
  "id": "gap_audit",
7
- "purpose": "Keep the 53 scoreless cells visible and reproducible."
8
  },
9
  {
10
  "artifact": "scripts/omni/score_model_output_probes.py",
@@ -45,30 +45,29 @@
45
  }
46
  },
47
  "metadata128_neural_mlp": {
48
- "kind": "partial_128_episode_metadata_baseline",
49
- "label": "128ep Metadata NN",
50
  "proxy_scored_task_count": 0,
51
  "result_record_count": 20,
52
- "scope": "128 selected episodes, JSONL metadata/text only",
53
- "scored_task_count": 7,
54
- "scoreless_task_count": 13,
55
  "status_counts": {
56
- "not_supported_by_metadata_only_package": 7,
57
- "scored": 7,
58
- "unsupported_without_required_target": 6
59
  }
60
  },
61
  "metadata128_simple": {
62
- "kind": "partial_128_episode_metadata_baseline",
63
- "label": "128ep Metadata Simple",
64
  "proxy_scored_task_count": 0,
65
  "result_record_count": 20,
66
- "scope": "128 selected episodes, JSONL metadata/text only",
67
- "scored_task_count": 13,
68
- "scoreless_task_count": 7,
69
  "status_counts": {
70
- "scored": 13,
71
- "unsupported_without_required_target": 7
72
  }
73
  },
74
  "minimal": {
@@ -138,31 +137,22 @@
138
  "missing_by_method": {
139
  "cosmos3_nano_future_window": 15,
140
  "cosmos3_super_reasoner": 13,
141
- "metadata128_neural_mlp": 13,
142
- "metadata128_simple": 7,
143
  "qwen3_omni_v6_lora": 5
144
  },
145
  "missing_by_status": {
146
  "not_evaluated_in_verified_package": 33,
147
- "not_supported_by_metadata_only_package": 7,
148
- "unsupported_without_required_target": 13
149
  },
150
  "missing_by_task": {
151
- "01 Action Recognition": [
152
- "metadata128_neural_mlp"
153
- ],
154
  "02 Procedure Step Recognition": [
155
- "cosmos3_nano_future_window",
156
- "metadata128_neural_mlp"
157
- ],
158
- "04 Next-Action Prediction": [
159
- "metadata128_neural_mlp"
160
  ],
161
  "05 Hand Trajectory Forecasting": [
162
  "cosmos3_nano_future_window",
163
  "cosmos3_super_reasoner",
164
- "metadata128_neural_mlp",
165
- "metadata128_simple",
166
  "qwen3_omni_v6_lora"
167
  ],
168
  "07 Object Relevance Prediction": [
@@ -173,15 +163,11 @@
173
  "cosmos3_super_reasoner"
174
  ],
175
  "09 Cross-Modal Retrieval": [
176
- "cosmos3_super_reasoner",
177
- "metadata128_neural_mlp",
178
- "metadata128_simple"
179
  ],
180
  "10 Cross-Modal Reconstruction": [
181
  "cosmos3_nano_future_window",
182
  "cosmos3_super_reasoner",
183
- "metadata128_neural_mlp",
184
- "metadata128_simple",
185
  "qwen3_omni_v6_lora"
186
  ],
187
  "11 Temporal Order Verification": [
@@ -190,19 +176,15 @@
190
  ],
191
  "12 Multimodal Synchronization Detection": [
192
  "cosmos3_nano_future_window",
193
- "cosmos3_super_reasoner",
194
- "metadata128_neural_mlp",
195
- "metadata128_simple"
196
  ],
197
  "13 Long-Horizon Next-Action Forecasting": [
198
  "cosmos3_nano_future_window",
199
- "cosmos3_super_reasoner",
200
- "metadata128_neural_mlp"
201
  ],
202
  "14 Long-Horizon Next-Subtask Forecasting": [
203
  "cosmos3_nano_future_window",
204
- "cosmos3_super_reasoner",
205
- "metadata128_neural_mlp"
206
  ],
207
  "15 Interaction Text Prediction": [
208
  "cosmos3_nano_future_window",
@@ -212,8 +194,7 @@
212
  "qwen3_omni_v6_lora"
213
  ],
214
  "16 Action-Object Relation Prediction": [
215
- "cosmos3_nano_future_window",
216
- "metadata128_neural_mlp"
217
  ],
218
  "17 Future Object-Set Forecasting": [
219
  "cosmos3_nano_future_window",
@@ -222,8 +203,6 @@
222
  "18 IMU-to-Hand Pose Reconstruction": [
223
  "cosmos3_nano_future_window",
224
  "cosmos3_super_reasoner",
225
- "metadata128_neural_mlp",
226
- "metadata128_simple",
227
  "qwen3_omni_v6_lora"
228
  ],
229
  "19 Camera-View Synchronization Retrieval": [
@@ -239,32 +218,6 @@
239
  ]
240
  },
241
  "missing_records": [
242
- {
243
- "method": "128ep Metadata NN",
244
- "metric_key": "macro_f1",
245
- "reason": "train class count 896 exceeds --max-neural-classes 512",
246
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
247
- "scope": "multi_episode_128_metadata_baseline",
248
- "series_id": "metadata128_neural_mlp",
249
- "status": "unsupported_without_required_target",
250
- "status_label": "unsupported",
251
- "task_id": "timeline_action",
252
- "task_label": "Action Recognition",
253
- "task_number": 1
254
- },
255
- {
256
- "method": "128ep Metadata NN",
257
- "metric_key": "macro_f1",
258
- "reason": "train class count 652 exceeds --max-neural-classes 512",
259
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
260
- "scope": "multi_episode_128_metadata_baseline",
261
- "series_id": "metadata128_neural_mlp",
262
- "status": "unsupported_without_required_target",
263
- "status_label": "unsupported",
264
- "task_id": "timeline_subtask",
265
- "task_label": "Procedure Step Recognition",
266
- "task_number": 2
267
- },
268
  {
269
  "method": "Cosmos3-Nano Future Window",
270
  "metric_key": "macro_f1",
@@ -278,45 +231,6 @@
278
  "task_label": "Procedure Step Recognition",
279
  "task_number": 2
280
  },
281
- {
282
- "method": "128ep Metadata NN",
283
- "metric_key": "macro_f1",
284
- "reason": "train class count 891 exceeds --max-neural-classes 512",
285
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
286
- "scope": "multi_episode_128_metadata_baseline",
287
- "series_id": "metadata128_neural_mlp",
288
- "status": "unsupported_without_required_target",
289
- "status_label": "unsupported",
290
- "task_id": "next_action",
291
- "task_label": "Next-Action Prediction",
292
- "task_number": 4
293
- },
294
- {
295
- "method": "128ep Metadata Simple",
296
- "metric_key": "mpjpe",
297
- "reason": "requires future hand-joint trajectories from raw sensor feature NPZ blocks, which are not in the public 128 package",
298
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
299
- "scope": "multi_episode_128_metadata_baseline",
300
- "series_id": "metadata128_simple",
301
- "status": "unsupported_without_required_target",
302
- "status_label": "unsupported",
303
- "task_id": "hand_trajectory_forecast",
304
- "task_label": "Hand Trajectory Forecasting",
305
- "task_number": 5
306
- },
307
- {
308
- "method": "128ep Metadata NN",
309
- "metric_key": "mpjpe",
310
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
311
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
312
- "scope": "multi_episode_128_metadata_baseline",
313
- "series_id": "metadata128_neural_mlp",
314
- "status": "not_supported_by_metadata_only_package",
315
- "status_label": "not supported",
316
- "task_id": "hand_trajectory_forecast",
317
- "task_label": "Hand Trajectory Forecasting",
318
- "task_number": 5
319
- },
320
  {
321
  "method": "Qwen3-Omni v6 LoRA",
322
  "metric_key": "mpjpe",
@@ -395,32 +309,6 @@
395
  "task_label": "Language Grounding",
396
  "task_number": 8
397
  },
398
- {
399
- "method": "128ep Metadata Simple",
400
- "metric_key": "mrr",
401
- "reason": "requires paired motion/IMU/camera/audio/depth feature blocks, which are not in the public 128 package",
402
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
403
- "scope": "multi_episode_128_metadata_baseline",
404
- "series_id": "metadata128_simple",
405
- "status": "unsupported_without_required_target",
406
- "status_label": "unsupported",
407
- "task_id": "cross_modal_retrieval",
408
- "task_label": "Cross-Modal Retrieval",
409
- "task_number": 9
410
- },
411
- {
412
- "method": "128ep Metadata NN",
413
- "metric_key": "mrr",
414
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
415
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
416
- "scope": "multi_episode_128_metadata_baseline",
417
- "series_id": "metadata128_neural_mlp",
418
- "status": "not_supported_by_metadata_only_package",
419
- "status_label": "not supported",
420
- "task_id": "cross_modal_retrieval",
421
- "task_label": "Cross-Modal Retrieval",
422
- "task_number": 9
423
- },
424
  {
425
  "method": "Cosmos3-Super Reasoner",
426
  "metric_key": "mrr",
@@ -434,32 +322,6 @@
434
  "task_label": "Cross-Modal Retrieval",
435
  "task_number": 9
436
  },
437
- {
438
- "method": "128ep Metadata Simple",
439
- "metric_key": "r2",
440
- "reason": "requires source and target modality feature blocks such as depth/video vectors, which are not in the public 128 package",
441
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
442
- "scope": "multi_episode_128_metadata_baseline",
443
- "series_id": "metadata128_simple",
444
- "status": "unsupported_without_required_target",
445
- "status_label": "unsupported",
446
- "task_id": "modality_reconstruction",
447
- "task_label": "Cross-Modal Reconstruction",
448
- "task_number": 10
449
- },
450
- {
451
- "method": "128ep Metadata NN",
452
- "metric_key": "r2",
453
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
454
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
455
- "scope": "multi_episode_128_metadata_baseline",
456
- "series_id": "metadata128_neural_mlp",
457
- "status": "not_supported_by_metadata_only_package",
458
- "status_label": "not supported",
459
- "task_id": "modality_reconstruction",
460
- "task_label": "Cross-Modal Reconstruction",
461
- "task_number": 10
462
- },
463
  {
464
  "method": "Qwen3-Omni v6 LoRA",
465
  "metric_key": "r2",
@@ -525,32 +387,6 @@
525
  "task_label": "Temporal Order Verification",
526
  "task_number": 11
527
  },
528
- {
529
- "method": "128ep Metadata Simple",
530
- "metric_key": "f1",
531
- "reason": "requires deliberately shifted cross-modal feature pairs, which cannot be reconstructed from the public JSONL labels alone",
532
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
533
- "scope": "multi_episode_128_metadata_baseline",
534
- "series_id": "metadata128_simple",
535
- "status": "unsupported_without_required_target",
536
- "status_label": "unsupported",
537
- "task_id": "misalignment_detection",
538
- "task_label": "Multimodal Synchronization Detection",
539
- "task_number": 12
540
- },
541
- {
542
- "method": "128ep Metadata NN",
543
- "metric_key": "f1",
544
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
545
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
546
- "scope": "multi_episode_128_metadata_baseline",
547
- "series_id": "metadata128_neural_mlp",
548
- "status": "not_supported_by_metadata_only_package",
549
- "status_label": "not supported",
550
- "task_id": "misalignment_detection",
551
- "task_label": "Multimodal Synchronization Detection",
552
- "task_number": 12
553
- },
554
  {
555
  "method": "Cosmos3-Super Reasoner",
556
  "metric_key": "f1",
@@ -577,19 +413,6 @@
577
  "task_label": "Multimodal Synchronization Detection",
578
  "task_number": 12
579
  },
580
- {
581
- "method": "128ep Metadata NN",
582
- "metric_key": "macro_f1",
583
- "reason": "train class count 887 exceeds --max-neural-classes 512",
584
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
585
- "scope": "multi_episode_128_metadata_baseline",
586
- "series_id": "metadata128_neural_mlp",
587
- "status": "unsupported_without_required_target",
588
- "status_label": "unsupported",
589
- "task_id": "long_horizon_next_action",
590
- "task_label": "Long-Horizon Next-Action Forecasting",
591
- "task_number": 13
592
- },
593
  {
594
  "method": "Cosmos3-Super Reasoner",
595
  "metric_key": "macro_f1",
@@ -616,19 +439,6 @@
616
  "task_label": "Long-Horizon Next-Action Forecasting",
617
  "task_number": 13
618
  },
619
- {
620
- "method": "128ep Metadata NN",
621
- "metric_key": "macro_f1",
622
- "reason": "train class count 651 exceeds --max-neural-classes 512",
623
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
624
- "scope": "multi_episode_128_metadata_baseline",
625
- "series_id": "metadata128_neural_mlp",
626
- "status": "unsupported_without_required_target",
627
- "status_label": "unsupported",
628
- "task_id": "next_subtask_forecast",
629
- "task_label": "Long-Horizon Next-Subtask Forecasting",
630
- "task_number": 14
631
- },
632
  {
633
  "method": "Cosmos3-Super Reasoner",
634
  "metric_key": "macro_f1",
@@ -656,11 +466,11 @@
656
  "task_number": 14
657
  },
658
  {
659
- "method": "128ep Metadata Simple",
660
  "metric_key": "macro_f1",
661
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
662
  "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
663
- "scope": "multi_episode_128_metadata_baseline",
664
  "series_id": "metadata128_simple",
665
  "status": "unsupported_without_required_target",
666
  "status_label": "unsupported",
@@ -669,11 +479,11 @@
669
  "task_number": 15
670
  },
671
  {
672
- "method": "128ep Metadata NN",
673
  "metric_key": "macro_f1",
674
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
675
  "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
676
- "scope": "multi_episode_128_metadata_baseline",
677
  "series_id": "metadata128_neural_mlp",
678
  "status": "not_supported_by_metadata_only_package",
679
  "status_label": "not supported",
@@ -720,19 +530,6 @@
720
  "task_label": "Interaction Text Prediction",
721
  "task_number": 15
722
  },
723
- {
724
- "method": "128ep Metadata NN",
725
- "metric_key": "macro_f1",
726
- "reason": "train class count 3058 exceeds --max-neural-classes 512",
727
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
728
- "scope": "multi_episode_128_metadata_baseline",
729
- "series_id": "metadata128_neural_mlp",
730
- "status": "unsupported_without_required_target",
731
- "status_label": "unsupported",
732
- "task_id": "action_object_relation",
733
- "task_label": "Action-Object Relation Prediction",
734
- "task_number": 16
735
- },
736
  {
737
  "method": "Cosmos3-Nano Future Window",
738
  "metric_key": "macro_f1",
@@ -772,32 +569,6 @@
772
  "task_label": "Future Object-Set Forecasting",
773
  "task_number": 17
774
  },
775
- {
776
- "method": "128ep Metadata Simple",
777
- "metric_key": "mae",
778
- "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package",
779
- "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
780
- "scope": "multi_episode_128_metadata_baseline",
781
- "series_id": "metadata128_simple",
782
- "status": "unsupported_without_required_target",
783
- "status_label": "unsupported",
784
- "task_id": "imu_to_hand_pose",
785
- "task_label": "IMU-to-Hand Pose Reconstruction",
786
- "task_number": 18
787
- },
788
- {
789
- "method": "128ep Metadata NN",
790
- "metric_key": "mae",
791
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
792
- "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
793
- "scope": "multi_episode_128_metadata_baseline",
794
- "series_id": "metadata128_neural_mlp",
795
- "status": "not_supported_by_metadata_only_package",
796
- "status_label": "not supported",
797
- "task_id": "imu_to_hand_pose",
798
- "task_label": "IMU-to-Hand Pose Reconstruction",
799
- "task_number": 18
800
- },
801
  {
802
  "method": "Qwen3-Omni v6 LoRA",
803
  "metric_key": "mae",
@@ -838,11 +609,11 @@
838
  "task_number": 18
839
  },
840
  {
841
- "method": "128ep Metadata Simple",
842
  "metric_key": "mrr",
843
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
844
  "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
845
- "scope": "multi_episode_128_metadata_baseline",
846
  "series_id": "metadata128_simple",
847
  "status": "unsupported_without_required_target",
848
  "status_label": "unsupported",
@@ -851,11 +622,11 @@
851
  "task_number": 19
852
  },
853
  {
854
- "method": "128ep Metadata NN",
855
  "metric_key": "mrr",
856
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
857
  "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
858
- "scope": "multi_episode_128_metadata_baseline",
859
  "series_id": "metadata128_neural_mlp",
860
  "status": "not_supported_by_metadata_only_package",
861
  "status_label": "not supported",
@@ -975,8 +746,8 @@
975
  "method_count": 9,
976
  "method_task_record_count": 180,
977
  "proxy_scored_method_task_count": 4,
978
- "scored_method_task_count": 127,
979
- "scoreless_method_task_count": 53,
980
  "task_count": 20
981
  },
982
  "source_matrix": "docs/data/task_method_20_result_matrix.json",
 
1
  {
2
+ "generated_at_utc": "2026-06-18T12:52:47+00:00",
3
  "immediate_actions": [
4
  {
5
  "artifact": "docs/data/task_method_20_gap_audit.json",
6
  "id": "gap_audit",
7
+ "purpose": "Keep the 37 scoreless cells visible and reproducible."
8
  },
9
  {
10
  "artifact": "scripts/omni/score_model_output_probes.py",
 
45
  }
46
  },
47
  "metadata128_neural_mlp": {
48
+ "kind": "partial_128_episode_aligned_baseline",
49
+ "label": "128ep Aligned NN",
50
  "proxy_scored_task_count": 0,
51
  "result_record_count": 20,
52
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
53
+ "scored_task_count": 18,
54
+ "scoreless_task_count": 2,
55
  "status_counts": {
56
+ "not_supported_by_metadata_only_package": 2,
57
+ "scored": 18
 
58
  }
59
  },
60
  "metadata128_simple": {
61
+ "kind": "partial_128_episode_aligned_baseline",
62
+ "label": "128ep Aligned Simple",
63
  "proxy_scored_task_count": 0,
64
  "result_record_count": 20,
65
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
66
+ "scored_task_count": 18,
67
+ "scoreless_task_count": 2,
68
  "status_counts": {
69
+ "scored": 18,
70
+ "unsupported_without_required_target": 2
71
  }
72
  },
73
  "minimal": {
 
137
  "missing_by_method": {
138
  "cosmos3_nano_future_window": 15,
139
  "cosmos3_super_reasoner": 13,
140
+ "metadata128_neural_mlp": 2,
141
+ "metadata128_simple": 2,
142
  "qwen3_omni_v6_lora": 5
143
  },
144
  "missing_by_status": {
145
  "not_evaluated_in_verified_package": 33,
146
+ "not_supported_by_metadata_only_package": 2,
147
+ "unsupported_without_required_target": 2
148
  },
149
  "missing_by_task": {
150
  "02 Procedure Step Recognition": [
151
+ "cosmos3_nano_future_window"
152
  ],
153
  "05 Hand Trajectory Forecasting": [
154
  "cosmos3_nano_future_window",
155
  "cosmos3_super_reasoner",
156
  "qwen3_omni_v6_lora"
157
  ],
158
  "07 Object Relevance Prediction": [
 
163
  "cosmos3_super_reasoner"
164
  ],
165
  "09 Cross-Modal Retrieval": [
166
+ "cosmos3_super_reasoner"
167
  ],
168
  "10 Cross-Modal Reconstruction": [
169
  "cosmos3_nano_future_window",
170
  "cosmos3_super_reasoner",
171
  "qwen3_omni_v6_lora"
172
  ],
173
  "11 Temporal Order Verification": [
 
176
  ],
177
  "12 Multimodal Synchronization Detection": [
178
  "cosmos3_nano_future_window",
179
+ "cosmos3_super_reasoner"
180
  ],
181
  "13 Long-Horizon Next-Action Forecasting": [
182
  "cosmos3_nano_future_window",
183
+ "cosmos3_super_reasoner"
 
184
  ],
185
  "14 Long-Horizon Next-Subtask Forecasting": [
186
  "cosmos3_nano_future_window",
187
+ "cosmos3_super_reasoner"
 
188
  ],
189
  "15 Interaction Text Prediction": [
190
  "cosmos3_nano_future_window",
 
194
  "qwen3_omni_v6_lora"
195
  ],
196
  "16 Action-Object Relation Prediction": [
197
+ "cosmos3_nano_future_window"
 
198
  ],
199
  "17 Future Object-Set Forecasting": [
200
  "cosmos3_nano_future_window",
 
203
  "18 IMU-to-Hand Pose Reconstruction": [
204
  "cosmos3_nano_future_window",
205
  "cosmos3_super_reasoner",
206
  "qwen3_omni_v6_lora"
207
  ],
208
  "19 Camera-View Synchronization Retrieval": [
 
218
  ]
219
  },
220
  "missing_records": [
221
  {
222
  "method": "Cosmos3-Nano Future Window",
223
  "metric_key": "macro_f1",
 
231
  "task_label": "Procedure Step Recognition",
232
  "task_number": 2
233
  },
234
  {
235
  "method": "Qwen3-Omni v6 LoRA",
236
  "metric_key": "mpjpe",
 
309
  "task_label": "Language Grounding",
310
  "task_number": 8
311
  },
312
  {
313
  "method": "Cosmos3-Super Reasoner",
314
  "metric_key": "mrr",
 
322
  "task_label": "Cross-Modal Retrieval",
323
  "task_number": 9
324
  },
325
  {
326
  "method": "Qwen3-Omni v6 LoRA",
327
  "metric_key": "r2",
 
387
  "task_label": "Temporal Order Verification",
388
  "task_number": 11
389
  },
390
  {
391
  "method": "Cosmos3-Super Reasoner",
392
  "metric_key": "f1",
 
413
  "task_label": "Multimodal Synchronization Detection",
414
  "task_number": 12
415
  },
416
  {
417
  "method": "Cosmos3-Super Reasoner",
418
  "metric_key": "macro_f1",
 
439
  "task_label": "Long-Horizon Next-Action Forecasting",
440
  "task_number": 13
441
  },
442
  {
443
  "method": "Cosmos3-Super Reasoner",
444
  "metric_key": "macro_f1",
 
466
  "task_number": 14
467
  },
468
  {
469
+ "method": "128ep Aligned Simple",
470
  "metric_key": "macro_f1",
471
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
472
  "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
473
+ "scope": "multi_episode_128_aligned_baseline",
474
  "series_id": "metadata128_simple",
475
  "status": "unsupported_without_required_target",
476
  "status_label": "unsupported",
 
479
  "task_number": 15
480
  },
481
  {
482
+ "method": "128ep Aligned NN",
483
  "metric_key": "macro_f1",
484
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required",
485
  "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
486
+ "scope": "multi_episode_128_aligned_baseline",
487
  "series_id": "metadata128_neural_mlp",
488
  "status": "not_supported_by_metadata_only_package",
489
  "status_label": "not supported",
 
530
  "task_label": "Interaction Text Prediction",
531
  "task_number": 15
532
  },
533
  {
534
  "method": "Cosmos3-Nano Future Window",
535
  "metric_key": "macro_f1",
 
569
  "task_label": "Future Object-Set Forecasting",
570
  "task_number": 17
571
  },
572
  {
573
  "method": "Qwen3-Omni v6 LoRA",
574
  "metric_key": "mae",
 
609
  "task_number": 18
610
  },
611
  {
612
+ "method": "128ep Aligned Simple",
613
  "metric_key": "mrr",
614
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
615
  "recommended_next_step": "Export the missing target field for this 128-episode method, then rerun the same train/validation/test split.",
616
+ "scope": "multi_episode_128_aligned_baseline",
617
  "series_id": "metadata128_simple",
618
  "status": "unsupported_without_required_target",
619
  "status_label": "unsupported",
 
622
  "task_number": 19
623
  },
624
  {
625
+ "method": "128ep Aligned NN",
626
  "metric_key": "mrr",
627
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required",
628
  "recommended_next_step": "Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score.",
629
+ "scope": "multi_episode_128_aligned_baseline",
630
  "series_id": "metadata128_neural_mlp",
631
  "status": "not_supported_by_metadata_only_package",
632
  "status_label": "not supported",
 
746
  "method_count": 9,
747
  "method_task_record_count": 180,
748
  "proxy_scored_method_task_count": 4,
749
+ "scored_method_task_count": 143,
750
+ "scoreless_method_task_count": 37,
751
  "task_count": 20
752
  },
753
  "source_matrix": "docs/data/task_method_20_result_matrix.json",
metrics/task_method_20_result_matrix.json CHANGED
@@ -1,11 +1,11 @@
1
  {
2
  "title": "Task Method 20-Result Matrix",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
- "scored_method_task_count": 133,
9
  "series": [
10
  {
11
  "id": "minimal",
@@ -55,50 +55,50 @@
55
  },
56
  {
57
  "id": "metadata128_simple",
58
- "label": "128ep Metadata Simple",
59
  "short_label": "128-S",
60
  "color": "#ffd166",
61
- "kind": "partial_128_episode_metadata_baseline",
62
- "scope": "128 selected episodes, JSONL metadata/text only",
63
  "stroke_dasharray": "9 6",
64
- "method_detail": "128-episode JSONL metadata/text simple baselines.",
65
  "plotted_as": "colored point overlay",
66
  "result_record_count": 20,
67
- "scored_task_count": 13,
68
- "covered_task_count": 13,
69
  "proxy_scored_task_count": 0,
70
- "scoreless_task_count": 7,
71
- "unsupported_task_count": 7,
72
  "not_evaluated_task_count": 0,
73
  "status_counts": {
74
- "scored": 13,
75
- "unsupported_without_required_target": 7
76
  },
77
- "coverage_fraction": 0.65,
78
  "result_record_fraction": 1.0
79
  },
80
  {
81
  "id": "metadata128_neural_mlp",
82
- "label": "128ep Metadata NN",
83
  "short_label": "128-NN",
84
  "color": "#f472b6",
85
- "kind": "partial_128_episode_metadata_baseline",
86
- "scope": "128 selected episodes, JSONL metadata/text only",
87
  "stroke_dasharray": "3 6",
88
- "method_detail": "128-episode JSONL metadata/text MLP baselines.",
89
  "plotted_as": "colored point overlay",
90
  "result_record_count": 20,
91
- "scored_task_count": 13,
92
- "covered_task_count": 13,
93
  "proxy_scored_task_count": 0,
94
- "scoreless_task_count": 7,
95
- "unsupported_task_count": 7,
96
  "not_evaluated_task_count": 0,
97
  "status_counts": {
98
- "not_supported_by_metadata_only_package": 7,
99
- "scored": 13
100
  },
101
- "coverage_fraction": 0.65,
102
  "result_record_fraction": 1.0
103
  },
104
  {
@@ -264,7 +264,7 @@
264
  "task_id": "timeline_action",
265
  "task_label": "Action Recognition",
266
  "series_id": "metadata128_simple",
267
- "method": "128ep Metadata Simple",
268
  "status": "scored",
269
  "status_label": "scored",
270
  "scored": true,
@@ -274,7 +274,7 @@
274
  "normalized_score": 0.008252821966746326,
275
  "metric_key": "macro_f1",
276
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
277
- "scope": "multi_episode_128_metadata_baseline",
278
  "reason": null
279
  },
280
  {
@@ -282,7 +282,7 @@
282
  "task_id": "timeline_action",
283
  "task_label": "Action Recognition",
284
  "series_id": "metadata128_neural_mlp",
285
- "method": "128ep Metadata NN",
286
  "status": "scored",
287
  "status_label": "scored",
288
  "scored": true,
@@ -292,7 +292,7 @@
292
  "normalized_score": 0.004175793689174209,
293
  "metric_key": "macro_f1",
294
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
295
- "scope": "multi_episode_128_metadata_baseline",
296
  "reason": null
297
  },
298
  {
@@ -426,7 +426,7 @@
426
  "task_id": "timeline_subtask",
427
  "task_label": "Procedure Step Recognition",
428
  "series_id": "metadata128_simple",
429
- "method": "128ep Metadata Simple",
430
  "status": "scored",
431
  "status_label": "scored",
432
  "scored": true,
@@ -436,7 +436,7 @@
436
  "normalized_score": 0.00019512195121951218,
437
  "metric_key": "macro_f1",
438
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
439
- "scope": "multi_episode_128_metadata_baseline",
440
  "reason": null
441
  },
442
  {
@@ -444,7 +444,7 @@
444
  "task_id": "timeline_subtask",
445
  "task_label": "Procedure Step Recognition",
446
  "series_id": "metadata128_neural_mlp",
447
- "method": "128ep Metadata NN",
448
  "status": "scored",
449
  "status_label": "scored",
450
  "scored": true,
@@ -454,7 +454,7 @@
454
  "normalized_score": 7.207207207207208e-05,
455
  "metric_key": "macro_f1",
456
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
457
- "scope": "multi_episode_128_metadata_baseline",
458
  "reason": null
459
  },
460
  {
@@ -588,7 +588,7 @@
588
  "task_id": "transition_detection",
589
  "task_label": "Action Boundary Detection",
590
  "series_id": "metadata128_simple",
591
- "method": "128ep Metadata Simple",
592
  "status": "scored",
593
  "status_label": "scored",
594
  "scored": true,
@@ -598,7 +598,7 @@
598
  "normalized_score": 0.29652162550029315,
599
  "metric_key": "macro_f1",
600
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
601
- "scope": "multi_episode_128_metadata_baseline",
602
  "reason": null
603
  },
604
  {
@@ -606,7 +606,7 @@
606
  "task_id": "transition_detection",
607
  "task_label": "Action Boundary Detection",
608
  "series_id": "metadata128_neural_mlp",
609
- "method": "128ep Metadata NN",
610
  "status": "scored",
611
  "status_label": "scored",
612
  "scored": true,
@@ -616,7 +616,7 @@
616
  "normalized_score": 0.4841733292368365,
617
  "metric_key": "macro_f1",
618
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
619
- "scope": "multi_episode_128_metadata_baseline",
620
  "reason": null
621
  },
622
  {
@@ -750,7 +750,7 @@
750
  "task_id": "next_action",
751
  "task_label": "Next-Action Prediction",
752
  "series_id": "metadata128_simple",
753
- "method": "128ep Metadata Simple",
754
  "status": "scored",
755
  "status_label": "scored",
756
  "scored": true,
@@ -760,7 +760,7 @@
760
  "normalized_score": 0.006514774539765508,
761
  "metric_key": "macro_f1",
762
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
763
- "scope": "multi_episode_128_metadata_baseline",
764
  "reason": null
765
  },
766
  {
@@ -768,7 +768,7 @@
768
  "task_id": "next_action",
769
  "task_label": "Next-Action Prediction",
770
  "series_id": "metadata128_neural_mlp",
771
- "method": "128ep Metadata NN",
772
  "status": "scored",
773
  "status_label": "scored",
774
  "scored": true,
@@ -778,7 +778,7 @@
778
  "normalized_score": 0.004910507980164745,
779
  "metric_key": "macro_f1",
780
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
781
- "scope": "multi_episode_128_metadata_baseline",
782
  "reason": null
783
  },
784
  {
@@ -912,36 +912,36 @@
912
  "task_id": "hand_trajectory_forecast",
913
  "task_label": "Hand Trajectory Forecasting",
914
  "series_id": "metadata128_simple",
915
- "method": "128ep Metadata Simple",
916
- "status": "unsupported_without_required_target",
917
- "status_label": "unsupported",
918
- "scored": false,
919
  "proxy_scored": false,
920
- "raw": null,
921
- "raw_text": "n/a",
922
- "normalized_score": null,
923
  "metric_key": "mpjpe",
924
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
925
- "scope": "multi_episode_128_metadata_baseline",
926
- "reason": "requires future hand-joint trajectories from raw sensor feature NPZ blocks, which are not in the public 128 package"
927
  },
928
  {
929
  "task_number": 5,
930
  "task_id": "hand_trajectory_forecast",
931
  "task_label": "Hand Trajectory Forecasting",
932
  "series_id": "metadata128_neural_mlp",
933
- "method": "128ep Metadata NN",
934
- "status": "not_supported_by_metadata_only_package",
935
- "status_label": "not supported",
936
- "scored": false,
937
  "proxy_scored": false,
938
- "raw": null,
939
- "raw_text": "n/a",
940
- "normalized_score": null,
941
  "metric_key": "mpjpe",
942
- "source": null,
943
- "scope": "multi_episode_128_metadata_baseline",
944
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
945
  },
946
  {
947
  "task_number": 5,
@@ -1074,7 +1074,7 @@
1074
  "task_id": "contact_prediction",
1075
  "task_label": "Contact State Prediction",
1076
  "series_id": "metadata128_simple",
1077
- "method": "128ep Metadata Simple",
1078
  "status": "scored",
1079
  "status_label": "scored",
1080
  "scored": true,
@@ -1084,7 +1084,7 @@
1084
  "normalized_score": 0.4381481308057444,
1085
  "metric_key": "macro_f1",
1086
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
1087
- "scope": "multi_episode_128_metadata_baseline",
1088
  "reason": null
1089
  },
1090
  {
@@ -1092,7 +1092,7 @@
1092
  "task_id": "contact_prediction",
1093
  "task_label": "Contact State Prediction",
1094
  "series_id": "metadata128_neural_mlp",
1095
- "method": "128ep Metadata NN",
1096
  "status": "scored",
1097
  "status_label": "scored",
1098
  "scored": true,
@@ -1102,7 +1102,7 @@
1102
  "normalized_score": 0.5682695682695682,
1103
  "metric_key": "macro_f1",
1104
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
1105
- "scope": "multi_episode_128_metadata_baseline",
1106
  "reason": null
1107
  },
1108
  {
@@ -1236,7 +1236,7 @@
1236
  "task_id": "object_relevance",
1237
  "task_label": "Object Relevance Prediction",
1238
  "series_id": "metadata128_simple",
1239
- "method": "128ep Metadata Simple",
1240
  "status": "scored",
1241
  "status_label": "scored",
1242
  "scored": true,
@@ -1246,7 +1246,7 @@
1246
  "normalized_score": 0.17764578833693304,
1247
  "metric_key": "micro_f1",
1248
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
1249
- "scope": "multi_episode_128_metadata_baseline",
1250
  "reason": null
1251
  },
1252
  {
@@ -1254,7 +1254,7 @@
1254
  "task_id": "object_relevance",
1255
  "task_label": "Object Relevance Prediction",
1256
  "series_id": "metadata128_neural_mlp",
1257
- "method": "128ep Metadata NN",
1258
  "status": "scored",
1259
  "status_label": "scored",
1260
  "scored": true,
@@ -1264,7 +1264,7 @@
1264
  "normalized_score": 0.18662723837686876,
1265
  "metric_key": "micro_f1",
1266
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
1267
- "scope": "multi_episode_128_metadata_baseline",
1268
  "reason": null
1269
  },
1270
  {
@@ -1398,7 +1398,7 @@
1398
  "task_id": "caption_grounding",
1399
  "task_label": "Language Grounding",
1400
  "series_id": "metadata128_simple",
1401
- "method": "128ep Metadata Simple",
1402
  "status": "scored",
1403
  "status_label": "scored",
1404
  "scored": true,
@@ -1408,7 +1408,7 @@
1408
  "normalized_score": 0.002332374220713973,
1409
  "metric_key": "mrr",
1410
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
1411
- "scope": "multi_episode_128_metadata_baseline",
1412
  "reason": null
1413
  },
1414
  {
@@ -1416,7 +1416,7 @@
1416
  "task_id": "caption_grounding",
1417
  "task_label": "Language Grounding",
1418
  "series_id": "metadata128_neural_mlp",
1419
- "method": "128ep Metadata NN",
1420
  "status": "scored",
1421
  "status_label": "scored",
1422
  "scored": true,
@@ -1426,7 +1426,7 @@
1426
  "normalized_score": 0.008236799389123917,
1427
  "metric_key": "mrr",
1428
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
1429
- "scope": "multi_episode_128_metadata_baseline",
1430
  "reason": null
1431
  },
1432
  {
@@ -1560,36 +1560,36 @@
1560
  "task_id": "cross_modal_retrieval",
1561
  "task_label": "Cross-Modal Retrieval",
1562
  "series_id": "metadata128_simple",
1563
- "method": "128ep Metadata Simple",
1564
- "status": "unsupported_without_required_target",
1565
- "status_label": "unsupported",
1566
- "scored": false,
1567
  "proxy_scored": false,
1568
- "raw": null,
1569
- "raw_text": "n/a",
1570
- "normalized_score": null,
1571
  "metric_key": "mrr",
1572
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
1573
- "scope": "multi_episode_128_metadata_baseline",
1574
- "reason": "requires paired motion/IMU/camera/audio/depth feature blocks, which are not in the public 128 package"
1575
  },
1576
  {
1577
  "task_number": 9,
1578
  "task_id": "cross_modal_retrieval",
1579
  "task_label": "Cross-Modal Retrieval",
1580
  "series_id": "metadata128_neural_mlp",
1581
- "method": "128ep Metadata NN",
1582
- "status": "not_supported_by_metadata_only_package",
1583
- "status_label": "not supported",
1584
- "scored": false,
1585
  "proxy_scored": false,
1586
- "raw": null,
1587
- "raw_text": "n/a",
1588
- "normalized_score": null,
1589
  "metric_key": "mrr",
1590
- "source": null,
1591
- "scope": "multi_episode_128_metadata_baseline",
1592
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
1593
  },
1594
  {
1595
  "task_number": 9,
@@ -1722,36 +1722,36 @@
1722
  "task_id": "modality_reconstruction",
1723
  "task_label": "Cross-Modal Reconstruction",
1724
  "series_id": "metadata128_simple",
1725
- "method": "128ep Metadata Simple",
1726
- "status": "unsupported_without_required_target",
1727
- "status_label": "unsupported",
1728
- "scored": false,
1729
  "proxy_scored": false,
1730
- "raw": null,
1731
- "raw_text": "n/a",
1732
- "normalized_score": null,
1733
  "metric_key": "r2",
1734
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
1735
- "scope": "multi_episode_128_metadata_baseline",
1736
- "reason": "requires source and target modality feature blocks such as depth/video vectors, which are not in the public 128 package"
1737
  },
1738
  {
1739
  "task_number": 10,
1740
  "task_id": "modality_reconstruction",
1741
  "task_label": "Cross-Modal Reconstruction",
1742
  "series_id": "metadata128_neural_mlp",
1743
- "method": "128ep Metadata NN",
1744
- "status": "not_supported_by_metadata_only_package",
1745
- "status_label": "not supported",
1746
- "scored": false,
1747
  "proxy_scored": false,
1748
- "raw": null,
1749
- "raw_text": "n/a",
1750
- "normalized_score": null,
1751
  "metric_key": "r2",
1752
- "source": null,
1753
- "scope": "multi_episode_128_metadata_baseline",
1754
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
1755
  },
1756
  {
1757
  "task_number": 10,
@@ -1884,7 +1884,7 @@
1884
  "task_id": "temporal_order",
1885
  "task_label": "Temporal Order Verification",
1886
  "series_id": "metadata128_simple",
1887
- "method": "128ep Metadata Simple",
1888
  "status": "scored",
1889
  "status_label": "scored",
1890
  "scored": true,
@@ -1894,7 +1894,7 @@
1894
  "normalized_score": 0.4198864140782312,
1895
  "metric_key": "f1",
1896
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
1897
- "scope": "multi_episode_128_metadata_baseline",
1898
  "reason": null
1899
  },
1900
  {
@@ -1902,7 +1902,7 @@
1902
  "task_id": "temporal_order",
1903
  "task_label": "Temporal Order Verification",
1904
  "series_id": "metadata128_neural_mlp",
1905
- "method": "128ep Metadata NN",
1906
  "status": "scored",
1907
  "status_label": "scored",
1908
  "scored": true,
@@ -1912,7 +1912,7 @@
1912
  "normalized_score": 0.8252408266656923,
1913
  "metric_key": "f1",
1914
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
1915
- "scope": "multi_episode_128_metadata_baseline",
1916
  "reason": null
1917
  },
1918
  {
@@ -2046,36 +2046,36 @@
2046
  "task_id": "misalignment_detection",
2047
  "task_label": "Multimodal Synchronization Detection",
2048
  "series_id": "metadata128_simple",
2049
- "method": "128ep Metadata Simple",
2050
- "status": "unsupported_without_required_target",
2051
- "status_label": "unsupported",
2052
- "scored": false,
2053
  "proxy_scored": false,
2054
- "raw": null,
2055
- "raw_text": "n/a",
2056
- "normalized_score": null,
2057
  "metric_key": "f1",
2058
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
2059
- "scope": "multi_episode_128_metadata_baseline",
2060
- "reason": "requires deliberately shifted cross-modal feature pairs, which cannot be reconstructed from the public JSONL labels alone"
2061
  },
2062
  {
2063
  "task_number": 12,
2064
  "task_id": "misalignment_detection",
2065
  "task_label": "Multimodal Synchronization Detection",
2066
  "series_id": "metadata128_neural_mlp",
2067
- "method": "128ep Metadata NN",
2068
- "status": "not_supported_by_metadata_only_package",
2069
- "status_label": "not supported",
2070
- "scored": false,
2071
  "proxy_scored": false,
2072
- "raw": null,
2073
- "raw_text": "n/a",
2074
- "normalized_score": null,
2075
  "metric_key": "f1",
2076
- "source": null,
2077
- "scope": "multi_episode_128_metadata_baseline",
2078
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2079
  },
2080
  {
2081
  "task_number": 12,
@@ -2208,7 +2208,7 @@
2208
  "task_id": "long_horizon_next_action",
2209
  "task_label": "Long-Horizon Next-Action Forecasting",
2210
  "series_id": "metadata128_simple",
2211
- "method": "128ep Metadata Simple",
2212
  "status": "scored",
2213
  "status_label": "scored",
2214
  "scored": true,
@@ -2218,7 +2218,7 @@
2218
  "normalized_score": 0.004579592783699693,
2219
  "metric_key": "macro_f1",
2220
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
2221
- "scope": "multi_episode_128_metadata_baseline",
2222
  "reason": null
2223
  },
2224
  {
@@ -2226,7 +2226,7 @@
2226
  "task_id": "long_horizon_next_action",
2227
  "task_label": "Long-Horizon Next-Action Forecasting",
2228
  "series_id": "metadata128_neural_mlp",
2229
- "method": "128ep Metadata NN",
2230
  "status": "scored",
2231
  "status_label": "scored",
2232
  "scored": true,
@@ -2236,7 +2236,7 @@
2236
  "normalized_score": 0.0029821307969142615,
2237
  "metric_key": "macro_f1",
2238
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
2239
- "scope": "multi_episode_128_metadata_baseline",
2240
  "reason": null
2241
  },
2242
  {
@@ -2370,7 +2370,7 @@
2370
  "task_id": "next_subtask_forecast",
2371
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2372
  "series_id": "metadata128_simple",
2373
- "method": "128ep Metadata Simple",
2374
  "status": "scored",
2375
  "status_label": "scored",
2376
  "scored": true,
@@ -2380,7 +2380,7 @@
2380
  "normalized_score": 0.0001206030150753769,
2381
  "metric_key": "macro_f1",
2382
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
2383
- "scope": "multi_episode_128_metadata_baseline",
2384
  "reason": null
2385
  },
2386
  {
@@ -2388,7 +2388,7 @@
2388
  "task_id": "next_subtask_forecast",
2389
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2390
  "series_id": "metadata128_neural_mlp",
2391
- "method": "128ep Metadata NN",
2392
  "status": "scored",
2393
  "status_label": "scored",
2394
  "scored": true,
@@ -2398,7 +2398,7 @@
2398
  "normalized_score": 2.086049543676662e-05,
2399
  "metric_key": "macro_f1",
2400
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
2401
- "scope": "multi_episode_128_metadata_baseline",
2402
  "reason": null
2403
  },
2404
  {
@@ -2532,7 +2532,7 @@
2532
  "task_id": "interaction_text_prediction",
2533
  "task_label": "Interaction Text Prediction",
2534
  "series_id": "metadata128_simple",
2535
- "method": "128ep Metadata Simple",
2536
  "status": "unsupported_without_required_target",
2537
  "status_label": "unsupported",
2538
  "scored": false,
@@ -2542,7 +2542,7 @@
2542
  "normalized_score": null,
2543
  "metric_key": "macro_f1",
2544
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
2545
- "scope": "multi_episode_128_metadata_baseline",
2546
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
2547
  },
2548
  {
@@ -2550,7 +2550,7 @@
2550
  "task_id": "interaction_text_prediction",
2551
  "task_label": "Interaction Text Prediction",
2552
  "series_id": "metadata128_neural_mlp",
2553
- "method": "128ep Metadata NN",
2554
  "status": "not_supported_by_metadata_only_package",
2555
  "status_label": "not supported",
2556
  "scored": false,
@@ -2560,8 +2560,8 @@
2560
  "normalized_score": null,
2561
  "metric_key": "macro_f1",
2562
  "source": null,
2563
- "scope": "multi_episode_128_metadata_baseline",
2564
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
2565
  },
2566
  {
2567
  "task_number": 15,
@@ -2694,7 +2694,7 @@
2694
  "task_id": "action_object_relation",
2695
  "task_label": "Action-Object Relation Prediction",
2696
  "series_id": "metadata128_simple",
2697
- "method": "128ep Metadata Simple",
2698
  "status": "scored",
2699
  "status_label": "scored",
2700
  "scored": true,
@@ -2704,7 +2704,7 @@
2704
  "normalized_score": 0.0,
2705
  "metric_key": "macro_f1",
2706
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
2707
- "scope": "multi_episode_128_metadata_baseline",
2708
  "reason": null
2709
  },
2710
  {
@@ -2712,7 +2712,7 @@
2712
  "task_id": "action_object_relation",
2713
  "task_label": "Action-Object Relation Prediction",
2714
  "series_id": "metadata128_neural_mlp",
2715
- "method": "128ep Metadata NN",
2716
  "status": "scored",
2717
  "status_label": "scored",
2718
  "scored": true,
@@ -2722,7 +2722,7 @@
2722
  "normalized_score": 0.0,
2723
  "metric_key": "macro_f1",
2724
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
2725
- "scope": "multi_episode_128_metadata_baseline",
2726
  "reason": null
2727
  },
2728
  {
@@ -2856,7 +2856,7 @@
2856
  "task_id": "object_set_forecast",
2857
  "task_label": "Future Object-Set Forecasting",
2858
  "series_id": "metadata128_simple",
2859
- "method": "128ep Metadata Simple",
2860
  "status": "scored",
2861
  "status_label": "scored",
2862
  "scored": true,
@@ -2866,7 +2866,7 @@
2866
  "normalized_score": 0.17656983343047333,
2867
  "metric_key": "micro_f1",
2868
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
2869
- "scope": "multi_episode_128_metadata_baseline",
2870
  "reason": null
2871
  },
2872
  {
@@ -2874,7 +2874,7 @@
2874
  "task_id": "object_set_forecast",
2875
  "task_label": "Future Object-Set Forecasting",
2876
  "series_id": "metadata128_neural_mlp",
2877
- "method": "128ep Metadata NN",
2878
  "status": "scored",
2879
  "status_label": "scored",
2880
  "scored": true,
@@ -2884,7 +2884,7 @@
2884
  "normalized_score": 0.17418550827844048,
2885
  "metric_key": "micro_f1",
2886
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
2887
- "scope": "multi_episode_128_metadata_baseline",
2888
  "reason": null
2889
  },
2890
  {
@@ -3018,36 +3018,36 @@
3018
  "task_id": "imu_to_hand_pose",
3019
  "task_label": "IMU-to-Hand Pose Reconstruction",
3020
  "series_id": "metadata128_simple",
3021
- "method": "128ep Metadata Simple",
3022
- "status": "unsupported_without_required_target",
3023
- "status_label": "unsupported",
3024
- "scored": false,
3025
  "proxy_scored": false,
3026
- "raw": null,
3027
- "raw_text": "n/a",
3028
- "normalized_score": null,
3029
  "metric_key": "mae",
3030
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
3031
- "scope": "multi_episode_128_metadata_baseline",
3032
- "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package"
3033
  },
3034
  {
3035
  "task_number": 18,
3036
  "task_id": "imu_to_hand_pose",
3037
  "task_label": "IMU-to-Hand Pose Reconstruction",
3038
  "series_id": "metadata128_neural_mlp",
3039
- "method": "128ep Metadata NN",
3040
- "status": "not_supported_by_metadata_only_package",
3041
- "status_label": "not supported",
3042
- "scored": false,
3043
  "proxy_scored": false,
3044
- "raw": null,
3045
- "raw_text": "n/a",
3046
- "normalized_score": null,
3047
  "metric_key": "mae",
3048
- "source": null,
3049
- "scope": "multi_episode_128_metadata_baseline",
3050
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3051
  },
3052
  {
3053
  "task_number": 18,
@@ -3180,7 +3180,7 @@
3180
  "task_id": "camera_view_sync_retrieval",
3181
  "task_label": "Camera-View Synchronization Retrieval",
3182
  "series_id": "metadata128_simple",
3183
- "method": "128ep Metadata Simple",
3184
  "status": "unsupported_without_required_target",
3185
  "status_label": "unsupported",
3186
  "scored": false,
@@ -3190,7 +3190,7 @@
3190
  "normalized_score": null,
3191
  "metric_key": "mrr",
3192
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
3193
- "scope": "multi_episode_128_metadata_baseline",
3194
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
3195
  },
3196
  {
@@ -3198,7 +3198,7 @@
3198
  "task_id": "camera_view_sync_retrieval",
3199
  "task_label": "Camera-View Synchronization Retrieval",
3200
  "series_id": "metadata128_neural_mlp",
3201
- "method": "128ep Metadata NN",
3202
  "status": "not_supported_by_metadata_only_package",
3203
  "status_label": "not supported",
3204
  "scored": false,
@@ -3208,8 +3208,8 @@
3208
  "normalized_score": null,
3209
  "metric_key": "mrr",
3210
  "source": null,
3211
- "scope": "multi_episode_128_metadata_baseline",
3212
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3213
  },
3214
  {
3215
  "task_number": 19,
@@ -3342,7 +3342,7 @@
3342
  "task_id": "time_to_transition",
3343
  "task_label": "Time-to-Next-Transition Regression",
3344
  "series_id": "metadata128_simple",
3345
- "method": "128ep Metadata Simple",
3346
  "status": "scored",
3347
  "status_label": "scored",
3348
  "scored": true,
@@ -3352,7 +3352,7 @@
3352
  "normalized_score": 0.016864874132806403,
3353
  "metric_key": "mae",
3354
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
3355
- "scope": "multi_episode_128_metadata_baseline",
3356
  "reason": null
3357
  },
3358
  {
@@ -3360,7 +3360,7 @@
3360
  "task_id": "time_to_transition",
3361
  "task_label": "Time-to-Next-Transition Regression",
3362
  "series_id": "metadata128_neural_mlp",
3363
- "method": "128ep Metadata NN",
3364
  "status": "scored",
3365
  "status_label": "scored",
3366
  "scored": true,
@@ -3370,7 +3370,7 @@
3370
  "normalized_score": 0.25411768748242325,
3371
  "metric_key": "mae",
3372
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
3373
- "scope": "multi_episode_128_metadata_baseline",
3374
  "reason": null
3375
  },
3376
  {
 
1
  {
2
  "title": "Task Method 20-Result Matrix",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:52:26+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
+ "scored_method_task_count": 143,
9
  "series": [
10
  {
11
  "id": "minimal",
 
55
  },
56
  {
57
  "id": "metadata128_simple",
58
+ "label": "128ep Aligned Simple",
59
  "short_label": "128-S",
60
  "color": "#ffd166",
61
+ "kind": "partial_128_episode_aligned_baseline",
62
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
63
  "stroke_dasharray": "9 6",
64
+ "method_detail": "128-episode aligned simple baselines: JSONL metadata/text tasks plus staged sensor-block tasks where the processed target exists.",
65
  "plotted_as": "colored point overlay",
66
  "result_record_count": 20,
67
+ "scored_task_count": 18,
68
+ "covered_task_count": 18,
69
  "proxy_scored_task_count": 0,
70
+ "scoreless_task_count": 2,
71
+ "unsupported_task_count": 2,
72
  "not_evaluated_task_count": 0,
73
  "status_counts": {
74
+ "scored": 18,
75
+ "unsupported_without_required_target": 2
76
  },
77
+ "coverage_fraction": 0.9,
78
  "result_record_fraction": 1.0
79
  },
80
  {
81
  "id": "metadata128_neural_mlp",
82
+ "label": "128ep Aligned NN",
83
  "short_label": "128-NN",
84
  "color": "#f472b6",
85
+ "kind": "partial_128_episode_aligned_baseline",
86
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
87
  "stroke_dasharray": "3 6",
88
+ "method_detail": "128-episode aligned MLP baselines: JSONL metadata/text tasks plus staged sensor-block tasks where the processed target exists.",
89
  "plotted_as": "colored point overlay",
90
  "result_record_count": 20,
91
+ "scored_task_count": 18,
92
+ "covered_task_count": 18,
93
  "proxy_scored_task_count": 0,
94
+ "scoreless_task_count": 2,
95
+ "unsupported_task_count": 2,
96
  "not_evaluated_task_count": 0,
97
  "status_counts": {
98
+ "not_supported_by_metadata_only_package": 2,
99
+ "scored": 18
100
  },
101
+ "coverage_fraction": 0.9,
102
  "result_record_fraction": 1.0
103
  },
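The per-series counters in the two summary objects above appear to be simple aggregates over that series' 20 task records. The following is a minimal sketch of how such counters could be recomputed, not the generator that produced this file. The function name `summarize_series` is hypothetical, and it is an assumption that `coverage_fraction` is `scored_task_count` divided by the task count.

```python
from collections import Counter


def summarize_series(records, task_count=20):
    """Recompute summary counters for one series (hypothetical helper).

    Each record is assumed to carry at least the "status" and "scored"
    fields that appear in the per-task records of the matrix.
    """
    status_counts = Counter(r["status"] for r in records)
    scored = sum(1 for r in records if r["scored"])
    return {
        "result_record_count": len(records),
        "scored_task_count": scored,
        "scoreless_task_count": len(records) - scored,
        "status_counts": dict(status_counts),
        # Assumption: coverage is the share of tasks with a real score.
        "coverage_fraction": scored / task_count,
        "result_record_fraction": len(records) / task_count,
    }


# Toy input mirroring the counts shown for the 128-episode simple series:
# 18 scored records and 2 unsupported records.
records = (
    [{"status": "scored", "scored": True}] * 18
    + [{"status": "unsupported_without_required_target", "scored": False}] * 2
)
summary = summarize_series(records)
print(summary)
```

Run against the real per-task records, a check like this would catch a summary block that drifts out of step with the records it describes.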
104
  {
 
264
  "task_id": "timeline_action",
265
  "task_label": "Action Recognition",
266
  "series_id": "metadata128_simple",
267
+ "method": "128ep Aligned Simple",
268
  "status": "scored",
269
  "status_label": "scored",
270
  "scored": true,
 
274
  "normalized_score": 0.008252821966746326,
275
  "metric_key": "macro_f1",
276
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
277
+ "scope": "multi_episode_128_aligned_baseline",
278
  "reason": null
279
  },
280
  {
 
282
  "task_id": "timeline_action",
283
  "task_label": "Action Recognition",
284
  "series_id": "metadata128_neural_mlp",
285
+ "method": "128ep Aligned NN",
286
  "status": "scored",
287
  "status_label": "scored",
288
  "scored": true,
 
292
  "normalized_score": 0.004175793689174209,
293
  "metric_key": "macro_f1",
294
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
295
+ "scope": "multi_episode_128_aligned_baseline",
296
  "reason": null
297
  },
298
  {
 
426
  "task_id": "timeline_subtask",
427
  "task_label": "Procedure Step Recognition",
428
  "series_id": "metadata128_simple",
429
+ "method": "128ep Aligned Simple",
430
  "status": "scored",
431
  "status_label": "scored",
432
  "scored": true,
 
436
  "normalized_score": 0.00019512195121951218,
437
  "metric_key": "macro_f1",
438
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
439
+ "scope": "multi_episode_128_aligned_baseline",
440
  "reason": null
441
  },
442
  {
 
444
  "task_id": "timeline_subtask",
445
  "task_label": "Procedure Step Recognition",
446
  "series_id": "metadata128_neural_mlp",
447
+ "method": "128ep Aligned NN",
448
  "status": "scored",
449
  "status_label": "scored",
450
  "scored": true,
 
454
  "normalized_score": 7.207207207207208e-05,
455
  "metric_key": "macro_f1",
456
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
457
+ "scope": "multi_episode_128_aligned_baseline",
458
  "reason": null
459
  },
460
  {
 
588
  "task_id": "transition_detection",
589
  "task_label": "Action Boundary Detection",
590
  "series_id": "metadata128_simple",
591
+ "method": "128ep Aligned Simple",
592
  "status": "scored",
593
  "status_label": "scored",
594
  "scored": true,
 
598
  "normalized_score": 0.29652162550029315,
599
  "metric_key": "macro_f1",
600
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
601
+ "scope": "multi_episode_128_aligned_baseline",
602
  "reason": null
603
  },
604
  {
 
606
  "task_id": "transition_detection",
607
  "task_label": "Action Boundary Detection",
608
  "series_id": "metadata128_neural_mlp",
609
+ "method": "128ep Aligned NN",
610
  "status": "scored",
611
  "status_label": "scored",
612
  "scored": true,
 
616
  "normalized_score": 0.4841733292368365,
617
  "metric_key": "macro_f1",
618
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
619
+ "scope": "multi_episode_128_aligned_baseline",
620
  "reason": null
621
  },
622
  {
 
750
  "task_id": "next_action",
751
  "task_label": "Next-Action Prediction",
752
  "series_id": "metadata128_simple",
753
+ "method": "128ep Aligned Simple",
754
  "status": "scored",
755
  "status_label": "scored",
756
  "scored": true,
 
760
  "normalized_score": 0.006514774539765508,
761
  "metric_key": "macro_f1",
762
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
763
+ "scope": "multi_episode_128_aligned_baseline",
764
  "reason": null
765
  },
766
  {
 
768
  "task_id": "next_action",
769
  "task_label": "Next-Action Prediction",
770
  "series_id": "metadata128_neural_mlp",
771
+ "method": "128ep Aligned NN",
772
  "status": "scored",
773
  "status_label": "scored",
774
  "scored": true,
 
778
  "normalized_score": 0.004910507980164745,
779
  "metric_key": "macro_f1",
780
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
781
+ "scope": "multi_episode_128_aligned_baseline",
782
  "reason": null
783
  },
784
  {
 
912
  "task_id": "hand_trajectory_forecast",
913
  "task_label": "Hand Trajectory Forecasting",
914
  "series_id": "metadata128_simple",
915
+ "method": "128ep Aligned Simple",
916
+ "status": "scored",
917
+ "status_label": "scored",
918
+ "scored": true,
919
  "proxy_scored": false,
920
+ "raw": 8.817333221435547,
921
+ "raw_text": "8.817",
922
+ "normalized_score": 0.012231610603598841,
923
  "metric_key": "mpjpe",
924
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
925
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
926
+ "reason": null
927
  },
928
  {
929
  "task_number": 5,
930
  "task_id": "hand_trajectory_forecast",
931
  "task_label": "Hand Trajectory Forecasting",
932
  "series_id": "metadata128_neural_mlp",
933
+ "method": "128ep Aligned NN",
934
+ "status": "scored",
935
+ "status_label": "scored",
936
+ "scored": true,
937
  "proxy_scored": false,
938
+ "raw": 0.429434210062027,
939
+ "raw_text": "0.4294",
940
+ "normalized_score": 0.25114484128127007,
941
  "metric_key": "mpjpe",
942
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/hand_trajectory_forecast/metrics.json",
943
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
944
+ "reason": null
945
  },
946
  {
947
  "task_number": 5,
 
1074
  "task_id": "contact_prediction",
1075
  "task_label": "Contact State Prediction",
1076
  "series_id": "metadata128_simple",
1077
+ "method": "128ep Aligned Simple",
1078
  "status": "scored",
1079
  "status_label": "scored",
1080
  "scored": true,
 
1084
  "normalized_score": 0.4381481308057444,
1085
  "metric_key": "macro_f1",
1086
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
1087
+ "scope": "multi_episode_128_aligned_baseline",
1088
  "reason": null
1089
  },
1090
  {
 
1092
  "task_id": "contact_prediction",
1093
  "task_label": "Contact State Prediction",
1094
  "series_id": "metadata128_neural_mlp",
1095
+ "method": "128ep Aligned NN",
1096
  "status": "scored",
1097
  "status_label": "scored",
1098
  "scored": true,
 
1102
  "normalized_score": 0.5682695682695682,
1103
  "metric_key": "macro_f1",
1104
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
1105
+ "scope": "multi_episode_128_aligned_baseline",
1106
  "reason": null
1107
  },
1108
  {
 
1236
  "task_id": "object_relevance",
1237
  "task_label": "Object Relevance Prediction",
1238
  "series_id": "metadata128_simple",
1239
+ "method": "128ep Aligned Simple",
1240
  "status": "scored",
1241
  "status_label": "scored",
1242
  "scored": true,
 
1246
  "normalized_score": 0.17764578833693304,
1247
  "metric_key": "micro_f1",
1248
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
1249
+ "scope": "multi_episode_128_aligned_baseline",
1250
  "reason": null
1251
  },
1252
  {
 
1254
  "task_id": "object_relevance",
1255
  "task_label": "Object Relevance Prediction",
1256
  "series_id": "metadata128_neural_mlp",
1257
+ "method": "128ep Aligned NN",
1258
  "status": "scored",
1259
  "status_label": "scored",
1260
  "scored": true,
 
1264
  "normalized_score": 0.18662723837686876,
1265
  "metric_key": "micro_f1",
1266
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
1267
+ "scope": "multi_episode_128_aligned_baseline",
1268
  "reason": null
1269
  },
1270
  {
 
1398
  "task_id": "caption_grounding",
1399
  "task_label": "Language Grounding",
1400
  "series_id": "metadata128_simple",
1401
+ "method": "128ep Aligned Simple",
1402
  "status": "scored",
1403
  "status_label": "scored",
1404
  "scored": true,
 
1408
  "normalized_score": 0.002332374220713973,
1409
  "metric_key": "mrr",
1410
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
1411
+ "scope": "multi_episode_128_aligned_baseline",
1412
  "reason": null
1413
  },
1414
  {
 
1416
  "task_id": "caption_grounding",
1417
  "task_label": "Language Grounding",
1418
  "series_id": "metadata128_neural_mlp",
1419
+ "method": "128ep Aligned NN",
1420
  "status": "scored",
1421
  "status_label": "scored",
1422
  "scored": true,
 
1426
  "normalized_score": 0.008236799389123917,
1427
  "metric_key": "mrr",
1428
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
1429
+ "scope": "multi_episode_128_aligned_baseline",
1430
  "reason": null
1431
  },
1432
  {
 
1560
  "task_id": "cross_modal_retrieval",
1561
  "task_label": "Cross-Modal Retrieval",
1562
  "series_id": "metadata128_simple",
1563
+ "method": "128ep Aligned Simple",
1564
+ "status": "scored",
1565
+ "status_label": "scored",
1566
+ "scored": true,
1567
  "proxy_scored": false,
1568
+ "raw": 0.002587692579254508,
1569
+ "raw_text": "0.0026",
1570
+ "normalized_score": 0.002587692579254508,
1571
  "metric_key": "mrr",
1572
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
1573
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1574
+ "reason": null
1575
  },
1576
  {
1577
  "task_number": 9,
1578
  "task_id": "cross_modal_retrieval",
1579
  "task_label": "Cross-Modal Retrieval",
1580
  "series_id": "metadata128_neural_mlp",
1581
+ "method": "128ep Aligned NN",
1582
+ "status": "scored",
1583
+ "status_label": "scored",
1584
+ "scored": true,
1585
  "proxy_scored": false,
1586
+ "raw": 0.0026067993603646755,
1587
+ "raw_text": "0.0026",
1588
+ "normalized_score": 0.0026067993603646755,
1589
  "metric_key": "mrr",
1590
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/cross_modal_retrieval/metrics.json",
1591
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1592
+ "reason": null
1593
  },
1594
  {
1595
  "task_number": 9,
 
1722
  "task_id": "modality_reconstruction",
1723
  "task_label": "Cross-Modal Reconstruction",
1724
  "series_id": "metadata128_simple",
1725
+ "method": "128ep Aligned Simple",
1726
+ "status": "scored",
1727
+ "status_label": "scored",
1728
+ "scored": true,
1729
  "proxy_scored": false,
1730
+ "raw": -190.66106203944798,
1731
+ "raw_text": "-190.66",
1732
+ "normalized_score": 0.0,
1733
  "metric_key": "r2",
1734
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
1735
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1736
+ "reason": null
1737
  },
1738
  {
1739
  "task_number": 10,
1740
  "task_id": "modality_reconstruction",
1741
  "task_label": "Cross-Modal Reconstruction",
1742
  "series_id": "metadata128_neural_mlp",
1743
+ "method": "128ep Aligned NN",
1744
+ "status": "scored",
1745
+ "status_label": "scored",
1746
+ "scored": true,
1747
  "proxy_scored": false,
1748
+ "raw": -0.43481132003942147,
1749
+ "raw_text": "-0.4348",
1750
+ "normalized_score": 0.0,
1751
  "metric_key": "r2",
1752
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/modality_reconstruction/metrics.json",
1753
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1754
+ "reason": null
1755
  },
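In the records shown here, the higher-is-better metrics (`f1`, `mrr`, `r2`) have a `normalized_score` equal to `raw` when `raw` lies in [0, 1], and 0.0 when `raw` is negative, as with both `r2` values above. One reading consistent with those values is a clamp to the unit interval. The sketch below illustrates that reading only. The function name is hypothetical, and it does not cover the error metrics (`mae`, `mpjpe`), whose normalization cannot be inferred from this diff.

```python
def normalize_higher_is_better(raw):
    """Clamp a higher-is-better metric into [0, 1] (assumed rule).

    Returns None for unscored records, matching the null
    normalized_score entries in the matrix.
    """
    if raw is None:
        return None
    return min(1.0, max(0.0, raw))


# Values taken from the records above.
r2_simple = normalize_higher_is_better(-190.66106203944798)
r2_nn = normalize_higher_is_better(-0.43481132003942147)
f1_nn = normalize_higher_is_better(0.7773773780941162)
print(r2_simple, r2_nn, f1_nn)
```

Under this reading, a strongly negative R² and a mildly negative one plot identically at zero, so the `raw` field remains the place to compare them.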
1756
  {
1757
  "task_number": 10,
 
1884
  "task_id": "temporal_order",
1885
  "task_label": "Temporal Order Verification",
1886
  "series_id": "metadata128_simple",
1887
+ "method": "128ep Aligned Simple",
1888
  "status": "scored",
1889
  "status_label": "scored",
1890
  "scored": true,
 
1894
  "normalized_score": 0.4198864140782312,
1895
  "metric_key": "f1",
1896
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
1897
+ "scope": "multi_episode_128_aligned_baseline",
1898
  "reason": null
1899
  },
1900
  {
 
1902
  "task_id": "temporal_order",
1903
  "task_label": "Temporal Order Verification",
1904
  "series_id": "metadata128_neural_mlp",
1905
+ "method": "128ep Aligned NN",
1906
  "status": "scored",
1907
  "status_label": "scored",
1908
  "scored": true,
 
1912
  "normalized_score": 0.8252408266656923,
1913
  "metric_key": "f1",
1914
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
1915
+ "scope": "multi_episode_128_aligned_baseline",
1916
  "reason": null
1917
  },
1918
  {
 
2046
  "task_id": "misalignment_detection",
2047
  "task_label": "Multimodal Synchronization Detection",
2048
  "series_id": "metadata128_simple",
2049
+ "method": "128ep Aligned Simple",
2050
+ "status": "scored",
2051
+ "status_label": "scored",
2052
+ "scored": true,
2053
  "proxy_scored": false,
2054
+ "raw": 0.49980060227663614,
2055
+ "raw_text": "0.4998",
2056
+ "normalized_score": 0.49980060227663614,
2057
  "metric_key": "f1",
2058
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
2059
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
2060
+ "reason": null
2061
  },
2062
  {
2063
  "task_number": 12,
2064
  "task_id": "misalignment_detection",
2065
  "task_label": "Multimodal Synchronization Detection",
2066
  "series_id": "metadata128_neural_mlp",
2067
+ "method": "128ep Aligned NN",
2068
+ "status": "scored",
2069
+ "status_label": "scored",
2070
+ "scored": true,
2071
  "proxy_scored": false,
2072
+ "raw": 0.7773773780941162,
2073
+ "raw_text": "0.7774",
2074
+ "normalized_score": 0.7773773780941162,
2075
  "metric_key": "f1",
2076
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/misalignment_detection/metrics.json",
2077
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
2078
+ "reason": null
2079
  },
2080
  {
2081
  "task_number": 12,
 
2208
  "task_id": "long_horizon_next_action",
2209
  "task_label": "Long-Horizon Next-Action Forecasting",
2210
  "series_id": "metadata128_simple",
2211
+ "method": "128ep Aligned Simple",
2212
  "status": "scored",
2213
  "status_label": "scored",
2214
  "scored": true,
 
2218
  "normalized_score": 0.004579592783699693,
2219
  "metric_key": "macro_f1",
2220
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
2221
+ "scope": "multi_episode_128_aligned_baseline",
2222
  "reason": null
2223
  },
2224
  {
 
2226
  "task_id": "long_horizon_next_action",
2227
  "task_label": "Long-Horizon Next-Action Forecasting",
2228
  "series_id": "metadata128_neural_mlp",
2229
+ "method": "128ep Aligned NN",
2230
  "status": "scored",
2231
  "status_label": "scored",
2232
  "scored": true,
 
2236
  "normalized_score": 0.0029821307969142615,
2237
  "metric_key": "macro_f1",
2238
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
2239
+ "scope": "multi_episode_128_aligned_baseline",
2240
  "reason": null
2241
  },
2242
  {
 
2370
  "task_id": "next_subtask_forecast",
2371
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2372
  "series_id": "metadata128_simple",
2373
+ "method": "128ep Aligned Simple",
2374
  "status": "scored",
2375
  "status_label": "scored",
2376
  "scored": true,
 
2380
  "normalized_score": 0.0001206030150753769,
2381
  "metric_key": "macro_f1",
2382
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
2383
+ "scope": "multi_episode_128_aligned_baseline",
2384
  "reason": null
2385
  },
2386
  {
 
2388
  "task_id": "next_subtask_forecast",
2389
  "task_label": "Long-Horizon Next-Subtask Forecasting",
2390
  "series_id": "metadata128_neural_mlp",
2391
+ "method": "128ep Aligned NN",
2392
  "status": "scored",
2393
  "status_label": "scored",
2394
  "scored": true,
 
2398
  "normalized_score": 2.086049543676662e-05,
2399
  "metric_key": "macro_f1",
2400
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
2401
+ "scope": "multi_episode_128_aligned_baseline",
2402
  "reason": null
2403
  },
2404
  {
 
2532
  "task_id": "interaction_text_prediction",
2533
  "task_label": "Interaction Text Prediction",
2534
  "series_id": "metadata128_simple",
2535
+ "method": "128ep Aligned Simple",
2536
  "status": "unsupported_without_required_target",
2537
  "status_label": "unsupported",
2538
  "scored": false,
 
2542
  "normalized_score": null,
2543
  "metric_key": "macro_f1",
2544
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
2545
+ "scope": "multi_episode_128_aligned_baseline",
2546
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
2547
  },
2548
  {
 
2550
  "task_id": "interaction_text_prediction",
2551
  "task_label": "Interaction Text Prediction",
2552
  "series_id": "metadata128_neural_mlp",
2553
+ "method": "128ep Aligned NN",
2554
  "status": "not_supported_by_metadata_only_package",
2555
  "status_label": "not supported",
2556
  "scored": false,
 
2560
  "normalized_score": null,
2561
  "metric_key": "macro_f1",
2562
  "source": null,
2563
+ "scope": "multi_episode_128_aligned_baseline",
2564
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required"
2565
  },
2566
  {
2567
  "task_number": 15,
 
2694
  "task_id": "action_object_relation",
2695
  "task_label": "Action-Object Relation Prediction",
2696
  "series_id": "metadata128_simple",
2697
+ "method": "128ep Aligned Simple",
2698
  "status": "scored",
2699
  "status_label": "scored",
2700
  "scored": true,
 
2704
  "normalized_score": 0.0,
2705
  "metric_key": "macro_f1",
2706
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
2707
+ "scope": "multi_episode_128_aligned_baseline",
2708
  "reason": null
2709
  },
2710
  {
 
2712
  "task_id": "action_object_relation",
2713
  "task_label": "Action-Object Relation Prediction",
2714
  "series_id": "metadata128_neural_mlp",
2715
+ "method": "128ep Aligned NN",
2716
  "status": "scored",
2717
  "status_label": "scored",
2718
  "scored": true,
 
2722
  "normalized_score": 0.0,
2723
  "metric_key": "macro_f1",
2724
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
2725
+ "scope": "multi_episode_128_aligned_baseline",
2726
  "reason": null
2727
  },
2728
  {
 
2856
  "task_id": "object_set_forecast",
2857
  "task_label": "Future Object-Set Forecasting",
2858
  "series_id": "metadata128_simple",
2859
+ "method": "128ep Aligned Simple",
2860
  "status": "scored",
2861
  "status_label": "scored",
2862
  "scored": true,
 
2866
  "normalized_score": 0.17656983343047333,
2867
  "metric_key": "micro_f1",
2868
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
2869
+ "scope": "multi_episode_128_aligned_baseline",
2870
  "reason": null
2871
  },
2872
  {
 
2874
  "task_id": "object_set_forecast",
2875
  "task_label": "Future Object-Set Forecasting",
2876
  "series_id": "metadata128_neural_mlp",
2877
+ "method": "128ep Aligned NN",
2878
  "status": "scored",
2879
  "status_label": "scored",
2880
  "scored": true,
 
2884
  "normalized_score": 0.17418550827844048,
2885
  "metric_key": "micro_f1",
2886
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
2887
+ "scope": "multi_episode_128_aligned_baseline",
2888
  "reason": null
2889
  },
2890
  {
 
3018
  "task_id": "imu_to_hand_pose",
3019
  "task_label": "IMU-to-Hand Pose Reconstruction",
3020
  "series_id": "metadata128_simple",
3021
+ "method": "128ep Aligned Simple",
3022
+ "status": "scored",
3023
+ "status_label": "scored",
3024
+ "scored": true,
3025
  "proxy_scored": false,
3026
+ "raw": 0.2294670194387436,
3027
+ "raw_text": "0.2295",
3028
+ "normalized_score": 0.18324815505876868,
3029
  "metric_key": "mae",
3030
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
3031
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3032
+ "reason": null
3033
  },
3034
  {
3035
  "task_number": 18,
3036
  "task_id": "imu_to_hand_pose",
3037
  "task_label": "IMU-to-Hand Pose Reconstruction",
3038
  "series_id": "metadata128_neural_mlp",
3039
+ "method": "128ep Aligned NN",
3040
+ "status": "scored",
3041
+ "status_label": "scored",
3042
+ "scored": true,
3043
  "proxy_scored": false,
3044
+ "raw": 0.2555866539478302,
3045
+ "raw_text": "0.2556",
3046
+ "normalized_score": 0.16452114110609004,
3047
  "metric_key": "mae",
3048
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/imu_to_hand_pose/metrics.json",
3049
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3050
+ "reason": null
3051
  },
3052
  {
3053
  "task_number": 18,
 
3180
  "task_id": "camera_view_sync_retrieval",
3181
  "task_label": "Camera-View Synchronization Retrieval",
3182
  "series_id": "metadata128_simple",
3183
+ "method": "128ep Aligned Simple",
3184
  "status": "unsupported_without_required_target",
3185
  "status_label": "unsupported",
3186
  "scored": false,
 
3190
  "normalized_score": null,
3191
  "metric_key": "mrr",
3192
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
3193
+ "scope": "multi_episode_128_aligned_baseline",
3194
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
3195
  },
3196
  {
 
3198
  "task_id": "camera_view_sync_retrieval",
3199
  "task_label": "Camera-View Synchronization Retrieval",
3200
  "series_id": "metadata128_neural_mlp",
3201
+ "method": "128ep Aligned NN",
3202
  "status": "not_supported_by_metadata_only_package",
3203
  "status_label": "not supported",
3204
  "scored": false,
 
3208
  "normalized_score": null,
3209
  "metric_key": "mrr",
3210
  "source": null,
3211
+ "scope": "multi_episode_128_aligned_baseline",
3212
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required"
3213
  },
3214
  {
3215
  "task_number": 19,
 
3342
  "task_id": "time_to_transition",
3343
  "task_label": "Time-to-Next-Transition Regression",
3344
  "series_id": "metadata128_simple",
3345
+ "method": "128ep Aligned Simple",
3346
  "status": "scored",
3347
  "status_label": "scored",
3348
  "scored": true,
 
3352
  "normalized_score": 0.016864874132806403,
3353
  "metric_key": "mae",
3354
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
3355
+ "scope": "multi_episode_128_aligned_baseline",
3356
  "reason": null
3357
  },
3358
  {
 
3360
  "task_id": "time_to_transition",
3361
  "task_label": "Time-to-Next-Transition Regression",
3362
  "series_id": "metadata128_neural_mlp",
3363
+ "method": "128ep Aligned NN",
3364
  "status": "scored",
3365
  "status_label": "scored",
3366
  "scored": true,
 
3370
  "normalized_score": 0.25411768748242325,
3371
  "metric_key": "mae",
3372
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
3373
+ "scope": "multi_episode_128_aligned_baseline",
3374
  "reason": null
3375
  },
3376
  {
metrics/task_surface_integrity.json CHANGED
@@ -1,6 +1,6 @@
  {
  "status": "pass",
- "generated_at_utc": "2026-06-18T12:09:25+00:00",
+ "generated_at_utc": "2026-06-18T12:54:18+00:00",
  "summary": {
  "task_count": 12,
  "expected_task_count": 12,
metrics/unified_task_model_radar.json CHANGED
@@ -1,18 +1,18 @@
1
  {
2
  "title": "Unified 20-Task Model Radar",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-18T12:07:15+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
- "scored_method_task_count": 133,
9
  "normalization_policy": {
10
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
11
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
12
  "raw_values": "raw metric values, metric keys, and sources are retained in this JSON; the SVG is an overview, not a replacement for the metric table",
13
  "result_record_policy": "every method has 20 task records; records without a numeric score carry explicit unsupported/not-evaluated status and reason fields",
14
  "foundation_model_overlay": "Qwen3/Cosmos points are plotted only on task-aligned axes. Scoreless records mean the public result does not evaluate that task contract.",
15
- "metadata_128_overlay": "128-episode metadata baselines have 20 records, but numeric scores only where the public JSONL contains enough task labels without raw feature blocks.",
16
  "raw_128_overlay": "128-episode raw-feature baselines use staged sensor NPZ features. Eighteen axes use direct task targets; interaction text and camera-view sync are completed with documented compact proxies because raw interaction strings and paired video-view embeddings are absent from the 128 export."
17
  },
18
  "series": [
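
Reviewer note: the `normalization_policy` strings in this hunk describe two rules, clipping bounded metrics to [0, 1] and mapping lower-is-better metrics to `best_observed_value / raw_value` within a task. The following is a minimal sketch of those two rules as stated, not the repository's actual implementation. The function names and the handling of `None` and non-positive errors are assumptions made for illustration.

```python
from typing import Optional


def normalize_higher_is_better(raw: Optional[float]) -> Optional[float]:
    """Clip a bounded metric (e.g. macro_f1, mrr, r2) to [0, 1].

    Assumption: scoreless records (raw is None) stay None.
    """
    if raw is None:
        return None
    return min(1.0, max(0.0, raw))


def normalize_lower_is_better(
    raw: Optional[float], best_observed: float
) -> Optional[float]:
    """Map an error metric (e.g. mae, mpjpe) to best_observed / raw.

    best_observed is the smallest raw error seen for the same task,
    so the best method scores 1.0 and worse methods score below 1.0.
    """
    if raw is None:
        return None
    if raw <= 0:
        # Assumption: the policy text does not cover zero or negative
        # errors, so this sketch rejects them instead of guessing.
        raise ValueError("lower-is-better metrics must be positive")
    return best_observed / raw


# Illustrative values only, not taken from the metric files.
print(normalize_higher_is_better(-1.3))       # negative r2 is clipped to 0.0
print(normalize_lower_is_better(20.0, 10.0))  # twice the best error gives 0.5
```

Because the lower-is-better score is relative to the best value observed within the task, the normalized numbers change whenever a new method with a lower error is added; the `raw` fields retained in the JSON remain the stable reference.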
@@ -64,50 +64,50 @@
64
  },
65
  {
66
  "id": "metadata128_simple",
67
- "label": "128ep Metadata Simple",
68
  "short_label": "128-S",
69
  "color": "#ffd166",
70
- "kind": "partial_128_episode_metadata_baseline",
71
- "scope": "128 selected episodes, JSONL metadata/text only",
72
  "stroke_dasharray": "9 6",
73
- "method_detail": "128-episode JSONL metadata/text simple baselines.",
74
  "plotted_as": "colored point overlay",
75
  "result_record_count": 20,
76
- "scored_task_count": 13,
77
- "covered_task_count": 13,
78
  "proxy_scored_task_count": 0,
79
- "scoreless_task_count": 7,
80
- "unsupported_task_count": 7,
81
  "not_evaluated_task_count": 0,
82
  "status_counts": {
83
- "scored": 13,
84
- "unsupported_without_required_target": 7
85
  },
86
- "coverage_fraction": 0.65,
87
  "result_record_fraction": 1.0
88
  },
89
  {
90
  "id": "metadata128_neural_mlp",
91
- "label": "128ep Metadata NN",
92
  "short_label": "128-NN",
93
  "color": "#f472b6",
94
- "kind": "partial_128_episode_metadata_baseline",
95
- "scope": "128 selected episodes, JSONL metadata/text only",
96
  "stroke_dasharray": "3 6",
97
- "method_detail": "128-episode JSONL metadata/text MLP baselines.",
98
  "plotted_as": "colored point overlay",
99
  "result_record_count": 20,
100
- "scored_task_count": 13,
101
- "covered_task_count": 13,
102
  "proxy_scored_task_count": 0,
103
- "scoreless_task_count": 7,
104
- "unsupported_task_count": 7,
105
  "not_evaluated_task_count": 0,
106
  "status_counts": {
107
- "not_supported_by_metadata_only_package": 7,
108
- "scored": 13
109
  },
110
- "coverage_fraction": 0.65,
111
  "result_record_fraction": 1.0
112
  },
113
  {
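
Reviewer note: the per-series summary fields in the hunk above (`result_record_count`, `scored_task_count`, `scoreless_task_count`, `status_counts`, `coverage_fraction`, `result_record_fraction`) appear to be derivable from the per-task records, e.g. 13 scored of 20 tasks gives a coverage fraction of 0.65. The sketch below recomputes them under that assumption. The helper name `summarize_series`, the extra `scoreless_without_reason` check, and the sample records are hypothetical and only illustrate the stated `result_record_policy`.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_series(
    records: List[Dict[str, Any]], task_count: int = 20
) -> Dict[str, Any]:
    """Recompute series-level summary fields from per-task records."""
    status_counts = Counter(r["status"] for r in records)
    scored = sum(1 for r in records if r.get("scored"))
    # Policy check: every scoreless record should carry a reason.
    missing_reason = [
        r["task_id"]
        for r in records
        if not r.get("scored") and not r.get("reason")
    ]
    return {
        "result_record_count": len(records),
        "scored_task_count": scored,
        "scoreless_task_count": len(records) - scored,
        "status_counts": dict(status_counts),
        "coverage_fraction": scored / task_count,
        "result_record_fraction": len(records) / task_count,
        "scoreless_without_reason": missing_reason,  # hypothetical field
    }


# Hypothetical records: 13 scored, 7 unsupported with a reason string.
records = [
    {"task_id": f"task_{i}", "status": "scored", "scored": True, "reason": None}
    for i in range(13)
] + [
    {
        "task_id": f"task_{i}",
        "status": "unsupported_without_required_target",
        "scored": False,
        "reason": "required target missing",
    }
    for i in range(13, 20)
]

summary = summarize_series(records)
print(summary["coverage_fraction"], summary["status_counts"])
```

Running a check like this against each series before publishing would flag a summary count that drifts from the underlying records, which is the kind of mismatch a relabelled series is prone to.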
@@ -301,7 +301,7 @@
301
  "raw": 0.008252821966746326,
302
  "metric_key": "macro_f1",
303
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
304
- "scope": "multi_episode_128_metadata_baseline",
305
  "status": "scored",
306
  "reason": null,
307
  "normalized_score": 0.008252821966746326,
@@ -312,7 +312,7 @@
312
  "raw": 0.004175793689174209,
313
  "metric_key": "macro_f1",
314
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
315
- "scope": "multi_episode_128_metadata_baseline",
316
  "status": "scored",
317
  "reason": null,
318
  "normalized_score": 0.004175793689174209,
@@ -401,7 +401,7 @@
401
  "raw": 0.00019512195121951218,
402
  "metric_key": "macro_f1",
403
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
404
- "scope": "multi_episode_128_metadata_baseline",
405
  "status": "scored",
406
  "reason": null,
407
  "normalized_score": 0.00019512195121951218,
@@ -412,7 +412,7 @@
412
  "raw": 7.207207207207208e-05,
413
  "metric_key": "macro_f1",
414
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
415
- "scope": "multi_episode_128_metadata_baseline",
416
  "status": "scored",
417
  "reason": null,
418
  "normalized_score": 7.207207207207208e-05,
@@ -523,7 +523,7 @@
523
  "raw": 0.29652162550029315,
524
  "metric_key": "macro_f1",
525
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
526
- "scope": "multi_episode_128_metadata_baseline",
527
  "status": "scored",
528
  "reason": null,
529
  "normalized_score": 0.29652162550029315,
@@ -534,7 +534,7 @@
534
  "raw": 0.4841733292368365,
535
  "metric_key": "macro_f1",
536
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
537
- "scope": "multi_episode_128_metadata_baseline",
538
  "status": "scored",
539
  "reason": null,
540
  "normalized_score": 0.4841733292368365,
@@ -634,7 +634,7 @@
634
  "raw": 0.006514774539765508,
635
  "metric_key": "macro_f1",
636
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
637
- "scope": "multi_episode_128_metadata_baseline",
638
  "status": "scored",
639
  "reason": null,
640
  "normalized_score": 0.006514774539765508,
@@ -645,7 +645,7 @@
645
  "raw": 0.004910507980164745,
646
  "metric_key": "macro_f1",
647
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
648
- "scope": "multi_episode_128_metadata_baseline",
649
  "status": "scored",
650
  "reason": null,
651
  "normalized_score": 0.004910507980164745,
@@ -709,15 +709,26 @@
709
  "status_label": "scored"
710
  },
711
  "metadata128_simple": {
712
- "raw": null,
713
  "metric_key": "mpjpe",
714
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
715
- "scope": "multi_episode_128_metadata_baseline",
716
- "status": "unsupported_without_required_target",
717
- "reason": "requires future hand-joint trajectories from raw sensor feature NPZ blocks, which are not in the public 128 package",
718
- "normalized_score": null,
719
- "raw_text": "n/a",
720
- "status_label": "unsupported"
721
  },
722
  "raw128_simple": {
723
  "raw": 0.2729249894618988,
@@ -741,17 +752,6 @@
741
  "raw_text": "0.1848",
742
  "status_label": "scored"
743
  },
744
- "metadata128_neural_mlp": {
745
- "raw": null,
746
- "metric_key": "mpjpe",
747
- "source": null,
748
- "scope": "multi_episode_128_metadata_baseline",
749
- "status": "not_supported_by_metadata_only_package",
750
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
751
- "normalized_score": null,
752
- "raw_text": "n/a",
753
- "status_label": "not supported"
754
- },
755
  "qwen3_omni_v6_lora": {
756
  "raw": null,
757
  "metric_key": "mpjpe",
@@ -856,7 +856,7 @@
856
  "raw": 0.4381481308057444,
857
  "metric_key": "macro_f1",
858
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
859
- "scope": "multi_episode_128_metadata_baseline",
860
  "status": "scored",
861
  "reason": null,
862
  "normalized_score": 0.4381481308057444,
@@ -867,7 +867,7 @@
867
  "raw": 0.5682695682695682,
868
  "metric_key": "macro_f1",
869
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
870
- "scope": "multi_episode_128_metadata_baseline",
871
  "status": "scored",
872
  "reason": null,
873
  "normalized_score": 0.5682695682695682,
@@ -956,7 +956,7 @@
956
  "raw": 0.17764578833693304,
957
  "metric_key": "micro_f1",
958
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
959
- "scope": "multi_episode_128_metadata_baseline",
960
  "status": "scored",
961
  "reason": null,
962
  "normalized_score": 0.17764578833693304,
@@ -967,7 +967,7 @@
967
  "raw": 0.18662723837686876,
968
  "metric_key": "micro_f1",
969
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
970
- "scope": "multi_episode_128_metadata_baseline",
971
  "status": "scored",
972
  "reason": null,
973
  "normalized_score": 0.18662723837686876,
@@ -1056,7 +1056,7 @@
1056
  "raw": 0.002332374220713973,
1057
  "metric_key": "mrr",
1058
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
1059
- "scope": "multi_episode_128_metadata_baseline",
1060
  "status": "scored",
1061
  "reason": null,
1062
  "normalized_score": 0.002332374220713973,
@@ -1067,7 +1067,7 @@
1067
  "raw": 0.008236799389123917,
1068
  "metric_key": "mrr",
1069
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
1070
- "scope": "multi_episode_128_metadata_baseline",
1071
  "status": "scored",
1072
  "reason": null,
1073
  "normalized_score": 0.008236799389123917,
@@ -1175,15 +1175,26 @@
1175
  "status_label": "scored"
1176
  },
1177
  "metadata128_simple": {
1178
- "raw": null,
1179
  "metric_key": "mrr",
1180
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
1181
- "scope": "multi_episode_128_metadata_baseline",
1182
- "status": "unsupported_without_required_target",
1183
- "reason": "requires paired motion/IMU/camera/audio/depth feature blocks, which are not in the public 128 package",
1184
- "normalized_score": null,
1185
- "raw_text": "n/a",
1186
- "status_label": "unsupported"
1187
  },
1188
  "raw128_simple": {
1189
  "raw": 0.003459817497059703,
@@ -1207,17 +1218,6 @@
1207
  "raw_text": "0.0025",
1208
  "status_label": "scored"
1209
  },
1210
- "metadata128_neural_mlp": {
1211
- "raw": null,
1212
- "metric_key": "mrr",
1213
- "source": null,
1214
- "scope": "multi_episode_128_metadata_baseline",
1215
- "status": "not_supported_by_metadata_only_package",
1216
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1217
- "normalized_score": null,
1218
- "raw_text": "n/a",
1219
- "status_label": "not supported"
1220
- },
1221
  "cosmos3_super_reasoner": {
1222
  "raw": null,
1223
  "metric_key": "mrr",
@@ -1264,15 +1264,26 @@
1264
  "status_label": "scored"
1265
  },
1266
  "metadata128_simple": {
1267
- "raw": null,
1268
  "metric_key": "r2",
1269
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
1270
- "scope": "multi_episode_128_metadata_baseline",
1271
- "status": "unsupported_without_required_target",
1272
- "reason": "requires source and target modality feature blocks such as depth/video vectors, which are not in the public 128 package",
1273
- "normalized_score": null,
1274
- "raw_text": "n/a",
1275
- "status_label": "unsupported"
1276
  },
1277
  "raw128_simple": {
1278
  "raw": -1.3450960391924882,
@@ -1296,17 +1307,6 @@
1296
  "raw_text": "-1.397",
1297
  "status_label": "scored"
1298
  },
1299
- "metadata128_neural_mlp": {
1300
- "raw": null,
1301
- "metric_key": "r2",
1302
- "source": null,
1303
- "scope": "multi_episode_128_metadata_baseline",
1304
- "status": "not_supported_by_metadata_only_package",
1305
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1306
- "normalized_score": null,
1307
- "raw_text": "n/a",
1308
- "status_label": "not supported"
1309
- },
1310
  "qwen3_omni_v6_lora": {
1311
  "raw": null,
1312
  "metric_key": "r2",
@@ -1389,7 +1389,7 @@
1389
  "raw": 0.4198864140782312,
1390
  "metric_key": "f1",
1391
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
1392
- "scope": "multi_episode_128_metadata_baseline",
1393
  "status": "scored",
1394
  "reason": null,
1395
  "normalized_score": 0.4198864140782312,
@@ -1400,7 +1400,7 @@
1400
  "raw": 0.8252408266656923,
1401
  "metric_key": "f1",
1402
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
1403
- "scope": "multi_episode_128_metadata_baseline",
1404
  "status": "scored",
1405
  "reason": null,
1406
  "normalized_score": 0.8252408266656923,
@@ -1497,15 +1497,26 @@
1497
  "status_label": "scored"
1498
  },
1499
  "metadata128_simple": {
1500
- "raw": null,
1501
  "metric_key": "f1",
1502
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
1503
- "scope": "multi_episode_128_metadata_baseline",
1504
- "status": "unsupported_without_required_target",
1505
- "reason": "requires deliberately shifted cross-modal feature pairs, which cannot be reconstructed from the public JSONL labels alone",
1506
- "normalized_score": null,
1507
- "raw_text": "n/a",
1508
- "status_label": "unsupported"
1509
  },
1510
  "raw128_simple": {
1511
  "raw": 0.4958867673901769,
@@ -1529,17 +1540,6 @@
1529
  "raw_text": "0.8273",
1530
  "status_label": "scored"
1531
  },
1532
- "metadata128_neural_mlp": {
1533
- "raw": null,
1534
- "metric_key": "f1",
1535
- "source": null,
1536
- "scope": "multi_episode_128_metadata_baseline",
1537
- "status": "not_supported_by_metadata_only_package",
1538
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1539
- "normalized_score": null,
1540
- "raw_text": "n/a",
1541
- "status_label": "not supported"
1542
- },
1543
  "cosmos3_super_reasoner": {
1544
  "raw": null,
1545
  "metric_key": "f1",
@@ -1611,7 +1611,7 @@
1611
  "raw": 0.004579592783699693,
1612
  "metric_key": "macro_f1",
1613
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
1614
- "scope": "multi_episode_128_metadata_baseline",
1615
  "status": "scored",
1616
  "reason": null,
1617
  "normalized_score": 0.004579592783699693,
@@ -1622,7 +1622,7 @@
1622
  "raw": 0.0029821307969142615,
1623
  "metric_key": "macro_f1",
1624
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
1625
- "scope": "multi_episode_128_metadata_baseline",
1626
  "status": "scored",
1627
  "reason": null,
1628
  "normalized_score": 0.0029821307969142615,
@@ -1722,7 +1722,7 @@
1722
  "raw": 0.0001206030150753769,
1723
  "metric_key": "macro_f1",
1724
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
1725
- "scope": "multi_episode_128_metadata_baseline",
1726
  "status": "scored",
1727
  "reason": null,
1728
  "normalized_score": 0.0001206030150753769,
@@ -1733,7 +1733,7 @@
1733
  "raw": 2.086049543676662e-05,
1734
  "metric_key": "macro_f1",
1735
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
1736
- "scope": "multi_episode_128_metadata_baseline",
1737
  "status": "scored",
1738
  "reason": null,
1739
  "normalized_score": 2.086049543676662e-05,
@@ -1822,7 +1822,7 @@
1822
  "raw": null,
1823
  "metric_key": "macro_f1",
1824
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
1825
- "scope": "multi_episode_128_metadata_baseline",
1826
  "status": "unsupported_without_required_target",
1827
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
1828
  "normalized_score": null,
@@ -1855,9 +1855,9 @@
1855
  "raw": null,
1856
  "metric_key": "macro_f1",
1857
  "source": null,
1858
- "scope": "multi_episode_128_metadata_baseline",
1859
  "status": "not_supported_by_metadata_only_package",
1860
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
1861
  "normalized_score": null,
1862
  "raw_text": "n/a",
1863
  "status_label": "not supported"
@@ -1955,7 +1955,7 @@
1955
  "raw": 0.0,
1956
  "metric_key": "macro_f1",
1957
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
1958
- "scope": "multi_episode_128_metadata_baseline",
1959
  "status": "scored",
1960
  "reason": null,
1961
  "normalized_score": 0.0,
@@ -1966,7 +1966,7 @@
1966
  "raw": 0.0,
1967
  "metric_key": "macro_f1",
1968
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
1969
- "scope": "multi_episode_128_metadata_baseline",
1970
  "status": "scored",
1971
  "reason": null,
1972
  "normalized_score": 0.0,
@@ -2055,7 +2055,7 @@
2055
  "raw": 0.17656983343047333,
2056
  "metric_key": "micro_f1",
2057
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
2058
- "scope": "multi_episode_128_metadata_baseline",
2059
  "status": "scored",
2060
  "reason": null,
2061
  "normalized_score": 0.17656983343047333,
@@ -2066,7 +2066,7 @@
2066
  "raw": 0.17418550827844048,
2067
  "metric_key": "micro_f1",
2068
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
2069
- "scope": "multi_episode_128_metadata_baseline",
2070
  "status": "scored",
2071
  "reason": null,
2072
  "normalized_score": 0.17418550827844048,
@@ -2152,15 +2152,26 @@
2152
  "status_label": "scored"
2153
  },
2154
  "metadata128_simple": {
2155
- "raw": null,
2156
  "metric_key": "mae",
2157
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
2158
- "scope": "multi_episode_128_metadata_baseline",
2159
- "status": "unsupported_without_required_target",
2160
- "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package",
2161
- "normalized_score": null,
2162
- "raw_text": "n/a",
2163
- "status_label": "unsupported"
2164
  },
2165
  "raw128_simple": {
2166
  "raw": 0.22941437363624573,
@@ -2184,17 +2195,6 @@
2184
  "raw_text": "0.2530",
2185
  "status_label": "scored"
2186
  },
2187
- "metadata128_neural_mlp": {
2188
- "raw": null,
2189
- "metric_key": "mae",
2190
- "source": null,
2191
- "scope": "multi_episode_128_metadata_baseline",
2192
- "status": "not_supported_by_metadata_only_package",
2193
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2194
- "normalized_score": null,
2195
- "raw_text": "n/a",
2196
- "status_label": "not supported"
2197
- },
2198
  "qwen3_omni_v6_lora": {
2199
  "raw": null,
2200
  "metric_key": "mae",
@@ -2266,7 +2266,7 @@
2266
  "raw": null,
2267
  "metric_key": "mrr",
2268
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
2269
- "scope": "multi_episode_128_metadata_baseline",
2270
  "status": "unsupported_without_required_target",
2271
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
2272
  "normalized_score": null,
@@ -2299,9 +2299,9 @@
2299
  "raw": null,
2300
  "metric_key": "mrr",
2301
  "source": null,
2302
- "scope": "multi_episode_128_metadata_baseline",
2303
  "status": "not_supported_by_metadata_only_package",
2304
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required",
2305
  "normalized_score": null,
2306
  "raw_text": "n/a",
2307
  "status_label": "not supported"
@@ -2388,7 +2388,7 @@
2388
  "raw": 624.8108520507812,
2389
  "metric_key": "mae",
2390
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
2391
- "scope": "multi_episode_128_metadata_baseline",
2392
  "status": "scored",
2393
  "reason": null,
2394
  "normalized_score": 0.016864874132806403,
@@ -2399,7 +2399,7 @@
2399
  "raw": 41.4664421081543,
2400
  "metric_key": "mae",
2401
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
2402
- "scope": "multi_episode_128_metadata_baseline",
2403
  "status": "scored",
2404
  "reason": null,
2405
  "normalized_score": 0.25411768748242325,
@@ -2456,18 +2456,18 @@
2456
  "model_branch_cards": [
2457
  {
2458
  "id": "metadata128_simple",
2459
- "title": "128ep Metadata Simple",
2460
  "status": "a100_rerun_pass",
2461
- "coverage": "20 records / 13 scored JSONL-supported axes",
2462
  "headline": "34,269 rows; train/val/test 25,629/4,608/4,032",
2463
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2464
  },
2465
  {
2466
  "id": "metadata128_neural_mlp",
2467
- "title": "128ep Metadata NN",
2468
  "status": "a100_rerun_pass",
2469
- "coverage": "20 records / 13 scored JSONL-supported axes",
2470
- "headline": "compact MLP heads over metadata/text features",
2471
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2472
  },
2473
  {
@@ -2562,7 +2562,7 @@
2562
  "task_id": "timeline_action",
2563
  "task_label": "Action Recognition",
2564
  "series_id": "metadata128_simple",
2565
- "method": "128ep Metadata Simple",
2566
  "status": "scored",
2567
  "status_label": "scored",
2568
  "scored": true,
@@ -2572,7 +2572,7 @@
2572
  "normalized_score": 0.008252821966746326,
2573
  "metric_key": "macro_f1",
2574
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
2575
- "scope": "multi_episode_128_metadata_baseline",
2576
  "reason": null
2577
  },
2578
  {
@@ -2580,7 +2580,7 @@
2580
  "task_id": "timeline_action",
2581
  "task_label": "Action Recognition",
2582
  "series_id": "metadata128_neural_mlp",
2583
- "method": "128ep Metadata NN",
2584
  "status": "scored",
2585
  "status_label": "scored",
2586
  "scored": true,
@@ -2590,7 +2590,7 @@
2590
  "normalized_score": 0.004175793689174209,
2591
  "metric_key": "macro_f1",
2592
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
2593
- "scope": "multi_episode_128_metadata_baseline",
2594
  "reason": null
2595
  },
2596
  {
@@ -2724,7 +2724,7 @@
2724
  "task_id": "timeline_subtask",
2725
  "task_label": "Procedure Step Recognition",
2726
  "series_id": "metadata128_simple",
2727
- "method": "128ep Metadata Simple",
2728
  "status": "scored",
2729
  "status_label": "scored",
2730
  "scored": true,
@@ -2734,7 +2734,7 @@
2734
  "normalized_score": 0.00019512195121951218,
2735
  "metric_key": "macro_f1",
2736
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
2737
- "scope": "multi_episode_128_metadata_baseline",
2738
  "reason": null
2739
  },
2740
  {
@@ -2742,7 +2742,7 @@
2742
  "task_id": "timeline_subtask",
2743
  "task_label": "Procedure Step Recognition",
2744
  "series_id": "metadata128_neural_mlp",
2745
- "method": "128ep Metadata NN",
2746
  "status": "scored",
2747
  "status_label": "scored",
2748
  "scored": true,
@@ -2752,7 +2752,7 @@
2752
  "normalized_score": 7.207207207207208e-05,
2753
  "metric_key": "macro_f1",
2754
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
2755
- "scope": "multi_episode_128_metadata_baseline",
2756
  "reason": null
2757
  },
2758
  {
@@ -2886,7 +2886,7 @@
2886
  "task_id": "transition_detection",
2887
  "task_label": "Action Boundary Detection",
2888
  "series_id": "metadata128_simple",
2889
- "method": "128ep Metadata Simple",
2890
  "status": "scored",
2891
  "status_label": "scored",
2892
  "scored": true,
@@ -2896,7 +2896,7 @@
2896
  "normalized_score": 0.29652162550029315,
2897
  "metric_key": "macro_f1",
2898
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
2899
- "scope": "multi_episode_128_metadata_baseline",
2900
  "reason": null
2901
  },
2902
  {
@@ -2904,7 +2904,7 @@
2904
  "task_id": "transition_detection",
2905
  "task_label": "Action Boundary Detection",
2906
  "series_id": "metadata128_neural_mlp",
2907
- "method": "128ep Metadata NN",
2908
  "status": "scored",
2909
  "status_label": "scored",
2910
  "scored": true,
@@ -2914,7 +2914,7 @@
2914
  "normalized_score": 0.4841733292368365,
2915
  "metric_key": "macro_f1",
2916
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
2917
- "scope": "multi_episode_128_metadata_baseline",
2918
  "reason": null
2919
  },
2920
  {
@@ -3048,7 +3048,7 @@
3048
  "task_id": "next_action",
3049
  "task_label": "Next-Action Prediction",
3050
  "series_id": "metadata128_simple",
3051
- "method": "128ep Metadata Simple",
3052
  "status": "scored",
3053
  "status_label": "scored",
3054
  "scored": true,
@@ -3058,7 +3058,7 @@
3058
  "normalized_score": 0.006514774539765508,
3059
  "metric_key": "macro_f1",
3060
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
3061
- "scope": "multi_episode_128_metadata_baseline",
3062
  "reason": null
3063
  },
3064
  {
@@ -3066,7 +3066,7 @@
3066
  "task_id": "next_action",
3067
  "task_label": "Next-Action Prediction",
3068
  "series_id": "metadata128_neural_mlp",
3069
- "method": "128ep Metadata NN",
3070
  "status": "scored",
3071
  "status_label": "scored",
3072
  "scored": true,
@@ -3076,7 +3076,7 @@
3076
  "normalized_score": 0.004910507980164745,
3077
  "metric_key": "macro_f1",
3078
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
3079
- "scope": "multi_episode_128_metadata_baseline",
3080
  "reason": null
3081
  },
3082
  {
@@ -3210,36 +3210,36 @@
3210
  "task_id": "hand_trajectory_forecast",
3211
  "task_label": "Hand Trajectory Forecasting",
3212
  "series_id": "metadata128_simple",
3213
- "method": "128ep Metadata Simple",
3214
- "status": "unsupported_without_required_target",
3215
- "status_label": "unsupported",
3216
- "scored": false,
3217
  "proxy_scored": false,
3218
- "raw": null,
3219
- "raw_text": "n/a",
3220
- "normalized_score": null,
3221
  "metric_key": "mpjpe",
3222
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
3223
- "scope": "multi_episode_128_metadata_baseline",
3224
- "reason": "requires future hand-joint trajectories from raw sensor feature NPZ blocks, which are not in the public 128 package"
3225
  },
3226
  {
3227
  "task_number": 5,
3228
  "task_id": "hand_trajectory_forecast",
3229
  "task_label": "Hand Trajectory Forecasting",
3230
  "series_id": "metadata128_neural_mlp",
3231
- "method": "128ep Metadata NN",
3232
- "status": "not_supported_by_metadata_only_package",
3233
- "status_label": "not supported",
3234
- "scored": false,
3235
  "proxy_scored": false,
3236
- "raw": null,
3237
- "raw_text": "n/a",
3238
- "normalized_score": null,
3239
  "metric_key": "mpjpe",
3240
- "source": null,
3241
- "scope": "multi_episode_128_metadata_baseline",
3242
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3243
  },
3244
  {
3245
  "task_number": 5,
@@ -3372,7 +3372,7 @@
3372
  "task_id": "contact_prediction",
3373
  "task_label": "Contact State Prediction",
3374
  "series_id": "metadata128_simple",
3375
- "method": "128ep Metadata Simple",
3376
  "status": "scored",
3377
  "status_label": "scored",
3378
  "scored": true,
@@ -3382,7 +3382,7 @@
3382
  "normalized_score": 0.4381481308057444,
3383
  "metric_key": "macro_f1",
3384
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
3385
- "scope": "multi_episode_128_metadata_baseline",
3386
  "reason": null
3387
  },
3388
  {
@@ -3390,7 +3390,7 @@
3390
  "task_id": "contact_prediction",
3391
  "task_label": "Contact State Prediction",
3392
  "series_id": "metadata128_neural_mlp",
3393
- "method": "128ep Metadata NN",
3394
  "status": "scored",
3395
  "status_label": "scored",
3396
  "scored": true,
@@ -3400,7 +3400,7 @@
3400
  "normalized_score": 0.5682695682695682,
3401
  "metric_key": "macro_f1",
3402
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
3403
- "scope": "multi_episode_128_metadata_baseline",
3404
  "reason": null
3405
  },
3406
  {
@@ -3534,7 +3534,7 @@
3534
  "task_id": "object_relevance",
3535
  "task_label": "Object Relevance Prediction",
3536
  "series_id": "metadata128_simple",
3537
- "method": "128ep Metadata Simple",
3538
  "status": "scored",
3539
  "status_label": "scored",
3540
  "scored": true,
@@ -3544,7 +3544,7 @@
3544
  "normalized_score": 0.17764578833693304,
3545
  "metric_key": "micro_f1",
3546
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
3547
- "scope": "multi_episode_128_metadata_baseline",
3548
  "reason": null
3549
  },
3550
  {
@@ -3552,7 +3552,7 @@
3552
  "task_id": "object_relevance",
3553
  "task_label": "Object Relevance Prediction",
3554
  "series_id": "metadata128_neural_mlp",
3555
- "method": "128ep Metadata NN",
3556
  "status": "scored",
3557
  "status_label": "scored",
3558
  "scored": true,
@@ -3562,7 +3562,7 @@
3562
  "normalized_score": 0.18662723837686876,
3563
  "metric_key": "micro_f1",
3564
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
3565
- "scope": "multi_episode_128_metadata_baseline",
3566
  "reason": null
3567
  },
3568
  {
@@ -3696,7 +3696,7 @@
3696
  "task_id": "caption_grounding",
3697
  "task_label": "Language Grounding",
3698
  "series_id": "metadata128_simple",
3699
- "method": "128ep Metadata Simple",
3700
  "status": "scored",
3701
  "status_label": "scored",
3702
  "scored": true,
@@ -3706,7 +3706,7 @@
3706
  "normalized_score": 0.002332374220713973,
3707
  "metric_key": "mrr",
3708
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
3709
- "scope": "multi_episode_128_metadata_baseline",
3710
  "reason": null
3711
  },
3712
  {
@@ -3714,7 +3714,7 @@
3714
  "task_id": "caption_grounding",
3715
  "task_label": "Language Grounding",
3716
  "series_id": "metadata128_neural_mlp",
3717
- "method": "128ep Metadata NN",
3718
  "status": "scored",
3719
  "status_label": "scored",
3720
  "scored": true,
@@ -3724,7 +3724,7 @@
3724
  "normalized_score": 0.008236799389123917,
3725
  "metric_key": "mrr",
3726
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
3727
- "scope": "multi_episode_128_metadata_baseline",
3728
  "reason": null
3729
  },
3730
  {
@@ -3858,36 +3858,36 @@
3858
  "task_id": "cross_modal_retrieval",
3859
  "task_label": "Cross-Modal Retrieval",
3860
  "series_id": "metadata128_simple",
3861
- "method": "128ep Metadata Simple",
3862
- "status": "unsupported_without_required_target",
3863
- "status_label": "unsupported",
3864
- "scored": false,
3865
  "proxy_scored": false,
3866
- "raw": null,
3867
- "raw_text": "n/a",
3868
- "normalized_score": null,
3869
  "metric_key": "mrr",
3870
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
3871
- "scope": "multi_episode_128_metadata_baseline",
3872
- "reason": "requires paired motion/IMU/camera/audio/depth feature blocks, which are not in the public 128 package"
3873
  },
3874
  {
3875
  "task_number": 9,
3876
  "task_id": "cross_modal_retrieval",
3877
  "task_label": "Cross-Modal Retrieval",
3878
  "series_id": "metadata128_neural_mlp",
3879
- "method": "128ep Metadata NN",
3880
- "status": "not_supported_by_metadata_only_package",
3881
- "status_label": "not supported",
3882
- "scored": false,
3883
  "proxy_scored": false,
3884
- "raw": null,
3885
- "raw_text": "n/a",
3886
- "normalized_score": null,
3887
  "metric_key": "mrr",
3888
- "source": null,
3889
- "scope": "multi_episode_128_metadata_baseline",
3890
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
3891
  },
3892
  {
3893
  "task_number": 9,
@@ -4020,36 +4020,36 @@
4020
  "task_id": "modality_reconstruction",
4021
  "task_label": "Cross-Modal Reconstruction",
4022
  "series_id": "metadata128_simple",
4023
- "method": "128ep Metadata Simple",
4024
- "status": "unsupported_without_required_target",
4025
- "status_label": "unsupported",
4026
- "scored": false,
4027
  "proxy_scored": false,
4028
- "raw": null,
4029
- "raw_text": "n/a",
4030
- "normalized_score": null,
4031
  "metric_key": "r2",
4032
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
4033
- "scope": "multi_episode_128_metadata_baseline",
4034
- "reason": "requires source and target modality feature blocks such as depth/video vectors, which are not in the public 128 package"
4035
  },
4036
  {
4037
  "task_number": 10,
4038
  "task_id": "modality_reconstruction",
4039
  "task_label": "Cross-Modal Reconstruction",
4040
  "series_id": "metadata128_neural_mlp",
4041
- "method": "128ep Metadata NN",
4042
- "status": "not_supported_by_metadata_only_package",
4043
- "status_label": "not supported",
4044
- "scored": false,
4045
  "proxy_scored": false,
4046
- "raw": null,
4047
- "raw_text": "n/a",
4048
- "normalized_score": null,
4049
  "metric_key": "r2",
4050
- "source": null,
4051
- "scope": "multi_episode_128_metadata_baseline",
4052
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4053
  },
4054
  {
4055
  "task_number": 10,
@@ -4182,7 +4182,7 @@
4182
  "task_id": "temporal_order",
4183
  "task_label": "Temporal Order Verification",
4184
  "series_id": "metadata128_simple",
4185
- "method": "128ep Metadata Simple",
4186
  "status": "scored",
4187
  "status_label": "scored",
4188
  "scored": true,
@@ -4192,7 +4192,7 @@
4192
  "normalized_score": 0.4198864140782312,
4193
  "metric_key": "f1",
4194
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
4195
- "scope": "multi_episode_128_metadata_baseline",
4196
  "reason": null
4197
  },
4198
  {
@@ -4200,7 +4200,7 @@
4200
  "task_id": "temporal_order",
4201
  "task_label": "Temporal Order Verification",
4202
  "series_id": "metadata128_neural_mlp",
4203
- "method": "128ep Metadata NN",
4204
  "status": "scored",
4205
  "status_label": "scored",
4206
  "scored": true,
@@ -4210,7 +4210,7 @@
4210
  "normalized_score": 0.8252408266656923,
4211
  "metric_key": "f1",
4212
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
4213
- "scope": "multi_episode_128_metadata_baseline",
4214
  "reason": null
4215
  },
4216
  {
@@ -4344,36 +4344,36 @@
4344
  "task_id": "misalignment_detection",
4345
  "task_label": "Multimodal Synchronization Detection",
4346
  "series_id": "metadata128_simple",
4347
- "method": "128ep Metadata Simple",
4348
- "status": "unsupported_without_required_target",
4349
- "status_label": "unsupported",
4350
- "scored": false,
4351
  "proxy_scored": false,
4352
- "raw": null,
4353
- "raw_text": "n/a",
4354
- "normalized_score": null,
4355
  "metric_key": "f1",
4356
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
4357
- "scope": "multi_episode_128_metadata_baseline",
4358
- "reason": "requires deliberately shifted cross-modal feature pairs, which cannot be reconstructed from the public JSONL labels alone"
4359
  },
4360
  {
4361
  "task_number": 12,
4362
  "task_id": "misalignment_detection",
4363
  "task_label": "Multimodal Synchronization Detection",
4364
  "series_id": "metadata128_neural_mlp",
4365
- "method": "128ep Metadata NN",
4366
- "status": "not_supported_by_metadata_only_package",
4367
- "status_label": "not supported",
4368
- "scored": false,
4369
  "proxy_scored": false,
4370
- "raw": null,
4371
- "raw_text": "n/a",
4372
- "normalized_score": null,
4373
  "metric_key": "f1",
4374
- "source": null,
4375
- "scope": "multi_episode_128_metadata_baseline",
4376
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4377
  },
4378
  {
4379
  "task_number": 12,
@@ -4506,7 +4506,7 @@
4506
  "task_id": "long_horizon_next_action",
4507
  "task_label": "Long-Horizon Next-Action Forecasting",
4508
  "series_id": "metadata128_simple",
4509
- "method": "128ep Metadata Simple",
4510
  "status": "scored",
4511
  "status_label": "scored",
4512
  "scored": true,
@@ -4516,7 +4516,7 @@
4516
  "normalized_score": 0.004579592783699693,
4517
  "metric_key": "macro_f1",
4518
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
4519
- "scope": "multi_episode_128_metadata_baseline",
4520
  "reason": null
4521
  },
4522
  {
@@ -4524,7 +4524,7 @@
4524
  "task_id": "long_horizon_next_action",
4525
  "task_label": "Long-Horizon Next-Action Forecasting",
4526
  "series_id": "metadata128_neural_mlp",
4527
- "method": "128ep Metadata NN",
4528
  "status": "scored",
4529
  "status_label": "scored",
4530
  "scored": true,
@@ -4534,7 +4534,7 @@
4534
  "normalized_score": 0.0029821307969142615,
4535
  "metric_key": "macro_f1",
4536
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
4537
- "scope": "multi_episode_128_metadata_baseline",
4538
  "reason": null
4539
  },
4540
  {
@@ -4668,7 +4668,7 @@
4668
  "task_id": "next_subtask_forecast",
4669
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4670
  "series_id": "metadata128_simple",
4671
- "method": "128ep Metadata Simple",
4672
  "status": "scored",
4673
  "status_label": "scored",
4674
  "scored": true,
@@ -4678,7 +4678,7 @@
4678
  "normalized_score": 0.0001206030150753769,
4679
  "metric_key": "macro_f1",
4680
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
4681
- "scope": "multi_episode_128_metadata_baseline",
4682
  "reason": null
4683
  },
4684
  {
@@ -4686,7 +4686,7 @@
4686
  "task_id": "next_subtask_forecast",
4687
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4688
  "series_id": "metadata128_neural_mlp",
4689
- "method": "128ep Metadata NN",
4690
  "status": "scored",
4691
  "status_label": "scored",
4692
  "scored": true,
@@ -4696,7 +4696,7 @@
4696
  "normalized_score": 2.086049543676662e-05,
4697
  "metric_key": "macro_f1",
4698
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
4699
- "scope": "multi_episode_128_metadata_baseline",
4700
  "reason": null
4701
  },
4702
  {
@@ -4830,7 +4830,7 @@
4830
  "task_id": "interaction_text_prediction",
4831
  "task_label": "Interaction Text Prediction",
4832
  "series_id": "metadata128_simple",
4833
- "method": "128ep Metadata Simple",
4834
  "status": "unsupported_without_required_target",
4835
  "status_label": "unsupported",
4836
  "scored": false,
@@ -4840,7 +4840,7 @@
4840
  "normalized_score": null,
4841
  "metric_key": "macro_f1",
4842
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
4843
- "scope": "multi_episode_128_metadata_baseline",
4844
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
4845
  },
4846
  {
@@ -4848,7 +4848,7 @@
4848
  "task_id": "interaction_text_prediction",
4849
  "task_label": "Interaction Text Prediction",
4850
  "series_id": "metadata128_neural_mlp",
4851
- "method": "128ep Metadata NN",
4852
  "status": "not_supported_by_metadata_only_package",
4853
  "status_label": "not supported",
4854
  "scored": false,
@@ -4858,8 +4858,8 @@
4858
  "normalized_score": null,
4859
  "metric_key": "macro_f1",
4860
  "source": null,
4861
- "scope": "multi_episode_128_metadata_baseline",
4862
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
4863
  },
4864
  {
4865
  "task_number": 15,
@@ -4992,7 +4992,7 @@
4992
  "task_id": "action_object_relation",
4993
  "task_label": "Action-Object Relation Prediction",
4994
  "series_id": "metadata128_simple",
4995
- "method": "128ep Metadata Simple",
4996
  "status": "scored",
4997
  "status_label": "scored",
4998
  "scored": true,
@@ -5002,7 +5002,7 @@
5002
  "normalized_score": 0.0,
5003
  "metric_key": "macro_f1",
5004
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
5005
- "scope": "multi_episode_128_metadata_baseline",
5006
  "reason": null
5007
  },
5008
  {
@@ -5010,7 +5010,7 @@
5010
  "task_id": "action_object_relation",
5011
  "task_label": "Action-Object Relation Prediction",
5012
  "series_id": "metadata128_neural_mlp",
5013
- "method": "128ep Metadata NN",
5014
  "status": "scored",
5015
  "status_label": "scored",
5016
  "scored": true,
@@ -5020,7 +5020,7 @@
5020
  "normalized_score": 0.0,
5021
  "metric_key": "macro_f1",
5022
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
5023
- "scope": "multi_episode_128_metadata_baseline",
5024
  "reason": null
5025
  },
5026
  {
@@ -5154,7 +5154,7 @@
5154
  "task_id": "object_set_forecast",
5155
  "task_label": "Future Object-Set Forecasting",
5156
  "series_id": "metadata128_simple",
5157
- "method": "128ep Metadata Simple",
5158
  "status": "scored",
5159
  "status_label": "scored",
5160
  "scored": true,
@@ -5164,7 +5164,7 @@
5164
  "normalized_score": 0.17656983343047333,
5165
  "metric_key": "micro_f1",
5166
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
5167
- "scope": "multi_episode_128_metadata_baseline",
5168
  "reason": null
5169
  },
5170
  {
@@ -5172,7 +5172,7 @@
5172
  "task_id": "object_set_forecast",
5173
  "task_label": "Future Object-Set Forecasting",
5174
  "series_id": "metadata128_neural_mlp",
5175
- "method": "128ep Metadata NN",
5176
  "status": "scored",
5177
  "status_label": "scored",
5178
  "scored": true,
@@ -5182,7 +5182,7 @@
5182
  "normalized_score": 0.17418550827844048,
5183
  "metric_key": "micro_f1",
5184
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
5185
- "scope": "multi_episode_128_metadata_baseline",
5186
  "reason": null
5187
  },
5188
  {
@@ -5316,36 +5316,36 @@
5316
  "task_id": "imu_to_hand_pose",
5317
  "task_label": "IMU-to-Hand Pose Reconstruction",
5318
  "series_id": "metadata128_simple",
5319
- "method": "128ep Metadata Simple",
5320
- "status": "unsupported_without_required_target",
5321
- "status_label": "unsupported",
5322
- "scored": false,
5323
  "proxy_scored": false,
5324
- "raw": null,
5325
- "raw_text": "n/a",
5326
- "normalized_score": null,
5327
  "metric_key": "mae",
5328
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
5329
- "scope": "multi_episode_128_metadata_baseline",
5330
- "reason": "requires raw IMU and hand-joint feature blocks, which are not in the public 128 JSONL metadata package"
5331
  },
5332
  {
5333
  "task_number": 18,
5334
  "task_id": "imu_to_hand_pose",
5335
  "task_label": "IMU-to-Hand Pose Reconstruction",
5336
  "series_id": "metadata128_neural_mlp",
5337
- "method": "128ep Metadata NN",
5338
- "status": "not_supported_by_metadata_only_package",
5339
- "status_label": "not supported",
5340
- "scored": false,
5341
  "proxy_scored": false,
5342
- "raw": null,
5343
- "raw_text": "n/a",
5344
- "normalized_score": null,
5345
  "metric_key": "mae",
5346
- "source": null,
5347
- "scope": "multi_episode_128_metadata_baseline",
5348
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5349
  },
5350
  {
5351
  "task_number": 18,
@@ -5478,7 +5478,7 @@
5478
  "task_id": "camera_view_sync_retrieval",
5479
  "task_label": "Camera-View Synchronization Retrieval",
5480
  "series_id": "metadata128_simple",
5481
- "method": "128ep Metadata Simple",
5482
  "status": "unsupported_without_required_target",
5483
  "status_label": "unsupported",
5484
  "scored": false,
@@ -5488,7 +5488,7 @@
5488
  "normalized_score": null,
5489
  "metric_key": "mrr",
5490
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
5491
- "scope": "multi_episode_128_metadata_baseline",
5492
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
5493
  },
5494
  {
@@ -5496,7 +5496,7 @@
5496
  "task_id": "camera_view_sync_retrieval",
5497
  "task_label": "Camera-View Synchronization Retrieval",
5498
  "series_id": "metadata128_neural_mlp",
5499
- "method": "128ep Metadata NN",
5500
  "status": "not_supported_by_metadata_only_package",
5501
  "status_label": "not supported",
5502
  "scored": false,
@@ -5506,8 +5506,8 @@
5506
  "normalized_score": null,
5507
  "metric_key": "mrr",
5508
  "source": null,
5509
- "scope": "multi_episode_128_metadata_baseline",
5510
- "reason": "the 128-episode metadata/text rerun did not produce this task target; raw sensor blocks or a task-specific metadata target builder are required"
5511
  },
5512
  {
5513
  "task_number": 19,
@@ -5640,7 +5640,7 @@
5640
  "task_id": "time_to_transition",
5641
  "task_label": "Time-to-Next-Transition Regression",
5642
  "series_id": "metadata128_simple",
5643
- "method": "128ep Metadata Simple",
5644
  "status": "scored",
5645
  "status_label": "scored",
5646
  "scored": true,
@@ -5650,7 +5650,7 @@
5650
  "normalized_score": 0.016864874132806403,
5651
  "metric_key": "mae",
5652
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
5653
- "scope": "multi_episode_128_metadata_baseline",
5654
  "reason": null
5655
  },
5656
  {
@@ -5658,7 +5658,7 @@
5658
  "task_id": "time_to_transition",
5659
  "task_label": "Time-to-Next-Transition Regression",
5660
  "series_id": "metadata128_neural_mlp",
5661
- "method": "128ep Metadata NN",
5662
  "status": "scored",
5663
  "status_label": "scored",
5664
  "scored": true,
@@ -5668,7 +5668,7 @@
5668
  "normalized_score": 0.25411768748242325,
5669
  "metric_key": "mae",
5670
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
5671
- "scope": "multi_episode_128_metadata_baseline",
5672
  "reason": null
5673
  },
5674
  {
 
1
  {
2
  "title": "Unified 20-Task Model Radar",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-18T12:52:26+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
8
+ "scored_method_task_count": 143,
9
  "normalization_policy": {
10
  "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
11
  "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
12
  "raw_values": "raw metric values, metric keys, and sources are retained in this JSON; the SVG is an overview, not a replacement for the metric table",
13
  "result_record_policy": "every method has 20 task records; records without a numeric score carry explicit unsupported/not-evaluated status and reason fields",
14
  "foundation_model_overlay": "Qwen3/Cosmos points are plotted only on task-aligned axes. Scoreless records mean the public result does not evaluate that task contract.",
15
+ "metadata_128_overlay": "128-episode aligned baselines have 20 records. Numeric scores come from JSONL metadata/text tasks plus staged sensor-block targets when the processed target exists; raw interaction text and paired camera-view embeddings remain explicit gaps.",
16
  "raw_128_overlay": "128-episode raw-feature baselines use staged sensor NPZ features. Eighteen axes use direct task targets; interaction text and camera-view sync are completed with documented compact proxies because raw interaction strings and paired video-view embeddings are absent from the 128 export."
17
  },
18
  "series": [
 
64
  },
65
  {
66
  "id": "metadata128_simple",
67
+ "label": "128ep Aligned Simple",
68
  "short_label": "128-S",
69
  "color": "#ffd166",
70
+ "kind": "partial_128_episode_aligned_baseline",
71
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
72
  "stroke_dasharray": "9 6",
73
+ "method_detail": "128-episode aligned simple baselines: JSONL metadata/text tasks plus staged sensor-block tasks where the processed target exists.",
74
  "plotted_as": "colored point overlay",
75
  "result_record_count": 20,
76
+ "scored_task_count": 18,
77
+ "covered_task_count": 18,
78
  "proxy_scored_task_count": 0,
79
+ "scoreless_task_count": 2,
80
+ "unsupported_task_count": 2,
81
  "not_evaluated_task_count": 0,
82
  "status_counts": {
83
+ "scored": 18,
84
+ "unsupported_without_required_target": 2
85
  },
86
+ "coverage_fraction": 0.9,
87
  "result_record_fraction": 1.0
88
  },
89
  {
90
  "id": "metadata128_neural_mlp",
91
+ "label": "128ep Aligned NN",
92
  "short_label": "128-NN",
93
  "color": "#f472b6",
94
+ "kind": "partial_128_episode_aligned_baseline",
95
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
96
  "stroke_dasharray": "3 6",
97
+ "method_detail": "128-episode aligned MLP baselines: JSONL metadata/text tasks plus staged sensor-block tasks where the processed target exists.",
98
  "plotted_as": "colored point overlay",
99
  "result_record_count": 20,
100
+ "scored_task_count": 18,
101
+ "covered_task_count": 18,
102
  "proxy_scored_task_count": 0,
103
+ "scoreless_task_count": 2,
104
+ "unsupported_task_count": 2,
105
  "not_evaluated_task_count": 0,
106
  "status_counts": {
107
+ "not_supported_by_metadata_only_package": 2,
108
+ "scored": 18
109
  },
110
+ "coverage_fraction": 0.9,
111
  "result_record_fraction": 1.0
112
  },
113
  {
 
301
  "raw": 0.008252821966746326,
302
  "metric_key": "macro_f1",
303
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
304
+ "scope": "multi_episode_128_aligned_baseline",
305
  "status": "scored",
306
  "reason": null,
307
  "normalized_score": 0.008252821966746326,
 
312
  "raw": 0.004175793689174209,
313
  "metric_key": "macro_f1",
314
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
315
+ "scope": "multi_episode_128_aligned_baseline",
316
  "status": "scored",
317
  "reason": null,
318
  "normalized_score": 0.004175793689174209,
 
401
  "raw": 0.00019512195121951218,
402
  "metric_key": "macro_f1",
403
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
404
+ "scope": "multi_episode_128_aligned_baseline",
405
  "status": "scored",
406
  "reason": null,
407
  "normalized_score": 0.00019512195121951218,
 
412
  "raw": 7.207207207207208e-05,
413
  "metric_key": "macro_f1",
414
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
415
+ "scope": "multi_episode_128_aligned_baseline",
416
  "status": "scored",
417
  "reason": null,
418
  "normalized_score": 7.207207207207208e-05,
 
523
  "raw": 0.29652162550029315,
524
  "metric_key": "macro_f1",
525
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
526
+ "scope": "multi_episode_128_aligned_baseline",
527
  "status": "scored",
528
  "reason": null,
529
  "normalized_score": 0.29652162550029315,
 
534
  "raw": 0.4841733292368365,
535
  "metric_key": "macro_f1",
536
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
537
+ "scope": "multi_episode_128_aligned_baseline",
538
  "status": "scored",
539
  "reason": null,
540
  "normalized_score": 0.4841733292368365,
 
634
  "raw": 0.006514774539765508,
635
  "metric_key": "macro_f1",
636
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
637
+ "scope": "multi_episode_128_aligned_baseline",
638
  "status": "scored",
639
  "reason": null,
640
  "normalized_score": 0.006514774539765508,
 
645
  "raw": 0.004910507980164745,
646
  "metric_key": "macro_f1",
647
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
648
+ "scope": "multi_episode_128_aligned_baseline",
649
  "status": "scored",
650
  "reason": null,
651
  "normalized_score": 0.004910507980164745,
 
709
  "status_label": "scored"
710
  },
711
  "metadata128_simple": {
712
+ "raw": 8.817333221435547,
713
  "metric_key": "mpjpe",
714
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
715
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
716
+ "status": "scored",
717
+ "reason": null,
718
+ "normalized_score": 0.012231610603598841,
719
+ "raw_text": "8.817",
720
+ "status_label": "scored"
721
+ },
722
+ "metadata128_neural_mlp": {
723
+ "raw": 0.429434210062027,
724
+ "metric_key": "mpjpe",
725
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/hand_trajectory_forecast/metrics.json",
726
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
727
+ "status": "scored",
728
+ "reason": null,
729
+ "normalized_score": 0.25114484128127007,
730
+ "raw_text": "0.4294",
731
+ "status_label": "scored"
732
  },
733
  "raw128_simple": {
734
  "raw": 0.2729249894618988,
 
752
  "raw_text": "0.1848",
753
  "status_label": "scored"
754
  },
755
  "qwen3_omni_v6_lora": {
756
  "raw": null,
757
  "metric_key": "mpjpe",
 
856
  "raw": 0.4381481308057444,
857
  "metric_key": "macro_f1",
858
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
859
+ "scope": "multi_episode_128_aligned_baseline",
860
  "status": "scored",
861
  "reason": null,
862
  "normalized_score": 0.4381481308057444,
 
867
  "raw": 0.5682695682695682,
868
  "metric_key": "macro_f1",
869
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
870
+ "scope": "multi_episode_128_aligned_baseline",
871
  "status": "scored",
872
  "reason": null,
873
  "normalized_score": 0.5682695682695682,
 
956
  "raw": 0.17764578833693304,
957
  "metric_key": "micro_f1",
958
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
959
+ "scope": "multi_episode_128_aligned_baseline",
960
  "status": "scored",
961
  "reason": null,
962
  "normalized_score": 0.17764578833693304,
 
967
  "raw": 0.18662723837686876,
968
  "metric_key": "micro_f1",
969
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
970
+ "scope": "multi_episode_128_aligned_baseline",
971
  "status": "scored",
972
  "reason": null,
973
  "normalized_score": 0.18662723837686876,
 
1056
  "raw": 0.002332374220713973,
1057
  "metric_key": "mrr",
1058
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
1059
+ "scope": "multi_episode_128_aligned_baseline",
1060
  "status": "scored",
1061
  "reason": null,
1062
  "normalized_score": 0.002332374220713973,
 
1067
  "raw": 0.008236799389123917,
1068
  "metric_key": "mrr",
1069
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
1070
+ "scope": "multi_episode_128_aligned_baseline",
1071
  "status": "scored",
1072
  "reason": null,
1073
  "normalized_score": 0.008236799389123917,
 
1175
  "status_label": "scored"
1176
  },
1177
  "metadata128_simple": {
1178
+ "raw": 0.002587692579254508,
1179
  "metric_key": "mrr",
1180
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
1181
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1182
+ "status": "scored",
1183
+ "reason": null,
1184
+ "normalized_score": 0.002587692579254508,
1185
+ "raw_text": "0.0026",
1186
+ "status_label": "scored"
1187
+ },
1188
+ "metadata128_neural_mlp": {
1189
+ "raw": 0.0026067993603646755,
1190
+ "metric_key": "mrr",
1191
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/cross_modal_retrieval/metrics.json",
1192
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1193
+ "status": "scored",
1194
+ "reason": null,
1195
+ "normalized_score": 0.0026067993603646755,
1196
+ "raw_text": "0.0026",
1197
+ "status_label": "scored"
1198
  },
1199
  "raw128_simple": {
1200
  "raw": 0.003459817497059703,
 
1218
  "raw_text": "0.0025",
1219
  "status_label": "scored"
1220
  },
1221
  "cosmos3_super_reasoner": {
1222
  "raw": null,
1223
  "metric_key": "mrr",
 
1264
  "status_label": "scored"
1265
  },
1266
  "metadata128_simple": {
1267
+ "raw": -190.66106203944798,
1268
  "metric_key": "r2",
1269
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
1270
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1271
+ "status": "scored",
1272
+ "reason": null,
1273
+ "normalized_score": 0.0,
1274
+ "raw_text": "-190.66",
1275
+ "status_label": "scored"
1276
+ },
1277
+ "metadata128_neural_mlp": {
1278
+ "raw": -0.43481132003942147,
1279
+ "metric_key": "r2",
1280
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/modality_reconstruction/metrics.json",
1281
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1282
+ "status": "scored",
1283
+ "reason": null,
1284
+ "normalized_score": 0.0,
1285
+ "raw_text": "-0.4348",
1286
+ "status_label": "scored"
1287
  },
1288
  "raw128_simple": {
1289
  "raw": -1.3450960391924882,
 
1307
  "raw_text": "-1.397",
1308
  "status_label": "scored"
1309
  },
1310
  "qwen3_omni_v6_lora": {
1311
  "raw": null,
1312
  "metric_key": "r2",
 
1389
  "raw": 0.4198864140782312,
1390
  "metric_key": "f1",
1391
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
1392
+ "scope": "multi_episode_128_aligned_baseline",
1393
  "status": "scored",
1394
  "reason": null,
1395
  "normalized_score": 0.4198864140782312,
 
1400
  "raw": 0.8252408266656923,
1401
  "metric_key": "f1",
1402
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
1403
+ "scope": "multi_episode_128_aligned_baseline",
1404
  "status": "scored",
1405
  "reason": null,
1406
  "normalized_score": 0.8252408266656923,
 
1497
  "status_label": "scored"
1498
  },
1499
  "metadata128_simple": {
1500
+ "raw": 0.49980060227663614,
1501
  "metric_key": "f1",
1502
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
1503
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1504
+ "status": "scored",
1505
+ "reason": null,
1506
+ "normalized_score": 0.49980060227663614,
1507
+ "raw_text": "0.4998",
1508
+ "status_label": "scored"
1509
+ },
1510
+ "metadata128_neural_mlp": {
1511
+ "raw": 0.7773773780941162,
1512
+ "metric_key": "f1",
1513
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/misalignment_detection/metrics.json",
1514
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
1515
+ "status": "scored",
1516
+ "reason": null,
1517
+ "normalized_score": 0.7773773780941162,
1518
+ "raw_text": "0.7774",
1519
+ "status_label": "scored"
1520
  },
1521
  "raw128_simple": {
1522
  "raw": 0.4958867673901769,
 
1540
  "raw_text": "0.8273",
1541
  "status_label": "scored"
1542
  },
1543
  "cosmos3_super_reasoner": {
1544
  "raw": null,
1545
  "metric_key": "f1",
 
1611
  "raw": 0.004579592783699693,
1612
  "metric_key": "macro_f1",
1613
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
1614
+ "scope": "multi_episode_128_aligned_baseline",
1615
  "status": "scored",
1616
  "reason": null,
1617
  "normalized_score": 0.004579592783699693,
 
1622
  "raw": 0.0029821307969142615,
1623
  "metric_key": "macro_f1",
1624
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
1625
+ "scope": "multi_episode_128_aligned_baseline",
1626
  "status": "scored",
1627
  "reason": null,
1628
  "normalized_score": 0.0029821307969142615,
 
1722
  "raw": 0.0001206030150753769,
1723
  "metric_key": "macro_f1",
1724
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
1725
+ "scope": "multi_episode_128_aligned_baseline",
1726
  "status": "scored",
1727
  "reason": null,
1728
  "normalized_score": 0.0001206030150753769,
 
1733
  "raw": 2.086049543676662e-05,
1734
  "metric_key": "macro_f1",
1735
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
1736
+ "scope": "multi_episode_128_aligned_baseline",
1737
  "status": "scored",
1738
  "reason": null,
1739
  "normalized_score": 2.086049543676662e-05,
 
1822
  "raw": null,
1823
  "metric_key": "macro_f1",
1824
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
1825
+ "scope": "multi_episode_128_aligned_baseline",
1826
  "status": "unsupported_without_required_target",
1827
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata",
1828
  "normalized_score": null,
 
1855
  "raw": null,
1856
  "metric_key": "macro_f1",
1857
  "source": null,
1858
+ "scope": "multi_episode_128_aligned_baseline",
1859
  "status": "not_supported_by_metadata_only_package",
1860
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required",
1861
  "normalized_score": null,
1862
  "raw_text": "n/a",
1863
  "status_label": "not supported"
 
1955
  "raw": 0.0,
1956
  "metric_key": "macro_f1",
1957
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
1958
+ "scope": "multi_episode_128_aligned_baseline",
1959
  "status": "scored",
1960
  "reason": null,
1961
  "normalized_score": 0.0,
 
1966
  "raw": 0.0,
1967
  "metric_key": "macro_f1",
1968
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
1969
+ "scope": "multi_episode_128_aligned_baseline",
1970
  "status": "scored",
1971
  "reason": null,
1972
  "normalized_score": 0.0,
 
2055
  "raw": 0.17656983343047333,
2056
  "metric_key": "micro_f1",
2057
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
2058
+ "scope": "multi_episode_128_aligned_baseline",
2059
  "status": "scored",
2060
  "reason": null,
2061
  "normalized_score": 0.17656983343047333,
 
2066
  "raw": 0.17418550827844048,
2067
  "metric_key": "micro_f1",
2068
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
2069
+ "scope": "multi_episode_128_aligned_baseline",
2070
  "status": "scored",
2071
  "reason": null,
2072
  "normalized_score": 0.17418550827844048,
 
2152
  "status_label": "scored"
2153
  },
2154
  "metadata128_simple": {
2155
+ "raw": 0.2294670194387436,
2156
  "metric_key": "mae",
2157
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
2158
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
2159
+ "status": "scored",
2160
+ "reason": null,
2161
+ "normalized_score": 0.18324815505876868,
2162
+ "raw_text": "0.2295",
2163
+ "status_label": "scored"
2164
+ },
2165
+ "metadata128_neural_mlp": {
2166
+ "raw": 0.2555866539478302,
2167
+ "metric_key": "mae",
2168
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/imu_to_hand_pose/metrics.json",
2169
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
2170
+ "status": "scored",
2171
+ "reason": null,
2172
+ "normalized_score": 0.16452114110609004,
2173
+ "raw_text": "0.2556",
2174
+ "status_label": "scored"
2175
  },
2176
  "raw128_simple": {
2177
  "raw": 0.22941437363624573,
 
2195
  "raw_text": "0.2530",
2196
  "status_label": "scored"
2197
  },
2198
  "qwen3_omni_v6_lora": {
2199
  "raw": null,
2200
  "metric_key": "mae",
 
2266
  "raw": null,
2267
  "metric_key": "mrr",
2268
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
2269
+ "scope": "multi_episode_128_aligned_baseline",
2270
  "status": "unsupported_without_required_target",
2271
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package",
2272
  "normalized_score": null,
 
2299
  "raw": null,
2300
  "metric_key": "mrr",
2301
  "source": null,
2302
+ "scope": "multi_episode_128_aligned_baseline",
2303
  "status": "not_supported_by_metadata_only_package",
2304
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required",
2305
  "normalized_score": null,
2306
  "raw_text": "n/a",
2307
  "status_label": "not supported"
 
2388
  "raw": 624.8108520507812,
2389
  "metric_key": "mae",
2390
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
2391
+ "scope": "multi_episode_128_aligned_baseline",
2392
  "status": "scored",
2393
  "reason": null,
2394
  "normalized_score": 0.016864874132806403,
 
2399
  "raw": 41.4664421081543,
2400
  "metric_key": "mae",
2401
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
2402
+ "scope": "multi_episode_128_aligned_baseline",
2403
  "status": "scored",
2404
  "reason": null,
2405
  "normalized_score": 0.25411768748242325,
 
2456
  "model_branch_cards": [
2457
  {
2458
  "id": "metadata128_simple",
2459
+ "title": "128ep Aligned Simple",
2460
  "status": "a100_rerun_pass",
2461
+ "coverage": "20 records / 18 scored aligned axes",
2462
  "headline": "34,269 rows; train/val/test 25,629/4,608/4,032",
2463
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2464
  },
2465
  {
2466
  "id": "metadata128_neural_mlp",
2467
+ "title": "128ep Aligned NN",
2468
  "status": "a100_rerun_pass",
2469
+ "coverage": "20 records / 18 scored aligned axes",
2470
+ "headline": "compact MLP heads over metadata/text and staged block features",
2471
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/summary_report.json"
2472
  },
2473
  {
 
2562
  "task_id": "timeline_action",
2563
  "task_label": "Action Recognition",
2564
  "series_id": "metadata128_simple",
2565
+ "method": "128ep Aligned Simple",
2566
  "status": "scored",
2567
  "status_label": "scored",
2568
  "scored": true,
 
2572
  "normalized_score": 0.008252821966746326,
2573
  "metric_key": "macro_f1",
2574
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_action/metrics.json",
2575
+ "scope": "multi_episode_128_aligned_baseline",
2576
  "reason": null
2577
  },
2578
  {
 
2580
  "task_id": "timeline_action",
2581
  "task_label": "Action Recognition",
2582
  "series_id": "metadata128_neural_mlp",
2583
+ "method": "128ep Aligned NN",
2584
  "status": "scored",
2585
  "status_label": "scored",
2586
  "scored": true,
 
2590
  "normalized_score": 0.004175793689174209,
2591
  "metric_key": "macro_f1",
2592
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_action/metrics.json",
2593
+ "scope": "multi_episode_128_aligned_baseline",
2594
  "reason": null
2595
  },
2596
  {
 
2724
  "task_id": "timeline_subtask",
2725
  "task_label": "Procedure Step Recognition",
2726
  "series_id": "metadata128_simple",
2727
+ "method": "128ep Aligned Simple",
2728
  "status": "scored",
2729
  "status_label": "scored",
2730
  "scored": true,
 
2734
  "normalized_score": 0.00019512195121951218,
2735
  "metric_key": "macro_f1",
2736
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/timeline_subtask/metrics.json",
2737
+ "scope": "multi_episode_128_aligned_baseline",
2738
  "reason": null
2739
  },
2740
  {
 
2742
  "task_id": "timeline_subtask",
2743
  "task_label": "Procedure Step Recognition",
2744
  "series_id": "metadata128_neural_mlp",
2745
+ "method": "128ep Aligned NN",
2746
  "status": "scored",
2747
  "status_label": "scored",
2748
  "scored": true,
 
2752
  "normalized_score": 7.207207207207208e-05,
2753
  "metric_key": "macro_f1",
2754
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/timeline_subtask/metrics.json",
2755
+ "scope": "multi_episode_128_aligned_baseline",
2756
  "reason": null
2757
  },
2758
  {
 
2886
  "task_id": "transition_detection",
2887
  "task_label": "Action Boundary Detection",
2888
  "series_id": "metadata128_simple",
2889
+ "method": "128ep Aligned Simple",
2890
  "status": "scored",
2891
  "status_label": "scored",
2892
  "scored": true,
 
2896
  "normalized_score": 0.29652162550029315,
2897
  "metric_key": "macro_f1",
2898
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/transition_detection/metrics.json",
2899
+ "scope": "multi_episode_128_aligned_baseline",
2900
  "reason": null
2901
  },
2902
  {
 
2904
  "task_id": "transition_detection",
2905
  "task_label": "Action Boundary Detection",
2906
  "series_id": "metadata128_neural_mlp",
2907
+ "method": "128ep Aligned NN",
2908
  "status": "scored",
2909
  "status_label": "scored",
2910
  "scored": true,
 
2914
  "normalized_score": 0.4841733292368365,
2915
  "metric_key": "macro_f1",
2916
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/transition_detection/metrics.json",
2917
+ "scope": "multi_episode_128_aligned_baseline",
2918
  "reason": null
2919
  },
2920
  {
 
3048
  "task_id": "next_action",
3049
  "task_label": "Next-Action Prediction",
3050
  "series_id": "metadata128_simple",
3051
+ "method": "128ep Aligned Simple",
3052
  "status": "scored",
3053
  "status_label": "scored",
3054
  "scored": true,
 
3058
  "normalized_score": 0.006514774539765508,
3059
  "metric_key": "macro_f1",
3060
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_action/metrics.json",
3061
+ "scope": "multi_episode_128_aligned_baseline",
3062
  "reason": null
3063
  },
3064
  {
 
3066
  "task_id": "next_action",
3067
  "task_label": "Next-Action Prediction",
3068
  "series_id": "metadata128_neural_mlp",
3069
+ "method": "128ep Aligned NN",
3070
  "status": "scored",
3071
  "status_label": "scored",
3072
  "scored": true,
 
3076
  "normalized_score": 0.004910507980164745,
3077
  "metric_key": "macro_f1",
3078
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_action/metrics.json",
3079
+ "scope": "multi_episode_128_aligned_baseline",
3080
  "reason": null
3081
  },
3082
  {
 
3210
  "task_id": "hand_trajectory_forecast",
3211
  "task_label": "Hand Trajectory Forecasting",
3212
  "series_id": "metadata128_simple",
3213
+ "method": "128ep Aligned Simple",
3214
+ "status": "scored",
3215
+ "status_label": "scored",
3216
+ "scored": true,
3217
  "proxy_scored": false,
3218
+ "raw": 8.817333221435547,
3219
+ "raw_text": "8.817",
3220
+ "normalized_score": 0.012231610603598841,
3221
  "metric_key": "mpjpe",
3222
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/hand_trajectory_forecast/metrics.json",
3223
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3224
+ "reason": null
3225
  },
3226
  {
3227
  "task_number": 5,
3228
  "task_id": "hand_trajectory_forecast",
3229
  "task_label": "Hand Trajectory Forecasting",
3230
  "series_id": "metadata128_neural_mlp",
3231
+ "method": "128ep Aligned NN",
3232
+ "status": "scored",
3233
+ "status_label": "scored",
3234
+ "scored": true,
3235
  "proxy_scored": false,
3236
+ "raw": 0.429434210062027,
3237
+ "raw_text": "0.4294",
3238
+ "normalized_score": 0.25114484128127007,
3239
  "metric_key": "mpjpe",
3240
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/hand_trajectory_forecast/metrics.json",
3241
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3242
+ "reason": null
3243
  },
3244
  {
3245
  "task_number": 5,
 
3372
  "task_id": "contact_prediction",
3373
  "task_label": "Contact State Prediction",
3374
  "series_id": "metadata128_simple",
3375
+ "method": "128ep Aligned Simple",
3376
  "status": "scored",
3377
  "status_label": "scored",
3378
  "scored": true,
 
3382
  "normalized_score": 0.4381481308057444,
3383
  "metric_key": "macro_f1",
3384
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/contact_prediction/metrics.json",
3385
+ "scope": "multi_episode_128_aligned_baseline",
3386
  "reason": null
3387
  },
3388
  {
 
3390
  "task_id": "contact_prediction",
3391
  "task_label": "Contact State Prediction",
3392
  "series_id": "metadata128_neural_mlp",
3393
+ "method": "128ep Aligned NN",
3394
  "status": "scored",
3395
  "status_label": "scored",
3396
  "scored": true,
 
3400
  "normalized_score": 0.5682695682695682,
3401
  "metric_key": "macro_f1",
3402
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/contact_prediction/metrics.json",
3403
+ "scope": "multi_episode_128_aligned_baseline",
3404
  "reason": null
3405
  },
3406
  {
 
3534
  "task_id": "object_relevance",
3535
  "task_label": "Object Relevance Prediction",
3536
  "series_id": "metadata128_simple",
3537
+ "method": "128ep Aligned Simple",
3538
  "status": "scored",
3539
  "status_label": "scored",
3540
  "scored": true,
 
3544
  "normalized_score": 0.17764578833693304,
3545
  "metric_key": "micro_f1",
3546
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_relevance/metrics.json",
3547
+ "scope": "multi_episode_128_aligned_baseline",
3548
  "reason": null
3549
  },
3550
  {
 
3552
  "task_id": "object_relevance",
3553
  "task_label": "Object Relevance Prediction",
3554
  "series_id": "metadata128_neural_mlp",
3555
+ "method": "128ep Aligned NN",
3556
  "status": "scored",
3557
  "status_label": "scored",
3558
  "scored": true,
 
3562
  "normalized_score": 0.18662723837686876,
3563
  "metric_key": "micro_f1",
3564
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_relevance/metrics.json",
3565
+ "scope": "multi_episode_128_aligned_baseline",
3566
  "reason": null
3567
  },
3568
  {
 
3696
  "task_id": "caption_grounding",
3697
  "task_label": "Language Grounding",
3698
  "series_id": "metadata128_simple",
3699
+ "method": "128ep Aligned Simple",
3700
  "status": "scored",
3701
  "status_label": "scored",
3702
  "scored": true,
 
3706
  "normalized_score": 0.002332374220713973,
3707
  "metric_key": "mrr",
3708
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/caption_grounding/metrics.json",
3709
+ "scope": "multi_episode_128_aligned_baseline",
3710
  "reason": null
3711
  },
3712
  {
 
3714
  "task_id": "caption_grounding",
3715
  "task_label": "Language Grounding",
3716
  "series_id": "metadata128_neural_mlp",
3717
+ "method": "128ep Aligned NN",
3718
  "status": "scored",
3719
  "status_label": "scored",
3720
  "scored": true,
 
3724
  "normalized_score": 0.008236799389123917,
3725
  "metric_key": "mrr",
3726
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/caption_grounding/metrics.json",
3727
+ "scope": "multi_episode_128_aligned_baseline",
3728
  "reason": null
3729
  },
3730
  {
 
3858
  "task_id": "cross_modal_retrieval",
3859
  "task_label": "Cross-Modal Retrieval",
3860
  "series_id": "metadata128_simple",
3861
+ "method": "128ep Aligned Simple",
3862
+ "status": "scored",
3863
+ "status_label": "scored",
3864
+ "scored": true,
3865
  "proxy_scored": false,
3866
+ "raw": 0.002587692579254508,
3867
+ "raw_text": "0.0026",
3868
+ "normalized_score": 0.002587692579254508,
3869
  "metric_key": "mrr",
3870
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/cross_modal_retrieval/metrics.json",
3871
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3872
+ "reason": null
3873
  },
3874
  {
3875
  "task_number": 9,
3876
  "task_id": "cross_modal_retrieval",
3877
  "task_label": "Cross-Modal Retrieval",
3878
  "series_id": "metadata128_neural_mlp",
3879
+ "method": "128ep Aligned NN",
3880
+ "status": "scored",
3881
+ "status_label": "scored",
3882
+ "scored": true,
3883
  "proxy_scored": false,
3884
+ "raw": 0.0026067993603646755,
3885
+ "raw_text": "0.0026",
3886
+ "normalized_score": 0.0026067993603646755,
3887
  "metric_key": "mrr",
3888
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/cross_modal_retrieval/metrics.json",
3889
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
3890
+ "reason": null
3891
  },
3892
  {
3893
  "task_number": 9,
 
4020
  "task_id": "modality_reconstruction",
4021
  "task_label": "Cross-Modal Reconstruction",
4022
  "series_id": "metadata128_simple",
4023
+ "method": "128ep Aligned Simple",
4024
+ "status": "scored",
4025
+ "status_label": "scored",
4026
+ "scored": true,
4027
  "proxy_scored": false,
4028
+ "raw": -190.66106203944798,
4029
+ "raw_text": "-190.66",
4030
+ "normalized_score": 0.0,
4031
  "metric_key": "r2",
4032
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/modality_reconstruction/metrics.json",
4033
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
4034
+ "reason": null
4035
  },
4036
  {
4037
  "task_number": 10,
4038
  "task_id": "modality_reconstruction",
4039
  "task_label": "Cross-Modal Reconstruction",
4040
  "series_id": "metadata128_neural_mlp",
4041
+ "method": "128ep Aligned NN",
4042
+ "status": "scored",
4043
+ "status_label": "scored",
4044
+ "scored": true,
4045
  "proxy_scored": false,
4046
+ "raw": -0.43481132003942147,
4047
+ "raw_text": "-0.4348",
4048
+ "normalized_score": 0.0,
4049
  "metric_key": "r2",
4050
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/modality_reconstruction/metrics.json",
4051
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
4052
+ "reason": null
4053
  },
4054
  {
4055
  "task_number": 10,
 
4182
  "task_id": "temporal_order",
4183
  "task_label": "Temporal Order Verification",
4184
  "series_id": "metadata128_simple",
4185
+ "method": "128ep Aligned Simple",
4186
  "status": "scored",
4187
  "status_label": "scored",
4188
  "scored": true,
 
4192
  "normalized_score": 0.4198864140782312,
4193
  "metric_key": "f1",
4194
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/temporal_order/metrics.json",
4195
+ "scope": "multi_episode_128_aligned_baseline",
4196
  "reason": null
4197
  },
4198
  {
 
4200
  "task_id": "temporal_order",
4201
  "task_label": "Temporal Order Verification",
4202
  "series_id": "metadata128_neural_mlp",
4203
+ "method": "128ep Aligned NN",
4204
  "status": "scored",
4205
  "status_label": "scored",
4206
  "scored": true,
 
4210
  "normalized_score": 0.8252408266656923,
4211
  "metric_key": "f1",
4212
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/temporal_order/metrics.json",
4213
+ "scope": "multi_episode_128_aligned_baseline",
4214
  "reason": null
4215
  },
4216
  {
 
4344
  "task_id": "misalignment_detection",
4345
  "task_label": "Multimodal Synchronization Detection",
4346
  "series_id": "metadata128_simple",
4347
+ "method": "128ep Aligned Simple",
4348
+ "status": "scored",
4349
+ "status_label": "scored",
4350
+ "scored": true,
4351
  "proxy_scored": false,
4352
+ "raw": 0.49980060227663614,
4353
+ "raw_text": "0.4998",
4354
+ "normalized_score": 0.49980060227663614,
4355
  "metric_key": "f1",
4356
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/misalignment_detection/metrics.json",
4357
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
4358
+ "reason": null
4359
  },
4360
  {
4361
  "task_number": 12,
4362
  "task_id": "misalignment_detection",
4363
  "task_label": "Multimodal Synchronization Detection",
4364
  "series_id": "metadata128_neural_mlp",
4365
+ "method": "128ep Aligned NN",
4366
+ "status": "scored",
4367
+ "status_label": "scored",
4368
+ "scored": true,
4369
  "proxy_scored": false,
4370
+ "raw": 0.7773773780941162,
4371
+ "raw_text": "0.7774",
4372
+ "normalized_score": 0.7773773780941162,
4373
  "metric_key": "f1",
4374
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/misalignment_detection/metrics.json",
4375
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
4376
+ "reason": null
4377
  },
4378
  {
4379
  "task_number": 12,
 
4506
  "task_id": "long_horizon_next_action",
4507
  "task_label": "Long-Horizon Next-Action Forecasting",
4508
  "series_id": "metadata128_simple",
4509
+ "method": "128ep Aligned Simple",
4510
  "status": "scored",
4511
  "status_label": "scored",
4512
  "scored": true,
 
4516
  "normalized_score": 0.004579592783699693,
4517
  "metric_key": "macro_f1",
4518
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/long_horizon_next_action/metrics.json",
4519
+ "scope": "multi_episode_128_aligned_baseline",
4520
  "reason": null
4521
  },
4522
  {
 
4524
  "task_id": "long_horizon_next_action",
4525
  "task_label": "Long-Horizon Next-Action Forecasting",
4526
  "series_id": "metadata128_neural_mlp",
4527
+ "method": "128ep Aligned NN",
4528
  "status": "scored",
4529
  "status_label": "scored",
4530
  "scored": true,
 
4534
  "normalized_score": 0.0029821307969142615,
4535
  "metric_key": "macro_f1",
4536
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/long_horizon_next_action/metrics.json",
4537
+ "scope": "multi_episode_128_aligned_baseline",
4538
  "reason": null
4539
  },
4540
  {
 
4668
  "task_id": "next_subtask_forecast",
4669
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4670
  "series_id": "metadata128_simple",
4671
+ "method": "128ep Aligned Simple",
4672
  "status": "scored",
4673
  "status_label": "scored",
4674
  "scored": true,
 
4678
  "normalized_score": 0.0001206030150753769,
4679
  "metric_key": "macro_f1",
4680
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/next_subtask_forecast/metrics.json",
4681
+ "scope": "multi_episode_128_aligned_baseline",
4682
  "reason": null
4683
  },
4684
  {
 
4686
  "task_id": "next_subtask_forecast",
4687
  "task_label": "Long-Horizon Next-Subtask Forecasting",
4688
  "series_id": "metadata128_neural_mlp",
4689
+ "method": "128ep Aligned NN",
4690
  "status": "scored",
4691
  "status_label": "scored",
4692
  "scored": true,
 
4696
  "normalized_score": 2.086049543676662e-05,
4697
  "metric_key": "macro_f1",
4698
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/next_subtask_forecast/metrics.json",
4699
+ "scope": "multi_episode_128_aligned_baseline",
4700
  "reason": null
4701
  },
4702
  {
 
4830
  "task_id": "interaction_text_prediction",
4831
  "task_label": "Interaction Text Prediction",
4832
  "series_id": "metadata128_simple",
4833
+ "method": "128ep Aligned Simple",
4834
  "status": "unsupported_without_required_target",
4835
  "status_label": "unsupported",
4836
  "scored": false,
 
4840
  "normalized_score": null,
4841
  "metric_key": "macro_f1",
4842
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/interaction_text_prediction/metrics.json",
4843
+ "scope": "multi_episode_128_aligned_baseline",
4844
  "reason": "requires raw annotation.hdf5 caption interaction text; the public 128 JSONL keeps only structured labels and derived metadata"
4845
  },
4846
  {
 
4848
  "task_id": "interaction_text_prediction",
4849
  "task_label": "Interaction Text Prediction",
4850
  "series_id": "metadata128_neural_mlp",
4851
+ "method": "128ep Aligned NN",
4852
  "status": "not_supported_by_metadata_only_package",
4853
  "status_label": "not supported",
4854
  "scored": false,
 
4858
  "normalized_score": null,
4859
  "metric_key": "macro_f1",
4860
  "source": null,
4861
+ "scope": "multi_episode_128_aligned_baseline",
4862
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required"
4863
  },
4864
  {
4865
  "task_number": 15,
 
4992
  "task_id": "action_object_relation",
4993
  "task_label": "Action-Object Relation Prediction",
4994
  "series_id": "metadata128_simple",
4995
+ "method": "128ep Aligned Simple",
4996
  "status": "scored",
4997
  "status_label": "scored",
4998
  "scored": true,
 
5002
  "normalized_score": 0.0,
5003
  "metric_key": "macro_f1",
5004
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/action_object_relation/metrics.json",
5005
+ "scope": "multi_episode_128_aligned_baseline",
5006
  "reason": null
5007
  },
5008
  {
 
5010
  "task_id": "action_object_relation",
5011
  "task_label": "Action-Object Relation Prediction",
5012
  "series_id": "metadata128_neural_mlp",
5013
+ "method": "128ep Aligned NN",
5014
  "status": "scored",
5015
  "status_label": "scored",
5016
  "scored": true,
 
5020
  "normalized_score": 0.0,
5021
  "metric_key": "macro_f1",
5022
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/action_object_relation/metrics.json",
5023
+ "scope": "multi_episode_128_aligned_baseline",
5024
  "reason": null
5025
  },
5026
  {
 
5154
  "task_id": "object_set_forecast",
5155
  "task_label": "Future Object-Set Forecasting",
5156
  "series_id": "metadata128_simple",
5157
+ "method": "128ep Aligned Simple",
5158
  "status": "scored",
5159
  "status_label": "scored",
5160
  "scored": true,
 
5164
  "normalized_score": 0.17656983343047333,
5165
  "metric_key": "micro_f1",
5166
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/object_set_forecast/metrics.json",
5167
+ "scope": "multi_episode_128_aligned_baseline",
5168
  "reason": null
5169
  },
5170
  {
 
5172
  "task_id": "object_set_forecast",
5173
  "task_label": "Future Object-Set Forecasting",
5174
  "series_id": "metadata128_neural_mlp",
5175
+ "method": "128ep Aligned NN",
5176
  "status": "scored",
5177
  "status_label": "scored",
5178
  "scored": true,
 
5182
  "normalized_score": 0.17418550827844048,
5183
  "metric_key": "micro_f1",
5184
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/object_set_forecast/metrics.json",
5185
+ "scope": "multi_episode_128_aligned_baseline",
5186
  "reason": null
5187
  },
5188
  {
 
5316
  "task_id": "imu_to_hand_pose",
5317
  "task_label": "IMU-to-Hand Pose Reconstruction",
5318
  "series_id": "metadata128_simple",
5319
+ "method": "128ep Aligned Simple",
5320
+ "status": "scored",
5321
+ "status_label": "scored",
5322
+ "scored": true,
5323
  "proxy_scored": false,
5324
+ "raw": 0.2294670194387436,
5325
+ "raw_text": "0.2295",
5326
+ "normalized_score": 0.18324815505876868,
5327
  "metric_key": "mae",
5328
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/imu_to_hand_pose/metrics.json",
5329
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
5330
+ "reason": null
5331
  },
5332
  {
5333
  "task_number": 18,
5334
  "task_id": "imu_to_hand_pose",
5335
  "task_label": "IMU-to-Hand Pose Reconstruction",
5336
  "series_id": "metadata128_neural_mlp",
5337
+ "method": "128ep Aligned NN",
5338
+ "status": "scored",
5339
+ "status_label": "scored",
5340
+ "scored": true,
5341
  "proxy_scored": false,
5342
+ "raw": 0.2555866539478302,
5343
+ "raw_text": "0.2556",
5344
+ "normalized_score": 0.16452114110609004,
5345
  "metric_key": "mae",
5346
+ "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/imu_to_hand_pose/metrics.json",
5347
+ "scope": "multi_episode_128_aligned_sensor_block_baseline",
5348
+ "reason": null
5349
  },
5350
  {
5351
  "task_number": 18,
 
5478
  "task_id": "camera_view_sync_retrieval",
5479
  "task_label": "Camera-View Synchronization Retrieval",
5480
  "series_id": "metadata128_simple",
5481
+ "method": "128ep Aligned Simple",
5482
  "status": "unsupported_without_required_target",
5483
  "status_label": "unsupported",
5484
  "scored": false,
 
5488
  "normalized_score": null,
5489
  "metric_key": "mrr",
5490
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/camera_view_sync_retrieval/metrics.json",
5491
+ "scope": "multi_episode_128_aligned_baseline",
5492
  "reason": "requires paired camera-view feature blocks, which are not in the public 128 JSONL metadata package"
5493
  },
5494
  {
 
5496
  "task_id": "camera_view_sync_retrieval",
5497
  "task_label": "Camera-View Synchronization Retrieval",
5498
  "series_id": "metadata128_neural_mlp",
5499
+ "method": "128ep Aligned NN",
5500
  "status": "not_supported_by_metadata_only_package",
5501
  "status_label": "not supported",
5502
  "scored": false,
 
5506
  "normalized_score": null,
5507
  "metric_key": "mrr",
5508
  "source": null,
5509
+ "scope": "multi_episode_128_aligned_baseline",
5510
+ "reason": "the 128-episode aligned rerun did not produce this task target; raw interaction text, paired camera-view embeddings, or a task-specific target builder is required"
5511
  },
5512
  {
5513
  "task_number": 19,
 
5640
  "task_id": "time_to_transition",
5641
  "task_label": "Time-to-Next-Transition Regression",
5642
  "series_id": "metadata128_simple",
5643
+ "method": "128ep Aligned Simple",
5644
  "status": "scored",
5645
  "status_label": "scored",
5646
  "scored": true,
 
5650
  "normalized_score": 0.016864874132806403,
5651
  "metric_key": "mae",
5652
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/time_to_transition/metrics.json",
5653
+ "scope": "multi_episode_128_aligned_baseline",
5654
  "reason": null
5655
  },
5656
  {
 
5658
  "task_id": "time_to_transition",
5659
  "task_label": "Time-to-Next-Transition Regression",
5660
  "series_id": "metadata128_neural_mlp",
5661
+ "method": "128ep Aligned NN",
5662
  "status": "scored",
5663
  "status_label": "scored",
5664
  "scored": true,
 
5668
  "normalized_score": 0.25411768748242325,
5669
  "metric_key": "mae",
5670
  "source": "results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/time_to_transition/metrics.json",
5671
+ "scope": "multi_episode_128_aligned_baseline",
5672
  "reason": null
5673
  },
5674
  {
metrics/website_integrity.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-18T12:09:46+00:00",
4
  "docs_root": "docs",
5
  "site_base": "/ropedia-xperience-10m-task-suite/",
6
  "summary": {
@@ -301,7 +301,7 @@
301
  },
302
  {
303
  "path": "data/artifact_index.json",
304
- "bytes": 116110,
305
  "top_level_type": "dict"
306
  },
307
  {
@@ -316,7 +316,7 @@
316
  },
317
  {
318
  "path": "data/episode128_task_model_radar.json",
319
- "bytes": 186443,
320
  "top_level_type": "dict"
321
  },
322
  {
@@ -351,7 +351,7 @@
351
  },
352
  {
353
  "path": "data/mirror_parity.json",
354
- "bytes": 994053,
355
  "top_level_type": "dict"
356
  },
357
  {
@@ -471,7 +471,7 @@
471
  },
472
  {
473
  "path": "data/single_episode_task_model_radar.json",
474
- "bytes": 50973,
475
  "top_level_type": "dict"
476
  },
477
  {
@@ -486,12 +486,12 @@
486
  },
487
  {
488
  "path": "data/task_method_20_gap_audit.json",
489
- "bytes": 46902,
490
  "top_level_type": "dict"
491
  },
492
  {
493
  "path": "data/task_method_20_result_matrix.json",
494
- "bytes": 129242,
495
  "top_level_type": "dict"
496
  },
497
  {
@@ -526,7 +526,7 @@
526
  },
527
  {
528
  "path": "data/unified_task_model_radar.json",
529
- "bytes": 230297,
530
  "top_level_type": "dict"
531
  },
532
  {
@@ -571,7 +571,7 @@
571
  {
572
  "path": "assets/charts/episode128_task_model_radar.svg",
573
  "exists": true,
574
- "bytes": 45937,
575
  "format": "SVG",
576
  "has_viewbox": true
577
  },
@@ -641,7 +641,7 @@
641
  {
642
  "path": "assets/charts/unified_task_model_radar.svg",
643
  "exists": true,
644
- "bytes": 51953,
645
  "format": "SVG",
646
  "has_viewbox": true
647
  },
@@ -752,7 +752,7 @@
752
  {
753
  "path": "assets/task_suite_infographic.png",
754
  "exists": true,
755
- "bytes": 2627286,
756
  "width": 1800,
757
  "height": 6600,
758
  "format": "PNG"
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-18T12:54:19+00:00",
4
  "docs_root": "docs",
5
  "site_base": "/ropedia-xperience-10m-task-suite/",
6
  "summary": {
 
301
  },
302
  {
303
  "path": "data/artifact_index.json",
304
+ "bytes": 116111,
305
  "top_level_type": "dict"
306
  },
307
  {
 
316
  },
317
  {
318
  "path": "data/episode128_task_model_radar.json",
319
+ "bytes": 185447,
320
  "top_level_type": "dict"
321
  },
322
  {
 
351
  },
352
  {
353
  "path": "data/mirror_parity.json",
354
+ "bytes": 1059014,
355
  "top_level_type": "dict"
356
  },
357
  {
 
471
  },
472
  {
473
  "path": "data/single_episode_task_model_radar.json",
474
+ "bytes": 51064,
475
  "top_level_type": "dict"
476
  },
477
  {
 
486
  },
487
  {
488
  "path": "data/task_method_20_gap_audit.json",
489
+ "bytes": 35883,
490
  "top_level_type": "dict"
491
  },
492
  {
493
  "path": "data/task_method_20_result_matrix.json",
494
+ "bytes": 128794,
495
  "top_level_type": "dict"
496
  },
497
  {
 
526
  },
527
  {
528
  "path": "data/unified_task_model_radar.json",
529
+ "bytes": 229299,
530
  "top_level_type": "dict"
531
  },
532
  {
 
571
  {
572
  "path": "assets/charts/episode128_task_model_radar.svg",
573
  "exists": true,
574
+ "bytes": 47540,
575
  "format": "SVG",
576
  "has_viewbox": true
577
  },
 
641
  {
642
  "path": "assets/charts/unified_task_model_radar.svg",
643
  "exists": true,
644
+ "bytes": 53553,
645
  "format": "SVG",
646
  "has_viewbox": true
647
  },
 
752
  {
753
  "path": "assets/task_suite_infographic.png",
754
  "exists": true,
755
+ "bytes": 1591194,
756
  "width": 1800,
757
  "height": 6600,
758
  "format": "PNG"
results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/cross_modal_retrieval/ranks.csv ADDED
The diff for this file is too large to render. See raw diff
 
results/omni_finetune/a100_128_metadata_task_baselines_20260616_v2/neural_mlp/imu_to_hand_pose/predictions.csv ADDED
The diff for this file is too large to render. See raw diff
 
scripts/build_unified_task_model_radar.py CHANGED
@@ -114,19 +114,19 @@ SERIES = {
114
  "stroke_dasharray": None,
115
  },
116
  "metadata128_simple": {
117
- "label": "128ep Metadata Simple",
118
  "short_label": "128-S",
119
  "color": "#ffd166",
120
- "kind": "partial_128_episode_metadata_baseline",
121
- "scope": "128 selected episodes, JSONL metadata/text only",
122
  "stroke_dasharray": "9 6",
123
  },
124
  "metadata128_neural_mlp": {
125
- "label": "128ep Metadata NN",
126
  "short_label": "128-NN",
127
  "color": "#f472b6",
128
- "kind": "partial_128_episode_metadata_baseline",
129
- "scope": "128 selected episodes, JSONL metadata/text only",
130
  "stroke_dasharray": "3 6",
131
  },
132
  "raw128_simple": {
@@ -254,8 +254,8 @@ SHORT_TASK_LABELS = {
254
  METHOD_DETAILS = {
255
  "minimal": "Single-episode simple heads over the public sample split.",
256
  "neural_mlp": "Single-episode compact PyTorch MLP heads on the same 20 task contracts.",
257
- "metadata128_simple": "128-episode JSONL metadata/text simple baselines.",
258
- "metadata128_neural_mlp": "128-episode JSONL metadata/text MLP baselines.",
259
  "raw128_simple": "128-episode 4430-dim sensor NPZ simple heads; tasks 15/19 use compact proxies.",
260
  "raw128_neural_mlp": "128-episode 4430-dim sensor NPZ MLP heads; tasks 15/19 use compact proxies.",
261
  "qwen3_omni_v6_lora": "Verified held-out Qwen3-Omni v6 LoRA metrics, plus task 16 and any completed private-GPU future-task probes scored from task-specific JSON.",
@@ -322,12 +322,12 @@ def read_a100_metadata_record(task_id: str, *, neural: bool = False) -> dict[str
322
  "raw": score,
323
  "metric_key": payload.get("primary_metric"),
324
  "source": str(path.relative_to(ROOT)),
325
- "scope": "multi_episode_128_metadata_baseline",
326
  "status": "scored" if status == "pass" and score is not None else "unsupported_without_required_target",
327
  "reason": payload.get("reason")
328
  or payload.get("error")
329
  or (
330
- "metadata-only package has a metrics artifact for this task, but it does not contain a numeric public score"
331
  if status != "pass"
332
  else None
333
  ),
@@ -398,10 +398,10 @@ def make_missing_record(series_id: str, task_id: str, metric_key: str | None) ->
398
  if series_id.startswith("metadata128"):
399
  status = "not_supported_by_metadata_only_package"
400
  reason = (
401
- "the 128-episode metadata/text rerun did not produce this task target; "
402
- "raw sensor blocks or a task-specific metadata target builder are required"
403
  )
404
- scope = "multi_episode_128_metadata_baseline"
405
  elif series_id in {"qwen3_omni_v6_lora", "cosmos3_super_reasoner", "cosmos3_nano_future_window"}:
406
  status = "not_evaluated_in_verified_package"
407
  reason = (
@@ -745,7 +745,7 @@ def build_payload() -> dict[str, Any]:
745
  "raw_values": "raw metric values, metric keys, and sources are retained in this JSON; the SVG is an overview, not a replacement for the metric table",
746
  "result_record_policy": "every method has 20 task records; records without a numeric score carry explicit unsupported/not-evaluated status and reason fields",
747
  "foundation_model_overlay": "Qwen3/Cosmos points are plotted only on task-aligned axes. Scoreless records mean the public result does not evaluate that task contract.",
748
- "metadata_128_overlay": "128-episode metadata baselines have 20 records, but numeric scores only where the public JSONL contains enough task labels without raw feature blocks.",
749
  "raw_128_overlay": "128-episode raw-feature baselines use staged sensor NPZ features. Eighteen axes use direct task targets; interaction text and camera-view sync are completed with documented compact proxies because raw interaction strings and paired video-view embeddings are absent from the 128 export.",
750
  },
751
  "series": series_records,
@@ -753,18 +753,18 @@ def build_payload() -> dict[str, Any]:
753
  "model_branch_cards": [
754
  {
755
  "id": "metadata128_simple",
756
- "title": "128ep Metadata Simple",
757
  "status": "a100_rerun_pass",
758
- "coverage": f"20 records / {next(item for item in series_records if item['id'] == 'metadata128_simple')['scored_task_count']} scored JSONL-supported axes",
759
  "headline": "34,269 rows; train/val/test 25,629/4,608/4,032",
760
  "source": str((METADATA128_BASELINE_DIR / "summary_report.json").relative_to(ROOT)),
761
  },
762
  {
763
  "id": "metadata128_neural_mlp",
764
- "title": "128ep Metadata NN",
765
  "status": "a100_rerun_pass",
766
- "coverage": f"20 records / {next(item for item in series_records if item['id'] == 'metadata128_neural_mlp')['scored_task_count']} scored JSONL-supported axes",
767
- "headline": "compact MLP heads over metadata/text features",
768
  "source": str((METADATA128_BASELINE_DIR / "summary_report.json").relative_to(ROOT)),
769
  },
770
  {
 
114
  "stroke_dasharray": None,
115
  },
116
  "metadata128_simple": {
117
+ "label": "128ep Aligned Simple",
118
  "short_label": "128-S",
119
  "color": "#ffd166",
120
+ "kind": "partial_128_episode_aligned_baseline",
121
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
122
  "stroke_dasharray": "9 6",
123
  },
124
  "metadata128_neural_mlp": {
125
+ "label": "128ep Aligned NN",
126
  "short_label": "128-NN",
127
  "color": "#f472b6",
128
+ "kind": "partial_128_episode_aligned_baseline",
129
+ "scope": "128 selected episodes, JSONL metadata/text plus staged sensor-block targets where available",
130
  "stroke_dasharray": "3 6",
131
  },
132
  "raw128_simple": {
 
254
  METHOD_DETAILS = {
255
  "minimal": "Single-episode simple heads over the public sample split.",
256
  "neural_mlp": "Single-episode compact PyTorch MLP heads on the same 20 task contracts.",
257
+ "metadata128_simple": "128-episode aligned simple baselines: JSONL metadata/text tasks plus staged sensor-block tasks where the processed target exists.",
258
+ "metadata128_neural_mlp": "128-episode aligned MLP baselines: JSONL metadata/text tasks plus staged sensor-block tasks where the processed target exists.",
259
  "raw128_simple": "128-episode 4430-dim sensor NPZ simple heads; tasks 15/19 use compact proxies.",
260
  "raw128_neural_mlp": "128-episode 4430-dim sensor NPZ MLP heads; tasks 15/19 use compact proxies.",
261
  "qwen3_omni_v6_lora": "Verified held-out Qwen3-Omni v6 LoRA metrics, plus task 16 and any completed private-GPU future-task probes scored from task-specific JSON.",
 
322
  "raw": score,
323
  "metric_key": payload.get("primary_metric"),
324
  "source": str(path.relative_to(ROOT)),
325
+ "scope": payload.get("scope") or "multi_episode_128_aligned_baseline",
326
  "status": "scored" if status == "pass" and score is not None else "unsupported_without_required_target",
327
  "reason": payload.get("reason")
328
  or payload.get("error")
329
  or (
330
+ "the 128-episode aligned artifact for this task does not contain a numeric public score"
331
  if status != "pass"
332
  else None
333
  ),
 
398
  if series_id.startswith("metadata128"):
399
  status = "not_supported_by_metadata_only_package"
400
  reason = (
401
+ "the 128-episode aligned rerun did not produce this task target; "
402
+ "raw interaction text, paired camera-view embeddings, or a task-specific target builder is required"
403
  )
404
+ scope = "multi_episode_128_aligned_baseline"
405
  elif series_id in {"qwen3_omni_v6_lora", "cosmos3_super_reasoner", "cosmos3_nano_future_window"}:
406
  status = "not_evaluated_in_verified_package"
407
  reason = (
 
745
  "raw_values": "raw metric values, metric keys, and sources are retained in this JSON; the SVG is an overview, not a replacement for the metric table",
746
  "result_record_policy": "every method has 20 task records; records without a numeric score carry explicit unsupported/not-evaluated status and reason fields",
747
  "foundation_model_overlay": "Qwen3/Cosmos points are plotted only on task-aligned axes. Scoreless records mean the public result does not evaluate that task contract.",
748
+ "metadata_128_overlay": "128-episode aligned baselines have 20 records. Numeric scores come from JSONL metadata/text tasks plus staged sensor-block targets when the processed target exists; raw interaction text and paired camera-view embeddings remain explicit gaps.",
749
  "raw_128_overlay": "128-episode raw-feature baselines use staged sensor NPZ features. Eighteen axes use direct task targets; interaction text and camera-view sync are completed with documented compact proxies because raw interaction strings and paired video-view embeddings are absent from the 128 export.",
750
  },
751
  "series": series_records,
 
753
  "model_branch_cards": [
754
  {
755
  "id": "metadata128_simple",
756
+ "title": "128ep Aligned Simple",
757
  "status": "a100_rerun_pass",
758
+ "coverage": f"20 records / {next(item for item in series_records if item['id'] == 'metadata128_simple')['scored_task_count']} scored aligned axes",
759
  "headline": "34,269 rows; train/val/test 25,629/4,608/4,032",
760
  "source": str((METADATA128_BASELINE_DIR / "summary_report.json").relative_to(ROOT)),
761
  },
762
  {
763
  "id": "metadata128_neural_mlp",
764
+ "title": "128ep Aligned NN",
765
  "status": "a100_rerun_pass",
766
+ "coverage": f"20 records / {next(item for item in series_records if item['id'] == 'metadata128_neural_mlp')['scored_task_count']} scored aligned axes",
767
+ "headline": "compact MLP heads over metadata/text and staged block features",
768
  "source": str((METADATA128_BASELINE_DIR / "summary_report.json").relative_to(ROOT)),
769
  },
770
  {
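The `read_a100_metadata_record` hunk above changes `scope` from a hard-coded string to a value taken from the metrics payload, with the aligned label as the fallback. The sketch below isolates that pattern. It is a minimal illustration, not the script's actual function: `resolve_record_fields`, `DEFAULT_SCOPE`, and the `"status"` and `"score"` payload keys are hypothetical names, since the diff does not show how `status` and `score` are derived.

```python
from typing import Any

# Fallback label taken from the new side of the diff.
DEFAULT_SCOPE = "multi_episode_128_aligned_baseline"


def resolve_record_fields(payload: dict[str, Any]) -> dict[str, Any]:
    """Derive scope, status, and reason for one task record (illustrative only)."""
    status = payload.get("status")  # assumed key; not shown in the diff
    score = payload.get("score")    # assumed key; not shown in the diff
    scored = status == "pass" and score is not None
    return {
        # `or` falls back when "scope" is missing, None, or an empty string.
        "scope": payload.get("scope") or DEFAULT_SCOPE,
        "status": "scored" if scored else "unsupported_without_required_target",
        "reason": payload.get("reason")
        or payload.get("error")
        or (
            "the 128-episode aligned artifact for this task does not contain a numeric public score"
            if status != "pass"
            else None
        ),
    }


block_task = resolve_record_fields(
    {
        "status": "pass",
        "score": 0.2295,
        "scope": "multi_episode_128_aligned_sensor_block_baseline",
    }
)
gap_task = resolve_record_fields({"status": "fail", "scope": ""})
```

Under these assumptions, a payload that declares its own scope (as the `imu_to_hand_pose` records do) keeps it, while a payload with no usable scope receives the aligned default and an explicit reason string.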
scripts/omni/run_128_task_baselines.py CHANGED
@@ -1463,12 +1463,28 @@ def unsupported_record(task_id: str, out_root: Path, reason: str, primary_metric
1463
 
1464
 
1465
  def build_markdown(summary: dict[str, Any]) -> str:
1466
  lines = [
1467
  "# 128-Episode Aligned Baselines",
1468
  "",
1469
  "These results align the earlier simple and neural baseline framing to the same selected 128-episode split used by the Qwen3-Omni pilot.",
1470
  "",
1471
- "The runner uses the derived Qwen JSONL export and public-safe metadata. It does not use raw Xperience-10M videos, HDF5 files, sensor NPZ blocks, Qwen weights, or LoRA weights.",
1472
  "",
1473
  "## Split",
1474
  "",
@@ -1502,9 +1518,9 @@ def build_markdown(summary: dict[str, Any]) -> str:
1502
  "",
1503
  "## Interpretation",
1504
  "",
1505
- "The trainable scores are metadata/text baselines, not replacements for full raw-modality baselines. They are useful for checking split alignment, label difficulty, train/test label coverage, and whether the Qwen diagnostic run is being compared against the same 96/16/16 episode setup.",
1506
  "",
1507
- "Tasks marked `unsupported_without_raw_128_feature_blocks` still need the 128-run sensor feature NPZ blocks to reproduce the single-episode feature-level target exactly.",
1508
  ]
1509
  )
1510
  return "\n".join(lines) + "\n"
 
1463
 
1464
 
1465
  def build_markdown(summary: dict[str, Any]) -> str:
1466
+ sensor_completion = bool((summary.get("feature_contract") or {}).get("sensor_block_completion"))
1467
+ source_sentence = (
1468
+ "The aligned runner uses the derived Qwen JSONL export for metadata/text tasks and staged processed sensor NPZ blocks only for the explicitly listed block-completion tasks. It still does not use raw Xperience-10M videos, raw annotation HDF5 files, Qwen weights, or LoRA weights."
1469
+ if sensor_completion
1470
+ else "The runner uses the derived Qwen JSONL export and public-safe metadata. It does not use raw Xperience-10M videos, HDF5 files, sensor NPZ blocks, Qwen weights, or LoRA weights."
1471
+ )
1472
+ unsupported_sentence = (
1473
+ "Tasks still marked unsupported require raw annotation interaction text or paired camera-view embeddings that are absent from the staged 128 export."
1474
+ if sensor_completion
1475
+ else "Tasks marked `unsupported_without_raw_128_feature_blocks` still need the 128-run sensor feature NPZ blocks to reproduce the single-episode feature-level target exactly."
1476
+ )
1477
+ interpretation_sentence = (
1478
+ "The trainable scores combine JSONL metadata/text tasks with staged sensor-block completion tasks. They are useful for checking split alignment, label difficulty, train/test target coverage, and whether the Qwen diagnostic run is being compared against the same 96/16/16 episode setup."
1479
+ if sensor_completion
1480
+ else "The trainable scores are metadata/text baselines, not replacements for full raw-modality baselines. They are useful for checking split alignment, label difficulty, train/test label coverage, and whether the Qwen diagnostic run is being compared against the same 96/16/16 episode setup."
1481
+ )
1482
  lines = [
1483
  "# 128-Episode Aligned Baselines",
1484
  "",
1485
  "These results align the earlier simple and neural baseline framing to the same selected 128-episode split used by the Qwen3-Omni pilot.",
1486
  "",
1487
+ source_sentence,
1488
  "",
1489
  "## Split",
1490
  "",
 
1518
  "",
1519
  "## Interpretation",
1520
  "",
1521
+ interpretation_sentence,
1522
  "",
1523
+ unsupported_sentence,
1524
  ]
1525
  )
1526
  return "\n".join(lines) + "\n"
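The new `build_markdown` selects each of its three sentences from a single `sensor_completion` flag read out of `summary["feature_contract"]`. The sketch below reproduces only that flag lookup to show how it behaves when the key is missing or `None`. The helper name `sensor_completion_enabled` and the short sentence labels are hypothetical and stand in for the full sentences in the diff.

```python
from typing import Any


def sensor_completion_enabled(summary: dict[str, Any]) -> bool:
    """Mirror the flag lookup from the diff: a missing or None contract reads as False."""
    return bool((summary.get("feature_contract") or {}).get("sensor_block_completion"))


def pick_source_sentence(summary: dict[str, Any]) -> str:
    """Choose between two placeholder labels the way build_markdown picks its sentences."""
    return (
        "aligned: JSONL plus staged sensor NPZ blocks"
        if sensor_completion_enabled(summary)
        else "metadata-only: JSONL export and public-safe metadata"
    )


# Summaries covering the cases the `or {}` guard is there to handle.
missing_contract: dict[str, Any] = {}
null_contract: dict[str, Any] = {"feature_contract": None}
enabled_contract: dict[str, Any] = {"feature_contract": {"sensor_block_completion": True}}

labels = [pick_source_sentence(s) for s in (missing_contract, null_contract, enabled_contract)]
```

The `or {}` guard matters because `summary.get("feature_contract")` can return `None` when the key exists but is empty; without it, the chained `.get` would raise `AttributeError`. Older summaries without the flag therefore keep the original metadata-only wording.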