cy0307 committed on
Commit 146ae33 · verified · 1 Parent(s): 965d0da

Add files using upload-large-folder tool

Files changed (43)
  1. ARTIFACT_GUIDE.md +3 -3
  2. EVALUATION_PROTOCOL.md +22 -22
  3. FIGURE_INDEX.md +2 -2
  4. PROJECT_README.md +17 -25
  5. PROJECT_STATUS.md +2 -2
  6. README.md +17 -25
  7. RESEARCH_TAKEAWAYS.md +1 -1
  8. TASK_METHOD_20_GAP_AUDIT.md +1 -1
  9. TASK_SUITE_20.md +22 -22
  10. data/artifact_index.json +64 -64
  11. data/evaluation_protocol.json +23 -23
  12. data/live_publication_status.json +0 -0
  13. data/mirror_parity.json +0 -0
  14. data/omni_model_comparison.json +2 -2
  15. data/project_manifest.json +3 -4
  16. data/project_packet.json +3 -4
  17. data/project_status.json +5 -6
  18. data/publication_audit.json +1 -1
  19. data/quality_gates.json +1 -1
  20. data/reproducibility_matrix.json +4 -4
  21. data/research_takeaways.json +2 -2
  22. data/scope_claims_audit.json +1 -1
  23. data/single_episode_task_model_radar.json +21 -21
  24. data/source_alignment_audit.json +1 -1
  25. data/task_method_20_gap_audit.json +1 -1
  26. data/task_method_20_result_matrix.json +1 -1
  27. data/task_suite_20.json +46 -46
  28. data/task_surface_integrity.json +1 -1
  29. data/tier2_task_suite.json +24 -25
  30. data/unified_task_model_radar.json +21 -21
  31. data/website_integrity.json +24 -31
  32. index.html +12 -70
  33. metrics/episode128_task_model_radar.json +21 -21
  34. metrics/figure_index.json +7 -7
  35. metrics/live_publication_status.json +0 -0
  36. metrics/omni_model_comparison.json +2 -2
  37. metrics/project_brief.json +1 -1
  38. metrics/project_packet.json +3 -4
  39. metrics/public_surface_qa.json +7 -7
  40. metrics/reproducibility_matrix.json +4 -4
  41. metrics/research_takeaways.json +2 -2
  42. metrics/task_method_20_gap_audit.json +1 -1
  43. metrics/task_surface_integrity.json +1 -1
ARTIFACT_GUIDE.md CHANGED
@@ -20,7 +20,7 @@ Xperience-native pretraining goal.
20
  | [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md) | Describes the future full-corpus Xperience Embodied Foundation Model goal, including modules, objectives, staged scale-up, hardware ranges, and evaluation. |
21
  | [`EVALUATION_PROTOCOL.md`](EVALUATION_PROTOCOL.md) | Defines the task unit, chronological split, metrics, leakage controls, and current limitations. |
22
  | [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md) | Defines public reproduction commands, expected outputs, and unreproducible boundaries. |
23
- | [`results/audio_ablation/AUDIO_ABLATION_SUMMARY.md`](results/audio_ablation/AUDIO_ABLATION_SUMMARY.md) | Shows measured current-audio and raw log-mel replacement deltas across the original task contracts. |
24
  | [`docs/single_episode_explorer.html`](docs/single_episode_explorer.html) | Gives a static window-level explorer for the public sample episode. |
25
  | [`XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`](XPERIENCE10M_DATASET_CARD_ALIGNMENT.md) | Optional detail for readers who need official dataset and access-term context. |
26
 
@@ -74,13 +74,13 @@ Xperience-native pretraining goal.
74
  | --- | --- |
75
  | [`TASK_SUITE_20.md`](TASK_SUITE_20.md) | Reader-facing table for the unified 20-task suite. |
76
  | [`docs/data/task_suite_20.json`](docs/data/task_suite_20.json) | Machine-readable unified 20-task suite for the website and Hugging Face mirrors. |
77
- | [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json) | The original task contracts, chronological split, and minimal/neural metrics. |
78
  | [`results/episode_task_suite/neural_mlp/`](results/episode_task_suite/neural_mlp/) | Matching PyTorch MLP heads for the same task contracts and feature windows. |
79
  | [`results/episode_task_suite/research_directions/`](results/episode_task_suite/research_directions/) | Mapping from the unified 20-task suite to the four Ropedia research directions. |
80
  | [`results/episode_task_suite/research_direction_extensions/`](results/episode_task_suite/research_direction_extensions/) | Four additional coded probes, one per research direction. |
81
  | [`results/episode_task_suite/tier2_task_suite/`](results/episode_task_suite/tier2_task_suite/) | Historical provenance path inside the unified 20-task suite. |
82
  | [`results/episode_task_suite/task_walkthroughs/`](results/episode_task_suite/task_walkthroughs/) | Human-readable research names and case studies explaining input, process modules, output, metric, limitation, and the website task-player data. |
83
- | [`results/audio_ablation/audio_ablation_metrics.csv`](results/audio_ablation/audio_ablation_metrics.csv) | All measured audio rows for the original task contracts across six variants, including no-audio, audio-only, alternate-audio-only, representation replacement, and all-input variants. |
84
  | [`results/audio_ablation/audio_delta_summary.csv`](results/audio_ablation/audio_delta_summary.csv) | Compact per-task audio delta table for quick manual inspection. |
85
  | [`scripts/audio_ablation_and_raw_upgrade.py`](scripts/audio_ablation_and_raw_upgrade.py) | Regenerates audio contribution results from real task-suite artifacts plus the local public-sample MP4. |
86
  | [`scripts/validate_task_surface.py`](scripts/validate_task_surface.py) | Fails publication if public task cards drift back to raw artifact ids or lose their thumbnail/player wiring. |
 
20
  | [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md) | Describes the future full-corpus Xperience Embodied Foundation Model goal, including modules, objectives, staged scale-up, hardware ranges, and evaluation. |
21
  | [`EVALUATION_PROTOCOL.md`](EVALUATION_PROTOCOL.md) | Defines the task unit, chronological split, metrics, leakage controls, and current limitations. |
22
  | [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md) | Defines public reproduction commands, expected outputs, and unreproducible boundaries. |
23
+ | [`results/audio_ablation/AUDIO_ABLATION_SUMMARY.md`](results/audio_ablation/AUDIO_ABLATION_SUMMARY.md) | Shows measured current-audio and raw log-mel replacement deltas across the walkthrough-backed task contracts. |
24
  | [`docs/single_episode_explorer.html`](docs/single_episode_explorer.html) | Gives a static window-level explorer for the public sample episode. |
25
  | [`XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`](XPERIENCE10M_DATASET_CARD_ALIGNMENT.md) | Optional detail for readers who need official dataset and access-term context. |
26
 
 
74
  | --- | --- |
75
  | [`TASK_SUITE_20.md`](TASK_SUITE_20.md) | Reader-facing table for the unified 20-task suite. |
76
  | [`docs/data/task_suite_20.json`](docs/data/task_suite_20.json) | Machine-readable unified 20-task suite for the website and Hugging Face mirrors. |
77
+ | [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json) | The walkthrough-backed task contracts, chronological split, and minimal/neural metrics. |
78
  | [`results/episode_task_suite/neural_mlp/`](results/episode_task_suite/neural_mlp/) | Matching PyTorch MLP heads for the same task contracts and feature windows. |
79
  | [`results/episode_task_suite/research_directions/`](results/episode_task_suite/research_directions/) | Mapping from the unified 20-task suite to the four Ropedia research directions. |
80
  | [`results/episode_task_suite/research_direction_extensions/`](results/episode_task_suite/research_direction_extensions/) | Four additional coded probes, one per research direction. |
81
  | [`results/episode_task_suite/tier2_task_suite/`](results/episode_task_suite/tier2_task_suite/) | Historical provenance path inside the unified 20-task suite. |
82
  | [`results/episode_task_suite/task_walkthroughs/`](results/episode_task_suite/task_walkthroughs/) | Human-readable research names and case studies explaining input, process modules, output, metric, limitation, and the website task-player data. |
83
+ | [`results/audio_ablation/audio_ablation_metrics.csv`](results/audio_ablation/audio_ablation_metrics.csv) | All measured audio rows for the walkthrough-backed task contracts across six variants, including no-audio, audio-only, alternate-audio-only, representation replacement, and all-input variants. |
84
  | [`results/audio_ablation/audio_delta_summary.csv`](results/audio_ablation/audio_delta_summary.csv) | Compact per-task audio delta table for quick manual inspection. |
85
  | [`scripts/audio_ablation_and_raw_upgrade.py`](scripts/audio_ablation_and_raw_upgrade.py) | Regenerates audio contribution results from real task-suite artifacts plus the local public-sample MP4. |
86
  | [`scripts/validate_task_surface.py`](scripts/validate_task_surface.py) | Fails publication if public task cards drift back to raw artifact ids or lose their thumbnail/player wiring. |
EVALUATION_PROTOCOL.md CHANGED
@@ -50,28 +50,28 @@ All 20 public-sample task contracts are presented together under the same
50
  minimal/neural baseline setup. Historical `tier2_task_suite` paths are
51
  retained only as stable provenance artifact locations inside the unified suite.
52
 
53
- | # | Task | Artifact id | Origin | Family | Unit | Input -> target | Primary metric | Minimal | Neural |
54
- | ---: | --- | --- | --- | --- | --- | --- | --- | ---: | ---: |
55
- | 1 | Action Recognition | `timeline_action` | original | supervised classification | single window | current 20-frame all-feature window -> current action label | macro_f1 (higher better) | 0.0500 | 0.0148 |
56
- | 2 | Procedure Step Recognition | `timeline_subtask` | original | supervised classification | single window | current 20-frame all-feature window -> current subtask label | macro_f1 (higher better) | 0.0506 | 0.0281 |
57
- | 3 | Action Boundary Detection | `transition_detection` | original | temporal diagnostic | single window | current 20-frame all-feature window -> action boundary versus steady | macro_f1 (higher better) | 0.6118 | 0.5862 |
58
- | 4 | Next-Action Prediction | `next_action` | original | short-horizon prediction | single window | current 20-frame all-feature window at time t -> action label at t + 20 frames | macro_f1 (higher better) | 0.0593 | 0.0419 |
59
- | 5 | Hand Trajectory Forecasting | `hand_trajectory_forecast` | original | trajectory regression | single window | current all-feature window -> future left/right hand 3D joints for 10 frames | mpjpe (lower better) | 0.8647 | 0.1079 |
60
- | 6 | Contact State Prediction | `contact_prediction` | original | binary classification | single window | non-contact and non-caption feature blocks -> any body contact | macro_f1 (higher better) | 1.0000 | 1.0000 |
61
- | 7 | Object Relevance Prediction | `object_relevance` | original | multi-label classification | single window | non-caption feature blocks -> current relevant object set | micro_f1 (higher better) | 0.1803 | 0.1679 |
62
- | 8 | Language Grounding | `caption_grounding` | original | retrieval | caption query | caption object/interaction query plus candidate sensor windows -> matching time window | mrr (higher better) | 0.0160 | 0.0168 |
63
- | 9 | Cross-Modal Retrieval | `cross_modal_retrieval` | original | retrieval | sensor query | motion, IMU, and camera query features -> matching depth/video window | top5_accuracy (higher better) | 0.3678 | 0.1983 |
64
- | 10 | Cross-Modal Reconstruction | `modality_reconstruction` | original | cross-modal regression | single window | motion, IMU, and camera features -> depth/video feature vector | r2 (higher better) | -0.0153 | -0.0102 |
65
- | 11 | Temporal Order Verification | `temporal_order` | original | pairwise diagnostic | adjacent window pair | two adjacent windows -> correct versus reversed order | f1 (higher better) | 0.5400 | 0.8520 |
66
- | 12 | Multimodal Synchronization Detection | `misalignment_detection` | original | pairwise diagnostic | paired modality window | motion side plus visual/depth side -> aligned versus shifted by 8 windows | f1 (higher better) | 0.5052 | 0.7153 |
67
- | 13 | Long-Horizon Next-Action Forecasting | `long_horizon_next_action` | additional | classification | single aligned window | Current 20-frame non-caption multimodal window. -> Action label five seconds later. | macro_f1 (higher better) | 0.0750 | 0.0655 |
68
- | 14 | Long-Horizon Next-Subtask Forecasting | `next_subtask_forecast` | additional | classification | single aligned window | Current 20-frame non-caption multimodal window. -> Procedure subtask label five seconds later. | macro_f1 (higher better) | 0.0455 | 0.0507 |
69
- | 15 | Interaction Text Prediction | `interaction_text_prediction` | additional | classification | single aligned window | Current 20-frame sensor window with caption-text features removed. -> Raw annotation interaction phrase for the same window. | macro_f1 (higher better) | 0.0444 | 0.0381 |
70
- | 16 | Action-Object Relation Prediction | `action_object_relation` | additional | classification | single aligned window | Current 20-frame sensor window with caption-text features removed. -> Joint action plus active object-set relation. | macro_f1 (higher better) | 0.0000 | 0.0000 |
71
- | 17 | Future Object-Set Forecasting | `object_set_forecast` | additional | multi_label | single aligned window | Current 20-frame sensor window with caption-text features removed. -> Object set active five seconds later. | micro_f1 (higher better) | 0.1694 | 0.1972 |
72
- | 18 | IMU-to-Hand Pose Reconstruction | `imu_to_hand_pose` | additional | regression | single aligned window | Current IMU acceleration/gyroscope feature block only. -> Current left/right hand joint feature blocks. | mae (lower better) | 0.0420 | 0.0426 |
73
- | 19 | Camera-View Synchronization Retrieval | `camera_view_sync_retrieval` | additional | retrieval | held-out query window | Fisheye camera-1 feature query projected into fisheye camera-3 feature space. -> The synchronized held-out camera-3 window. | mrr (higher better) | 0.4943 | 0.2409 |
74
- | 20 | Time-to-Next-Transition Regression | `time_to_transition` | additional | regression | single aligned window | Current 20-frame non-caption multimodal window. -> Frames until the next action-label boundary, capped at 200 frames. | mae (lower better) | 10.5374 | 10.5545 |
75
 
76
  ## Leakage Controls
77
 
 
50
  minimal/neural baseline setup. Historical `tier2_task_suite` paths are
51
  retained only as stable provenance artifact locations inside the unified suite.
52
 
53
+ | # | Task | Artifact id | Family | Unit | Input -> target | Primary metric | Minimal | Neural |
54
+ | ---: | --- | --- | --- | --- | --- | --- | ---: | ---: |
55
+ | 1 | Action Recognition | `timeline_action` | supervised classification | single window | current 20-frame all-feature window -> current action label | macro_f1 (higher better) | 0.0500 | 0.0148 |
56
+ | 2 | Procedure Step Recognition | `timeline_subtask` | supervised classification | single window | current 20-frame all-feature window -> current subtask label | macro_f1 (higher better) | 0.0506 | 0.0281 |
57
+ | 3 | Action Boundary Detection | `transition_detection` | temporal diagnostic | single window | current 20-frame all-feature window -> action boundary versus steady | macro_f1 (higher better) | 0.6118 | 0.5862 |
58
+ | 4 | Next-Action Prediction | `next_action` | short-horizon prediction | single window | current 20-frame all-feature window at time t -> action label at t + 20 frames | macro_f1 (higher better) | 0.0593 | 0.0419 |
59
+ | 5 | Hand Trajectory Forecasting | `hand_trajectory_forecast` | trajectory regression | single window | current all-feature window -> future left/right hand 3D joints for 10 frames | mpjpe (lower better) | 0.8647 | 0.1079 |
60
+ | 6 | Contact State Prediction | `contact_prediction` | binary classification | single window | non-contact and non-caption feature blocks -> any body contact | macro_f1 (higher better) | 1.0000 | 1.0000 |
61
+ | 7 | Object Relevance Prediction | `object_relevance` | multi-label classification | single window | non-caption feature blocks -> current relevant object set | micro_f1 (higher better) | 0.1803 | 0.1679 |
62
+ | 8 | Language Grounding | `caption_grounding` | retrieval | caption query | caption object/interaction query plus candidate sensor windows -> matching time window | mrr (higher better) | 0.0160 | 0.0168 |
63
+ | 9 | Cross-Modal Retrieval | `cross_modal_retrieval` | retrieval | sensor query | motion, IMU, and camera query features -> matching depth/video window | top5_accuracy (higher better) | 0.3678 | 0.1983 |
64
+ | 10 | Cross-Modal Reconstruction | `modality_reconstruction` | cross-modal regression | single window | motion, IMU, and camera features -> depth/video feature vector | r2 (higher better) | -0.0153 | -0.0102 |
65
+ | 11 | Temporal Order Verification | `temporal_order` | pairwise diagnostic | adjacent window pair | two adjacent windows -> correct versus reversed order | f1 (higher better) | 0.5400 | 0.8520 |
66
+ | 12 | Multimodal Synchronization Detection | `misalignment_detection` | pairwise diagnostic | paired modality window | motion side plus visual/depth side -> aligned versus shifted by 8 windows | f1 (higher better) | 0.5052 | 0.7153 |
67
+ | 13 | Long-Horizon Next-Action Forecasting | `long_horizon_next_action` | classification | single aligned window | Current 20-frame non-caption multimodal window. -> Action label five seconds later. | macro_f1 (higher better) | 0.0750 | 0.0655 |
68
+ | 14 | Long-Horizon Next-Subtask Forecasting | `next_subtask_forecast` | classification | single aligned window | Current 20-frame non-caption multimodal window. -> Procedure subtask label five seconds later. | macro_f1 (higher better) | 0.0455 | 0.0507 |
69
+ | 15 | Interaction Text Prediction | `interaction_text_prediction` | classification | single aligned window | Current 20-frame sensor window with caption-text features removed. -> Raw annotation interaction phrase for the same window. | macro_f1 (higher better) | 0.0444 | 0.0381 |
70
+ | 16 | Action-Object Relation Prediction | `action_object_relation` | classification | single aligned window | Current 20-frame sensor window with caption-text features removed. -> Joint action plus active object-set relation. | macro_f1 (higher better) | 0.0000 | 0.0000 |
71
+ | 17 | Future Object-Set Forecasting | `object_set_forecast` | multi_label | single aligned window | Current 20-frame sensor window with caption-text features removed. -> Object set active five seconds later. | micro_f1 (higher better) | 0.1694 | 0.1972 |
72
+ | 18 | IMU-to-Hand Pose Reconstruction | `imu_to_hand_pose` | regression | single aligned window | Current IMU acceleration/gyroscope feature block only. -> Current left/right hand joint feature blocks. | mae (lower better) | 0.0420 | 0.0426 |
73
+ | 19 | Camera-View Synchronization Retrieval | `camera_view_sync_retrieval` | retrieval | held-out query window | Fisheye camera-1 feature query projected into fisheye camera-3 feature space. -> The synchronized held-out camera-3 window. | mrr (higher better) | 0.4943 | 0.2409 |
74
+ | 20 | Time-to-Next-Transition Regression | `time_to_transition` | regression | single aligned window | Current 20-frame non-caption multimodal window. -> Frames until the next action-label boundary, capped at 200 frames. | mae (lower better) | 10.5374 | 10.5545 |
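
Reader note on the "Primary metric" column: the sketch below shows how `macro_f1`, `micro_f1`, and `mrr` are conventionally computed. It is only an illustration under stated assumptions. The function names and toy inputs are hypothetical, and this is not the repository's evaluation code, which may differ in label handling or tie-breaking.

```python
# Illustrative metric definitions; all names here are hypothetical,
# not taken from the repository's scripts.

def f1_from_counts(tp, fp, fn):
    """F1 from true-positive, false-positive, false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        scores.append(f1_from_counts(tp, fp, fn))
    return sum(scores) / len(scores)


def micro_f1(true_sets, pred_sets):
    """Multi-label F1 with counts pooled over every window and label."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    return f1_from_counts(tp, fp, fn)


def mean_reciprocal_rank(ranks):
    """Mean of 1/rank, where rank is the 1-based position of the correct item."""
    return sum(1.0 / r for r in ranks) / len(ranks)


if __name__ == "__main__":
    # Toy inputs, made up for illustration only.
    print(macro_f1(["a", "a", "b"], ["a", "b", "b"]))
    print(micro_f1([{"cup", "knife"}, {"pan"}], [{"cup"}, {"pan", "knife"}]))
    print(mean_reciprocal_rank([1, 2, 4]))
```

Macro averaging weights every class equally, so rare action labels can pull `macro_f1` down sharply, while `micro_f1` pools counts across labels before computing F1.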
75
 
76
  ## Leakage Controls
77
 
FIGURE_INDEX.md CHANGED
@@ -14,13 +14,13 @@ Public figures, diagrams, charts, and derived modality thumbnails. Raw Xperience
14
  | Project logo mark | `docs/assets/brand/xperience10m-logo-mark-512.png` | 512 x 512 | `scripts/build_brand_assets.py` | Primary X-shaped multimodal camera mark used for the website header, README, HF cards, and brand identity. |
15
  | Project logo social card | `docs/assets/brand/xperience10m-logo-social-card.png` | 1200 x 630 | `scripts/build_brand_assets.py` | Large preview image for README, Hugging Face cards, and Open Graph/Twitter social sharing. |
16
  | Project favicon | `docs/assets/brand/xperience10m-logo-favicon-64.png` | 64 x 64 | `scripts/build_brand_assets.py` | Small dark-tile logo for browser tabs and compact navigation. |
17
- | Original task-suite infographic | `docs/assets/task_suite_infographic.png` | 1800 x 7600 | `scripts/render_task_suite_infographic.py` | Primary visual map of the original task families, verified metrics, and sample modalities; the unified public suite is now documented as 20 tasks. |
18
  | Episode-to-task pipeline diagram | `docs/assets/pipeline_diagram.png` | 1800 x 1120 | `scripts/generate_visualizations.py` | End-to-end data processing and evaluation pipeline overview. |
19
  | Qwen3-Omni LoRA training pipeline | `docs/assets/qwen3_omni_lora_pipeline.png` | 1536 x 1024 | `docs/assets/qwen3_omni_lora_pipeline.prompt.md` | Detailed raw-data-to-adapter flow for staged Xperience-10M Qwen3-Omni LoRA training. |
20
  | Spatial intelligence slide diagram | `docs/assets/foundation-pipelines/spatial-intelligence-pipeline.png` | 2560 x 1920 | `scripts/render_foundation_pipeline_diagrams.py` | High-resolution slide diagram for the spatial intelligence pipeline track. |
21
  | Human-video world model slide diagram | `docs/assets/foundation-pipelines/human-video-world-model-pipeline.png` | 2560 x 1920 | `scripts/render_foundation_pipeline_diagrams.py` | High-resolution slide diagram for the human-video world-model pipeline track. |
22
  | Vision-language-action slide diagram | `docs/assets/foundation-pipelines/vision-language-action-pipeline.png` | 2560 x 1920 | `scripts/render_foundation_pipeline_diagrams.py` | High-resolution slide diagram for the VLA/action-policy pipeline track. |
23
- | Minimal and neural task architecture map | `docs/assets/task_architectures.png` | 1800 x 2450 | `scripts/render_overview_figures.py` | Minimal and neural heads for the original task contracts and shared feature contracts. |
24
  | Video modality thumbnail | `docs/assets/modalities/video.jpg` | 880 x 520 | `scripts/export_modality_atlas_assets.py` | Derived thumbnail for synchronized camera streams. |
25
  | Audio modality thumbnail | `docs/assets/modalities/audio.png` | 880 x 520 | `scripts/export_modality_atlas_assets.py` | Derived waveform thumbnail for the MP4 AAC stream. |
26
  | Depth modality thumbnail | `docs/assets/modalities/depth.jpg` | 880 x 520 | `scripts/export_modality_atlas_assets.py` | Derived depth and confidence thumbnail. |
 
14
  | Project logo mark | `docs/assets/brand/xperience10m-logo-mark-512.png` | 512 x 512 | `scripts/build_brand_assets.py` | Primary X-shaped multimodal camera mark used for the website header, README, HF cards, and brand identity. |
15
  | Project logo social card | `docs/assets/brand/xperience10m-logo-social-card.png` | 1200 x 630 | `scripts/build_brand_assets.py` | Large preview image for README, Hugging Face cards, and Open Graph/Twitter social sharing. |
16
  | Project favicon | `docs/assets/brand/xperience10m-logo-favicon-64.png` | 64 x 64 | `scripts/build_brand_assets.py` | Small dark-tile logo for browser tabs and compact navigation. |
17
+ | Original task-suite infographic | `docs/assets/task_suite_infographic.png` | 1800 x 7600 | `scripts/render_task_suite_infographic.py` | Primary visual map of the walkthrough-backed task families, verified metrics, and sample modalities; the unified public suite is documented as 20 tasks. |
18
  | Episode-to-task pipeline diagram | `docs/assets/pipeline_diagram.png` | 1800 x 1120 | `scripts/generate_visualizations.py` | End-to-end data processing and evaluation pipeline overview. |
19
  | Qwen3-Omni LoRA training pipeline | `docs/assets/qwen3_omni_lora_pipeline.png` | 1536 x 1024 | `docs/assets/qwen3_omni_lora_pipeline.prompt.md` | Detailed raw-data-to-adapter flow for staged Xperience-10M Qwen3-Omni LoRA training. |
20
  | Spatial intelligence slide diagram | `docs/assets/foundation-pipelines/spatial-intelligence-pipeline.png` | 2560 x 1920 | `scripts/render_foundation_pipeline_diagrams.py` | High-resolution slide diagram for the spatial intelligence pipeline track. |
21
  | Human-video world model slide diagram | `docs/assets/foundation-pipelines/human-video-world-model-pipeline.png` | 2560 x 1920 | `scripts/render_foundation_pipeline_diagrams.py` | High-resolution slide diagram for the human-video world-model pipeline track. |
22
  | Vision-language-action slide diagram | `docs/assets/foundation-pipelines/vision-language-action-pipeline.png` | 2560 x 1920 | `scripts/render_foundation_pipeline_diagrams.py` | High-resolution slide diagram for the VLA/action-policy pipeline track. |
23
+ | Minimal and neural task architecture map | `docs/assets/task_architectures.png` | 1800 x 2450 | `scripts/render_overview_figures.py` | Minimal and neural heads for the walkthrough-backed task contracts and shared feature contracts. |
24
  | Video modality thumbnail | `docs/assets/modalities/video.jpg` | 880 x 520 | `scripts/export_modality_atlas_assets.py` | Derived thumbnail for synchronized camera streams. |
25
  | Audio modality thumbnail | `docs/assets/modalities/audio.png` | 880 x 520 | `scripts/export_modality_atlas_assets.py` | Derived waveform thumbnail for the MP4 AAC stream. |
26
  | Depth modality thumbnail | `docs/assets/modalities/depth.jpg` | 880 x 520 | `scripts/export_modality_atlas_assets.py` | Derived depth and confidence thumbnail. |
PROJECT_README.md CHANGED
@@ -850,9 +850,9 @@ and verified Qwen3-Omni/Cosmos3 diagnostic artifacts.
850
  scripts/
851
  train_min_action_model.py # motion/IMU baseline
852
  train_all_modalities_model.py # current all-feature lightweight baseline
853
- episode_task_suite.py # original end-to-end task definitions
854
  neural_task_models.py # optional PyTorch MLP heads for task contracts
855
- research_direction_taxonomy.py # maps original tasks to the four research tracks
856
  research_direction_extension_tasks.py # one extra data-backed probe per track
857
  tier2_task_suite.py # historical-name provenance builder for unified task rows
858
  build_unified_task_suite.py # builds TASK_SUITE_20.md and task_suite_20.json
@@ -890,7 +890,7 @@ results/
890
  research_directions/ # four-track taxonomy, CSV, and summary
891
  research_direction_extensions/ # four extra direction probes + predictions
892
  tier2_task_suite/ # provenance baseline tasks + predictions; historical path
893
- task_walkthroughs/ # case-study walkthroughs for original tasks
894
  omni_exploration/ # ModelScope readiness-check artifacts
895
  omni_finetune/model_output_task_probes_20260616/ # task-13/task-16 probes derived from verified model JSON
896
 
@@ -1028,7 +1028,7 @@ cd ropedia-xperience-10m-task-suite
1028
  python scripts/episode_task_suite.py --workspace /path/to/workspace
1029
  ```
1030
 
1031
- Run the original task definitions with lightweight neural heads:
1032
 
1033
  ```bash
1034
  pip install torch
@@ -1449,7 +1449,7 @@ and [`docs/data/additional_development_directions.json`](docs/data/additional_de
1449
 
1450
  ## Four Research Directions
1451
 
1452
- The original task contracts are organized against the four Ropedia research directions in
1453
  a generated artifact, not only in prose:
1454
 
1455
  - [`research_direction_taxonomy.json`](results/episode_task_suite/research_directions/research_direction_taxonomy.json)
@@ -1475,13 +1475,13 @@ Current direction-level coverage:
1475
 
1476
  The important interpretation is that all four directions can be **started** from
1477
  the Xperience-10M sample modalities, but only direction C is strongly represented
1478
- by the original task suite. Directions A, B, and D need additional targets and
1479
  multi-episode training before they become full research deliverables.
1480
 
1481
- ## Four Direction-Extension Probes
1482
 
1483
- Beyond the original task contracts, the repo now includes one extra data-backed
1484
- probe for each research direction. These probes are computed from the same
1485
  `shared_windows.npz`, `windows.csv`, and `feature_manifest.json` artifacts, so
1486
  the reported numbers are computed from sample-derived features and saved metric artifacts.
1487
 
@@ -1543,18 +1543,10 @@ unified 20-task suite, not as a separate benchmark tier.
1543
 
1544
  ![128-episode 20-task model radar](docs/assets/charts/episode128_task_model_radar.svg)
1545
 
1546
- ![Unified 20-task provenance chart](docs/assets/charts/tier2_task_suite.svg)
1547
-
1548
- | # | Task | Input | Output | Minimal | Neural MLP | Meaning |
1549
- | ---: | --- | --- | --- | ---: | ---: | --- |
1550
- | 13 | Long-Horizon Next-Action Forecasting | current non-caption multimodal window | action label five seconds later | `0.0750` macro-F1 | `0.0655` macro-F1 | Tests procedure context beyond the one-second next-action task. |
1551
- | 14 | Long-Horizon Next-Subtask Forecasting | current non-caption multimodal window | subtask five seconds later | `0.0455` macro-F1 | `0.0507` macro-F1 | Moves anticipation from low-level action to high-level procedure state. |
1552
- | 15 | Interaction Text Prediction | current sensor window without caption text | raw interaction phrase | `0.0444` macro-F1 | `0.0381` macro-F1 | Uses the original annotation interaction text instead of only hashed features. |
1553
- | 16 | Action-Object Relation Prediction | current sensor window without caption text | joint action plus object-set label | `0.0000` macro-F1 | `0.0000` macro-F1 | Exposes a hard binding target for action-object reasoning. |
1554
- | 17 | Future Object-Set Forecasting | current sensor window without caption text | object set five seconds later | `0.1694` micro-F1 | `0.1972` micro-F1 | Predicts which objects become relevant soon. |
1555
- | 18 | IMU-to-Hand Pose Reconstruction | IMU feature block only | current left/right hand joints | `0.0420` MAE | `0.0426` MAE | Tests inertial-to-hand sensor bridging. |
1556
- | 19 | Camera-View Synchronization Retrieval | fisheye camera-1 query | synchronized fisheye camera-3 window | `0.4943` MRR | `0.2409` MRR | Stress-tests multi-camera temporal alignment. |
1557
- | 20 | Time-to-Next-Transition Regression | current non-caption multimodal window | capped frames until next action boundary | `10.5374` MAE frames | `10.5545` MAE frames | Converts boundary detection into continuous timing. |
1558
 
1559
  Run:
1560
 
@@ -1632,7 +1624,7 @@ PyTorch MLP classifiers or regressors. Its outputs live under
1632
  and the rollup is stored in the `neural_tasks` section of
1633
  [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json).
1634
 
1635
- The original task-specific heads are:
1636
 
1637
  | Task | Input | Minimal head | Output |
1638
  | --- | --- | --- | --- |
@@ -1663,8 +1655,8 @@ The original task-specific heads are:
1663
  | Neural MLP hand forecast | 0.1079 MPJPE | n/a | Same features/split, nonlinear regression head |
1664
  | Neural MLP temporal order | 0.8520 F1 | 0.8578 | Strong improvement on adjacent-window ordering |
1665
  | Neural MLP misalignment | 0.7153 F1 | 0.7009 | Detects shifted motion/visual/audio pairs better than the linear head |
1666
- | Audio ablation | +0.0418 mean delta | n/a | Current audio variant improves the primary metric on 6 of the original task contracts |
1667
- | Alternate audio representation | +0.0936 mean delta | n/a | Alternate audio-window representation improves over the baseline audio variant on 6 of the original task contracts |
1668
 
1669
  ## Audio Contribution Study
1670
 
@@ -1743,7 +1735,7 @@ episodes; they are not reported as multi-episode benchmark results.
1743
 
1744
  I re-ran the full pipeline from the local raw public sample into a temporary
1745
  local workspace and compared regenerated metrics with the committed
1746
- artifacts. The baseline metrics, original task metrics, feature manifest, and
1747
  available modality manifest matched exactly after float normalization.
1748
 
1749
  See [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md) for the
 
850
  scripts/
851
  train_min_action_model.py # motion/IMU baseline
852
  train_all_modalities_model.py # current all-feature lightweight baseline
853
+ episode_task_suite.py # public-sample task definitions
854
  neural_task_models.py # optional PyTorch MLP heads for task contracts
855
+ research_direction_taxonomy.py # maps walkthrough-backed tasks to the four research tracks
856
  research_direction_extension_tasks.py # one extra data-backed probe per track
857
  tier2_task_suite.py # historical-name provenance builder for unified task rows
858
  build_unified_task_suite.py # builds TASK_SUITE_20.md and task_suite_20.json
 
890
  research_directions/ # four-track taxonomy, CSV, and summary
891
  research_direction_extensions/ # four extra direction probes + predictions
892
  tier2_task_suite/ # provenance baseline tasks + predictions; historical path
893
+ task_walkthroughs/ # case-study walkthroughs for walkthrough-backed tasks
894
  omni_exploration/ # ModelScope readiness-check artifacts
895
  omni_finetune/model_output_task_probes_20260616/ # task-13/task-16 probes derived from verified model JSON
896
 
 
1028
  python scripts/episode_task_suite.py --workspace /path/to/workspace
1029
  ```
1030
 
1031
+ Run the public-sample task definitions with lightweight neural heads:
1032
 
1033
  ```bash
1034
  pip install torch
 
1449
 
1450
  ## Four Research Directions
1451
 
1452
+ The walkthrough-backed task contracts are organized against the four Ropedia research directions in
1453
  a generated artifact, not only in prose:
1454
 
1455
  - [`research_direction_taxonomy.json`](results/episode_task_suite/research_directions/research_direction_taxonomy.json)
 
1475
 
1476
  The important interpretation is that all four directions can be **started** from
1477
  the Xperience-10M sample modalities, but only direction C is strongly represented
1478
+ by the current task evidence. Directions A, B, and D need additional targets and
1479
  multi-episode training before they become full research deliverables.
1480
 
1481
+ ## Four Direction Probes
1482
 
1483
+ Alongside the unified 20-task suite, the repo includes one data-backed probe for
1484
+ each research direction. These probes are computed from the same
1485
  `shared_windows.npz`, `windows.csv`, and `feature_manifest.json` artifacts, so
1486
  the reported numbers are computed from sample-derived features and saved metric artifacts.
1487
 
 
1543
 
1544
  ![128-episode 20-task model radar](docs/assets/charts/episode128_task_model_radar.svg)
1545
 
1546
+ The all-task table, including every input/output contract and minimal/neural
1547
+ metric, is in [`TASK_SUITE_20.md`](TASK_SUITE_20.md). Historical provenance
1548
+ links remain listed above for exact source tracing, but the public task surface
1549
+ should be read as one integrated 20-task suite.
1550
 
1551
  Run:
1552
 
 
1624
  and the rollup is stored in the `neural_tasks` section of
1625
  [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json).
1626
 
1627
+ The walkthrough-backed task heads are:
1628
 
1629
  | Task | Input | Minimal head | Output |
1630
  | --- | --- | --- | --- |
 
1655
  | Neural MLP hand forecast | 0.1079 MPJPE | n/a | Same features/split, nonlinear regression head |
1656
  | Neural MLP temporal order | 0.8520 F1 | 0.8578 | Strong improvement on adjacent-window ordering |
1657
  | Neural MLP misalignment | 0.7153 F1 | 0.7009 | Detects shifted motion/visual/audio pairs better than the linear head |
1658
+ | Audio ablation | +0.0418 mean delta | n/a | Current audio variant improves the primary metric on 6 walkthrough-backed task contracts |
1659
+ | Alternate audio representation | +0.0936 mean delta | n/a | Alternate audio-window representation improves over the baseline audio variant on 6 walkthrough-backed task contracts |
1660
 
1661
  ## Audio Contribution Study
1662
 
 
1735
 
1736
  I re-ran the full pipeline from the local raw public sample into a temporary
1737
  local workspace and compared regenerated metrics with the committed
1738
+ artifacts. The baseline metrics, task metrics, feature manifest, and
1739
  available modality manifest matched exactly after float normalization.
1740
 
1741
  See [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md) for the
PROJECT_STATUS.md CHANGED
@@ -33,7 +33,7 @@ prior multiscale release, and v6 is the current public 20-task Qwen3-Omni row.
33
  | Unified 20-task suite | Verified | `TASK_SUITE_20.md`, `docs/data/task_suite_20.json`, `results/episode_task_suite/`, `results/episode_task_suite/tier2_task_suite/` | All 20 task contracts have committed minimal metrics and share the same 20-frame windows, 5-frame stride, chronological split, and minimal/neural head pattern. The `tier2_task_suite` path is historical provenance inside the unified suite, not a separate public tier. |
34
  | 180-result method matrix | Verified complete | `docs/data/task_method_20_result_matrix.json`, `TASK_METHOD_20_RESULT_MATRIX.md`, `docs/data/task_method_20_gap_audit.json`, `docs/assets/charts/unified_task_model_radar.svg` | The public comparison matrix now has 9 methods x 20 tasks = 180/180 scored method-task records. Six rows are explicitly marked as compact-proxy scores where the public 128-episode export lacks the direct raw target. |
35
  | Neural heads | Verified | `scripts/neural_task_models.py`, `results/episode_task_suite/neural_mlp/` | Each task also has a compact PyTorch MLP run over the same feature tensor and chronological split. |
36
- | Audio contribution study | Verified | `scripts/audio_ablation_and_raw_upgrade.py`, `results/audio_ablation/`, `docs/data/audio_ablation_summary.json` | Audio variants are compared across the original task contracts; audio improves the primary metric on 6 of those contracts, and a 588-d audio-window representation improves over the baseline audio variant on 6 of those contracts. |
37
  | Research takeaways | Verified | `RESEARCH_TAKEAWAYS.md`, `docs/data/research_takeaways.json`, `scripts/build_research_takeaways.py` | The main result interpretation is generated from committed metrics: chronological class shift, neural gains on dynamics/order/alignment, open retrieval/reconstruction problems, and the need for held-out episodes. |
38
  | Research roadmap | Current | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | The roadmap connects public-sample task development to the final verified Qwen3-Omni diagnostic result, same-split baseline alignment, action/subtask error analysis, robustness runs, world/policy tracks, and the future Xperience-native pretraining goal. |
39
  | 128-episode task-suite enhancement pack | Current no-new-episode plan | `TASK_SUITE_ENHANCEMENT_128.md`, `docs/data/task_suite_enhancement_128.json`, `results/omni_finetune/task_suite_enhancement_128_v1_20260608/enhancement_plan.json`, `scripts/omni/build_task_suite_enhancement_128.py` | The current 3,808-window selected split can be stressed without more episodes by exporting denser and multiscale windows. The recommended next export is `multiscale_20s10_40s20_80s40`, estimated at 106,095 windows from the observed frame spans; the pack also defines hierarchical action/subtask targets, raw-feature shard priorities for unsupported tasks, and Qwen3-Omni/Cosmos3 follow-up run cards. |
@@ -112,7 +112,7 @@ prior multiscale release, and v6 is the current public 20-task Qwen3-Omni row.
112
  - The current reconstruction task reconstructs feature vectors, not pixel
113
  depth, meshes, NeRF outputs, or Gaussian splats.
114
  - Audio is part of the current 8,546-dimensional baseline feature vector.
115
- - Audio contribution is evaluated across the original task contracts in
116
  `results/audio_ablation/`.
117
  - Foundation-model selection is now explicit: Qwen3-Omni is the immediate
118
  trainable pilot, Cosmos 3 is the first world-model track, and Cosmos3-Super
 
33
  | Unified 20-task suite | Verified | `TASK_SUITE_20.md`, `docs/data/task_suite_20.json`, `results/episode_task_suite/`, `results/episode_task_suite/tier2_task_suite/` | All 20 task contracts have committed minimal metrics and share the same 20-frame windows, 5-frame stride, chronological split, and minimal/neural head pattern. The `tier2_task_suite` path is historical provenance inside the unified suite, not a separate public tier. |
34
  | 180-result method matrix | Verified complete | `docs/data/task_method_20_result_matrix.json`, `TASK_METHOD_20_RESULT_MATRIX.md`, `docs/data/task_method_20_gap_audit.json`, `docs/assets/charts/unified_task_model_radar.svg` | The public comparison matrix now has 9 methods x 20 tasks = 180/180 scored method-task records. Six rows are explicitly marked as compact-proxy scores where the public 128-episode export lacks the direct raw target. |
35
  | Neural heads | Verified | `scripts/neural_task_models.py`, `results/episode_task_suite/neural_mlp/` | Each task also has a compact PyTorch MLP run over the same feature tensor and chronological split. |
36
+ | Audio contribution study | Verified | `scripts/audio_ablation_and_raw_upgrade.py`, `results/audio_ablation/`, `docs/data/audio_ablation_summary.json` | Audio variants are compared across the walkthrough-backed task contracts; audio improves the primary metric on 6 of those contracts, and a 588-d audio-window representation improves over the baseline audio variant on 6 of those contracts. |
37
  | Research takeaways | Verified | `RESEARCH_TAKEAWAYS.md`, `docs/data/research_takeaways.json`, `scripts/build_research_takeaways.py` | The main result interpretation is generated from committed metrics: chronological class shift, neural gains on dynamics/order/alignment, open retrieval/reconstruction problems, and the need for held-out episodes. |
38
  | Research roadmap | Current | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | The roadmap connects public-sample task development to the final verified Qwen3-Omni diagnostic result, same-split baseline alignment, action/subtask error analysis, robustness runs, world/policy tracks, and the future Xperience-native pretraining goal. |
39
  | 128-episode task-suite enhancement pack | Current no-new-episode plan | `TASK_SUITE_ENHANCEMENT_128.md`, `docs/data/task_suite_enhancement_128.json`, `results/omni_finetune/task_suite_enhancement_128_v1_20260608/enhancement_plan.json`, `scripts/omni/build_task_suite_enhancement_128.py` | The current 3,808-window selected split can be stressed without more episodes by exporting denser and multiscale windows. The recommended next export is `multiscale_20s10_40s20_80s40`, estimated at 106,095 windows from the observed frame spans; the pack also defines hierarchical action/subtask targets, raw-feature shard priorities for unsupported tasks, and Qwen3-Omni/Cosmos3 follow-up run cards. |
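Reviewer sketch (not part of the commit): the status row above says all 20 task contracts share 20-frame windows, a 5-frame stride, and a chronological split. The snippet below illustrates what that contract implies for window indexing. The function names and the `train_fraction` value are hypothetical; the diff does not state the split ratio or how the repo's scripts implement windowing.

```python
def window_starts(num_frames, window=20, stride=5):
    """Start indices of every full window that fits inside the episode."""
    # An episode shorter than one window yields no windows at all.
    return list(range(0, num_frames - window + 1, stride))


def chronological_split(items, train_fraction=0.5):
    """Split in time order: earlier items train, later items evaluate.

    train_fraction is an illustrative assumption, not the repo's value.
    """
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


starts = window_starts(45)            # 45 frames -> windows at 0, 5, ..., 25
train, test = chronological_split(starts)
print(starts, train, test)
```

Because the stride is smaller than the window, neighbouring windows overlap by 15 frames, so a chronological cut (rather than a random one) keeps most of that overlap from leaking across the train/test boundary.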
 
112
  - The current reconstruction task reconstructs feature vectors, not pixel
113
  depth, meshes, NeRF outputs, or Gaussian splats.
114
  - Audio is part of the current 8,546-dimensional baseline feature vector.
115
+ - Audio contribution is evaluated across the walkthrough-backed task contracts in
116
  `results/audio_ablation/`.
117
  - Foundation-model selection is now explicit: Qwen3-Omni is the immediate
118
  trainable pilot, Cosmos 3 is the first world-model track, and Cosmos3-Super
README.md CHANGED
@@ -872,9 +872,9 @@ and verified Qwen3-Omni/Cosmos3 diagnostic artifacts.
872
  scripts/
873
  train_min_action_model.py # motion/IMU baseline
874
  train_all_modalities_model.py # current all-feature lightweight baseline
875
- episode_task_suite.py # original end-to-end task definitions
876
  neural_task_models.py # optional PyTorch MLP heads for task contracts
877
- research_direction_taxonomy.py # maps original tasks to the four research tracks
878
  research_direction_extension_tasks.py # one extra data-backed probe per track
879
  tier2_task_suite.py # historical-name provenance builder for unified task rows
880
  build_unified_task_suite.py # builds TASK_SUITE_20.md and task_suite_20.json
@@ -912,7 +912,7 @@ results/
912
  research_directions/ # four-track taxonomy, CSV, and summary
913
  research_direction_extensions/ # four extra direction probes + predictions
914
  tier2_task_suite/ # provenance baseline tasks + predictions; historical path
915
- task_walkthroughs/ # case-study walkthroughs for original tasks
916
  omni_exploration/ # ModelScope readiness-check artifacts
917
  omni_finetune/model_output_task_probes_20260616/ # task-13/task-16 probes derived from verified model JSON
918
 
@@ -1050,7 +1050,7 @@ cd ropedia-xperience-10m-task-suite
1050
  python scripts/episode_task_suite.py --workspace /path/to/workspace
1051
  ```
1052
 
1053
- Run the original task definitions with lightweight neural heads:
1054
 
1055
  ```bash
1056
  pip install torch
@@ -1471,7 +1471,7 @@ and [`docs/data/additional_development_directions.json`](docs/data/additional_de
1471
 
1472
  ## Four Research Directions
1473
 
1474
- The original task contracts are organized against the four Ropedia research directions in
1475
  a generated artifact, not only in prose:
1476
 
1477
  - [`research_direction_taxonomy.json`](results/episode_task_suite/research_directions/research_direction_taxonomy.json)
@@ -1497,13 +1497,13 @@ Current direction-level coverage:
1497
 
1498
  The important interpretation is that all four directions can be **started** from
1499
  the Xperience-10M sample modalities, but only direction C is strongly represented
1500
- by the original task suite. Directions A, B, and D need additional targets and
1501
  multi-episode training before they become full research deliverables.
1502
 
1503
- ## Four Direction-Extension Probes
1504
 
1505
- Beyond the original task contracts, the repo now includes one extra data-backed
1506
- probe for each research direction. These probes are computed from the same
1507
  `shared_windows.npz`, `windows.csv`, and `feature_manifest.json` artifacts, so
1508
  the reported numbers are computed from sample-derived features and saved metric artifacts.
1509
 
@@ -1565,18 +1565,10 @@ unified 20-task suite, not as a separate benchmark tier.
1565
 
1566
  ![128-episode 20-task model radar](docs/assets/charts/episode128_task_model_radar.svg)
1567
 
1568
- ![Unified 20-task provenance chart](docs/assets/charts/tier2_task_suite.svg)
1569
-
1570
- | # | Task | Input | Output | Minimal | Neural MLP | Meaning |
1571
- | ---: | --- | --- | --- | ---: | ---: | --- |
1572
- | 13 | Long-Horizon Next-Action Forecasting | current non-caption multimodal window | action label five seconds later | `0.0750` macro-F1 | `0.0655` macro-F1 | Tests procedure context beyond the one-second next-action task. |
1573
- | 14 | Long-Horizon Next-Subtask Forecasting | current non-caption multimodal window | subtask five seconds later | `0.0455` macro-F1 | `0.0507` macro-F1 | Moves anticipation from low-level action to high-level procedure state. |
1574
- | 15 | Interaction Text Prediction | current sensor window without caption text | raw interaction phrase | `0.0444` macro-F1 | `0.0381` macro-F1 | Uses the original annotation interaction text instead of only hashed features. |
1575
- | 16 | Action-Object Relation Prediction | current sensor window without caption text | joint action plus object-set label | `0.0000` macro-F1 | `0.0000` macro-F1 | Exposes a hard binding target for action-object reasoning. |
1576
- | 17 | Future Object-Set Forecasting | current sensor window without caption text | object set five seconds later | `0.1694` micro-F1 | `0.1972` micro-F1 | Predicts which objects become relevant soon. |
1577
- | 18 | IMU-to-Hand Pose Reconstruction | IMU feature block only | current left/right hand joints | `0.0420` MAE | `0.0426` MAE | Tests inertial-to-hand sensor bridging. |
1578
- | 19 | Camera-View Synchronization Retrieval | fisheye camera-1 query | synchronized fisheye camera-3 window | `0.4943` MRR | `0.2409` MRR | Stress-tests multi-camera temporal alignment. |
1579
- | 20 | Time-to-Next-Transition Regression | current non-caption multimodal window | capped frames until next action boundary | `10.5374` MAE frames | `10.5545` MAE frames | Converts boundary detection into continuous timing. |
1580
 
1581
  Run:
1582
 
@@ -1654,7 +1646,7 @@ PyTorch MLP classifiers or regressors. Its outputs live under
1654
  and the rollup is stored in the `neural_tasks` section of
1655
  [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json).
1656
 
1657
- The original task-specific heads are:
1658
 
1659
  | Task | Input | Minimal head | Output |
1660
  | --- | --- | --- | --- |
@@ -1685,8 +1677,8 @@ The original task-specific heads are:
1685
  | Neural MLP hand forecast | 0.1079 MPJPE | n/a | Same features/split, nonlinear regression head |
1686
  | Neural MLP temporal order | 0.8520 F1 | 0.8578 | Strong improvement on adjacent-window ordering |
1687
  | Neural MLP misalignment | 0.7153 F1 | 0.7009 | Detects shifted motion/visual/audio pairs better than the linear head |
1688
- | Audio ablation | +0.0418 mean delta | n/a | Current audio variant improves the primary metric on 6 of the original task contracts |
1689
- | Alternate audio representation | +0.0936 mean delta | n/a | Alternate audio-window representation improves over the baseline audio variant on 6 of the original task contracts |
1690
 
1691
  ## Audio Contribution Study
1692
 
@@ -1765,7 +1757,7 @@ episodes; they are not reported as multi-episode benchmark results.
1765
 
1766
  I re-ran the full pipeline from the local raw public sample into a temporary
1767
  local workspace and compared regenerated metrics with the committed
1768
- artifacts. The baseline metrics, original task metrics, feature manifest, and
1769
  available modality manifest matched exactly after float normalization.
1770
 
1771
  See [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md) for the
 
872
  scripts/
873
  train_min_action_model.py # motion/IMU baseline
874
  train_all_modalities_model.py # current all-feature lightweight baseline
875
+ episode_task_suite.py # public-sample task definitions
876
  neural_task_models.py # optional PyTorch MLP heads for task contracts
877
+ research_direction_taxonomy.py # maps walkthrough-backed tasks to the four research tracks
878
  research_direction_extension_tasks.py # one extra data-backed probe per track
879
  tier2_task_suite.py # historical-name provenance builder for unified task rows
880
  build_unified_task_suite.py # builds TASK_SUITE_20.md and task_suite_20.json
 
912
  research_directions/ # four-track taxonomy, CSV, and summary
913
  research_direction_extensions/ # four extra direction probes + predictions
914
  tier2_task_suite/ # provenance baseline tasks + predictions; historical path
915
+ task_walkthroughs/ # case-study walkthroughs for walkthrough-backed tasks
916
  omni_exploration/ # ModelScope readiness-check artifacts
917
  omni_finetune/model_output_task_probes_20260616/ # task-13/task-16 probes derived from verified model JSON
918
 
 
1050
  python scripts/episode_task_suite.py --workspace /path/to/workspace
1051
  ```
1052
 
1053
+ Run the public-sample task definitions with lightweight neural heads:
1054
 
1055
  ```bash
1056
  pip install torch
 
1471
 
1472
  ## Four Research Directions
1473
 
1474
+ The walkthrough-backed task contracts are organized against the four Ropedia research directions in
1475
  a generated artifact, not only in prose:
1476
 
1477
  - [`research_direction_taxonomy.json`](results/episode_task_suite/research_directions/research_direction_taxonomy.json)
 
1497
 
1498
  The important interpretation is that all four directions can be **started** from
1499
  the Xperience-10M sample modalities, but only direction C is strongly represented
1500
+ by the current task evidence. Directions A, B, and D need additional targets and
1501
  multi-episode training before they become full research deliverables.
1502
 
1503
+ ## Four Direction Probes
1504
 
1505
+ Alongside the unified 20-task suite, the repo includes one data-backed probe for
1506
+ each research direction. These probes are computed from the same
1507
  `shared_windows.npz`, `windows.csv`, and `feature_manifest.json` artifacts, so
1508
  the reported numbers are computed from sample-derived features and saved metric artifacts.
1509
 
 
1565
 
1566
  ![128-episode 20-task model radar](docs/assets/charts/episode128_task_model_radar.svg)
1567
 
1568
+ The all-task table, including every input/output contract and minimal/neural
1569
+ metric, is in [`TASK_SUITE_20.md`](TASK_SUITE_20.md). Historical provenance
1570
+ links remain listed above for exact source tracing, but the public task surface
1571
+ should be read as one integrated 20-task suite.
1572
 
1573
  Run:
1574
 
 
1646
  and the rollup is stored in the `neural_tasks` section of
1647
  [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json).
1648
 
1649
+ The walkthrough-backed task heads are:
1650
 
1651
  | Task | Input | Minimal head | Output |
1652
  | --- | --- | --- | --- |
 
1677
  | Neural MLP hand forecast | 0.1079 MPJPE | n/a | Same features/split, nonlinear regression head |
1678
  | Neural MLP temporal order | 0.8520 F1 | 0.8578 | Strong improvement on adjacent-window ordering |
1679
  | Neural MLP misalignment | 0.7153 F1 | 0.7009 | Detects shifted motion/visual/audio pairs better than the linear head |
1680
+ | Audio ablation | +0.0418 mean delta | n/a | Current audio variant improves the primary metric on 6 walkthrough-backed task contracts |
1681
+ | Alternate audio representation | +0.0936 mean delta | n/a | Alternate audio-window representation improves over the baseline audio variant on 6 walkthrough-backed task contracts |
1682
 
1683
  ## Audio Contribution Study
1684
 
 
1757
 
1758
  I re-ran the full pipeline from the local raw public sample into a temporary
1759
  local workspace and compared regenerated metrics with the committed
1760
+ artifacts. The baseline metrics, task metrics, feature manifest, and
1761
  available modality manifest matched exactly after float normalization.
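Reviewer sketch (not part of the commit): "matched exactly after float normalization" can be read as rounding every float in both metric trees to a fixed precision before comparing them. The helper below shows one way to do that. The names and the 6-digit precision are assumptions; the actual procedure is whatever `notes/reproducibility_audit.md` describes.

```python
def normalize_floats(value, ndigits=6):
    """Recursively round floats inside nested dicts and lists."""
    if isinstance(value, float):
        return round(value, ndigits)
    if isinstance(value, dict):
        return {key: normalize_floats(item, ndigits) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize_floats(item, ndigits) for item in value]
    # Strings, ints, booleans, and None pass through unchanged.
    return value


def metrics_match(committed, regenerated, ndigits=6):
    """True when both metric trees are equal after float rounding."""
    return normalize_floats(committed, ndigits) == normalize_floats(regenerated, ndigits)


committed = {"temporal_order": {"f1": 0.852}, "counts": [1, 2.0]}
regenerated = {"temporal_order": {"f1": 0.8520000001}, "counts": [1, 2.0000000004]}
print(metrics_match(committed, regenerated))
```

Rounding is a simple choice but can separate two nearly equal values that straddle a rounding boundary; a tolerance-based comparison such as `math.isclose` avoids that at the cost of a slightly longer traversal.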
1762
 
1763
  See [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md) for the
RESEARCH_TAKEAWAYS.md CHANGED
@@ -80,7 +80,7 @@ Current scope: The current reconstruction task predicts feature vectors; depth,
80
 
81
  ### Audio helps some tasks and hurts others on the public sample
82
 
83
- Audio improves the primary metric on 6 of the original task contracts, while raw log-mel replacement improves over the current handcrafted block on 6 of those contracts. The largest current-audio gain appears in feature reconstruction, not in action classification.
84
 
85
  | Metric | Value |
86
  | --- | ---: |
 
80
 
81
  ### Audio helps some tasks and hurts others on the public sample
82
 
83
+ Audio improves the primary metric on 6 walkthrough-backed task contracts, while raw log-mel replacement improves over the current handcrafted block on 6 of those contracts. The largest current-audio gain appears in feature reconstruction, not in action classification.
84
 
85
  | Metric | Value |
86
  | --- | ---: |
TASK_METHOD_20_GAP_AUDIT.md CHANGED
@@ -1,6 +1,6 @@
1
  # Task Method 20-Result Completion Audit
2
 
3
- Generated: `2026-06-21T08:38:20+00:00`
4
 
5
  This audit is the explicit completion ledger for the 9-method x 20-task result
6
  matrix. The current public matrix is complete at 180/180 scored records while
 
1
  # Task Method 20-Result Completion Audit
2
 
3
+ Generated: `2026-06-21T15:21:42+00:00`
4
 
5
  This audit is the explicit completion ledger for the 9-method x 20-task result
6
  matrix. The current public matrix is complete at 180/180 scored records while
TASK_SUITE_20.md CHANGED
@@ -20,28 +20,28 @@ as a separate benchmark tier.
20
 
21
  ## Task Table
22
 
23
- | # | Task | Artifact id | Origin | Input -> output | Primary metric | Minimal | Neural |
24
- | ---: | --- | --- | --- | --- | --- | ---: | ---: |
25
- | 1 | Action Recognition | `timeline_action` | original task | 20-frame multimodal window -> current action class | macro-F1 (higher better) | 0.0500 | 0.0148 |
26
- | 2 | Procedure Step Recognition | `timeline_subtask` | original task | 20-frame multimodal window -> current procedure step | macro-F1 (higher better) | 0.0506 | 0.0281 |
27
- | 3 | Action Boundary Detection | `transition_detection` | original task | current window with boundary target -> boundary or steady | macro-F1 (higher better) | 0.6118 | 0.5862 |
28
- | 4 | Next-Action Prediction | `next_action` | original task | current window at time t -> action at t+20 frames | macro-F1 (higher better) | 0.0593 | 0.0419 |
29
- | 5 | Hand Trajectory Forecasting | `hand_trajectory_forecast` | original task | current multimodal window -> future hand-joint trajectory | MPJPE (lower better) | 0.8647 | 0.1079 |
30
- | 6 | Contact State Prediction | `contact_prediction` | original task | non-contact, non-caption features -> contact or no contact | macro-F1 (higher better) | 1.0000 | 1.0000 |
31
- | 7 | Object Relevance Prediction | `object_relevance` | original task | non-caption multimodal features -> relevant object set | micro-F1 (higher better) | 0.1803 | 0.1679 |
32
- | 8 | Language Grounding | `caption_grounding` | original task | text-like query and candidate windows -> ranked matching moments | MRR (higher better) | 0.0160 | 0.0168 |
33
- | 9 | Cross-Modal Retrieval | `cross_modal_retrieval` | original task | motion/IMU/pose query; depth/video candidates -> ranked visual windows | MRR (higher better) | 0.2693 | 0.1300 |
34
- | 10 | Cross-Modal Reconstruction | `modality_reconstruction` | original task | motion, IMU, and camera/pose features -> reconstructed depth/video vector | R2 (higher better) | -0.0153 | -0.0102 |
35
- | 11 | Temporal Order Verification | `temporal_order` | original task | two adjacent windows plus difference vector -> correct or reversed | F1 (higher better) | 0.5400 | 0.8520 |
36
- | 12 | Multimodal Synchronization Detection | `misalignment_detection` | original task | motion-side and visual/depth-side feature groups -> aligned or shifted | F1 (higher better) | 0.5052 | 0.7153 |
37
- | 13 | Long-Horizon Next-Action Forecasting | `long_horizon_next_action` | additional task | Current 20-frame non-caption multimodal window. -> Action label five seconds later. | macro-F1 (higher better) | 0.0750 | 0.0655 |
38
- | 14 | Long-Horizon Next-Subtask Forecasting | `next_subtask_forecast` | additional task | Current 20-frame non-caption multimodal window. -> Procedure subtask label five seconds later. | macro-F1 (higher better) | 0.0455 | 0.0507 |
39
- | 15 | Interaction Text Prediction | `interaction_text_prediction` | additional task | Current 20-frame sensor window with caption-text features removed. -> Raw annotation interaction phrase for the same window. | macro-F1 (higher better) | 0.0444 | 0.0381 |
40
- | 16 | Action-Object Relation Prediction | `action_object_relation` | additional task | Current 20-frame sensor window with caption-text features removed. -> Joint action plus active object-set relation. | macro-F1 (higher better) | 0.0000 | 0.0000 |
41
- | 17 | Future Object-Set Forecasting | `object_set_forecast` | additional task | Current 20-frame sensor window with caption-text features removed. -> Object set active five seconds later. | micro-F1 (higher better) | 0.1694 | 0.1972 |
42
- | 18 | IMU-to-Hand Pose Reconstruction | `imu_to_hand_pose` | additional task | Current IMU acceleration/gyroscope feature block only. -> Current left/right hand joint feature blocks. | MAE (lower better) | 0.0420 | 0.0426 |
43
- | 19 | Camera-View Synchronization Retrieval | `camera_view_sync_retrieval` | additional task | Fisheye camera-1 feature query projected into fisheye camera-3 feature space. -> The synchronized held-out camera-3 window. | MRR (higher better) | 0.4943 | 0.2409 |
44
- | 20 | Time-to-Next-Transition Regression | `time_to_transition` | additional task | Current 20-frame non-caption multimodal window. -> Frames until the next action-label boundary, capped at 200 frames. | MAE frames (lower better) | 10.5374 | 10.5545 |
45
 
46
  ## Machine-Readable Copy
47
 
 
20
 
21
  ## Task Table
22
 
23
+ | # | Task | Artifact id | Input -> output | Primary metric | Minimal | Neural |
24
+ | ---: | --- | --- | --- | --- | ---: | ---: |
25
+ | 1 | Action Recognition | `timeline_action` | 20-frame multimodal window -> current action class | macro-F1 (higher better) | 0.0500 | 0.0148 |
26
+ | 2 | Procedure Step Recognition | `timeline_subtask` | 20-frame multimodal window -> current procedure step | macro-F1 (higher better) | 0.0506 | 0.0281 |
27
+ | 3 | Action Boundary Detection | `transition_detection` | current window with boundary target -> boundary or steady | macro-F1 (higher better) | 0.6118 | 0.5862 |
28
+ | 4 | Next-Action Prediction | `next_action` | current window at time t -> action at t+20 frames | macro-F1 (higher better) | 0.0593 | 0.0419 |
29
+ | 5 | Hand Trajectory Forecasting | `hand_trajectory_forecast` | current multimodal window -> future hand-joint trajectory | MPJPE (lower better) | 0.8647 | 0.1079 |
30
+ | 6 | Contact State Prediction | `contact_prediction` | non-contact, non-caption features -> contact or no contact | macro-F1 (higher better) | 1.0000 | 1.0000 |
31
+ | 7 | Object Relevance Prediction | `object_relevance` | non-caption multimodal features -> relevant object set | micro-F1 (higher better) | 0.1803 | 0.1679 |
32
+ | 8 | Language Grounding | `caption_grounding` | text-like query and candidate windows -> ranked matching moments | MRR (higher better) | 0.0160 | 0.0168 |
33
+ | 9 | Cross-Modal Retrieval | `cross_modal_retrieval` | motion/IMU/pose query; depth/video candidates -> ranked visual windows | MRR (higher better) | 0.2693 | 0.1300 |
34
+ | 10 | Cross-Modal Reconstruction | `modality_reconstruction` | motion, IMU, and camera/pose features -> reconstructed depth/video vector | R2 (higher better) | -0.0153 | -0.0102 |
35
+ | 11 | Temporal Order Verification | `temporal_order` | two adjacent windows plus difference vector -> correct or reversed | F1 (higher better) | 0.5400 | 0.8520 |
36
+ | 12 | Multimodal Synchronization Detection | `misalignment_detection` | motion-side and visual/depth-side feature groups -> aligned or shifted | F1 (higher better) | 0.5052 | 0.7153 |
37
+ | 13 | Long-Horizon Next-Action Forecasting | `long_horizon_next_action` | Current 20-frame non-caption multimodal window. -> Action label five seconds later. | macro-F1 (higher better) | 0.0750 | 0.0655 |
38
+ | 14 | Long-Horizon Next-Subtask Forecasting | `next_subtask_forecast` | Current 20-frame non-caption multimodal window. -> Procedure subtask label five seconds later. | macro-F1 (higher better) | 0.0455 | 0.0507 |
39
+ | 15 | Interaction Text Prediction | `interaction_text_prediction` | Current 20-frame sensor window with caption-text features removed. -> Raw annotation interaction phrase for the same window. | macro-F1 (higher better) | 0.0444 | 0.0381 |
40
+ | 16 | Action-Object Relation Prediction | `action_object_relation` | Current 20-frame sensor window with caption-text features removed. -> Joint action plus active object-set relation. | macro-F1 (higher better) | 0.0000 | 0.0000 |
41
+ | 17 | Future Object-Set Forecasting | `object_set_forecast` | Current 20-frame sensor window with caption-text features removed. -> Object set active five seconds later. | micro-F1 (higher better) | 0.1694 | 0.1972 |
42
+ | 18 | IMU-to-Hand Pose Reconstruction | `imu_to_hand_pose` | Current IMU acceleration/gyroscope feature block only. -> Current left/right hand joint feature blocks. | MAE (lower better) | 0.0420 | 0.0426 |
43
+ | 19 | Camera-View Synchronization Retrieval | `camera_view_sync_retrieval` | Fisheye camera-1 feature query projected into fisheye camera-3 feature space. -> The synchronized held-out camera-3 window. | MRR (higher better) | 0.4943 | 0.2409 |
44
+ | 20 | Time-to-Next-Transition Regression | `time_to_transition` | Current 20-frame non-caption multimodal window. -> Frames until the next action-label boundary, capped at 200 frames. | MAE frames (lower better) | 10.5374 | 10.5545 |
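
Reader note on the MRR column used by the retrieval rows above (for example `camera_view_sync_retrieval`): the following is a minimal sketch of how mean reciprocal rank is conventionally computed, not the project's evaluation code. It assumes a square score matrix in which candidate `i` is the correct match for query `i`, and it breaks ties optimistically; the function name and the toy scores are hypothetical.

```python
def mean_reciprocal_rank(scores):
    """Mean reciprocal rank for a square score matrix.

    scores[i][j] is the similarity of query i to candidate j.
    Assumption for this sketch: candidate i is the correct match for query i.
    """
    total = 0.0
    for i, row in enumerate(scores):
        # Rank of the correct candidate = 1 + number of candidates that
        # score strictly higher (ties are resolved in the query's favour).
        rank = 1 + sum(1 for s in row if s > row[i])
        total += 1.0 / rank
    return total / len(scores)


# Toy data only; these numbers are illustrative, not project results.
toy_scores = [
    [0.9, 0.1, 0.3],  # correct candidate ranked 1st
    [0.8, 0.5, 0.2],  # correct candidate ranked 2nd
    [0.4, 0.6, 0.1],  # correct candidate ranked 3rd
]
mrr = mean_reciprocal_rank(toy_scores)
print(round(mrr, 4))
```

A higher MRR means the synchronized window tends to appear nearer the top of the ranked candidates, which is why the table marks it "higher better".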
45
 
46
  ## Machine-Readable Copy
47
 
data/artifact_index.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
- "generated_at_utc": "2026-06-21T14:40:34+00:00",
4
  "status": "pass",
5
  "artifact_count": 228,
6
  "missing": [],
@@ -59,8 +59,8 @@
59
  "surface": "website_hf",
60
  "shows": "Machine-readable first-reader project brief for the website and Hugging Face mirrors.",
61
  "exists": true,
62
- "bytes": 4019,
63
- "sha256": "9521556a750941a0f9ee8e9541903acbb0fbec2501fd05ed4e7a017fc18cf794"
64
  },
65
  {
66
  "id": "project_status",
@@ -70,8 +70,8 @@
70
  "surface": "repo_hf",
71
  "shows": "Gives a compact current-state table for first-pass readers.",
72
  "exists": true,
73
- "bytes": 15993,
74
- "sha256": "96bf5d894ace804aea2f3889a4d99a802a5e015405e7eed573eb3a98882ce968"
75
  },
76
  {
77
  "id": "project_status_json",
@@ -81,8 +81,8 @@
81
  "surface": "website_hf",
82
  "shows": "Machine-readable copy of the current project status for website and HF mirrors.",
83
  "exists": true,
84
- "bytes": 23255,
85
- "sha256": "874f1133ee75f060735f0c9e763cf81463f304432f1dbca3ebc9837225c0259d"
86
  },
87
  {
88
  "id": "glossary",
@@ -576,8 +576,8 @@
576
  "surface": "website_hf",
577
  "shows": "Gives a short project path with scope status and public surfaces.",
578
  "exists": true,
579
- "bytes": 10009,
580
- "sha256": "e0f8bd65cd15b0fe68c8079045b4c72552daaf644c35b8a7a68426250a4aa441"
581
  },
582
  {
583
  "id": "artifact_guide",
@@ -587,8 +587,8 @@
587
  "surface": "repo_hf",
588
  "shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
589
  "exists": true,
590
- "bytes": 20571,
591
- "sha256": "217e3eb2cf82999f75ce6e132f567fa1ed08d319bf7a44f77b7150a45fae5274"
592
  },
593
  {
594
  "id": "official_dataset_card_alignment",
@@ -632,7 +632,7 @@
632
  "shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
633
  "exists": true,
634
  "bytes": 4432,
635
- "sha256": "3def3dc923162ad0d2802acdca8a689a4e9ad1408f36edae8f77f49c4507cef1"
636
  },
637
  {
638
  "id": "source_alignment_validator",
@@ -686,8 +686,8 @@
686
  "surface": "repo_hf",
687
  "shows": "Defines the window unit, chronological split, task metrics, leakage controls, and current limitations.",
688
  "exists": true,
689
- "bytes": 9156,
690
- "sha256": "cfc23b3115ebce2b41a349b8a2cd6989aaf2294790c79e2b17545ebede2b2df0"
691
  },
692
  {
693
  "id": "evaluation_protocol_json",
@@ -697,8 +697,8 @@
697
  "surface": "website_hf",
698
  "shows": "Machine-readable protocol generated from committed task metrics for website and HF mirrors.",
699
  "exists": true,
700
- "bytes": 24007,
701
- "sha256": "dde490d175f0d6828f5973f1b24e696a3d1b3b09d65a59cb9c1dde5c38845b66"
702
  },
703
  {
704
  "id": "evaluation_protocol_builder",
@@ -708,8 +708,8 @@
708
  "surface": "repo_hf",
709
  "shows": "Regenerates the protocol from committed summary metrics and task artifacts.",
710
  "exists": true,
711
- "bytes": 19931,
712
- "sha256": "080f894b50c609e3a467c8c513dfe441f90ba0dad3586dd1cb88de6e58eedb3b"
713
  },
714
  {
715
  "id": "task_suite_20",
@@ -719,8 +719,8 @@
719
  "surface": "repo_hf",
720
  "shows": "Reader-facing table for the single unified public-sample task suite under the same window, split, feature, and baseline contract.",
721
  "exists": true,
722
- "bytes": 5196,
723
- "sha256": "473891503dcd1251a2cc9a16e6642ce16fbca9d264a734d2397c2afc60977195"
724
  },
725
  {
726
  "id": "task_suite_20_json",
@@ -730,8 +730,8 @@
730
  "surface": "website_hf",
731
  "shows": "Machine-readable unified 20-task index for the website, Hugging Face mirrors, and live verification.",
732
  "exists": true,
733
- "bytes": 34597,
734
- "sha256": "2029f7f9744001861ac00acabdb578fe97d3b39a1c16a7c2d19c56347ded22d7"
735
  },
736
  {
737
  "id": "task_suite_20_builder",
@@ -741,8 +741,8 @@
741
  "surface": "repo_hf",
742
  "shows": "Regenerates the unified 20-task JSON and Markdown from the public-sample metrics plus the historical provenance result bundle.",
743
  "exists": true,
744
- "bytes": 12213,
745
- "sha256": "1421593f05e345799007bbcdf138f81dfdb7c511ec1e31b56d00e2cdaed3d7de"
746
  },
747
  {
748
  "id": "unified_task_model_radar_json",
@@ -1005,8 +1005,8 @@
1005
  "surface": "repo_hf",
1006
  "shows": "Summarizes the main research lessons from committed metrics and identifies which experiments need held-out episodes.",
1007
  "exists": true,
1008
- "bytes": 5172,
1009
- "sha256": "39978c1e30b6aa76c5fd2684e9a1111ec2e813423feaff6053084b0335968db8"
1010
  },
1011
  {
1012
  "id": "research_takeaways_json",
@@ -1016,8 +1016,8 @@
1016
  "surface": "website_hf",
1017
  "shows": "Machine-readable result interpretation for the website, HF cards, and mirror checks.",
1018
  "exists": true,
1019
- "bytes": 7162,
1020
- "sha256": "9899c5cb6b92bcfe5e64f98503af5b7d0759ad1a9c5098dbfe4146f54ee26656"
1021
  },
1022
  {
1023
  "id": "research_takeaways_builder",
@@ -1027,8 +1027,8 @@
1027
  "surface": "repo_hf",
1028
  "shows": "Regenerates the research takeaways from committed summary metrics and task result artifacts.",
1029
  "exists": true,
1030
- "bytes": 13496,
1031
- "sha256": "c35995607dc16fa2a318c626b84323eb47b61a373a492c22cf9fdac851b4d9b5"
1032
  },
1033
  {
1034
  "id": "audio_ablation_script",
@@ -1036,7 +1036,7 @@
1036
  "path": "scripts/audio_ablation_and_raw_upgrade.py",
1037
  "kind": "result_interpretation",
1038
  "surface": "repo_hf",
1039
- "shows": "Measures audio contribution variants across the original task contracts.",
1040
  "exists": true,
1041
  "bytes": 43159,
1042
  "sha256": "2444f2e52efb975be931b33d66b7180d53031e1d5e821719122160f92f4540aa"
@@ -1080,7 +1080,7 @@
1080
  "path": "docs/assets/charts/audio_ablation_delta.svg",
1081
  "kind": "visual_evidence",
1082
  "surface": "website_hf",
1083
- "shows": "Bar chart of measured current-audio primary-metric deltas across the original tasks.",
1084
  "exists": true,
1085
  "bytes": 4146,
1086
  "sha256": "187dbabe01f9ff18841ff61a1e7fbf85bebdd188cc0f248bb5090d64528e7568"
@@ -1093,8 +1093,8 @@
1093
  "surface": "repo_hf",
1094
  "shows": "Catalogs public figures, charts, modality thumbnails, dimensions, hashes, roles, and source scripts.",
1095
  "exists": true,
1096
- "bytes": 7011,
1097
- "sha256": "f6554cd980efa6c0b3b8feac5ff3e19c3e2e74ccf2d446ac4afb5ee5d65413f3"
1098
  },
1099
  {
1100
  "id": "figure_index_json",
@@ -1104,8 +1104,8 @@
1104
  "surface": "website_hf",
1105
  "shows": "Machine-readable visual asset index for website and Hugging Face mirrors.",
1106
  "exists": true,
1107
- "bytes": 19469,
1108
- "sha256": "11a06ee64d28f81f3280eb99327d99b47dc58fb1521332434b9df11c97b9b4e8"
1109
  },
1110
  {
1111
  "id": "figure_index_builder",
@@ -1115,8 +1115,8 @@
1115
  "surface": "repo_hf",
1116
  "shows": "Regenerates visual-asset hashes, dimensions, and source-script provenance.",
1117
  "exists": true,
1118
- "bytes": 16829,
1119
- "sha256": "14f1ed7f94630c8f70fbc14547071db251647f3d527cf760341b7a233883d069"
1120
  },
1121
  {
1122
  "id": "brand_assets_json",
@@ -1182,7 +1182,7 @@
1182
  "shows": "Machine-readable release-check summary for validators, mirrors, and public project surfaces.",
1183
  "exists": true,
1184
  "bytes": 8640,
1185
- "sha256": "c8ce99ac63ab70e3696386671bf201f5605b6a88c8be8f288d44a122bad9025e"
1186
  },
1187
  {
1188
  "id": "public_surface_qa",
@@ -1226,7 +1226,7 @@
1226
  "volatile": true,
1227
  "shows": "Machine-readable report for SEO/social metadata, accessible tab semantics, public links, project links, and clear project presentation.",
1228
  "exists": true,
1229
- "bytes": 7690,
1230
  "hash_policy": "existence_and_size_only"
1231
  },
1232
  {
@@ -1307,7 +1307,7 @@
1307
  "volatile": true,
1308
  "shows": "Records the last live GitHub/HF URL verification after upload.",
1309
  "exists": true,
1310
- "bytes": 189922,
1311
  "hash_policy": "existence_and_size_only"
1312
  },
1313
  {
@@ -1340,8 +1340,8 @@
1340
  "surface": "website_hf",
1341
  "shows": "Machine-readable reproduction steps with expected artifacts and public boundaries.",
1342
  "exists": true,
1343
- "bytes": 6815,
1344
- "sha256": "ff44893cac56c229d6eb5d20d8cb261ea38e0358e6444615406affd692d8d98e"
1345
  },
1346
  {
1347
  "id": "artifact_index_builder",
@@ -1351,8 +1351,8 @@
1351
  "surface": "repo_hf",
1352
  "shows": "Generates the selective artifact catalog from local files.",
1353
  "exists": true,
1354
- "bytes": 68232,
1355
- "sha256": "ee1b210688c1b722d6ca94d1c1706c1a510218c964298b91dd3e596fa19ed2a1"
1356
  },
1357
  {
1358
  "id": "publication_audit",
@@ -1410,8 +1410,8 @@
1410
  "surface": "website_hf",
1411
  "shows": "Lists public URLs, upstream sources, and machine-readable project metadata.",
1412
  "exists": true,
1413
- "bytes": 5774,
1414
- "sha256": "8da6063de9e0b888089aa62daac6d323057dd80247b8f38be5fbce0b370ef6ac"
1415
  },
1416
  {
1417
  "id": "task_summary",
@@ -1474,7 +1474,7 @@
1474
  "path": "results/episode_task_suite/neural_mlp",
1475
  "kind": "result_directory",
1476
  "surface": "repo_hf_model",
1477
- "shows": "Stores matching PyTorch MLP results for the original task contracts.",
1478
  "exists": true,
1479
  "file_count": 60,
1480
  "bytes": 90609517
@@ -1485,7 +1485,7 @@
1485
  "path": "results/episode_task_suite/research_directions/research_direction_taxonomy.json",
1486
  "kind": "taxonomy",
1487
  "surface": "repo_hf",
1488
- "shows": "Maps the original tasks to the four Ropedia research directions as direct/proxy/diagnostic.",
1489
  "exists": true,
1490
  "bytes": 25046,
1491
  "sha256": "0e3c442e5eb9057b04b1e8c8fa723dfde6f72e7fae1378d5ea022d93f7d25ca3"
@@ -1509,8 +1509,8 @@
1509
  "surface": "repo_hf",
1510
  "shows": "Stores the historical result bundle for provenance rows with minimal and neural baselines aligned to the same 20-task window/split setup.",
1511
  "exists": true,
1512
- "bytes": 33402,
1513
- "sha256": "5a1051d25ceafe53c60dbd5b81d4b686a421c493ad09a462ad96bac100c5f3f3"
1514
  },
1515
  {
1516
  "id": "tier2_task_suite_json",
@@ -1520,8 +1520,8 @@
1520
  "surface": "website_hf",
1521
  "shows": "Machine-readable provenance definitions, setup alignment, metrics, and public source paths; the file name is historical.",
1522
  "exists": true,
1523
- "bytes": 33402,
1524
- "sha256": "5a1051d25ceafe53c60dbd5b81d4b686a421c493ad09a462ad96bac100c5f3f3"
1525
  },
1526
  {
1527
  "id": "tier2_task_suite_chart",
@@ -1531,8 +1531,8 @@
1531
  "surface": "website_hf",
1532
  "shows": "Visual summary of the historical provenance baseline metrics inside the unified 20-task suite.",
1533
  "exists": true,
1534
- "bytes": 5437,
1535
- "sha256": "3e35e476f559cd6188e5417e4d28c25efc130abafc9cab2d941bc77d559177a1"
1536
  },
1537
  {
1538
  "id": "tier2_task_suite_builder",
@@ -1542,8 +1542,8 @@
1542
  "surface": "repo_hf",
1543
  "shows": "Regenerates the historical provenance rows from shared windows plus the local public-sample annotation HDF5; the script name is historical.",
1544
  "exists": true,
1545
- "bytes": 47102,
1546
- "sha256": "3cddefaaeedd8efb65e6db956cbd13605e4a5b3772d98fa831d34fd6f92850de"
1547
  },
1548
  {
1549
  "id": "task_walkthroughs",
@@ -1564,8 +1564,8 @@
1564
  "surface": "website_hf",
1565
  "shows": "Presents the task suite and sample modality thumbnails with metrics generated from committed files.",
1566
  "exists": true,
1567
- "bytes": 1903454,
1568
- "sha256": "6667eb856cf61ada9f868807b5d5c6ccde06e4f791b2f9dd567d98b71b307415"
1569
  },
1570
  {
1571
  "id": "modality_atlas",
@@ -1672,7 +1672,7 @@
1672
  "path": "results/omni_finetune/multi_episode_128_task_baselines/BASELINE_ALIGNMENT_REPORT.md",
1673
  "kind": "scaleup_status",
1674
  "surface": "repo_hf",
1675
- "shows": "Summarizes same-split simple and neural metadata baselines for the 12 original task ids, with unsupported markers for tasks that need missing raw 128 feature blocks.",
1676
  "exists": true,
1677
  "bytes": 2238,
1678
  "sha256": "c70440aa502ec569a840159ab7e05b8e7d4ed70e0091ad9a4b2fb3fb0d3803c1"
@@ -1696,8 +1696,8 @@
1696
  "surface": "repo_hf",
1697
  "shows": "Reader-facing comparison of the single-episode task suite, 128-episode aligned baselines, Qwen3-Omni packages, and Cosmos3 future-window branch.",
1698
  "exists": true,
1699
- "bytes": 15983,
1700
- "sha256": "4db248566972e811aac6ca06582f233414821624f00f9d4fc4a1b66b2e00401f"
1701
  },
1702
  {
1703
  "id": "omni_model_comparison_json",
@@ -1707,8 +1707,8 @@
1707
  "surface": "repo_hf",
1708
  "shows": "Machine-readable comparison of the current result versions, per-task aligned baselines, verified Qwen3 packages, and Cosmos3 package.",
1709
  "exists": true,
1710
- "bytes": 82088,
1711
- "sha256": "82ccc2932cad63a9ebad85da53e694b18ef626aa3720bda3ed5da30f3dc5e121"
1712
  },
1713
  {
1714
  "id": "cosmos3_nano_verified_summary",
 
1
  {
2
  "title": "Ropedia Xperience-10M Task Suite Artifact Index",
3
+ "generated_at_utc": "2026-06-21T15:19:00+00:00",
4
  "status": "pass",
5
  "artifact_count": 228,
6
  "missing": [],
 
59
  "surface": "website_hf",
60
  "shows": "Machine-readable first-reader project brief for the website and Hugging Face mirrors.",
61
  "exists": true,
62
+ "bytes": 4032,
63
+ "sha256": "328d601390fdd61c836434e00cfe27670ef5fb96252270975c4ca339f2a51bfa"
64
  },
65
  {
66
  "id": "project_status",
 
70
  "surface": "repo_hf",
71
  "shows": "Gives a compact current-state table for first-pass readers.",
72
  "exists": true,
73
+ "bytes": 16013,
74
+ "sha256": "5ad142b601ad982ce59620bd7fa50446c8837050b0331b2be4a357280b295c21"
75
  },
76
  {
77
  "id": "project_status_json",
 
81
  "surface": "website_hf",
82
  "shows": "Machine-readable copy of the current project status for website and HF mirrors.",
83
  "exists": true,
84
+ "bytes": 23232,
85
+ "sha256": "406c48ec858b5f288c7ebef6eefc0ed94dc8bad11bf9221f435b9c8aca547ea3"
86
  },
87
  {
88
  "id": "glossary",
 
576
  "surface": "website_hf",
577
  "shows": "Gives a short project path with scope status and public surfaces.",
578
  "exists": true,
579
+ "bytes": 10018,
580
+ "sha256": "6b7ae7fe0df1a9e4a12d241a3162540b0cf1ade86803dec8aac68e3dc99bfc66"
581
  },
582
  {
583
  "id": "artifact_guide",
 
587
  "surface": "repo_hf",
588
  "shows": "Gives the human-readable map from project scope to data, tasks, platform mirrors, and scale-up status.",
589
  "exists": true,
590
+ "bytes": 20601,
591
+ "sha256": "e0e4ad50271ab1d58d2fe97de5b3451a52f034996b54d0ee9499b562b9decbbf"
592
  },
593
  {
594
  "id": "official_dataset_card_alignment",
 
632
  "shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
633
  "exists": true,
634
  "bytes": 4432,
635
+ "sha256": "5ab2ea4bfefe9f5bc7854f02b2e1e2b5206766a54447647191828da1a1a2077c"
636
  },
637
  {
638
  "id": "source_alignment_validator",
 
686
  "surface": "repo_hf",
687
  "shows": "Defines the window unit, chronological split, task metrics, leakage controls, and current limitations.",
688
  "exists": true,
689
+ "bytes": 8905,
690
+ "sha256": "f82e9b9c4a07e95776005968788e7acdaae9e322991113d79432d59057181add"
691
  },
692
  {
693
  "id": "evaluation_protocol_json",
 
697
  "surface": "website_hf",
698
  "shows": "Machine-readable protocol generated from committed task metrics for website and HF mirrors.",
699
  "exists": true,
700
+ "bytes": 24047,
701
+ "sha256": "d8f61b646a2f3f1e0af901dbdaff310ebfeea90622c93a34b9e35f34be98b896"
702
  },
703
  {
704
  "id": "evaluation_protocol_builder",
 
708
  "surface": "repo_hf",
709
  "shows": "Regenerates the protocol from committed summary metrics and task artifacts.",
710
  "exists": true,
711
+ "bytes": 19825,
712
+ "sha256": "aa9de1582f8fa79c1850e10e69fb125c0e3c1add433c7ebedc104c2efb42272e"
713
  },
714
  {
715
  "id": "task_suite_20",
 
719
  "surface": "repo_hf",
720
  "shows": "Reader-facing table for the single unified public-sample task suite under the same window, split, feature, and baseline contract.",
721
  "exists": true,
722
+ "bytes": 4845,
723
+ "sha256": "076a68734f20e2660d1eddba460672c1246951b893494396f1281d6423f3627a"
724
  },
725
  {
726
  "id": "task_suite_20_json",
 
730
  "surface": "website_hf",
731
  "shows": "Machine-readable unified 20-task index for the website, Hugging Face mirrors, and live verification.",
732
  "exists": true,
733
+ "bytes": 34585,
734
+ "sha256": "75145285cf71bc3bb9a10377a1921b60e85c4546dc8b858102b3c26e94c11a01"
735
  },
736
  {
737
  "id": "task_suite_20_builder",
 
741
  "surface": "repo_hf",
742
  "shows": "Regenerates the unified 20-task JSON and Markdown from the public-sample metrics plus the historical provenance result bundle.",
743
  "exists": true,
744
+ "bytes": 12157,
745
+ "sha256": "157265b5c025f279ce1eb56c52dd720ce0969b8426d5887030bfa179a3b565e0"
746
  },
747
  {
748
  "id": "unified_task_model_radar_json",
 
1005
  "surface": "repo_hf",
1006
  "shows": "Summarizes the main research lessons from committed metrics and identifies which experiments need held-out episodes.",
1007
  "exists": true,
1008
+ "bytes": 5175,
1009
+ "sha256": "385d1b77b41c632925bbd27878c334839303462d03a3b9d358326951b1088da8"
1010
  },
1011
  {
1012
  "id": "research_takeaways_json",
 
1016
  "surface": "website_hf",
1017
  "shows": "Machine-readable result interpretation for the website, HF cards, and mirror checks.",
1018
  "exists": true,
1019
+ "bytes": 7165,
1020
+ "sha256": "f1ddead60f986e3036206bc3c70d4bdda422a8be4761b285eb89c9c49d9832b6"
1021
  },
1022
  {
1023
  "id": "research_takeaways_builder",
 
1027
  "surface": "repo_hf",
1028
  "shows": "Regenerates the research takeaways from committed summary metrics and task result artifacts.",
1029
  "exists": true,
1030
+ "bytes": 13499,
1031
+ "sha256": "fc749125f9be87ee0db5b66918342da5c0378d6c97fb1acabe9688f920554c39"
1032
  },
1033
  {
1034
  "id": "audio_ablation_script",
 
1036
  "path": "scripts/audio_ablation_and_raw_upgrade.py",
1037
  "kind": "result_interpretation",
1038
  "surface": "repo_hf",
1039
+ "shows": "Measures audio contribution variants across the walkthrough-backed task contracts.",
1040
  "exists": true,
1041
  "bytes": 43159,
1042
  "sha256": "2444f2e52efb975be931b33d66b7180d53031e1d5e821719122160f92f4540aa"
 
1080
  "path": "docs/assets/charts/audio_ablation_delta.svg",
1081
  "kind": "visual_evidence",
1082
  "surface": "website_hf",
1083
+ "shows": "Bar chart of measured current-audio primary-metric deltas across the walkthrough-backed tasks.",
1084
  "exists": true,
1085
  "bytes": 4146,
1086
  "sha256": "187dbabe01f9ff18841ff61a1e7fbf85bebdd188cc0f248bb5090d64528e7568"
 
1093
  "surface": "repo_hf",
1094
  "shows": "Catalogs public figures, charts, modality thumbnails, dimensions, hashes, roles, and source scripts.",
1095
  "exists": true,
1096
+ "bytes": 7027,
1097
+ "sha256": "b7b507c35cd3cba2765586e9703a447c8025c89658c3daa390df67db4211d0fc"
1098
  },
1099
  {
1100
  "id": "figure_index_json",
 
1104
  "surface": "website_hf",
1105
  "shows": "Machine-readable visual asset index for website and Hugging Face mirrors.",
1106
  "exists": true,
1107
+ "bytes": 19485,
1108
+ "sha256": "4f225bf08f00fbe843999d6bd2b3d5f5d6c17f2ff67e1f6a85eee9094c6bb6a3"
1109
  },
1110
  {
1111
  "id": "figure_index_builder",
 
1115
  "surface": "repo_hf",
1116
  "shows": "Regenerates visual-asset hashes, dimensions, and source-script provenance.",
1117
  "exists": true,
1118
+ "bytes": 16845,
1119
+ "sha256": "3f91f7f13a3fb08ab57c2f0a6b320102e9d5ae19b102b71499edb5b8fd5a2cec"
1120
  },
1121
  {
1122
  "id": "brand_assets_json",
 
1182
  "shows": "Machine-readable release-check summary for validators, mirrors, and public project surfaces.",
1183
  "exists": true,
1184
  "bytes": 8640,
1185
+ "sha256": "6e54f6828b8fef97e963a9a56bccc91162b8a632f6897743095e32407fa0db98"
1186
  },
1187
  {
1188
  "id": "public_surface_qa",
 
1226
  "volatile": true,
1227
  "shows": "Machine-readable report for SEO/social metadata, accessible tab semantics, public links, project links, and clear project presentation.",
1228
  "exists": true,
1229
+ "bytes": 7691,
1230
  "hash_policy": "existence_and_size_only"
1231
  },
1232
  {
 
1307
  "volatile": true,
1308
  "shows": "Records the last live GitHub/HF URL verification after upload.",
1309
  "exists": true,
1310
+ "bytes": 189990,
1311
  "hash_policy": "existence_and_size_only"
1312
  },
1313
  {
 
1340
  "surface": "website_hf",
1341
  "shows": "Machine-readable reproduction steps with expected artifacts and public boundaries.",
1342
  "exists": true,
1343
+ "bytes": 6836,
1344
+ "sha256": "3f1e1615c6c0853d21bc14a8eab20af3757ecc443e72dab7744b3c0ec149fa87"
1345
  },
1346
  {
1347
  "id": "artifact_index_builder",
 
1351
  "surface": "repo_hf",
1352
  "shows": "Generates the selective artifact catalog from local files.",
1353
  "exists": true,
1354
+ "bytes": 68279,
1355
+ "sha256": "69b43ad5d3dc5a6893c4592fa47fff6a7a87691728ec2c61b121ec262d00bf2a"
1356
  },
1357
  {
1358
  "id": "publication_audit",
 
1410
  "surface": "website_hf",
1411
  "shows": "Lists public URLs, upstream sources, and machine-readable project metadata.",
1412
  "exists": true,
1413
+ "bytes": 5739,
1414
+ "sha256": "d972f30552dd346ec296f88d004c70bf2fb99e92e44ddc8d3a6dad5634f0336d"
1415
  },
1416
  {
1417
  "id": "task_summary",
 
1474
  "path": "results/episode_task_suite/neural_mlp",
1475
  "kind": "result_directory",
1476
  "surface": "repo_hf_model",
1477
+ "shows": "Stores matching PyTorch MLP results for the walkthrough-backed task contracts.",
1478
  "exists": true,
1479
  "file_count": 60,
1480
  "bytes": 90609517
 
1485
  "path": "results/episode_task_suite/research_directions/research_direction_taxonomy.json",
1486
  "kind": "taxonomy",
1487
  "surface": "repo_hf",
1488
+ "shows": "Maps the walkthrough-backed tasks to the four Ropedia research directions as direct/proxy/diagnostic.",
1489
  "exists": true,
1490
  "bytes": 25046,
1491
  "sha256": "0e3c442e5eb9057b04b1e8c8fa723dfde6f72e7fae1378d5ea022d93f7d25ca3"
 
1509
  "surface": "repo_hf",
1510
  "shows": "Stores the historical result bundle for provenance rows with minimal and neural baselines aligned to the same 20-task window/split setup.",
1511
  "exists": true,
1512
+ "bytes": 33575,
1513
+ "sha256": "d6d2f851325a691e77aed6d948f7355b16cf8d81ca35bf115e7309a7b7308efd"
1514
  },
1515
  {
1516
  "id": "tier2_task_suite_json",
 
1520
  "surface": "website_hf",
1521
  "shows": "Machine-readable provenance definitions, setup alignment, metrics, and public source paths; the file name is historical.",
1522
  "exists": true,
1523
+ "bytes": 33575,
1524
+ "sha256": "d6d2f851325a691e77aed6d948f7355b16cf8d81ca35bf115e7309a7b7308efd"
1525
  },
1526
  {
1527
  "id": "tier2_task_suite_chart",
 
1531
  "surface": "website_hf",
1532
  "shows": "Visual summary of the historical provenance baseline metrics inside the unified 20-task suite.",
1533
  "exists": true,
1534
+ "bytes": 5453,
1535
+ "sha256": "e9da29c57f42b29a7a05622fee1335089ac2b6fc9692a3b49fa5b753904db9dc"
1536
  },
1537
  {
1538
  "id": "tier2_task_suite_builder",
 
1542
  "surface": "repo_hf",
1543
  "shows": "Regenerates the historical provenance rows from shared windows plus the local public-sample annotation HDF5; the script name is historical.",
1544
  "exists": true,
1545
+ "bytes": 47155,
1546
+ "sha256": "569f05c1299f5186778ec75280188969fe1a5a76ae8553738fd44fc2faaab195"
1547
  },
1548
  {
1549
  "id": "task_walkthroughs",
 
1564
  "surface": "website_hf",
1565
  "shows": "Presents the task suite and sample modality thumbnails with metrics generated from committed files.",
1566
  "exists": true,
1567
+ "bytes": 1897278,
1568
+ "sha256": "71b1ab150e952cf902488226c65b3822d8016974f63d111204c1eb1a7745faad"
1569
  },
1570
  {
1571
  "id": "modality_atlas",
 
1672
  "path": "results/omni_finetune/multi_episode_128_task_baselines/BASELINE_ALIGNMENT_REPORT.md",
1673
  "kind": "scaleup_status",
1674
  "surface": "repo_hf",
1675
+ "shows": "Summarizes same-split simple and neural metadata baselines for the walkthrough-backed task ids, with unsupported markers for tasks that need missing raw 128 feature blocks.",
1676
  "exists": true,
1677
  "bytes": 2238,
1678
  "sha256": "c70440aa502ec569a840159ab7e05b8e7d4ed70e0091ad9a4b2fb3fb0d3803c1"
 
1696
  "surface": "repo_hf",
1697
  "shows": "Reader-facing comparison of the single-episode task suite, 128-episode aligned baselines, Qwen3-Omni packages, and Cosmos3 future-window branch.",
1698
  "exists": true,
1699
+ "bytes": 15997,
1700
+ "sha256": "c8296c51eb1d67d155b84e3a39f703642d30e855fee7ee7d6ca437966b5c760b"
1701
  },
1702
  {
1703
  "id": "omni_model_comparison_json",
 
1707
  "surface": "repo_hf",
1708
  "shows": "Machine-readable comparison of the current result versions, per-task aligned baselines, verified Qwen3 packages, and Cosmos3 package.",
1709
  "exists": true,
1710
+ "bytes": 82102,
1711
+ "sha256": "6b246dbdb2685efdc9d0a92bb8c446a89523a1787ebc8a883805b4179e266dd1"
1712
  },
1713
  {
1714
  "id": "cosmos3_nano_verified_summary",
data/evaluation_protocol.json CHANGED
@@ -2,7 +2,7 @@
2
  "title": "Ropedia Xperience-10M Task Suite Evaluation Protocol",
3
  "status": "pass",
4
  "version": "2026-06-01",
5
- "generated_at_utc": "2026-06-21T14:40:33+00:00",
6
  "source_files": [
7
  "docs/data/summary_metrics.json",
8
  "results/episode_task_suite/summary_report.json",
@@ -26,8 +26,8 @@
26
  "task_suite": {
27
  "status": "unified_public_sample_suite",
28
  "task_count": 20,
29
- "original_public_sample_tasks": 12,
30
- "additional_public_sample_tasks": 8,
31
  "unified_results": "docs/data/task_suite_20.json",
32
  "legacy_additional_task_result_path": "docs/data/tier2_task_suite.json",
33
  "legacy_path_note": "The tier2_task_suite path is retained for stable links only; it is provenance inside the same 20-task suite."
@@ -82,7 +82,7 @@
82
  {
83
  "task": "timeline_action",
84
  "task_display_name": "Action Recognition",
85
- "origin": "original_public_sample_tasks",
86
  "family": "supervised classification",
87
  "unit": "single window",
88
  "input": "current 20-frame all-feature window",
@@ -105,7 +105,7 @@
105
  {
106
  "task": "timeline_subtask",
107
  "task_display_name": "Procedure Step Recognition",
108
- "origin": "original_public_sample_tasks",
109
  "family": "supervised classification",
110
  "unit": "single window",
111
  "input": "current 20-frame all-feature window",
@@ -128,7 +128,7 @@
128
  {
129
  "task": "transition_detection",
130
  "task_display_name": "Action Boundary Detection",
131
- "origin": "original_public_sample_tasks",
132
  "family": "temporal diagnostic",
133
  "unit": "single window",
134
  "input": "current 20-frame all-feature window",
@@ -151,7 +151,7 @@
151
  {
152
  "task": "next_action",
153
  "task_display_name": "Next-Action Prediction",
154
- "origin": "original_public_sample_tasks",
155
  "family": "short-horizon prediction",
156
  "unit": "single window",
157
  "input": "current 20-frame all-feature window at time t",
@@ -174,7 +174,7 @@
174
  {
175
  "task": "hand_trajectory_forecast",
176
  "task_display_name": "Hand Trajectory Forecasting",
177
- "origin": "original_public_sample_tasks",
178
  "family": "trajectory regression",
179
  "unit": "single window",
180
  "input": "current all-feature window",
@@ -197,7 +197,7 @@
197
  {
198
  "task": "contact_prediction",
199
  "task_display_name": "Contact State Prediction",
200
- "origin": "original_public_sample_tasks",
201
  "family": "binary classification",
202
  "unit": "single window",
203
  "input": "non-contact and non-caption feature blocks",
@@ -220,7 +220,7 @@
220
  {
221
  "task": "object_relevance",
222
  "task_display_name": "Object Relevance Prediction",
223
- "origin": "original_public_sample_tasks",
224
  "family": "multi-label classification",
225
  "unit": "single window",
226
  "input": "non-caption feature blocks",
@@ -243,7 +243,7 @@
243
  {
244
  "task": "caption_grounding",
245
  "task_display_name": "Language Grounding",
246
- "origin": "original_public_sample_tasks",
247
  "family": "retrieval",
248
  "unit": "caption query",
249
  "input": "caption object/interaction query plus candidate sensor windows",
@@ -266,7 +266,7 @@
266
  {
267
  "task": "cross_modal_retrieval",
268
  "task_display_name": "Cross-Modal Retrieval",
269
- "origin": "original_public_sample_tasks",
270
  "family": "retrieval",
271
  "unit": "sensor query",
272
  "input": "motion, IMU, and camera query features",
@@ -289,7 +289,7 @@
289
  {
290
  "task": "modality_reconstruction",
291
  "task_display_name": "Cross-Modal Reconstruction",
292
- "origin": "original_public_sample_tasks",
293
  "family": "cross-modal regression",
294
  "unit": "single window",
295
  "input": "motion, IMU, and camera features",
@@ -311,7 +311,7 @@
311
  {
312
  "task": "temporal_order",
313
  "task_display_name": "Temporal Order Verification",
314
- "origin": "original_public_sample_tasks",
315
  "family": "pairwise diagnostic",
316
  "unit": "adjacent window pair",
317
  "input": "two adjacent windows",
@@ -334,7 +334,7 @@
334
  {
335
  "task": "misalignment_detection",
336
  "task_display_name": "Multimodal Synchronization Detection",
337
- "origin": "original_public_sample_tasks",
338
  "family": "pairwise diagnostic",
339
  "unit": "paired modality window",
340
  "input": "motion side plus visual/depth side",
@@ -357,7 +357,7 @@
357
  {
358
  "task": "long_horizon_next_action",
359
  "task_display_name": "Long-Horizon Next-Action Forecasting",
360
- "origin": "additional_public_sample_tasks",
361
  "family": "classification",
362
  "unit": "single aligned window",
363
  "input": "Current 20-frame non-caption multimodal window.",
@@ -375,7 +375,7 @@
375
  {
376
  "task": "next_subtask_forecast",
377
  "task_display_name": "Long-Horizon Next-Subtask Forecasting",
378
- "origin": "additional_public_sample_tasks",
379
  "family": "classification",
380
  "unit": "single aligned window",
381
  "input": "Current 20-frame non-caption multimodal window.",
@@ -393,7 +393,7 @@
393
  {
394
  "task": "interaction_text_prediction",
395
  "task_display_name": "Interaction Text Prediction",
396
- "origin": "additional_public_sample_tasks",
397
  "family": "classification",
398
  "unit": "single aligned window",
399
  "input": "Current 20-frame sensor window with caption-text features removed.",
@@ -411,7 +411,7 @@
411
  {
412
  "task": "action_object_relation",
413
  "task_display_name": "Action-Object Relation Prediction",
414
- "origin": "additional_public_sample_tasks",
415
  "family": "classification",
416
  "unit": "single aligned window",
417
  "input": "Current 20-frame sensor window with caption-text features removed.",
@@ -429,7 +429,7 @@
429
  {
430
  "task": "object_set_forecast",
431
  "task_display_name": "Future Object-Set Forecasting",
432
- "origin": "additional_public_sample_tasks",
433
  "family": "multi_label",
434
  "unit": "single aligned window",
435
  "input": "Current 20-frame sensor window with caption-text features removed.",
@@ -447,7 +447,7 @@
447
  {
448
  "task": "imu_to_hand_pose",
449
  "task_display_name": "IMU-to-Hand Pose Reconstruction",
450
- "origin": "additional_public_sample_tasks",
451
  "family": "regression",
452
  "unit": "single aligned window",
453
  "input": "Current IMU acceleration/gyroscope feature block only.",
@@ -465,7 +465,7 @@
465
  {
466
  "task": "camera_view_sync_retrieval",
467
  "task_display_name": "Camera-View Synchronization Retrieval",
468
- "origin": "additional_public_sample_tasks",
469
  "family": "retrieval",
470
  "unit": "held-out query window",
471
  "input": "Fisheye camera-1 feature query projected into fisheye camera-3 feature space.",
@@ -483,7 +483,7 @@
483
  {
484
  "task": "time_to_transition",
485
  "task_display_name": "Time-to-Next-Transition Regression",
486
- "origin": "additional_public_sample_tasks",
487
  "family": "regression",
488
  "unit": "single aligned window",
489
  "input": "Current 20-frame non-caption multimodal window.",
 
2
  "title": "Ropedia Xperience-10M Task Suite Evaluation Protocol",
3
  "status": "pass",
4
  "version": "2026-06-01",
5
+ "generated_at_utc": "2026-06-21T15:20:33+00:00",
6
  "source_files": [
7
  "docs/data/summary_metrics.json",
8
  "results/episode_task_suite/summary_report.json",
 
26
  "task_suite": {
27
  "status": "unified_public_sample_suite",
28
  "task_count": 20,
29
+ "public_framing": "all 20 public-sample task contracts are presented as one suite",
30
+ "legacy_provenance_rows": 8,
31
  "unified_results": "docs/data/task_suite_20.json",
32
  "legacy_additional_task_result_path": "docs/data/tier2_task_suite.json",
33
  "legacy_path_note": "The tier2_task_suite path is retained for stable links only; it is provenance inside the same 20-task suite."
 
82
  {
83
  "task": "timeline_action",
84
  "task_display_name": "Action Recognition",
85
+ "provenance_source": "walkthrough_backed_task_contract",
86
  "family": "supervised classification",
87
  "unit": "single window",
88
  "input": "current 20-frame all-feature window",
 
105
  {
106
  "task": "timeline_subtask",
107
  "task_display_name": "Procedure Step Recognition",
108
+ "provenance_source": "walkthrough_backed_task_contract",
109
  "family": "supervised classification",
110
  "unit": "single window",
111
  "input": "current 20-frame all-feature window",
 
128
  {
129
  "task": "transition_detection",
130
  "task_display_name": "Action Boundary Detection",
131
+ "provenance_source": "walkthrough_backed_task_contract",
132
  "family": "temporal diagnostic",
133
  "unit": "single window",
134
  "input": "current 20-frame all-feature window",
 
151
  {
152
  "task": "next_action",
153
  "task_display_name": "Next-Action Prediction",
154
+ "provenance_source": "walkthrough_backed_task_contract",
155
  "family": "short-horizon prediction",
156
  "unit": "single window",
157
  "input": "current 20-frame all-feature window at time t",
 
174
  {
175
  "task": "hand_trajectory_forecast",
176
  "task_display_name": "Hand Trajectory Forecasting",
177
+ "provenance_source": "walkthrough_backed_task_contract",
178
  "family": "trajectory regression",
179
  "unit": "single window",
180
  "input": "current all-feature window",
 
197
  {
198
  "task": "contact_prediction",
199
  "task_display_name": "Contact State Prediction",
200
+ "provenance_source": "walkthrough_backed_task_contract",
201
  "family": "binary classification",
202
  "unit": "single window",
203
  "input": "non-contact and non-caption feature blocks",
 
220
  {
221
  "task": "object_relevance",
222
  "task_display_name": "Object Relevance Prediction",
223
+ "provenance_source": "walkthrough_backed_task_contract",
224
  "family": "multi-label classification",
225
  "unit": "single window",
226
  "input": "non-caption feature blocks",
 
243
  {
244
  "task": "caption_grounding",
245
  "task_display_name": "Language Grounding",
246
+ "provenance_source": "walkthrough_backed_task_contract",
247
  "family": "retrieval",
248
  "unit": "caption query",
249
  "input": "caption object/interaction query plus candidate sensor windows",
 
266
  {
267
  "task": "cross_modal_retrieval",
268
  "task_display_name": "Cross-Modal Retrieval",
269
+ "provenance_source": "walkthrough_backed_task_contract",
270
  "family": "retrieval",
271
  "unit": "sensor query",
272
  "input": "motion, IMU, and camera query features",
 
289
  {
290
  "task": "modality_reconstruction",
291
  "task_display_name": "Cross-Modal Reconstruction",
292
+ "provenance_source": "walkthrough_backed_task_contract",
293
  "family": "cross-modal regression",
294
  "unit": "single window",
295
  "input": "motion, IMU, and camera features",
 
311
  {
312
  "task": "temporal_order",
313
  "task_display_name": "Temporal Order Verification",
314
+ "provenance_source": "walkthrough_backed_task_contract",
315
  "family": "pairwise diagnostic",
316
  "unit": "adjacent window pair",
317
  "input": "two adjacent windows",
 
334
  {
335
  "task": "misalignment_detection",
336
  "task_display_name": "Multimodal Synchronization Detection",
337
+ "provenance_source": "walkthrough_backed_task_contract",
338
  "family": "pairwise diagnostic",
339
  "unit": "paired modality window",
340
  "input": "motion side plus visual/depth side",
 
357
  {
358
  "task": "long_horizon_next_action",
359
  "task_display_name": "Long-Horizon Next-Action Forecasting",
360
+ "provenance_source": "historical_result_bundle",
361
  "family": "classification",
362
  "unit": "single aligned window",
363
  "input": "Current 20-frame non-caption multimodal window.",
 
375
  {
376
  "task": "next_subtask_forecast",
377
  "task_display_name": "Long-Horizon Next-Subtask Forecasting",
378
+ "provenance_source": "historical_result_bundle",
379
  "family": "classification",
380
  "unit": "single aligned window",
381
  "input": "Current 20-frame non-caption multimodal window.",
 
393
  {
394
  "task": "interaction_text_prediction",
395
  "task_display_name": "Interaction Text Prediction",
396
+ "provenance_source": "historical_result_bundle",
397
  "family": "classification",
398
  "unit": "single aligned window",
399
  "input": "Current 20-frame sensor window with caption-text features removed.",
 
411
  {
412
  "task": "action_object_relation",
413
  "task_display_name": "Action-Object Relation Prediction",
414
+ "provenance_source": "historical_result_bundle",
415
  "family": "classification",
416
  "unit": "single aligned window",
417
  "input": "Current 20-frame sensor window with caption-text features removed.",
 
429
  {
430
  "task": "object_set_forecast",
431
  "task_display_name": "Future Object-Set Forecasting",
432
+ "provenance_source": "historical_result_bundle",
433
  "family": "multi_label",
434
  "unit": "single aligned window",
435
  "input": "Current 20-frame sensor window with caption-text features removed.",
 
447
  {
448
  "task": "imu_to_hand_pose",
449
  "task_display_name": "IMU-to-Hand Pose Reconstruction",
450
+ "provenance_source": "historical_result_bundle",
451
  "family": "regression",
452
  "unit": "single aligned window",
453
  "input": "Current IMU acceleration/gyroscope feature block only.",
 
465
  {
466
  "task": "camera_view_sync_retrieval",
467
  "task_display_name": "Camera-View Synchronization Retrieval",
468
+ "provenance_source": "historical_result_bundle",
469
  "family": "retrieval",
470
  "unit": "held-out query window",
471
  "input": "Fisheye camera-1 feature query projected into fisheye camera-3 feature space.",
 
483
  {
484
  "task": "time_to_transition",
485
  "task_display_name": "Time-to-Next-Transition Regression",
486
+ "provenance_source": "historical_result_bundle",
487
  "family": "regression",
488
  "unit": "single aligned window",
489
  "input": "Current 20-frame non-caption multimodal window.",
data/live_publication_status.json CHANGED
The diff for this file is too large to render. See raw diff
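Reviewer note on the `data/evaluation_protocol.json` hunks above: the change replaces each per-task `origin` key with `provenance_source` and adds `"task_count": 20` and `"legacy_provenance_rows": 8`. A rough way to sanity-check that relabeling is sketched below. It is only a sketch under assumptions: the function name `check_provenance` is hypothetical, the diff does not show which JSON key holds the task entries (so the function takes a plain list of dicts), and the allowed-value set contains only the two values visible in this diff.

```python
from collections import Counter

# The two provenance values visible in this diff. Treating them as the
# complete allowed set is an assumption, not something the commit states.
ALLOWED_PROVENANCE = {
    "walkthrough_backed_task_contract",
    "historical_result_bundle",
}


def check_provenance(tasks, expected_total=20, expected_legacy=8):
    """Return a list of human-readable problems for a list of task dicts.

    Hypothetical helper: `tasks` is whatever list of per-task entries the
    caller extracted from the JSON file; the container key is not shown
    in the diff, so it is not hard-coded here.
    """
    problems = []
    for entry in tasks:
        name = entry.get("task", "<unnamed>")
        # The commit removes "origin", so a surviving key is stale.
        if "origin" in entry:
            problems.append(f"{name}: stale 'origin' key")
        if entry.get("provenance_source") not in ALLOWED_PROVENANCE:
            problems.append(f"{name}: missing or unknown provenance_source")

    counts = Counter(entry.get("provenance_source") for entry in tasks)
    if len(tasks) != expected_total:
        problems.append(f"expected {expected_total} tasks, found {len(tasks)}")
    legacy = counts["historical_result_bundle"]
    if legacy != expected_legacy:
        problems.append(f"expected {expected_legacy} legacy rows, found {legacy}")
    return problems


# Synthetic entries that mirror the 12 + 8 split shown in the hunks.
demo_tasks = (
    [{"task": f"t{i}", "provenance_source": "walkthrough_backed_task_contract"}
     for i in range(12)]
    + [{"task": f"t{i}", "provenance_source": "historical_result_bundle"}
       for i in range(12, 20)]
)
demo_problems = check_provenance(demo_tasks)
```

On the synthetic entries the returned list is empty; an entry that still carries `origin` would be reported by task name.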
 
data/mirror_parity.json CHANGED
The diff for this file is too large to render. See raw diff
 
data/omni_model_comparison.json CHANGED
@@ -1,12 +1,12 @@
1
  {
2
  "title": "Ropedia Xperience-10M Current Result Versions and Model Groups",
3
- "generated_at_utc": "2026-06-21T10:47:04+00:00",
4
  "status": "pass",
5
  "version_count": 3,
6
  "model_group_count": 5,
7
  "comparison_rule": "Compare only rows with the same scope and target. Single-episode raw-feature metrics, 128-episode metadata baselines, Qwen3 structured JSON metrics, and the two Cosmos3 targets answer different questions: Nano future-window retrieval versus Super structured JSON Reasoner evaluation.",
8
  "version_reading_notes": [
9
- "Version 1 is the public-sample 20-task surface: original core heads, tasks 13-20, and the 180-row method-task matrix.",
10
  "Version 2 is the selected 128-episode same-split simple/NN baseline alignment.",
11
  "The selected-128 model-diagnostic group contains the current Qwen3-Omni LoRA JSON-task row, Cosmos3-Nano future-window compatibility result, Cosmos3-Super Reasoner base-weight JSON-task evaluation, and the separate Cosmos3-Super Forward-Dynamics LoRA adapter artifact."
12
  ],
 
1
  {
2
  "title": "Ropedia Xperience-10M Current Result Versions and Model Groups",
3
+ "generated_at_utc": "2026-06-21T15:17:00+00:00",
4
  "status": "pass",
5
  "version_count": 3,
6
  "model_group_count": 5,
7
  "comparison_rule": "Compare only rows with the same scope and target. Single-episode raw-feature metrics, 128-episode metadata baselines, Qwen3 structured JSON metrics, and the two Cosmos3 targets answer different questions: Nano future-window retrieval versus Super structured JSON Reasoner evaluation.",
8
  "version_reading_notes": [
9
+ "Version 1 is the public-sample 20-task surface: unified task heads, historical provenance rows, and the 180-row method-task matrix.",
10
  "Version 2 is the selected 128-episode same-split simple/NN baseline alignment.",
11
  "The selected-128 model-diagnostic group contains the current Qwen3-Omni LoRA JSON-task row, Cosmos3-Nano future-window compatibility result, Cosmos3-Super Reasoner base-weight JSON-task evaluation, and the separate Cosmos3-Super Forward-Dynamics LoRA adapter artifact."
12
  ],
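Reviewer note on `comparison_rule` in `data/omni_model_comparison.json`: the rule "compare only rows with the same scope and target" could be enforced mechanically along the lines below. This is a minimal sketch, not the project's code: the row fields `model`, `scope`, and `target`, the function names, and the example labels are all hypothetical, since the diff does not show the row schema.

```python
from collections import defaultdict


def comparison_key(row):
    """Rows are comparable only when both scope and target match."""
    return (row["scope"], row["target"])


def can_compare(row_a, row_b):
    """True when two result rows answer the same question."""
    return comparison_key(row_a) == comparison_key(row_b)


def comparable_groups(rows):
    """Bucket rows so that metrics are only ranked within one bucket."""
    groups = defaultdict(list)
    for row in rows:
        groups[comparison_key(row)].append(row)
    return dict(groups)


# Hypothetical rows; field names and labels are placeholders, and no
# metric values are included because none appear in this hunk.
rows = [
    {"model": "A", "scope": "single_episode", "target": "raw_feature_metrics"},
    {"model": "B", "scope": "single_episode", "target": "raw_feature_metrics"},
    {"model": "C", "scope": "selected_128", "target": "future_window_retrieval"},
]
groups = comparable_groups(rows)
```

Here rows A and B fall into one group and C into another, so a ranking that mixes C with A or B would be rejected by `can_compare`.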
data/project_manifest.json CHANGED
@@ -23,9 +23,8 @@
23
  "qwen3_omni_json_quality_target_met": true,
24
  "qwen3_omni_lora_adapter_repo": "https://huggingface.co/cy0307/ropedia-qwen3-omni-lora-128ep",
25
  "task_count": 20,
26
- "original_public_sample_task_count": 12,
27
- "additional_public_sample_task_count": 8,
28
- "legacy_tasks_13_to_20_result_path": "docs/data/tier2_task_suite.json"
29
  },
30
  "public_surfaces": {
31
  "github_repo": "https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite",
@@ -96,7 +95,7 @@
96
  "task_walkthroughs": "docs/data/task_walkthroughs.json",
97
  "task_suite_20": "TASK_SUITE_20.md",
98
  "task_suite_20_json": "docs/data/task_suite_20.json",
99
- "tasks_13_to_20_result_bundle": "docs/data/tier2_task_suite.json"
100
  },
101
  "citation_files": {
102
  "citation_cff": "CITATION.cff",
 
23
  "qwen3_omni_json_quality_target_met": true,
24
  "qwen3_omni_lora_adapter_repo": "https://huggingface.co/cy0307/ropedia-qwen3-omni-lora-128ep",
25
  "task_count": 20,
26
+ "task_surface_framing": "unified_20_task_suite",
27
+ "legacy_provenance_result_path": "docs/data/tier2_task_suite.json"
 
28
  },
29
  "public_surfaces": {
30
  "github_repo": "https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite",
 
95
  "task_walkthroughs": "docs/data/task_walkthroughs.json",
96
  "task_suite_20": "TASK_SUITE_20.md",
97
  "task_suite_20_json": "docs/data/task_suite_20.json",
98
+ "historical_provenance_result_bundle": "docs/data/tier2_task_suite.json"
99
  },
100
  "citation_files": {
101
  "citation_cff": "CITATION.cff",
data/project_packet.json CHANGED
@@ -15,9 +15,8 @@
15
  "cosmos3_super_forward_dynamics_lora_status": "The first Cosmos3-Super fine-tuned adapter branch is verified as a forward-dynamics LoRA over camera-pose proxy targets; it reports loss metrics, not JSON action-label accuracy.",
16
  "task_suite_enhancement_128_status": "Current no-new-episode enhancement pack recommends multiscale_20s10_40s20_80s40, hierarchical action/subtask targets, label-normalized scoring, and raw-feature shards before adding more episodes.",
17
  "task_count": 20,
18
- "original_public_sample_task_count": 12,
19
- "additional_public_sample_task_count": 8,
20
- "legacy_tasks_13_to_20_result_path": "docs/data/tier2_task_suite.json"
21
  },
22
  "reading_path": [
23
  {
@@ -110,7 +109,7 @@
110
  "results/episode_task_suite/neural_mlp/",
111
  "docs/data/summary_metrics.json"
112
  ],
113
- "readout": "The unified suite has 20 task contracts; tasks 1-12 have walkthroughs and neural MLP heads, and tasks 13-20 have aligned minimal/neural result bundles under the historical tier2_task_suite path."
114
  },
115
  {
116
  "step": 8,
 
15
  "cosmos3_super_forward_dynamics_lora_status": "The first Cosmos3-Super fine-tuned adapter branch is verified as a forward-dynamics LoRA over camera-pose proxy targets; it reports loss metrics, not JSON action-label accuracy.",
16
  "task_suite_enhancement_128_status": "Current no-new-episode enhancement pack recommends multiscale_20s10_40s20_80s40, hierarchical action/subtask targets, label-normalized scoring, and raw-feature shards before adding more episodes.",
17
  "task_count": 20,
18
+ "task_surface_framing": "unified_20_task_suite",
19
+ "legacy_provenance_result_path": "docs/data/tier2_task_suite.json"
 
20
  },
21
  "reading_path": [
22
  {
 
109
  "results/episode_task_suite/neural_mlp/",
110
  "docs/data/summary_metrics.json"
111
  ],
112
+ "readout": "The unified suite has 20 task contracts in one task surface. Walkthrough-backed tasks, aligned minimal/neural result bundles, and historical tier2_task_suite provenance paths are all linked from TASK_SUITE_20.md and docs/data/task_suite_20.json."
113
  },
114
  {
115
  "step": 8,
data/project_status.json CHANGED
@@ -62,9 +62,8 @@
62
  "task_suite_enhancement_128_recommended_export": "multiscale_20s10_40s20_80s40",
63
  "task_suite_enhancement_128_estimated_windows": 106095,
64
  "task_count": 20,
65
- "original_public_sample_task_count": 12,
66
- "additional_public_sample_task_count": 8,
67
- "legacy_tasks_13_to_20_result_path": "docs/data/tier2_task_suite.json"
68
  },
69
  "rows": [
70
  {
@@ -86,7 +85,7 @@
86
  "results/episode_task_suite/",
87
  "results/episode_task_suite/tier2_task_suite/"
88
  ],
89
- "readout": "All 20 task contracts have committed minimal metrics; tasks 13-20 reuse the same 20-frame windows, 5-frame stride, chronological split, and minimal/neural head pattern. The tier2_task_suite path is historical and now stores tasks 13-20, not a separate public tier."
90
  },
91
  {
92
  "area": "180-result method matrix",
@@ -116,7 +115,7 @@
116
  "results/audio_ablation/",
117
  "docs/data/audio_ablation_summary.json"
118
  ],
119
- "readout": "Audio variants improve the primary metric on 6 of the original task contracts in this single-episode setting."
120
  },
121
  {
122
  "area": "Evaluation protocol",
@@ -355,7 +354,7 @@
355
  "The Cosmos3-Nano future-window package is verified as a compatibility adapter result, Cosmos3-Super Reasoner is verified as a base-weight evaluation, and Cosmos3-Super Forward-Dynamics LoRA is verified as the first fine-tuned Super adapter artifact. Cosmos3-Super adapter weights belong in cy0307/ropedia-cosmos3-super-forward-dynamics-lora-128ep; verified_public packages exclude safetensors.",
356
  "The current reconstruction task reconstructs feature vectors, not pixel-depth, mesh, NeRF, or Gaussian reconstruction.",
357
  "Audio is one of the synchronized source modalities in the current task representation.",
358
- "The audio ablation report compares audio/no-audio variants across the original task contracts in results/audio_ablation/.",
359
  "Foundation-model selection is explicit: Qwen3-Omni is the structured JSON baseline, Cosmos 3 is the world-model track with Nano compatibility and Super forward-dynamics LoRA results, and policy models such as OpenVLA/openpi/GR00T wait for robot-compatible action-target conversion.",
360
  "Future model tracks should be added through the backbone registry and verified package contract, not as one-off result folders with incompatible metrics or publication rules.",
361
  "The Xperience Embodied Foundation Model is a future native-pretraining goal, not a completed model or current benchmark."
 
62
  "task_suite_enhancement_128_recommended_export": "multiscale_20s10_40s20_80s40",
63
  "task_suite_enhancement_128_estimated_windows": 106095,
64
  "task_count": 20,
65
+ "task_surface_framing": "unified_20_task_suite",
66
+ "legacy_provenance_result_path": "docs/data/tier2_task_suite.json"
 
67
  },
68
  "rows": [
69
  {
 
85
  "results/episode_task_suite/",
86
  "results/episode_task_suite/tier2_task_suite/"
87
  ],
88
+ "readout": "All 20 task contracts are presented together with committed minimal metrics, the same 20-frame windows, 5-frame stride, chronological split, and minimal/neural head pattern. The tier2_task_suite path is historical provenance inside the suite, not a separate public tier."
89
  },
90
  {
91
  "area": "180-result method matrix",
 
115
  "results/audio_ablation/",
116
  "docs/data/audio_ablation_summary.json"
117
  ],
118
+ "readout": "Audio variants improve the primary metric on 6 walkthrough-backed task contracts in this single-episode setting."
119
  },
120
  {
121
  "area": "Evaluation protocol",
 
354
  "The Cosmos3-Nano future-window package is verified as a compatibility adapter result, Cosmos3-Super Reasoner is verified as a base-weight evaluation, and Cosmos3-Super Forward-Dynamics LoRA is verified as the first fine-tuned Super adapter artifact. Cosmos3-Super adapter weights belong in cy0307/ropedia-cosmos3-super-forward-dynamics-lora-128ep; verified_public packages exclude safetensors.",
355
  "The current reconstruction task reconstructs feature vectors, not pixel-depth, mesh, NeRF, or Gaussian reconstruction.",
356
  "Audio is one of the synchronized source modalities in the current task representation.",
357
+ "The audio ablation report compares audio/no-audio variants across the walkthrough-backed task contracts in results/audio_ablation/.",
358
  "Foundation-model selection is explicit: Qwen3-Omni is the structured JSON baseline, Cosmos 3 is the world-model track with Nano compatibility and Super forward-dynamics LoRA results, and policy models such as OpenVLA/openpi/GR00T wait for robot-compatible action-target conversion.",
359
  "Future model tracks should be added through the backbone registry and verified package contract, not as one-off result folders with incompatible metrics or publication rules.",
360
  "The Xperience Embodied Foundation Model is a future native-pretraining goal, not a completed model or current benchmark."
data/publication_audit.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-21T14:46:11+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-21T15:22:42+00:00",
4
  "checks": [
5
  {
6
  "name": "required_publication_assets_present",
data/quality_gates.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Release Checks",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-21T14:46:48+00:00",
5
  "rule": "A release is current when the automated reports pass and the live GitHub/Hugging Face mirrors are verified after publishing.",
6
  "automated_gates": [
7
  {
 
1
  {
2
  "title": "Ropedia Xperience-10M Release Checks",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-21T15:21:42+00:00",
5
  "rule": "A release is current when the automated reports pass and the live GitHub/Hugging Face mirrors are verified after publishing.",
6
  "automated_gates": [
7
  {
data/reproducibility_matrix.json CHANGED
@@ -39,7 +39,7 @@
39
  "id": "original_task_suite",
40
  "status": "reproducible",
41
  "command": "python scripts/episode_task_suite.py --workspace $WORKSPACE --include-neural",
42
- "expected": "original task metrics, predictions, manifests, and neural_mlp task-head artifacts",
43
  "boundary": "8,546-dimensional multimodal window contract"
44
  },
45
  {
@@ -50,11 +50,11 @@
50
  "boundary": "single-episode probes, not full research-direction solutions"
51
  },
52
  {
53
- "id": "tasks_13_to_20_and_unified_index",
54
  "status": "reproducible",
55
  "command": "python scripts/tier2_task_suite.py && python scripts/build_unified_task_suite.py && python scripts/build_unified_task_model_radar.py",
56
- "expected": "tasks 13-20 metrics, prediction/rank artifacts, TASK_SUITE_20.md, docs/data/task_suite_20.json, docs/data/tier2_task_suite.json, docs/assets/charts/tier2_task_suite.svg, docs/data/unified_task_model_radar.json, and docs/assets/charts/unified_task_model_radar.svg",
57
- "boundary": "requires local public-sample annotation.hdf5 plus HOMIE Toolkit or h5py for tasks 13-20; raw HDF5 and MP4 files are not redistributed"
58
  },
59
  {
60
  "id": "source_alignment_audit",
 
39
  "id": "original_task_suite",
40
  "status": "reproducible",
41
  "command": "python scripts/episode_task_suite.py --workspace $WORKSPACE --include-neural",
42
+ "expected": "walkthrough-backed task metrics, predictions, manifests, and neural_mlp task-head artifacts",
43
  "boundary": "8,546-dimensional multimodal window contract"
44
  },
45
  {
 
50
  "boundary": "single-episode probes, not full research-direction solutions"
51
  },
52
  {
53
+ "id": "unified_20_task_index",
54
  "status": "reproducible",
55
  "command": "python scripts/tier2_task_suite.py && python scripts/build_unified_task_suite.py && python scripts/build_unified_task_model_radar.py",
56
+ "expected": "unified 20-task metrics, prediction/rank artifacts, TASK_SUITE_20.md, docs/data/task_suite_20.json, docs/data/tier2_task_suite.json, docs/assets/charts/tier2_task_suite.svg, docs/data/unified_task_model_radar.json, and docs/assets/charts/unified_task_model_radar.svg",
57
+ "boundary": "requires local public-sample annotation.hdf5 plus HOMIE Toolkit or h5py for full public-task regeneration; raw HDF5 and MP4 files are not redistributed"
58
  },
59
  {
60
  "id": "source_alignment_audit",
data/research_takeaways.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Research Takeaways",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-20T21:27:21+00:00",
5
  "source_files": [
6
  "docs/data/summary_metrics.json",
7
  "results/episode_task_suite/summary_report.json",
@@ -133,7 +133,7 @@
133
  {
134
  "id": "audio_contribution_is_task_specific",
135
  "title": "Audio helps some tasks and hurts others on the public sample",
136
- "readout": "Audio improves the primary metric on 6 of the original task contracts, while raw log-mel replacement improves over the current handcrafted block on 6 of those contracts. The largest current-audio gain appears in feature reconstruction, not in action classification.",
137
  "evidence": [
138
  {
139
  "label": "tasks_where_current_audio_improves",
 
1
  {
2
  "title": "Ropedia Xperience-10M Research Takeaways",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-21T15:18:59+00:00",
5
  "source_files": [
6
  "docs/data/summary_metrics.json",
7
  "results/episode_task_suite/summary_report.json",
 
133
  {
134
  "id": "audio_contribution_is_task_specific",
135
  "title": "Audio helps some tasks and hurts others on the public sample",
136
+ "readout": "Audio improves the primary metric on 6 walkthrough-backed task contracts, while raw log-mel replacement improves over the current handcrafted block on 6 of those contracts. The largest current-audio gain appears in feature reconstruction, not in action classification.",
137
  "evidence": [
138
  {
139
  "label": "tasks_where_current_audio_improves",
data/scope_claims_audit.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-21T14:47:03+00:00",
4
  "summary": {
5
  "qwen3_omni_verified_diagnostic_pilot": true,
6
  "dataset_manifest_num_episodes": 119,
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-21T15:23:13+00:00",
4
  "summary": {
5
  "qwen3_omni_verified_diagnostic_pilot": true,
6
  "dataset_manifest_num_episodes": 119,
data/single_episode_task_model_radar.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Single-Episode 20-Task Radar",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-21T10:47:17+00:00",
5
  "description": "Minimal and Neural MLP baselines on the one public sample episode, both scored on all 20 task contracts.",
6
  "task_count": 20,
7
  "method_count": 2,
@@ -73,7 +73,7 @@
73
  "label": "Action Recognition",
74
  "axis_label": "01 Action Recognition",
75
  "short_label": "Action",
76
- "origin": "original_public_sample_tasks",
77
  "metric_key": "macro_f1",
78
  "metric_name": "macro-F1",
79
  "metric_direction": "higher",
@@ -107,7 +107,7 @@
107
  "label": "Procedure Step Recognition",
108
  "axis_label": "02 Procedure Step Recognition",
109
  "short_label": "Step",
110
- "origin": "original_public_sample_tasks",
111
  "metric_key": "macro_f1",
112
  "metric_name": "macro-F1",
113
  "metric_direction": "higher",
@@ -141,7 +141,7 @@
141
  "label": "Action Boundary Detection",
142
  "axis_label": "03 Action Boundary Detection",
143
  "short_label": "Boundary",
144
- "origin": "original_public_sample_tasks",
145
  "metric_key": "macro_f1",
146
  "metric_name": "macro-F1",
147
  "metric_direction": "higher",
@@ -175,7 +175,7 @@
175
  "label": "Next-Action Prediction",
176
  "axis_label": "04 Next-Action Prediction",
177
  "short_label": "Next act",
178
- "origin": "original_public_sample_tasks",
179
  "metric_key": "macro_f1",
180
  "metric_name": "macro-F1",
181
  "metric_direction": "higher",
@@ -209,7 +209,7 @@
209
  "label": "Hand Trajectory Forecasting",
210
  "axis_label": "05 Hand Trajectory Forecasting",
211
  "short_label": "Hand traj",
212
- "origin": "original_public_sample_tasks",
213
  "metric_key": "mpjpe",
214
  "metric_name": "MPJPE",
215
  "metric_direction": "lower",
@@ -243,7 +243,7 @@
243
  "label": "Contact State Prediction",
244
  "axis_label": "06 Contact State Prediction",
245
  "short_label": "Contact",
246
- "origin": "original_public_sample_tasks",
247
  "metric_key": "macro_f1",
248
  "metric_name": "macro-F1",
249
  "metric_direction": "higher",
@@ -277,7 +277,7 @@
277
  "label": "Object Relevance Prediction",
278
  "axis_label": "07 Object Relevance Prediction",
279
  "short_label": "Objects",
280
- "origin": "original_public_sample_tasks",
281
  "metric_key": "micro_f1",
282
  "metric_name": "micro-F1",
283
  "metric_direction": "higher",
@@ -311,7 +311,7 @@
311
  "label": "Language Grounding",
312
  "axis_label": "08 Language Grounding",
313
  "short_label": "Language",
314
- "origin": "original_public_sample_tasks",
315
  "metric_key": "mrr",
316
  "metric_name": "MRR",
317
  "metric_direction": "higher",
@@ -345,7 +345,7 @@
345
  "label": "Cross-Modal Retrieval",
346
  "axis_label": "09 Cross-Modal Retrieval",
347
  "short_label": "X-modal",
348
- "origin": "original_public_sample_tasks",
349
  "metric_key": "mrr",
350
  "metric_name": "MRR",
351
  "metric_direction": "higher",
@@ -379,7 +379,7 @@
379
  "label": "Cross-Modal Reconstruction",
380
  "axis_label": "10 Cross-Modal Reconstruction",
381
  "short_label": "Recon",
382
- "origin": "original_public_sample_tasks",
383
  "metric_key": "r2",
384
  "metric_name": "R2",
385
  "metric_direction": "higher",
@@ -413,7 +413,7 @@
413
  "label": "Temporal Order Verification",
414
  "axis_label": "11 Temporal Order Verification",
415
  "short_label": "Order",
416
- "origin": "original_public_sample_tasks",
417
  "metric_key": "f1",
418
  "metric_name": "F1",
419
  "metric_direction": "higher",
@@ -447,7 +447,7 @@
447
  "label": "Multimodal Synchronization Detection",
448
  "axis_label": "12 Multimodal Synchronization Detection",
449
  "short_label": "Sync",
450
- "origin": "original_public_sample_tasks",
451
  "metric_key": "f1",
452
  "metric_name": "F1",
453
  "metric_direction": "higher",
@@ -481,7 +481,7 @@
481
  "label": "Long-Horizon Next-Action Forecasting",
482
  "axis_label": "13 Long-Horizon Next-Action Forecasting",
483
  "short_label": "Long act",
484
- "origin": "additional_public_sample_tasks",
485
  "metric_key": "macro_f1",
486
  "metric_name": "macro-F1",
487
  "metric_direction": "higher",
@@ -515,7 +515,7 @@
515
  "label": "Long-Horizon Next-Subtask Forecasting",
516
  "axis_label": "14 Long-Horizon Next-Subtask Forecasting",
517
  "short_label": "Long step",
518
- "origin": "additional_public_sample_tasks",
519
  "metric_key": "macro_f1",
520
  "metric_name": "macro-F1",
521
  "metric_direction": "higher",
@@ -549,7 +549,7 @@
549
  "label": "Interaction Text Prediction",
550
  "axis_label": "15 Interaction Text Prediction",
551
  "short_label": "Interact txt",
552
- "origin": "additional_public_sample_tasks",
553
  "metric_key": "macro_f1",
554
  "metric_name": "macro-F1",
555
  "metric_direction": "higher",
@@ -583,7 +583,7 @@
583
  "label": "Action-Object Relation Prediction",
584
  "axis_label": "16 Action-Object Relation Prediction",
585
  "short_label": "Act+obj",
586
- "origin": "additional_public_sample_tasks",
587
  "metric_key": "macro_f1",
588
  "metric_name": "macro-F1",
589
  "metric_direction": "higher",
@@ -617,7 +617,7 @@
617
  "label": "Future Object-Set Forecasting",
618
  "axis_label": "17 Future Object-Set Forecasting",
619
  "short_label": "Future obj",
620
- "origin": "additional_public_sample_tasks",
621
  "metric_key": "micro_f1",
622
  "metric_name": "micro-F1",
623
  "metric_direction": "higher",
@@ -651,7 +651,7 @@
651
  "label": "IMU-to-Hand Pose Reconstruction",
652
  "axis_label": "18 IMU-to-Hand Pose Reconstruction",
653
  "short_label": "IMU->hand",
654
- "origin": "additional_public_sample_tasks",
655
  "metric_key": "mae",
656
  "metric_name": "MAE",
657
  "metric_direction": "lower",
@@ -685,7 +685,7 @@
685
  "label": "Camera-View Synchronization Retrieval",
686
  "axis_label": "19 Camera-View Synchronization Retrieval",
687
  "short_label": "Cam sync",
688
- "origin": "additional_public_sample_tasks",
689
  "metric_key": "mrr",
690
  "metric_name": "MRR",
691
  "metric_direction": "higher",
@@ -719,7 +719,7 @@
719
  "label": "Time-to-Next-Transition Regression",
720
  "axis_label": "20 Time-to-Next-Transition Regression",
721
  "short_label": "Time2bdry",
722
- "origin": "additional_public_sample_tasks",
723
  "metric_key": "mae",
724
  "metric_name": "MAE frames",
725
  "metric_direction": "lower",
 
1
  {
2
  "title": "Single-Episode 20-Task Radar",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-21T15:20:34+00:00",
5
  "description": "Minimal and Neural MLP baselines on the one public sample episode, both scored on all 20 task contracts.",
6
  "task_count": 20,
7
  "method_count": 2,
 
73
  "label": "Action Recognition",
74
  "axis_label": "01 Action Recognition",
75
  "short_label": "Action",
76
+ "provenance_source": "walkthrough_backed_task_contract",
77
  "metric_key": "macro_f1",
78
  "metric_name": "macro-F1",
79
  "metric_direction": "higher",
 
107
  "label": "Procedure Step Recognition",
108
  "axis_label": "02 Procedure Step Recognition",
109
  "short_label": "Step",
110
+ "provenance_source": "walkthrough_backed_task_contract",
111
  "metric_key": "macro_f1",
112
  "metric_name": "macro-F1",
113
  "metric_direction": "higher",
 
141
  "label": "Action Boundary Detection",
142
  "axis_label": "03 Action Boundary Detection",
143
  "short_label": "Boundary",
144
+ "provenance_source": "walkthrough_backed_task_contract",
145
  "metric_key": "macro_f1",
146
  "metric_name": "macro-F1",
147
  "metric_direction": "higher",
 
175
  "label": "Next-Action Prediction",
176
  "axis_label": "04 Next-Action Prediction",
177
  "short_label": "Next act",
178
+ "provenance_source": "walkthrough_backed_task_contract",
179
  "metric_key": "macro_f1",
180
  "metric_name": "macro-F1",
181
  "metric_direction": "higher",
 
209
  "label": "Hand Trajectory Forecasting",
210
  "axis_label": "05 Hand Trajectory Forecasting",
211
  "short_label": "Hand traj",
212
+ "provenance_source": "walkthrough_backed_task_contract",
213
  "metric_key": "mpjpe",
214
  "metric_name": "MPJPE",
215
  "metric_direction": "lower",
 
243
  "label": "Contact State Prediction",
244
  "axis_label": "06 Contact State Prediction",
245
  "short_label": "Contact",
246
+ "provenance_source": "walkthrough_backed_task_contract",
247
  "metric_key": "macro_f1",
248
  "metric_name": "macro-F1",
249
  "metric_direction": "higher",
 
277
  "label": "Object Relevance Prediction",
278
  "axis_label": "07 Object Relevance Prediction",
279
  "short_label": "Objects",
280
+ "provenance_source": "walkthrough_backed_task_contract",
281
  "metric_key": "micro_f1",
282
  "metric_name": "micro-F1",
283
  "metric_direction": "higher",
 
311
  "label": "Language Grounding",
312
  "axis_label": "08 Language Grounding",
313
  "short_label": "Language",
314
+ "provenance_source": "walkthrough_backed_task_contract",
315
  "metric_key": "mrr",
316
  "metric_name": "MRR",
317
  "metric_direction": "higher",
 
345
  "label": "Cross-Modal Retrieval",
346
  "axis_label": "09 Cross-Modal Retrieval",
347
  "short_label": "X-modal",
348
+ "provenance_source": "walkthrough_backed_task_contract",
349
  "metric_key": "mrr",
350
  "metric_name": "MRR",
351
  "metric_direction": "higher",
 
379
  "label": "Cross-Modal Reconstruction",
380
  "axis_label": "10 Cross-Modal Reconstruction",
381
  "short_label": "Recon",
382
+ "provenance_source": "walkthrough_backed_task_contract",
383
  "metric_key": "r2",
384
  "metric_name": "R2",
385
  "metric_direction": "higher",
 
413
  "label": "Temporal Order Verification",
414
  "axis_label": "11 Temporal Order Verification",
415
  "short_label": "Order",
416
+ "provenance_source": "walkthrough_backed_task_contract",
417
  "metric_key": "f1",
418
  "metric_name": "F1",
419
  "metric_direction": "higher",
 
447
  "label": "Multimodal Synchronization Detection",
448
  "axis_label": "12 Multimodal Synchronization Detection",
449
  "short_label": "Sync",
450
+ "provenance_source": "walkthrough_backed_task_contract",
451
  "metric_key": "f1",
452
  "metric_name": "F1",
453
  "metric_direction": "higher",
 
481
  "label": "Long-Horizon Next-Action Forecasting",
482
  "axis_label": "13 Long-Horizon Next-Action Forecasting",
483
  "short_label": "Long act",
484
+ "provenance_source": "historical_result_bundle",
485
  "metric_key": "macro_f1",
486
  "metric_name": "macro-F1",
487
  "metric_direction": "higher",
 
515
  "label": "Long-Horizon Next-Subtask Forecasting",
516
  "axis_label": "14 Long-Horizon Next-Subtask Forecasting",
517
  "short_label": "Long step",
518
+ "provenance_source": "historical_result_bundle",
519
  "metric_key": "macro_f1",
520
  "metric_name": "macro-F1",
521
  "metric_direction": "higher",
 
549
  "label": "Interaction Text Prediction",
550
  "axis_label": "15 Interaction Text Prediction",
551
  "short_label": "Interact txt",
552
+ "provenance_source": "historical_result_bundle",
553
  "metric_key": "macro_f1",
554
  "metric_name": "macro-F1",
555
  "metric_direction": "higher",
 
583
  "label": "Action-Object Relation Prediction",
584
  "axis_label": "16 Action-Object Relation Prediction",
585
  "short_label": "Act+obj",
586
+ "provenance_source": "historical_result_bundle",
587
  "metric_key": "macro_f1",
588
  "metric_name": "macro-F1",
589
  "metric_direction": "higher",
 
617
  "label": "Future Object-Set Forecasting",
618
  "axis_label": "17 Future Object-Set Forecasting",
619
  "short_label": "Future obj",
620
+ "provenance_source": "historical_result_bundle",
621
  "metric_key": "micro_f1",
622
  "metric_name": "micro-F1",
623
  "metric_direction": "higher",
 
651
  "label": "IMU-to-Hand Pose Reconstruction",
652
  "axis_label": "18 IMU-to-Hand Pose Reconstruction",
653
  "short_label": "IMU->hand",
654
+ "provenance_source": "historical_result_bundle",
655
  "metric_key": "mae",
656
  "metric_name": "MAE",
657
  "metric_direction": "lower",
 
685
  "label": "Camera-View Synchronization Retrieval",
686
  "axis_label": "19 Camera-View Synchronization Retrieval",
687
  "short_label": "Cam sync",
688
+ "provenance_source": "historical_result_bundle",
689
  "metric_key": "mrr",
690
  "metric_name": "MRR",
691
  "metric_direction": "higher",
 
719
  "label": "Time-to-Next-Transition Regression",
720
  "axis_label": "20 Time-to-Next-Transition Regression",
721
  "short_label": "Time2bdry",
722
+ "provenance_source": "historical_result_bundle",
723
  "metric_key": "mae",
724
  "metric_name": "MAE frames",
725
  "metric_direction": "lower",
data/source_alignment_audit.json CHANGED
@@ -1,7 +1,7 @@
  {
  "title": "Ropedia Xperience-10M Source Alignment Note",
  "status": "pass",
- "generated_at_utc": "2026-06-21T14:46:49+00:00",
+ "generated_at_utc": "2026-06-21T15:21:55+00:00",
  "alignment_json": "docs/data/xperience10m_dataset_card_alignment.json",
  "alignment_summary": {
  "full_dataset_repo": "ropedia-ai/xperience-10m",
data/task_method_20_gap_audit.json CHANGED
@@ -1,5 +1,5 @@
  {
- "generated_at_utc": "2026-06-21T08:38:20+00:00",
+ "generated_at_utc": "2026-06-21T15:21:42+00:00",
  "immediate_actions": [
  {
  "artifact": "docs/data/task_method_20_gap_audit.json",
data/task_method_20_result_matrix.json CHANGED
@@ -1,7 +1,7 @@
  {
  "title": "Task Method 20-Result Matrix",
  "status": "pass",
- "generated_at_utc": "2026-06-21T10:47:17+00:00",
+ "generated_at_utc": "2026-06-21T15:20:34+00:00",
  "task_count": 20,
  "method_count": 9,
  "method_task_record_count": 180,
data/task_suite_20.json CHANGED
@@ -1,12 +1,12 @@
  {
  "title": "Ropedia Xperience-10M Unified 20-Task Suite",
  "status": "pass",
- "generated_at_utc": "2026-06-21T14:40:33+00:00",
+ "generated_at_utc": "2026-06-21T15:21:12+00:00",
  "task_count": 20,
- "task_count_breakdown": {
- "original_public_sample_tasks": 12,
- "additional_public_sample_tasks": 8,
- "total_unified_tasks": 20
+ "task_count_summary": {
+ "total_unified_tasks": 20,
+ "public_framing": "all 20 task contracts are presented as one suite",
+ "legacy_provenance_rows": 8
  },
  "unification_policy": {
  "public_framing": "The suite is presented as one 20-task benchmark surface. All task contracts share the same window, split, feature, baseline, and leakage-control language.",
@@ -21,7 +21,7 @@
  "window_frames": 20,
  "stride_frames": 5,
  "split_policy": "single_episode_chronological_70_30",
- "raw_hdf5_required_for_tasks_13_20_regeneration": true,
+ "raw_hdf5_required_for_full_public_regeneration": true,
  "raw_data_redistributed": false
  },
  "setup_alignment": {
@@ -47,8 +47,8 @@
  "task_id": "timeline_action",
  "task_display_name": "Action Recognition",
  "research_name": "Egocentric Action Recognition",
- "origin": "original_public_sample_tasks",
- "origin_count_label": "original task",
+ "provenance_source": "walkthrough_backed_task_contract",
+ "origin_count_label": "unified task",
  "family": "supervised",
  "architecture_family": "multiclass classifier",
  "primary_direction": "C. Egocentric Vision & Interaction",
@@ -82,8 +82,8 @@
  "task_id": "timeline_subtask",
  "task_display_name": "Procedure Step Recognition",
  "research_name": "Temporal Subtask Recognition",
- "origin": "original_public_sample_tasks",
- "origin_count_label": "original task",
+ "provenance_source": "walkthrough_backed_task_contract",
+ "origin_count_label": "unified task",
  "family": "supervised",
  "architecture_family": "multiclass classifier",
  "primary_direction": "C. Egocentric Vision & Interaction",
@@ -117,8 +117,8 @@
  "task_id": "transition_detection",
  "task_display_name": "Action Boundary Detection",
  "research_name": "Temporal Action Segmentation",
- "origin": "original_public_sample_tasks",
- "origin_count_label": "original task",
+ "provenance_source": "walkthrough_backed_task_contract",
+ "origin_count_label": "unified task",
  "family": "diagnostic",
  "architecture_family": "binary classifier",
  "primary_direction": "C. Egocentric Vision & Interaction",
@@ -152,8 +152,8 @@
  "task_id": "next_action",
  "task_display_name": "Next-Action Prediction",
  "research_name": "Short-Horizon Intention Prediction",
- "origin": "original_public_sample_tasks",
- "origin_count_label": "original task",
+ "provenance_source": "walkthrough_backed_task_contract",
+ "origin_count_label": "unified task",
  "family": "supervised",
  "architecture_family": "future-label classifier",
  "primary_direction": "C. Egocentric Vision & Interaction",
@@ -187,8 +187,8 @@
  "task_id": "hand_trajectory_forecast",
  "task_display_name": "Hand Trajectory Forecasting",
  "research_name": "3D Hand Motion Forecasting",
- "origin": "original_public_sample_tasks",
- "origin_count_label": "original task",
+ "provenance_source": "walkthrough_backed_task_contract",
+ "origin_count_label": "unified task",
  "family": "forecast",
  "architecture_family": "continuous regressor",
  "primary_direction": "A. Human Modeling & Motion Understanding",
@@ -220,8 +220,8 @@
  "task_id": "contact_prediction",
  "task_display_name": "Contact State Prediction",
  "research_name": "Human-Object Contact Prediction",
- "origin": "original_public_sample_tasks",
- "origin_count_label": "original task",
+ "provenance_source": "walkthrough_backed_task_contract",
+ "origin_count_label": "unified task",
  "family": "supervised",
  "architecture_family": "binary classifier",
  "primary_direction": "A. Human Modeling & Motion Understanding",
@@ -255,8 +255,8 @@
  "task_id": "object_relevance",
  "task_display_name": "Object Relevance Prediction",
  "research_name": "Object-Centric Interaction Recognition",
- "origin": "original_public_sample_tasks",
- "origin_count_label": "original task",
+ "provenance_source": "walkthrough_backed_task_contract",
+ "origin_count_label": "unified task",
  "family": "supervised",
  "architecture_family": "multi-label classifier",
  "primary_direction": "C. Egocentric Vision & Interaction",
@@ -288,8 +288,8 @@
  "task_id": "caption_grounding",
  "task_display_name": "Language Grounding",
  "research_name": "Language-to-Moment Grounding",
- "origin": "original_public_sample_tasks",
- "origin_count_label": "original task",
+ "provenance_source": "walkthrough_backed_task_contract",
+ "origin_count_label": "unified task",
  "family": "retrieval",
  "architecture_family": "retrieval ranker",
  "primary_direction": "C. Egocentric Vision & Interaction",
@@ -321,8 +321,8 @@
  "task_id": "cross_modal_retrieval",
  "task_display_name": "Cross-Modal Retrieval",
  "research_name": "Multimodal Representation Retrieval",
- "origin": "original_public_sample_tasks",
- "origin_count_label": "original task",
+ "provenance_source": "walkthrough_backed_task_contract",
+ "origin_count_label": "unified task",
  "family": "retrieval",
  "architecture_family": "two-tower retrieval head",
  "primary_direction": "D. Scene Reconstruction & World Modeling",
@@ -354,8 +354,8 @@
  "task_id": "modality_reconstruction",
  "task_display_name": "Cross-Modal Reconstruction",
  "research_name": "Modality Feature Reconstruction",
- "origin": "original_public_sample_tasks",
- "origin_count_label": "original task",
+ "provenance_source": "walkthrough_backed_task_contract",
+ "origin_count_label": "unified task",
  "family": "forecast",
  "architecture_family": "feature regressor",
  "primary_direction": "B. 3D/4D Reconstruction & Neural Rendering",
@@ -386,8 +386,8 @@
  "task_id": "temporal_order",
  "task_display_name": "Temporal Order Verification",
  "research_name": "Temporal Order Verification",
- "origin": "original_public_sample_tasks",
- "origin_count_label": "original task",
+ "provenance_source": "walkthrough_backed_task_contract",
+ "origin_count_label": "unified task",
  "family": "diagnostic",
  "architecture_family": "pairwise classifier",
  "primary_direction": "D. Scene Reconstruction & World Modeling",
@@ -419,8 +419,8 @@
  "task_id": "misalignment_detection",
  "task_display_name": "Multimodal Synchronization Detection",
  "research_name": "Cross-Modal Misalignment Detection",
- "origin": "original_public_sample_tasks",
- "origin_count_label": "original task",
+ "provenance_source": "walkthrough_backed_task_contract",
+ "origin_count_label": "unified task",
  "family": "diagnostic",
  "architecture_family": "pairwise classifier",
  "primary_direction": "B. 3D/4D Reconstruction & Neural Rendering",
@@ -452,8 +452,8 @@
  "task_id": "long_horizon_next_action",
  "task_display_name": "Long-Horizon Next-Action Forecasting",
  "research_name": "Long-Horizon Next-Action Forecasting",
- "origin": "additional_public_sample_tasks",
- "origin_count_label": "additional task",
+ "provenance_source": "historical_result_bundle",
+ "origin_count_label": "unified task",
  "family": "classification",
  "architecture_family": "minimal_softmax",
  "primary_direction": "sample-supported extension",
@@ -487,8 +487,8 @@
  "task_id": "next_subtask_forecast",
  "task_display_name": "Long-Horizon Next-Subtask Forecasting",
  "research_name": "Long-Horizon Next-Subtask Forecasting",
- "origin": "additional_public_sample_tasks",
- "origin_count_label": "additional task",
+ "provenance_source": "historical_result_bundle",
+ "origin_count_label": "unified task",
  "family": "classification",
  "architecture_family": "minimal_softmax",
  "primary_direction": "sample-supported extension",
@@ -522,8 +522,8 @@
  "task_id": "interaction_text_prediction",
  "task_display_name": "Interaction Text Prediction",
  "research_name": "Interaction Text Prediction",
- "origin": "additional_public_sample_tasks",
- "origin_count_label": "additional task",
+ "provenance_source": "historical_result_bundle",
+ "origin_count_label": "unified task",
  "family": "classification",
  "architecture_family": "minimal_softmax",
  "primary_direction": "sample-supported extension",
@@ -557,8 +557,8 @@
  "task_id": "action_object_relation",
  "task_display_name": "Action-Object Relation Prediction",
  "research_name": "Action-Object Relation Prediction",
- "origin": "additional_public_sample_tasks",
- "origin_count_label": "additional task",
+ "provenance_source": "historical_result_bundle",
+ "origin_count_label": "unified task",
  "family": "classification",
  "architecture_family": "minimal_softmax",
  "primary_direction": "sample-supported extension",
@@ -592,8 +592,8 @@
  "task_id": "object_set_forecast",
  "task_display_name": "Future Object-Set Forecasting",
  "research_name": "Future Object-Set Forecasting",
- "origin": "additional_public_sample_tasks",
- "origin_count_label": "additional task",
+ "provenance_source": "historical_result_bundle",
+ "origin_count_label": "unified task",
  "family": "multi_label",
  "architecture_family": "minimal_ridge_multilabel",
  "primary_direction": "sample-supported extension",
@@ -625,8 +625,8 @@
  "task_id": "imu_to_hand_pose",
  "task_display_name": "IMU-to-Hand Pose Reconstruction",
  "research_name": "IMU-to-Hand Pose Reconstruction",
- "origin": "additional_public_sample_tasks",
- "origin_count_label": "additional task",
+ "provenance_source": "historical_result_bundle",
+ "origin_count_label": "unified task",
  "family": "regression",
  "architecture_family": "minimal_ridge_regression",
  "primary_direction": "sample-supported extension",
@@ -658,8 +658,8 @@
  "task_id": "camera_view_sync_retrieval",
  "task_display_name": "Camera-View Synchronization Retrieval",
  "research_name": "Camera-View Synchronization Retrieval",
- "origin": "additional_public_sample_tasks",
- "origin_count_label": "additional task",
+ "provenance_source": "historical_result_bundle",
+ "origin_count_label": "unified task",
  "family": "retrieval",
  "architecture_family": "minimal_ridge_projection_cosine_retrieval",
  "primary_direction": "sample-supported extension",
@@ -690,8 +690,8 @@
  "task_id": "time_to_transition",
  "task_display_name": "Time-to-Next-Transition Regression",
  "research_name": "Time-to-Next-Transition Regression",
- "origin": "additional_public_sample_tasks",
- "origin_count_label": "additional task",
+ "provenance_source": "historical_result_bundle",
+ "origin_count_label": "unified task",
  "family": "regression",
  "architecture_family": "minimal_ridge_regression",
  "primary_direction": "sample-supported extension",
data/task_surface_integrity.json CHANGED
@@ -1,6 +1,6 @@
  {
  "status": "pass",
- "generated_at_utc": "2026-06-21T14:45:00+00:00",
+ "generated_at_utc": "2026-06-21T15:21:55+00:00",
  "summary": {
  "original_walkthrough_task_count": 12,
  "expected_original_walkthrough_task_count": 12,
data/tier2_task_suite.json CHANGED
@@ -2,13 +2,12 @@
2
  "title": "Ropedia Xperience-10M Unified 20-Task Provenance Bundle",
3
  "status": "pass",
4
  "generated_at_utc": "2026-06-16T06:25:58+00:00",
5
- "suite_position": "tasks_13_to_20",
6
  "legacy_path_note": "The tier2_task_suite file and directory names are retained for stable public links; this bundle is provenance inside the unified 20-task suite, not a separate public tier.",
7
- "integrated_with_tasks_1_to_12": {
8
- "tasks_1_to_12_count": 12,
9
- "additional_task_count": 8,
10
- "combined_task_count": 20,
11
- "tasks_1_to_12_metrics": "docs/data/summary_metrics.json",
12
  "unified_protocol": "docs/data/evaluation_protocol.json"
13
  },
14
  "dataset_scope": {
@@ -28,9 +27,9 @@
28
  "raw_data_redistributed": false
29
  },
30
  "setup_alignment": {
31
- "same_window_unit_as_tasks_1_to_12": true,
32
- "same_feature_manifest_as_tasks_1_to_12": "results/episode_task_suite/feature_manifest.json",
33
- "same_shared_tensor_as_tasks_1_to_12": "results/episode_task_suite/shared_windows.npz",
34
  "minimal_baselines": "softmax, ridge regression/projection, and ridge multilabel heads",
35
  "neural_baselines": "compact one-hidden-layer/two-layer PyTorch MLP heads with the same chronological split",
36
  "leakage_policy": "Caption-derived text features are removed whenever the target is a label, object, relation, interaction phrase, or future semantic state."
@@ -135,7 +134,7 @@
135
  "status": "pass",
136
  "task": "long_horizon_next_action",
137
  "task_display_name": "Long-Horizon Next-Action Forecasting",
138
- "suite_position": "tasks_13_to_20",
139
  "model_family": "minimal_softmax",
140
  "input": "Current 20-frame non-caption multimodal window.",
141
  "split": "single_episode_chronological",
@@ -221,7 +220,7 @@
221
  "status": "pass",
222
  "task": "long_horizon_next_action",
223
  "task_display_name": "Long-Horizon Next-Action Forecasting",
224
- "suite_position": "tasks_13_to_20",
225
  "model_family": "neural_mlp",
226
  "input": "Current 20-frame non-caption multimodal window.",
227
  "split": "single_episode_chronological",
@@ -276,7 +275,7 @@
276
  "status": "pass",
277
  "task": "next_subtask_forecast",
278
  "task_display_name": "Long-Horizon Next-Subtask Forecasting",
279
- "suite_position": "tasks_13_to_20",
280
  "model_family": "minimal_softmax",
281
  "input": "Current 20-frame non-caption multimodal window.",
282
  "split": "single_episode_chronological",
@@ -361,7 +360,7 @@
361
  "status": "pass",
362
  "task": "next_subtask_forecast",
363
  "task_display_name": "Long-Horizon Next-Subtask Forecasting",
364
- "suite_position": "tasks_13_to_20",
365
  "model_family": "neural_mlp",
366
  "input": "Current 20-frame non-caption multimodal window.",
367
  "split": "single_episode_chronological",
@@ -416,7 +415,7 @@
416
  "status": "pass",
417
  "task": "interaction_text_prediction",
418
  "task_display_name": "Interaction Text Prediction",
419
- "suite_position": "tasks_13_to_20",
420
  "model_family": "minimal_softmax",
421
  "input": "Current 20-frame sensor window with caption-text features removed.",
422
  "split": "single_episode_chronological",
@@ -512,7 +511,7 @@
512
  "status": "pass",
513
  "task": "interaction_text_prediction",
514
  "task_display_name": "Interaction Text Prediction",
515
- "suite_position": "tasks_13_to_20",
516
  "model_family": "neural_mlp",
517
  "input": "Current 20-frame sensor window with caption-text features removed.",
518
  "split": "single_episode_chronological",
@@ -567,7 +566,7 @@
567
  "status": "pass",
568
  "task": "action_object_relation",
569
  "task_display_name": "Action-Object Relation Prediction",
570
- "suite_position": "tasks_13_to_20",
571
  "model_family": "minimal_softmax",
572
  "input": "Current 20-frame sensor window with caption-text features removed.",
573
  "split": "single_episode_chronological",
@@ -659,7 +658,7 @@
659
  "status": "pass",
660
  "task": "action_object_relation",
661
  "task_display_name": "Action-Object Relation Prediction",
662
- "suite_position": "tasks_13_to_20",
663
  "model_family": "neural_mlp",
664
  "input": "Current 20-frame sensor window with caption-text features removed.",
665
  "split": "single_episode_chronological",
@@ -713,7 +712,7 @@
713
  "status": "pass",
714
  "task": "object_set_forecast",
715
  "task_display_name": "Future Object-Set Forecasting",
716
- "suite_position": "tasks_13_to_20",
717
  "model_family": "minimal_ridge_multilabel",
718
  "input": "Current 20-frame sensor window with caption-text features removed.",
719
  "split": "single_episode_chronological",
@@ -747,7 +746,7 @@
747
  "status": "pass",
748
  "task": "object_set_forecast",
749
  "task_display_name": "Future Object-Set Forecasting",
750
- "suite_position": "tasks_13_to_20",
751
  "model_family": "neural_mlp_multilabel",
752
  "input": "Current 20-frame sensor window with caption-text features removed.",
753
  "split": "single_episode_chronological",
@@ -795,7 +794,7 @@
795
  "status": "pass",
796
  "task": "imu_to_hand_pose",
797
  "task_display_name": "IMU-to-Hand Pose Reconstruction",
798
- "suite_position": "tasks_13_to_20",
799
  "model_family": "minimal_ridge_regression",
800
  "input": "Current IMU acceleration/gyroscope feature block only.",
801
  "split": "single_episode_chronological",
@@ -814,7 +813,7 @@
814
  "status": "pass",
815
  "task": "imu_to_hand_pose",
816
  "task_display_name": "IMU-to-Hand Pose Reconstruction",
817
- "suite_position": "tasks_13_to_20",
818
  "model_family": "neural_mlp_regression",
819
  "input": "Current IMU acceleration/gyroscope feature block only.",
820
  "split": "single_episode_chronological",
@@ -864,7 +863,7 @@
864
  "status": "pass",
865
  "task": "camera_view_sync_retrieval",
866
  "task_display_name": "Camera-View Synchronization Retrieval",
867
- "suite_position": "tasks_13_to_20",
868
  "model_family": "minimal_ridge_projection_cosine_retrieval",
869
  "input": "Fisheye camera-1 feature query projected into fisheye camera-3 feature space.",
870
  "split": "single_episode_chronological",
@@ -885,7 +884,7 @@
885
  "status": "pass",
886
  "task": "camera_view_sync_retrieval",
887
  "task_display_name": "Camera-View Synchronization Retrieval",
888
- "suite_position": "tasks_13_to_20",
889
  "model_family": "neural_mlp_projection_cosine_retrieval",
890
  "input": "Fisheye camera-1 feature query projected into fisheye camera-3 feature space.",
891
  "split": "single_episode_chronological",
@@ -934,7 +933,7 @@
934
  "status": "pass",
935
  "task": "time_to_transition",
936
  "task_display_name": "Time-to-Next-Transition Regression",
937
- "suite_position": "tasks_13_to_20",
938
  "model_family": "minimal_ridge_regression",
939
  "input": "Current 20-frame non-caption multimodal window.",
940
  "split": "single_episode_chronological",
@@ -954,7 +953,7 @@
954
  "status": "pass",
955
  "task": "time_to_transition",
956
  "task_display_name": "Time-to-Next-Transition Regression",
957
- "suite_position": "tasks_13_to_20",
958
  "model_family": "neural_mlp_regression",
959
  "input": "Current 20-frame non-caption multimodal window.",
960
  "split": "single_episode_chronological",
 
2
  "title": "Ropedia Xperience-10M Unified 20-Task Provenance Bundle",
3
  "status": "pass",
4
  "generated_at_utc": "2026-06-16T06:25:58+00:00",
5
+ "suite_position": "unified_20_task_provenance",
6
  "legacy_path_note": "The tier2_task_suite file and directory names are retained for stable public links; this bundle is provenance inside the unified 20-task suite, not a separate public tier.",
7
+ "unified_task_integration": {
8
+ "total_task_count": 20,
9
+ "legacy_provenance_row_count": 8,
10
+ "shared_metrics": "docs/data/summary_metrics.json",
 
11
  "unified_protocol": "docs/data/evaluation_protocol.json"
12
  },
13
  "dataset_scope": {
 
27
  "raw_data_redistributed": false
28
  },
29
  "setup_alignment": {
30
+ "same_window_unit_as_unified_suite": true,
31
+ "same_feature_manifest_as_unified_suite": "results/episode_task_suite/feature_manifest.json",
32
+ "same_shared_tensor_as_unified_suite": "results/episode_task_suite/shared_windows.npz",
33
  "minimal_baselines": "softmax, ridge regression/projection, and ridge multilabel heads",
34
  "neural_baselines": "compact one-hidden-layer/two-layer PyTorch MLP heads with the same chronological split",
35
  "leakage_policy": "Caption-derived text features are removed whenever the target is a label, object, relation, interaction phrase, or future semantic state."
 
134
  "status": "pass",
135
  "task": "long_horizon_next_action",
136
  "task_display_name": "Long-Horizon Next-Action Forecasting",
137
+ "suite_position": "unified_20_task_provenance",
138
  "model_family": "minimal_softmax",
139
  "input": "Current 20-frame non-caption multimodal window.",
140
  "split": "single_episode_chronological",
 
220
  "status": "pass",
221
  "task": "long_horizon_next_action",
222
  "task_display_name": "Long-Horizon Next-Action Forecasting",
223
+ "suite_position": "unified_20_task_provenance",
224
  "model_family": "neural_mlp",
225
  "input": "Current 20-frame non-caption multimodal window.",
226
  "split": "single_episode_chronological",
 
275
  "status": "pass",
276
  "task": "next_subtask_forecast",
277
  "task_display_name": "Long-Horizon Next-Subtask Forecasting",
278
+ "suite_position": "unified_20_task_provenance",
279
  "model_family": "minimal_softmax",
280
  "input": "Current 20-frame non-caption multimodal window.",
281
  "split": "single_episode_chronological",
 
360
  "status": "pass",
361
  "task": "next_subtask_forecast",
362
  "task_display_name": "Long-Horizon Next-Subtask Forecasting",
363
+ "suite_position": "unified_20_task_provenance",
364
  "model_family": "neural_mlp",
365
  "input": "Current 20-frame non-caption multimodal window.",
366
  "split": "single_episode_chronological",
 
415
  "status": "pass",
416
  "task": "interaction_text_prediction",
417
  "task_display_name": "Interaction Text Prediction",
418
+ "suite_position": "unified_20_task_provenance",
419
  "model_family": "minimal_softmax",
420
  "input": "Current 20-frame sensor window with caption-text features removed.",
421
  "split": "single_episode_chronological",
 
511
  "status": "pass",
512
  "task": "interaction_text_prediction",
513
  "task_display_name": "Interaction Text Prediction",
514
+ "suite_position": "unified_20_task_provenance",
515
  "model_family": "neural_mlp",
516
  "input": "Current 20-frame sensor window with caption-text features removed.",
517
  "split": "single_episode_chronological",
 
566
  "status": "pass",
567
  "task": "action_object_relation",
568
  "task_display_name": "Action-Object Relation Prediction",
569
+ "suite_position": "unified_20_task_provenance",
570
  "model_family": "minimal_softmax",
571
  "input": "Current 20-frame sensor window with caption-text features removed.",
572
  "split": "single_episode_chronological",
 
658
  "status": "pass",
659
  "task": "action_object_relation",
660
  "task_display_name": "Action-Object Relation Prediction",
661
+ "suite_position": "unified_20_task_provenance",
662
  "model_family": "neural_mlp",
663
  "input": "Current 20-frame sensor window with caption-text features removed.",
664
  "split": "single_episode_chronological",
 
712
  "status": "pass",
713
  "task": "object_set_forecast",
714
  "task_display_name": "Future Object-Set Forecasting",
715
+ "suite_position": "unified_20_task_provenance",
716
  "model_family": "minimal_ridge_multilabel",
717
  "input": "Current 20-frame sensor window with caption-text features removed.",
718
  "split": "single_episode_chronological",
 
746
  "status": "pass",
747
  "task": "object_set_forecast",
748
  "task_display_name": "Future Object-Set Forecasting",
749
+ "suite_position": "unified_20_task_provenance",
750
  "model_family": "neural_mlp_multilabel",
751
  "input": "Current 20-frame sensor window with caption-text features removed.",
752
  "split": "single_episode_chronological",
 
794
  "status": "pass",
795
  "task": "imu_to_hand_pose",
796
  "task_display_name": "IMU-to-Hand Pose Reconstruction",
797
+ "suite_position": "unified_20_task_provenance",
798
  "model_family": "minimal_ridge_regression",
799
  "input": "Current IMU acceleration/gyroscope feature block only.",
800
  "split": "single_episode_chronological",
 
813
  "status": "pass",
814
  "task": "imu_to_hand_pose",
815
  "task_display_name": "IMU-to-Hand Pose Reconstruction",
816
+ "suite_position": "unified_20_task_provenance",
817
  "model_family": "neural_mlp_regression",
818
  "input": "Current IMU acceleration/gyroscope feature block only.",
819
  "split": "single_episode_chronological",
 
863
  "status": "pass",
864
  "task": "camera_view_sync_retrieval",
865
  "task_display_name": "Camera-View Synchronization Retrieval",
866
+ "suite_position": "unified_20_task_provenance",
867
  "model_family": "minimal_ridge_projection_cosine_retrieval",
868
  "input": "Fisheye camera-1 feature query projected into fisheye camera-3 feature space.",
869
  "split": "single_episode_chronological",
 
884
  "status": "pass",
885
  "task": "camera_view_sync_retrieval",
886
  "task_display_name": "Camera-View Synchronization Retrieval",
887
+ "suite_position": "unified_20_task_provenance",
888
  "model_family": "neural_mlp_projection_cosine_retrieval",
889
  "input": "Fisheye camera-1 feature query projected into fisheye camera-3 feature space.",
890
  "split": "single_episode_chronological",
 
933
  "status": "pass",
934
  "task": "time_to_transition",
935
  "task_display_name": "Time-to-Next-Transition Regression",
936
+ "suite_position": "unified_20_task_provenance",
937
  "model_family": "minimal_ridge_regression",
938
  "input": "Current 20-frame non-caption multimodal window.",
939
  "split": "single_episode_chronological",
 
953
  "status": "pass",
954
  "task": "time_to_transition",
955
  "task_display_name": "Time-to-Next-Transition Regression",
956
+ "suite_position": "unified_20_task_provenance",
957
  "model_family": "neural_mlp_regression",
958
  "input": "Current 20-frame non-caption multimodal window.",
959
  "split": "single_episode_chronological",
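
Reviewer note: the hunks above replace `"suite_position": "tasks_13_to_20"` with `"unified_20_task_provenance"` on every task record shown. A minimal sketch of how that migration could be checked is given below. It assumes only that a task record is any JSON object carrying both a `task` and a `model_family` key, as in the rows in this diff; the `records` container key and the helper names are hypothetical and not taken from the repository.

```python
EXPECTED = "unified_20_task_provenance"   # value added in this commit
LEGACY = "tasks_13_to_20"                 # value removed in this commit


def iter_task_records(node):
    """Yield every dict that has both a "task" and a "model_family" key."""
    if isinstance(node, dict):
        if "task" in node and "model_family" in node:
            yield node
        for value in node.values():
            yield from iter_task_records(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_task_records(item)


def stale_records(bundle):
    """Return (task, model_family) pairs whose suite_position was not migrated."""
    return [
        (record["task"], record["model_family"])
        for record in iter_task_records(bundle)
        if record.get("suite_position") != EXPECTED
    ]


# Tiny inline stand-in for the bundle; "records" is a hypothetical key.
sample = {
    "suite_position": EXPECTED,
    "records": [
        {"task": "time_to_transition",
         "model_family": "minimal_ridge_regression",
         "suite_position": EXPECTED},
        {"task": "time_to_transition",
         "model_family": "neural_mlp_regression",
         "suite_position": LEGACY},
    ],
}

print(stale_records(sample))
```

Because the walker recurses through the whole document, it does not depend on where the records are nested. To run it against the real file, the result of `json.load` on `data/tier2_task_suite.json` could be passed to `stale_records`, which should then return an empty list if the migration is complete.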
data/unified_task_model_radar.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Unified 20-Task Model Radar",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-21T10:47:17+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
@@ -235,7 +235,7 @@
235
  "label": "Action Recognition",
236
  "axis_label": "01 Action Recognition",
237
  "short_label": "Action",
238
- "origin": "original_public_sample_tasks",
239
  "metric_key": "macro_f1",
240
  "metric_name": "macro-F1",
241
  "metric_direction": "higher",
@@ -346,7 +346,7 @@
346
  "label": "Procedure Step Recognition",
347
  "axis_label": "02 Procedure Step Recognition",
348
  "short_label": "Step",
349
- "origin": "original_public_sample_tasks",
350
  "metric_key": "macro_f1",
351
  "metric_name": "macro-F1",
352
  "metric_direction": "higher",
@@ -457,7 +457,7 @@
457
  "label": "Action Boundary Detection",
458
  "axis_label": "03 Action Boundary Detection",
459
  "short_label": "Boundary",
460
- "origin": "original_public_sample_tasks",
461
  "metric_key": "macro_f1",
462
  "metric_name": "macro-F1",
463
  "metric_direction": "higher",
@@ -568,7 +568,7 @@
568
  "label": "Next-Action Prediction",
569
  "axis_label": "04 Next-Action Prediction",
570
  "short_label": "Next act",
571
- "origin": "original_public_sample_tasks",
572
  "metric_key": "macro_f1",
573
  "metric_name": "macro-F1",
574
  "metric_direction": "higher",
@@ -679,7 +679,7 @@
679
  "label": "Hand Trajectory Forecasting",
680
  "axis_label": "05 Hand Trajectory Forecasting",
681
  "short_label": "Hand traj",
682
- "origin": "original_public_sample_tasks",
683
  "metric_key": "mpjpe",
684
  "metric_name": "MPJPE",
685
  "metric_direction": "lower",
@@ -790,7 +790,7 @@
790
  "label": "Contact State Prediction",
791
  "axis_label": "06 Contact State Prediction",
792
  "short_label": "Contact",
793
- "origin": "original_public_sample_tasks",
794
  "metric_key": "macro_f1",
795
  "metric_name": "macro-F1",
796
  "metric_direction": "higher",
@@ -901,7 +901,7 @@
901
  "label": "Object Relevance Prediction",
902
  "axis_label": "07 Object Relevance Prediction",
903
  "short_label": "Objects",
904
- "origin": "original_public_sample_tasks",
905
  "metric_key": "micro_f1",
906
  "metric_name": "micro-F1",
907
  "metric_direction": "higher",
@@ -1012,7 +1012,7 @@
1012
  "label": "Language Grounding",
1013
  "axis_label": "08 Language Grounding",
1014
  "short_label": "Language",
1015
- "origin": "original_public_sample_tasks",
1016
  "metric_key": "mrr",
1017
  "metric_name": "MRR",
1018
  "metric_direction": "higher",
@@ -1123,7 +1123,7 @@
1123
  "label": "Cross-Modal Retrieval",
1124
  "axis_label": "09 Cross-Modal Retrieval",
1125
  "short_label": "X-modal",
1126
- "origin": "original_public_sample_tasks",
1127
  "metric_key": "mrr",
1128
  "metric_name": "MRR",
1129
  "metric_direction": "higher",
@@ -1234,7 +1234,7 @@
1234
  "label": "Cross-Modal Reconstruction",
1235
  "axis_label": "10 Cross-Modal Reconstruction",
1236
  "short_label": "Recon",
1237
- "origin": "original_public_sample_tasks",
1238
  "metric_key": "r2",
1239
  "metric_name": "R2",
1240
  "metric_direction": "higher",
@@ -1345,7 +1345,7 @@
1345
  "label": "Temporal Order Verification",
1346
  "axis_label": "11 Temporal Order Verification",
1347
  "short_label": "Order",
1348
- "origin": "original_public_sample_tasks",
1349
  "metric_key": "f1",
1350
  "metric_name": "F1",
1351
  "metric_direction": "higher",
@@ -1456,7 +1456,7 @@
1456
  "label": "Multimodal Synchronization Detection",
1457
  "axis_label": "12 Multimodal Synchronization Detection",
1458
  "short_label": "Sync",
1459
- "origin": "original_public_sample_tasks",
1460
  "metric_key": "f1",
1461
  "metric_name": "F1",
1462
  "metric_direction": "higher",
@@ -1567,7 +1567,7 @@
1567
  "label": "Long-Horizon Next-Action Forecasting",
1568
  "axis_label": "13 Long-Horizon Next-Action Forecasting",
1569
  "short_label": "Long act",
1570
- "origin": "additional_public_sample_tasks",
1571
  "metric_key": "macro_f1",
1572
  "metric_name": "macro-F1",
1573
  "metric_direction": "higher",
@@ -1678,7 +1678,7 @@
1678
  "label": "Long-Horizon Next-Subtask Forecasting",
1679
  "axis_label": "14 Long-Horizon Next-Subtask Forecasting",
1680
  "short_label": "Long step",
1681
- "origin": "additional_public_sample_tasks",
1682
  "metric_key": "macro_f1",
1683
  "metric_name": "macro-F1",
1684
  "metric_direction": "higher",
@@ -1789,7 +1789,7 @@
1789
  "label": "Interaction Text Prediction",
1790
  "axis_label": "15 Interaction Text Prediction",
1791
  "short_label": "Interact txt",
1792
- "origin": "additional_public_sample_tasks",
1793
  "metric_key": "macro_f1",
1794
  "metric_name": "macro-F1",
1795
  "metric_direction": "higher",
@@ -1900,7 +1900,7 @@
1900
  "label": "Action-Object Relation Prediction",
1901
  "axis_label": "16 Action-Object Relation Prediction",
1902
  "short_label": "Act+obj",
1903
- "origin": "additional_public_sample_tasks",
1904
  "metric_key": "macro_f1",
1905
  "metric_name": "macro-F1",
1906
  "metric_direction": "higher",
@@ -2011,7 +2011,7 @@
2011
  "label": "Future Object-Set Forecasting",
2012
  "axis_label": "17 Future Object-Set Forecasting",
2013
  "short_label": "Future obj",
2014
- "origin": "additional_public_sample_tasks",
2015
  "metric_key": "micro_f1",
2016
  "metric_name": "micro-F1",
2017
  "metric_direction": "higher",
@@ -2122,7 +2122,7 @@
2122
  "label": "IMU-to-Hand Pose Reconstruction",
2123
  "axis_label": "18 IMU-to-Hand Pose Reconstruction",
2124
  "short_label": "IMU->hand",
2125
- "origin": "additional_public_sample_tasks",
2126
  "metric_key": "mae",
2127
  "metric_name": "MAE",
2128
  "metric_direction": "lower",
@@ -2233,7 +2233,7 @@
2233
  "label": "Camera-View Synchronization Retrieval",
2234
  "axis_label": "19 Camera-View Synchronization Retrieval",
2235
  "short_label": "Cam sync",
2236
- "origin": "additional_public_sample_tasks",
2237
  "metric_key": "mrr",
2238
  "metric_name": "MRR",
2239
  "metric_direction": "higher",
@@ -2344,7 +2344,7 @@
2344
  "label": "Time-to-Next-Transition Regression",
2345
  "axis_label": "20 Time-to-Next-Transition Regression",
2346
  "short_label": "Time2bdry",
2347
- "origin": "additional_public_sample_tasks",
2348
  "metric_key": "mae",
2349
  "metric_name": "MAE frames",
2350
  "metric_direction": "lower",
 
1
  {
2
  "title": "Unified 20-Task Model Radar",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-21T15:20:34+00:00",
5
  "task_count": 20,
6
  "method_count": 9,
7
  "method_task_record_count": 180,
 
235
  "label": "Action Recognition",
236
  "axis_label": "01 Action Recognition",
237
  "short_label": "Action",
238
+ "provenance_source": "walkthrough_backed_task_contract",
239
  "metric_key": "macro_f1",
240
  "metric_name": "macro-F1",
241
  "metric_direction": "higher",
 
346
  "label": "Procedure Step Recognition",
347
  "axis_label": "02 Procedure Step Recognition",
348
  "short_label": "Step",
349
+ "provenance_source": "walkthrough_backed_task_contract",
350
  "metric_key": "macro_f1",
351
  "metric_name": "macro-F1",
352
  "metric_direction": "higher",
 
457
  "label": "Action Boundary Detection",
458
  "axis_label": "03 Action Boundary Detection",
459
  "short_label": "Boundary",
460
+ "provenance_source": "walkthrough_backed_task_contract",
461
  "metric_key": "macro_f1",
462
  "metric_name": "macro-F1",
463
  "metric_direction": "higher",
 
568
  "label": "Next-Action Prediction",
569
  "axis_label": "04 Next-Action Prediction",
570
  "short_label": "Next act",
571
+ "provenance_source": "walkthrough_backed_task_contract",
572
  "metric_key": "macro_f1",
573
  "metric_name": "macro-F1",
574
  "metric_direction": "higher",
 
679
  "label": "Hand Trajectory Forecasting",
680
  "axis_label": "05 Hand Trajectory Forecasting",
681
  "short_label": "Hand traj",
682
+ "provenance_source": "walkthrough_backed_task_contract",
683
  "metric_key": "mpjpe",
684
  "metric_name": "MPJPE",
685
  "metric_direction": "lower",
 
790
  "label": "Contact State Prediction",
791
  "axis_label": "06 Contact State Prediction",
792
  "short_label": "Contact",
793
+ "provenance_source": "walkthrough_backed_task_contract",
794
  "metric_key": "macro_f1",
795
  "metric_name": "macro-F1",
796
  "metric_direction": "higher",
 
901
  "label": "Object Relevance Prediction",
902
  "axis_label": "07 Object Relevance Prediction",
903
  "short_label": "Objects",
904
+ "provenance_source": "walkthrough_backed_task_contract",
905
  "metric_key": "micro_f1",
906
  "metric_name": "micro-F1",
907
  "metric_direction": "higher",
 
1012
  "label": "Language Grounding",
1013
  "axis_label": "08 Language Grounding",
1014
  "short_label": "Language",
1015
+ "provenance_source": "walkthrough_backed_task_contract",
1016
  "metric_key": "mrr",
1017
  "metric_name": "MRR",
1018
  "metric_direction": "higher",
 
1123
  "label": "Cross-Modal Retrieval",
1124
  "axis_label": "09 Cross-Modal Retrieval",
1125
  "short_label": "X-modal",
1126
+ "provenance_source": "walkthrough_backed_task_contract",
1127
  "metric_key": "mrr",
1128
  "metric_name": "MRR",
1129
  "metric_direction": "higher",
 
1234
  "label": "Cross-Modal Reconstruction",
1235
  "axis_label": "10 Cross-Modal Reconstruction",
1236
  "short_label": "Recon",
1237
+ "provenance_source": "walkthrough_backed_task_contract",
1238
  "metric_key": "r2",
1239
  "metric_name": "R2",
1240
  "metric_direction": "higher",
 
1345
  "label": "Temporal Order Verification",
1346
  "axis_label": "11 Temporal Order Verification",
1347
  "short_label": "Order",
1348
+ "provenance_source": "walkthrough_backed_task_contract",
1349
  "metric_key": "f1",
1350
  "metric_name": "F1",
1351
  "metric_direction": "higher",
 
1456
  "label": "Multimodal Synchronization Detection",
1457
  "axis_label": "12 Multimodal Synchronization Detection",
1458
  "short_label": "Sync",
1459
+ "provenance_source": "walkthrough_backed_task_contract",
1460
  "metric_key": "f1",
1461
  "metric_name": "F1",
1462
  "metric_direction": "higher",
 
1567
  "label": "Long-Horizon Next-Action Forecasting",
1568
  "axis_label": "13 Long-Horizon Next-Action Forecasting",
1569
  "short_label": "Long act",
1570
+ "provenance_source": "historical_result_bundle",
1571
  "metric_key": "macro_f1",
1572
  "metric_name": "macro-F1",
1573
  "metric_direction": "higher",
 
1678
  "label": "Long-Horizon Next-Subtask Forecasting",
1679
  "axis_label": "14 Long-Horizon Next-Subtask Forecasting",
1680
  "short_label": "Long step",
1681
+ "provenance_source": "historical_result_bundle",
1682
  "metric_key": "macro_f1",
1683
  "metric_name": "macro-F1",
1684
  "metric_direction": "higher",
 
1789
  "label": "Interaction Text Prediction",
1790
  "axis_label": "15 Interaction Text Prediction",
1791
  "short_label": "Interact txt",
1792
+ "provenance_source": "historical_result_bundle",
1793
  "metric_key": "macro_f1",
1794
  "metric_name": "macro-F1",
1795
  "metric_direction": "higher",
 
1900
  "label": "Action-Object Relation Prediction",
1901
  "axis_label": "16 Action-Object Relation Prediction",
1902
  "short_label": "Act+obj",
1903
+ "provenance_source": "historical_result_bundle",
1904
  "metric_key": "macro_f1",
1905
  "metric_name": "macro-F1",
1906
  "metric_direction": "higher",
 
2011
  "label": "Future Object-Set Forecasting",
2012
  "axis_label": "17 Future Object-Set Forecasting",
2013
  "short_label": "Future obj",
2014
+ "provenance_source": "historical_result_bundle",
2015
  "metric_key": "micro_f1",
2016
  "metric_name": "micro-F1",
2017
  "metric_direction": "higher",
 
2122
  "label": "IMU-to-Hand Pose Reconstruction",
2123
  "axis_label": "18 IMU-to-Hand Pose Reconstruction",
2124
  "short_label": "IMU->hand",
2125
+ "provenance_source": "historical_result_bundle",
2126
  "metric_key": "mae",
2127
  "metric_name": "MAE",
2128
  "metric_direction": "lower",
 
2233
  "label": "Camera-View Synchronization Retrieval",
2234
  "axis_label": "19 Camera-View Synchronization Retrieval",
2235
  "short_label": "Cam sync",
2236
+ "provenance_source": "historical_result_bundle",
2237
  "metric_key": "mrr",
2238
  "metric_name": "MRR",
2239
  "metric_direction": "higher",
 
2344
  "label": "Time-to-Next-Transition Regression",
2345
  "axis_label": "20 Time-to-Next-Transition Regression",
2346
  "short_label": "Time2bdry",
2347
+ "provenance_source": "historical_result_bundle",
2348
  "metric_key": "mae",
2349
  "metric_name": "MAE frames",
2350
  "metric_direction": "lower",
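
Reviewer note: in the hunks above, the `origin` key is replaced by `provenance_source`, with tasks 01 to 12 tagged `walkthrough_backed_task_contract` and tasks 13 to 20 tagged `historical_result_bundle`, and the header reports 20 tasks, 9 methods, and 180 method-task records (20 × 9 = 180). The sketch below shows one way these invariants could be checked. The function names and the shape of the `tasks` list are assumptions for illustration; the real file may nest the task axes under a different key.

```python
WALKTHROUGH = "walkthrough_backed_task_contract"
HISTORICAL = "historical_result_bundle"


def expected_provenance(axis_label):
    """Map an axis label such as "13 Long-Horizon ..." to its expected tag."""
    number = int(axis_label.split(" ", 1)[0])
    return WALKTHROUGH if number <= 12 else HISTORICAL


def check_radar(header, tasks):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    expected_records = header["task_count"] * header["method_count"]
    if expected_records != header["method_task_record_count"]:
        problems.append("method_task_record_count != task_count * method_count")
    if len(tasks) != header["task_count"]:
        problems.append("task list length != task_count")
    for task in tasks:
        label = task["axis_label"]
        if "origin" in task:
            problems.append(f"{label}: legacy 'origin' key still present")
        if task.get("provenance_source") != expected_provenance(label):
            problems.append(f"{label}: unexpected provenance_source")
    return problems


# Header values copied from the diff; task rows are placeholders.
header = {"task_count": 20, "method_count": 9, "method_task_record_count": 180}
tasks = [
    {"axis_label": f"{i:02d} Placeholder Task",
     "provenance_source": expected_provenance(f"{i:02d} Placeholder Task")}
    for i in range(1, 21)
]

print(check_radar(header, tasks))
```

The check returns problems as a list rather than raising, so it could feed a `status` / `failure_count` style report like the one in `data/website_integrity.json`.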
data/website_integrity.json CHANGED
@@ -1,14 +1,14 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-21T14:45:02+00:00",
4
  "docs_root": "docs",
5
  "site_base": "/ropedia-xperience-10m-task-suite/",
6
  "summary": {
7
  "html_pages": 4,
8
- "local_references": 254,
9
  "external_reference_count": 157,
10
  "json_files": 55,
11
- "image_assets_referenced": 29,
12
  "failure_count": 0
13
  },
14
  "failures": {
@@ -81,7 +81,7 @@
81
  "status": "pass",
82
  "reason": "The project overview should appear before the deeper progress ledger.",
83
  "overview_index": 121816,
84
- "evidence_index": 167645
85
  },
86
  {
87
  "name": "project_status_links_json",
@@ -161,7 +161,7 @@
161
  "reason": "The evaluation protocol should appear before the deeper evidence ledger.",
162
  "overview_index": 121816,
163
  "protocol_index": 163835,
164
- "evidence_index": 167645
165
  },
166
  {
167
  "name": "evaluation_protocol_links_json",
@@ -277,8 +277,8 @@
277
  {
278
  "path": "index.html",
279
  "id_count": 96,
280
- "reference_count": 226,
281
- "image_count": 35
282
  },
283
  {
284
  "path": "research_roadmap.html",
@@ -301,7 +301,7 @@
301
  },
302
  {
303
  "path": "data/artifact_index.json",
304
- "bytes": 124294,
305
  "top_level_type": "dict"
306
  },
307
  {
@@ -316,12 +316,12 @@
316
  },
317
  {
318
  "path": "data/episode128_task_model_radar.json",
319
- "bytes": 184992,
320
  "top_level_type": "dict"
321
  },
322
  {
323
  "path": "data/evaluation_protocol.json",
324
- "bytes": 24007,
325
  "top_level_type": "dict"
326
  },
327
  {
@@ -331,7 +331,7 @@
331
  },
332
  {
333
  "path": "data/figure_index.json",
334
- "bytes": 19469,
335
  "top_level_type": "dict"
336
  },
337
  {
@@ -351,7 +351,7 @@
351
  },
352
  {
353
  "path": "data/live_publication_status.json",
354
- "bytes": 189922,
355
  "top_level_type": "dict"
356
  },
357
  {
@@ -371,27 +371,27 @@
371
  },
372
  {
373
  "path": "data/omni_model_comparison.json",
374
- "bytes": 82088,
375
  "top_level_type": "dict"
376
  },
377
  {
378
  "path": "data/project_brief.json",
379
- "bytes": 4019,
380
  "top_level_type": "dict"
381
  },
382
  {
383
  "path": "data/project_manifest.json",
384
- "bytes": 5774,
385
  "top_level_type": "dict"
386
  },
387
  {
388
  "path": "data/project_packet.json",
389
- "bytes": 10009,
390
  "top_level_type": "dict"
391
  },
392
  {
393
  "path": "data/project_status.json",
394
- "bytes": 23255,
395
  "top_level_type": "dict"
396
  },
397
  {
@@ -401,7 +401,7 @@
401
  },
402
  {
403
  "path": "data/public_surface_qa.json",
404
- "bytes": 7690,
405
  "top_level_type": "dict"
406
  },
407
  {
@@ -441,7 +441,7 @@
441
  },
442
  {
443
  "path": "data/reproducibility_matrix.json",
444
- "bytes": 6815,
445
  "top_level_type": "dict"
446
  },
447
  {
@@ -466,7 +466,7 @@
466
  },
467
  {
468
  "path": "data/research_takeaways.json",
469
- "bytes": 7162,
470
  "top_level_type": "dict"
471
  },
472
  {
@@ -481,7 +481,7 @@
481
  },
482
  {
483
  "path": "data/single_episode_task_model_radar.json",
484
- "bytes": 51107,
485
  "top_level_type": "dict"
486
  },
487
  {
@@ -511,7 +511,7 @@
511
  },
512
  {
513
  "path": "data/task_suite_20.json",
514
- "bytes": 34597,
515
  "top_level_type": "dict"
516
  },
517
  {
@@ -536,7 +536,7 @@
536
  },
537
  {
538
  "path": "data/tier2_task_suite.json",
539
- "bytes": 33411,
540
  "top_level_type": "dict"
541
  },
542
  {
@@ -551,7 +551,7 @@
551
  },
552
  {
553
  "path": "data/unified_task_model_radar.json",
554
- "bytes": 228815,
555
  "top_level_type": "dict"
556
  },
557
  {
@@ -656,13 +656,6 @@
656
  "format": "SVG",
657
  "has_viewbox": true
658
  },
659
- {
660
- "path": "assets/charts/tier2_task_suite.svg",
661
- "exists": true,
662
- "bytes": 5453,
663
- "format": "SVG",
664
- "has_viewbox": true
665
- },
666
  {
667
  "path": "assets/charts/two_evidence_line_map.svg",
668
  "exists": true,
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-21T15:21:58+00:00",
4
  "docs_root": "docs",
5
  "site_base": "/ropedia-xperience-10m-task-suite/",
6
  "summary": {
7
  "html_pages": 4,
8
+ "local_references": 256,
9
  "external_reference_count": 157,
10
  "json_files": 55,
11
+ "image_assets_referenced": 28,
12
  "failure_count": 0
13
  },
14
  "failures": {
 
81
  "status": "pass",
82
  "reason": "The project overview should appear before the deeper progress ledger.",
83
  "overview_index": 121816,
84
+ "evidence_index": 167655
85
  },
86
  {
87
  "name": "project_status_links_json",
 
161
  "reason": "The evaluation protocol should appear before the deeper evidence ledger.",
162
  "overview_index": 121816,
163
  "protocol_index": 163835,
164
+ "evidence_index": 167655
165
  },
166
  {
167
  "name": "evaluation_protocol_links_json",
 
277
  {
278
  "path": "index.html",
279
  "id_count": 96,
280
+ "reference_count": 228,
281
+ "image_count": 34
282
  },
283
  {
284
  "path": "research_roadmap.html",
 
301
  },
302
  {
303
  "path": "data/artifact_index.json",
304
+ "bytes": 124341,
305
  "top_level_type": "dict"
306
  },
307
  {
 
316
  },
317
  {
318
  "path": "data/episode128_task_model_radar.json",
319
+ "bytes": 185212,
320
  "top_level_type": "dict"
321
  },
322
  {
323
  "path": "data/evaluation_protocol.json",
324
+ "bytes": 24267,
325
  "top_level_type": "dict"
326
  },
327
  {
 
331
  },
332
  {
333
  "path": "data/figure_index.json",
334
+ "bytes": 19485,
335
  "top_level_type": "dict"
336
  },
337
  {
 
351
  },
352
  {
353
  "path": "data/live_publication_status.json",
354
+ "bytes": 189990,
355
  "top_level_type": "dict"
356
  },
357
  {
 
371
  },
372
  {
373
  "path": "data/omni_model_comparison.json",
374
+ "bytes": 82102,
375
  "top_level_type": "dict"
376
  },
377
  {
378
  "path": "data/project_brief.json",
379
+ "bytes": 4032,
380
  "top_level_type": "dict"
381
  },
382
  {
383
  "path": "data/project_manifest.json",
384
+ "bytes": 5739,
385
  "top_level_type": "dict"
386
  },
387
  {
388
  "path": "data/project_packet.json",
389
+ "bytes": 10018,
390
  "top_level_type": "dict"
391
  },
392
  {
393
  "path": "data/project_status.json",
394
+ "bytes": 23232,
395
  "top_level_type": "dict"
396
  },
397
  {
 
401
  },
402
  {
403
  "path": "data/public_surface_qa.json",
404
+ "bytes": 7691,
405
  "top_level_type": "dict"
406
  },
407
  {
 
441
  },
442
  {
443
  "path": "data/reproducibility_matrix.json",
444
+ "bytes": 6836,
445
  "top_level_type": "dict"
446
  },
447
  {
 
466
  },
467
  {
468
  "path": "data/research_takeaways.json",
469
+ "bytes": 7165,
470
  "top_level_type": "dict"
471
  },
472
  {
 
481
  },
482
  {
483
  "path": "data/single_episode_task_model_radar.json",
484
+ "bytes": 51327,
485
  "top_level_type": "dict"
486
  },
487
  {
 
511
  },
512
  {
513
  "path": "data/task_suite_20.json",
514
+ "bytes": 34805,
515
  "top_level_type": "dict"
516
  },
517
  {
 
536
  },
537
  {
538
  "path": "data/tier2_task_suite.json",
539
+ "bytes": 33575,
540
  "top_level_type": "dict"
541
  },
542
  {
 
551
  },
552
  {
553
  "path": "data/unified_task_model_radar.json",
554
+ "bytes": 229035,
555
  "top_level_type": "dict"
556
  },
557
  {
 
656
  "format": "SVG",
657
  "has_viewbox": true
658
  },
659
  {
660
  "path": "assets/charts/two_evidence_line_map.svg",
661
  "exists": true,
index.html CHANGED
@@ -4787,7 +4787,7 @@
4787
  <article class="artifact"><h3>Split policy</h3><p>Single-episode chronological 70/30 train/test split. This avoids random future-window mixing; cross-episode generalization is measured in the later multi-episode pilot.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/EVALUATION_PROTOCOL.md">protocol document</a></article>
4788
  <article class="artifact"><h3>Metric contract</h3><p>All 20 tasks list input, target, primary metric, baseline score, and source artifact path in the unified suite file.</p><a href="data/task_suite_20.json">task_suite_20.json</a></article>
4789
  <article class="artifact"><h3>Leakage controls</h3><p>Scalers fit on train windows only; future labels, target-side signals, caption/object labels, and contact labels stay on the target side unless explicitly queried.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/scripts/build_evaluation_protocol.py">builder script</a></article>
4790
- <article class="artifact"><h3>Audio ablation</h3><p>Audio and no-audio variants are evaluated across the original task contracts under the same chronological split.</p><a href="data/audio_ablation_summary.json">audio summary</a></article>
4791
  <article class="artifact"><h3>Foundation track selection</h3><p>Qwen3-Omni is the first trainable baseline, Cosmos 3 is the world-model track with a camera-pose proxy forward-dynamics contract ready for trainer work, policy models wait for robot-compatible action targets, and Xperience-native pretraining remains a later full-corpus goal.</p><a href="data/foundation_model_plan.json">backbone plan</a></article>
4792
  <article class="artifact"><h3>Next evaluation stage</h3><p>This public-sample run covers single-episode task development. The selected multi-episode Qwen3-Omni final diagnostic result is verified and meets the JSON-validity target; Cosmos3-Nano has a verified future-window compatibility package; and Cosmos3-Super has a verified base-weight JSON-task evaluation plus a fine-tuned forward-dynamics LoRA branch. The next stage is action/subtask error analysis, stronger model-quality runs, and policy-target conversion.</p><a href="data/omni_model_comparison.json">result comparison</a></article>
4793
  <article class="artifact"><h3>128-Episode Task Suite Enhancement Pack</h3><p>Before adding episodes, the suite should try `multiscale_20s10_40s20_80s40`, hierarchical action/subtask targets, label-normalized scoring, and compact raw-feature shards for unsupported tasks.</p><a href="data/task_suite_enhancement_128.json">task_suite_enhancement_128.json</a></article>
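
Reviewer note: the "Split policy" and "Leakage controls" cards in this hunk describe a single-episode chronological 70/30 train/test split with scalers fit on train windows only. A minimal, standard-library sketch of that idea follows. It is not the project's builder script; the function names, the toy two-feature windows, and the use of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def chronological_split(windows, train_percent=70):
    """Keep time order: the first 70% of windows train, the rest test."""
    cut = len(windows) * train_percent // 100
    return windows[:cut], windows[cut:]


def fit_scaler(train_rows):
    """Compute per-feature mean and std from the train windows only."""
    columns = list(zip(*train_rows))
    means = [mean(column) for column in columns]
    stds = [pstdev(column) or 1.0 for column in columns]  # guard constant features
    return means, stds


def apply_scaler(rows, scaler):
    """Standardize rows with statistics that were fit elsewhere."""
    means, stds = scaler
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy episode: ten windows in time order, two features each.
windows = [[float(t), float(t % 3)] for t in range(10)]

train, test = chronological_split(windows)
scaler = fit_scaler(train)            # test windows never touch the statistics
train_z = apply_scaler(train, scaler)
test_z = apply_scaler(test, scaler)

print(len(train), len(test))
```

Splitting by position rather than by random shuffle means no test window precedes a train window, which is the "random future-window mixing" the card says the policy avoids.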
@@ -4824,7 +4824,7 @@
4824
  <article class="evidence-card">
4825
  <span class="status-pill">verified</span>
4826
  <h3>Audio contribution is measured task by task</h3>
4827
- <p>Audio variants improve the primary metric on 6 of the original task contracts in this single-episode setting.</p>
4828
  <div class="evidence-links">
4829
  <a href="data/audio_ablation_summary.json">audio summary</a>
4830
  <a href="assets/charts/audio_ablation_delta.svg">delta chart</a>
@@ -5463,7 +5463,7 @@
5463
  <section id="directions" data-project-tab="directions" role="tabpanel" aria-labelledby="tab-directions" tabindex="-1">
5464
  <div class="wrap">
5465
  <div class="section-head">
5466
- <h2>The original tasks organized into four research directions.</h2>
5467
  <p>Each task is mapped as direct, proxy, or diagnostic evidence for the Ropedia research tracks. The mapping uses two current baselines: minimal interpretable heads and neural MLP heads over the same feature contract.</p>
5468
  </div>
5469
  <div class="direction-grid">
@@ -5510,76 +5510,18 @@
5510
  <div class="wrap">
5511
  <div class="section-head">
5512
  <h2>Unified 20-task evidence and provenance.</h2>
5513
- <p>All 20 tasks now live in the same task table, task-card grid, radar, and 180-record result matrix. The chart below is retained as provenance for the historically named result bundle, not as a separate task tier.</p>
5514
- </div>
5515
- <img class="chart" src="assets/charts/tier2_task_suite.svg?v=xperience10m-tier2" alt="Historical additional-task provenance chart for the unified Xperience-10M 20-task suite">
5516
- <div class="extension-grid">
5517
- <article class="extension-card">
5518
- <span class="status-pill">Task 13 / forecast</span>
5519
- <h3>Long-Horizon Next-Action Forecasting</h3>
5520
- <p><strong>Input:</strong> current non-caption multimodal window.</p>
5521
- <p><strong>Output:</strong> action label five seconds later.</p>
5522
- <div class="extension-metrics"><span><strong>0.0750</strong>minimal macro-F1</span><span><strong>0.0655</strong>neural macro-F1</span></div>
5523
- </article>
5524
- <article class="extension-card">
5525
- <span class="status-pill">Task 14 / procedure</span>
5526
- <h3>Long-Horizon Next-Subtask Forecasting</h3>
5527
- <p><strong>Input:</strong> current non-caption multimodal window.</p>
5528
- <p><strong>Output:</strong> procedure subtask five seconds later.</p>
5529
- <div class="extension-metrics"><span><strong>0.0455</strong>minimal macro-F1</span><span><strong>0.0507</strong>neural macro-F1</span></div>
5530
- </article>
5531
- <article class="extension-card">
5532
- <span class="status-pill">Task 15 / language</span>
5533
- <h3>Interaction Text Prediction</h3>
5534
- <p><strong>Input:</strong> current sensor window with caption features removed.</p>
5535
- <p><strong>Output:</strong> raw annotation interaction phrase.</p>
5536
- <div class="extension-metrics"><span><strong>0.0444</strong>minimal macro-F1</span><span><strong>0.0381</strong>neural macro-F1</span></div>
5537
- </article>
5538
- <article class="extension-card">
5539
- <span class="status-pill">Task 16 / relation</span>
5540
- <h3>Action-Object Relation Prediction</h3>
5541
- <p><strong>Input:</strong> current sensor window with caption features removed.</p>
5542
- <p><strong>Output:</strong> joint action plus active object-set label.</p>
5543
- <div class="extension-metrics"><span><strong>0.0000</strong>minimal macro-F1</span><span><strong>0.0000</strong>neural macro-F1</span></div>
5544
- </article>
5545
- <article class="extension-card">
5546
- <span class="status-pill">Task 17 / objects</span>
5547
- <h3>Future Object-Set Forecasting</h3>
5548
- <p><strong>Input:</strong> current sensor window with caption features removed.</p>
5549
- <p><strong>Output:</strong> object set active five seconds later.</p>
5550
- <div class="extension-metrics"><span><strong>0.1694</strong>minimal micro-F1</span><span><strong>0.1972</strong>neural micro-F1</span></div>
5551
- </article>
5552
- <article class="extension-card">
5553
- <span class="status-pill">Task 18 / sensor bridge</span>
5554
- <h3>IMU-to-Hand Pose Reconstruction</h3>
5555
- <p><strong>Input:</strong> IMU acceleration and gyroscope features only.</p>
5556
- <p><strong>Output:</strong> current left/right hand joint feature blocks.</p>
5557
- <div class="extension-metrics"><span><strong>0.0420</strong>minimal MAE</span><span><strong>0.0426</strong>neural MAE</span></div>
5558
- </article>
5559
- <article class="extension-card">
5560
- <span class="status-pill">Task 19 / camera sync</span>
5561
- <h3>Camera-View Synchronization Retrieval</h3>
5562
- <p><strong>Input:</strong> fisheye camera-1 feature query.</p>
5563
- <p><strong>Output:</strong> synchronized fisheye camera-3 window rank.</p>
5564
- <div class="extension-metrics"><span><strong>0.4943</strong>minimal MRR</span><span><strong>0.2409</strong>neural MRR</span></div>
5565
- </article>
5566
- <article class="extension-card">
5567
- <span class="status-pill">Task 20 / timing</span>
5568
- <h3>Time-to-Next-Transition Regression</h3>
5569
- <p><strong>Input:</strong> current non-caption multimodal window.</p>
5570
- <p><strong>Output:</strong> capped frames until the next action boundary.</p>
5571
- <div class="extension-metrics"><span><strong>10.5374</strong>minimal MAE frames</span><span><strong>10.5545</strong>neural MAE frames</span></div>
5572
- </article>
5573
  </div>
5574
  <div class="callout-row">
5575
  <div class="callout">
5576
  <h3>Unified task artifact package</h3>
5577
- <p>The public task package has the 20-task JSON, per-task metrics, prediction/rank files, Markdown summaries, and charts generated from the local public-sample annotation and committed shared-window tensor.</p>
5578
- <p><a href="data/task_suite_20.json">Open unified 20-task JSON</a> · <a href="data/tier2_task_suite.json">Open historical provenance JSON</a></p>
5579
  </div>
5580
  <div class="callout">
5581
  <h3>One setup, one task surface</h3>
5582
  <p>Every task uses the same 20-frame window unit, 5-frame stride, 8,546-dimensional feature manifest, chronological split discipline, and minimal/neural comparison pattern unless a task-specific leakage rule removes target-side features.</p>
 
5583
  </div>
5584
  </div>
5585
  <img class="chart" src="assets/charts/research_direction_extension_tasks.svg?v=xperience10m-ext" alt="Four Xperience-10M research-direction extension probes with minimal and neural metrics">
@@ -5633,7 +5575,7 @@
5633
  <section id="architectures" data-project-tab="method" role="tabpanel" aria-labelledby="tab-method" tabindex="-1">
5634
  <div class="wrap">
5635
  <div class="section-head">
5636
- <h2>The original task heads share four head families.</h2>
5637
  <p>The diagram separates the shared episode-window representation from the task-specific heads, so the task contracts stay readable before scaling to larger models.</p>
5638
  </div>
5639
  <img class="architecture-image" src="assets/task_architectures.png?v=xperience10m-nn" alt="Verified minimal and neural architecture diagram for Ropedia Xperience-10M task heads">
@@ -5732,7 +5674,7 @@
5732
  <img class="chart" src="assets/charts/cross_modal_retrieval.svg" alt="Cross modal retrieval chart">
5733
  <img class="chart" src="assets/charts/episode_task_scores_neural_mlp.svg" alt="Neural MLP task score chart">
5734
  <img class="chart" src="assets/charts/episode_task_scores_minimal_vs_neural.svg" alt="Minimal versus neural score chart">
5735
- <img class="chart" src="assets/charts/audio_ablation_delta.svg" alt="Measured audio delta chart across original task contracts">
5736
  </div>
5737
  <p class="section-note"><a href="single_episode_explorer.html">Open the single-episode explorer</a> to inspect window-level labels, predictions, modality statistics, object labels, and diagnostic scores. The <a href="data/audio_ablation_summary.json">audio ablation summary</a> records the task-by-task audio contribution.</p>
5738
  </div>
@@ -5861,9 +5803,9 @@
5861
  <article class="artifact"><h3>Windows table</h3><p>Window start/end frames and aligned action/subtask labels for the public sample episode.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/episode_task_suite/windows.csv">window table</a></article>
5862
  <article class="artifact"><h3>Feature inputs</h3><p>Source map for the current modality inputs used by the task suite.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/episode_task_suite/feature_manifest.json">feature inputs</a></article>
5863
  <article class="artifact"><h3>Neural MLP task results</h3><p>Per-task PyTorch MLP metrics, predictions, histories, and checkpoints for the unified task contracts, with historical result-bundle paths retained for provenance.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/tree/main/results/episode_task_suite/neural_mlp">neural MLP outputs</a></article>
5864
- <article class="artifact"><h3>Four-direction taxonomy</h3><p>Maps the original tasks to the four research tracks: human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/tree/main/results/episode_task_suite/research_directions">research direction outputs</a></article>
5865
  <article class="artifact"><h3>Direction extension probes</h3><p>Four coded probes, one per research direction, with minimal and neural metrics plus prediction/rank CSVs.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/tree/main/results/episode_task_suite/research_direction_extensions">extension probe outputs</a></article>
5866
- <article class="artifact"><h3>Task walkthroughs</h3><p>Case studies for the original tasks, including input, middle process modules, output, metric, limitation, and task-player data.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/tree/main/results/episode_task_suite/task_walkthroughs">walkthrough outputs</a></article>
5867
  <article class="artifact"><h3>Audio ablation and raw upgrade</h3><p>All 72 task/variant rows comparing current audio, no audio, raw audio, replacement, and combined-input settings.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/tree/main/results/audio_ablation">audio ablation outputs</a></article>
5868
  <article class="artifact"><h3>Single-episode explorer</h3><p>Interactive window-level view of labels, predictions, modality statistics, object labels, and diagnostics.</p><a href="single_episode_explorer.html">open explorer</a></article>
5869
  <article class="artifact"><h3>Cross-modal retrieval</h3><p>The strongest self-supervised signal from the single episode.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/episode_task_suite/cross_modal_retrieval/metrics.json">retrieval metrics</a></article>
@@ -5917,7 +5859,7 @@
5917
  <div class="artifact-grid">
5918
  <article class="artifact"><h3>Project brief</h3><p>The fastest written overview of the dataset sample, tasks, baselines, and scale-up plan.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/PROJECT_BRIEF.md">brief</a></article>
5919
  <article class="artifact"><h3>Glossary</h3><p>Plain-language definitions for the terms most likely to confuse first-time readers and reviewers.</p><a href="data/glossary.json">glossary</a></article>
5920
- <article class="artifact"><h3>Task walkthroughs</h3><p>Human-readable case studies for the original tasks, including input, process modules, output, metric, and limitation.</p><a href="data/task_walkthroughs.json">walkthroughs</a></article>
5921
  <article class="artifact"><h3>Task results</h3><p>Minimal and neural-head metrics for the same sample windows and chronological split.</p><a href="data/summary_metrics.json">metrics</a></article>
5922
  <article class="artifact"><h3>Visual figures</h3><p>Task-suite map, modality atlas, pipeline diagram, model architecture figure, and Qwen3-Omni LoRA training-flow figure.</p><a href="assets/task_suite_infographic.png">task-suite figure</a></article>
5923
  <article class="artifact"><h3>Dataset notes</h3><p>Official dataset links, public sample source, modalities, access boundary, and current project subset.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md">dataset notes</a></article>
 
4787
  <article class="artifact"><h3>Split policy</h3><p>Single-episode chronological 70/30 train/test split. This avoids random future-window mixing; cross-episode generalization is measured in the later multi-episode pilot.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/EVALUATION_PROTOCOL.md">protocol document</a></article>
4788
  <article class="artifact"><h3>Metric contract</h3><p>All 20 tasks list input, target, primary metric, baseline score, and source artifact path in the unified suite file.</p><a href="data/task_suite_20.json">task_suite_20.json</a></article>
4789
  <article class="artifact"><h3>Leakage controls</h3><p>Scalers fit on train windows only; future labels, target-side signals, caption/object labels, and contact labels stay on the target side unless explicitly queried.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/scripts/build_evaluation_protocol.py">builder script</a></article>
4790
+ <article class="artifact"><h3>Audio ablation</h3><p>Audio and no-audio variants are evaluated across the walkthrough-backed task contracts under the same chronological split.</p><a href="data/audio_ablation_summary.json">audio summary</a></article>
4791
  <article class="artifact"><h3>Foundation track selection</h3><p>Qwen3-Omni is the first trainable baseline, Cosmos 3 is the world-model track with a camera-pose proxy forward-dynamics contract ready for trainer work, policy models wait for robot-compatible action targets, and Xperience-native pretraining remains a later full-corpus goal.</p><a href="data/foundation_model_plan.json">backbone plan</a></article>
4792
  <article class="artifact"><h3>Next evaluation stage</h3><p>This public-sample run covers single-episode task development. The selected multi-episode Qwen3-Omni final diagnostic result is verified and meets the JSON-validity target; Cosmos3-Nano has a verified future-window compatibility package; and Cosmos3-Super has a verified base-weight JSON-task evaluation plus a fine-tuned forward-dynamics LoRA branch. The next stage is action/subtask error analysis, stronger model-quality runs, and policy-target conversion.</p><a href="data/omni_model_comparison.json">result comparison</a></article>
4793
  <article class="artifact"><h3>128-Episode Task Suite Enhancement Pack</h3><p>Before adding episodes, the suite should try `multiscale_20s10_40s20_80s40`, hierarchical action/subtask targets, label-normalized scoring, and compact raw-feature shards for unsupported tasks.</p><a href="data/task_suite_enhancement_128.json">task_suite_enhancement_128.json</a></article>
 
4824
  <article class="evidence-card">
4825
  <span class="status-pill">verified</span>
4826
  <h3>Audio contribution is measured task by task</h3>
4827
+ <p>Audio variants improve the primary metric on 6 walkthrough-backed task contracts in this single-episode setting.</p>
4828
  <div class="evidence-links">
4829
  <a href="data/audio_ablation_summary.json">audio summary</a>
4830
  <a href="assets/charts/audio_ablation_delta.svg">delta chart</a>
 
5463
  <section id="directions" data-project-tab="directions" role="tabpanel" aria-labelledby="tab-directions" tabindex="-1">
5464
  <div class="wrap">
5465
  <div class="section-head">
5466
+ <h2>The walkthrough-backed tasks organized into four research directions.</h2>
5467
  <p>Each task is mapped as direct, proxy, or diagnostic evidence for the Ropedia research tracks. The mapping uses two current baselines: minimal interpretable heads and neural MLP heads over the same feature contract.</p>
5468
  </div>
5469
  <div class="direction-grid">
 
5510
  <div class="wrap">
5511
  <div class="section-head">
5512
  <h2>Unified 20-task evidence and provenance.</h2>
5513
+ <p>All 20 tasks live in the same task table, task-card grid, radar, and 180-record result matrix. Historical result paths are retained only for exact provenance links.</p>
5514
  </div>
5515
  <div class="callout-row">
5516
  <div class="callout">
5517
  <h3>Unified task artifact package</h3>
5518
+ <p>The public task package has one 20-task JSON, per-task metrics, prediction/rank files, Markdown summaries, radar charts, and the 180-record method-task matrix.</p>
5519
+ <p><a href="data/task_suite_20.json">Open unified 20-task JSON</a> · <a href="data/task_method_20_result_matrix.json">Open 180-record matrix</a> · <a href="assets/charts/unified_task_model_radar.svg">Open unified radar</a></p>
5520
  </div>
5521
  <div class="callout">
5522
  <h3>One setup, one task surface</h3>
5523
  <p>Every task uses the same 20-frame window unit, 5-frame stride, 8,546-dimensional feature manifest, chronological split discipline, and minimal/neural comparison pattern unless a task-specific leakage rule removes target-side features.</p>
5524
+ <p><a href="data/tier2_task_suite.json">Historical provenance JSON</a> and <a href="assets/charts/tier2_task_suite.svg">historical provenance chart</a> remain available for exact source tracing.</p>
5525
  </div>
5526
  </div>
5527
  <img class="chart" src="assets/charts/research_direction_extension_tasks.svg?v=xperience10m-ext" alt="Four Xperience-10M research-direction extension probes with minimal and neural metrics">
 
5575
  <section id="architectures" data-project-tab="method" role="tabpanel" aria-labelledby="tab-method" tabindex="-1">
5576
  <div class="wrap">
5577
  <div class="section-head">
5578
+ <h2>The baseline task heads share four head families.</h2>
5579
  <p>The diagram separates the shared episode-window representation from the task-specific heads, so the task contracts stay readable before scaling to larger models.</p>
5580
  </div>
5581
  <img class="architecture-image" src="assets/task_architectures.png?v=xperience10m-nn" alt="Verified minimal and neural architecture diagram for Ropedia Xperience-10M task heads">
 
5674
  <img class="chart" src="assets/charts/cross_modal_retrieval.svg" alt="Cross modal retrieval chart">
5675
  <img class="chart" src="assets/charts/episode_task_scores_neural_mlp.svg" alt="Neural MLP task score chart">
5676
  <img class="chart" src="assets/charts/episode_task_scores_minimal_vs_neural.svg" alt="Minimal versus neural score chart">
5677
+ <img class="chart" src="assets/charts/audio_ablation_delta.svg" alt="Measured audio delta chart across walkthrough-backed task contracts">
5678
  </div>
5679
  <p class="section-note"><a href="single_episode_explorer.html">Open the single-episode explorer</a> to inspect window-level labels, predictions, modality statistics, object labels, and diagnostic scores. The <a href="data/audio_ablation_summary.json">audio ablation summary</a> records the task-by-task audio contribution.</p>
5680
  </div>
 
5803
  <article class="artifact"><h3>Windows table</h3><p>Window start/end frames and aligned action/subtask labels for the public sample episode.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/episode_task_suite/windows.csv">window table</a></article>
5804
  <article class="artifact"><h3>Feature inputs</h3><p>Source map for the current modality inputs used by the task suite.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/episode_task_suite/feature_manifest.json">feature inputs</a></article>
5805
  <article class="artifact"><h3>Neural MLP task results</h3><p>Per-task PyTorch MLP metrics, predictions, histories, and checkpoints for the unified task contracts, with historical result-bundle paths retained for provenance.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/tree/main/results/episode_task_suite/neural_mlp">neural MLP outputs</a></article>
5806
+ <article class="artifact"><h3>Four-direction taxonomy</h3><p>Maps the walkthrough-backed task contracts to the four research tracks: human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/tree/main/results/episode_task_suite/research_directions">research direction outputs</a></article>
5807
  <article class="artifact"><h3>Direction extension probes</h3><p>Four coded probes, one per research direction, with minimal and neural metrics plus prediction/rank CSVs.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/tree/main/results/episode_task_suite/research_direction_extensions">extension probe outputs</a></article>
5808
+ <article class="artifact"><h3>Task walkthroughs</h3><p>Case studies for the walkthrough-backed task contracts, including input, middle process modules, output, metric, limitation, and task-player data.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/tree/main/results/episode_task_suite/task_walkthroughs">walkthrough outputs</a></article>
5809
  <article class="artifact"><h3>Audio ablation and raw upgrade</h3><p>All 72 task/variant rows comparing current audio, no audio, raw audio, replacement, and combined-input settings.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/tree/main/results/audio_ablation">audio ablation outputs</a></article>
5810
  <article class="artifact"><h3>Single-episode explorer</h3><p>Interactive window-level view of labels, predictions, modality statistics, object labels, and diagnostics.</p><a href="single_episode_explorer.html">open explorer</a></article>
5811
  <article class="artifact"><h3>Cross-modal retrieval</h3><p>The strongest self-supervised signal from the single episode.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/results/episode_task_suite/cross_modal_retrieval/metrics.json">retrieval metrics</a></article>
 
5859
  <div class="artifact-grid">
5860
  <article class="artifact"><h3>Project brief</h3><p>The fastest written overview of the dataset sample, tasks, baselines, and scale-up plan.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/PROJECT_BRIEF.md">brief</a></article>
5861
  <article class="artifact"><h3>Glossary</h3><p>Plain-language definitions for the terms most likely to confuse first-time readers and reviewers.</p><a href="data/glossary.json">glossary</a></article>
5862
+ <article class="artifact"><h3>Task walkthroughs</h3><p>Human-readable case studies for the walkthrough-backed task contracts, including input, process modules, output, metric, and limitation.</p><a href="data/task_walkthroughs.json">walkthroughs</a></article>
5863
  <article class="artifact"><h3>Task results</h3><p>Minimal and neural-head metrics for the same sample windows and chronological split.</p><a href="data/summary_metrics.json">metrics</a></article>
5864
  <article class="artifact"><h3>Visual figures</h3><p>Task-suite map, modality atlas, pipeline diagram, model architecture figure, and Qwen3-Omni LoRA training-flow figure.</p><a href="assets/task_suite_infographic.png">task-suite figure</a></article>
5865
  <article class="artifact"><h3>Dataset notes</h3><p>Official dataset links, public sample source, modalities, access boundary, and current project subset.</p><a href="https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/XPERIENCE10M_DATASET_CARD_ALIGNMENT.md">dataset notes</a></article>
metrics/episode128_task_model_radar.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "128-Episode 20-Task Radar",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-21T10:47:17+00:00",
5
  "description": "Selected 128-episode metadata/raw baselines plus verified Qwen3-Omni v6, Cosmos3-Super, and Cosmos3-Nano diagnostics. Every method has 20 records; numeric scores appear only where the public artifact produced that task target.",
6
  "task_count": 20,
7
  "method_count": 7,
@@ -192,7 +192,7 @@
192
  "label": "Action Recognition",
193
  "axis_label": "01 Action Recognition",
194
  "short_label": "Action",
195
- "origin": "original_public_sample_tasks",
196
  "metric_key": "macro_f1",
197
  "metric_name": "macro-F1",
198
  "metric_direction": "higher",
@@ -283,7 +283,7 @@
283
  "label": "Procedure Step Recognition",
284
  "axis_label": "02 Procedure Step Recognition",
285
  "short_label": "Step",
286
- "origin": "original_public_sample_tasks",
287
  "metric_key": "macro_f1",
288
  "metric_name": "macro-F1",
289
  "metric_direction": "higher",
@@ -374,7 +374,7 @@
374
  "label": "Action Boundary Detection",
375
  "axis_label": "03 Action Boundary Detection",
376
  "short_label": "Boundary",
377
- "origin": "original_public_sample_tasks",
378
  "metric_key": "macro_f1",
379
  "metric_name": "macro-F1",
380
  "metric_direction": "higher",
@@ -465,7 +465,7 @@
465
  "label": "Next-Action Prediction",
466
  "axis_label": "04 Next-Action Prediction",
467
  "short_label": "Next act",
468
- "origin": "original_public_sample_tasks",
469
  "metric_key": "macro_f1",
470
  "metric_name": "macro-F1",
471
  "metric_direction": "higher",
@@ -556,7 +556,7 @@
556
  "label": "Hand Trajectory Forecasting",
557
  "axis_label": "05 Hand Trajectory Forecasting",
558
  "short_label": "Hand traj",
559
- "origin": "original_public_sample_tasks",
560
  "metric_key": "mpjpe",
561
  "metric_name": "MPJPE",
562
  "metric_direction": "lower",
@@ -647,7 +647,7 @@
647
  "label": "Contact State Prediction",
648
  "axis_label": "06 Contact State Prediction",
649
  "short_label": "Contact",
650
- "origin": "original_public_sample_tasks",
651
  "metric_key": "macro_f1",
652
  "metric_name": "macro-F1",
653
  "metric_direction": "higher",
@@ -738,7 +738,7 @@
738
  "label": "Object Relevance Prediction",
739
  "axis_label": "07 Object Relevance Prediction",
740
  "short_label": "Objects",
741
- "origin": "original_public_sample_tasks",
742
  "metric_key": "micro_f1",
743
  "metric_name": "micro-F1",
744
  "metric_direction": "higher",
@@ -829,7 +829,7 @@
829
  "label": "Language Grounding",
830
  "axis_label": "08 Language Grounding",
831
  "short_label": "Language",
832
- "origin": "original_public_sample_tasks",
833
  "metric_key": "mrr",
834
  "metric_name": "MRR",
835
  "metric_direction": "higher",
@@ -920,7 +920,7 @@
920
  "label": "Cross-Modal Retrieval",
921
  "axis_label": "09 Cross-Modal Retrieval",
922
  "short_label": "X-modal",
923
- "origin": "original_public_sample_tasks",
924
  "metric_key": "mrr",
925
  "metric_name": "MRR",
926
  "metric_direction": "higher",
@@ -1011,7 +1011,7 @@
1011
  "label": "Cross-Modal Reconstruction",
1012
  "axis_label": "10 Cross-Modal Reconstruction",
1013
  "short_label": "Recon",
1014
- "origin": "original_public_sample_tasks",
1015
  "metric_key": "r2",
1016
  "metric_name": "R2",
1017
  "metric_direction": "higher",
@@ -1102,7 +1102,7 @@
1102
  "label": "Temporal Order Verification",
1103
  "axis_label": "11 Temporal Order Verification",
1104
  "short_label": "Order",
1105
- "origin": "original_public_sample_tasks",
1106
  "metric_key": "f1",
1107
  "metric_name": "F1",
1108
  "metric_direction": "higher",
@@ -1193,7 +1193,7 @@
1193
  "label": "Multimodal Synchronization Detection",
1194
  "axis_label": "12 Multimodal Synchronization Detection",
1195
  "short_label": "Sync",
1196
- "origin": "original_public_sample_tasks",
1197
  "metric_key": "f1",
1198
  "metric_name": "F1",
1199
  "metric_direction": "higher",
@@ -1284,7 +1284,7 @@
1284
  "label": "Long-Horizon Next-Action Forecasting",
1285
  "axis_label": "13 Long-Horizon Next-Action Forecasting",
1286
  "short_label": "Long act",
1287
- "origin": "additional_public_sample_tasks",
1288
  "metric_key": "macro_f1",
1289
  "metric_name": "macro-F1",
1290
  "metric_direction": "higher",
@@ -1375,7 +1375,7 @@
1375
  "label": "Long-Horizon Next-Subtask Forecasting",
1376
  "axis_label": "14 Long-Horizon Next-Subtask Forecasting",
1377
  "short_label": "Long step",
1378
- "origin": "additional_public_sample_tasks",
1379
  "metric_key": "macro_f1",
1380
  "metric_name": "macro-F1",
1381
  "metric_direction": "higher",
@@ -1466,7 +1466,7 @@
1466
  "label": "Interaction Text Prediction",
1467
  "axis_label": "15 Interaction Text Prediction",
1468
  "short_label": "Interact txt",
1469
- "origin": "additional_public_sample_tasks",
1470
  "metric_key": "macro_f1",
1471
  "metric_name": "macro-F1",
1472
  "metric_direction": "higher",
@@ -1557,7 +1557,7 @@
1557
  "label": "Action-Object Relation Prediction",
1558
  "axis_label": "16 Action-Object Relation Prediction",
1559
  "short_label": "Act+obj",
1560
- "origin": "additional_public_sample_tasks",
1561
  "metric_key": "macro_f1",
1562
  "metric_name": "macro-F1",
1563
  "metric_direction": "higher",
@@ -1648,7 +1648,7 @@
1648
  "label": "Future Object-Set Forecasting",
1649
  "axis_label": "17 Future Object-Set Forecasting",
1650
  "short_label": "Future obj",
1651
- "origin": "additional_public_sample_tasks",
1652
  "metric_key": "micro_f1",
1653
  "metric_name": "micro-F1",
1654
  "metric_direction": "higher",
@@ -1739,7 +1739,7 @@
1739
  "label": "IMU-to-Hand Pose Reconstruction",
1740
  "axis_label": "18 IMU-to-Hand Pose Reconstruction",
1741
  "short_label": "IMU->hand",
1742
- "origin": "additional_public_sample_tasks",
1743
  "metric_key": "mae",
1744
  "metric_name": "MAE",
1745
  "metric_direction": "lower",
@@ -1830,7 +1830,7 @@
1830
  "label": "Camera-View Synchronization Retrieval",
1831
  "axis_label": "19 Camera-View Synchronization Retrieval",
1832
  "short_label": "Cam sync",
1833
- "origin": "additional_public_sample_tasks",
1834
  "metric_key": "mrr",
1835
  "metric_name": "MRR",
1836
  "metric_direction": "higher",
@@ -1921,7 +1921,7 @@
1921
  "label": "Time-to-Next-Transition Regression",
1922
  "axis_label": "20 Time-to-Next-Transition Regression",
1923
  "short_label": "Time2bdry",
1924
- "origin": "additional_public_sample_tasks",
1925
  "metric_key": "mae",
1926
  "metric_name": "MAE frames",
1927
  "metric_direction": "lower",
 
1
  {
2
  "title": "128-Episode 20-Task Radar",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-21T15:20:34+00:00",
5
  "description": "Selected 128-episode metadata/raw baselines plus verified Qwen3-Omni v6, Cosmos3-Super, and Cosmos3-Nano diagnostics. Every method has 20 records; numeric scores appear only where the public artifact produced that task target.",
6
  "task_count": 20,
7
  "method_count": 7,
 
192
  "label": "Action Recognition",
193
  "axis_label": "01 Action Recognition",
194
  "short_label": "Action",
195
+ "provenance_source": "walkthrough_backed_task_contract",
196
  "metric_key": "macro_f1",
197
  "metric_name": "macro-F1",
198
  "metric_direction": "higher",
 
283
  "label": "Procedure Step Recognition",
284
  "axis_label": "02 Procedure Step Recognition",
285
  "short_label": "Step",
286
+ "provenance_source": "walkthrough_backed_task_contract",
287
  "metric_key": "macro_f1",
288
  "metric_name": "macro-F1",
289
  "metric_direction": "higher",
 
374
  "label": "Action Boundary Detection",
375
  "axis_label": "03 Action Boundary Detection",
376
  "short_label": "Boundary",
377
+ "provenance_source": "walkthrough_backed_task_contract",
378
  "metric_key": "macro_f1",
379
  "metric_name": "macro-F1",
380
  "metric_direction": "higher",
 
465
  "label": "Next-Action Prediction",
466
  "axis_label": "04 Next-Action Prediction",
467
  "short_label": "Next act",
468
+ "provenance_source": "walkthrough_backed_task_contract",
469
  "metric_key": "macro_f1",
470
  "metric_name": "macro-F1",
471
  "metric_direction": "higher",
 
556
  "label": "Hand Trajectory Forecasting",
557
  "axis_label": "05 Hand Trajectory Forecasting",
558
  "short_label": "Hand traj",
559
+ "provenance_source": "walkthrough_backed_task_contract",
560
  "metric_key": "mpjpe",
561
  "metric_name": "MPJPE",
562
  "metric_direction": "lower",
 
647
  "label": "Contact State Prediction",
648
  "axis_label": "06 Contact State Prediction",
649
  "short_label": "Contact",
650
+ "provenance_source": "walkthrough_backed_task_contract",
651
  "metric_key": "macro_f1",
652
  "metric_name": "macro-F1",
653
  "metric_direction": "higher",
 
738
  "label": "Object Relevance Prediction",
739
  "axis_label": "07 Object Relevance Prediction",
740
  "short_label": "Objects",
741
+ "provenance_source": "walkthrough_backed_task_contract",
742
  "metric_key": "micro_f1",
743
  "metric_name": "micro-F1",
744
  "metric_direction": "higher",
 
829
  "label": "Language Grounding",
830
  "axis_label": "08 Language Grounding",
831
  "short_label": "Language",
832
+ "provenance_source": "walkthrough_backed_task_contract",
833
  "metric_key": "mrr",
834
  "metric_name": "MRR",
835
  "metric_direction": "higher",
 
920
  "label": "Cross-Modal Retrieval",
921
  "axis_label": "09 Cross-Modal Retrieval",
922
  "short_label": "X-modal",
923
+ "provenance_source": "walkthrough_backed_task_contract",
924
  "metric_key": "mrr",
925
  "metric_name": "MRR",
926
  "metric_direction": "higher",
 
1011
  "label": "Cross-Modal Reconstruction",
1012
  "axis_label": "10 Cross-Modal Reconstruction",
1013
  "short_label": "Recon",
1014
+ "provenance_source": "walkthrough_backed_task_contract",
1015
  "metric_key": "r2",
1016
  "metric_name": "R2",
1017
  "metric_direction": "higher",
 
1102
  "label": "Temporal Order Verification",
1103
  "axis_label": "11 Temporal Order Verification",
1104
  "short_label": "Order",
1105
+ "provenance_source": "walkthrough_backed_task_contract",
1106
  "metric_key": "f1",
1107
  "metric_name": "F1",
1108
  "metric_direction": "higher",
 
1193
  "label": "Multimodal Synchronization Detection",
1194
  "axis_label": "12 Multimodal Synchronization Detection",
1195
  "short_label": "Sync",
1196
+ "provenance_source": "walkthrough_backed_task_contract",
1197
  "metric_key": "f1",
1198
  "metric_name": "F1",
1199
  "metric_direction": "higher",
 
1284
  "label": "Long-Horizon Next-Action Forecasting",
1285
  "axis_label": "13 Long-Horizon Next-Action Forecasting",
1286
  "short_label": "Long act",
1287
+ "provenance_source": "historical_result_bundle",
1288
  "metric_key": "macro_f1",
1289
  "metric_name": "macro-F1",
1290
  "metric_direction": "higher",
 
1375
  "label": "Long-Horizon Next-Subtask Forecasting",
1376
  "axis_label": "14 Long-Horizon Next-Subtask Forecasting",
1377
  "short_label": "Long step",
1378
+ "provenance_source": "historical_result_bundle",
1379
  "metric_key": "macro_f1",
1380
  "metric_name": "macro-F1",
1381
  "metric_direction": "higher",
 
1466
  "label": "Interaction Text Prediction",
1467
  "axis_label": "15 Interaction Text Prediction",
1468
  "short_label": "Interact txt",
1469
+ "provenance_source": "historical_result_bundle",
1470
  "metric_key": "macro_f1",
1471
  "metric_name": "macro-F1",
1472
  "metric_direction": "higher",
 
1557
  "label": "Action-Object Relation Prediction",
1558
  "axis_label": "16 Action-Object Relation Prediction",
1559
  "short_label": "Act+obj",
1560
+ "provenance_source": "historical_result_bundle",
1561
  "metric_key": "macro_f1",
1562
  "metric_name": "macro-F1",
1563
  "metric_direction": "higher",
 
1648
  "label": "Future Object-Set Forecasting",
1649
  "axis_label": "17 Future Object-Set Forecasting",
1650
  "short_label": "Future obj",
1651
+ "provenance_source": "historical_result_bundle",
1652
  "metric_key": "micro_f1",
1653
  "metric_name": "micro-F1",
1654
  "metric_direction": "higher",
 
1739
  "label": "IMU-to-Hand Pose Reconstruction",
1740
  "axis_label": "18 IMU-to-Hand Pose Reconstruction",
1741
  "short_label": "IMU->hand",
1742
+ "provenance_source": "historical_result_bundle",
1743
  "metric_key": "mae",
1744
  "metric_name": "MAE",
1745
  "metric_direction": "lower",
 
1830
  "label": "Camera-View Synchronization Retrieval",
1831
  "axis_label": "19 Camera-View Synchronization Retrieval",
1832
  "short_label": "Cam sync",
1833
+ "provenance_source": "historical_result_bundle",
1834
  "metric_key": "mrr",
1835
  "metric_name": "MRR",
1836
  "metric_direction": "higher",
 
1921
  "label": "Time-to-Next-Transition Regression",
1922
  "axis_label": "20 Time-to-Next-Transition Regression",
1923
  "short_label": "Time2bdry",
1924
+ "provenance_source": "historical_result_bundle",
1925
  "metric_key": "mae",
1926
  "metric_name": "MAE frames",
1927
  "metric_direction": "lower",
metrics/figure_index.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Figure Index",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-21T14:40:33+00:00",
5
  "scope": "Public figures, diagrams, charts, and derived modality thumbnails. Raw Xperience-10M videos, annotations, RRD files, and Qwen weights are excluded.",
6
  "figure_count": 29,
7
  "figures": [
@@ -60,12 +60,12 @@
60
  "id": "task_suite_infographic",
61
  "title": "Original task-suite infographic",
62
  "path": "docs/assets/task_suite_infographic.png",
63
- "role": "Primary visual map of the original task families, verified metrics, and sample modalities; the unified public suite is now documented as 20 tasks.",
64
  "source_script": "scripts/render_task_suite_infographic.py",
65
  "surface": "README, website, HF Space, artifact dataset, model card",
66
  "exists": true,
67
- "bytes": 1903454,
68
- "sha256": "6667eb856cf61ada9f868807b5d5c6ccde06e4f791b2f9dd567d98b71b307415",
69
  "dimensions": {
70
  "format": "PNG",
71
  "width": 1800,
@@ -162,7 +162,7 @@
162
  "id": "task_architectures",
163
  "title": "Minimal and neural task architecture map",
164
  "path": "docs/assets/task_architectures.png",
165
- "role": "Minimal and neural heads for the original task contracts and shared feature contracts.",
166
  "source_script": "scripts/render_overview_figures.py",
167
  "surface": "README, website, HF artifact dataset, model card",
168
  "exists": true,
@@ -392,8 +392,8 @@
392
  "source_script": "scripts/tier2_task_suite.py",
393
  "surface": "website unified task section, README, HF mirrors",
394
  "exists": true,
395
- "bytes": 5437,
396
- "sha256": "3e35e476f559cd6188e5417e4d28c25efc130abafc9cab2d941bc77d559177a1",
397
  "dimensions": {
398
  "format": "SVG",
399
  "width": 1440,
 
1
  {
2
  "title": "Ropedia Xperience-10M Figure Index",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-21T15:19:00+00:00",
5
  "scope": "Public figures, diagrams, charts, and derived modality thumbnails. Raw Xperience-10M videos, annotations, RRD files, and Qwen weights are excluded.",
6
  "figure_count": 29,
7
  "figures": [
 
60
  "id": "task_suite_infographic",
61
  "title": "Original task-suite infographic",
62
  "path": "docs/assets/task_suite_infographic.png",
63
+ "role": "Primary visual map of the walkthrough-backed task families, verified metrics, and sample modalities; the unified public suite is documented as 20 tasks.",
64
  "source_script": "scripts/render_task_suite_infographic.py",
65
  "surface": "README, website, HF Space, artifact dataset, model card",
66
  "exists": true,
67
+ "bytes": 1897278,
68
+ "sha256": "71b1ab150e952cf902488226c65b3822d8016974f63d111204c1eb1a7745faad",
69
  "dimensions": {
70
  "format": "PNG",
71
  "width": 1800,
 
162
  "id": "task_architectures",
163
  "title": "Minimal and neural task architecture map",
164
  "path": "docs/assets/task_architectures.png",
165
+ "role": "Minimal and neural heads for the walkthrough-backed task contracts and shared feature contracts.",
166
  "source_script": "scripts/render_overview_figures.py",
167
  "surface": "README, website, HF artifact dataset, model card",
168
  "exists": true,
 
392
  "source_script": "scripts/tier2_task_suite.py",
393
  "surface": "website unified task section, README, HF mirrors",
394
  "exists": true,
395
+ "bytes": 5453,
396
+ "sha256": "e9da29c57f42b29a7a05622fee1335089ac2b6fc9692a3b49fa5b753904db9dc",
397
  "dimensions": {
398
  "format": "SVG",
399
  "width": 1440,
metrics/live_publication_status.json CHANGED
The diff for this file is too large to render.
 
metrics/omni_model_comparison.json CHANGED
@@ -1,12 +1,12 @@
1
  {
2
  "title": "Ropedia Xperience-10M Current Result Versions and Model Groups",
3
- "generated_at_utc": "2026-06-21T10:47:04+00:00",
4
  "status": "pass",
5
  "version_count": 3,
6
  "model_group_count": 5,
7
  "comparison_rule": "Compare only rows with the same scope and target. Single-episode raw-feature metrics, 128-episode metadata baselines, Qwen3 structured JSON metrics, and the two Cosmos3 targets answer different questions: Nano future-window retrieval versus Super structured JSON Reasoner evaluation.",
8
  "version_reading_notes": [
9
- "Version 1 is the public-sample 20-task surface: original core heads, tasks 13-20, and the 180-row method-task matrix.",
10
  "Version 2 is the selected 128-episode same-split simple/NN baseline alignment.",
11
  "The selected-128 model-diagnostic group contains the current Qwen3-Omni LoRA JSON-task row, Cosmos3-Nano future-window compatibility result, Cosmos3-Super Reasoner base-weight JSON-task evaluation, and the separate Cosmos3-Super Forward-Dynamics LoRA adapter artifact."
12
  ],
 
1
  {
2
  "title": "Ropedia Xperience-10M Current Result Versions and Model Groups",
3
+ "generated_at_utc": "2026-06-21T15:17:00+00:00",
4
  "status": "pass",
5
  "version_count": 3,
6
  "model_group_count": 5,
7
  "comparison_rule": "Compare only rows with the same scope and target. Single-episode raw-feature metrics, 128-episode metadata baselines, Qwen3 structured JSON metrics, and the two Cosmos3 targets answer different questions: Nano future-window retrieval versus Super structured JSON Reasoner evaluation.",
8
  "version_reading_notes": [
9
+ "Version 1 is the public-sample 20-task surface: unified task heads, historical provenance rows, and the 180-row method-task matrix.",
10
  "Version 2 is the selected 128-episode same-split simple/NN baseline alignment.",
11
  "The selected-128 model-diagnostic group contains the current Qwen3-Omni LoRA JSON-task row, Cosmos3-Nano future-window compatibility result, Cosmos3-Super Reasoner base-weight JSON-task evaluation, and the separate Cosmos3-Super Forward-Dynamics LoRA adapter artifact."
12
  ],
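Review note: the `comparison_rule` above says to compare only rows with the same scope and target. A minimal sketch of that rule follows. It assumes result rows are dicts with `scope` and `target` fields. Those field names, the function `group_comparable_rows`, and the example values in the usage note are illustrative and not taken from the repository.

```python
from collections import defaultdict


def group_comparable_rows(rows):
    """Group result rows by (scope, target).

    Rows are comparable only with other rows in the same group; rows in
    different groups answer different questions and should not be ranked
    against each other.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["scope"], row["target"])].append(row)
    return dict(groups)
```

For example, with illustrative placeholder rows, two entries sharing `scope="selected_128"` and `target="structured_json"` would land in one group, while a `scope="single_episode"` row would form its own group and stay out of that comparison.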
metrics/project_brief.json CHANGED
@@ -52,7 +52,7 @@
52
  "Open EVALUATION_PROTOCOL.md before comparing task scores.",
53
  "Use RESEARCH_TAKEAWAYS.md for the current metric interpretation.",
54
  "Inspect results/episode_task_suite/feature_manifest.json to understand one model input.",
55
- "Use TASK_SUITE_20.md and docs/data/task_suite_20.json to read the unified 20-task suite; the historical docs/data/tier2_task_suite.json path stores the tasks 13-20 result bundle.",
56
  "Use docs/data/omni_finetune_verified_result.json for the current multi-episode Qwen3-Omni pilot result."
57
  ],
58
  "scope_boundary": "The public sample is enough to build and verify task definitions, feature contracts, metrics, visualization, and baseline code. The final multi-episode Qwen3-Omni diagnostic result verifies the training loop and strict-JSON output reliability, but does not yet show strong action/subtask model quality.",
 
52
  "Open EVALUATION_PROTOCOL.md before comparing task scores.",
53
  "Use RESEARCH_TAKEAWAYS.md for the current metric interpretation.",
54
  "Inspect results/episode_task_suite/feature_manifest.json to understand one model input.",
55
+ "Use TASK_SUITE_20.md and docs/data/task_suite_20.json to read the unified 20-task suite; the historical docs/data/tier2_task_suite.json path is retained only for provenance inside that suite.",
56
  "Use docs/data/omni_finetune_verified_result.json for the current multi-episode Qwen3-Omni pilot result."
57
  ],
58
  "scope_boundary": "The public sample is enough to build and verify task definitions, feature contracts, metrics, visualization, and baseline code. The final multi-episode Qwen3-Omni diagnostic result verifies the training loop and strict-JSON output reliability, but does not yet show strong action/subtask model quality.",
metrics/project_packet.json CHANGED
@@ -15,9 +15,8 @@
15
  "cosmos3_super_forward_dynamics_lora_status": "The first Cosmos3-Super fine-tuned adapter branch is verified as a forward-dynamics LoRA over camera-pose proxy targets; it reports loss metrics, not JSON action-label accuracy.",
16
  "task_suite_enhancement_128_status": "Current no-new-episode enhancement pack recommends multiscale_20s10_40s20_80s40, hierarchical action/subtask targets, label-normalized scoring, and raw-feature shards before adding more episodes.",
17
  "task_count": 20,
18
- "original_public_sample_task_count": 12,
19
- "additional_public_sample_task_count": 8,
20
- "legacy_tasks_13_to_20_result_path": "docs/data/tier2_task_suite.json"
21
  },
22
  "reading_path": [
23
  {
@@ -110,7 +109,7 @@
110
  "results/episode_task_suite/neural_mlp/",
111
  "docs/data/summary_metrics.json"
112
  ],
113
- "readout": "The unified suite has 20 task contracts; tasks 1-12 have walkthroughs and neural MLP heads, and tasks 13-20 have aligned minimal/neural result bundles under the historical tier2_task_suite path."
114
  },
115
  {
116
  "step": 8,
 
15
  "cosmos3_super_forward_dynamics_lora_status": "The first Cosmos3-Super fine-tuned adapter branch is verified as a forward-dynamics LoRA over camera-pose proxy targets; it reports loss metrics, not JSON action-label accuracy.",
16
  "task_suite_enhancement_128_status": "Current no-new-episode enhancement pack recommends multiscale_20s10_40s20_80s40, hierarchical action/subtask targets, label-normalized scoring, and raw-feature shards before adding more episodes.",
17
  "task_count": 20,
18
+ "task_surface_framing": "unified_20_task_suite",
19
+ "legacy_provenance_result_path": "docs/data/tier2_task_suite.json"
 
20
  },
21
  "reading_path": [
22
  {
 
109
  "results/episode_task_suite/neural_mlp/",
110
  "docs/data/summary_metrics.json"
111
  ],
112
+ "readout": "The unified suite has 20 task contracts in one task surface. Walkthrough-backed tasks, aligned minimal/neural result bundles, and historical tier2_task_suite provenance paths are all linked from TASK_SUITE_20.md and docs/data/task_suite_20.json."
113
  },
114
  {
115
  "step": 8,
metrics/public_surface_qa.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Public Project Surface",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-21T14:46:49+00:00",
5
  "scope": "Repo README, GitHub Pages HTML, Hugging Face Space card, artifact dataset card, and model card.",
6
  "checks": [
7
  {
@@ -33,12 +33,12 @@
33
  "source_alignment": {
34
  "exists": true,
35
  "status": "pass",
36
- "generated_at_utc": "2026-06-21T13:32:47+00:00"
37
  },
38
  "scale_up_status": {
39
  "exists": true,
40
  "status": "pass",
41
- "generated_at_utc": "2026-06-21T13:32:50+00:00"
42
  },
43
  "publication_package": {
44
  "exists": true,
@@ -48,7 +48,7 @@
48
  "mirror_parity": {
49
  "exists": true,
50
  "status": "pass",
51
- "generated_at_utc": "2026-06-21T14:13:08+00:00"
52
  }
53
  },
54
  "failures": {}
@@ -96,7 +96,7 @@
96
  "reason": "Public copy should consistently present the project as Ropedia Xperience-10M, with the Qwen3-Omni scale-up status.",
97
  "marker_counts": {
98
  "Ropedia Xperience-10M Task Suite": 20,
99
- "Xperience-10M": 167,
100
  "20-task": 100,
101
  "Qwen3-Omni": 245,
102
  "128-episode pilot": 1
@@ -137,11 +137,11 @@
137
  "data/unified_task_model_radar.json": 21,
138
  "data/single_episode_task_model_radar.json": 17,
139
  "data/episode128_task_model_radar.json": 16,
140
- "data/task_method_20_result_matrix.json": 24,
141
  "data/task_method_20_gap_audit.json": 23,
142
  "data/language_versions.json": 3,
143
  "assets/charts/two_evidence_line_map.svg": 5,
144
- "assets/charts/unified_task_model_radar.svg": 17,
145
  "assets/charts/single_episode_task_model_radar.svg": 19,
146
  "assets/charts/episode128_task_model_radar.svg": 19,
147
  "data/tier2_task_suite.json": 11
 
1
  {
2
  "title": "Ropedia Xperience-10M Public Project Surface",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-21T15:21:42+00:00",
5
  "scope": "Repo README, GitHub Pages HTML, Hugging Face Space card, artifact dataset card, and model card.",
6
  "checks": [
7
  {
 
33
  "source_alignment": {
34
  "exists": true,
35
  "status": "pass",
36
+ "generated_at_utc": "2026-06-21T14:46:49+00:00"
37
  },
38
  "scale_up_status": {
39
  "exists": true,
40
  "status": "pass",
41
+ "generated_at_utc": "2026-06-21T14:47:03+00:00"
42
  },
43
  "publication_package": {
44
  "exists": true,
 
48
  "mirror_parity": {
49
  "exists": true,
50
  "status": "pass",
51
+ "generated_at_utc": "2026-06-21T14:53:27+00:00"
52
  }
53
  },
54
  "failures": {}
 
96
  "reason": "Public copy should consistently present the project as Ropedia Xperience-10M, with the Qwen3-Omni scale-up status.",
97
  "marker_counts": {
98
  "Ropedia Xperience-10M Task Suite": 20,
99
+ "Xperience-10M": 166,
100
  "20-task": 100,
101
  "Qwen3-Omni": 245,
102
  "128-episode pilot": 1
 
137
  "data/unified_task_model_radar.json": 21,
138
  "data/single_episode_task_model_radar.json": 17,
139
  "data/episode128_task_model_radar.json": 16,
140
+ "data/task_method_20_result_matrix.json": 25,
141
  "data/task_method_20_gap_audit.json": 23,
142
  "data/language_versions.json": 3,
143
  "assets/charts/two_evidence_line_map.svg": 5,
144
+ "assets/charts/unified_task_model_radar.svg": 18,
145
  "assets/charts/single_episode_task_model_radar.svg": 19,
146
  "assets/charts/episode128_task_model_radar.svg": 19,
147
  "data/tier2_task_suite.json": 11
metrics/reproducibility_matrix.json CHANGED
@@ -39,7 +39,7 @@
39
  "id": "original_task_suite",
40
  "status": "reproducible",
41
  "command": "python scripts/episode_task_suite.py --workspace $WORKSPACE --include-neural",
42
- "expected": "original task metrics, predictions, manifests, and neural_mlp task-head artifacts",
43
  "boundary": "8,546-dimensional multimodal window contract"
44
  },
45
  {
@@ -50,11 +50,11 @@
50
  "boundary": "single-episode probes, not full research-direction solutions"
51
  },
52
  {
53
- "id": "tasks_13_to_20_and_unified_index",
54
  "status": "reproducible",
55
  "command": "python scripts/tier2_task_suite.py && python scripts/build_unified_task_suite.py && python scripts/build_unified_task_model_radar.py",
56
- "expected": "tasks 13-20 metrics, prediction/rank artifacts, TASK_SUITE_20.md, docs/data/task_suite_20.json, docs/data/tier2_task_suite.json, docs/assets/charts/tier2_task_suite.svg, docs/data/unified_task_model_radar.json, and docs/assets/charts/unified_task_model_radar.svg",
57
- "boundary": "requires local public-sample annotation.hdf5 plus HOMIE Toolkit or h5py for tasks 13-20; raw HDF5 and MP4 files are not redistributed"
58
  },
59
  {
60
  "id": "source_alignment_audit",
 
39
  "id": "original_task_suite",
40
  "status": "reproducible",
41
  "command": "python scripts/episode_task_suite.py --workspace $WORKSPACE --include-neural",
42
+ "expected": "walkthrough-backed task metrics, predictions, manifests, and neural_mlp task-head artifacts",
43
  "boundary": "8,546-dimensional multimodal window contract"
44
  },
45
  {
 
50
  "boundary": "single-episode probes, not full research-direction solutions"
51
  },
52
  {
53
+ "id": "unified_20_task_index",
54
  "status": "reproducible",
55
  "command": "python scripts/tier2_task_suite.py && python scripts/build_unified_task_suite.py && python scripts/build_unified_task_model_radar.py",
56
+ "expected": "unified 20-task metrics, prediction/rank artifacts, TASK_SUITE_20.md, docs/data/task_suite_20.json, docs/data/tier2_task_suite.json, docs/assets/charts/tier2_task_suite.svg, docs/data/unified_task_model_radar.json, and docs/assets/charts/unified_task_model_radar.svg",
57
+ "boundary": "requires local public-sample annotation.hdf5 plus HOMIE Toolkit or h5py for full public-task regeneration; raw HDF5 and MP4 files are not redistributed"
58
  },
59
  {
60
  "id": "source_alignment_audit",
metrics/research_takeaways.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "title": "Ropedia Xperience-10M Research Takeaways",
3
  "status": "pass",
4
- "generated_at_utc": "2026-06-20T21:27:21+00:00",
5
  "source_files": [
6
  "docs/data/summary_metrics.json",
7
  "results/episode_task_suite/summary_report.json",
@@ -133,7 +133,7 @@
133
  {
134
  "id": "audio_contribution_is_task_specific",
135
  "title": "Audio helps some tasks and hurts others on the public sample",
136
- "readout": "Audio improves the primary metric on 6 of the original task contracts, while raw log-mel replacement improves over the current handcrafted block on 6 of those contracts. The largest current-audio gain appears in feature reconstruction, not in action classification.",
137
  "evidence": [
138
  {
139
  "label": "tasks_where_current_audio_improves",
 
1
  {
2
  "title": "Ropedia Xperience-10M Research Takeaways",
3
  "status": "pass",
4
+ "generated_at_utc": "2026-06-21T15:18:59+00:00",
5
  "source_files": [
6
  "docs/data/summary_metrics.json",
7
  "results/episode_task_suite/summary_report.json",
 
133
  {
134
  "id": "audio_contribution_is_task_specific",
135
  "title": "Audio helps some tasks and hurts others on the public sample",
136
+ "readout": "Audio improves the primary metric on 6 walkthrough-backed task contracts, while raw log-mel replacement improves over the current handcrafted block on 6 of those contracts. The largest current-audio gain appears in feature reconstruction, not in action classification.",
137
  "evidence": [
138
  {
139
  "label": "tasks_where_current_audio_improves",
metrics/task_method_20_gap_audit.json CHANGED
@@ -1,5 +1,5 @@
1
  {
2
- "generated_at_utc": "2026-06-21T08:38:20+00:00",
3
  "immediate_actions": [
4
  {
5
  "artifact": "docs/data/task_method_20_gap_audit.json",
 
1
  {
2
+ "generated_at_utc": "2026-06-21T15:21:42+00:00",
3
  "immediate_actions": [
4
  {
5
  "artifact": "docs/data/task_method_20_gap_audit.json",
metrics/task_surface_integrity.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "status": "pass",
3
- "generated_at_utc": "2026-06-21T14:45:00+00:00",
4
  "summary": {
5
  "original_walkthrough_task_count": 12,
6
  "expected_original_walkthrough_task_count": 12,
 
1
  {
2
  "status": "pass",
3
+ "generated_at_utc": "2026-06-21T15:21:55+00:00",
4
  "summary": {
5
  "original_walkthrough_task_count": 12,
6
  "expected_original_walkthrough_task_count": 12,