Robotics
PyTorch
Cosmos
xperience10m_task_baseline_suite
embodied-ai
multimodal
xperience-10m
baseline
evaluation
qwen3-omni
Add files using upload-large-folder tool
- .gitattributes +3 -0
- assets/foundation-pipelines/README.md +16 -0
- assets/foundation-pipelines/human-video-world-model-pipeline.png +3 -0
- assets/foundation-pipelines/prompts.md +39 -0
- assets/foundation-pipelines/spatial-intelligence-pipeline.png +3 -0
- assets/foundation-pipelines/vision-language-action-pipeline.png +3 -0
- docs/data/artifact_index.json +57 -23
- docs/data/mirror_parity.json +106 -106
- docs/data/public_surface_qa.json +6 -6
- docs/data/publication_audit.json +8 -5
- docs/data/single_episode_task_model_radar.json +1 -1
- docs/data/source_alignment_audit.json +1 -1
- docs/data/task_method_20_gap_audit.json +1 -1
- docs/data/task_method_20_result_matrix.json +1 -1
- docs/data/task_surface_integrity.json +1 -1
- docs/data/three_foundation_pipelines.json +16 -0
- docs/data/unified_task_model_radar.json +1 -1
- docs/data/website_integrity.json +39 -15
- scripts/omni/collect_qwen3_future_task_probe_results.sh +9 -7
- scripts/omni/eval_qwen3_omni_future_task_probes.py +195 -14
.gitattributes
CHANGED
|
@@ -59,3 +59,6 @@ assets/raw-sample-preview/fisheye_cam1_preview.mp4 filter=lfs diff=lfs merge=lfs
|
|
| 59 |
assets/raw-sample-preview/stereo_right_preview.mp4 filter=lfs diff=lfs merge=lfs -text
|
| 60 |
assets/raw-sample-preview/stereo_left_preview.mp4 filter=lfs diff=lfs merge=lfs -text
|
| 61 |
assets/raw-sample-preview/fisheye_cam2_preview.mp4 filter=lfs diff=lfs merge=lfs -text
|
| 62 |
+
assets/foundation-pipelines/human-video-world-model-pipeline.png filter=lfs diff=lfs merge=lfs -text
|
| 63 |
+
assets/foundation-pipelines/vision-language-action-pipeline.png filter=lfs diff=lfs merge=lfs -text
|
| 64 |
+
assets/foundation-pipelines/spatial-intelligence-pipeline.png filter=lfs diff=lfs merge=lfs -text
|
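The three added `.gitattributes` lines follow the same rule shape as the existing preview-video entries: the path followed by `filter=lfs diff=lfs merge=lfs -text`. A minimal sketch of how one could check which rules are still missing for a set of binary assets; the helper names `lfs_rule` and `missing_lfs_rules` are hypothetical and are not part of this repository:

```python
def lfs_rule(path):
    """Build the .gitattributes rule that routes one path through Git LFS."""
    return f"{path} filter=lfs diff=lfs merge=lfs -text"


def missing_lfs_rules(gitattributes_text, paths):
    """Return the rules for `paths` that are not yet present in the file text."""
    existing = {line.strip() for line in gitattributes_text.splitlines()}
    return [lfs_rule(p) for p in paths if lfs_rule(p) not in existing]


# Hypothetical usage with one rule already present and one missing.
current = "assets/foundation-pipelines/spatial-intelligence-pipeline.png filter=lfs diff=lfs merge=lfs -text\n"
wanted = [
    "assets/foundation-pipelines/spatial-intelligence-pipeline.png",
    "assets/foundation-pipelines/vision-language-action-pipeline.png",
]
for rule in missing_lfs_rules(current, wanted):
    print(rule)
```

This only matches exact path rules; glob patterns such as `*.png` in a real `.gitattributes` file would need separate handling.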
assets/foundation-pipelines/README.md
ADDED
|
@@ -0,0 +1,16 @@
|
|
| 1 |
+
# Foundation Pipeline Placeholder Figures
|
| 2 |
+
|
| 3 |
+
These three bitmap figures are ChatGPT image-generated placeholder visuals for
|
| 4 |
+
the foundation pipeline tracks documented in `THREE_FOUNDATION_PIPELINES.md`
|
| 5 |
+
and `docs/data/three_foundation_pipelines.json`.
|
| 6 |
+
|
| 7 |
+
They are **pipeline placeholders**, not evidence of completed foundation-model
|
| 8 |
+
training. Exact technical claims live in the surrounding Markdown, JSON, and
|
| 9 |
+
website labels.
|
| 10 |
+
|
| 11 |
+
| Track | Asset |
|
| 12 |
+
| --- | --- |
|
| 13 |
+
| Spatial intelligence models | `spatial-intelligence-pipeline.png` |
|
| 14 |
+
| Human-video world models | `human-video-world-model-pipeline.png` |
|
| 15 |
+
| Vision-language-action models | `vision-language-action-pipeline.png` |
|
| 16 |
+
|
assets/foundation-pipelines/human-video-world-model-pipeline.png
ADDED
|
Git LFS Details
|
assets/foundation-pipelines/prompts.md
ADDED
|
@@ -0,0 +1,39 @@
|
|
| 1 |
+
# ChatGPT Image Prompts
|
| 2 |
+
|
| 3 |
+
## Spatial Intelligence
|
| 4 |
+
|
| 5 |
+
Use case: infographic-diagram. Asset type: 16:9 website figure for Ropedia
|
| 6 |
+
Xperience-10M foundation pipeline track. Create a polished text-free diagram
|
| 7 |
+
image for a spatial intelligence model training pipeline. Show multi-view video
|
| 8 |
+
frames and depth/pose streams flowing into a scene-object memory module, then
|
| 9 |
+
spatial reasoning outputs like 3D structure, object permanence, counting, and
|
| 10 |
+
question answering. Use a premium dark research-product presentation style,
|
| 11 |
+
high contrast, crisp geometric panels, subtle neon green/cyan/white accents,
|
| 12 |
+
clean technical linework, no decorative blobs, no logos, no readable text, no
|
| 13 |
+
watermark.
|
| 14 |
+
|
| 15 |
+
## Human-Video World Models
|
| 16 |
+
|
| 17 |
+
Use case: infographic-diagram. Asset type: 16:9 website figure for Ropedia
|
| 18 |
+
Xperience-10M foundation pipeline track. Create a polished text-free diagram
|
| 19 |
+
image for a human-video world model training pipeline. Show observed egocentric
|
| 20 |
+
video/audio/sensor windows flowing into a latent world-state model, then
|
| 21 |
+
predicted future frames, future action bars, object/contact state changes, and
|
| 22 |
+
uncertainty bands. Use a premium dark research-product presentation style,
|
| 23 |
+
high contrast, crisp geometric panels, subtle neon green/teal/white accents
|
| 24 |
+
with small amber highlights, clean technical linework, no decorative blobs, no
|
| 25 |
+
logos, no readable text, no watermark.
|
| 26 |
+
|
| 27 |
+
## Vision-Language-Action
|
| 28 |
+
|
| 29 |
+
Use case: infographic-diagram. Asset type: 16:9 website figure for Ropedia
|
| 30 |
+
Xperience-10M foundation pipeline track. Create a polished text-free diagram
|
| 31 |
+
image for a vision-language-action model training pipeline. Show egocentric
|
| 32 |
+
video frames, language caption tokens, hand/body motion traces, object/contact
|
| 33 |
+
cues, and procedure labels flowing into a multimodal action policy module, then
|
| 34 |
+
predicted action chunks, hand trajectory curves, contact decisions, and policy
|
| 35 |
+
evaluation panels. Use a premium dark research-product presentation style,
|
| 36 |
+
high contrast, crisp geometric panels, subtle neon green/cyan/white accents
|
| 37 |
+
with small magenta highlights, clean technical linework, no decorative blobs,
|
| 38 |
+
no logos, no readable text, no watermark.
|
| 39 |
+
|
assets/foundation-pipelines/spatial-intelligence-pipeline.png
ADDED
|
Git LFS Details
|
assets/foundation-pipelines/vision-language-action-pipeline.png
ADDED
|
Git LFS Details
|
docs/data/artifact_index.json
CHANGED
|
@@ -1,11 +1,12 @@
|
|
| 1 |
{
|
| 2 |
"title": "Ropedia Xperience-10M Task Suite Artifact Index",
|
| 3 |
-
"generated_at_utc": "2026-06-
|
| 4 |
"status": "pass",
|
| 5 |
-
"artifact_count":
|
| 6 |
"missing": [],
|
| 7 |
"by_kind": {
|
| 8 |
"project_path": 16,
|
|
|
|
| 9 |
"scaleup_contract": 7,
|
| 10 |
"scaleup_status": 44,
|
| 11 |
"publication_workflow": 6,
|
|
@@ -134,8 +135,8 @@
|
|
| 134 |
"surface": "repo_hf",
|
| 135 |
"shows": "Frames spatial intelligence, human-video world modeling, and vision-language-action as three pipeline tracks with explicit inputs, outputs, maturity, and next evidence gates.",
|
| 136 |
"exists": true,
|
| 137 |
-
"bytes":
|
| 138 |
-
"sha256": "
|
| 139 |
},
|
| 140 |
{
|
| 141 |
"id": "three_foundation_pipelines_json",
|
|
@@ -145,8 +146,41 @@
|
|
| 145 |
"surface": "website_hf",
|
| 146 |
"shows": "Machine-readable pipeline-track contract for the website and Hugging Face mirrors.",
|
| 147 |
"exists": true,
|
| 148 |
-
"bytes":
|
| 149 |
-
"sha256": "
|
|
| 150 |
},
|
| 151 |
{
|
| 152 |
"id": "omni_model_extension_contract",
|
|
@@ -487,7 +521,7 @@
|
|
| 487 |
"shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
|
| 488 |
"exists": true,
|
| 489 |
"bytes": 4432,
|
| 490 |
-
"sha256": "
|
| 491 |
},
|
| 492 |
{
|
| 493 |
"id": "source_alignment_validator",
|
|
@@ -608,7 +642,7 @@
|
|
| 608 |
"shows": "Stores normalized 20-axis radar values, raw task metrics, Qwen3/Cosmos overlay mappings, branch-card caveats, and explicit scoreless status records.",
|
| 609 |
"exists": true,
|
| 610 |
"bytes": 231240,
|
| 611 |
-
"sha256": "
|
| 612 |
},
|
| 613 |
{
|
| 614 |
"id": "single_episode_task_model_radar_json",
|
|
@@ -619,7 +653,7 @@
|
|
| 619 |
"shows": "Machine-readable split radar for the one-episode Minimal and Neural MLP baselines, both scored on all 20 task contracts.",
|
| 620 |
"exists": true,
|
| 621 |
"bytes": 50973,
|
| 622 |
-
"sha256": "
|
| 623 |
},
|
| 624 |
{
|
| 625 |
"id": "episode128_task_model_radar_json",
|
|
@@ -630,7 +664,7 @@
|
|
| 630 |
"shows": "Machine-readable split radar for selected 128-episode metadata/raw baselines and verified Qwen3/Cosmos branches, preserving explicit scoreless cells.",
|
| 631 |
"exists": true,
|
| 632 |
"bytes": 187388,
|
| 633 |
-
"sha256": "
|
| 634 |
},
|
| 635 |
{
|
| 636 |
"id": "task_method_20_result_matrix_json",
|
|
@@ -641,7 +675,7 @@
|
|
| 641 |
"shows": "Machine-readable 9-method by 20-task matrix where every method has 20 records and scoreless cells carry unsupported/not-evaluated reasons.",
|
| 642 |
"exists": true,
|
| 643 |
"bytes": 129749,
|
| 644 |
-
"sha256": "
|
| 645 |
},
|
| 646 |
{
|
| 647 |
"id": "task_method_20_result_matrix",
|
|
@@ -663,7 +697,7 @@
|
|
| 663 |
"shows": "Machine-readable 180-record gap ledger with numeric scores, scoreless cells, explicit status reasons, and next evidence needed before new scores can be published.",
|
| 664 |
"exists": true,
|
| 665 |
"bytes": 55745,
|
| 666 |
-
"sha256": "
|
| 667 |
},
|
| 668 |
{
|
| 669 |
"id": "task_method_20_gap_audit",
|
|
@@ -674,7 +708,7 @@
|
|
| 674 |
"shows": "Reader-facing ledger that lists every scoreless method-task cell and the concrete target or model-output evidence required before it can become numeric.",
|
| 675 |
"exists": true,
|
| 676 |
"bytes": 15690,
|
| 677 |
-
"sha256": "
|
| 678 |
},
|
| 679 |
{
|
| 680 |
"id": "unified_task_model_radar_chart",
|
|
@@ -717,8 +751,8 @@
|
|
| 717 |
"surface": "repo_hf",
|
| 718 |
"shows": "Regenerates the direction-aware radar chart and machine-readable metric overlay JSON.",
|
| 719 |
"exists": true,
|
| 720 |
-
"bytes":
|
| 721 |
-
"sha256": "
|
| 722 |
},
|
| 723 |
{
|
| 724 |
"id": "task_method_20_gap_audit_builder",
|
|
@@ -926,8 +960,8 @@
|
|
| 926 |
"surface": "repo_hf",
|
| 927 |
"shows": "Regenerates visual-asset hashes, dimensions, and source-script provenance.",
|
| 928 |
"exists": true,
|
| 929 |
-
"bytes":
|
| 930 |
-
"sha256": "
|
| 931 |
},
|
| 932 |
{
|
| 933 |
"id": "brand_assets_json",
|
|
@@ -1107,8 +1141,8 @@
|
|
| 1107 |
"surface": "repo",
|
| 1108 |
"shows": "Fetches the published GitHub/HF URLs and compares live hashes and public-card markers against the release assets.",
|
| 1109 |
"exists": true,
|
| 1110 |
-
"bytes":
|
| 1111 |
-
"sha256": "
|
| 1112 |
},
|
| 1113 |
{
|
| 1114 |
"id": "reproducibility_contract",
|
|
@@ -1140,8 +1174,8 @@
|
|
| 1140 |
"surface": "repo_hf",
|
| 1141 |
"shows": "Generates the selective artifact catalog from local files.",
|
| 1142 |
"exists": true,
|
| 1143 |
-
"bytes":
|
| 1144 |
-
"sha256": "
|
| 1145 |
},
|
| 1146 |
{
|
| 1147 |
"id": "publication_audit",
|
|
@@ -1176,7 +1210,7 @@
|
|
| 1176 |
"volatile": true,
|
| 1177 |
"shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
|
| 1178 |
"exists": true,
|
| 1179 |
-
"bytes":
|
| 1180 |
"hash_policy": "existence_and_size_only"
|
| 1181 |
},
|
| 1182 |
{
|
|
@@ -1188,7 +1222,7 @@
|
|
| 1188 |
"volatile": true,
|
| 1189 |
"shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
|
| 1190 |
"exists": true,
|
| 1191 |
-
"bytes":
|
| 1192 |
"hash_policy": "existence_and_size_only"
|
| 1193 |
},
|
| 1194 |
{
|
|
|
|
| 1 |
{
|
| 2 |
"title": "Ropedia Xperience-10M Task Suite Artifact Index",
|
| 3 |
+
"generated_at_utc": "2026-06-17T15:16:18+00:00",
|
| 4 |
"status": "pass",
|
| 5 |
+
"artifact_count": 204,
|
| 6 |
"missing": [],
|
| 7 |
"by_kind": {
|
| 8 |
"project_path": 16,
|
| 9 |
+
"visual_asset": 3,
|
| 10 |
"scaleup_contract": 7,
|
| 11 |
"scaleup_status": 44,
|
| 12 |
"publication_workflow": 6,
|
|
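The regenerated index header reports an `artifact_count` of 204 and a new `visual_asset` kind with 3 entries. A minimal sketch of how such a summary could be derived from the entries' `kind` field, assuming the entries are available as a plain list of dictionaries; the function name `summarize_kinds` is hypothetical and the repository's actual index builder is not shown in this diff:

```python
from collections import Counter


def summarize_kinds(entries):
    """Count entries per `kind` and report the total (hypothetical helper)."""
    by_kind = Counter(entry["kind"] for entry in entries)
    return {"artifact_count": len(entries), "by_kind": dict(by_kind)}


# Hypothetical usage with three placeholder-figure entries and one other kind.
entries = [
    {"id": "spatial_intelligence_pipeline_placeholder", "kind": "visual_asset"},
    {"id": "human_video_world_model_pipeline_placeholder", "kind": "visual_asset"},
    {"id": "vision_language_action_pipeline_placeholder", "kind": "visual_asset"},
    {"id": "example_project_path", "kind": "project_path"},
]
print(summarize_kinds(entries))
```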
|
|
| 135 |
"surface": "repo_hf",
|
| 136 |
"shows": "Frames spatial intelligence, human-video world modeling, and vision-language-action as three pipeline tracks with explicit inputs, outputs, maturity, and next evidence gates.",
|
| 137 |
"exists": true,
|
| 138 |
+
"bytes": 7437,
|
| 139 |
+
"sha256": "281a6349a7fd141460d7f911f0d80a841a38c99456363d1ffd6372cd94ca14b0"
|
| 140 |
},
|
| 141 |
{
|
| 142 |
"id": "three_foundation_pipelines_json",
|
|
|
|
| 146 |
"surface": "website_hf",
|
| 147 |
"shows": "Machine-readable pipeline-track contract for the website and Hugging Face mirrors.",
|
| 148 |
"exists": true,
|
| 149 |
+
"bytes": 6518,
|
| 150 |
+
"sha256": "e337901e7ddd2f8845987d4c41d9362e5fc780d3cb0659494576b7a0da53fb49"
|
| 151 |
+
},
|
| 152 |
+
{
|
| 153 |
+
"id": "spatial_intelligence_pipeline_placeholder",
|
| 154 |
+
"title": "Spatial intelligence pipeline placeholder",
|
| 155 |
+
"path": "docs/assets/foundation-pipelines/spatial-intelligence-pipeline.png",
|
| 156 |
+
"kind": "visual_asset",
|
| 157 |
+
"surface": "website_hf",
|
| 158 |
+
"shows": "ChatGPT image-generated placeholder visual for the spatial intelligence model training pipeline.",
|
| 159 |
+
"exists": true,
|
| 160 |
+
"bytes": 2337155,
|
| 161 |
+
"sha256": "ca98e2f5171497f6b97627ee8d0dee68f4aa929a2ba205e8b8e64e89f7f66f06"
|
| 162 |
+
},
|
| 163 |
+
{
|
| 164 |
+
"id": "human_video_world_model_pipeline_placeholder",
|
| 165 |
+
"title": "Human-video world model pipeline placeholder",
|
| 166 |
+
"path": "docs/assets/foundation-pipelines/human-video-world-model-pipeline.png",
|
| 167 |
+
"kind": "visual_asset",
|
| 168 |
+
"surface": "website_hf",
|
| 169 |
+
"shows": "ChatGPT image-generated placeholder visual for the human-video world-model training pipeline.",
|
| 170 |
+
"exists": true,
|
| 171 |
+
"bytes": 2356312,
|
| 172 |
+
"sha256": "cee2d717a97b88d8f5bae3e58fe202791a4f3073e488cb666acb0214117b735b"
|
| 173 |
+
},
|
| 174 |
+
{
|
| 175 |
+
"id": "vision_language_action_pipeline_placeholder",
|
| 176 |
+
"title": "Vision-language-action pipeline placeholder",
|
| 177 |
+
"path": "docs/assets/foundation-pipelines/vision-language-action-pipeline.png",
|
| 178 |
+
"kind": "visual_asset",
|
| 179 |
+
"surface": "website_hf",
|
| 180 |
+
"shows": "ChatGPT image-generated placeholder visual for the vision-language-action training pipeline.",
|
| 181 |
+
"exists": true,
|
| 182 |
+
"bytes": 2421011,
|
| 183 |
+
"sha256": "f8554f7df26ab79fef348740ce45ac3da032cb4085c490d62910ad1147dd1ecf"
|
| 184 |
},
|
| 185 |
{
|
| 186 |
"id": "omni_model_extension_contract",
|
|
|
|
| 521 |
"shows": "Machine-readable source-alignment pass/fail check for repo, website, and HF surfaces.",
|
| 522 |
"exists": true,
|
| 523 |
"bytes": 4432,
|
| 524 |
+
"sha256": "5d013c6820fb7f582b4b4b9a55f98de20168ea1947d4bea64e11d16dbd521428"
|
| 525 |
},
|
| 526 |
{
|
| 527 |
"id": "source_alignment_validator",
|
|
|
|
| 642 |
"shows": "Stores normalized 20-axis radar values, raw task metrics, Qwen3/Cosmos overlay mappings, branch-card caveats, and explicit scoreless status records.",
|
| 643 |
"exists": true,
|
| 644 |
"bytes": 231240,
|
| 645 |
+
"sha256": "87eb194c326323167b356448678fc9e2cc4b39610c48e6e14d368d55261d2745"
|
| 646 |
},
|
| 647 |
{
|
| 648 |
"id": "single_episode_task_model_radar_json",
|
|
|
|
| 653 |
"shows": "Machine-readable split radar for the one-episode Minimal and Neural MLP baselines, both scored on all 20 task contracts.",
|
| 654 |
"exists": true,
|
| 655 |
"bytes": 50973,
|
| 656 |
+
"sha256": "07a6a8026e48e60d6a1ee0686d615645590ac3d95cc938fc9f0b26cbdea5d3a6"
|
| 657 |
},
|
| 658 |
{
|
| 659 |
"id": "episode128_task_model_radar_json",
|
|
|
|
| 664 |
"shows": "Machine-readable split radar for selected 128-episode metadata/raw baselines and verified Qwen3/Cosmos branches, preserving explicit scoreless cells.",
|
| 665 |
"exists": true,
|
| 666 |
"bytes": 187388,
|
| 667 |
+
"sha256": "47e37a1b6bbbb3df98630dfab0de8e39e2c170400d1bce52054967a136dbc58c"
|
| 668 |
},
|
| 669 |
{
|
| 670 |
"id": "task_method_20_result_matrix_json",
|
|
|
|
| 675 |
"shows": "Machine-readable 9-method by 20-task matrix where every method has 20 records and scoreless cells carry unsupported/not-evaluated reasons.",
|
| 676 |
"exists": true,
|
| 677 |
"bytes": 129749,
|
| 678 |
+
"sha256": "58636609d9145bce26857ddee8e0fe4751ebee8429d4bef60fbe9d9daf7d2bd4"
|
| 679 |
},
|
| 680 |
{
|
| 681 |
"id": "task_method_20_result_matrix",
|
|
|
|
| 697 |
"shows": "Machine-readable 180-record gap ledger with numeric scores, scoreless cells, explicit status reasons, and next evidence needed before new scores can be published.",
|
| 698 |
"exists": true,
|
| 699 |
"bytes": 55745,
|
| 700 |
+
"sha256": "7cc10a067d029ae4d55869b2db1181e01fb5063ec5637111255a4f3d79dbb082"
|
| 701 |
},
|
| 702 |
{
|
| 703 |
"id": "task_method_20_gap_audit",
|
|
|
|
| 708 |
"shows": "Reader-facing ledger that lists every scoreless method-task cell and the concrete target or model-output evidence required before it can become numeric.",
|
| 709 |
"exists": true,
|
| 710 |
"bytes": 15690,
|
| 711 |
+
"sha256": "bec8510557fee7505f68d697590eefdcaad96d70d9d9b201fab7a9bdc361a2ac"
|
| 712 |
},
|
| 713 |
{
|
| 714 |
"id": "unified_task_model_radar_chart",
|
|
|
|
| 751 |
"surface": "repo_hf",
|
| 752 |
"shows": "Regenerates the direction-aware radar chart and machine-readable metric overlay JSON.",
|
| 753 |
"exists": true,
|
| 754 |
+
"bytes": 51243,
|
| 755 |
+
"sha256": "e0f995a01e8589a7f819dc5b766156c26e8b14e4db9c3c0c5e08be7a29b4de56"
|
| 756 |
},
|
| 757 |
{
|
| 758 |
"id": "task_method_20_gap_audit_builder",
|
|
|
|
| 960 |
"surface": "repo_hf",
|
| 961 |
"shows": "Regenerates visual-asset hashes, dimensions, and source-script provenance.",
|
| 962 |
"exists": true,
|
| 963 |
+
"bytes": 16864,
|
| 964 |
+
"sha256": "df362654a5c65d7adedc924f2af93e1fbd248fa861f20aea1576473daaeb0b0d"
|
| 965 |
},
|
| 966 |
{
|
| 967 |
"id": "brand_assets_json",
|
|
|
|
| 1141 |
"surface": "repo",
|
| 1142 |
"shows": "Fetches the published GitHub/HF URLs and compares live hashes and public-card markers against the release assets.",
|
| 1143 |
"exists": true,
|
| 1144 |
+
"bytes": 60253,
|
| 1145 |
+
"sha256": "ad4b408e9e19339285e37e0c47bffac6a450ddd1a439bf11ab80a90cec27b1fb"
|
| 1146 |
},
|
| 1147 |
{
|
| 1148 |
"id": "reproducibility_contract",
|
|
|
|
| 1174 |
"surface": "repo_hf",
|
| 1175 |
"shows": "Generates the selective artifact catalog from local files.",
|
| 1176 |
"exists": true,
|
| 1177 |
+
"bytes": 59218,
|
| 1178 |
+
"sha256": "38985fa362861b3975240ec62cc186378c84a8d0e11727651dc3cd2a87bfdd11"
|
| 1179 |
},
|
| 1180 |
{
|
| 1181 |
"id": "publication_audit",
|
|
|
|
| 1210 |
"volatile": true,
|
| 1211 |
"shows": "Confirms prepared GitHub/HF Space/artifact/model mirrors share the same critical data, figure, website HTML, and validator files.",
|
| 1212 |
"exists": true,
|
| 1213 |
+
"bytes": 902747,
|
| 1214 |
"hash_policy": "existence_and_size_only"
|
| 1215 |
},
|
| 1216 |
{
|
|
|
|
| 1222 |
"volatile": true,
|
| 1223 |
"shows": "Confirms local website links, anchors, JSON data files, and referenced images resolve.",
|
| 1224 |
"exists": true,
|
| 1225 |
+
"bytes": 19052,
|
| 1226 |
"hash_policy": "existence_and_size_only"
|
| 1227 |
},
|
| 1228 |
{
|
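Each index entry above records `exists`, a size in `bytes`, and usually a `sha256`, while the entries marked `"volatile": true` carry `"hash_policy": "existence_and_size_only"` and no digest. A minimal sketch of how one such record could be produced for a local file, assuming a hypothetical helper named `describe_artifact` (the generator script that actually writes `artifact_index.json` is not shown here and may work differently):

```python
import hashlib
from pathlib import Path


def describe_artifact(path, hash_policy=None):
    """Return the existence, size, and digest fields for one file."""
    p = Path(path)
    if not p.is_file():
        return {"exists": False}
    record = {"exists": True, "bytes": p.stat().st_size}
    if hash_policy == "existence_and_size_only":
        # Volatile files: record the policy instead of a digest.
        record["hash_policy"] = hash_policy
        return record
    digest = hashlib.sha256()
    with p.open("rb") as handle:
        # Read in 1 MiB chunks so large PNGs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    record["sha256"] = digest.hexdigest()
    return record
```

Skipping the digest for volatile files is one plausible reading of the `existence_and_size_only` policy: a file that is regenerated on every run would otherwise change its own recorded hash each time.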
docs/data/mirror_parity.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
-
"generated_at_utc": "2026-06-17T13:
|
| 4 |
"hf_root": "hf_publish",
|
| 5 |
"summary": {
|
| 6 |
"group_count": 611,
|
|
@@ -139,44 +139,44 @@
|
|
| 139 |
"path": "repo:docs/data/artifact_index.json",
|
| 140 |
"exists": true,
|
| 141 |
"bytes": 109674,
|
| 142 |
-
"sha256": "
|
| 143 |
},
|
| 144 |
"mirrors": {
|
| 145 |
"hf_space": {
|
| 146 |
"path": "hf_space:data/artifact_index.json",
|
| 147 |
"exists": true,
|
| 148 |
"bytes": 109674,
|
| 149 |
-
"sha256": "
|
| 150 |
},
|
| 151 |
"hf_artifacts_data": {
|
| 152 |
"path": "hf_artifacts:data/artifact_index.json",
|
| 153 |
"exists": true,
|
| 154 |
"bytes": 109674,
|
| 155 |
-
"sha256": "
|
| 156 |
},
|
| 157 |
"hf_artifacts": {
|
| 158 |
"path": "hf_artifacts:docs/data/artifact_index.json",
|
| 159 |
"exists": true,
|
| 160 |
"bytes": 109674,
|
| 161 |
-
"sha256": "
|
| 162 |
},
|
| 163 |
"hf_model_data": {
|
| 164 |
"path": "hf_model:data/artifact_index.json",
|
| 165 |
"exists": true,
|
| 166 |
"bytes": 109674,
|
| 167 |
-
"sha256": "
|
| 168 |
},
|
| 169 |
"hf_model_docs_data": {
|
| 170 |
"path": "hf_model:docs/data/artifact_index.json",
|
| 171 |
"exists": true,
|
| 172 |
"bytes": 109674,
|
| 173 |
-
"sha256": "
|
| 174 |
},
|
| 175 |
"hf_model": {
|
| 176 |
"path": "hf_model:metrics/artifact_index.json",
|
| 177 |
"exists": true,
|
| 178 |
"bytes": 109674,
|
| 179 |
-
"sha256": "
|
| 180 |
}
|
| 181 |
},
|
| 182 |
"failures": []
|
|
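Each parity group above pairs one repository file with a `mirrors` map whose entries repeat `exists`, `bytes`, and `sha256`, followed by a `failures` list that is empty when everything matches. A minimal sketch of that comparison, assuming the repository-side record is passed in as `source` (its key name is not visible in this diff) and using a hypothetical function name `parity_failures`:

```python
def parity_failures(source, mirrors):
    """List mirrors that are missing or differ from the source in size or digest."""
    failures = []
    for name, mirror in sorted(mirrors.items()):
        if not mirror.get("exists"):
            failures.append(f"{name}: missing")
        elif mirror.get("bytes") != source.get("bytes"):
            failures.append(f"{name}: size mismatch")
        elif mirror.get("sha256") != source.get("sha256"):
            failures.append(f"{name}: sha256 mismatch")
    return failures


# Hypothetical usage; the digests are placeholders, not values from the repository.
source = {"exists": True, "bytes": 109674, "sha256": "aaaa"}
mirrors = {
    "hf_space": {"exists": True, "bytes": 109674, "sha256": "aaaa"},
    "hf_model": {"exists": True, "bytes": 109674, "sha256": "bbbb"},
}
print(parity_failures(source, mirrors))
```

Checking size before digest keeps the reported reason specific: a truncated upload shows up as a size mismatch rather than only as a differing hash.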
@@ -825,44 +825,44 @@
|
|
| 825 |
"path": "repo:docs/data/publication_audit.json",
|
| 826 |
"exists": true,
|
| 827 |
"bytes": 8299,
|
| 828 |
-
"sha256": "
|
| 829 |
},
|
| 830 |
"mirrors": {
|
| 831 |
"hf_space": {
|
| 832 |
"path": "hf_space:data/publication_audit.json",
|
| 833 |
"exists": true,
|
| 834 |
"bytes": 8299,
|
| 835 |
-
"sha256": "
|
| 836 |
},
|
| 837 |
"hf_artifacts_data": {
|
| 838 |
"path": "hf_artifacts:data/publication_audit.json",
|
| 839 |
"exists": true,
|
| 840 |
"bytes": 8299,
|
| 841 |
-
"sha256": "
|
| 842 |
},
|
| 843 |
"hf_artifacts": {
|
| 844 |
"path": "hf_artifacts:docs/data/publication_audit.json",
|
| 845 |
"exists": true,
|
| 846 |
"bytes": 8299,
|
| 847 |
-
"sha256": "
|
| 848 |
},
|
| 849 |
"hf_model_data": {
|
| 850 |
"path": "hf_model:data/publication_audit.json",
|
| 851 |
"exists": true,
|
| 852 |
"bytes": 8299,
|
| 853 |
-
"sha256": "
|
| 854 |
},
|
| 855 |
"hf_model_docs_data": {
|
| 856 |
"path": "hf_model:docs/data/publication_audit.json",
|
| 857 |
"exists": true,
|
| 858 |
"bytes": 8299,
|
| 859 |
-
"sha256": "
|
| 860 |
},
|
| 861 |
"hf_model": {
|
| 862 |
"path": "hf_model:metrics/publication_audit.json",
|
| 863 |
"exists": true,
|
| 864 |
"bytes": 8299,
|
| 865 |
-
"sha256": "
|
| 866 |
}
|
| 867 |
},
|
| 868 |
"failures": []
|
|
@@ -874,44 +874,44 @@
|
|
| 874 |
"path": "repo:docs/data/public_surface_qa.json",
|
| 875 |
"exists": true,
|
| 876 |
"bytes": 6146,
|
| 877 |
-
"sha256": "
|
| 878 |
},
|
| 879 |
"mirrors": {
|
| 880 |
"hf_space": {
|
| 881 |
"path": "hf_space:data/public_surface_qa.json",
|
| 882 |
"exists": true,
|
| 883 |
"bytes": 6146,
|
| 884 |
-
"sha256": "
|
| 885 |
},
|
| 886 |
"hf_artifacts_data": {
|
| 887 |
"path": "hf_artifacts:data/public_surface_qa.json",
|
| 888 |
"exists": true,
|
| 889 |
"bytes": 6146,
|
| 890 |
-
"sha256": "
|
| 891 |
},
|
| 892 |
"hf_artifacts": {
|
| 893 |
"path": "hf_artifacts:docs/data/public_surface_qa.json",
|
| 894 |
"exists": true,
|
| 895 |
"bytes": 6146,
|
| 896 |
-
"sha256": "
|
| 897 |
},
|
| 898 |
"hf_model_data": {
|
| 899 |
"path": "hf_model:data/public_surface_qa.json",
|
| 900 |
"exists": true,
|
| 901 |
"bytes": 6146,
|
| 902 |
-
"sha256": "
|
| 903 |
},
|
| 904 |
"hf_model_docs_data": {
|
| 905 |
"path": "hf_model:docs/data/public_surface_qa.json",
|
| 906 |
"exists": true,
|
| 907 |
"bytes": 6146,
|
| 908 |
-
"sha256": "
|
| 909 |
},
|
| 910 |
"hf_model": {
|
| 911 |
"path": "hf_model:metrics/public_surface_qa.json",
|
| 912 |
"exists": true,
|
| 913 |
"bytes": 6146,
|
| 914 |
-
"sha256": "
|
| 915 |
}
|
| 916 |
},
|
| 917 |
"failures": []
|
|
@@ -1560,44 +1560,44 @@
|
|
| 1560 |
"path": "repo:docs/data/source_alignment_audit.json",
|
| 1561 |
"exists": true,
|
| 1562 |
"bytes": 4432,
|
| 1563 |
-
"sha256": "
|
| 1564 |
},
|
| 1565 |
"mirrors": {
|
| 1566 |
"hf_space": {
|
| 1567 |
"path": "hf_space:data/source_alignment_audit.json",
|
| 1568 |
"exists": true,
|
| 1569 |
"bytes": 4432,
|
| 1570 |
-
"sha256": "
|
| 1571 |
},
|
| 1572 |
"hf_artifacts_data": {
|
| 1573 |
"path": "hf_artifacts:data/source_alignment_audit.json",
|
| 1574 |
"exists": true,
|
| 1575 |
"bytes": 4432,
|
| 1576 |
-
"sha256": "
|
| 1577 |
},
|
| 1578 |
"hf_artifacts": {
|
| 1579 |
"path": "hf_artifacts:docs/data/source_alignment_audit.json",
|
| 1580 |
"exists": true,
|
| 1581 |
"bytes": 4432,
|
| 1582 |
-
"sha256": "
|
| 1583 |
},
|
| 1584 |
"hf_model_data": {
|
| 1585 |
"path": "hf_model:data/source_alignment_audit.json",
|
| 1586 |
"exists": true,
|
| 1587 |
"bytes": 4432,
|
| 1588 |
-
"sha256": "
|
| 1589 |
},
|
| 1590 |
"hf_model_docs_data": {
|
| 1591 |
"path": "hf_model:docs/data/source_alignment_audit.json",
|
| 1592 |
"exists": true,
|
| 1593 |
"bytes": 4432,
|
| 1594 |
-
"sha256": "
|
| 1595 |
},
|
| 1596 |
"hf_model": {
|
| 1597 |
"path": "hf_model:metrics/source_alignment_audit.json",
|
| 1598 |
"exists": true,
|
| 1599 |
"bytes": 4432,
|
| 1600 |
-
"sha256": "
|
| 1601 |
}
|
| 1602 |
},
|
| 1603 |
"failures": []
|
|
@@ -1658,44 +1658,44 @@
|
|
| 1658 |
"path": "repo:docs/data/single_episode_task_model_radar.json",
|
| 1659 |
"exists": true,
|
| 1660 |
"bytes": 50973,
|
| 1661 |
-
"sha256": "
|
| 1662 |
},
|
| 1663 |
"mirrors": {
|
| 1664 |
"hf_space": {
|
| 1665 |
"path": "hf_space:data/single_episode_task_model_radar.json",
|
| 1666 |
"exists": true,
|
| 1667 |
"bytes": 50973,
|
| 1668 |
-
"sha256": "
|
| 1669 |
},
|
| 1670 |
"hf_artifacts_data": {
|
| 1671 |
"path": "hf_artifacts:data/single_episode_task_model_radar.json",
|
| 1672 |
"exists": true,
|
| 1673 |
"bytes": 50973,
|
| 1674 |
-
"sha256": "
|
| 1675 |
},
|
| 1676 |
"hf_artifacts": {
|
| 1677 |
"path": "hf_artifacts:docs/data/single_episode_task_model_radar.json",
|
| 1678 |
"exists": true,
|
| 1679 |
"bytes": 50973,
|
| 1680 |
-
"sha256": "
|
| 1681 |
},
|
| 1682 |
"hf_model_data": {
|
| 1683 |
"path": "hf_model:data/single_episode_task_model_radar.json",
|
| 1684 |
"exists": true,
|
| 1685 |
"bytes": 50973,
|
| 1686 |
-
"sha256": "
|
| 1687 |
},
|
| 1688 |
"hf_model_docs_data": {
|
| 1689 |
"path": "hf_model:docs/data/single_episode_task_model_radar.json",
|
| 1690 |
"exists": true,
|
| 1691 |
"bytes": 50973,
|
| 1692 |
-
"sha256": "
|
| 1693 |
},
|
| 1694 |
"hf_model": {
|
| 1695 |
"path": "hf_model:metrics/single_episode_task_model_radar.json",
|
| 1696 |
"exists": true,
|
| 1697 |
"bytes": 50973,
|
| 1698 |
-
"sha256": "
|
| 1699 |
}
|
| 1700 |
},
|
| 1701 |
"failures": []
|
|
@@ -1707,44 +1707,44 @@
|
|
| 1707 |
"path": "repo:docs/data/episode128_task_model_radar.json",
|
| 1708 |
"exists": true,
|
| 1709 |
"bytes": 187388,
|
| 1710 |
-
"sha256": "
|
| 1711 |
},
|
| 1712 |
"mirrors": {
|
| 1713 |
"hf_space": {
|
| 1714 |
"path": "hf_space:data/episode128_task_model_radar.json",
|
| 1715 |
"exists": true,
|
| 1716 |
"bytes": 187388,
|
| 1717 |
-
"sha256": "
|
| 1718 |
},
|
| 1719 |
"hf_artifacts_data": {
|
| 1720 |
"path": "hf_artifacts:data/episode128_task_model_radar.json",
|
| 1721 |
"exists": true,
|
| 1722 |
"bytes": 187388,
|
| 1723 |
-
"sha256": "
|
| 1724 |
},
|
| 1725 |
"hf_artifacts": {
|
| 1726 |
"path": "hf_artifacts:docs/data/episode128_task_model_radar.json",
|
| 1727 |
"exists": true,
|
| 1728 |
"bytes": 187388,
|
| 1729 |
-
"sha256": "
|
| 1730 |
},
|
| 1731 |
"hf_model_data": {
|
| 1732 |
"path": "hf_model:data/episode128_task_model_radar.json",
|
| 1733 |
"exists": true,
|
| 1734 |
"bytes": 187388,
|
| 1735 |
-
"sha256": "
|
| 1736 |
},
|
| 1737 |
"hf_model_docs_data": {
|
| 1738 |
"path": "hf_model:docs/data/episode128_task_model_radar.json",
|
| 1739 |
"exists": true,
|
| 1740 |
"bytes": 187388,
|
| 1741 |
-
"sha256": "
|
| 1742 |
},
|
| 1743 |
"hf_model": {
|
| 1744 |
"path": "hf_model:metrics/episode128_task_model_radar.json",
|
| 1745 |
"exists": true,
|
| 1746 |
"bytes": 187388,
|
| 1747 |
-
"sha256": "
|
| 1748 |
}
|
| 1749 |
},
|
| 1750 |
"failures": []
|
|
@@ -1903,44 +1903,44 @@
|
|
| 1903 |
"path": "repo:docs/data/task_surface_integrity.json",
|
| 1904 |
"exists": true,
|
| 1905 |
"bytes": 45779,
|
| 1906 |
-
"sha256": "
|
| 1907 |
},
|
| 1908 |
"mirrors": {
|
| 1909 |
"hf_space": {
|
| 1910 |
"path": "hf_space:data/task_surface_integrity.json",
|
| 1911 |
"exists": true,
|
| 1912 |
"bytes": 45779,
|
| 1913 |
-
"sha256": "
|
| 1914 |
},
|
| 1915 |
"hf_artifacts_data": {
|
| 1916 |
"path": "hf_artifacts:data/task_surface_integrity.json",
|
| 1917 |
"exists": true,
|
| 1918 |
"bytes": 45779,
|
| 1919 |
-
"sha256": "
|
| 1920 |
},
|
| 1921 |
"hf_artifacts": {
|
| 1922 |
"path": "hf_artifacts:docs/data/task_surface_integrity.json",
|
| 1923 |
"exists": true,
|
| 1924 |
"bytes": 45779,
|
| 1925 |
-
"sha256": "
|
| 1926 |
},
|
| 1927 |
"hf_model_data": {
|
| 1928 |
"path": "hf_model:data/task_surface_integrity.json",
|
| 1929 |
"exists": true,
|
| 1930 |
"bytes": 45779,
|
| 1931 |
-
"sha256": "
|
| 1932 |
},
|
| 1933 |
"hf_model_docs_data": {
|
| 1934 |
"path": "hf_model:docs/data/task_surface_integrity.json",
|
| 1935 |
"exists": true,
|
| 1936 |
"bytes": 45779,
|
| 1937 |
-
"sha256": "
|
| 1938 |
},
|
| 1939 |
"hf_model": {
|
| 1940 |
"path": "hf_model:metrics/task_surface_integrity.json",
|
| 1941 |
"exists": true,
|
| 1942 |
"bytes": 45779,
|
| 1943 |
-
"sha256": "
|
| 1944 |
}
|
| 1945 |
},
|
| 1946 |
"failures": []
|
|
@@ -2001,44 +2001,44 @@
|
|
| 2001 |
"path": "repo:docs/data/task_method_20_result_matrix.json",
|
| 2002 |
"exists": true,
|
| 2003 |
"bytes": 129749,
|
| 2004 |
-
"sha256": "
|
| 2005 |
},
|
| 2006 |
"mirrors": {
|
| 2007 |
"hf_space": {
|
| 2008 |
"path": "hf_space:data/task_method_20_result_matrix.json",
|
| 2009 |
"exists": true,
|
| 2010 |
"bytes": 129749,
|
| 2011 |
-
"sha256": "
|
| 2012 |
},
|
| 2013 |
"hf_artifacts_data": {
|
| 2014 |
"path": "hf_artifacts:data/task_method_20_result_matrix.json",
|
| 2015 |
"exists": true,
|
| 2016 |
"bytes": 129749,
|
| 2017 |
-
"sha256": "
|
| 2018 |
},
|
| 2019 |
"hf_artifacts": {
|
| 2020 |
"path": "hf_artifacts:docs/data/task_method_20_result_matrix.json",
|
| 2021 |
"exists": true,
|
| 2022 |
"bytes": 129749,
|
| 2023 |
-
"sha256": "
|
| 2024 |
},
|
| 2025 |
"hf_model_data": {
|
| 2026 |
"path": "hf_model:data/task_method_20_result_matrix.json",
|
| 2027 |
"exists": true,
|
| 2028 |
"bytes": 129749,
|
| 2029 |
-
"sha256": "
|
| 2030 |
},
|
| 2031 |
"hf_model_docs_data": {
|
| 2032 |
"path": "hf_model:docs/data/task_method_20_result_matrix.json",
|
| 2033 |
"exists": true,
|
| 2034 |
"bytes": 129749,
|
| 2035 |
-
"sha256": "
|
| 2036 |
},
|
| 2037 |
"hf_model": {
|
| 2038 |
"path": "hf_model:metrics/task_method_20_result_matrix.json",
|
| 2039 |
"exists": true,
|
| 2040 |
"bytes": 129749,
|
| 2041 |
-
"sha256": "
|
| 2042 |
}
|
| 2043 |
},
|
| 2044 |
"failures": []
|
|
@@ -2050,44 +2050,44 @@
|
|
| 2050 |
"path": "repo:docs/data/task_method_20_gap_audit.json",
|
| 2051 |
"exists": true,
|
| 2052 |
"bytes": 55745,
|
| 2053 |
-
"sha256": "
|
| 2054 |
},
|
| 2055 |
"mirrors": {
|
| 2056 |
"hf_space": {
|
| 2057 |
"path": "hf_space:data/task_method_20_gap_audit.json",
|
| 2058 |
"exists": true,
|
| 2059 |
"bytes": 55745,
|
| 2060 |
-
"sha256": "
|
| 2061 |
},
|
| 2062 |
"hf_artifacts_data": {
|
| 2063 |
"path": "hf_artifacts:data/task_method_20_gap_audit.json",
|
| 2064 |
"exists": true,
|
| 2065 |
"bytes": 55745,
|
| 2066 |
-
"sha256": "
|
| 2067 |
},
|
| 2068 |
"hf_artifacts": {
|
| 2069 |
"path": "hf_artifacts:docs/data/task_method_20_gap_audit.json",
|
| 2070 |
"exists": true,
|
| 2071 |
"bytes": 55745,
|
| 2072 |
-
"sha256": "
|
| 2073 |
},
|
| 2074 |
"hf_model_data": {
|
| 2075 |
"path": "hf_model:data/task_method_20_gap_audit.json",
|
| 2076 |
"exists": true,
|
| 2077 |
"bytes": 55745,
|
| 2078 |
-
"sha256": "
|
| 2079 |
},
|
| 2080 |
"hf_model_docs_data": {
|
| 2081 |
"path": "hf_model:docs/data/task_method_20_gap_audit.json",
|
| 2082 |
"exists": true,
|
| 2083 |
"bytes": 55745,
|
| 2084 |
-
"sha256": "
|
| 2085 |
},
|
| 2086 |
"hf_model": {
|
| 2087 |
"path": "hf_model:metrics/task_method_20_gap_audit.json",
|
| 2088 |
"exists": true,
|
| 2089 |
"bytes": 55745,
|
| 2090 |
-
"sha256": "
|
| 2091 |
}
|
| 2092 |
},
|
| 2093 |
"failures": []
|
|
@@ -2148,44 +2148,44 @@
|
|
| 2148 |
"path": "repo:docs/data/unified_task_model_radar.json",
|
| 2149 |
"exists": true,
|
| 2150 |
"bytes": 231240,
|
| 2151 |
-
"sha256": "
|
| 2152 |
},
|
| 2153 |
"mirrors": {
|
| 2154 |
"hf_space": {
|
| 2155 |
"path": "hf_space:data/unified_task_model_radar.json",
|
| 2156 |
"exists": true,
|
| 2157 |
"bytes": 231240,
|
| 2158 |
-
"sha256": "
|
| 2159 |
},
|
| 2160 |
"hf_artifacts_data": {
|
| 2161 |
"path": "hf_artifacts:data/unified_task_model_radar.json",
|
| 2162 |
"exists": true,
|
| 2163 |
"bytes": 231240,
|
| 2164 |
-
"sha256": "
|
| 2165 |
},
|
| 2166 |
"hf_artifacts": {
|
| 2167 |
"path": "hf_artifacts:docs/data/unified_task_model_radar.json",
|
| 2168 |
"exists": true,
|
| 2169 |
"bytes": 231240,
|
| 2170 |
-
"sha256": "
|
| 2171 |
},
|
| 2172 |
"hf_model_data": {
|
| 2173 |
"path": "hf_model:data/unified_task_model_radar.json",
|
| 2174 |
"exists": true,
|
| 2175 |
"bytes": 231240,
|
| 2176 |
-
"sha256": "
|
| 2177 |
},
|
| 2178 |
"hf_model_docs_data": {
|
| 2179 |
"path": "hf_model:docs/data/unified_task_model_radar.json",
|
| 2180 |
"exists": true,
|
| 2181 |
"bytes": 231240,
|
| 2182 |
-
"sha256": "
|
| 2183 |
},
|
| 2184 |
"hf_model": {
|
| 2185 |
"path": "hf_model:metrics/unified_task_model_radar.json",
|
| 2186 |
"exists": true,
|
| 2187 |
"bytes": 231240,
|
| 2188 |
-
"sha256": "
|
| 2189 |
}
|
| 2190 |
},
|
| 2191 |
"failures": []
|
|
@@ -2197,44 +2197,44 @@
|
|
| 2197 |
"path": "repo:docs/data/website_integrity.json",
|
| 2198 |
"exists": true,
|
| 2199 |
"bytes": 19052,
|
| 2200 |
-
"sha256": "
|
| 2201 |
},
|
| 2202 |
"mirrors": {
|
| 2203 |
"hf_space": {
|
| 2204 |
"path": "hf_space:data/website_integrity.json",
|
| 2205 |
"exists": true,
|
| 2206 |
"bytes": 19052,
|
| 2207 |
-
"sha256": "
|
| 2208 |
},
|
| 2209 |
"hf_artifacts_data": {
|
| 2210 |
"path": "hf_artifacts:data/website_integrity.json",
|
| 2211 |
"exists": true,
|
| 2212 |
"bytes": 19052,
|
| 2213 |
-
"sha256": "
|
| 2214 |
},
|
| 2215 |
"hf_artifacts": {
|
| 2216 |
"path": "hf_artifacts:docs/data/website_integrity.json",
|
| 2217 |
"exists": true,
|
| 2218 |
"bytes": 19052,
|
| 2219 |
-
"sha256": "
|
| 2220 |
},
|
| 2221 |
"hf_model_data": {
|
| 2222 |
"path": "hf_model:data/website_integrity.json",
|
| 2223 |
"exists": true,
|
| 2224 |
"bytes": 19052,
|
| 2225 |
-
"sha256": "
|
| 2226 |
},
|
| 2227 |
"hf_model_docs_data": {
|
| 2228 |
"path": "hf_model:docs/data/website_integrity.json",
|
| 2229 |
"exists": true,
|
| 2230 |
"bytes": 19052,
|
| 2231 |
-
"sha256": "
|
| 2232 |
},
|
| 2233 |
"hf_model": {
|
| 2234 |
"path": "hf_model:metrics/website_integrity.json",
|
| 2235 |
"exists": true,
|
| 2236 |
"bytes": 19052,
|
| 2237 |
-
"sha256": "
|
| 2238 |
}
|
| 2239 |
},
|
| 2240 |
"failures": []
|
|
@@ -3430,21 +3430,21 @@
|
|
| 3430 |
"local": {
|
| 3431 |
"path": "repo:scripts/omni/collect_qwen3_future_task_probe_results.sh",
|
| 3432 |
"exists": true,
|
| 3433 |
-
"bytes":
|
| 3434 |
-
"sha256": "
|
| 3435 |
},
|
| 3436 |
"mirrors": {
|
| 3437 |
"hf_artifacts": {
|
| 3438 |
"path": "hf_artifacts:scripts/omni/collect_qwen3_future_task_probe_results.sh",
|
| 3439 |
"exists": true,
|
| 3440 |
-
"bytes":
|
| 3441 |
-
"sha256": "
|
| 3442 |
},
|
| 3443 |
"hf_model": {
|
| 3444 |
"path": "hf_model:scripts/omni/collect_qwen3_future_task_probe_results.sh",
|
| 3445 |
"exists": true,
|
| 3446 |
-
"bytes":
|
| 3447 |
-
"sha256": "
|
| 3448 |
}
|
| 3449 |
},
|
| 3450 |
"failures": []
|
|
@@ -3530,21 +3530,21 @@
|
|
| 3530 |
"local": {
|
| 3531 |
"path": "repo:scripts/omni/eval_qwen3_omni_future_task_probes.py",
|
| 3532 |
"exists": true,
|
| 3533 |
-
"bytes":
|
| 3534 |
-
"sha256": "
|
| 3535 |
},
|
| 3536 |
"mirrors": {
|
| 3537 |
"hf_artifacts": {
|
| 3538 |
"path": "hf_artifacts:scripts/omni/eval_qwen3_omni_future_task_probes.py",
|
| 3539 |
"exists": true,
|
| 3540 |
-
"bytes":
|
| 3541 |
-
"sha256": "
|
| 3542 |
},
|
| 3543 |
"hf_model": {
|
| 3544 |
"path": "hf_model:scripts/omni/eval_qwen3_omni_future_task_probes.py",
|
| 3545 |
"exists": true,
|
| 3546 |
-
"bytes":
|
| 3547 |
-
"sha256": "
|
| 3548 |
}
|
| 3549 |
},
|
| 3550 |
"failures": []
|
|
@@ -4280,21 +4280,21 @@
|
|
| 4280 |
"local": {
|
| 4281 |
"path": "repo:scripts/build_unified_task_model_radar.py",
|
| 4282 |
"exists": true,
|
| 4283 |
-
"bytes":
|
| 4284 |
-
"sha256": "
|
| 4285 |
},
|
| 4286 |
"mirrors": {
|
| 4287 |
"hf_artifacts": {
|
| 4288 |
"path": "hf_artifacts:scripts/build_unified_task_model_radar.py",
|
| 4289 |
"exists": true,
|
| 4290 |
-
"bytes":
|
| 4291 |
-
"sha256": "
|
| 4292 |
},
|
| 4293 |
"hf_model": {
|
| 4294 |
"path": "hf_model:scripts/build_unified_task_model_radar.py",
|
| 4295 |
"exists": true,
|
| 4296 |
-
"bytes":
|
| 4297 |
-
"sha256": "
|
| 4298 |
}
|
| 4299 |
},
|
| 4300 |
"failures": []
|
|
@@ -4330,21 +4330,21 @@
|
|
| 4330 |
"local": {
|
| 4331 |
"path": "repo:scripts/verify_live_publication.py",
|
| 4332 |
"exists": true,
|
| 4333 |
-
"bytes":
|
| 4334 |
-
"sha256": "
|
| 4335 |
},
|
| 4336 |
"mirrors": {
|
| 4337 |
"hf_artifacts": {
|
| 4338 |
"path": "hf_artifacts:scripts/verify_live_publication.py",
|
| 4339 |
"exists": true,
|
| 4340 |
-
"bytes":
|
| 4341 |
-
"sha256": "
|
| 4342 |
},
|
| 4343 |
"hf_model": {
|
| 4344 |
"path": "hf_model:scripts/verify_live_publication.py",
|
| 4345 |
"exists": true,
|
| 4346 |
-
"bytes":
|
| 4347 |
-
"sha256": "
|
| 4348 |
}
|
| 4349 |
},
|
| 4350 |
"failures": []
|
|
@@ -19545,26 +19545,26 @@
|
|
| 19545 |
"path": "repo:TASK_METHOD_20_GAP_AUDIT.md",
|
| 19546 |
"exists": true,
|
| 19547 |
"bytes": 15690,
|
| 19548 |
-
"sha256": "
|
| 19549 |
},
|
| 19550 |
"mirrors": {
|
| 19551 |
"hf_space": {
|
| 19552 |
"path": "hf_space:TASK_METHOD_20_GAP_AUDIT.md",
|
| 19553 |
"exists": true,
|
| 19554 |
"bytes": 15690,
|
| 19555 |
-
"sha256": "
|
| 19556 |
},
|
| 19557 |
"hf_artifacts": {
|
| 19558 |
"path": "hf_artifacts:TASK_METHOD_20_GAP_AUDIT.md",
|
| 19559 |
"exists": true,
|
| 19560 |
"bytes": 15690,
|
| 19561 |
-
"sha256": "
|
| 19562 |
},
|
| 19563 |
"hf_model": {
|
| 19564 |
"path": "hf_model:TASK_METHOD_20_GAP_AUDIT.md",
|
| 19565 |
"exists": true,
|
| 19566 |
"bytes": 15690,
|
| 19567 |
-
"sha256": "
|
| 19568 |
}
|
| 19569 |
},
|
| 19570 |
"failures": []
|
|
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
+
"generated_at_utc": "2026-06-17T13:55:47+00:00",
|
| 4 |
"hf_root": "hf_publish",
|
| 5 |
"summary": {
|
| 6 |
"group_count": 611,
|
|
|
|
| 139 |
"path": "repo:docs/data/artifact_index.json",
|
| 140 |
"exists": true,
|
| 141 |
"bytes": 109674,
|
| 142 |
+
"sha256": "9bec12c02579b9a14296a6f88f6fa2dcfb339d730f0d1068d9e55a7015bfbcc5"
|
| 143 |
},
|
| 144 |
"mirrors": {
|
| 145 |
"hf_space": {
|
| 146 |
"path": "hf_space:data/artifact_index.json",
|
| 147 |
"exists": true,
|
| 148 |
"bytes": 109674,
|
| 149 |
+
"sha256": "9bec12c02579b9a14296a6f88f6fa2dcfb339d730f0d1068d9e55a7015bfbcc5"
|
| 150 |
},
|
| 151 |
"hf_artifacts_data": {
|
| 152 |
"path": "hf_artifacts:data/artifact_index.json",
|
| 153 |
"exists": true,
|
| 154 |
"bytes": 109674,
|
| 155 |
+
"sha256": "9bec12c02579b9a14296a6f88f6fa2dcfb339d730f0d1068d9e55a7015bfbcc5"
|
| 156 |
},
|
| 157 |
"hf_artifacts": {
|
| 158 |
"path": "hf_artifacts:docs/data/artifact_index.json",
|
| 159 |
"exists": true,
|
| 160 |
"bytes": 109674,
|
| 161 |
+
"sha256": "9bec12c02579b9a14296a6f88f6fa2dcfb339d730f0d1068d9e55a7015bfbcc5"
|
| 162 |
},
|
| 163 |
"hf_model_data": {
|
| 164 |
"path": "hf_model:data/artifact_index.json",
|
| 165 |
"exists": true,
|
| 166 |
"bytes": 109674,
|
| 167 |
+
"sha256": "9bec12c02579b9a14296a6f88f6fa2dcfb339d730f0d1068d9e55a7015bfbcc5"
|
| 168 |
},
|
| 169 |
"hf_model_docs_data": {
|
| 170 |
"path": "hf_model:docs/data/artifact_index.json",
|
| 171 |
"exists": true,
|
| 172 |
"bytes": 109674,
|
| 173 |
+
"sha256": "9bec12c02579b9a14296a6f88f6fa2dcfb339d730f0d1068d9e55a7015bfbcc5"
|
| 174 |
},
|
| 175 |
"hf_model": {
|
| 176 |
"path": "hf_model:metrics/artifact_index.json",
|
| 177 |
"exists": true,
|
| 178 |
"bytes": 109674,
|
| 179 |
+
"sha256": "9bec12c02579b9a14296a6f88f6fa2dcfb339d730f0d1068d9e55a7015bfbcc5"
|
| 180 |
}
|
| 181 |
},
|
| 182 |
"failures": []
|
|
|
|
| 825 |
"path": "repo:docs/data/publication_audit.json",
|
| 826 |
"exists": true,
|
| 827 |
"bytes": 8299,
|
| 828 |
+
"sha256": "3d34bd58cd7f7a682d2a3a37786eb21db051d87ebec28c561b117b2c2388cee4"
|
| 829 |
},
|
| 830 |
"mirrors": {
|
| 831 |
"hf_space": {
|
| 832 |
"path": "hf_space:data/publication_audit.json",
|
| 833 |
"exists": true,
|
| 834 |
"bytes": 8299,
|
| 835 |
+
"sha256": "3d34bd58cd7f7a682d2a3a37786eb21db051d87ebec28c561b117b2c2388cee4"
|
| 836 |
},
|
| 837 |
"hf_artifacts_data": {
|
| 838 |
"path": "hf_artifacts:data/publication_audit.json",
|
| 839 |
"exists": true,
|
| 840 |
"bytes": 8299,
|
| 841 |
+
"sha256": "3d34bd58cd7f7a682d2a3a37786eb21db051d87ebec28c561b117b2c2388cee4"
|
| 842 |
},
|
| 843 |
"hf_artifacts": {
|
| 844 |
"path": "hf_artifacts:docs/data/publication_audit.json",
|
| 845 |
"exists": true,
|
| 846 |
"bytes": 8299,
|
| 847 |
+
"sha256": "3d34bd58cd7f7a682d2a3a37786eb21db051d87ebec28c561b117b2c2388cee4"
|
| 848 |
},
|
| 849 |
"hf_model_data": {
|
| 850 |
"path": "hf_model:data/publication_audit.json",
|
| 851 |
"exists": true,
|
| 852 |
"bytes": 8299,
|
| 853 |
+
"sha256": "3d34bd58cd7f7a682d2a3a37786eb21db051d87ebec28c561b117b2c2388cee4"
|
| 854 |
},
|
| 855 |
"hf_model_docs_data": {
|
| 856 |
"path": "hf_model:docs/data/publication_audit.json",
|
| 857 |
"exists": true,
|
| 858 |
"bytes": 8299,
|
| 859 |
+
"sha256": "3d34bd58cd7f7a682d2a3a37786eb21db051d87ebec28c561b117b2c2388cee4"
|
| 860 |
},
|
| 861 |
"hf_model": {
|
| 862 |
"path": "hf_model:metrics/publication_audit.json",
|
| 863 |
"exists": true,
|
| 864 |
"bytes": 8299,
|
| 865 |
+
"sha256": "3d34bd58cd7f7a682d2a3a37786eb21db051d87ebec28c561b117b2c2388cee4"
|
| 866 |
}
|
| 867 |
},
|
| 868 |
"failures": []
|
|
|
|
| 874 |
"path": "repo:docs/data/public_surface_qa.json",
|
| 875 |
"exists": true,
|
| 876 |
"bytes": 6146,
|
| 877 |
+
"sha256": "3e4cc531cf1c69099ffdf65073af9afbad473f86ac4049e8078e71dee7427a3b"
|
| 878 |
},
|
| 879 |
"mirrors": {
|
| 880 |
"hf_space": {
|
| 881 |
"path": "hf_space:data/public_surface_qa.json",
|
| 882 |
"exists": true,
|
| 883 |
"bytes": 6146,
|
| 884 |
+
"sha256": "3e4cc531cf1c69099ffdf65073af9afbad473f86ac4049e8078e71dee7427a3b"
|
| 885 |
},
|
| 886 |
"hf_artifacts_data": {
|
| 887 |
"path": "hf_artifacts:data/public_surface_qa.json",
|
| 888 |
"exists": true,
|
| 889 |
"bytes": 6146,
|
| 890 |
+
"sha256": "3e4cc531cf1c69099ffdf65073af9afbad473f86ac4049e8078e71dee7427a3b"
|
| 891 |
},
|
| 892 |
"hf_artifacts": {
|
| 893 |
"path": "hf_artifacts:docs/data/public_surface_qa.json",
|
| 894 |
"exists": true,
|
| 895 |
"bytes": 6146,
|
| 896 |
+
"sha256": "3e4cc531cf1c69099ffdf65073af9afbad473f86ac4049e8078e71dee7427a3b"
|
| 897 |
},
|
| 898 |
"hf_model_data": {
|
| 899 |
"path": "hf_model:data/public_surface_qa.json",
|
| 900 |
"exists": true,
|
| 901 |
"bytes": 6146,
|
| 902 |
+
"sha256": "3e4cc531cf1c69099ffdf65073af9afbad473f86ac4049e8078e71dee7427a3b"
|
| 903 |
},
|
| 904 |
"hf_model_docs_data": {
|
| 905 |
"path": "hf_model:docs/data/public_surface_qa.json",
|
| 906 |
"exists": true,
|
| 907 |
"bytes": 6146,
|
| 908 |
+
"sha256": "3e4cc531cf1c69099ffdf65073af9afbad473f86ac4049e8078e71dee7427a3b"
|
| 909 |
},
|
| 910 |
"hf_model": {
|
| 911 |
"path": "hf_model:metrics/public_surface_qa.json",
|
| 912 |
"exists": true,
|
| 913 |
"bytes": 6146,
|
| 914 |
+
"sha256": "3e4cc531cf1c69099ffdf65073af9afbad473f86ac4049e8078e71dee7427a3b"
|
| 915 |
}
|
| 916 |
},
|
| 917 |
"failures": []
|
|
|
|
| 1560 |
"path": "repo:docs/data/source_alignment_audit.json",
|
| 1561 |
"exists": true,
|
| 1562 |
"bytes": 4432,
|
| 1563 |
+
"sha256": "5d013c6820fb7f582b4b4b9a55f98de20168ea1947d4bea64e11d16dbd521428"
|
| 1564 |
},
|
| 1565 |
"mirrors": {
|
| 1566 |
"hf_space": {
|
| 1567 |
"path": "hf_space:data/source_alignment_audit.json",
|
| 1568 |
"exists": true,
|
| 1569 |
"bytes": 4432,
|
| 1570 |
+
"sha256": "5d013c6820fb7f582b4b4b9a55f98de20168ea1947d4bea64e11d16dbd521428"
|
| 1571 |
},
|
| 1572 |
"hf_artifacts_data": {
|
| 1573 |
"path": "hf_artifacts:data/source_alignment_audit.json",
|
| 1574 |
"exists": true,
|
| 1575 |
"bytes": 4432,
|
| 1576 |
+
"sha256": "5d013c6820fb7f582b4b4b9a55f98de20168ea1947d4bea64e11d16dbd521428"
|
| 1577 |
},
|
| 1578 |
"hf_artifacts": {
|
| 1579 |
"path": "hf_artifacts:docs/data/source_alignment_audit.json",
|
| 1580 |
"exists": true,
|
| 1581 |
"bytes": 4432,
|
| 1582 |
+
"sha256": "5d013c6820fb7f582b4b4b9a55f98de20168ea1947d4bea64e11d16dbd521428"
|
| 1583 |
},
|
| 1584 |
"hf_model_data": {
|
| 1585 |
"path": "hf_model:data/source_alignment_audit.json",
|
| 1586 |
"exists": true,
|
| 1587 |
"bytes": 4432,
|
| 1588 |
+
"sha256": "5d013c6820fb7f582b4b4b9a55f98de20168ea1947d4bea64e11d16dbd521428"
|
| 1589 |
},
|
| 1590 |
"hf_model_docs_data": {
|
| 1591 |
"path": "hf_model:docs/data/source_alignment_audit.json",
|
| 1592 |
"exists": true,
|
| 1593 |
"bytes": 4432,
|
| 1594 |
+
"sha256": "5d013c6820fb7f582b4b4b9a55f98de20168ea1947d4bea64e11d16dbd521428"
|
| 1595 |
},
|
| 1596 |
"hf_model": {
|
| 1597 |
"path": "hf_model:metrics/source_alignment_audit.json",
|
| 1598 |
"exists": true,
|
| 1599 |
"bytes": 4432,
|
| 1600 |
+
"sha256": "5d013c6820fb7f582b4b4b9a55f98de20168ea1947d4bea64e11d16dbd521428"
|
| 1601 |
}
|
| 1602 |
},
|
| 1603 |
"failures": []
|
|
|
|
| 1658 |
"path": "repo:docs/data/single_episode_task_model_radar.json",
|
| 1659 |
"exists": true,
|
| 1660 |
"bytes": 50973,
|
| 1661 |
+
"sha256": "07a6a8026e48e60d6a1ee0686d615645590ac3d95cc938fc9f0b26cbdea5d3a6"
|
| 1662 |
},
|
| 1663 |
"mirrors": {
|
| 1664 |
"hf_space": {
|
| 1665 |
"path": "hf_space:data/single_episode_task_model_radar.json",
|
| 1666 |
"exists": true,
|
| 1667 |
"bytes": 50973,
|
| 1668 |
+
"sha256": "07a6a8026e48e60d6a1ee0686d615645590ac3d95cc938fc9f0b26cbdea5d3a6"
|
| 1669 |
},
|
| 1670 |
"hf_artifacts_data": {
|
| 1671 |
"path": "hf_artifacts:data/single_episode_task_model_radar.json",
|
| 1672 |
"exists": true,
|
| 1673 |
"bytes": 50973,
|
| 1674 |
+
"sha256": "07a6a8026e48e60d6a1ee0686d615645590ac3d95cc938fc9f0b26cbdea5d3a6"
|
| 1675 |
},
|
| 1676 |
"hf_artifacts": {
|
| 1677 |
"path": "hf_artifacts:docs/data/single_episode_task_model_radar.json",
|
| 1678 |
"exists": true,
|
| 1679 |
"bytes": 50973,
|
| 1680 |
+
"sha256": "07a6a8026e48e60d6a1ee0686d615645590ac3d95cc938fc9f0b26cbdea5d3a6"
|
| 1681 |
},
|
| 1682 |
"hf_model_data": {
|
| 1683 |
"path": "hf_model:data/single_episode_task_model_radar.json",
|
| 1684 |
"exists": true,
|
| 1685 |
"bytes": 50973,
|
| 1686 |
+
"sha256": "07a6a8026e48e60d6a1ee0686d615645590ac3d95cc938fc9f0b26cbdea5d3a6"
|
| 1687 |
},
|
| 1688 |
"hf_model_docs_data": {
|
| 1689 |
"path": "hf_model:docs/data/single_episode_task_model_radar.json",
|
| 1690 |
"exists": true,
|
| 1691 |
"bytes": 50973,
|
| 1692 |
+
"sha256": "07a6a8026e48e60d6a1ee0686d615645590ac3d95cc938fc9f0b26cbdea5d3a6"
|
| 1693 |
},
|
| 1694 |
"hf_model": {
|
| 1695 |
"path": "hf_model:metrics/single_episode_task_model_radar.json",
|
| 1696 |
"exists": true,
|
| 1697 |
"bytes": 50973,
|
| 1698 |
+
"sha256": "07a6a8026e48e60d6a1ee0686d615645590ac3d95cc938fc9f0b26cbdea5d3a6"
|
| 1699 |
}
|
| 1700 |
},
|
| 1701 |
"failures": []
|
|
|
|
| 1707 |
"path": "repo:docs/data/episode128_task_model_radar.json",
|
| 1708 |
"exists": true,
|
| 1709 |
"bytes": 187388,
|
| 1710 |
+
"sha256": "47e37a1b6bbbb3df98630dfab0de8e39e2c170400d1bce52054967a136dbc58c"
|
| 1711 |
},
|
| 1712 |
"mirrors": {
|
| 1713 |
"hf_space": {
|
| 1714 |
"path": "hf_space:data/episode128_task_model_radar.json",
|
| 1715 |
"exists": true,
|
| 1716 |
"bytes": 187388,
|
| 1717 |
+
"sha256": "47e37a1b6bbbb3df98630dfab0de8e39e2c170400d1bce52054967a136dbc58c"
|
| 1718 |
},
|
| 1719 |
"hf_artifacts_data": {
|
| 1720 |
"path": "hf_artifacts:data/episode128_task_model_radar.json",
|
| 1721 |
"exists": true,
|
| 1722 |
"bytes": 187388,
|
| 1723 |
+
"sha256": "47e37a1b6bbbb3df98630dfab0de8e39e2c170400d1bce52054967a136dbc58c"
|
| 1724 |
},
|
| 1725 |
"hf_artifacts": {
|
| 1726 |
"path": "hf_artifacts:docs/data/episode128_task_model_radar.json",
|
| 1727 |
"exists": true,
|
| 1728 |
"bytes": 187388,
|
| 1729 |
+
"sha256": "47e37a1b6bbbb3df98630dfab0de8e39e2c170400d1bce52054967a136dbc58c"
|
| 1730 |
},
|
| 1731 |
"hf_model_data": {
|
| 1732 |
"path": "hf_model:data/episode128_task_model_radar.json",
|
| 1733 |
"exists": true,
|
| 1734 |
"bytes": 187388,
|
| 1735 |
+
"sha256": "47e37a1b6bbbb3df98630dfab0de8e39e2c170400d1bce52054967a136dbc58c"
|
| 1736 |
},
|
| 1737 |
"hf_model_docs_data": {
|
| 1738 |
"path": "hf_model:docs/data/episode128_task_model_radar.json",
|
| 1739 |
"exists": true,
|
| 1740 |
"bytes": 187388,
|
| 1741 |
+
"sha256": "47e37a1b6bbbb3df98630dfab0de8e39e2c170400d1bce52054967a136dbc58c"
|
| 1742 |
},
|
| 1743 |
"hf_model": {
|
| 1744 |
"path": "hf_model:metrics/episode128_task_model_radar.json",
|
| 1745 |
"exists": true,
|
| 1746 |
"bytes": 187388,
|
| 1747 |
+
"sha256": "47e37a1b6bbbb3df98630dfab0de8e39e2c170400d1bce52054967a136dbc58c"
|
| 1748 |
}
|
| 1749 |
},
|
| 1750 |
"failures": []
|
|
|
|
| 1903 |
"path": "repo:docs/data/task_surface_integrity.json",
|
| 1904 |
"exists": true,
|
| 1905 |
"bytes": 45779,
|
| 1906 |
+
"sha256": "bf1b292db388f5a513100369078098b88705445908d8065a2b7907c584e40393"
|
| 1907 |
},
|
| 1908 |
"mirrors": {
|
| 1909 |
"hf_space": {
|
| 1910 |
"path": "hf_space:data/task_surface_integrity.json",
|
| 1911 |
"exists": true,
|
| 1912 |
"bytes": 45779,
|
| 1913 |
+
"sha256": "bf1b292db388f5a513100369078098b88705445908d8065a2b7907c584e40393"
|
| 1914 |
},
|
| 1915 |
"hf_artifacts_data": {
|
| 1916 |
"path": "hf_artifacts:data/task_surface_integrity.json",
|
| 1917 |
"exists": true,
|
| 1918 |
"bytes": 45779,
|
| 1919 |
+
"sha256": "bf1b292db388f5a513100369078098b88705445908d8065a2b7907c584e40393"
|
| 1920 |
},
|
| 1921 |
"hf_artifacts": {
|
| 1922 |
"path": "hf_artifacts:docs/data/task_surface_integrity.json",
|
| 1923 |
"exists": true,
|
| 1924 |
"bytes": 45779,
|
| 1925 |
+
"sha256": "bf1b292db388f5a513100369078098b88705445908d8065a2b7907c584e40393"
|
| 1926 |
},
|
| 1927 |
"hf_model_data": {
|
| 1928 |
"path": "hf_model:data/task_surface_integrity.json",
|
| 1929 |
"exists": true,
|
| 1930 |
"bytes": 45779,
|
| 1931 |
+
"sha256": "bf1b292db388f5a513100369078098b88705445908d8065a2b7907c584e40393"
|
| 1932 |
},
|
| 1933 |
"hf_model_docs_data": {
|
| 1934 |
"path": "hf_model:docs/data/task_surface_integrity.json",
|
| 1935 |
"exists": true,
|
| 1936 |
"bytes": 45779,
|
| 1937 |
+
"sha256": "bf1b292db388f5a513100369078098b88705445908d8065a2b7907c584e40393"
|
| 1938 |
},
|
| 1939 |
"hf_model": {
|
| 1940 |
"path": "hf_model:metrics/task_surface_integrity.json",
|
| 1941 |
"exists": true,
|
| 1942 |
"bytes": 45779,
|
| 1943 |
+
"sha256": "bf1b292db388f5a513100369078098b88705445908d8065a2b7907c584e40393"
|
| 1944 |
}
|
| 1945 |
},
|
| 1946 |
"failures": []
|
|
|
|
| 2001 |
"path": "repo:docs/data/task_method_20_result_matrix.json",
|
| 2002 |
"exists": true,
|
| 2003 |
"bytes": 129749,
|
| 2004 |
+
"sha256": "58636609d9145bce26857ddee8e0fe4751ebee8429d4bef60fbe9d9daf7d2bd4"
|
| 2005 |
},
|
| 2006 |
"mirrors": {
|
| 2007 |
"hf_space": {
|
| 2008 |
"path": "hf_space:data/task_method_20_result_matrix.json",
|
| 2009 |
"exists": true,
|
| 2010 |
"bytes": 129749,
|
| 2011 |
+
"sha256": "58636609d9145bce26857ddee8e0fe4751ebee8429d4bef60fbe9d9daf7d2bd4"
|
| 2012 |
},
|
| 2013 |
"hf_artifacts_data": {
|
| 2014 |
"path": "hf_artifacts:data/task_method_20_result_matrix.json",
|
| 2015 |
"exists": true,
|
| 2016 |
"bytes": 129749,
|
| 2017 |
+
"sha256": "58636609d9145bce26857ddee8e0fe4751ebee8429d4bef60fbe9d9daf7d2bd4"
|
| 2018 |
},
|
| 2019 |
"hf_artifacts": {
|
| 2020 |
"path": "hf_artifacts:docs/data/task_method_20_result_matrix.json",
|
| 2021 |
"exists": true,
|
| 2022 |
"bytes": 129749,
|
| 2023 |
+
"sha256": "58636609d9145bce26857ddee8e0fe4751ebee8429d4bef60fbe9d9daf7d2bd4"
|
| 2024 |
},
|
| 2025 |
"hf_model_data": {
|
| 2026 |
"path": "hf_model:data/task_method_20_result_matrix.json",
|
| 2027 |
"exists": true,
|
| 2028 |
"bytes": 129749,
|
| 2029 |
+
"sha256": "58636609d9145bce26857ddee8e0fe4751ebee8429d4bef60fbe9d9daf7d2bd4"
|
| 2030 |
},
|
| 2031 |
"hf_model_docs_data": {
|
| 2032 |
"path": "hf_model:docs/data/task_method_20_result_matrix.json",
|
| 2033 |
"exists": true,
|
| 2034 |
"bytes": 129749,
|
| 2035 |
+
"sha256": "58636609d9145bce26857ddee8e0fe4751ebee8429d4bef60fbe9d9daf7d2bd4"
|
| 2036 |
},
|
| 2037 |
"hf_model": {
|
| 2038 |
"path": "hf_model:metrics/task_method_20_result_matrix.json",
|
| 2039 |
"exists": true,
|
| 2040 |
"bytes": 129749,
|
| 2041 |
+
"sha256": "58636609d9145bce26857ddee8e0fe4751ebee8429d4bef60fbe9d9daf7d2bd4"
|
| 2042 |
}
|
| 2043 |
},
|
| 2044 |
"failures": []
|
|
|
|
| 2050 |
"path": "repo:docs/data/task_method_20_gap_audit.json",
|
| 2051 |
"exists": true,
|
| 2052 |
"bytes": 55745,
|
| 2053 |
+
"sha256": "7cc10a067d029ae4d55869b2db1181e01fb5063ec5637111255a4f3d79dbb082"
|
| 2054 |
},
|
| 2055 |
"mirrors": {
|
| 2056 |
"hf_space": {
|
| 2057 |
"path": "hf_space:data/task_method_20_gap_audit.json",
|
| 2058 |
"exists": true,
|
| 2059 |
"bytes": 55745,
|
| 2060 |
+
"sha256": "7cc10a067d029ae4d55869b2db1181e01fb5063ec5637111255a4f3d79dbb082"
|
| 2061 |
},
|
| 2062 |
"hf_artifacts_data": {
|
| 2063 |
"path": "hf_artifacts:data/task_method_20_gap_audit.json",
|
| 2064 |
"exists": true,
|
| 2065 |
"bytes": 55745,
|
| 2066 |
+
"sha256": "7cc10a067d029ae4d55869b2db1181e01fb5063ec5637111255a4f3d79dbb082"
|
| 2067 |
},
|
| 2068 |
"hf_artifacts": {
|
| 2069 |
"path": "hf_artifacts:docs/data/task_method_20_gap_audit.json",
|
| 2070 |
"exists": true,
|
| 2071 |
"bytes": 55745,
|
| 2072 |
+
"sha256": "7cc10a067d029ae4d55869b2db1181e01fb5063ec5637111255a4f3d79dbb082"
|
| 2073 |
},
|
| 2074 |
"hf_model_data": {
|
| 2075 |
"path": "hf_model:data/task_method_20_gap_audit.json",
|
| 2076 |
"exists": true,
|
| 2077 |
"bytes": 55745,
|
| 2078 |
+
"sha256": "7cc10a067d029ae4d55869b2db1181e01fb5063ec5637111255a4f3d79dbb082"
|
| 2079 |
},
|
| 2080 |
"hf_model_docs_data": {
|
| 2081 |
"path": "hf_model:docs/data/task_method_20_gap_audit.json",
|
| 2082 |
"exists": true,
|
| 2083 |
"bytes": 55745,
|
| 2084 |
+
"sha256": "7cc10a067d029ae4d55869b2db1181e01fb5063ec5637111255a4f3d79dbb082"
|
| 2085 |
},
|
| 2086 |
"hf_model": {
|
| 2087 |
"path": "hf_model:metrics/task_method_20_gap_audit.json",
|
| 2088 |
"exists": true,
|
| 2089 |
"bytes": 55745,
|
| 2090 |
+
"sha256": "7cc10a067d029ae4d55869b2db1181e01fb5063ec5637111255a4f3d79dbb082"
|
| 2091 |
}
|
| 2092 |
},
|
| 2093 |
"failures": []
|
|
|
|
| 2148 |
"path": "repo:docs/data/unified_task_model_radar.json",
|
| 2149 |
"exists": true,
|
| 2150 |
"bytes": 231240,
|
| 2151 |
+
"sha256": "87eb194c326323167b356448678fc9e2cc4b39610c48e6e14d368d55261d2745"
|
| 2152 |
},
|
| 2153 |
"mirrors": {
|
| 2154 |
"hf_space": {
|
| 2155 |
"path": "hf_space:data/unified_task_model_radar.json",
|
| 2156 |
"exists": true,
|
| 2157 |
"bytes": 231240,
|
| 2158 |
+
"sha256": "87eb194c326323167b356448678fc9e2cc4b39610c48e6e14d368d55261d2745"
|
| 2159 |
},
|
| 2160 |
"hf_artifacts_data": {
|
| 2161 |
"path": "hf_artifacts:data/unified_task_model_radar.json",
|
| 2162 |
"exists": true,
|
| 2163 |
"bytes": 231240,
|
| 2164 |
+
"sha256": "87eb194c326323167b356448678fc9e2cc4b39610c48e6e14d368d55261d2745"
|
| 2165 |
},
|
| 2166 |
"hf_artifacts": {
|
| 2167 |
"path": "hf_artifacts:docs/data/unified_task_model_radar.json",
|
| 2168 |
"exists": true,
|
| 2169 |
"bytes": 231240,
|
| 2170 |
+
"sha256": "87eb194c326323167b356448678fc9e2cc4b39610c48e6e14d368d55261d2745"
|
| 2171 |
},
|
| 2172 |
"hf_model_data": {
|
| 2173 |
"path": "hf_model:data/unified_task_model_radar.json",
|
| 2174 |
"exists": true,
|
| 2175 |
"bytes": 231240,
|
| 2176 |
+
"sha256": "87eb194c326323167b356448678fc9e2cc4b39610c48e6e14d368d55261d2745"
|
| 2177 |
},
|
| 2178 |
"hf_model_docs_data": {
|
| 2179 |
"path": "hf_model:docs/data/unified_task_model_radar.json",
|
| 2180 |
"exists": true,
|
| 2181 |
"bytes": 231240,
|
| 2182 |
+
"sha256": "87eb194c326323167b356448678fc9e2cc4b39610c48e6e14d368d55261d2745"
|
| 2183 |
},
|
| 2184 |
"hf_model": {
|
| 2185 |
"path": "hf_model:metrics/unified_task_model_radar.json",
|
| 2186 |
"exists": true,
|
| 2187 |
"bytes": 231240,
|
| 2188 |
+
"sha256": "87eb194c326323167b356448678fc9e2cc4b39610c48e6e14d368d55261d2745"
|
| 2189 |
}
|
| 2190 |
},
|
| 2191 |
"failures": []
|
|
|
|
| 2197 |
"path": "repo:docs/data/website_integrity.json",
|
| 2198 |
"exists": true,
|
| 2199 |
"bytes": 19052,
|
| 2200 |
+
"sha256": "1be490f4c58971d19e1f9c614f40cbd64a776b8dec350438dae455596dfc182e"
|
| 2201 |
},
|
| 2202 |
"mirrors": {
|
| 2203 |
"hf_space": {
|
| 2204 |
"path": "hf_space:data/website_integrity.json",
|
| 2205 |
"exists": true,
|
| 2206 |
"bytes": 19052,
|
| 2207 |
+
"sha256": "1be490f4c58971d19e1f9c614f40cbd64a776b8dec350438dae455596dfc182e"
|
| 2208 |
},
|
| 2209 |
"hf_artifacts_data": {
|
| 2210 |
"path": "hf_artifacts:data/website_integrity.json",
|
| 2211 |
"exists": true,
|
| 2212 |
"bytes": 19052,
|
| 2213 |
+
"sha256": "1be490f4c58971d19e1f9c614f40cbd64a776b8dec350438dae455596dfc182e"
|
| 2214 |
},
|
| 2215 |
"hf_artifacts": {
|
| 2216 |
"path": "hf_artifacts:docs/data/website_integrity.json",
|
| 2217 |
"exists": true,
|
| 2218 |
"bytes": 19052,
|
| 2219 |
+
"sha256": "1be490f4c58971d19e1f9c614f40cbd64a776b8dec350438dae455596dfc182e"
|
| 2220 |
},
|
| 2221 |
"hf_model_data": {
|
| 2222 |
"path": "hf_model:data/website_integrity.json",
|
| 2223 |
"exists": true,
|
| 2224 |
"bytes": 19052,
|
| 2225 |
+
"sha256": "1be490f4c58971d19e1f9c614f40cbd64a776b8dec350438dae455596dfc182e"
|
| 2226 |
},
|
| 2227 |
"hf_model_docs_data": {
|
| 2228 |
"path": "hf_model:docs/data/website_integrity.json",
|
| 2229 |
"exists": true,
|
| 2230 |
"bytes": 19052,
|
| 2231 |
+
"sha256": "1be490f4c58971d19e1f9c614f40cbd64a776b8dec350438dae455596dfc182e"
|
| 2232 |
},
|
| 2233 |
"hf_model": {
|
| 2234 |
"path": "hf_model:metrics/website_integrity.json",
|
| 2235 |
"exists": true,
|
| 2236 |
"bytes": 19052,
|
| 2237 |
+
"sha256": "1be490f4c58971d19e1f9c614f40cbd64a776b8dec350438dae455596dfc182e"
|
| 2238 |
}
|
| 2239 |
},
|
| 2240 |
"failures": []
|
|
|
|
| 3430 |
"local": {
|
| 3431 |
"path": "repo:scripts/omni/collect_qwen3_future_task_probe_results.sh",
|
| 3432 |
"exists": true,
|
| 3433 |
+
"bytes": 3726,
|
| 3434 |
+
"sha256": "35918a28d6e34acae6f71e667570354f82a1cdbd32816f603d248e19c356980c"
|
| 3435 |
},
|
| 3436 |
"mirrors": {
|
| 3437 |
"hf_artifacts": {
|
| 3438 |
"path": "hf_artifacts:scripts/omni/collect_qwen3_future_task_probe_results.sh",
|
| 3439 |
"exists": true,
|
| 3440 |
+
"bytes": 3726,
|
| 3441 |
+
"sha256": "35918a28d6e34acae6f71e667570354f82a1cdbd32816f603d248e19c356980c"
|
| 3442 |
},
|
| 3443 |
"hf_model": {
|
| 3444 |
"path": "hf_model:scripts/omni/collect_qwen3_future_task_probe_results.sh",
|
| 3445 |
"exists": true,
|
| 3446 |
+
"bytes": 3726,
|
| 3447 |
+
"sha256": "35918a28d6e34acae6f71e667570354f82a1cdbd32816f603d248e19c356980c"
|
| 3448 |
}
|
| 3449 |
},
|
| 3450 |
"failures": []
|
|
|
|
| 3530 |
"local": {
|
| 3531 |
"path": "repo:scripts/omni/eval_qwen3_omni_future_task_probes.py",
|
| 3532 |
"exists": true,
|
| 3533 |
+
"bytes": 32653,
|
| 3534 |
+
"sha256": "5298a9c83252ac31cd30fa89e54834f98f6ccada8ffe10680f34773cbbe98d30"
|
| 3535 |
},
|
| 3536 |
"mirrors": {
|
| 3537 |
"hf_artifacts": {
|
| 3538 |
"path": "hf_artifacts:scripts/omni/eval_qwen3_omni_future_task_probes.py",
|
| 3539 |
"exists": true,
|
| 3540 |
+
"bytes": 32653,
|
| 3541 |
+
"sha256": "5298a9c83252ac31cd30fa89e54834f98f6ccada8ffe10680f34773cbbe98d30"
|
| 3542 |
},
|
| 3543 |
"hf_model": {
|
| 3544 |
"path": "hf_model:scripts/omni/eval_qwen3_omni_future_task_probes.py",
|
| 3545 |
"exists": true,
|
| 3546 |
+
"bytes": 32653,
|
| 3547 |
+
"sha256": "5298a9c83252ac31cd30fa89e54834f98f6ccada8ffe10680f34773cbbe98d30"
|
| 3548 |
}
|
| 3549 |
},
|
| 3550 |
"failures": []
|
|
|
|
| 4280 |
"local": {
|
| 4281 |
"path": "repo:scripts/build_unified_task_model_radar.py",
|
| 4282 |
"exists": true,
|
| 4283 |
+
"bytes": 51243,
|
| 4284 |
+
"sha256": "e0f995a01e8589a7f819dc5b766156c26e8b14e4db9c3c0c5e08be7a29b4de56"
|
| 4285 |
},
|
| 4286 |
"mirrors": {
|
| 4287 |
"hf_artifacts": {
|
| 4288 |
"path": "hf_artifacts:scripts/build_unified_task_model_radar.py",
|
| 4289 |
"exists": true,
|
| 4290 |
+
"bytes": 51243,
|
| 4291 |
+
"sha256": "e0f995a01e8589a7f819dc5b766156c26e8b14e4db9c3c0c5e08be7a29b4de56"
|
| 4292 |
},
|
| 4293 |
"hf_model": {
|
| 4294 |
"path": "hf_model:scripts/build_unified_task_model_radar.py",
|
| 4295 |
"exists": true,
|
| 4296 |
+
"bytes": 51243,
|
| 4297 |
+
"sha256": "e0f995a01e8589a7f819dc5b766156c26e8b14e4db9c3c0c5e08be7a29b4de56"
|
| 4298 |
}
|
| 4299 |
},
|
| 4300 |
"failures": []
|
|
|
|
| 4330 |
"local": {
|
| 4331 |
"path": "repo:scripts/verify_live_publication.py",
|
| 4332 |
"exists": true,
|
| 4333 |
+
"bytes": 57383,
|
| 4334 |
+
"sha256": "4cf40aa266827832734791b63862174a1d08a086bd97166fab31707320d5609c"
|
| 4335 |
},
|
| 4336 |
"mirrors": {
|
| 4337 |
"hf_artifacts": {
|
| 4338 |
"path": "hf_artifacts:scripts/verify_live_publication.py",
|
| 4339 |
"exists": true,
|
| 4340 |
+
"bytes": 57383,
|
| 4341 |
+
"sha256": "4cf40aa266827832734791b63862174a1d08a086bd97166fab31707320d5609c"
|
| 4342 |
},
|
| 4343 |
"hf_model": {
|
| 4344 |
"path": "hf_model:scripts/verify_live_publication.py",
|
| 4345 |
"exists": true,
|
| 4346 |
+
"bytes": 57383,
|
| 4347 |
+
"sha256": "4cf40aa266827832734791b63862174a1d08a086bd97166fab31707320d5609c"
|
| 4348 |
}
|
| 4349 |
},
|
| 4350 |
"failures": []
|
|
|
|
| 19545 |
"path": "repo:TASK_METHOD_20_GAP_AUDIT.md",
|
| 19546 |
"exists": true,
|
| 19547 |
"bytes": 15690,
|
| 19548 |
+
"sha256": "bec8510557fee7505f68d697590eefdcaad96d70d9d9b201fab7a9bdc361a2ac"
|
| 19549 |
},
|
| 19550 |
"mirrors": {
|
| 19551 |
"hf_space": {
|
| 19552 |
"path": "hf_space:TASK_METHOD_20_GAP_AUDIT.md",
|
| 19553 |
"exists": true,
|
| 19554 |
"bytes": 15690,
|
| 19555 |
+
"sha256": "bec8510557fee7505f68d697590eefdcaad96d70d9d9b201fab7a9bdc361a2ac"
|
| 19556 |
},
|
| 19557 |
"hf_artifacts": {
|
| 19558 |
"path": "hf_artifacts:TASK_METHOD_20_GAP_AUDIT.md",
|
| 19559 |
"exists": true,
|
| 19560 |
"bytes": 15690,
|
| 19561 |
+
"sha256": "bec8510557fee7505f68d697590eefdcaad96d70d9d9b201fab7a9bdc361a2ac"
|
| 19562 |
},
|
| 19563 |
"hf_model": {
|
| 19564 |
"path": "hf_model:TASK_METHOD_20_GAP_AUDIT.md",
|
| 19565 |
"exists": true,
|
| 19566 |
"bytes": 15690,
|
| 19567 |
+
"sha256": "bec8510557fee7505f68d697590eefdcaad96d70d9d9b201fab7a9bdc361a2ac"
|
| 19568 |
}
|
| 19569 |
},
|
| 19570 |
"failures": []
|
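Each group in the `mirror_parity.json` diff above pairs one `local` record with several `mirrors`, all carrying `exists`, `bytes`, and `sha256`, plus a `failures` list. A minimal sketch of how such a record could be checked is shown here. The helper names `fingerprint` and `parity_failures` and the failure message wording are hypothetical and are not taken from the repository's own scripts; only the field names come from the JSON shown in the diff.

```python
import hashlib


def fingerprint(data: bytes) -> dict:
    """Build a record with the same fields the diff shows for each file copy.

    Hypothetical helper: the repository's real scripts may compute this differently.
    """
    return {
        "exists": True,
        "bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def parity_failures(local: dict, mirrors: dict) -> list:
    """Return one message per mirror that does not match the local record.

    An empty list corresponds to the `"failures": []` entries in the diff.
    """
    failures = []
    for name, mirror in mirrors.items():
        if not mirror.get("exists"):
            failures.append(f"{name}: missing")
        elif mirror.get("bytes") != local.get("bytes"):
            failures.append(f"{name}: size mismatch")
        elif mirror.get("sha256") != local.get("sha256"):
            failures.append(f"{name}: sha256 mismatch")
    return failures


if __name__ == "__main__":
    # Illustrative payload only; not a real artifact from the repository.
    payload = b'{"status": "pass"}'
    local = fingerprint(payload)
    mirrors = {
        "hf_space": fingerprint(payload),
        "hf_model": fingerprint(payload + b"\n"),  # deliberately different copy
    }
    print(parity_failures(local, mirrors))
```

Checking the byte count before the digest keeps the failure message specific: a size difference is reported as such, and a digest mismatch is reported only when the sizes agree.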
docs/data/public_surface_qa.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
| 1 |
{
|
| 2 |
"title": "Ropedia Xperience-10M Public Project Surface",
|
| 3 |
"status": "pass",
|
| 4 |
-
"generated_at_utc": "2026-06-
|
| 5 |
"scope": "Repo README, GitHub Pages HTML, Hugging Face Space card, artifact dataset card, and model card.",
|
| 6 |
"checks": [
|
| 7 |
{
|
|
@@ -18,7 +18,7 @@
|
|
| 18 |
"website_integrity": {
|
| 19 |
"exists": true,
|
| 20 |
"status": "pass",
|
| 21 |
-
"generated_at_utc": "2026-06-
|
| 22 |
},
|
| 23 |
"rendered_site_check": {
|
| 24 |
"exists": true,
|
|
@@ -28,12 +28,12 @@
|
|
| 28 |
"task_surface_integrity": {
|
| 29 |
"exists": true,
|
| 30 |
"status": "pass",
|
| 31 |
-
"generated_at_utc": "2026-06-
|
| 32 |
},
|
| 33 |
"source_alignment": {
|
| 34 |
"exists": true,
|
| 35 |
"status": "pass",
|
| 36 |
-
"generated_at_utc": "2026-06-
|
| 37 |
},
|
| 38 |
"scale_up_status": {
|
| 39 |
"exists": true,
|
|
@@ -43,12 +43,12 @@
|
|
| 43 |
"publication_package": {
|
| 44 |
"exists": true,
|
| 45 |
"status": "pass",
|
| 46 |
-
"generated_at_utc": "2026-06-
|
| 47 |
},
|
| 48 |
"mirror_parity": {
|
| 49 |
"exists": true,
|
| 50 |
"status": "pass",
|
| 51 |
-
"generated_at_utc": "2026-06-
|
| 52 |
}
|
| 53 |
},
|
| 54 |
"failures": {}
|
|
|
|
| 1 |
{
|
| 2 |
"title": "Ropedia Xperience-10M Public Project Surface",
|
| 3 |
"status": "pass",
|
| 4 |
+
"generated_at_utc": "2026-06-17T15:16:15+00:00",
|
| 5 |
"scope": "Repo README, GitHub Pages HTML, Hugging Face Space card, artifact dataset card, and model card.",
|
| 6 |
"checks": [
|
| 7 |
{
|
|
|
|
| 18 |
"website_integrity": {
|
| 19 |
"exists": true,
|
| 20 |
"status": "pass",
|
| 21 |
+
"generated_at_utc": "2026-06-17T13:55:22+00:00"
|
| 22 |
},
|
| 23 |
"rendered_site_check": {
|
| 24 |
"exists": true,
|
|
|
|
| 28 |
"task_surface_integrity": {
|
| 29 |
"exists": true,
|
| 30 |
"status": "pass",
|
| 31 |
+
"generated_at_utc": "2026-06-17T13:55:20+00:00"
|
| 32 |
},
|
| 33 |
"source_alignment": {
|
| 34 |
"exists": true,
|
| 35 |
"status": "pass",
|
| 36 |
+
"generated_at_utc": "2026-06-17T13:55:20+00:00"
|
| 37 |
},
|
| 38 |
"scale_up_status": {
|
| 39 |
"exists": true,
|
|
|
|
| 43 |
"publication_package": {
|
| 44 |
"exists": true,
|
| 45 |
"status": "pass",
|
| 46 |
+
"generated_at_utc": "2026-06-17T13:55:30+00:00"
|
| 47 |
},
|
| 48 |
"mirror_parity": {
|
| 49 |
"exists": true,
|
| 50 |
"status": "pass",
|
| 51 |
+
"generated_at_utc": "2026-06-17T13:55:47+00:00"
|
| 52 |
}
|
| 53 |
},
|
| 54 |
"failures": {}
|
docs/data/publication_audit.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
-
"generated_at_utc": "2026-06-
|
| 4 |
"checks": [
|
| 5 |
{
|
| 6 |
"name": "required_publication_assets_present",
|
|
@@ -122,6 +122,9 @@
|
|
| 122 |
"docs/assets/charts/unified_task_model_radar.svg": true,
|
| 123 |
"docs/assets/charts/single_episode_task_model_radar.svg": true,
|
| 124 |
"docs/assets/charts/episode128_task_model_radar.svg": true,
|
| 125 |
"docs/assets/pipeline_diagram.png": true,
|
| 126 |
"docs/assets/task_architectures.png": true,
|
| 127 |
"results/episode_task_suite/summary_report.json": true,
|
|
@@ -200,8 +203,8 @@
|
|
| 200 |
"github_repo": {
|
| 201 |
"root": "repo",
|
| 202 |
"exists": true,
|
| 203 |
-
"file_count":
|
| 204 |
-
"text_file_count":
|
| 205 |
"largest_file": {
|
| 206 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 207 |
"bytes": 55702978
|
|
@@ -222,7 +225,7 @@
|
|
| 222 |
"hf_artifact_bundle": {
|
| 223 |
"root": "hf_publish/artifacts",
|
| 224 |
"exists": true,
|
| 225 |
-
"file_count":
|
| 226 |
"text_file_count": 1036,
|
| 227 |
"largest_file": {
|
| 228 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
|
@@ -233,7 +236,7 @@
|
|
| 233 |
"hf_model_bundle": {
|
| 234 |
"root": "hf_publish/model",
|
| 235 |
"exists": true,
|
| 236 |
-
"file_count":
|
| 237 |
"text_file_count": 1197,
|
| 238 |
"largest_file": {
|
| 239 |
"path": "pytorch_model.bin",
|
|
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
+
"generated_at_utc": "2026-06-17T15:17:33+00:00",
|
| 4 |
"checks": [
|
| 5 |
{
|
| 6 |
"name": "required_publication_assets_present",
|
|
|
|
| 122 |
"docs/assets/charts/unified_task_model_radar.svg": true,
|
| 123 |
"docs/assets/charts/single_episode_task_model_radar.svg": true,
|
| 124 |
"docs/assets/charts/episode128_task_model_radar.svg": true,
|
| 125 |
+
"docs/assets/foundation-pipelines/spatial-intelligence-pipeline.png": true,
|
| 126 |
+
"docs/assets/foundation-pipelines/human-video-world-model-pipeline.png": true,
|
| 127 |
+
"docs/assets/foundation-pipelines/vision-language-action-pipeline.png": true,
|
| 128 |
"docs/assets/pipeline_diagram.png": true,
|
| 129 |
"docs/assets/task_architectures.png": true,
|
| 130 |
"results/episode_task_suite/summary_report.json": true,
|
|
|
|
| 203 |
"github_repo": {
|
| 204 |
"root": "repo",
|
| 205 |
"exists": true,
|
| 206 |
+
"file_count": 1216,
|
| 207 |
+
"text_file_count": 1018,
|
| 208 |
"largest_file": {
|
| 209 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
| 210 |
"bytes": 55702978
|
|
|
|
| 225 |
"hf_artifact_bundle": {
|
| 226 |
"root": "hf_publish/artifacts",
|
| 227 |
"exists": true,
|
| 228 |
+
"file_count": 2389,
|
| 229 |
"text_file_count": 1036,
|
| 230 |
"largest_file": {
|
| 231 |
"path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
|
|
|
|
| 236 |
"hf_model_bundle": {
|
| 237 |
"root": "hf_publish/model",
|
| 238 |
"exists": true,
|
| 239 |
+
"file_count": 2824,
|
| 240 |
"text_file_count": 1197,
|
| 241 |
"largest_file": {
|
| 242 |
"path": "pytorch_model.bin",
|
docs/data/single_episode_task_model_radar.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
| 1 |
{
|
| 2 |
"title": "Single-Episode 20-Task Radar",
|
| 3 |
"status": "pass",
|
| 4 |
-
"generated_at_utc": "2026-06-
|
| 5 |
"description": "Minimal and Neural MLP baselines on the one public sample episode, both scored on all 20 task contracts.",
|
| 6 |
"task_count": 20,
|
| 7 |
"method_count": 2,
|
|
|
|
| 1 |
{
|
| 2 |
"title": "Single-Episode 20-Task Radar",
|
| 3 |
"status": "pass",
|
| 4 |
+
"generated_at_utc": "2026-06-17T13:55:02+00:00",
|
| 5 |
"description": "Minimal and Neural MLP baselines on the one public sample episode, both scored on all 20 task contracts.",
|
| 6 |
"task_count": 20,
|
| 7 |
"method_count": 2,
|
docs/data/source_alignment_audit.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
| 1 |
{
|
| 2 |
"title": "Ropedia Xperience-10M Source Alignment Note",
|
| 3 |
"status": "pass",
|
| 4 |
-
"generated_at_utc": "2026-06-
|
| 5 |
"alignment_json": "docs/data/xperience10m_dataset_card_alignment.json",
|
| 6 |
"alignment_summary": {
|
| 7 |
"full_dataset_repo": "ropedia-ai/xperience-10m",
|
|
|
|
| 1 |
{
|
| 2 |
"title": "Ropedia Xperience-10M Source Alignment Note",
|
| 3 |
"status": "pass",
|
| 4 |
+
"generated_at_utc": "2026-06-17T15:17:20+00:00",
|
| 5 |
"alignment_json": "docs/data/xperience10m_dataset_card_alignment.json",
|
| 6 |
"alignment_summary": {
|
| 7 |
"full_dataset_repo": "ropedia-ai/xperience-10m",
|
docs/data/task_method_20_gap_audit.json
CHANGED
|
@@ -1,5 +1,5 @@
|
|
| 1 |
{
|
| 2 |
-
"generated_at_utc": "2026-06-
|
| 3 |
"immediate_actions": [
|
| 4 |
{
|
| 5 |
"artifact": "docs/data/task_method_20_gap_audit.json",
|
|
|
|
| 1 |
{
|
| 2 |
+
"generated_at_utc": "2026-06-17T13:55:12+00:00",
|
| 3 |
"immediate_actions": [
|
| 4 |
{
|
| 5 |
"artifact": "docs/data/task_method_20_gap_audit.json",
|
docs/data/task_method_20_result_matrix.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
| 1 |
{
|
| 2 |
"title": "Task Method 20-Result Matrix",
|
| 3 |
"status": "pass",
|
| 4 |
-
"generated_at_utc": "2026-06-
|
| 5 |
"task_count": 20,
|
| 6 |
"method_count": 9,
|
| 7 |
"method_task_record_count": 180,
|
|
|
|
| 1 |
{
|
| 2 |
"title": "Task Method 20-Result Matrix",
|
| 3 |
"status": "pass",
|
| 4 |
+
"generated_at_utc": "2026-06-17T13:55:02+00:00",
|
| 5 |
"task_count": 20,
|
| 6 |
"method_count": 9,
|
| 7 |
"method_task_record_count": 180,
|
docs/data/task_surface_integrity.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
-
"generated_at_utc": "2026-06-
|
| 4 |
"summary": {
|
| 5 |
"task_count": 12,
|
| 6 |
"expected_task_count": 12,
|
|
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
+
"generated_at_utc": "2026-06-17T15:17:20+00:00",
|
| 4 |
"summary": {
|
| 5 |
"task_count": 12,
|
| 6 |
"expected_task_count": 12,
|
docs/data/three_foundation_pipelines.json
CHANGED
|
@@ -3,6 +3,13 @@
|
|
| 3 |
"status": "pipeline_plan",
|
| 4 |
"source_document": "THREE_FOUNDATION_PIPELINES.md",
|
| 5 |
"claim_boundary": "These are supported pipeline directions, not three completed model-quality claims.",
|
| 6 |
"shared_principles": [
|
| 7 |
"Use episode-level train/validation/test separation.",
|
| 8 |
"Build manifest-first exporters before training.",
|
|
@@ -44,6 +51,9 @@
|
|
| 44 |
"first_pipeline": "Build a spatial-memory exporter, start with metric depth and pose consistency tasks, then evaluate spatial QA, object permanence, counting, retrieval, and pose-aware consistency.",
|
| 45 |
"current_maturity": "Ready as a pipeline and evaluation contract.",
|
| 46 |
"next_gate": "Raw depth and pose artifacts plus held-out multi-episode spatial metrics.",
|
| 47 |
"avoid_claiming_now": [
|
| 48 |
"full neural rendering",
|
| 49 |
"full 3D reconstruction",
|
|
@@ -82,6 +92,9 @@
|
|
| 82 |
"first_pipeline": "Keep Qwen-style structured future probes for task interpretability, keep Cosmos-style dynamics branches separate, and add latent or feature-reconstruction metrics before claiming world-model quality.",
|
| 83 |
"current_maturity": "Partially evidenced by current future-task probes and Cosmos-style branch artifacts.",
|
| 84 |
"next_gate": "Stronger future-state metrics, qualitative future examples, and held-out episode breakdowns.",
|
| 85 |
"avoid_claiming_now": [
|
| 86 |
"strong world model from structured future-task scores alone",
|
| 87 |
"visual future quality without visual or latent future metrics"
|
|
@@ -118,6 +131,9 @@
|
|
| 118 |
"first_pipeline": "Define the action space, use existing 20-task next-action/contact/object-conditioned tasks first, then add hand-trajectory or policy-compatible action chunks after conversion is traceable.",
|
| 119 |
"current_maturity": "Feasible but gated by action-target conversion.",
|
| 120 |
"next_gate": "Traceable action tokens, normalization, retargeting metadata, and held-out policy metrics.",
|
| 121 |
"avoid_claiming_now": [
|
| 122 |
"robot policy quality",
|
| 123 |
"policy generalization before action-space evidence exists"
|
|
|
|
| 3 |
"status": "pipeline_plan",
|
| 4 |
"source_document": "THREE_FOUNDATION_PIPELINES.md",
|
| 5 |
"claim_boundary": "These are supported pipeline directions, not three completed model-quality claims.",
|
| 6 |
+
"placeholder_assets": {
|
| 7 |
+
"status": "published_placeholders",
|
| 8 |
+
"asset_root": "docs/assets/foundation-pipelines",
|
| 9 |
+
"source": "ChatGPT image generation with repo-local prompt notes",
|
| 10 |
+
"source_prompt_file": "docs/assets/foundation-pipelines/prompts.md",
|
| 11 |
+
"note": "Images are visual placeholders for pipeline tracks. Technical claims remain governed by the Markdown/JSON contracts and verified metrics."
|
| 12 |
+
},
|
| 13 |
"shared_principles": [
|
| 14 |
"Use episode-level train/validation/test separation.",
|
| 15 |
"Build manifest-first exporters before training.",
|
|
|
|
| 51 |
"first_pipeline": "Build a spatial-memory exporter, start with metric depth and pose consistency tasks, then evaluate spatial QA, object permanence, counting, retrieval, and pose-aware consistency.",
|
| 52 |
"current_maturity": "Ready as a pipeline and evaluation contract.",
|
| 53 |
"next_gate": "Raw depth and pose artifacts plus held-out multi-episode spatial metrics.",
|
| 54 |
+
"placeholder_image": "docs/assets/foundation-pipelines/spatial-intelligence-pipeline.png",
|
| 55 |
+
"website_image": "assets/foundation-pipelines/spatial-intelligence-pipeline.png",
|
| 56 |
+
"image_alt": "Placeholder visual for the spatial intelligence pipeline: multiview video, depth, and pose inputs feeding scene memory and spatial reasoning outputs.",
|
| 57 |
"avoid_claiming_now": [
|
| 58 |
"full neural rendering",
|
| 59 |
"full 3D reconstruction",
|
|
|
|
| 92 |
"first_pipeline": "Keep Qwen-style structured future probes for task interpretability, keep Cosmos-style dynamics branches separate, and add latent or feature-reconstruction metrics before claiming world-model quality.",
|
| 93 |
"current_maturity": "Partially evidenced by current future-task probes and Cosmos-style branch artifacts.",
|
| 94 |
"next_gate": "Stronger future-state metrics, qualitative future examples, and held-out episode breakdowns.",
|
| 95 |
+
"placeholder_image": "docs/assets/foundation-pipelines/human-video-world-model-pipeline.png",
|
| 96 |
+
"website_image": "assets/foundation-pipelines/human-video-world-model-pipeline.png",
|
| 97 |
+
"image_alt": "Placeholder visual for the human-video world model pipeline: observed interaction windows feeding temporal dynamics and future-state outputs.",
|
| 98 |
"avoid_claiming_now": [
|
| 99 |
"strong world model from structured future-task scores alone",
|
| 100 |
"visual future quality without visual or latent future metrics"
|
|
|
|
| 131 |
"first_pipeline": "Define the action space, use existing 20-task next-action/contact/object-conditioned tasks first, then add hand-trajectory or policy-compatible action chunks after conversion is traceable.",
|
| 132 |
"current_maturity": "Feasible but gated by action-target conversion.",
|
| 133 |
"next_gate": "Traceable action tokens, normalization, retargeting metadata, and held-out policy metrics.",
|
| 134 |
+
"placeholder_image": "docs/assets/foundation-pipelines/vision-language-action-pipeline.png",
|
| 135 |
+
"website_image": "assets/foundation-pipelines/vision-language-action-pipeline.png",
|
| 136 |
+
"image_alt": "Placeholder visual for the vision-language-action pipeline: video, language, motion, and contact cues feeding action-chunk outputs.",
|
| 137 |
"avoid_claiming_now": [
|
| 138 |
"robot policy quality",
|
| 139 |
"policy generalization before action-space evidence exists"
|
docs/data/unified_task_model_radar.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
| 1 |
{
|
| 2 |
"title": "Unified 20-Task Model Radar",
|
| 3 |
"status": "pass",
|
| 4 |
-
"generated_at_utc": "2026-06-
|
| 5 |
"task_count": 20,
|
| 6 |
"method_count": 9,
|
| 7 |
"method_task_record_count": 180,
|
|
|
|
| 1 |
{
|
| 2 |
"title": "Unified 20-Task Model Radar",
|
| 3 |
"status": "pass",
|
| 4 |
+
"generated_at_utc": "2026-06-17T13:55:02+00:00",
|
| 5 |
"task_count": 20,
|
| 6 |
"method_count": 9,
|
| 7 |
"method_task_record_count": 180,
|
docs/data/website_integrity.json
CHANGED
|
@@ -1,14 +1,14 @@
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
-
"generated_at_utc": "2026-06-
|
| 4 |
"docs_root": "docs",
|
| 5 |
"site_base": "/ropedia-xperience-10m-task-suite/",
|
| 6 |
"summary": {
|
| 7 |
"html_pages": 4,
|
| 8 |
-
"local_references":
|
| 9 |
"external_reference_count": 123,
|
| 10 |
"json_files": 47,
|
| 11 |
-
"image_assets_referenced":
|
| 12 |
"failure_count": 0
|
| 13 |
},
|
| 14 |
"failures": {
|
|
@@ -80,8 +80,8 @@
|
|
| 80 |
"name": "project_overview_precedes_progress_ledger",
|
| 81 |
"status": "pass",
|
| 82 |
"reason": "The project overview should appear before the deeper progress ledger.",
|
| 83 |
-
"overview_index":
|
| 84 |
-
"evidence_index":
|
| 85 |
},
|
| 86 |
{
|
| 87 |
"name": "project_status_links_json",
|
|
@@ -159,9 +159,9 @@
|
|
| 159 |
"name": "evaluation_protocol_between_overview_and_progress",
|
| 160 |
"status": "pass",
|
| 161 |
"reason": "The evaluation protocol should appear before the deeper evidence ledger.",
|
| 162 |
-
"overview_index":
|
| 163 |
-
"protocol_index":
|
| 164 |
-
"evidence_index":
|
| 165 |
},
|
| 166 |
{
|
| 167 |
"name": "evaluation_protocol_links_json",
|
|
@@ -277,8 +277,8 @@
|
|
| 277 |
{
|
| 278 |
"path": "index.html",
|
| 279 |
"id_count": 90,
|
| 280 |
-
"reference_count":
|
| 281 |
-
"image_count":
|
| 282 |
},
|
| 283 |
{
|
| 284 |
"path": "research_roadmap.html",
|
|
@@ -301,7 +301,7 @@
|
|
| 301 |
},
|
| 302 |
{
|
| 303 |
"path": "data/artifact_index.json",
|
| 304 |
-
"bytes":
|
| 305 |
"top_level_type": "dict"
|
| 306 |
},
|
| 307 |
{
|
|
@@ -331,7 +331,7 @@
|
|
| 331 |
},
|
| 332 |
{
|
| 333 |
"path": "data/figure_index.json",
|
| 334 |
-
"bytes":
|
| 335 |
"top_level_type": "dict"
|
| 336 |
},
|
| 337 |
{
|
|
@@ -346,7 +346,7 @@
|
|
| 346 |
},
|
| 347 |
{
|
| 348 |
"path": "data/mirror_parity.json",
|
| 349 |
-
"bytes":
|
| 350 |
"top_level_type": "dict"
|
| 351 |
},
|
| 352 |
{
|
|
@@ -506,7 +506,7 @@
|
|
| 506 |
},
|
| 507 |
{
|
| 508 |
"path": "data/three_foundation_pipelines.json",
|
| 509 |
-
"bytes":
|
| 510 |
"top_level_type": "dict"
|
| 511 |
},
|
| 512 |
{
|
|
@@ -521,7 +521,7 @@
|
|
| 521 |
},
|
| 522 |
{
|
| 523 |
"path": "data/website_integrity.json",
|
| 524 |
-
"bytes":
|
| 525 |
"top_level_type": "dict"
|
| 526 |
},
|
| 527 |
{
|
|
@@ -630,6 +630,30 @@
|
|
| 630 |
"format": "SVG",
|
| 631 |
"has_viewbox": true
|
| 632 |
},
|
| 633 |
{
|
| 634 |
"path": "assets/modalities/audio.png",
|
| 635 |
"exists": true,
|
|
|
|
| 1 |
{
|
| 2 |
"status": "pass",
|
| 3 |
+
"generated_at_utc": "2026-06-17T15:17:21+00:00",
|
| 4 |
"docs_root": "docs",
|
| 5 |
"site_base": "/ropedia-xperience-10m-task-suite/",
|
| 6 |
"summary": {
|
| 7 |
"html_pages": 4,
|
| 8 |
+
"local_references": 187,
|
| 9 |
"external_reference_count": 123,
|
| 10 |
"json_files": 47,
|
| 11 |
+
"image_assets_referenced": 28,
|
| 12 |
"failure_count": 0
|
| 13 |
},
|
| 14 |
"failures": {
|
|
|
|
| 80 |
"name": "project_overview_precedes_progress_ledger",
|
| 81 |
"status": "pass",
|
| 82 |
"reason": "The project overview should appear before the deeper progress ledger.",
|
| 83 |
+
"overview_index": 88890,
|
| 84 |
+
"evidence_index": 119623
|
| 85 |
},
|
| 86 |
{
|
| 87 |
"name": "project_status_links_json",
|
|
|
|
| 159 |
"name": "evaluation_protocol_between_overview_and_progress",
|
| 160 |
"status": "pass",
|
| 161 |
"reason": "The evaluation protocol should appear before the deeper evidence ledger.",
|
| 162 |
+
"overview_index": 88890,
|
| 163 |
+
"protocol_index": 115804,
|
| 164 |
+
"evidence_index": 119623
|
| 165 |
},
|
| 166 |
{
|
| 167 |
"name": "evaluation_protocol_links_json",
|
|
|
|
| 277 |
{
|
| 278 |
"path": "index.html",
|
| 279 |
"id_count": 90,
|
| 280 |
+
"reference_count": 163,
|
| 281 |
+
"image_count": 34
|
| 282 |
},
|
| 283 |
{
|
| 284 |
"path": "research_roadmap.html",
|
|
|
|
| 301 |
},
|
| 302 |
{
|
| 303 |
"path": "data/artifact_index.json",
|
| 304 |
+
"bytes": 111262,
|
| 305 |
"top_level_type": "dict"
|
| 306 |
},
|
| 307 |
{
|
|
|
|
| 331 |
},
|
| 332 |
{
|
| 333 |
"path": "data/figure_index.json",
|
| 334 |
+
"bytes": 19501,
|
| 335 |
"top_level_type": "dict"
|
| 336 |
},
|
| 337 |
{
|
|
|
|
| 346 |
},
|
| 347 |
{
|
| 348 |
"path": "data/mirror_parity.json",
|
| 349 |
+
"bytes": 902747,
|
| 350 |
"top_level_type": "dict"
|
| 351 |
},
|
| 352 |
{
|
|
|
|
| 506 |
},
|
| 507 |
{
|
| 508 |
"path": "data/three_foundation_pipelines.json",
|
| 509 |
+
"bytes": 6518,
|
| 510 |
"top_level_type": "dict"
|
| 511 |
},
|
| 512 |
{
|
|
|
|
| 521 |
},
|
| 522 |
{
|
| 523 |
"path": "data/website_integrity.json",
|
| 524 |
+
"bytes": 19052,
|
| 525 |
"top_level_type": "dict"
|
| 526 |
},
|
| 527 |
{
|
|
|
|
| 630 |
"format": "SVG",
|
| 631 |
"has_viewbox": true
|
| 632 |
},
|
| 633 |
+
{
|
| 634 |
+
"path": "assets/foundation-pipelines/human-video-world-model-pipeline.png",
|
| 635 |
+
"exists": true,
|
| 636 |
+
"bytes": 2356312,
|
| 637 |
+
"width": 1672,
|
| 638 |
+
"height": 941,
|
| 639 |
+
"format": "PNG"
|
| 640 |
+
},
|
| 641 |
+
{
|
| 642 |
+
"path": "assets/foundation-pipelines/spatial-intelligence-pipeline.png",
|
| 643 |
+
"exists": true,
|
| 644 |
+
"bytes": 2337155,
|
| 645 |
+
"width": 1672,
|
| 646 |
+
"height": 941,
|
| 647 |
+
"format": "PNG"
|
| 648 |
+
},
|
| 649 |
+
{
|
| 650 |
+
"path": "assets/foundation-pipelines/vision-language-action-pipeline.png",
|
| 651 |
+
"exists": true,
|
| 652 |
+
"bytes": 2421011,
|
| 653 |
+
"width": 1672,
|
| 654 |
+
"height": 941,
|
| 655 |
+
"format": "PNG"
|
| 656 |
+
},
|
| 657 |
{
|
| 658 |
"path": "assets/modalities/audio.png",
|
| 659 |
"exists": true,
|
scripts/omni/collect_qwen3_future_task_probe_results.sh
CHANGED
|
@@ -14,12 +14,9 @@ REMOTE_RUN_DIR="${REMOTE_ROOT}/${RESULT_ROOT}/${RUN_ID}"
|
|
| 14 |
LOCAL_RUN_DIR="${PROJECT_ROOT}/${RESULT_ROOT}/${RUN_ID}"
|
| 15 |
LOCAL_LAUNCHER_DIR="${PROJECT_ROOT}/${RESULT_ROOT}/deferred_launchers"
|
| 16 |
REMOTE_LAUNCHER_LOG="${REMOTE_ROOT}/${RESULT_ROOT}/deferred_launchers/${RUN_ID}.launcher.log"
|
| 17 |
|
| 18 |
-
|
| 19 |
-
long_horizon_next_action
|
| 20 |
-
next_subtask_forecast
|
| 21 |
-
object_set_forecast
|
| 22 |
-
)
|
| 23 |
|
| 24 |
echo "checking remote run ${REMOTE_HOST}:${REMOTE_RUN_DIR}"
|
| 25 |
ssh "$REMOTE_HOST" "cd '$REMOTE_ROOT' && test -s '${RESULT_ROOT}/${RUN_ID}/summary.json'"
|
|
@@ -33,19 +30,24 @@ ssh "$REMOTE_HOST" "test -s '$REMOTE_LAUNCHER_LOG'" >/dev/null 2>&1 \
|
|
| 33 |
&& rsync -av "${REMOTE_HOST}:${REMOTE_LAUNCHER_LOG}" "$LOCAL_LAUNCHER_DIR/" \
|
| 34 |
|| true
|
| 35 |
|
| 36 |
-
python3 - "$PROJECT_ROOT" "$RUN_ID" <<'PY'
|
| 37 |
import json
|
| 38 |
import sys
|
| 39 |
from pathlib import Path
|
| 40 |
|
| 41 |
root = Path(sys.argv[1])
|
| 42 |
run_id = sys.argv[2]
|
| 43 |
run_dir = root / "results/omni_finetune" / run_id
|
| 44 |
-
|
| 45 |
"long_horizon_next_action": "long_horizon_next_action_macro_f1",
|
| 46 |
"next_subtask_forecast": "next_subtask_forecast_macro_f1",
|
| 47 |
"object_set_forecast": "object_set_forecast_micro_f1",
|
| 48 |
}
|
| 49 |
|
| 50 |
summary_path = run_dir / "summary.json"
|
| 51 |
if not summary_path.exists():
|
|
|
|
| 14 |
LOCAL_RUN_DIR="${PROJECT_ROOT}/${RESULT_ROOT}/${RUN_ID}"
|
| 15 |
LOCAL_LAUNCHER_DIR="${PROJECT_ROOT}/${RESULT_ROOT}/deferred_launchers"
|
| 16 |
REMOTE_LAUNCHER_LOG="${REMOTE_ROOT}/${RESULT_ROOT}/deferred_launchers/${RUN_ID}.launcher.log"
|
| 17 |
+
TASKS_CSV="${TASKS_CSV:-long_horizon_next_action,next_subtask_forecast,object_set_forecast}"
|
| 18 |
|
| 19 |
+
IFS=',' read -r -a TASKS <<< "$TASKS_CSV"
|
| 20 |
|
| 21 |
echo "checking remote run ${REMOTE_HOST}:${REMOTE_RUN_DIR}"
|
| 22 |
ssh "$REMOTE_HOST" "cd '$REMOTE_ROOT' && test -s '${RESULT_ROOT}/${RUN_ID}/summary.json'"
|
|
|
|
| 30 |
&& rsync -av "${REMOTE_HOST}:${REMOTE_LAUNCHER_LOG}" "$LOCAL_LAUNCHER_DIR/" \
|
| 31 |
|| true
|
| 32 |
|
| 33 |
+
python3 - "$PROJECT_ROOT" "$RUN_ID" "$TASKS_CSV" <<'PY'
|
| 34 |
import json
|
| 35 |
import sys
|
| 36 |
from pathlib import Path
|
| 37 |
|
| 38 |
root = Path(sys.argv[1])
|
| 39 |
run_id = sys.argv[2]
|
| 40 |
+
task_ids = [item.strip() for item in sys.argv[3].split(",") if item.strip()]
|
| 41 |
run_dir = root / "results/omni_finetune" / run_id
|
| 42 |
+
metric_key_by_task = {
|
| 43 |
+
"temporal_order": "temporal_order_f1",
|
| 44 |
+
"misalignment_detection": "misalignment_detection_f1",
|
| 45 |
"long_horizon_next_action": "long_horizon_next_action_macro_f1",
|
| 46 |
"next_subtask_forecast": "next_subtask_forecast_macro_f1",
|
| 47 |
"object_set_forecast": "object_set_forecast_micro_f1",
|
| 48 |
+
"time_to_transition": "time_to_transition_mae",
|
| 49 |
}
|
| 50 |
+
expected = {task_id: metric_key_by_task[task_id] for task_id in task_ids}
|
| 51 |
|
| 52 |
summary_path = run_dir / "summary.json"
|
| 53 |
if not summary_path.exists():
|
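Note on the collector change above: the task list now arrives as a comma-separated `TASKS_CSV` string and is resolved against `metric_key_by_task`. The sketch below restates that lookup as a standalone function. The explicit unknown-task check is an assumption added for illustration (the lookup shown in the diff would raise `KeyError` instead), and the names `METRIC_KEY_BY_TASK` and `expected_metrics` are hypothetical.

```python
# Metric keys copied from the diff above; the constant and function names are illustrative.
METRIC_KEY_BY_TASK = {
    "temporal_order": "temporal_order_f1",
    "misalignment_detection": "misalignment_detection_f1",
    "long_horizon_next_action": "long_horizon_next_action_macro_f1",
    "next_subtask_forecast": "next_subtask_forecast_macro_f1",
    "object_set_forecast": "object_set_forecast_micro_f1",
    "time_to_transition": "time_to_transition_mae",
}


def expected_metrics(tasks_csv: str) -> dict:
    """Map each requested task id to its primary metric key, preserving request order."""
    task_ids = [item.strip() for item in tasks_csv.split(",") if item.strip()]
    # Assumed extra step: report every unknown id at once with a readable message.
    unknown = [task_id for task_id in task_ids if task_id not in METRIC_KEY_BY_TASK]
    if unknown:
        raise ValueError(f"unknown task ids: {unknown}")
    return {task_id: METRIC_KEY_BY_TASK[task_id] for task_id in task_ids}
```

Blank items and surrounding whitespace are dropped, so a value such as `"temporal_order, ,time_to_transition"` resolves to two tasks.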
scripts/omni/eval_qwen3_omni_future_task_probes.py
CHANGED
|
@@ -1,14 +1,17 @@
|
|
| 1 |
#!/usr/bin/env python3
|
| 2 |
"""Evaluate Qwen3-Omni on future-target task probes from the 128-episode JSON.
|
| 3 |
|
| 4 |
-
This runner scores
|
| 5 |
-
multi-episode JSON export:
|
| 6 |
|
| 7 |
- Task 13: long-horizon next action, +100 frames.
|
| 8 |
- Task 14: long-horizon next subtask, +100 frames.
|
| 9 |
- Task 17: future object set, +100 frames.
|
| 10 |
|
| 11 |
-
It does not fabricate scores for
|
| 12 |
missing-modality targets.
|
| 13 |
"""
|
| 14 |
|
|
@@ -16,7 +19,9 @@ from __future__ import annotations
|
|
| 16 |
|
| 17 |
import argparse
|
| 18 |
import csv
|
| 19 |
import json
|
| 20 |
import time
|
| 21 |
from collections import OrderedDict
|
| 22 |
from pathlib import Path
|
|
@@ -37,6 +42,32 @@ from qwen3_omni_dataset_utils import (
|
|
| 37 |
|
| 38 |
TASK_SPECS: OrderedDict[str, dict[str, Any]] = OrderedDict(
|
| 39 |
[
|
| 40 |
(
|
| 41 |
"long_horizon_next_action",
|
| 42 |
{
|
|
@@ -73,6 +104,18 @@ TASK_SPECS: OrderedDict[str, dict[str, Any]] = OrderedDict(
|
|
| 73 |
"option_field": None,
|
| 74 |
},
|
| 75 |
),
|
| 76 |
]
|
| 77 |
)
|
| 78 |
|
|
@@ -207,6 +250,22 @@ def future_index_map(samples: list[dict[str, Any]], frame_offset: int) -> dict[i
|
|
| 207 |
return mapping
|
| 208 |
|
| 209 |
|
| 210 |
def parse_json_object(text: str) -> dict[str, Any]:
|
| 211 |
raw = str(text or "").strip()
|
| 212 |
if raw.startswith("```"):
|
|
@@ -227,7 +286,25 @@ def parse_json_object(text: str) -> dict[str, Any]:
|
|
| 227 |
return payload if isinstance(payload, dict) else {}
|
| 228 |
|
| 229 |
|
| 230 |
def task_options(sample: dict[str, Any], spec: dict[str, Any]) -> list[str]:
|
| 231 |
option_field = spec.get("option_field")
|
| 232 |
options = sample.get(option_field) if option_field else None
|
| 233 |
if isinstance(options, list) and options:
|
|
@@ -247,8 +324,12 @@ def build_task_prompt(sample: dict[str, Any], future_sample: dict[str, Any], tas
|
|
| 247 |
f"Task {spec['task_number']}: {spec['label']}",
|
| 248 |
f"Episode: {sample.get('episode_id')}",
|
| 249 |
f"Current visible/audio context frames: {start}-{end}",
|
| 250 |
-
f"Predict the target at the future window starting near frame {start + future_frames} (resolved target start frame {future_start}).",
|
| 251 |
]
|
| 252 |
options = task_options(sample, spec)
|
| 253 |
if task_id == "long_horizon_next_action":
|
| 254 |
lines.extend(
|
|
@@ -276,6 +357,35 @@ def build_task_prompt(sample: dict[str, Any], future_sample: dict[str, Any], tas
|
|
| 276 |
"List the objects likely to be active or manipulated in that future window. Use short object names.",
|
| 277 |
]
|
| 278 |
)
|
| 279 |
else:
|
| 280 |
raise ValueError(f"unknown task: {task_id}")
|
| 281 |
return "\n".join(lines)
|
|
@@ -290,14 +400,30 @@ def build_messages(
|
|
| 290 |
*,
|
| 291 |
include_audio: bool = True,
|
| 292 |
) -> list[dict[str, Any]]:
|
| 293 |
-
|
| 294 |
-
|
| 295 |
-
audio_path = media.get("audio_path")
|
| 296 |
content: list[dict[str, Any]] = []
|
| 297 |
-
if
|
| 298 |
-
|
| 299 |
-
|
| 300 |
-
|
| 301 |
content.append({"type": "text", "text": build_task_prompt(sample, future_sample, task_id, spec, future_frames)})
|
| 302 |
return [
|
| 303 |
{"role": "system", "content": [{"type": "text", "text": SYSTEM_PROMPT}]},
|
|
@@ -394,10 +520,32 @@ def extract_prediction(raw: str, sample: dict[str, Any], spec: dict[str, Any]) -
|
|
| 394 |
value = payload.get(spec["prediction_key"])
|
| 395 |
if spec["family"] == "multi_label":
|
| 396 |
return normalize_objects(value)
|
| 397 |
options = task_options(sample, spec)
|
| 398 |
return match_label(str(value or raw), options) if options else normalize_text(value)
|
| 399 |
|
| 400 |
|
| 401 |
def object_set_metrics(rows: list[dict[str, Any]]) -> dict[str, float]:
|
| 402 |
tp = fp = fn = exact = 0
|
| 403 |
for row in rows:
|
|
@@ -419,6 +567,26 @@ def object_set_metrics(rows: list[dict[str, Any]]) -> dict[str, float]:
|
|
| 419 |
}
|
| 420 |
|
| 421 |
|
| 422 |
def score_task(task_id: str, spec: dict[str, Any], rows: list[dict[str, Any]], output_dir: Path, args: argparse.Namespace) -> dict[str, Any]:
|
| 423 |
task_dir = output_dir / task_id
|
| 424 |
task_dir.mkdir(parents=True, exist_ok=True)
|
|
@@ -471,11 +639,17 @@ def score_task(task_id: str, spec: dict[str, Any], rows: list[dict[str, Any]], o
|
|
| 471 |
metrics[f"{task_id}_accuracy"] = metrics["accuracy"]
|
| 472 |
write_csv(task_dir / "per_class_metrics.csv", per_class, ["class_name", "support", "predicted", "precision", "recall", "f1"])
|
| 473 |
primary_score = metrics["macro_f1"]
|
| 474 |
-
|
| 475 |
metrics = object_set_metrics(rows)
|
| 476 |
metrics[f"{task_id}_micro_f1"] = metrics["micro_f1"]
|
| 477 |
metrics[f"{task_id}_exact_match"] = metrics["exact_match"]
|
| 478 |
primary_score = metrics["micro_f1"]
|
| 479 |
|
| 480 |
metrics.update(
|
| 481 |
{
|
|
@@ -516,6 +690,7 @@ def main() -> int:
|
|
| 516 |
selected_tasks = select_tasks(args.tasks)
|
| 517 |
samples = load_jsonl(args.dataset_jsonl)
|
| 518 |
future_map = future_index_map(samples, args.future_frames)
|
| 519 |
eval_indices = [idx for idx in select_eval_indices(samples, args) if idx in future_map]
|
| 520 |
if not eval_indices:
|
| 521 |
raise ValueError("No evaluation samples with future targets selected.")
|
|
@@ -554,8 +729,14 @@ def main() -> int:
|
|
| 554 |
continue
|
| 555 |
started = time.time()
|
| 556 |
raw = generate_messages(model, processor, sample, future_sample, task_id, spec, args)
|
| 557 |
-
true_value =
|
| 558 |
predicted_value = extract_prediction(raw, sample, spec)
|
| 559 |
row = {
|
| 560 |
"prediction_id": pred_id,
|
| 561 |
"id": sample.get("id"),
|
|
@@ -571,7 +752,7 @@ def main() -> int:
|
|
| 571 |
"true_value": true_value,
|
| 572 |
"predicted_value": predicted_value,
|
| 573 |
"raw_prediction": raw,
|
| 574 |
-
"correct":
|
| 575 |
}
|
| 576 |
partial_by_task[task_id][pred_id] = row
|
| 577 |
append_jsonl(partial_path, row)
|
|
|
|
| 1 |
#!/usr/bin/env python3
|
| 2 |
"""Evaluate Qwen3-Omni on future-target task probes from the 128-episode JSON.
|
| 3 |
|
| 4 |
+
This runner scores task targets that can be derived from the current
|
| 5 |
+
multi-episode JSON export and staged media:
|
| 6 |
|
| 7 |
- Task 13: long-horizon next action, +100 frames.
|
| 8 |
- Task 14: long-horizon next subtask, +100 frames.
|
| 9 |
- Task 17: future object set, +100 frames.
|
| 10 |
+
- Task 11: temporal order from two staged video windows.
|
| 11 |
+
- Task 12: audio-video misalignment from staged video/audio windows.
|
| 12 |
+
- Task 20: capped frames until next action transition.
|
| 13 |
|
| 14 |
+
It does not fabricate scores for retrieval, raw-caption, raw hand-pose, or
|
| 15 |
missing-modality targets.
|
| 16 |
"""
|
| 17 |
|
|
|
|
| 19 |
|
| 20 |
import argparse
|
| 21 |
import csv
|
| 22 |
+
import hashlib
|
| 23 |
import json
|
| 24 |
+
import re
|
| 25 |
import time
|
| 26 |
from collections import OrderedDict
|
| 27 |
from pathlib import Path
|
|
|
|
 TASK_SPECS: OrderedDict[str, dict[str, Any]] = OrderedDict(
     [
+        (
+            "temporal_order",
+            {
+                "task_number": 11,
+                "label": "Temporal Order Verification",
+                "family": "classification",
+                "metric_key": "temporal_order_f1",
+                "prediction_key": "temporal_order",
+                "target_field": None,
+                "option_field": None,
+                "options": ["correct", "reversed"],
+            },
+        ),
+        (
+            "misalignment_detection",
+            {
+                "task_number": 12,
+                "label": "Multimodal Misalignment Detection",
+                "family": "classification",
+                "metric_key": "misalignment_detection_f1",
+                "prediction_key": "misalignment_detection",
+                "target_field": None,
+                "option_field": None,
+                "options": ["aligned", "shifted"],
+            },
+        ),
         (
             "long_horizon_next_action",
             {
...
                 "option_field": None,
             },
         ),
+        (
+            "time_to_transition",
+            {
+                "task_number": 20,
+                "label": "Time to Transition",
+                "family": "regression",
+                "metric_key": "time_to_transition_mae",
+                "prediction_key": "time_to_transition_frames",
+                "target_field": None,
+                "option_field": None,
+            },
+        ),
     ]
 )

...
     return mapping


+def time_to_transition_map(samples: list[dict[str, Any]], cap_frames: int = 200) -> dict[int, int]:
+    mapping: dict[int, int] = {}
+    for indices in by_episode_sorted(samples).values():
+        actions = [normalize_text(answer(samples[idx]).get("action")) for idx in indices]
+        starts = [row_start(samples[idx]) for idx in indices]
+        for pos, idx in enumerate(indices):
+            current_action = actions[pos]
+            target = cap_frames
+            for next_pos in range(pos + 1, len(indices)):
+                if actions[next_pos] and actions[next_pos] != current_action:
+                    target = min(cap_frames, max(0, starts[next_pos] - starts[pos]))
+                    break
+            mapping[idx] = target
+    return mapping
+
+
 def parse_json_object(text: str) -> dict[str, Any]:
     raw = str(text or "").strip()
     if raw.startswith("```"):
...
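Review note: the `time_to_transition_map` helper added in this hunk computes, for each row, the frame distance to the first later row in the same episode whose non-empty action label differs, capped at `cap_frames`. The sketch below restates that rule on plain lists so the capping and the "no later transition" default are easy to check by hand. The function name `frames_to_next_transition` and the toy timeline are hypothetical and are not part of the commit; the repository helpers (`by_episode_sorted`, `row_start`, `answer`) are deliberately left out.

```python
def frames_to_next_transition(actions, starts, cap_frames=200):
    """Toy restatement of the capped target for one episode.

    actions: action label per row, in temporal order (hypothetical data).
    starts:  start frame per row, same order.
    """
    targets = []
    for pos, current in enumerate(actions):
        target = cap_frames  # default when no later label differs
        for nxt in range(pos + 1, len(actions)):
            # Empty labels are skipped, mirroring the truthiness check in the hunk.
            if actions[nxt] and actions[nxt] != current:
                target = min(cap_frames, max(0, starts[nxt] - starts[pos]))
                break
        targets.append(target)
    return targets


# Made-up single-episode timeline.
actions = ["reach", "reach", "grasp", "grasp", "lift"]
starts = [0, 50, 120, 400, 450]
toy_targets = frames_to_next_transition(actions, starts)
print(toy_targets)  # [120, 70, 200, 50, 200]
```

The third row is 330 frames away from the next label change and is therefore clipped to 200, and the last row has no later transition and receives the cap as well. A target of 200 consequently mixes "far away" with "never observed".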
     return payload if isinstance(payload, dict) else {}


+def stable_variant(task_id: str, sample: dict[str, Any]) -> bool:
+    key = f"{task_id}::{sample.get('id')}"
+    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
+    return int(digest[:2], 16) % 2 == 0
+
+
+def media_video_path(sample: dict[str, Any]) -> str | None:
+    media = sample.get("media") if isinstance(sample.get("media"), dict) else {}
+    return media.get("mosaic_video_path") or sample.get("primary_video_path")
+
+
+def media_audio_path(sample: dict[str, Any]) -> str | None:
+    media = sample.get("media") if isinstance(sample.get("media"), dict) else {}
+    return media.get("audio_path")
+
+
 def task_options(sample: dict[str, Any], spec: dict[str, Any]) -> list[str]:
+    if isinstance(spec.get("options"), list):
+        return [str(item) for item in spec["options"]]
     option_field = spec.get("option_field")
     options = sample.get(option_field) if option_field else None
     if isinstance(options, list) and options:
...
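Review note: `stable_variant` in this hunk decides the positive or negative variant of a probe from a SHA-1 hash of the task id and sample id, so the same sample always receives the same label across runs and resumed partial files. The sketch below checks that property on synthetic ids. It takes the id string directly rather than a sample dict, and the ids themselves are invented for illustration.

```python
import hashlib


def stable_variant(task_id: str, sample_id: str) -> bool:
    # Same hashing rule as the hunk, simplified to accept the id string directly.
    key = f"{task_id}::{sample_id}"
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return int(digest[:2], 16) % 2 == 0


# Hypothetical sample ids; real ids come from the dataset JSONL.
ids = [f"episode0_{i:04d}" for i in range(1000)]
first = [stable_variant("temporal_order", i) for i in ids]
second = [stable_variant("temporal_order", i) for i in ids]
share = sum(first) / len(first)

print(first == second)          # deterministic across calls
print(f"positive share: {share:.2f}")
```

Because the task id is part of the key, the same sample can fall on different sides for `temporal_order` and `misalignment_detection`. The split is expected to be close to balanced, not exactly 50/50.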
         f"Task {spec['task_number']}: {spec['label']}",
         f"Episode: {sample.get('episode_id')}",
         f"Current visible/audio context frames: {start}-{end}",
     ]
+    if task_id in {"long_horizon_next_action", "next_subtask_forecast", "object_set_forecast"}:
+        lines.append(
+            f"Predict the target at the future window starting near frame {start + future_frames} "
+            f"(resolved target start frame {future_start})."
+        )
     options = task_options(sample, spec)
     if task_id == "long_horizon_next_action":
         lines.extend(
...
                 "List the objects likely to be active or manipulated in that future window. Use short object names.",
             ]
         )
+    elif task_id == "temporal_order":
+        lines.extend(
+            [
+                "You will receive two video clips named Clip A and Clip B.",
+                "Return JSON only with this schema:",
+                f'{{"{prediction_key}":"<correct or reversed>"}}',
+                "Answer correct if Clip A happens before Clip B in the same episode.",
+                "Answer reversed if Clip A happens after Clip B in the same episode.",
+            ]
+        )
+    elif task_id == "misalignment_detection":
+        lines.extend(
+            [
+                "You will receive one video clip and one audio clip.",
+                "Return JSON only with this schema:",
+                f'{{"{prediction_key}":"<aligned or shifted>"}}',
+                "Answer aligned if the audio belongs to the same time window as the video.",
+                "Answer shifted if the audio comes from a later shifted window in the same episode.",
+            ]
+        )
+    elif task_id == "time_to_transition":
+        lines.extend(
+            [
+                "Estimate how many frames remain until the next action-label boundary.",
+                "The answer is capped at 200 frames.",
+                "Return JSON only with this schema:",
+                f'{{"{prediction_key}":<integer from 0 to 200>}}',
+            ]
+        )
     else:
         raise ValueError(f"unknown task: {task_id}")
     return "\n".join(lines)
...
     *,
     include_audio: bool = True,
 ) -> list[dict[str, Any]]:
+    video_path = media_video_path(sample)
+    audio_path = media_audio_path(sample)
     content: list[dict[str, Any]] = []
+    if task_id == "temporal_order":
+        future_video_path = media_video_path(future_sample)
+        if stable_variant(task_id, sample):
+            first_video, second_video = video_path, future_video_path
+        else:
+            first_video, second_video = future_video_path, video_path
+        if first_video:
+            content.append({"type": "video", "video": first_video})
+        if second_video:
+            content.append({"type": "video", "video": second_video})
+    elif task_id == "misalignment_detection":
+        paired_audio_path = audio_path if stable_variant(task_id, sample) else media_audio_path(future_sample)
+        if video_path:
+            content.append({"type": "video", "video": video_path})
+        if include_audio and paired_audio_path:
+            content.append({"type": "audio", "audio": paired_audio_path})
+    else:
+        if video_path:
+            content.append({"type": "video", "video": video_path})
+        if include_audio and audio_path:
+            content.append({"type": "audio", "audio": audio_path})
     content.append({"type": "text", "text": build_task_prompt(sample, future_sample, task_id, spec, future_frames)})
     return [
         {"role": "system", "content": [{"type": "text", "text": SYSTEM_PROMPT}]},
...
     value = payload.get(spec["prediction_key"])
     if spec["family"] == "multi_label":
         return normalize_objects(value)
+    if spec["family"] == "regression":
+        match = re.search(r"-?\d+(?:\.\d+)?", str(value if value is not None else raw))
+        if not match:
+            return None
+        return max(0.0, min(200.0, float(match.group(0))))
     options = task_options(sample, spec)
     return match_label(str(value or raw), options) if options else normalize_text(value)


+def task_target_value(
+    task_id: str,
+    sample: dict[str, Any],
+    future_sample: dict[str, Any],
+    spec: dict[str, Any],
+    transition_targets: dict[int, int],
+    sample_idx: int,
+) -> Any:
+    if task_id == "temporal_order":
+        return "correct" if stable_variant(task_id, sample) else "reversed"
+    if task_id == "misalignment_detection":
+        return "aligned" if stable_variant(task_id, sample) else "shifted"
+    if task_id == "time_to_transition":
+        return float(transition_targets[sample_idx])
+    return task_target(future_sample, spec)
+
+
 def object_set_metrics(rows: list[dict[str, Any]]) -> dict[str, float]:
     tp = fp = fn = exact = 0
     for row in rows:
...
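Review note: the regression branch added to `extract_prediction` takes the first number it finds in the parsed value (or in the raw model text when the key is missing) and clamps it to the 0 to 200 range. The sketch below isolates that parse-and-clamp step so its edge cases can be inspected. The name `parse_capped_frames` and the sample strings are hypothetical; the regular expression and the clamp bounds are copied from the hunk.

```python
import re


def parse_capped_frames(text, cap=200.0):
    # First signed integer or decimal in the text, as in the hunk's regex.
    match = re.search(r"-?\d+(?:\.\d+)?", str(text))
    if not match:
        return None
    return max(0.0, min(cap, float(match.group(0))))


# Invented model outputs for illustration.
examples = [
    '{"time_to_transition_frames": 85}',
    "about 350 frames",
    "-12",
    "no idea",
]
parsed = [parse_capped_frames(t) for t in examples]
print(parsed)  # [85.0, 200.0, 0.0, None]

# The first number wins, so free text that mentions the task number
# before the estimate is read as that task number.
print(parse_capped_frames("Task 20: roughly 85 frames"))  # 20.0
```

The last call shows why the raw-text fallback may deserve a second look: if the model echoes a number from the prompt before its answer, that number is what gets scored.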
     }


+def regression_metrics(rows: list[dict[str, Any]]) -> dict[str, float]:
+    errors = []
+    within_20 = 0
+    for row in rows:
+        true_value = float(row.get("true_value") or 0.0)
+        pred_value = row.get("predicted_value")
+        if pred_value is None:
+            pred_value = 200.0
+        err = abs(float(pred_value) - true_value)
+        errors.append(err)
+        within_20 += int(err <= 20.0)
+    mae = float(np.mean(errors)) if errors else 0.0
+    return {
+        "num_samples": len(rows),
+        "mae": mae,
+        "time_to_transition_mae": mae,
+        "within_20_frames": within_20 / len(rows) if rows else 0.0,
+    }
+
+
 def score_task(task_id: str, spec: dict[str, Any], rows: list[dict[str, Any]], output_dir: Path, args: argparse.Namespace) -> dict[str, Any]:
     task_dir = output_dir / task_id
     task_dir.mkdir(parents=True, exist_ok=True)
...
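Review note: `regression_metrics` in this hunk reports mean absolute error plus the share of rows within 20 frames, and it scores an unparseable prediction as if the model had answered 200. The sketch below reproduces that arithmetic without NumPy on three invented rows. The function name `mae_and_within` and its parameters are hypothetical; the defaults mirror the constants in the hunk.

```python
def mae_and_within(rows, tolerance=20.0, missing_default=200.0):
    errors = []
    for row in rows:
        pred = row["predicted_value"]
        if pred is None:
            pred = missing_default  # unparseable output is scored as the cap
        errors.append(abs(float(pred) - float(row["true_value"])))
    n = len(errors)
    return {
        "mae": sum(errors) / n if n else 0.0,
        "within_tolerance": sum(e <= tolerance for e in errors) / n if n else 0.0,
    }


# Invented rows for illustration.
rows = [
    {"true_value": 120.0, "predicted_value": 110.0},  # error 10
    {"true_value": 50.0, "predicted_value": 90.0},    # error 40
    {"true_value": 200.0, "predicted_value": None},   # scored as 200, error 0
]
result = mae_and_within(rows)
print(result)  # mae = 50/3, within_tolerance = 2/3
```

The third row is worth noting: with a capped target of 200, a missing prediction yields zero error here, whereas the per-row `correct` flag computed in `main()` requires a non-`None` prediction. The two numbers can therefore disagree on the same row.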
         metrics[f"{task_id}_accuracy"] = metrics["accuracy"]
         write_csv(task_dir / "per_class_metrics.csv", per_class, ["class_name", "support", "predicted", "precision", "recall", "f1"])
         primary_score = metrics["macro_f1"]
+    elif spec["family"] == "multi_label":
         metrics = object_set_metrics(rows)
         metrics[f"{task_id}_micro_f1"] = metrics["micro_f1"]
         metrics[f"{task_id}_exact_match"] = metrics["exact_match"]
         primary_score = metrics["micro_f1"]
+    elif spec["family"] == "regression":
+        metrics = regression_metrics(rows)
+        primary_score = metrics["mae"]
+    else:
+        raise ValueError(f"unsupported task family: {spec['family']}")
+    metrics[spec["metric_key"]] = primary_score

     metrics.update(
         {
...
     selected_tasks = select_tasks(args.tasks)
     samples = load_jsonl(args.dataset_jsonl)
     future_map = future_index_map(samples, args.future_frames)
+    transition_targets = time_to_transition_map(samples)
     eval_indices = [idx for idx in select_eval_indices(samples, args) if idx in future_map]
     if not eval_indices:
         raise ValueError("No evaluation samples with future targets selected.")
...
                 continue
             started = time.time()
             raw = generate_messages(model, processor, sample, future_sample, task_id, spec, args)
+            true_value = task_target_value(task_id, sample, future_sample, spec, transition_targets, sample_idx)
             predicted_value = extract_prediction(raw, sample, spec)
+            if spec["family"] == "classification":
+                correct = int(true_value == predicted_value)
+            elif spec["family"] == "multi_label":
+                correct = int(set(true_value) == set(predicted_value))
+            else:
+                correct = int(predicted_value is not None and abs(float(true_value) - float(predicted_value)) <= 20.0)
             row = {
                 "prediction_id": pred_id,
                 "id": sample.get("id"),
...
                 "true_value": true_value,
                 "predicted_value": predicted_value,
                 "raw_prediction": raw,
+                "correct": correct,
             }
             partial_by_task[task_id][pred_id] = row
             append_jsonl(partial_path, row)