cy0307 committed 10 days ago

Commit ef97957 (verified), parent 4ad6b11

Add files using upload-large-folder tool

Files changed (28)

TASK_METHOD_20_GAP_AUDIT.md +5 -8
TASK_METHOD_20_RESULT_MATRIX.md +4 -4
assets/charts/episode128_task_model_radar.svg +6 -3
assets/charts/unified_task_model_radar.svg +5 -2
data/episode128_task_model_radar.json +60 -60
data/publication_audit.json +3 -3
data/single_episode_task_model_radar.json +1 -1
data/task_method_20_gap_audit.json +13 -55
data/task_method_20_result_matrix.json +36 -36
data/task_surface_integrity.json +1 -1
data/unified_task_model_radar.json +70 -70
data/website_integrity.json +7 -7
docs/data/episode128_task_model_radar.json +60 -60
docs/data/publication_audit.json +3 -3
docs/data/single_episode_task_model_radar.json +1 -1
docs/data/task_method_20_gap_audit.json +13 -55
docs/data/task_method_20_result_matrix.json +36 -36
docs/data/task_surface_integrity.json +1 -1
docs/data/unified_task_model_radar.json +70 -70
docs/data/website_integrity.json +7 -7
metrics/episode128_task_model_radar.json +60 -60
metrics/publication_audit.json +3 -3
metrics/single_episode_task_model_radar.json +1 -1
metrics/task_method_20_gap_audit.json +13 -55
metrics/task_method_20_result_matrix.json +36 -36
metrics/task_surface_integrity.json +1 -1
metrics/unified_task_model_radar.json +70 -70
metrics/website_integrity.json +7 -7
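Every JSON file touched by this commit appears under `data/`, `docs/data/`, and `metrics/` with identical `+/-` counts, which suggests the three directories are kept as mirrors. Assuming they are meant to be byte-identical (the commit itself does not say so), a small check like the sketch below could catch a mirror that missed a regeneration. The directory tuple and the `find_mirror_mismatches` helper are hypothetical.

```python
import hashlib
from pathlib import Path

# Hypothetical mirror layout inferred from the file list: each JSON file is
# assumed to exist under all three directories with identical contents.
MIRROR_DIRS = ("data", "docs/data", "metrics")


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_mirror_mismatches(repo_root: Path, filenames: list[str]) -> list[str]:
    """List filenames whose copies are missing or differ across MIRROR_DIRS."""
    problems = []
    for name in filenames:
        copies = [repo_root / d / name for d in MIRROR_DIRS]
        if not all(p.is_file() for p in copies):
            problems.append(f"{name}: missing in at least one mirror")
            continue
        # One distinct digest means all three copies are byte-identical.
        if len({sha256_of(p) for p in copies}) != 1:
            problems.append(f"{name}: contents differ between mirrors")
    return problems
```

For example, `find_mirror_mismatches(Path("."), ["publication_audit.json", "website_integrity.json"])` returns an empty list when every copy matches.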

TASK_METHOD_20_GAP_AUDIT.md CHANGED

@@ -1,6 +1,6 @@
 # Task Method 20-Result Gap Audit
-Generated: `2026-06-17T13:55:12+00:00`
+Generated: `2026-06-17T21:17:51+00:00`
 This audit is the explicit gap ledger for the 9-method x 20-task result matrix.
 It keeps missing cells visible while preserving the rule that a numeric score
@@ -9,8 +9,8 @@ requires a real task target and source artifact.
 ## Score Summary
 - Method-task records: `180`
-- Numeric scored records: `116`
-- Scoreless records: `64`
+- Numeric scored records: `119`
+- Scoreless records: `61`
 - Proxy-scored records: `4`
 - Source matrix: [`docs/data/task_method_20_result_matrix.json`](docs/data/task_method_20_result_matrix.json)
@@ -24,7 +24,7 @@ requires a real task target and source artifact.
 | 128ep Metadata NN | metadata128_neural_mlp | 6/20 | 14 | 0 | not_supported_by_metadata_only_package: 14, scored: 6 |
 | 128ep Raw Simple | raw128_simple | 20/20 | 0 | 2 | proxy_scored: 2, scored: 18 |
 | 128ep Raw NN | raw128_neural_mlp | 20/20 | 0 | 2 | proxy_scored: 2, scored: 18 |
-| Qwen3-Omni v6 LoRA | qwen3_omni_v6_lora | 10/20 | 10 | 0 | not_evaluated_in_verified_package: 10, scored: 10 |
+| Qwen3-Omni v6 LoRA | qwen3_omni_v6_lora | 13/20 | 7 | 0 | not_evaluated_in_verified_package: 7, scored: 13 |
 | Cosmos3-Super Reasoner | cosmos3_super_reasoner | 7/20 | 13 | 0 | not_evaluated_in_verified_package: 13, scored: 7 |
 | Cosmos3-Nano Future Window | cosmos3_nano_future_window | 5/20 | 15 | 0 | not_evaluated_in_verified_package: 15, scored: 5 |
@@ -32,7 +32,7 @@ requires a real task target and source artifact.
 | Status | Count | Next step |
 | --- | --- | --- |
-| not_evaluated_in_verified_package | 38 | Generate verified model outputs for this task contract and score them against the held-out labels. |
+| not_evaluated_in_verified_package | 35 | Generate verified model outputs for this task contract and score them against the held-out labels. |
 | not_supported_by_metadata_only_package | 22 | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
 | unsupported_without_required_target | 4 | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
@@ -61,12 +61,10 @@ requires a real task target and source artifact.
 | 10 | Cross-Modal Reconstruction | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 | 10 | Cross-Modal Reconstruction | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 | 11 | Temporal Order Verification | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
-| 11 | Temporal Order Verification | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 | 11 | Temporal Order Verification | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 | 11 | Temporal Order Verification | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 | 12 | Multimodal Synchronization Detection | 128ep Metadata Simple | unsupported | Export the missing target field for this 128-episode method, then rerun the same train/validation/test split. |
 | 12 | Multimodal Synchronization Detection | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
-| 12 | Multimodal Synchronization Detection | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 | 12 | Multimodal Synchronization Detection | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 | 12 | Multimodal Synchronization Detection | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 | 13 | Long-Horizon Next-Action Forecasting | 128ep Metadata Simple | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
@@ -101,7 +99,6 @@ requires a real task target and source artifact.
 | 19 | Camera-View Synchronization Retrieval | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 | 20 | Time-to-Next-Transition Regression | 128ep Metadata Simple | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
 | 20 | Time-to-Next-Transition Regression | 128ep Metadata NN | not supported | Run the task with raw sensor-feature blocks or add a task-specific metadata target builder before assigning a numeric score. |
-| 20 | Time-to-Next-Transition Regression | Qwen3-Omni v6 LoRA | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 | 20 | Time-to-Next-Transition Regression | Cosmos3-Super Reasoner | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
 | 20 | Time-to-Next-Transition Regression | Cosmos3-Nano Future Window | not evaluated | Generate verified model outputs for this task contract and score them against the held-out labels. |
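The score summary and the status table are tied together by simple counting: 180 method-task records split into 119 numeric scores and 61 scoreless cells, and the 61 break down as 35 + 22 + 4 by status. Moving the three Qwen3-Omni v6 LoRA cells (tasks 11, 12, and 20) from `not_evaluated_in_verified_package` to `scored` accounts for every number that changed (116 to 119, 64 to 61, 38 to 35). The sketch below recomputes the summary from one status string per cell; treating `scored` and `proxy_scored` as the only numeric statuses is an assumption drawn from the per-method table, where the raw128 rows count their two proxy cells as covered. The `summarize` helper is hypothetical.

```python
from collections import Counter

# Statuses assumed to carry a numeric score; inferred from the audit tables,
# where proxy-scored cells count toward a method's covered tasks.
NUMERIC_STATUSES = {"scored", "proxy_scored"}


def summarize(statuses: list[str]) -> dict:
    """Rebuild the gap-audit score summary from one status per method-task cell."""
    counts = Counter(statuses)
    # Counter returns 0 for statuses that never occur, so no KeyError here.
    numeric = sum(counts[s] for s in NUMERIC_STATUSES)
    missing_by_status = {s: n for s, n in counts.items() if s not in NUMERIC_STATUSES}
    return {
        "method_task_records": len(statuses),
        "numeric_scored_records": numeric,
        "scoreless_records": len(statuses) - numeric,
        "proxy_scored_records": counts["proxy_scored"],
        "missing_by_status": missing_by_status,
    }
```

Feeding it the post-commit cell counts (115 `scored`, 4 `proxy_scored`, 35 + 22 + 4 scoreless) yields 180 records, 119 numeric, 61 scoreless, and 4 proxy-scored, matching the updated summary.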

TASK_METHOD_20_RESULT_MATRIX.md CHANGED

@@ -12,7 +12,7 @@ Legend: `score` = numeric task score, `proxy` = documented raw128 compact proxy
 | 128ep Metadata NN | 20 | 6 | 0 | 14 | not supported 14, scored 6 |
 | 128ep Raw Simple | 20 | 20 | 2 | 0 | proxy scored 2, scored 18 |
 | 128ep Raw NN | 20 | 20 | 2 | 0 | proxy scored 2, scored 18 |
-| Qwen3-Omni v6 LoRA | 20 | 10 | 0 | 10 | not evaluated 10, scored 10 |
+| Qwen3-Omni v6 LoRA | 20 | 13 | 0 | 7 | not evaluated 7, scored 13 |
 | Cosmos3-Super Reasoner | 20 | 7 | 0 | 13 | not evaluated 13, scored 7 |
 | Cosmos3-Nano Future Window | 20 | 5 | 0 | 15 | not evaluated 15, scored 5 |
@@ -28,8 +28,8 @@ Legend: `score` = numeric task score, `proxy` = documented raw128 compact proxy
 | 08 | Language Grounding | score | score | score | not supported | score | score | not evaluated | not evaluated | not evaluated |
 | 09 | Cross-Modal Retrieval | score | score | unsupported | not supported | score | score | not evaluated | not evaluated | score |
 | 10 | Cross-Modal Reconstruction | score | score | unsupported | not supported | score | score | not evaluated | not evaluated | not evaluated |
-| 11 | Temporal Order Verification | score | score | score | not supported | score | score | not evaluated | not evaluated | not evaluated |
-| 12 | Multimodal Synchronization Detection | score | score | unsupported | not supported | score | score | not evaluated | not evaluated | not evaluated |
+| 11 | Temporal Order Verification | score | score | score | not supported | score | score | score | not evaluated | not evaluated |
+| 12 | Multimodal Synchronization Detection | score | score | unsupported | not supported | score | score | score | not evaluated | not evaluated |
 | 13 | Long-Horizon Next-Action Forecasting | score | score | not supported | not supported | score | score | score | not evaluated | not evaluated |
 | 14 | Long-Horizon Next-Subtask Forecasting | score | score | not supported | not supported | score | score | score | not evaluated | not evaluated |
 | 15 | Interaction Text Prediction | score | score | not supported | not supported | proxy | proxy | not evaluated | not evaluated | not evaluated |
@@ -37,6 +37,6 @@ Legend: `score` = numeric task score, `proxy` = documented raw128 compact proxy
 | 17 | Future Object-Set Forecasting | score | score | not supported | not supported | score | score | score | not evaluated | not evaluated |
 | 18 | IMU-to-Hand Pose Reconstruction | score | score | not supported | not supported | score | score | not evaluated | not evaluated | not evaluated |
 | 19 | Camera-View Synchronization Retrieval | score | score | not supported | not supported | proxy | proxy | not evaluated | not evaluated | not evaluated |
-| 20 | Time-to-Next-Transition Regression | score | score | not supported | not supported | score | score | not evaluated | not evaluated | not evaluated |
+| 20 | Time-to-Next-Transition Regression | score | score | not supported | not supported | score | score | score | not evaluated | not evaluated |
 Sources and raw values are in `docs/data/task_method_20_result_matrix.json` and `docs/data/unified_task_model_radar.json`.
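Each per-method row in the result matrix can be read as: records, covered cells (numeric scores including proxies), proxy-scored cells, missing cells, and a status breakdown. The header row is not part of this excerpt, so that column reading, the `CELL_TO_STATUS` mapping from grid labels to breakdown labels, and the `summary_row` helper are all assumptions. Under them, the Qwen3-Omni column (13 `score` cells, 7 `not evaluated`) reproduces the updated row exactly.

```python
from collections import Counter

# Assumed mapping from the short cell labels in the task grid to the
# status labels used in the per-method summary rows.
CELL_TO_STATUS = {
    "score": "scored",
    "proxy": "proxy scored",
    "not evaluated": "not evaluated",
    "not supported": "not supported",
    "unsupported": "unsupported",
}


def summary_row(method: str, cells: list[str]) -> str:
    """Format one per-method summary row from that method's grid cells."""
    counts = Counter(CELL_TO_STATUS[c] for c in cells)
    proxy = counts["proxy scored"]
    # Proxy-scored cells count as covered, as in the raw128 rows (20/20 with 2 proxies).
    covered = counts["scored"] + proxy
    missing = len(cells) - covered
    # Breakdown labels appear in alphabetical order in the matrix rows.
    breakdown = ", ".join(f"{label} {n}" for label, n in sorted(counts.items()))
    return f"| {method} | {len(cells)} | {covered} | {proxy} | {missing} | {breakdown} |"
```

The same helper also reproduces the unchanged `128ep Raw Simple` and `128ep Metadata NN` rows, which is a cheap check that the column reading is consistent.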

assets/charts/episode128_task_model_radar.svg CHANGED

assets/charts/unified_task_model_radar.svg CHANGED

data/episode128_task_model_radar.json CHANGED

@@ -1,12 +1,12 @@
 {
   "title": "128-Episode 20-Task Radar",
   "status": "pass",
-  "generated_at_utc": "2026-06-17T13:55:02+00:00",
+  "generated_at_utc": "2026-06-17T21:17:41+00:00",
   "description": "Selected 128-episode metadata/raw baselines plus verified Qwen3/Cosmos branches. Every method has 20 records; numeric scores appear only where the public artifact produced that task target.",
   "task_count": 20,
   "method_count": 7,
   "method_task_record_count": 140,
-  "scored_method_task_count": 76,
+  "scored_method_task_count": 79,
   "normalization_policy": {
     "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
     "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
@@ -127,17 +127,17 @@
       "method_detail": "Verified held-out Qwen3-Omni v6 LoRA metrics, plus task 16 and any completed private-GPU future-task probes scored from task-specific JSON.",
       "plotted_as": "colored point overlay",
       "result_record_count": 20,
-      "scored_task_count": 10,
-      "covered_task_count": 10,
+      "scored_task_count": 13,
+      "covered_task_count": 13,
       "proxy_scored_task_count": 0,
-      "scoreless_task_count": 10,
+      "scoreless_task_count": 7,
       "unsupported_task_count": 0,
-      "not_evaluated_task_count": 10,
+      "not_evaluated_task_count": 7,
       "status_counts": {
-        "not_evaluated_in_verified_package": 10,
-        "scored": 10
+        "not_evaluated_in_verified_package": 7,
+        "scored": 13
       },
-      "coverage_fraction": 0.5,
+      "coverage_fraction": 0.65,
       "result_record_fraction": 1.0
     },
     {
@@ -1157,15 +1157,15 @@
           "status_label": "scored"
         },
         "qwen3_omni_v6_lora": {
-          "raw": null,
-          "metric_key": "f1",
-          "source": null,
+          "raw": 0.40984631701404173,
+          "metric_key": "temporal_order_f1",
+          "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/temporal_order/metrics.json",
           "scope": "multi_episode_128_partial_model_overlay",
-          "status": "not_evaluated_in_verified_package",
-          "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-          "normalized_score": null,
-          "raw_text": "n/a",
-          "status_label": "not evaluated"
+          "status": "scored",
+          "reason": null,
+          "normalized_score": 0.40984631701404173,
+          "raw_text": "0.4098",
+          "status_label": "scored"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
@@ -1248,15 +1248,15 @@
           "status_label": "scored"
         },
         "qwen3_omni_v6_lora": {
-          "raw": null,
-          "metric_key": "f1",
-          "source": null,
+          "raw": 0.3344936184319576,
+          "metric_key": "misalignment_detection_f1",
+          "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/misalignment_detection/metrics.json",
           "scope": "multi_episode_128_partial_model_overlay",
-          "status": "not_evaluated_in_verified_package",
-          "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-          "normalized_score": null,
-          "raw_text": "n/a",
-          "status_label": "not evaluated"
+          "status": "scored",
+          "reason": null,
+          "normalized_score": 0.3344936184319576,
+          "raw_text": "0.3345",
+          "status_label": "scored"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
@@ -1976,15 +1976,15 @@
           "status_label": "scored"
         },
         "qwen3_omni_v6_lora": {
-          "raw": null,
-          "metric_key": "mae",
-          "source": null,
+          "raw": 134.0687422166874,
+          "metric_key": "time_to_transition_mae",
+          "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/time_to_transition/metrics.json",
           "scope": "multi_episode_128_partial_model_overlay",
-          "status": "not_evaluated_in_verified_package",
-          "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-          "normalized_score": null,
-          "raw_text": "n/a",
-          "status_label": "not evaluated"
+          "status": "scored",
+          "reason": null,
+          "normalized_score": 0.07859666766782253,
+          "raw_text": "134.07",
+          "status_label": "scored"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
@@ -3350,17 +3350,17 @@
       "task_label": "Temporal Order Verification",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "f1",
-      "source": null,
+      "raw": 0.40984631701404173,
+      "raw_text": "0.4098",
+      "normalized_score": 0.40984631701404173,
+      "metric_key": "temporal_order_f1",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/temporal_order/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
+      "reason": null
     },
     {
       "task_number": 11,
@@ -3476,17 +3476,17 @@
       "task_label": "Multimodal Synchronization Detection",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "f1",
-      "source": null,
+      "raw": 0.3344936184319576,
+      "raw_text": "0.3345",
+      "normalized_score": 0.3344936184319576,
+      "metric_key": "misalignment_detection_f1",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/misalignment_detection/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
+      "reason": null
     },
     {
       "task_number": 12,
@@ -4484,17 +4484,17 @@
       "task_label": "Time-to-Next-Transition Regression",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "mae",
-      "source": null,
+      "raw": 134.0687422166874,
+      "raw_text": "134.07",
+      "normalized_score": 0.07859666766782253,
+      "metric_key": "time_to_transition_mae",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/time_to_transition/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
+      "reason": null
     },
     {
       "task_number": 20,
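The radar's `normalization_policy` describes two rules: bounded higher-is-better metrics are clipped to [0, 1] and plotted as-is, and lower-is-better errors become `best_observed_value / raw_value` within the task. That is why the Qwen3-Omni F1 scores keep `normalized_score` equal to `raw`, while the time-to-transition MAE of 134.07 maps to about 0.0786, which implies a best observed MAE of roughly 10.54 for task 20 (derived from the two numbers, not stated in the diff). A minimal sketch of both rules; the function names are hypothetical.

```python
from typing import Optional


def normalize(raw: float, higher_is_better: bool, best_observed: Optional[float] = None) -> float:
    """Map a raw task metric onto the radar's 0-1 axis."""
    if higher_is_better:
        # Bounded metrics such as F1 are clipped to [0, 1] and plotted directly.
        return min(max(raw, 0.0), 1.0)
    if best_observed is None or raw <= 0:
        raise ValueError("lower-is-better metrics need a positive raw value and best_observed")
    # Error metrics such as MAE: the best (lowest) error in the task maps to 1.0.
    return best_observed / raw


def best_observed_error(raws: list[float]) -> float:
    """For a lower-is-better task, the best observed value is the smallest error."""
    return min(raws)
```

One consequence of the ratio rule is that a lower-is-better score depends on the other methods in the same task: if a new method posts a lower MAE, every existing `normalized_score` for that task shrinks even though its `raw` value is unchanged.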

data/publication_audit.json CHANGED

@@ -1,6 +1,6 @@
 {
   "status": "pass",
-  "generated_at_utc": "2026-06-17T21:12:50+00:00",
+  "generated_at_utc": "2026-06-17T21:25:35+00:00",
   "checks": [
     {
       "name": "required_publication_assets_present",
@@ -206,8 +206,8 @@
     "github_repo": {
       "root": "repo",
       "exists": true,
-      "file_count": 1232,
-      "text_file_count": 1034,
+      "file_count": 1250,
+      "text_file_count": 1052,
       "largest_file": {
         "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
         "bytes": 55702978
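The `github_repo` block in the publication audit reports a file count, a text-file count, and the largest file; this commit raises both counts by 18 (1232 to 1250 and 1034 to 1052). The audit's actual text-detection rule is not shown, so the sketch below assumes a common heuristic (no NUL bytes and a valid UTF-8 prefix) and assumes `.git` is excluded. Both helpers, `looks_like_text` and `repo_census`, are hypothetical.

```python
import codecs
from pathlib import Path


def looks_like_text(path: Path, sample_size: int = 8192) -> bool:
    """Assumed heuristic: text if the first bytes hold no NUL and decode as UTF-8."""
    with path.open("rb") as fh:
        chunk = fh.read(sample_size)
    if b"\x00" in chunk:
        return False
    try:
        # final=False tolerates a multi-byte character cut off at the sample boundary.
        codecs.getincrementaldecoder("utf-8")().decode(chunk, final=False)
    except UnicodeDecodeError:
        return False
    return True


def repo_census(root: Path) -> dict:
    """Count files and text files under root (skipping .git) and report the largest file."""
    files = [
        p for p in root.rglob("*")
        if p.is_file() and ".git" not in p.relative_to(root).parts
    ]
    largest = max(files, key=lambda p: p.stat().st_size, default=None)
    return {
        "exists": root.is_dir(),
        "file_count": len(files),
        "text_file_count": sum(looks_like_text(p) for p in files),
        "largest_file": None if largest is None else {
            "path": largest.relative_to(root).as_posix(),
            "bytes": largest.stat().st_size,
        },
    }
```

Reading only a fixed-size prefix keeps the check cheap on large binaries such as the 55 MB `predictions.npz` reported as the largest file.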

data/single_episode_task_model_radar.json CHANGED

@@ -1,7 +1,7 @@
 {
   "title": "Single-Episode 20-Task Radar",
   "status": "pass",
-  "generated_at_utc": "2026-06-17T13:55:02+00:00",
+  "generated_at_utc": "2026-06-17T21:17:41+00:00",
   "description": "Minimal and Neural MLP baselines on the one public sample episode, both scored on all 20 task contracts.",
   "task_count": 20,
   "method_count": 2,
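Every regenerated file carries a `generated_at_utc` stamp in ISO 8601 form with second precision and an explicit `+00:00` offset. The Python standard library produces and parses that form directly, so ordering checks (for example, that the gap audit stamped 21:17:51 was written after the radar files stamped 21:17:41) can compare instants instead of strings. The helper names below are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def generated_at_utc(now: Optional[datetime] = None) -> str:
    """Format a stamp like the generated_at_utc fields: UTC, whole seconds, explicit offset."""
    moment = datetime.now(timezone.utc) if now is None else now
    # astimezone() converts aware values to UTC; naive values are treated as local time.
    return moment.astimezone(timezone.utc).replace(microsecond=0).isoformat()


def is_newer(stamp: str, other: str) -> bool:
    """Compare two generated_at_utc strings as instants rather than as text."""
    return datetime.fromisoformat(stamp) > datetime.fromisoformat(other)
```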

data/task_method_20_gap_audit.json CHANGED

@@ -1,10 +1,10 @@
 {
-  "generated_at_utc": "2026-06-17T13:55:12+00:00",
   "immediate_actions": [
     {
       "artifact": "docs/data/task_method_20_gap_audit.json",
       "id": "gap_audit",
-      "purpose": "Keep the 64 scoreless cells visible and reproducible."
     },
     {
       "artifact": "scripts/omni/score_model_output_probes.py",
@@ -101,11 +101,11 @@
       "proxy_scored_task_count": 0,
       "result_record_count": 20,
       "scope": "128 selected episodes, held-out test",
-      "scored_task_count": 10,
-      "scoreless_task_count": 10,
       "status_counts": {
-        "not_evaluated_in_verified_package": 10,
-        "scored": 10
       }
     },
     "raw128_neural_mlp": {
@@ -140,10 +140,10 @@
     "cosmos3_super_reasoner": 13,
     "metadata128_neural_mlp": 14,
     "metadata128_simple": 12,
-    "qwen3_omni_v6_lora": 10
   },
   "missing_by_status": {
-    "not_evaluated_in_verified_package": 38,
     "not_supported_by_metadata_only_package": 22,
     "unsupported_without_required_target": 4
   },
@@ -183,15 +183,13 @@
     "11 Temporal Order Verification": [
       "cosmos3_nano_future_window",
       "cosmos3_super_reasoner",
-      "metadata128_neural_mlp",
-      "qwen3_omni_v6_lora"
     ],
     "12 Multimodal Synchronization Detection": [
       "cosmos3_nano_future_window",
       "cosmos3_super_reasoner",
       "metadata128_neural_mlp",
-      "metadata128_simple",
-      "qwen3_omni_v6_lora"
     ],
     "13 Long-Horizon Next-Action Forecasting": [
       "cosmos3_nano_future_window",
@@ -241,8 +239,7 @@
       "cosmos3_nano_future_window",
       "cosmos3_super_reasoner",
       "metadata128_neural_mlp",
-      "metadata128_simple",
-      "qwen3_omni_v6_lora"
     ]
   },
   "missing_records": [
@@ -519,19 +516,6 @@
       "task_label": "Temporal Order Verification",
       "task_number": 11
     },
-    {
-      "method": "Qwen3-Omni v6 LoRA",
-      "metric_key": "f1",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-      "recommended_next_step": "Generate verified model outputs for this task contract and score them against the held-out labels.",
-      "scope": "multi_episode_128_partial_model_overlay",
-      "series_id": "qwen3_omni_v6_lora",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "task_id": "temporal_order",
-      "task_label": "Temporal Order Verification",
-      "task_number": 11
-    },
     {
       "method": "Cosmos3-Super Reasoner",
       "metric_key": "f1",
@@ -584,19 +568,6 @@
       "task_label": "Multimodal Synchronization Detection",
       "task_number": 12
     },
-    {
-      "method": "Qwen3-Omni v6 LoRA",
-      "metric_key": "f1",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-      "recommended_next_step": "Generate verified model outputs for this task contract and score them against the held-out labels.",
-      "scope": "multi_episode_128_partial_model_overlay",
-      "series_id": "qwen3_omni_v6_lora",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "task_id": "misalignment_detection",
-      "task_label": "Multimodal Synchronization Detection",
-      "task_number": 12
-    },
     {
       "method": "Cosmos3-Super Reasoner",
       "metric_key": "f1",
@@ -1039,19 +1010,6 @@
       "task_label": "Time-to-Next-Transition Regression",
       "task_number": 20
     },
-    {
-      "method": "Qwen3-Omni v6 LoRA",
-      "metric_key": "mae",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-      "recommended_next_step": "Generate verified model outputs for this task contract and score them against the held-out labels.",
-      "scope": "multi_episode_128_partial_model_overlay",
-      "series_id": "qwen3_omni_v6_lora",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "task_id": "time_to_transition",
-      "task_label": "Time-to-Next-Transition Regression",
-      "task_number": 20
-    },
     {
       "method": "Cosmos3-Super Reasoner",
       "metric_key": "mae",
@@ -1125,8 +1083,8 @@
     "method_count": 9,
     "method_task_record_count": 180,
     "proxy_scored_method_task_count": 4,
-    "scored_method_task_count": 116,
-    "scoreless_method_task_count": 64,
     "task_count": 20
   },
   "source_matrix": "docs/data/task_method_20_result_matrix.json",

 {
+  "generated_at_utc": "2026-06-17T21:17:51+00:00",
   "immediate_actions": [
     {
       "artifact": "docs/data/task_method_20_gap_audit.json",
       "id": "gap_audit",
+      "purpose": "Keep the 61 scoreless cells visible and reproducible."
     },
     {
       "artifact": "scripts/omni/score_model_output_probes.py",
       "proxy_scored_task_count": 0,
       "result_record_count": 20,
       "scope": "128 selected episodes, held-out test",
+      "scored_task_count": 13,
+      "scoreless_task_count": 7,
       "status_counts": {
+        "not_evaluated_in_verified_package": 7,
+        "scored": 13
       }
     },
     "raw128_neural_mlp": {
     "cosmos3_super_reasoner": 13,
     "metadata128_neural_mlp": 14,
     "metadata128_simple": 12,
+    "qwen3_omni_v6_lora": 7
   },
   "missing_by_status": {
+    "not_evaluated_in_verified_package": 35,
     "not_supported_by_metadata_only_package": 22,
     "unsupported_without_required_target": 4
   },
     "11 Temporal Order Verification": [
       "cosmos3_nano_future_window",
       "cosmos3_super_reasoner",
+      "metadata128_neural_mlp"
     ],
     "12 Multimodal Synchronization Detection": [
       "cosmos3_nano_future_window",
       "cosmos3_super_reasoner",
       "metadata128_neural_mlp",
+      "metadata128_simple"
     ],
     "13 Long-Horizon Next-Action Forecasting": [
       "cosmos3_nano_future_window",
       "cosmos3_nano_future_window",
       "cosmos3_super_reasoner",
       "metadata128_neural_mlp",
+      "metadata128_simple"
     ]
   },
   "missing_records": [
       "task_label": "Temporal Order Verification",
       "task_number": 11
     },
     {
       "method": "Cosmos3-Super Reasoner",
       "metric_key": "f1",
       "task_label": "Multimodal Synchronization Detection",
       "task_number": 12
     },
     {
       "method": "Cosmos3-Super Reasoner",
       "metric_key": "f1",
       "task_label": "Time-to-Next-Transition Regression",
       "task_number": 20
     },
     {
       "method": "Cosmos3-Super Reasoner",
       "metric_key": "mae",
     "method_count": 9,
     "method_task_record_count": 180,
     "proxy_scored_method_task_count": 4,
+    "scored_method_task_count": 119,
+    "scoreless_method_task_count": 61,
     "task_count": 20
   },
   "source_matrix": "docs/data/task_method_20_result_matrix.json",
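
The gap audit's summary counters should reconcile with each other. After this commit, 119 scored plus 61 scoreless records account for all 180 method-task records (9 methods x 20 tasks), and the `missing_by_status` buckets (35 + 22 + 4) sum to the 61 scoreless cells. Below is a minimal Python sketch of such a consistency check. The function name `check_gap_summary` is hypothetical and not taken from the repository's scripts. It also assumes that each scoreless cell carries exactly one missing status.

```python
def check_gap_summary(summary: dict, missing_by_status: dict) -> list:
    """Return human-readable problems; an empty list means the counters agree."""
    problems = []
    total = summary["method_task_record_count"]
    scored = summary["scored_method_task_count"]
    scoreless = summary["scoreless_method_task_count"]
    # Every method-task record is either scored or scoreless.
    if scored + scoreless != total:
        problems.append(f"scored ({scored}) + scoreless ({scoreless}) != records ({total})")
    # The record count should also equal methods x tasks.
    expected = summary["method_count"] * summary["task_count"]
    if total != expected:
        problems.append(f"records ({total}) != methods x tasks ({expected})")
    # Assumption: each scoreless cell is assigned exactly one missing status.
    bucketed = sum(missing_by_status.values())
    if bucketed != scoreless:
        problems.append(f"missing_by_status sums to {bucketed}, expected {scoreless}")
    return problems


# Values copied from the post-commit gap audit summary.
summary = {
    "method_count": 9,
    "task_count": 20,
    "method_task_record_count": 180,
    "scored_method_task_count": 119,
    "scoreless_method_task_count": 61,
}
missing_by_status = {
    "not_evaluated_in_verified_package": 35,
    "not_supported_by_metadata_only_package": 22,
    "unsupported_without_required_target": 4,
}
print(check_gap_summary(summary, missing_by_status))  # []
```

If the audit were regenerated against a stale matrix, for example one still reporting 116 scored records, the first check would flag the mismatch.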

data/task_method_20_result_matrix.json CHANGED

@@ -1,11 +1,11 @@
 {
   "title": "Task Method 20-Result Matrix",
   "status": "pass",
-  "generated_at_utc": "2026-06-17T13:55:02+00:00",
   "task_count": 20,
   "method_count": 9,
   "method_task_record_count": 180,
-  "scored_method_task_count": 116,
   "series": [
     {
       "id": "minimal",
@@ -161,17 +161,17 @@
       "method_detail": "Verified held-out Qwen3-Omni v6 LoRA metrics, plus task 16 and any completed private-GPU future-task probes scored from task-specific JSON.",
       "plotted_as": "colored point overlay",
       "result_record_count": 20,
-      "scored_task_count": 10,
-      "covered_task_count": 10,
       "proxy_scored_task_count": 0,
-      "scoreless_task_count": 10,
       "unsupported_task_count": 0,
-      "not_evaluated_task_count": 10,
       "status_counts": {
-        "not_evaluated_in_verified_package": 10,
-        "scored": 10
       },
-      "coverage_fraction": 0.5,
       "result_record_fraction": 1.0
     },
     {
@@ -1958,17 +1958,17 @@
       "task_label": "Temporal Order Verification",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "f1",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 11,
@@ -2120,17 +2120,17 @@
       "task_label": "Multimodal Synchronization Detection",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "f1",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 12,
@@ -3416,17 +3416,17 @@
       "task_label": "Time-to-Next-Transition Regression",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "mae",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 20,

 {
   "title": "Task Method 20-Result Matrix",
   "status": "pass",
+  "generated_at_utc": "2026-06-17T21:17:41+00:00",
   "task_count": 20,
   "method_count": 9,
   "method_task_record_count": 180,
+  "scored_method_task_count": 119,
   "series": [
     {
       "id": "minimal",
       "method_detail": "Verified held-out Qwen3-Omni v6 LoRA metrics, plus task 16 and any completed private-GPU future-task probes scored from task-specific JSON.",
       "plotted_as": "colored point overlay",
       "result_record_count": 20,
+      "scored_task_count": 13,
+      "covered_task_count": 13,
       "proxy_scored_task_count": 0,
+      "scoreless_task_count": 7,
       "unsupported_task_count": 0,
+      "not_evaluated_task_count": 7,
       "status_counts": {
+        "not_evaluated_in_verified_package": 7,
+        "scored": 13
       },
+      "coverage_fraction": 0.65,
       "result_record_fraction": 1.0
     },
     {
       "task_label": "Temporal Order Verification",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 0.40984631701404173,
+      "raw_text": "0.4098",
+      "normalized_score": 0.40984631701404173,
+      "metric_key": "temporal_order_f1",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/temporal_order/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 11,
       "task_label": "Multimodal Synchronization Detection",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 0.3344936184319576,
+      "raw_text": "0.3345",
+      "normalized_score": 0.3344936184319576,
+      "metric_key": "misalignment_detection_f1",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/misalignment_detection/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 12,
       "task_label": "Time-to-Next-Transition Regression",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 134.0687422166874,
+      "raw_text": "134.07",
+      "normalized_score": 0.07859666766782253,
+      "metric_key": "time_to_transition_mae",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/time_to_transition/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 20,
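
For the Qwen3-Omni v6 LoRA series, the three new probe results (temporal order, misalignment detection, time-to-transition) change the counts from 10 scored / 10 not evaluated to 13 / 7. Over the same change, `coverage_fraction` moves from 0.5 to 0.65, which equals scored tasks divided by the 20 result records. The sketch below recomputes these fields from a series' `status_counts`. Defining `coverage_fraction` as `scored / result_record_count` is inferred from these numbers and is not a documented formula. The helper name is hypothetical, and the sketch assumes no proxy-scored records, which holds for this series.

```python
def series_counters(status_counts: dict, result_record_count: int) -> dict:
    """Derive per-series counters from status_counts (inferred definitions)."""
    # Every result record should carry exactly one status.
    if sum(status_counts.values()) != result_record_count:
        raise ValueError("status_counts do not cover every result record")
    scored = status_counts.get("scored", 0)
    return {
        "scored_task_count": scored,
        # Assumes no proxy-scored records in this series.
        "scoreless_task_count": result_record_count - scored,
        "not_evaluated_task_count": status_counts.get("not_evaluated_in_verified_package", 0),
        "coverage_fraction": scored / result_record_count,
    }


before = series_counters({"not_evaluated_in_verified_package": 10, "scored": 10}, 20)
after = series_counters({"not_evaluated_in_verified_package": 7, "scored": 13}, 20)
print(before["coverage_fraction"], after["coverage_fraction"])  # 0.5 0.65
```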

data/task_surface_integrity.json CHANGED

@@ -1,6 +1,6 @@
 {
   "status": "pass",
-  "generated_at_utc": "2026-06-17T20:46:02+00:00",
   "summary": {
     "task_count": 12,
     "expected_task_count": 12,

 {
   "status": "pass",
+  "generated_at_utc": "2026-06-17T21:25:26+00:00",
   "summary": {
     "task_count": 12,
     "expected_task_count": 12,
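
Apart from the score changes, many hunks in this commit only refresh `generated_at_utc`. The gap audit declares `docs/data/task_method_20_result_matrix.json` as its `source_matrix`, so one simple sanity check is that the audit's timestamp is not older than the matrix's. Below is a minimal sketch using the post-commit timestamps from this diff. The helper `is_fresh` is hypothetical.

```python
from datetime import datetime


def is_fresh(derived_ts: str, source_ts: str) -> bool:
    """True if a derived artifact was generated no earlier than its source."""
    # datetime.fromisoformat parses the "+00:00" offset used in generated_at_utc.
    return datetime.fromisoformat(derived_ts) >= datetime.fromisoformat(source_ts)


matrix_ts = "2026-06-17T21:17:41+00:00"     # task_method_20_result_matrix.json
gap_audit_ts = "2026-06-17T21:17:51+00:00"  # task_method_20_gap_audit.json
print(is_fresh(gap_audit_ts, matrix_ts))  # True
```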

data/unified_task_model_radar.json CHANGED

@@ -1,11 +1,11 @@
 {
   "title": "Unified 20-Task Model Radar",
   "status": "pass",
-  "generated_at_utc": "2026-06-17T13:55:02+00:00",
   "task_count": 20,
   "method_count": 9,
   "method_task_record_count": 180,
-  "scored_method_task_count": 116,
   "normalization_policy": {
     "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
     "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
@@ -170,17 +170,17 @@
       "method_detail": "Verified held-out Qwen3-Omni v6 LoRA metrics, plus task 16 and any completed private-GPU future-task probes scored from task-specific JSON.",
       "plotted_as": "colored point overlay",
       "result_record_count": 20,
-      "scored_task_count": 10,
-      "covered_task_count": 10,
       "proxy_scored_task_count": 0,
-      "scoreless_task_count": 10,
       "unsupported_task_count": 0,
-      "not_evaluated_task_count": 10,
       "status_counts": {
-        "not_evaluated_in_verified_package": 10,
-        "scored": 10
       },
-      "coverage_fraction": 0.5,
       "result_record_fraction": 1.0
     },
     {
@@ -1375,6 +1375,17 @@
           "raw_text": "0.8520",
           "status_label": "scored"
         },
         "metadata128_simple": {
           "raw": 0.4198864140782312,
           "metric_key": "f1",
@@ -1419,17 +1430,6 @@
           "raw_text": "n/a",
           "status_label": "not supported"
         },
-        "qwen3_omni_v6_lora": {
-          "raw": null,
-          "metric_key": "f1",
-          "source": null,
-          "scope": "multi_episode_128_partial_model_overlay",
-          "status": "not_evaluated_in_verified_package",
-          "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-          "normalized_score": null,
-          "raw_text": "n/a",
-          "status_label": "not evaluated"
-        },
         "cosmos3_super_reasoner": {
           "raw": null,
           "metric_key": "f1",
@@ -1486,6 +1486,17 @@
           "raw_text": "0.7153",
           "status_label": "scored"
         },
         "metadata128_simple": {
           "raw": null,
           "metric_key": "f1",
@@ -1530,17 +1541,6 @@
           "raw_text": "n/a",
           "status_label": "not supported"
         },
-        "qwen3_omni_v6_lora": {
-          "raw": null,
-          "metric_key": "f1",
-          "source": null,
-          "scope": "multi_episode_128_partial_model_overlay",
-          "status": "not_evaluated_in_verified_package",
-          "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-          "normalized_score": null,
-          "raw_text": "n/a",
-          "status_label": "not evaluated"
-        },
         "cosmos3_super_reasoner": {
           "raw": null,
           "metric_key": "f1",
@@ -2374,6 +2374,17 @@
           "raw_text": "10.55",
           "status_label": "scored"
         },
         "raw128_simple": {
           "raw": 52.32759475708008,
           "metric_key": "mae",
@@ -2418,17 +2429,6 @@
           "raw_text": "n/a",
           "status_label": "not supported"
         },
-        "qwen3_omni_v6_lora": {
-          "raw": null,
-          "metric_key": "mae",
-          "source": null,
-          "scope": "multi_episode_128_partial_model_overlay",
-          "status": "not_evaluated_in_verified_package",
-          "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-          "normalized_score": null,
-          "raw_text": "n/a",
-          "status_label": "not evaluated"
-        },
         "cosmos3_super_reasoner": {
           "raw": null,
           "metric_key": "mae",
@@ -2492,7 +2492,7 @@
       "title": "Qwen3-Omni v6 LoRA",
       "status": "verified",
       "task_aligned_axes": "Qwen3",
-      "coverage": "20 records / 10 scored task-aligned axes",
       "headline": "JSON validity 0.9990; action macro-F1 0.0029",
       "source": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_multiscale_cap96_v6_rank64_lr5e5_full8gpu_lora_eval_test_full/eval/metrics.json"
     },
@@ -4256,17 +4256,17 @@
       "task_label": "Temporal Order Verification",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "f1",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 11,
@@ -4418,17 +4418,17 @@
       "task_label": "Multimodal Synchronization Detection",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "f1",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 12,
@@ -5714,17 +5714,17 @@
       "task_label": "Time-to-Next-Transition Regression",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "mae",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 20,

 {
   "title": "Unified 20-Task Model Radar",
   "status": "pass",
+  "generated_at_utc": "2026-06-17T21:17:41+00:00",
   "task_count": 20,
   "method_count": 9,
   "method_task_record_count": 180,
+  "scored_method_task_count": 119,
   "normalization_policy": {
     "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
     "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
       "method_detail": "Verified held-out Qwen3-Omni v6 LoRA metrics, plus task 16 and any completed private-GPU future-task probes scored from task-specific JSON.",
       "plotted_as": "colored point overlay",
       "result_record_count": 20,
+      "scored_task_count": 13,
+      "covered_task_count": 13,
       "proxy_scored_task_count": 0,
+      "scoreless_task_count": 7,
       "unsupported_task_count": 0,
+      "not_evaluated_task_count": 7,
       "status_counts": {
+        "not_evaluated_in_verified_package": 7,
+        "scored": 13
       },
+      "coverage_fraction": 0.65,
       "result_record_fraction": 1.0
     },
     {
           "raw_text": "0.8520",
           "status_label": "scored"
         },
+        "qwen3_omni_v6_lora": {
+          "raw": 0.40984631701404173,
+          "metric_key": "temporal_order_f1",
+          "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/temporal_order/metrics.json",
+          "scope": "multi_episode_128_partial_model_overlay",
+          "status": "scored",
+          "reason": null,
+          "normalized_score": 0.40984631701404173,
+          "raw_text": "0.4098",
+          "status_label": "scored"
+        },
         "metadata128_simple": {
           "raw": 0.4198864140782312,
           "metric_key": "f1",
           "raw_text": "n/a",
           "status_label": "not supported"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
           "metric_key": "f1",
           "raw_text": "0.7153",
           "status_label": "scored"
         },
+        "qwen3_omni_v6_lora": {
+          "raw": 0.3344936184319576,
+          "metric_key": "misalignment_detection_f1",
+          "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/misalignment_detection/metrics.json",
+          "scope": "multi_episode_128_partial_model_overlay",
+          "status": "scored",
+          "reason": null,
+          "normalized_score": 0.3344936184319576,
+          "raw_text": "0.3345",
+          "status_label": "scored"
+        },
         "metadata128_simple": {
           "raw": null,
           "metric_key": "f1",
           "raw_text": "n/a",
           "status_label": "not supported"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
           "metric_key": "f1",
           "raw_text": "10.55",
           "status_label": "scored"
         },
+        "qwen3_omni_v6_lora": {
+          "raw": 134.0687422166874,
+          "metric_key": "time_to_transition_mae",
+          "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/time_to_transition/metrics.json",
+          "scope": "multi_episode_128_partial_model_overlay",
+          "status": "scored",
+          "reason": null,
+          "normalized_score": 0.07859666766782253,
+          "raw_text": "134.07",
+          "status_label": "scored"
+        },
         "raw128_simple": {
           "raw": 52.32759475708008,
           "metric_key": "mae",
           "raw_text": "n/a",
           "status_label": "not supported"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
           "metric_key": "mae",
       "title": "Qwen3-Omni v6 LoRA",
       "status": "verified",
       "task_aligned_axes": "Qwen3",
+      "coverage": "20 records / 13 scored task-aligned axes",
       "headline": "JSON validity 0.9990; action macro-F1 0.0029",
       "source": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_multiscale_cap96_v6_rank64_lr5e5_full8gpu_lora_eval_test_full/eval/metrics.json"
     },
       "task_label": "Temporal Order Verification",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 0.40984631701404173,
+      "raw_text": "0.4098",
+      "normalized_score": 0.40984631701404173,
+      "metric_key": "temporal_order_f1",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/temporal_order/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 11,
       "task_label": "Multimodal Synchronization Detection",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 0.3344936184319576,
+      "raw_text": "0.3345",
+      "normalized_score": 0.3344936184319576,
+      "metric_key": "misalignment_detection_f1",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/misalignment_detection/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 12,
       "task_label": "Time-to-Next-Transition Regression",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 134.0687422166874,
+      "raw_text": "134.07",
+      "normalized_score": 0.07859666766782253,
+      "metric_key": "time_to_transition_mae",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/time_to_transition/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 20,
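
The radar's `normalization_policy` treats the two metric directions differently. Bounded higher-is-better metrics are clipped to [0, 1], and lower-is-better errors are converted to `best_observed_value / raw_value` within the same task. Under this policy, Qwen3-Omni's temporal-order F1 of 0.4098 is plotted unchanged, while its time-to-transition MAE of 134.07 maps to about 0.0786. The sketch below applies the policy to one task's raw scores. The function name is hypothetical. Taking the best observed value as the smallest scored error is an assumption consistent with the policy text. Scoreless cells stay `None`.

```python
def normalize_task(raw_scores: dict, higher_is_better: bool) -> dict:
    """Map raw per-method scores for one task onto a 0-1 radar axis."""
    present = [v for v in raw_scores.values() if v is not None]
    if higher_is_better:
        # Bounded metrics (e.g. F1) are plotted directly after clipping to [0, 1].
        return {k: None if v is None else min(max(v, 0.0), 1.0)
                for k, v in raw_scores.items()}
    if not present:
        return {k: None for k in raw_scores}
    if any(v <= 0 for v in present):
        raise ValueError("ratio normalization needs strictly positive errors")
    # Lower-is-better errors (e.g. MAE): assumed best = smallest observed error,
    # which scores 1.0; every other method scores best / raw.
    best = min(present)
    return {k: None if v is None else best / v for k, v in raw_scores.items()}


print(normalize_task({"a": 0.41, "b": 1.2, "c": None}, True))  # {'a': 0.41, 'b': 1.0, 'c': None}
print(normalize_task({"a": 10.0, "b": 40.0}, False))  # {'a': 1.0, 'b': 0.25}
```

Because the error ratio depends on the best method in the task, adding a new, stronger result can lower every other method's plotted value on that axis without changing their raw scores.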

data/website_integrity.json CHANGED

@@ -1,6 +1,6 @@
 {
   "status": "pass",
-  "generated_at_utc": "2026-06-17T21:12:34+00:00",
   "docs_root": "docs",
   "site_base": "/ropedia-xperience-10m-task-suite/",
   "summary": {
@@ -316,7 +316,7 @@
     },
     {
       "path": "data/episode128_task_model_radar.json",
-      "bytes": 187388,
       "top_level_type": "dict"
     },
     {
@@ -486,12 +486,12 @@
     },
     {
       "path": "data/task_method_20_gap_audit.json",
-      "bytes": 55745,
       "top_level_type": "dict"
     },
     {
       "path": "data/task_method_20_result_matrix.json",
-      "bytes": 129749,
       "top_level_type": "dict"
     },
     {
@@ -526,7 +526,7 @@
     },
     {
       "path": "data/unified_task_model_radar.json",
-      "bytes": 231240,
       "top_level_type": "dict"
     },
     {
@@ -566,7 +566,7 @@
     {
       "path": "assets/charts/episode128_task_model_radar.svg",
       "exists": true,
-      "bytes": 44044,
       "format": "SVG",
       "has_viewbox": true
     },
@@ -636,7 +636,7 @@
     {
       "path": "assets/charts/unified_task_model_radar.svg",
       "exists": true,
-      "bytes": 50060,
       "format": "SVG",
       "has_viewbox": true
     },

 {
   "status": "pass",
+  "generated_at_utc": "2026-06-17T21:25:27+00:00",
   "docs_root": "docs",
   "site_base": "/ropedia-xperience-10m-task-suite/",
   "summary": {
     },
     {
       "path": "data/episode128_task_model_radar.json",
+      "bytes": 187309,
       "top_level_type": "dict"
     },
     {
     },
     {
       "path": "data/task_method_20_gap_audit.json",
+      "bytes": 53574,
       "top_level_type": "dict"
     },
     {
       "path": "data/task_method_20_result_matrix.json",
+      "bytes": 129707,
       "top_level_type": "dict"
     },
     {
     },
     {
       "path": "data/unified_task_model_radar.json",
+      "bytes": 231161,
       "top_level_type": "dict"
     },
     {
     {
       "path": "assets/charts/episode128_task_model_radar.svg",
       "exists": true,
+      "bytes": 44378,
       "format": "SVG",
       "has_viewbox": true
     },
     {
       "path": "assets/charts/unified_task_model_radar.svg",
       "exists": true,
+      "bytes": 50394,
       "format": "SVG",
       "has_viewbox": true
     },
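
`website_integrity.json` records each published artifact's size in bytes along with a light structural check. Data files get their top-level JSON type, and charts get a flag for whether the SVG declares a `viewBox`. In this commit, both regenerated radar SVGs grew by 334 bytes (44044 to 44378 and 50060 to 50394). Below is a hedged sketch of how one such record could be computed from a file's bytes. The function and its exact checks are assumptions and do not reproduce the repository's actual integrity script.

```python
import json


def integrity_record(rel_path: str, data: bytes) -> dict:
    """Build an entry shaped like those in website_integrity.json (assumed logic)."""
    if rel_path.endswith(".json"):
        # Parsing proves the file is valid JSON; record the top-level type name.
        parsed = json.loads(data.decode("utf-8"))
        return {"path": rel_path, "bytes": len(data), "top_level_type": type(parsed).__name__}
    if rel_path.endswith(".svg"):
        text = data.decode("utf-8", errors="replace")
        return {
            "path": rel_path,
            "exists": True,  # trivially true here, since the bytes were already read
            "bytes": len(data),
            "format": "SVG",
            # Assumed check: a plain substring test for the viewBox attribute.
            "has_viewbox": "viewBox" in text,
        }
    raise ValueError(f"unsupported artifact type: {rel_path}")


svg = b'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10"></svg>'
print(integrity_record("assets/charts/example.svg", svg)["has_viewbox"])  # True
print(integrity_record("data/example.json", b'{"status": "pass"}')["top_level_type"])  # dict
```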

docs/data/episode128_task_model_radar.json CHANGED

@@ -1,12 +1,12 @@
 {
   "title": "128-Episode 20-Task Radar",
   "status": "pass",
-  "generated_at_utc": "2026-06-17T13:55:02+00:00",
   "description": "Selected 128-episode metadata/raw baselines plus verified Qwen3/Cosmos branches. Every method has 20 records; numeric scores appear only where the public artifact produced that task target.",
   "task_count": 20,
   "method_count": 7,
   "method_task_record_count": 140,
-  "scored_method_task_count": 76,
   "normalization_policy": {
     "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
     "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
@@ -127,17 +127,17 @@
       "method_detail": "Verified held-out Qwen3-Omni v6 LoRA metrics, plus task 16 and any completed private-GPU future-task probes scored from task-specific JSON.",
       "plotted_as": "colored point overlay",
       "result_record_count": 20,
-      "scored_task_count": 10,
-      "covered_task_count": 10,
       "proxy_scored_task_count": 0,
-      "scoreless_task_count": 10,
       "unsupported_task_count": 0,
-      "not_evaluated_task_count": 10,
       "status_counts": {
-        "not_evaluated_in_verified_package": 10,
-        "scored": 10
       },
-      "coverage_fraction": 0.5,
       "result_record_fraction": 1.0
     },
     {
@@ -1157,15 +1157,15 @@
           "status_label": "scored"
         },
         "qwen3_omni_v6_lora": {
-          "raw": null,
-          "metric_key": "f1",
-          "source": null,
           "scope": "multi_episode_128_partial_model_overlay",
-          "status": "not_evaluated_in_verified_package",
-          "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-          "normalized_score": null,
-          "raw_text": "n/a",
-          "status_label": "not evaluated"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
@@ -1248,15 +1248,15 @@
           "status_label": "scored"
         },
         "qwen3_omni_v6_lora": {
-          "raw": null,
-          "metric_key": "f1",
-          "source": null,
           "scope": "multi_episode_128_partial_model_overlay",
-          "status": "not_evaluated_in_verified_package",
-          "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-          "normalized_score": null,
-          "raw_text": "n/a",
-          "status_label": "not evaluated"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
@@ -1976,15 +1976,15 @@
           "status_label": "scored"
         },
         "qwen3_omni_v6_lora": {
-          "raw": null,
-          "metric_key": "mae",
-          "source": null,
           "scope": "multi_episode_128_partial_model_overlay",
-          "status": "not_evaluated_in_verified_package",
-          "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-          "normalized_score": null,
-          "raw_text": "n/a",
-          "status_label": "not evaluated"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
@@ -3350,17 +3350,17 @@
       "task_label": "Temporal Order Verification",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "f1",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 11,
@@ -3476,17 +3476,17 @@
       "task_label": "Multimodal Synchronization Detection",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "f1",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 12,
@@ -4484,17 +4484,17 @@
       "task_label": "Time-to-Next-Transition Regression",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "mae",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 20,

 {
   "title": "128-Episode 20-Task Radar",
   "status": "pass",
+  "generated_at_utc": "2026-06-17T21:17:41+00:00",
   "description": "Selected 128-episode metadata/raw baselines plus verified Qwen3/Cosmos branches. Every method has 20 records; numeric scores appear only where the public artifact produced that task target.",
   "task_count": 20,
   "method_count": 7,
   "method_task_record_count": 140,
+  "scored_method_task_count": 79,
   "normalization_policy": {
     "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
     "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
       "method_detail": "Verified held-out Qwen3-Omni v6 LoRA metrics, plus task 16 and any completed private-GPU future-task probes scored from task-specific JSON.",
       "plotted_as": "colored point overlay",
       "result_record_count": 20,
+      "scored_task_count": 13,
+      "covered_task_count": 13,
       "proxy_scored_task_count": 0,
+      "scoreless_task_count": 7,
       "unsupported_task_count": 0,
+      "not_evaluated_task_count": 7,
       "status_counts": {
+        "not_evaluated_in_verified_package": 7,
+        "scored": 13
       },
+      "coverage_fraction": 0.65,
       "result_record_fraction": 1.0
     },
     {
           "status_label": "scored"
         },
         "qwen3_omni_v6_lora": {
+          "raw": 0.40984631701404173,
+          "metric_key": "temporal_order_f1",
+          "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/temporal_order/metrics.json",
           "scope": "multi_episode_128_partial_model_overlay",
+          "status": "scored",
+          "reason": null,
+          "normalized_score": 0.40984631701404173,
+          "raw_text": "0.4098",
+          "status_label": "scored"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
           "status_label": "scored"
         },
         "qwen3_omni_v6_lora": {
+          "raw": 0.3344936184319576,
+          "metric_key": "misalignment_detection_f1",
+          "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/misalignment_detection/metrics.json",
           "scope": "multi_episode_128_partial_model_overlay",
+          "status": "scored",
+          "reason": null,
+          "normalized_score": 0.3344936184319576,
+          "raw_text": "0.3345",
+          "status_label": "scored"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
           "status_label": "scored"
         },
         "qwen3_omni_v6_lora": {
+          "raw": 134.0687422166874,
+          "metric_key": "time_to_transition_mae",
+          "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/time_to_transition/metrics.json",
           "scope": "multi_episode_128_partial_model_overlay",
+          "status": "scored",
+          "reason": null,
+          "normalized_score": 0.07859666766782253,
+          "raw_text": "134.07",
+          "status_label": "scored"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
       "task_label": "Temporal Order Verification",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 0.40984631701404173,
+      "raw_text": "0.4098",
+      "normalized_score": 0.40984631701404173,
+      "metric_key": "temporal_order_f1",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/temporal_order/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 11,
       "task_label": "Multimodal Synchronization Detection",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 0.3344936184319576,
+      "raw_text": "0.3345",
+      "normalized_score": 0.3344936184319576,
+      "metric_key": "misalignment_detection_f1",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/misalignment_detection/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 12,
       "task_label": "Time-to-Next-Transition Regression",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 134.0687422166874,
+      "raw_text": "134.07",
+      "normalized_score": 0.07859666766782253,
+      "metric_key": "time_to_transition_mae",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/time_to_transition/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 20,
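
The radar files state their `normalization_policy` in prose: bounded metrics are clipped to [0, 1], and lower-error metrics become `best_observed_value / raw_value` within a task. The following is a minimal Python sketch of that stated rule. It is not the repository's generator script, and the function name `normalize_score` is hypothetical. Under this rule, an MAE such as the 134.07 recorded for Qwen3-Omni v6 LoRA on task 20 maps to a small axis value unless it is the best MAE in that task.

```python
from typing import Optional


def normalize_score(raw: float, higher_is_better: bool,
                    best_observed: Optional[float] = None) -> float:
    """Map a raw task metric onto a 0-1 radar axis (hypothetical helper).

    Sketch of the prose policy in normalization_policy, not the
    repository's actual implementation.
    """
    if higher_is_better:
        # Bounded metrics (e.g. F1) are plotted directly after clipping to [0, 1].
        return min(max(raw, 0.0), 1.0)
    # Lower-error metrics (e.g. MAE) become best_observed / raw, so the best
    # method in the task lands on 1.0 and larger errors shrink toward 0.
    if best_observed is None:
        raise ValueError("lower-is-better metrics need the best observed value in the task")
    if raw <= 0:
        raise ValueError("raw error must be positive to form a ratio")
    return best_observed / raw


# Usage: the temporal-order F1 from the diff passes through unchanged.
print(normalize_score(0.40984631701404173, higher_is_better=True))
```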

docs/data/publication_audit.json CHANGED

@@ -1,6 +1,6 @@
 {
   "status": "pass",
-  "generated_at_utc": "2026-06-17T21:12:50+00:00",
   "checks": [
     {
       "name": "required_publication_assets_present",
@@ -206,8 +206,8 @@
     "github_repo": {
       "root": "repo",
       "exists": true,
-      "file_count": 1232,
-      "text_file_count": 1034,
       "largest_file": {
         "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
         "bytes": 55702978

 {
   "status": "pass",
+  "generated_at_utc": "2026-06-17T21:25:35+00:00",
   "checks": [
     {
       "name": "required_publication_assets_present",
     "github_repo": {
       "root": "repo",
       "exists": true,
+      "file_count": 1250,
+      "text_file_count": 1052,
       "largest_file": {
         "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
         "bytes": 55702978

docs/data/single_episode_task_model_radar.json CHANGED

@@ -1,7 +1,7 @@
 {
   "title": "Single-Episode 20-Task Radar",
   "status": "pass",
-  "generated_at_utc": "2026-06-17T13:55:02+00:00",
   "description": "Minimal and Neural MLP baselines on the one public sample episode, both scored on all 20 task contracts.",
   "task_count": 20,
   "method_count": 2,

 {
   "title": "Single-Episode 20-Task Radar",
   "status": "pass",
+  "generated_at_utc": "2026-06-17T21:17:41+00:00",
   "description": "Minimal and Neural MLP baselines on the one public sample episode, both scored on all 20 task contracts.",
   "task_count": 20,
   "method_count": 2,

docs/data/task_method_20_gap_audit.json CHANGED

@@ -1,10 +1,10 @@
 {
-  "generated_at_utc": "2026-06-17T13:55:12+00:00",
   "immediate_actions": [
     {
       "artifact": "docs/data/task_method_20_gap_audit.json",
       "id": "gap_audit",
-      "purpose": "Keep the 64 scoreless cells visible and reproducible."
     },
     {
       "artifact": "scripts/omni/score_model_output_probes.py",
@@ -101,11 +101,11 @@
       "proxy_scored_task_count": 0,
       "result_record_count": 20,
       "scope": "128 selected episodes, held-out test",
-      "scored_task_count": 10,
-      "scoreless_task_count": 10,
       "status_counts": {
-        "not_evaluated_in_verified_package": 10,
-        "scored": 10
       }
     },
     "raw128_neural_mlp": {
@@ -140,10 +140,10 @@
     "cosmos3_super_reasoner": 13,
     "metadata128_neural_mlp": 14,
     "metadata128_simple": 12,
-    "qwen3_omni_v6_lora": 10
   },
   "missing_by_status": {
-    "not_evaluated_in_verified_package": 38,
     "not_supported_by_metadata_only_package": 22,
     "unsupported_without_required_target": 4
   },
@@ -183,15 +183,13 @@
     "11 Temporal Order Verification": [
       "cosmos3_nano_future_window",
       "cosmos3_super_reasoner",
-      "metadata128_neural_mlp",
-      "qwen3_omni_v6_lora"
     ],
     "12 Multimodal Synchronization Detection": [
       "cosmos3_nano_future_window",
       "cosmos3_super_reasoner",
       "metadata128_neural_mlp",
-      "metadata128_simple",
-      "qwen3_omni_v6_lora"
     ],
     "13 Long-Horizon Next-Action Forecasting": [
       "cosmos3_nano_future_window",
@@ -241,8 +239,7 @@
       "cosmos3_nano_future_window",
       "cosmos3_super_reasoner",
       "metadata128_neural_mlp",
-      "metadata128_simple",
-      "qwen3_omni_v6_lora"
     ]
   },
   "missing_records": [
@@ -519,19 +516,6 @@
       "task_label": "Temporal Order Verification",
       "task_number": 11
     },
-    {
-      "method": "Qwen3-Omni v6 LoRA",
-      "metric_key": "f1",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-      "recommended_next_step": "Generate verified model outputs for this task contract and score them against the held-out labels.",
-      "scope": "multi_episode_128_partial_model_overlay",
-      "series_id": "qwen3_omni_v6_lora",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "task_id": "temporal_order",
-      "task_label": "Temporal Order Verification",
-      "task_number": 11
-    },
     {
       "method": "Cosmos3-Super Reasoner",
       "metric_key": "f1",
@@ -584,19 +568,6 @@
       "task_label": "Multimodal Synchronization Detection",
       "task_number": 12
     },
-    {
-      "method": "Qwen3-Omni v6 LoRA",
-      "metric_key": "f1",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-      "recommended_next_step": "Generate verified model outputs for this task contract and score them against the held-out labels.",
-      "scope": "multi_episode_128_partial_model_overlay",
-      "series_id": "qwen3_omni_v6_lora",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "task_id": "misalignment_detection",
-      "task_label": "Multimodal Synchronization Detection",
-      "task_number": 12
-    },
     {
       "method": "Cosmos3-Super Reasoner",
       "metric_key": "f1",
@@ -1039,19 +1010,6 @@
       "task_label": "Time-to-Next-Transition Regression",
       "task_number": 20
     },
-    {
-      "method": "Qwen3-Omni v6 LoRA",
-      "metric_key": "mae",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-      "recommended_next_step": "Generate verified model outputs for this task contract and score them against the held-out labels.",
-      "scope": "multi_episode_128_partial_model_overlay",
-      "series_id": "qwen3_omni_v6_lora",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "task_id": "time_to_transition",
-      "task_label": "Time-to-Next-Transition Regression",
-      "task_number": 20
-    },
     {
       "method": "Cosmos3-Super Reasoner",
       "metric_key": "mae",
@@ -1125,8 +1083,8 @@
     "method_count": 9,
     "method_task_record_count": 180,
     "proxy_scored_method_task_count": 4,
-    "scored_method_task_count": 116,
-    "scoreless_method_task_count": 64,
     "task_count": 20
   },
   "source_matrix": "docs/data/task_method_20_result_matrix.json",

 {
+  "generated_at_utc": "2026-06-17T21:17:51+00:00",
   "immediate_actions": [
     {
       "artifact": "docs/data/task_method_20_gap_audit.json",
       "id": "gap_audit",
+      "purpose": "Keep the 61 scoreless cells visible and reproducible."
     },
     {
       "artifact": "scripts/omni/score_model_output_probes.py",
       "proxy_scored_task_count": 0,
       "result_record_count": 20,
       "scope": "128 selected episodes, held-out test",
+      "scored_task_count": 13,
+      "scoreless_task_count": 7,
       "status_counts": {
+        "not_evaluated_in_verified_package": 7,
+        "scored": 13
       }
     },
     "raw128_neural_mlp": {
     "cosmos3_super_reasoner": 13,
     "metadata128_neural_mlp": 14,
     "metadata128_simple": 12,
+    "qwen3_omni_v6_lora": 7
   },
   "missing_by_status": {
+    "not_evaluated_in_verified_package": 35,
     "not_supported_by_metadata_only_package": 22,
     "unsupported_without_required_target": 4
   },
     "11 Temporal Order Verification": [
       "cosmos3_nano_future_window",
       "cosmos3_super_reasoner",
+      "metadata128_neural_mlp"
     ],
     "12 Multimodal Synchronization Detection": [
       "cosmos3_nano_future_window",
       "cosmos3_super_reasoner",
       "metadata128_neural_mlp",
+      "metadata128_simple"
     ],
     "13 Long-Horizon Next-Action Forecasting": [
       "cosmos3_nano_future_window",
       "cosmos3_nano_future_window",
       "cosmos3_super_reasoner",
       "metadata128_neural_mlp",
+      "metadata128_simple"
     ]
   },
   "missing_records": [
       "task_label": "Temporal Order Verification",
       "task_number": 11
     },
     {
       "method": "Cosmos3-Super Reasoner",
       "metric_key": "f1",
       "task_label": "Multimodal Synchronization Detection",
       "task_number": 12
     },
     {
       "method": "Cosmos3-Super Reasoner",
       "metric_key": "f1",
       "task_label": "Time-to-Next-Transition Regression",
       "task_number": 20
     },
     {
       "method": "Cosmos3-Super Reasoner",
       "metric_key": "mae",
     "method_count": 9,
     "method_task_record_count": 180,
     "proxy_scored_method_task_count": 4,
+    "scored_method_task_count": 119,
+    "scoreless_method_task_count": 61,
     "task_count": 20
   },
   "source_matrix": "docs/data/task_method_20_result_matrix.json",
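
The regenerated gap audit should stay internally consistent. Scored plus scoreless cells should equal the 180 method-task records, and the per-status missing counts should add up to the scoreless total. Here the totals are 119 + 61 = 180 and 35 + 22 + 4 = 61; the old file's 116 + 64 and 38 + 22 + 4 also balance. The Python sketch below checks both invariants using counts copied from the diff. The function name `check_gap_ledger` is hypothetical, and the check is not part of the repository's own audit.

```python
from typing import Dict, List

# Counts copied from the regenerated docs/data/task_method_20_gap_audit.json.
counts = {
    "method_task_record_count": 180,
    "scored_method_task_count": 119,
    "scoreless_method_task_count": 61,
}
missing_by_status = {
    "not_evaluated_in_verified_package": 35,
    "not_supported_by_metadata_only_package": 22,
    "unsupported_without_required_target": 4,
}


def check_gap_ledger(counts: Dict[str, int], missing_by_status: Dict[str, int]) -> List[str]:
    """Return a list of invariant violations (empty when the ledger balances)."""
    problems = []
    total = counts["scored_method_task_count"] + counts["scoreless_method_task_count"]
    if total != counts["method_task_record_count"]:
        problems.append(f"scored + scoreless = {total}, expected {counts['method_task_record_count']}")
    missing = sum(missing_by_status.values())
    if missing != counts["scoreless_method_task_count"]:
        problems.append(f"missing_by_status sums to {missing}, expected {counts['scoreless_method_task_count']}")
    return problems


print(check_gap_ledger(counts, missing_by_status))
```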

docs/data/task_method_20_result_matrix.json CHANGED

@@ -1,11 +1,11 @@
 {
   "title": "Task Method 20-Result Matrix",
   "status": "pass",
-  "generated_at_utc": "2026-06-17T13:55:02+00:00",
   "task_count": 20,
   "method_count": 9,
   "method_task_record_count": 180,
-  "scored_method_task_count": 116,
   "series": [
     {
       "id": "minimal",
@@ -161,17 +161,17 @@
       "method_detail": "Verified held-out Qwen3-Omni v6 LoRA metrics, plus task 16 and any completed private-GPU future-task probes scored from task-specific JSON.",
       "plotted_as": "colored point overlay",
       "result_record_count": 20,
-      "scored_task_count": 10,
-      "covered_task_count": 10,
       "proxy_scored_task_count": 0,
-      "scoreless_task_count": 10,
       "unsupported_task_count": 0,
-      "not_evaluated_task_count": 10,
       "status_counts": {
-        "not_evaluated_in_verified_package": 10,
-        "scored": 10
       },
-      "coverage_fraction": 0.5,
       "result_record_fraction": 1.0
     },
     {
@@ -1958,17 +1958,17 @@
       "task_label": "Temporal Order Verification",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "f1",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 11,
@@ -2120,17 +2120,17 @@
       "task_label": "Multimodal Synchronization Detection",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "f1",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 12,
@@ -3416,17 +3416,17 @@
       "task_label": "Time-to-Next-Transition Regression",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "mae",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 20,

 {
   "title": "Task Method 20-Result Matrix",
   "status": "pass",
+  "generated_at_utc": "2026-06-17T21:17:41+00:00",
   "task_count": 20,
   "method_count": 9,
   "method_task_record_count": 180,
+  "scored_method_task_count": 119,
   "series": [
     {
       "id": "minimal",
       "method_detail": "Verified held-out Qwen3-Omni v6 LoRA metrics, plus task 16 and any completed private-GPU future-task probes scored from task-specific JSON.",
       "plotted_as": "colored point overlay",
       "result_record_count": 20,
+      "scored_task_count": 13,
+      "covered_task_count": 13,
       "proxy_scored_task_count": 0,
+      "scoreless_task_count": 7,
       "unsupported_task_count": 0,
+      "not_evaluated_task_count": 7,
       "status_counts": {
+        "not_evaluated_in_verified_package": 7,
+        "scored": 13
       },
+      "coverage_fraction": 0.65,
       "result_record_fraction": 1.0
     },
     {
       "task_label": "Temporal Order Verification",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 0.40984631701404173,
+      "raw_text": "0.4098",
+      "normalized_score": 0.40984631701404173,
+      "metric_key": "temporal_order_f1",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/temporal_order/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 11,
       "task_label": "Multimodal Synchronization Detection",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 0.3344936184319576,
+      "raw_text": "0.3345",
+      "normalized_score": 0.3344936184319576,
+      "metric_key": "misalignment_detection_f1",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/misalignment_detection/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 12,
       "task_label": "Time-to-Next-Transition Regression",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 134.0687422166874,
+      "raw_text": "134.07",
+      "normalized_score": 0.07859666766782253,
+      "metric_key": "time_to_transition_mae",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/time_to_transition/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 20,
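
Each series summary in the result matrix repeats the same arithmetic. Its status counts cover all 20 records, and `coverage_fraction` appears to be `covered_task_count / result_record_count`. For Qwen3-Omni v6 LoRA that gives 13 / 20 = 0.65, up from 10 / 20 = 0.5. This reading is inferred from the numbers shown, not from generator code. The Python sketch below re-derives those fields, and the helper name `check_series` is hypothetical.

```python
from typing import Any, Dict


def check_series(series: Dict[str, Any]) -> float:
    """Re-derive a series' coverage and verify its count fields agree (sketch)."""
    records = series["result_record_count"]
    if series["scored_task_count"] + series["scoreless_task_count"] != records:
        raise ValueError("scored + scoreless does not match result_record_count")
    if sum(series["status_counts"].values()) != records:
        raise ValueError("status_counts do not cover every record")
    # Assumed definition: covered tasks over all task records for the series.
    coverage = series["covered_task_count"] / records
    if abs(coverage - series["coverage_fraction"]) > 1e-9:
        raise ValueError(f"coverage_fraction {series['coverage_fraction']} != {coverage}")
    return coverage


# Qwen3-Omni v6 LoRA summary after this commit, copied from the diff.
qwen_new = {
    "result_record_count": 20,
    "scored_task_count": 13,
    "covered_task_count": 13,
    "scoreless_task_count": 7,
    "status_counts": {"not_evaluated_in_verified_package": 7, "scored": 13},
    "coverage_fraction": 0.65,
}
print(check_series(qwen_new))
```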

docs/data/task_surface_integrity.json CHANGED

@@ -1,6 +1,6 @@
 {
   "status": "pass",
-  "generated_at_utc": "2026-06-17T20:46:02+00:00",
   "summary": {
     "task_count": 12,
     "expected_task_count": 12,

 {
   "status": "pass",
+  "generated_at_utc": "2026-06-17T21:25:26+00:00",
   "summary": {
     "task_count": 12,
     "expected_task_count": 12,

docs/data/unified_task_model_radar.json CHANGED

@@ -1,11 +1,11 @@
 {
   "title": "Unified 20-Task Model Radar",
   "status": "pass",
-  "generated_at_utc": "2026-06-17T13:55:02+00:00",
   "task_count": 20,
   "method_count": 9,
   "method_task_record_count": 180,
-  "scored_method_task_count": 116,
   "normalization_policy": {
     "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
     "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
@@ -170,17 +170,17 @@
       "method_detail": "Verified held-out Qwen3-Omni v6 LoRA metrics, plus task 16 and any completed private-GPU future-task probes scored from task-specific JSON.",
       "plotted_as": "colored point overlay",
       "result_record_count": 20,
-      "scored_task_count": 10,
-      "covered_task_count": 10,
       "proxy_scored_task_count": 0,
-      "scoreless_task_count": 10,
       "unsupported_task_count": 0,
-      "not_evaluated_task_count": 10,
       "status_counts": {
-        "not_evaluated_in_verified_package": 10,
-        "scored": 10
       },
-      "coverage_fraction": 0.5,
       "result_record_fraction": 1.0
     },
     {
@@ -1375,6 +1375,17 @@
           "raw_text": "0.8520",
           "status_label": "scored"
         },
         "metadata128_simple": {
           "raw": 0.4198864140782312,
           "metric_key": "f1",
@@ -1419,17 +1430,6 @@
           "raw_text": "n/a",
           "status_label": "not supported"
         },
-        "qwen3_omni_v6_lora": {
-          "raw": null,
-          "metric_key": "f1",
-          "source": null,
-          "scope": "multi_episode_128_partial_model_overlay",
-          "status": "not_evaluated_in_verified_package",
-          "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-          "normalized_score": null,
-          "raw_text": "n/a",
-          "status_label": "not evaluated"
-        },
         "cosmos3_super_reasoner": {
           "raw": null,
           "metric_key": "f1",
@@ -1486,6 +1486,17 @@
           "raw_text": "0.7153",
           "status_label": "scored"
         },
         "metadata128_simple": {
           "raw": null,
           "metric_key": "f1",
@@ -1530,17 +1541,6 @@
           "raw_text": "n/a",
           "status_label": "not supported"
         },
-        "qwen3_omni_v6_lora": {
-          "raw": null,
-          "metric_key": "f1",
-          "source": null,
-          "scope": "multi_episode_128_partial_model_overlay",
-          "status": "not_evaluated_in_verified_package",
-          "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-          "normalized_score": null,
-          "raw_text": "n/a",
-          "status_label": "not evaluated"
-        },
         "cosmos3_super_reasoner": {
           "raw": null,
           "metric_key": "f1",
@@ -2374,6 +2374,17 @@
           "raw_text": "10.55",
           "status_label": "scored"
         },
         "raw128_simple": {
           "raw": 52.32759475708008,
           "metric_key": "mae",
@@ -2418,17 +2429,6 @@
           "raw_text": "n/a",
           "status_label": "not supported"
         },
-        "qwen3_omni_v6_lora": {
-          "raw": null,
-          "metric_key": "mae",
-          "source": null,
-          "scope": "multi_episode_128_partial_model_overlay",
-          "status": "not_evaluated_in_verified_package",
-          "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-          "normalized_score": null,
-          "raw_text": "n/a",
-          "status_label": "not evaluated"
-        },
         "cosmos3_super_reasoner": {
           "raw": null,
           "metric_key": "mae",
@@ -2492,7 +2492,7 @@
       "title": "Qwen3-Omni v6 LoRA",
       "status": "verified",
       "task_aligned_axes": "Qwen3",
-      "coverage": "20 records / 10 scored task-aligned axes",
       "headline": "JSON validity 0.9990; action macro-F1 0.0029",
       "source": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_multiscale_cap96_v6_rank64_lr5e5_full8gpu_lora_eval_test_full/eval/metrics.json"
     },
@@ -4256,17 +4256,17 @@
       "task_label": "Temporal Order Verification",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "f1",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 11,
@@ -4418,17 +4418,17 @@
       "task_label": "Multimodal Synchronization Detection",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "f1",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 12,
@@ -5714,17 +5714,17 @@
       "task_label": "Time-to-Next-Transition Regression",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "mae",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 20,

 {
   "title": "Unified 20-Task Model Radar",
   "status": "pass",
+  "generated_at_utc": "2026-06-17T21:17:41+00:00",
   "task_count": 20,
   "method_count": 9,
   "method_task_record_count": 180,
+  "scored_method_task_count": 119,
   "normalization_policy": {
     "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
     "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
       "method_detail": "Verified held-out Qwen3-Omni v6 LoRA metrics, plus task 16 and any completed private-GPU future-task probes scored from task-specific JSON.",
       "plotted_as": "colored point overlay",
       "result_record_count": 20,
+      "scored_task_count": 13,
+      "covered_task_count": 13,
       "proxy_scored_task_count": 0,
+      "scoreless_task_count": 7,
       "unsupported_task_count": 0,
+      "not_evaluated_task_count": 7,
       "status_counts": {
+        "not_evaluated_in_verified_package": 7,
+        "scored": 13
       },
+      "coverage_fraction": 0.65,
       "result_record_fraction": 1.0
     },
     {
           "raw_text": "0.8520",
           "status_label": "scored"
         },
+        "qwen3_omni_v6_lora": {
+          "raw": 0.40984631701404173,
+          "metric_key": "temporal_order_f1",
+          "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/temporal_order/metrics.json",
+          "scope": "multi_episode_128_partial_model_overlay",
+          "status": "scored",
+          "reason": null,
+          "normalized_score": 0.40984631701404173,
+          "raw_text": "0.4098",
+          "status_label": "scored"
+        },
         "metadata128_simple": {
           "raw": 0.4198864140782312,
           "metric_key": "f1",
           "raw_text": "n/a",
           "status_label": "not supported"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
           "metric_key": "f1",
           "raw_text": "0.7153",
           "status_label": "scored"
         },
+        "qwen3_omni_v6_lora": {
+          "raw": 0.3344936184319576,
+          "metric_key": "misalignment_detection_f1",
+          "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/misalignment_detection/metrics.json",
+          "scope": "multi_episode_128_partial_model_overlay",
+          "status": "scored",
+          "reason": null,
+          "normalized_score": 0.3344936184319576,
+          "raw_text": "0.3345",
+          "status_label": "scored"
+        },
         "metadata128_simple": {
           "raw": null,
           "metric_key": "f1",
           "raw_text": "n/a",
           "status_label": "not supported"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
           "metric_key": "f1",
           "raw_text": "10.55",
           "status_label": "scored"
         },
+        "qwen3_omni_v6_lora": {
+          "raw": 134.0687422166874,
+          "metric_key": "time_to_transition_mae",
+          "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/time_to_transition/metrics.json",
+          "scope": "multi_episode_128_partial_model_overlay",
+          "status": "scored",
+          "reason": null,
+          "normalized_score": 0.07859666766782253,
+          "raw_text": "134.07",
+          "status_label": "scored"
+        },
         "raw128_simple": {
           "raw": 52.32759475708008,
           "metric_key": "mae",
           "raw_text": "n/a",
           "status_label": "not supported"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
           "metric_key": "mae",
       "title": "Qwen3-Omni v6 LoRA",
       "status": "verified",
       "task_aligned_axes": "Qwen3",
+      "coverage": "20 records / 13 scored task-aligned axes",
       "headline": "JSON validity 0.9990; action macro-F1 0.0029",
       "source": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_multiscale_cap96_v6_rank64_lr5e5_full8gpu_lora_eval_test_full/eval/metrics.json"
     },
       "task_label": "Temporal Order Verification",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 0.40984631701404173,
+      "raw_text": "0.4098",
+      "normalized_score": 0.40984631701404173,
+      "metric_key": "temporal_order_f1",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/temporal_order/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 11,
       "task_label": "Multimodal Synchronization Detection",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 0.3344936184319576,
+      "raw_text": "0.3345",
+      "normalized_score": 0.3344936184319576,
+      "metric_key": "misalignment_detection_f1",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/misalignment_detection/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 12,
       "task_label": "Time-to-Next-Transition Regression",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 134.0687422166874,
+      "raw_text": "134.07",
+      "normalized_score": 0.07859666766782253,
+      "metric_key": "time_to_transition_mae",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/time_to_transition/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 20,

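In this commit the three Qwen3-Omni v6 LoRA records for tasks 11, 12 and 20 move from `not_evaluated_in_verified_package` to `scored`. Each one gains a `raw` value, a task-specific `metric_key`, a `source` path and a null `reason`. The gap audit states that a numeric score requires a real task target and source artifact, and that rule can be expressed as a per-record check. The following is a minimal sketch and not the repository's own validator. `record_problems` is a hypothetical name, and proxy-scored records, whose shape is not shown in this diff, are not handled specially.

```python
from typing import Any


def record_problems(rec: dict[str, Any]) -> list[str]:
    """Return rule violations for one method-task record (empty list = OK).

    Hypothetical helper; field names follow the records in this diff.
    Proxy-scored records are not treated specially.
    """
    problems: list[str] = []
    if rec.get("scored"):
        # A numeric score must point at a real metric and source artifact.
        for key in ("raw", "normalized_score", "metric_key", "source"):
            if rec.get(key) is None:
                problems.append(f"scored record missing {key}")
        if rec.get("reason") is not None:
            problems.append("scored record should have reason=null")
    else:
        # A scoreless cell stays visible with a reason but carries no number.
        if rec.get("raw") is not None or rec.get("normalized_score") is not None:
            problems.append("scoreless record carries a numeric score")
        if not rec.get("reason"):
            problems.append("scoreless record is missing a reason")
    return problems


new_rec = {
    "status": "scored",
    "scored": True,
    "raw": 0.40984631701404173,
    "normalized_score": 0.40984631701404173,
    "metric_key": "temporal_order_f1",
    "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/temporal_order/metrics.json",
    "reason": None,
}
print(record_problems(new_rec))  # []
```

Both the old `not_evaluated_in_verified_package` shape and the new `scored` shape shown in this diff pass the check; a scored record with a null `source` would not.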
docs/data/website_integrity.json CHANGED

@@ -1,6 +1,6 @@
 {
   "status": "pass",
-  "generated_at_utc": "2026-06-17T21:12:34+00:00",
   "docs_root": "docs",
   "site_base": "/ropedia-xperience-10m-task-suite/",
   "summary": {
@@ -316,7 +316,7 @@
     },
     {
       "path": "data/episode128_task_model_radar.json",
-      "bytes": 187388,
       "top_level_type": "dict"
     },
     {
@@ -486,12 +486,12 @@
     },
     {
       "path": "data/task_method_20_gap_audit.json",
-      "bytes": 55745,
       "top_level_type": "dict"
     },
     {
       "path": "data/task_method_20_result_matrix.json",
-      "bytes": 129749,
       "top_level_type": "dict"
     },
     {
@@ -526,7 +526,7 @@
     },
     {
       "path": "data/unified_task_model_radar.json",
-      "bytes": 231240,
       "top_level_type": "dict"
     },
     {
@@ -566,7 +566,7 @@
     {
       "path": "assets/charts/episode128_task_model_radar.svg",
       "exists": true,
-      "bytes": 44044,
       "format": "SVG",
       "has_viewbox": true
     },
@@ -636,7 +636,7 @@
     {
       "path": "assets/charts/unified_task_model_radar.svg",
       "exists": true,
-      "bytes": 50060,
       "format": "SVG",
       "has_viewbox": true
     },

 {
   "status": "pass",
+  "generated_at_utc": "2026-06-17T21:25:27+00:00",
   "docs_root": "docs",
   "site_base": "/ropedia-xperience-10m-task-suite/",
   "summary": {
     },
     {
       "path": "data/episode128_task_model_radar.json",
+      "bytes": 187309,
       "top_level_type": "dict"
     },
     {
     },
     {
       "path": "data/task_method_20_gap_audit.json",
+      "bytes": 53574,
       "top_level_type": "dict"
     },
     {
       "path": "data/task_method_20_result_matrix.json",
+      "bytes": 129707,
       "top_level_type": "dict"
     },
     {
     },
     {
       "path": "data/unified_task_model_radar.json",
+      "bytes": 231161,
       "top_level_type": "dict"
     },
     {
     {
       "path": "assets/charts/episode128_task_model_radar.svg",
       "exists": true,
+      "bytes": 44378,
       "format": "SVG",
       "has_viewbox": true
     },
     {
       "path": "assets/charts/unified_task_model_radar.svg",
       "exists": true,
+      "bytes": 50394,
       "format": "SVG",
       "has_viewbox": true
     },

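In the hunks shown, the two snapshots of `website_integrity.json` differ only in `generated_at_utc` and in the `bytes` recorded for regenerated files. A minimal sketch for listing those size deltas; it assumes each snapshot's file entries have already been collected into a list of `{"path": ..., "bytes": ...}` dicts (the enclosing key is elided in this view), and `byte_deltas` is a hypothetical helper.

```python
def byte_deltas(old_entries: list[dict], new_entries: list[dict]) -> dict[str, int]:
    """Map path -> (new bytes - old bytes) for files whose recorded size changed."""
    old = {e["path"]: e["bytes"] for e in old_entries if "bytes" in e}
    new = {e["path"]: e["bytes"] for e in new_entries if "bytes" in e}
    # Only paths present in both snapshots are compared; results are sorted by path.
    return {p: new[p] - old[p] for p in sorted(old.keys() & new.keys()) if new[p] != old[p]}


old = [
    {"path": "data/task_method_20_gap_audit.json", "bytes": 55745},
    {"path": "assets/charts/unified_task_model_radar.svg", "bytes": 50060},
]
new = [
    {"path": "data/task_method_20_gap_audit.json", "bytes": 53574},
    {"path": "assets/charts/unified_task_model_radar.svg", "bytes": 50394},
]
print(byte_deltas(old, new))
# {'assets/charts/unified_task_model_radar.svg': 334, 'data/task_method_20_gap_audit.json': -2171}
```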
metrics/episode128_task_model_radar.json CHANGED

@@ -1,12 +1,12 @@
 {
   "title": "128-Episode 20-Task Radar",
   "status": "pass",
-  "generated_at_utc": "2026-06-17T13:55:02+00:00",
   "description": "Selected 128-episode metadata/raw baselines plus verified Qwen3/Cosmos branches. Every method has 20 records; numeric scores appear only where the public artifact produced that task target.",
   "task_count": 20,
   "method_count": 7,
   "method_task_record_count": 140,
-  "scored_method_task_count": 76,
   "normalization_policy": {
     "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
     "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
@@ -127,17 +127,17 @@
       "method_detail": "Verified held-out Qwen3-Omni v6 LoRA metrics, plus task 16 and any completed private-GPU future-task probes scored from task-specific JSON.",
       "plotted_as": "colored point overlay",
       "result_record_count": 20,
-      "scored_task_count": 10,
-      "covered_task_count": 10,
       "proxy_scored_task_count": 0,
-      "scoreless_task_count": 10,
       "unsupported_task_count": 0,
-      "not_evaluated_task_count": 10,
       "status_counts": {
-        "not_evaluated_in_verified_package": 10,
-        "scored": 10
       },
-      "coverage_fraction": 0.5,
       "result_record_fraction": 1.0
     },
     {
@@ -1157,15 +1157,15 @@
           "status_label": "scored"
         },
         "qwen3_omni_v6_lora": {
-          "raw": null,
-          "metric_key": "f1",
-          "source": null,
           "scope": "multi_episode_128_partial_model_overlay",
-          "status": "not_evaluated_in_verified_package",
-          "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-          "normalized_score": null,
-          "raw_text": "n/a",
-          "status_label": "not evaluated"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
@@ -1248,15 +1248,15 @@
           "status_label": "scored"
         },
         "qwen3_omni_v6_lora": {
-          "raw": null,
-          "metric_key": "f1",
-          "source": null,
           "scope": "multi_episode_128_partial_model_overlay",
-          "status": "not_evaluated_in_verified_package",
-          "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-          "normalized_score": null,
-          "raw_text": "n/a",
-          "status_label": "not evaluated"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
@@ -1976,15 +1976,15 @@
           "status_label": "scored"
         },
         "qwen3_omni_v6_lora": {
-          "raw": null,
-          "metric_key": "mae",
-          "source": null,
           "scope": "multi_episode_128_partial_model_overlay",
-          "status": "not_evaluated_in_verified_package",
-          "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-          "normalized_score": null,
-          "raw_text": "n/a",
-          "status_label": "not evaluated"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
@@ -3350,17 +3350,17 @@
       "task_label": "Temporal Order Verification",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "f1",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 11,
@@ -3476,17 +3476,17 @@
       "task_label": "Multimodal Synchronization Detection",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "f1",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 12,
@@ -4484,17 +4484,17 @@
       "task_label": "Time-to-Next-Transition Regression",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "mae",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 20,

 {
   "title": "128-Episode 20-Task Radar",
   "status": "pass",
+  "generated_at_utc": "2026-06-17T21:17:41+00:00",
   "description": "Selected 128-episode metadata/raw baselines plus verified Qwen3/Cosmos branches. Every method has 20 records; numeric scores appear only where the public artifact produced that task target.",
   "task_count": 20,
   "method_count": 7,
   "method_task_record_count": 140,
+  "scored_method_task_count": 79,
   "normalization_policy": {
     "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
     "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
       "method_detail": "Verified held-out Qwen3-Omni v6 LoRA metrics, plus task 16 and any completed private-GPU future-task probes scored from task-specific JSON.",
       "plotted_as": "colored point overlay",
       "result_record_count": 20,
+      "scored_task_count": 13,
+      "covered_task_count": 13,
       "proxy_scored_task_count": 0,
+      "scoreless_task_count": 7,
       "unsupported_task_count": 0,
+      "not_evaluated_task_count": 7,
       "status_counts": {
+        "not_evaluated_in_verified_package": 7,
+        "scored": 13
       },
+      "coverage_fraction": 0.65,
       "result_record_fraction": 1.0
     },
     {
           "status_label": "scored"
         },
         "qwen3_omni_v6_lora": {
+          "raw": 0.40984631701404173,
+          "metric_key": "temporal_order_f1",
+          "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/temporal_order/metrics.json",
           "scope": "multi_episode_128_partial_model_overlay",
+          "status": "scored",
+          "reason": null,
+          "normalized_score": 0.40984631701404173,
+          "raw_text": "0.4098",
+          "status_label": "scored"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
           "status_label": "scored"
         },
         "qwen3_omni_v6_lora": {
+          "raw": 0.3344936184319576,
+          "metric_key": "misalignment_detection_f1",
+          "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/misalignment_detection/metrics.json",
           "scope": "multi_episode_128_partial_model_overlay",
+          "status": "scored",
+          "reason": null,
+          "normalized_score": 0.3344936184319576,
+          "raw_text": "0.3345",
+          "status_label": "scored"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
           "status_label": "scored"
         },
         "qwen3_omni_v6_lora": {
+          "raw": 134.0687422166874,
+          "metric_key": "time_to_transition_mae",
+          "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/time_to_transition/metrics.json",
           "scope": "multi_episode_128_partial_model_overlay",
+          "status": "scored",
+          "reason": null,
+          "normalized_score": 0.07859666766782253,
+          "raw_text": "134.07",
+          "status_label": "scored"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
       "task_label": "Temporal Order Verification",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 0.40984631701404173,
+      "raw_text": "0.4098",
+      "normalized_score": 0.40984631701404173,
+      "metric_key": "temporal_order_f1",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/temporal_order/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 11,
       "task_label": "Multimodal Synchronization Detection",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 0.3344936184319576,
+      "raw_text": "0.3345",
+      "normalized_score": 0.3344936184319576,
+      "metric_key": "misalignment_detection_f1",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/misalignment_detection/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 12,
       "task_label": "Time-to-Next-Transition Regression",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 134.0687422166874,
+      "raw_text": "134.07",
+      "normalized_score": 0.07859666766782253,
+      "metric_key": "time_to_transition_mae",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/time_to_transition/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 20,

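Both radar files state the normalization policy: bounded higher-is-better metrics are clipped to [0, 1], and lower-error metrics become `best_observed_value / raw_value` within the same task. As a worked check, the new Qwen3-Omni time-to-transition record has raw MAE 134.07 and a normalized score of about 0.0786. Under that policy, this implies a best observed MAE of roughly 0.0786 × 134.07 ≈ 10.54 on that task. The sketch below is minimal and assumes two things the diff does not show: that `best_observed_value` is the lowest raw error among the methods scored on the task, and that a zero error maps to 1.0. `normalize_task` is a hypothetical name; the repository's own code is not shown here.

```python
from typing import Optional, Sequence


def normalize_task(raws: Sequence[Optional[float]], higher_is_better: bool) -> list[Optional[float]]:
    """Map one task's raw metrics (one per method, None = scoreless) to 0-1 radar values."""
    if higher_is_better:
        # Bounded metrics such as F1 are plotted directly after clipping to [0, 1].
        return [None if r is None else min(max(r, 0.0), 1.0) for r in raws]
    observed = [r for r in raws if r is not None]
    if not observed:
        return [None] * len(raws)
    best = min(observed)  # assumption: best observed = lowest error in this task
    out: list[Optional[float]] = []
    for r in raws:
        if r is None:
            out.append(None)  # scoreless cells stay scoreless
        elif r == 0.0:
            out.append(1.0)  # assumption: a zero-error method gets the full score
        else:
            out.append(best / r)
    return out


print(normalize_task([0.4098, 1.2, -0.1, None], higher_is_better=True))
# [0.4098, 1.0, 0.0, None]
print(normalize_task([10.0, 20.0, None], higher_is_better=False))
# [1.0, 0.5, None]
```

Because the lower-is-better axis is relative to the best method on each task, a method's plotted value can change when another method's score is added, even if its own raw metric is unchanged.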
metrics/publication_audit.json CHANGED

@@ -1,6 +1,6 @@
 {
   "status": "pass",
-  "generated_at_utc": "2026-06-17T21:12:50+00:00",
   "checks": [
     {
       "name": "required_publication_assets_present",
@@ -206,8 +206,8 @@
     "github_repo": {
       "root": "repo",
       "exists": true,
-      "file_count": 1232,
-      "text_file_count": 1034,
       "largest_file": {
         "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
         "bytes": 55702978

 {
   "status": "pass",
+  "generated_at_utc": "2026-06-17T21:25:35+00:00",
   "checks": [
     {
       "name": "required_publication_assets_present",
     "github_repo": {
       "root": "repo",
       "exists": true,
+      "file_count": 1250,
+      "text_file_count": 1052,
       "largest_file": {
         "path": "results/episode_task_suite/modality_reconstruction/predictions.npz",
         "bytes": 55702978

metrics/single_episode_task_model_radar.json CHANGED

@@ -1,7 +1,7 @@
 {
   "title": "Single-Episode 20-Task Radar",
   "status": "pass",
-  "generated_at_utc": "2026-06-17T13:55:02+00:00",
   "description": "Minimal and Neural MLP baselines on the one public sample episode, both scored on all 20 task contracts.",
   "task_count": 20,
   "method_count": 2,

 {
   "title": "Single-Episode 20-Task Radar",
   "status": "pass",
+  "generated_at_utc": "2026-06-17T21:17:41+00:00",
   "description": "Minimal and Neural MLP baselines on the one public sample episode, both scored on all 20 task contracts.",
   "task_count": 20,
   "method_count": 2,

metrics/task_method_20_gap_audit.json CHANGED

@@ -1,10 +1,10 @@
 {
-  "generated_at_utc": "2026-06-17T13:55:12+00:00",
   "immediate_actions": [
     {
       "artifact": "docs/data/task_method_20_gap_audit.json",
       "id": "gap_audit",
-      "purpose": "Keep the 64 scoreless cells visible and reproducible."
     },
     {
       "artifact": "scripts/omni/score_model_output_probes.py",
@@ -101,11 +101,11 @@
       "proxy_scored_task_count": 0,
       "result_record_count": 20,
       "scope": "128 selected episodes, held-out test",
-      "scored_task_count": 10,
-      "scoreless_task_count": 10,
       "status_counts": {
-        "not_evaluated_in_verified_package": 10,
-        "scored": 10
       }
     },
     "raw128_neural_mlp": {
@@ -140,10 +140,10 @@
     "cosmos3_super_reasoner": 13,
     "metadata128_neural_mlp": 14,
     "metadata128_simple": 12,
-    "qwen3_omni_v6_lora": 10
   },
   "missing_by_status": {
-    "not_evaluated_in_verified_package": 38,
     "not_supported_by_metadata_only_package": 22,
     "unsupported_without_required_target": 4
   },
@@ -183,15 +183,13 @@
     "11 Temporal Order Verification": [
       "cosmos3_nano_future_window",
       "cosmos3_super_reasoner",
-      "metadata128_neural_mlp",
-      "qwen3_omni_v6_lora"
     ],
     "12 Multimodal Synchronization Detection": [
       "cosmos3_nano_future_window",
       "cosmos3_super_reasoner",
       "metadata128_neural_mlp",
-      "metadata128_simple",
-      "qwen3_omni_v6_lora"
     ],
     "13 Long-Horizon Next-Action Forecasting": [
       "cosmos3_nano_future_window",
@@ -241,8 +239,7 @@
       "cosmos3_nano_future_window",
       "cosmos3_super_reasoner",
       "metadata128_neural_mlp",
-      "metadata128_simple",
-      "qwen3_omni_v6_lora"
     ]
   },
   "missing_records": [
@@ -519,19 +516,6 @@
       "task_label": "Temporal Order Verification",
       "task_number": 11
     },
-    {
-      "method": "Qwen3-Omni v6 LoRA",
-      "metric_key": "f1",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-      "recommended_next_step": "Generate verified model outputs for this task contract and score them against the held-out labels.",
-      "scope": "multi_episode_128_partial_model_overlay",
-      "series_id": "qwen3_omni_v6_lora",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "task_id": "temporal_order",
-      "task_label": "Temporal Order Verification",
-      "task_number": 11
-    },
     {
       "method": "Cosmos3-Super Reasoner",
       "metric_key": "f1",
@@ -584,19 +568,6 @@
       "task_label": "Multimodal Synchronization Detection",
       "task_number": 12
     },
-    {
-      "method": "Qwen3-Omni v6 LoRA",
-      "metric_key": "f1",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-      "recommended_next_step": "Generate verified model outputs for this task contract and score them against the held-out labels.",
-      "scope": "multi_episode_128_partial_model_overlay",
-      "series_id": "qwen3_omni_v6_lora",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "task_id": "misalignment_detection",
-      "task_label": "Multimodal Synchronization Detection",
-      "task_number": 12
-    },
     {
       "method": "Cosmos3-Super Reasoner",
       "metric_key": "f1",
@@ -1039,19 +1010,6 @@
       "task_label": "Time-to-Next-Transition Regression",
       "task_number": 20
     },
-    {
-      "method": "Qwen3-Omni v6 LoRA",
-      "metric_key": "mae",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-      "recommended_next_step": "Generate verified model outputs for this task contract and score them against the held-out labels.",
-      "scope": "multi_episode_128_partial_model_overlay",
-      "series_id": "qwen3_omni_v6_lora",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "task_id": "time_to_transition",
-      "task_label": "Time-to-Next-Transition Regression",
-      "task_number": 20
-    },
     {
       "method": "Cosmos3-Super Reasoner",
       "metric_key": "mae",
@@ -1125,8 +1083,8 @@
     "method_count": 9,
     "method_task_record_count": 180,
     "proxy_scored_method_task_count": 4,
-    "scored_method_task_count": 116,
-    "scoreless_method_task_count": 64,
     "task_count": 20
   },
   "source_matrix": "docs/data/task_method_20_result_matrix.json",

 {
+  "generated_at_utc": "2026-06-17T21:17:51+00:00",
   "immediate_actions": [
     {
       "artifact": "docs/data/task_method_20_gap_audit.json",
       "id": "gap_audit",
+      "purpose": "Keep the 61 scoreless cells visible and reproducible."
     },
     {
       "artifact": "scripts/omni/score_model_output_probes.py",
       "proxy_scored_task_count": 0,
       "result_record_count": 20,
       "scope": "128 selected episodes, held-out test",
+      "scored_task_count": 13,
+      "scoreless_task_count": 7,
       "status_counts": {
+        "not_evaluated_in_verified_package": 7,
+        "scored": 13
       }
     },
     "raw128_neural_mlp": {
     "cosmos3_super_reasoner": 13,
     "metadata128_neural_mlp": 14,
     "metadata128_simple": 12,
+    "qwen3_omni_v6_lora": 7
   },
   "missing_by_status": {
+    "not_evaluated_in_verified_package": 35,
     "not_supported_by_metadata_only_package": 22,
     "unsupported_without_required_target": 4
   },
     "11 Temporal Order Verification": [
       "cosmos3_nano_future_window",
       "cosmos3_super_reasoner",
+      "metadata128_neural_mlp"
     ],
     "12 Multimodal Synchronization Detection": [
       "cosmos3_nano_future_window",
       "cosmos3_super_reasoner",
       "metadata128_neural_mlp",
+      "metadata128_simple"
     ],
     "13 Long-Horizon Next-Action Forecasting": [
       "cosmos3_nano_future_window",
       "cosmos3_nano_future_window",
       "cosmos3_super_reasoner",
       "metadata128_neural_mlp",
+      "metadata128_simple"
     ]
   },
   "missing_records": [
       "task_label": "Temporal Order Verification",
       "task_number": 11
     },
     {
       "method": "Cosmos3-Super Reasoner",
       "metric_key": "f1",
       "task_label": "Multimodal Synchronization Detection",
       "task_number": 12
     },
     {
       "method": "Cosmos3-Super Reasoner",
       "metric_key": "f1",
       "task_label": "Time-to-Next-Transition Regression",
       "task_number": 20
     },
     {
       "method": "Cosmos3-Super Reasoner",
       "metric_key": "mae",
     "method_count": 9,
     "method_task_record_count": 180,
     "proxy_scored_method_task_count": 4,
+    "scored_method_task_count": 119,
+    "scoreless_method_task_count": 61,
     "task_count": 20
   },
   "source_matrix": "docs/data/task_method_20_result_matrix.json",

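The gap audit's totals are linked. Scored plus scoreless records equal the 180 method-task records (119 + 61 after this commit, 116 + 64 before). The `missing_by_status` buckets sum to the scoreless count (35 + 22 + 4 = 61 after, 38 + 22 + 4 = 64 before). The `immediate_actions` purpose text also quotes that count. The check below is a minimal sketch that reads only the keys visible in this diff. `check_gap_totals` is a hypothetical helper, not part of the repository.

```python
def check_gap_totals(summary: dict, missing_by_status: dict, purpose: str = "") -> list[str]:
    """Return inconsistencies between the gap audit's summary, buckets and purpose text."""
    problems: list[str] = []
    total = summary["method_task_record_count"]
    scored = summary["scored_method_task_count"]
    scoreless = summary["scoreless_method_task_count"]
    if scored + scoreless != total:
        problems.append(f"scored ({scored}) + scoreless ({scoreless}) != records ({total})")
    bucket_sum = sum(missing_by_status.values())
    if bucket_sum != scoreless:
        problems.append(f"missing_by_status sums to {bucket_sum}, expected {scoreless}")
    # The human-readable purpose string should quote the current scoreless count.
    if purpose and f"{scoreless} scoreless" not in purpose:
        problems.append("purpose text quotes a stale scoreless count")
    return problems


summary = {
    "method_task_record_count": 180,
    "scored_method_task_count": 119,
    "scoreless_method_task_count": 61,
}
buckets = {
    "not_evaluated_in_verified_package": 35,
    "not_supported_by_metadata_only_package": 22,
    "unsupported_without_required_target": 4,
}
print(check_gap_totals(summary, buckets, "Keep the 61 scoreless cells visible and reproducible."))
# []
```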
metrics/task_method_20_result_matrix.json CHANGED

@@ -1,11 +1,11 @@
 {
   "title": "Task Method 20-Result Matrix",
   "status": "pass",
-  "generated_at_utc": "2026-06-17T13:55:02+00:00",
   "task_count": 20,
   "method_count": 9,
   "method_task_record_count": 180,
-  "scored_method_task_count": 116,
   "series": [
     {
       "id": "minimal",
@@ -161,17 +161,17 @@
       "method_detail": "Verified held-out Qwen3-Omni v6 LoRA metrics, plus task 16 and any completed private-GPU future-task probes scored from task-specific JSON.",
       "plotted_as": "colored point overlay",
       "result_record_count": 20,
-      "scored_task_count": 10,
-      "covered_task_count": 10,
       "proxy_scored_task_count": 0,
-      "scoreless_task_count": 10,
       "unsupported_task_count": 0,
-      "not_evaluated_task_count": 10,
       "status_counts": {
-        "not_evaluated_in_verified_package": 10,
-        "scored": 10
       },
-      "coverage_fraction": 0.5,
       "result_record_fraction": 1.0
     },
     {
@@ -1958,17 +1958,17 @@
       "task_label": "Temporal Order Verification",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "f1",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 11,
@@ -2120,17 +2120,17 @@
       "task_label": "Multimodal Synchronization Detection",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "f1",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 12,
@@ -3416,17 +3416,17 @@
       "task_label": "Time-to-Next-Transition Regression",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "mae",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 20,

 {
   "title": "Task Method 20-Result Matrix",
   "status": "pass",
+  "generated_at_utc": "2026-06-17T21:17:41+00:00",
   "task_count": 20,
   "method_count": 9,
   "method_task_record_count": 180,
+  "scored_method_task_count": 119,
   "series": [
     {
       "id": "minimal",
       "method_detail": "Verified held-out Qwen3-Omni v6 LoRA metrics, plus task 16 and any completed private-GPU future-task probes scored from task-specific JSON.",
       "plotted_as": "colored point overlay",
       "result_record_count": 20,
+      "scored_task_count": 13,
+      "covered_task_count": 13,
       "proxy_scored_task_count": 0,
+      "scoreless_task_count": 7,
       "unsupported_task_count": 0,
+      "not_evaluated_task_count": 7,
       "status_counts": {
+        "not_evaluated_in_verified_package": 7,
+        "scored": 13
       },
+      "coverage_fraction": 0.65,
       "result_record_fraction": 1.0
     },
     {
       "task_label": "Temporal Order Verification",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 0.40984631701404173,
+      "raw_text": "0.4098",
+      "normalized_score": 0.40984631701404173,
+      "metric_key": "temporal_order_f1",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/temporal_order/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 11,
       "task_label": "Multimodal Synchronization Detection",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 0.3344936184319576,
+      "raw_text": "0.3345",
+      "normalized_score": 0.3344936184319576,
+      "metric_key": "misalignment_detection_f1",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/misalignment_detection/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 12,
       "task_label": "Time-to-Next-Transition Regression",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 134.0687422166874,
+      "raw_text": "134.07",
+      "normalized_score": 0.07859666766782253,
+      "metric_key": "time_to_transition_mae",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/time_to_transition/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 20,

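Each series block carries counts derived from its 20 method-task records. For `qwen3_omni_v6_lora`, the three newly scored tasks take it from 10 scored and 10 `not_evaluated_in_verified_package` (coverage 0.5) to 13 and 7 (coverage 0.65). The unified radar's model card reflects the same change as "20 records / 13 scored task-aligned axes". The sketch below shows that derivation. `summarize_series` is hypothetical, and it leaves out `covered_task_count` and `proxy_scored_task_count` because their exact rules are not shown in this diff.

```python
from collections import Counter


def summarize_series(records: list[dict]) -> dict:
    """Recompute a series block's counts from its method-task records."""
    n = len(records)
    scored = sum(1 for r in records if r.get("scored"))
    return {
        "result_record_count": n,
        "scored_task_count": scored,
        "scoreless_task_count": n - scored,
        "status_counts": dict(Counter(r["status"] for r in records)),
        "coverage_fraction": scored / n if n else 0.0,
        # Same wording as the model card's "coverage" string.
        "coverage_text": f"{n} records / {scored} scored task-aligned axes",
    }


qwen = [{"status": "scored", "scored": True}] * 13 + [
    {"status": "not_evaluated_in_verified_package", "scored": False}
] * 7
s = summarize_series(qwen)
print(s["coverage_fraction"], s["coverage_text"])
# 0.65 20 records / 13 scored task-aligned axes
```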
metrics/task_surface_integrity.json CHANGED

@@ -1,6 +1,6 @@
 {
   "status": "pass",
-  "generated_at_utc": "2026-06-17T20:46:02+00:00",
   "summary": {
     "task_count": 12,
     "expected_task_count": 12,

 {
   "status": "pass",
+  "generated_at_utc": "2026-06-17T21:25:26+00:00",
   "summary": {
     "task_count": 12,
     "expected_task_count": 12,

metrics/unified_task_model_radar.json CHANGED

@@ -1,11 +1,11 @@
 {
   "title": "Unified 20-Task Model Radar",
   "status": "pass",
-  "generated_at_utc": "2026-06-17T13:55:02+00:00",
   "task_count": 20,
   "method_count": 9,
   "method_task_record_count": 180,
-  "scored_method_task_count": 116,
   "normalization_policy": {
     "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
     "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
@@ -170,17 +170,17 @@
       "method_detail": "Verified held-out Qwen3-Omni v6 LoRA metrics, plus task 16 and any completed private-GPU future-task probes scored from task-specific JSON.",
       "plotted_as": "colored point overlay",
       "result_record_count": 20,
-      "scored_task_count": 10,
-      "covered_task_count": 10,
       "proxy_scored_task_count": 0,
-      "scoreless_task_count": 10,
       "unsupported_task_count": 0,
-      "not_evaluated_task_count": 10,
       "status_counts": {
-        "not_evaluated_in_verified_package": 10,
-        "scored": 10
       },
-      "coverage_fraction": 0.5,
       "result_record_fraction": 1.0
     },
     {
@@ -1375,6 +1375,17 @@
           "raw_text": "0.8520",
           "status_label": "scored"
         },
         "metadata128_simple": {
           "raw": 0.4198864140782312,
           "metric_key": "f1",
@@ -1419,17 +1430,6 @@
           "raw_text": "n/a",
           "status_label": "not supported"
         },
-        "qwen3_omni_v6_lora": {
-          "raw": null,
-          "metric_key": "f1",
-          "source": null,
-          "scope": "multi_episode_128_partial_model_overlay",
-          "status": "not_evaluated_in_verified_package",
-          "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-          "normalized_score": null,
-          "raw_text": "n/a",
-          "status_label": "not evaluated"
-        },
         "cosmos3_super_reasoner": {
           "raw": null,
           "metric_key": "f1",
@@ -1486,6 +1486,17 @@
           "raw_text": "0.7153",
           "status_label": "scored"
         },
         "metadata128_simple": {
           "raw": null,
           "metric_key": "f1",
@@ -1530,17 +1541,6 @@
           "raw_text": "n/a",
           "status_label": "not supported"
         },
-        "qwen3_omni_v6_lora": {
-          "raw": null,
-          "metric_key": "f1",
-          "source": null,
-          "scope": "multi_episode_128_partial_model_overlay",
-          "status": "not_evaluated_in_verified_package",
-          "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-          "normalized_score": null,
-          "raw_text": "n/a",
-          "status_label": "not evaluated"
-        },
         "cosmos3_super_reasoner": {
           "raw": null,
           "metric_key": "f1",
@@ -2374,6 +2374,17 @@
           "raw_text": "10.55",
           "status_label": "scored"
         },
         "raw128_simple": {
           "raw": 52.32759475708008,
           "metric_key": "mae",
@@ -2418,17 +2429,6 @@
           "raw_text": "n/a",
           "status_label": "not supported"
         },
-        "qwen3_omni_v6_lora": {
-          "raw": null,
-          "metric_key": "mae",
-          "source": null,
-          "scope": "multi_episode_128_partial_model_overlay",
-          "status": "not_evaluated_in_verified_package",
-          "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score",
-          "normalized_score": null,
-          "raw_text": "n/a",
-          "status_label": "not evaluated"
-        },
         "cosmos3_super_reasoner": {
           "raw": null,
           "metric_key": "mae",
@@ -2492,7 +2492,7 @@
       "title": "Qwen3-Omni v6 LoRA",
       "status": "verified",
       "task_aligned_axes": "Qwen3",
-      "coverage": "20 records / 10 scored task-aligned axes",
       "headline": "JSON validity 0.9990; action macro-F1 0.0029",
       "source": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_multiscale_cap96_v6_rank64_lr5e5_full8gpu_lora_eval_test_full/eval/metrics.json"
     },
@@ -4256,17 +4256,17 @@
       "task_label": "Temporal Order Verification",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "f1",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 11,
@@ -4418,17 +4418,17 @@
       "task_label": "Multimodal Synchronization Detection",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "f1",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 12,
@@ -5714,17 +5714,17 @@
       "task_label": "Time-to-Next-Transition Regression",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
-      "status": "not_evaluated_in_verified_package",
-      "status_label": "not evaluated",
-      "scored": false,
       "proxy_scored": false,
-      "raw": null,
-      "raw_text": "n/a",
-      "normalized_score": null,
-      "metric_key": "mae",
-      "source": null,
       "scope": "multi_episode_128_partial_model_overlay",
-      "reason": "the verified public model package did not ask this branch to emit that task target; a new task-specific evaluation package is required for a numeric score"
     },
     {
       "task_number": 20,

 {
   "title": "Unified 20-Task Model Radar",
   "status": "pass",
+  "generated_at_utc": "2026-06-17T21:17:41+00:00",
   "task_count": 20,
   "method_count": 9,
   "method_task_record_count": 180,
+  "scored_method_task_count": 119,
   "normalization_policy": {
     "higher_is_better": "bounded metrics are plotted directly on 0-1 axes after clipping to [0, 1]",
     "lower_is_better": "lower-error metrics are converted to best_observed_value / raw_value within the same task",
       "method_detail": "Verified held-out Qwen3-Omni v6 LoRA metrics, plus task 16 and any completed private-GPU future-task probes scored from task-specific JSON.",
       "plotted_as": "colored point overlay",
       "result_record_count": 20,
+      "scored_task_count": 13,
+      "covered_task_count": 13,
       "proxy_scored_task_count": 0,
+      "scoreless_task_count": 7,
       "unsupported_task_count": 0,
+      "not_evaluated_task_count": 7,
       "status_counts": {
+        "not_evaluated_in_verified_package": 7,
+        "scored": 13
       },
+      "coverage_fraction": 0.65,
       "result_record_fraction": 1.0
     },
     {
           "raw_text": "0.8520",
           "status_label": "scored"
         },
+        "qwen3_omni_v6_lora": {
+          "raw": 0.40984631701404173,
+          "metric_key": "temporal_order_f1",
+          "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/temporal_order/metrics.json",
+          "scope": "multi_episode_128_partial_model_overlay",
+          "status": "scored",
+          "reason": null,
+          "normalized_score": 0.40984631701404173,
+          "raw_text": "0.4098",
+          "status_label": "scored"
+        },
         "metadata128_simple": {
           "raw": 0.4198864140782312,
           "metric_key": "f1",
           "raw_text": "n/a",
           "status_label": "not supported"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
           "metric_key": "f1",
           "raw_text": "0.7153",
           "status_label": "scored"
         },
+        "qwen3_omni_v6_lora": {
+          "raw": 0.3344936184319576,
+          "metric_key": "misalignment_detection_f1",
+          "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/misalignment_detection/metrics.json",
+          "scope": "multi_episode_128_partial_model_overlay",
+          "status": "scored",
+          "reason": null,
+          "normalized_score": 0.3344936184319576,
+          "raw_text": "0.3345",
+          "status_label": "scored"
+        },
         "metadata128_simple": {
           "raw": null,
           "metric_key": "f1",
           "raw_text": "n/a",
           "status_label": "not supported"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
           "metric_key": "f1",
           "raw_text": "10.55",
           "status_label": "scored"
         },
+        "qwen3_omni_v6_lora": {
+          "raw": 134.0687422166874,
+          "metric_key": "time_to_transition_mae",
+          "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/time_to_transition/metrics.json",
+          "scope": "multi_episode_128_partial_model_overlay",
+          "status": "scored",
+          "reason": null,
+          "normalized_score": 0.07859666766782253,
+          "raw_text": "134.07",
+          "status_label": "scored"
+        },
         "raw128_simple": {
           "raw": 52.32759475708008,
           "metric_key": "mae",
           "raw_text": "n/a",
           "status_label": "not supported"
         },
         "cosmos3_super_reasoner": {
           "raw": null,
           "metric_key": "mae",
       "title": "Qwen3-Omni v6 LoRA",
       "status": "verified",
       "task_aligned_axes": "Qwen3",
+      "coverage": "20 records / 13 scored task-aligned axes",
       "headline": "JSON validity 0.9990; action macro-F1 0.0029",
       "source": "results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_multiscale_cap96_v6_rank64_lr5e5_full8gpu_lora_eval_test_full/eval/metrics.json"
     },
       "task_label": "Temporal Order Verification",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 0.40984631701404173,
+      "raw_text": "0.4098",
+      "normalized_score": 0.40984631701404173,
+      "metric_key": "temporal_order_f1",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/temporal_order/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 11,
       "task_label": "Multimodal Synchronization Detection",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 0.3344936184319576,
+      "raw_text": "0.3345",
+      "normalized_score": 0.3344936184319576,
+      "metric_key": "misalignment_detection_f1",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/misalignment_detection/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 12,
       "task_label": "Time-to-Next-Transition Regression",
       "series_id": "qwen3_omni_v6_lora",
       "method": "Qwen3-Omni v6 LoRA",
+      "status": "scored",
+      "status_label": "scored",
+      "scored": true,
       "proxy_scored": false,
+      "raw": 134.0687422166874,
+      "raw_text": "134.07",
+      "normalized_score": 0.07859666766782253,
+      "metric_key": "time_to_transition_mae",
+      "source": "results/omni_finetune/xperience10m_qwen3_omni_v6_order_sync_time_probes_a100_20260617T132500Z/time_to_transition/metrics.json",
       "scope": "multi_episode_128_partial_model_overlay",
+      "reason": null
     },
     {
       "task_number": 20,
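The `normalization_policy` in the radar JSON puts every task on a shared 0-1 axis. Bounded higher-is-better metrics are clipped to [0, 1]. Lower-error metrics such as MAE are converted to `best_observed_value / raw_value` within the same task, which explains why the Qwen3-Omni v6 LoRA time-to-transition MAE of 134.07 plots near 0.08. The sketch below is a minimal illustration of that rule and of the per-series summary fields (`scored_task_count`, `scoreless_task_count`, `status_counts`, `coverage_fraction`). It assumes each method-task record is a dict shaped like the records in this diff. The helpers `normalize_task`, `summarize_series`, and `is_consistent` are hypothetical; they are not taken from the repository's generator. `is_consistent` encodes an assumed invariant, read from the records here: a numeric score carries a source artifact, and a scoreless cell carries a reason.

```python
from collections import Counter
from typing import Optional


def normalize_task(raws: dict[str, Optional[float]],
                   higher_is_better: bool) -> dict[str, Optional[float]]:
    """Map one task's raw values (keyed by series_id) onto a 0-1 radar axis.

    Hypothetical helper following the stated normalization_policy.
    None values stay None so scoreless cells remain visible.
    """
    if higher_is_better:
        # Bounded metrics: plot directly after clipping to [0, 1].
        return {k: None if v is None else min(1.0, max(0.0, v))
                for k, v in raws.items()}
    observed = [v for v in raws.values() if v is not None]
    if not observed:
        return dict.fromkeys(raws)
    best = min(observed)  # lowest (best) error observed for this task; assumes errors >= 0
    # A zero error can only be the best value, so it maps to 1.0.
    return {k: None if v is None else (1.0 if v == 0 else best / v)
            for k, v in raws.items()}


def summarize_series(records: list[dict]) -> dict:
    """Aggregate one series' method-task records into radar summary fields."""
    scored = sum(1 for r in records if r["scored"])
    counts = Counter(r["status"] for r in records)
    return {
        "result_record_count": len(records),
        "scored_task_count": scored,
        "scoreless_task_count": len(records) - scored,
        "status_counts": dict(sorted(counts.items())),
        "coverage_fraction": scored / len(records) if records else 0.0,
    }


def is_consistent(record: dict) -> bool:
    """Assumed invariant: numeric score needs a source; scoreless cell needs a reason."""
    if record["scored"]:
        return (record["raw"] is not None and bool(record["source"])
                and record["reason"] is None)
    return record["raw"] is None and bool(record["reason"])
```

With 13 scored records and 7 `not_evaluated_in_verified_package` records, `summarize_series` yields `coverage_fraction` 0.65, matching the updated Qwen3-Omni v6 LoRA series entry (previously 10 of 20, or 0.5).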

metrics/website_integrity.json CHANGED

@@ -1,6 +1,6 @@
 {
   "status": "pass",
-  "generated_at_utc": "2026-06-17T21:12:34+00:00",
   "docs_root": "docs",
   "site_base": "/ropedia-xperience-10m-task-suite/",
   "summary": {
@@ -316,7 +316,7 @@
     },
     {
       "path": "data/episode128_task_model_radar.json",
-      "bytes": 187388,
       "top_level_type": "dict"
     },
     {
@@ -486,12 +486,12 @@
     },
     {
       "path": "data/task_method_20_gap_audit.json",
-      "bytes": 55745,
       "top_level_type": "dict"
     },
     {
       "path": "data/task_method_20_result_matrix.json",
-      "bytes": 129749,
       "top_level_type": "dict"
     },
     {
@@ -526,7 +526,7 @@
     },
     {
       "path": "data/unified_task_model_radar.json",
-      "bytes": 231240,
       "top_level_type": "dict"
     },
     {
@@ -566,7 +566,7 @@
     {
       "path": "assets/charts/episode128_task_model_radar.svg",
       "exists": true,
-      "bytes": 44044,
       "format": "SVG",
       "has_viewbox": true
     },
@@ -636,7 +636,7 @@
     {
       "path": "assets/charts/unified_task_model_radar.svg",
       "exists": true,
-      "bytes": 50060,
       "format": "SVG",
       "has_viewbox": true
     },

 {
   "status": "pass",
+  "generated_at_utc": "2026-06-17T21:25:27+00:00",
   "docs_root": "docs",
   "site_base": "/ropedia-xperience-10m-task-suite/",
   "summary": {
     },
     {
       "path": "data/episode128_task_model_radar.json",
+      "bytes": 187309,
       "top_level_type": "dict"
     },
     {
     },
     {
       "path": "data/task_method_20_gap_audit.json",
+      "bytes": 53574,
       "top_level_type": "dict"
     },
     {
       "path": "data/task_method_20_result_matrix.json",
+      "bytes": 129707,
       "top_level_type": "dict"
     },
     {
     },
     {
       "path": "data/unified_task_model_radar.json",
+      "bytes": 231161,
       "top_level_type": "dict"
     },
     {
     {
       "path": "assets/charts/episode128_task_model_radar.svg",
       "exists": true,
+      "bytes": 44378,
       "format": "SVG",
       "has_viewbox": true
     },
     {
       "path": "assets/charts/unified_task_model_radar.svg",
       "exists": true,
+      "bytes": 50394,
       "format": "SVG",
       "has_viewbox": true
     },
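The integrity report records, for each published file, its path and byte size plus either the JSON top-level type or, for charts, whether the SVG exists and declares a `viewBox`. Regenerating the radar JSON and SVG files in this commit presumably changed their sizes (for example, `data/task_method_20_gap_audit.json` went from 55745 to 53574 bytes), so the report had to be regenerated as well. The sketch below shows how entries of this shape could be produced with the Python standard library. The helpers `json_entry`, `svg_entry`, and `demo` are hypothetical illustrations, not the repository's actual integrity checker.

```python
import json
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def json_entry(docs_root: Path, rel: str) -> dict:
    """Describe a published JSON file: path, size on disk, top-level type name."""
    path = docs_root / rel
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    return {
        "path": rel,
        "bytes": path.stat().st_size,
        "top_level_type": type(data).__name__,  # e.g. "dict" or "list"
    }


def svg_entry(docs_root: Path, rel: str) -> dict:
    """Describe a chart: existence, size, root format, and viewBox presence."""
    path = docs_root / rel
    if not path.exists():
        return {"path": rel, "exists": False}
    root = ET.parse(path).getroot()
    local_tag = root.tag.rsplit("}", 1)[-1]  # strip the SVG namespace, if any
    return {
        "path": rel,
        "exists": True,
        "bytes": path.stat().st_size,
        "format": "SVG" if local_tag == "svg" else "unknown",
        "has_viewbox": "viewBox" in root.attrib,
    }


def demo() -> list[dict]:
    """Build a throwaway docs tree and report on it (illustration only)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data").mkdir()
        (root / "data" / "example.json").write_text('{"status": "pass"}', encoding="utf-8")
        (root / "assets").mkdir()
        (root / "assets" / "chart.svg").write_text(
            '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10"></svg>',
            encoding="utf-8",
        )
        return [json_entry(root, "data/example.json"),
                svg_entry(root, "assets/chart.svg")]
```

Because `bytes` is read from disk, any change to a published JSON or SVG file changes its entry. A stale integrity report therefore shows up as a size mismatch rather than passing silently.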