cy0307/ropedia-xperience-10m-task-baselines

@@ -84,6 +84,35 @@ before the multi-episode omni-model stage becomes a real held-out evaluation.
 | Understand one model input | [`results/episode_task_suite/feature_manifest.json`](results/episode_task_suite/feature_manifest.json), [`results/episode_task_suite/windows.csv`](results/episode_task_suite/windows.csv) |
 | Check multi-episode data status | [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
 ## Research Project Overview
 | Theme | Current implementation |
@@ -179,7 +208,7 @@ The generated evaluation protocol is at
 The generated research takeaways are at
 [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md) and
 [`docs/data/research_takeaways.json`](docs/data/research_takeaways.json).
-The staged research roadmap is at
 [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md) and
 [`docs/data/research_roadmap.json`](docs/data/research_roadmap.json).
 The foundation-model selection plan is at
@@ -352,7 +381,7 @@ Hugging Face Space app:
 | Evaluation protocol | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json` | Defines the task unit, split, metrics, leakage controls, and current limitations |
 | Task surface integrity | `docs/data/task_surface_integrity.json` | Checks the public task cards, readable task names, representative modality thumbnails, and interactive walkthrough storyboard |
 | Rendered website check | `RENDERED_SITE_CHECK.md`, `docs/data/rendered_site_check.json` | Records the browser-level page load, tab navigation, walkthrough deep link, player interaction, and console-health result |
-| Research roadmap | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | Shows the staged path from sample-level task development to multi-episode and larger omni-model work |
 | Minimal heads | softmax, ridge projection/regression, multi-label logistic heads | Keeps every input/output contract visible and inspectable |
 | Neural heads | PyTorch MLP classifiers/regressors under `neural_mlp/` | Checks whether nonlinear heads improve each task without changing features |
 | Evidence | metrics, predictions, confusion matrices, diagrams, dashboard | Makes the single-episode task development inspectable without rerunning first |
@@ -481,7 +510,7 @@ docs/
   data/website_integrity.json       # machine-readable website integrity check
   data/project_manifest.json        # machine-readable public-surface metadata
   data/project_packet.json          # machine-readable project path and scope summary
-  data/research_roadmap.json        # staged multi-episode and omni-model roadmap
   data/research_directions.json     # four-track website data bundle
   data/research_direction_extensions.json # four extra probe data bundle
   data/task_walkthroughs.json       # human-readable task-card and walkthrough-storyboard data
@@ -605,13 +634,13 @@ The useful distinction is:
 The figure shows the intended end-to-end training flow: raw valid episodes enter
 episode-level split validation, parallel media/sensor export creates Qwen-style
 JSONL records, Qwen3-Omni receives video/audio/text directly, the sensor bridge
-adds depth/pose/mocap/IMU features, LoRA adapters are trained on staged
 train/val episodes, and sealed held-out test evaluation produces predictions,
 metrics, run reports, and upload-ready adapter artifacts.
 The current scale-up artifacts show that the export, manifest, sensor-feature,
 LoRA, and evaluation scripts can run on the available sample episode. They do
-not show a real multi-episode result. A real pilot requires staged valid
 episodes, held-out episode splits, training metadata, predictions, metrics, and
 a run report; the current selected pilot target is 128 episodes.
@@ -653,7 +682,7 @@ Current status in this repo:
 - gated_metadata_audit: 12,102 complete visible episodes across 802 complete sessions
 - selected_episode_plan: 128 metadata-balanced episodes, 96/16/16 train/val/test
 - selected_download_size: 277.71 GiB excluding `visualization.rrd`
-- ready_for_held_out_pilot: false until the selected episodes are fully staged and checked
 - gated dataset: available for selected multi-episode data preparation
 - source_discovery: `results/omni_finetune/source_discovery.json`
 - data_status: `results/omni_finetune/DATA_ACCESS_STATUS.md`
@@ -668,7 +697,7 @@ episode per top-level session UUID.
 ### Progressive Train/Validation Pilot
 The selected 128-episode plan can be used before every episode has arrived by
-training only on staged `train` episodes and monitoring staged `val` episodes.
 The final `test` episodes stay sealed until the end, so early development does
 not contaminate held-out evaluation.
@@ -688,7 +717,7 @@ running final test evaluation. The exporter uses session-qualified episode IDs
 and path-based split matching so repeated folder names such as `ep1` cannot
 collide across different sessions.
-For larger staged subsets, `scripts/omni/run_trainval_parallel_export_8gpu.sh`
 uses the same split guard, exports episodes in parallel CPU shards, skips and
 reports episodes that contain no labeled windows under the configured label
 rule, then launches Qwen3-Omni LoRA with `NUM_PROCESSES=8`.
@@ -715,7 +744,7 @@ assuming one backbone solves every Xperience-10M objective.
 | Branch | Current role | When to use it |
 | --- | --- | --- |
 | Qwen3-Omni | First trainable multimodal LoRA pilot | Use for the selected 128-episode held-out baseline over video/audio/language plus sensor-bridge features. |
-| Cosmos 3 | First world-model/action-generation branch | Use after data staging for future-window prediction, action-conditioned world modeling, and synthetic-data usefulness tests. |
 | GR00T | Humanoid/action-policy branch | Use after mocap/contact retargeting creates well-defined humanoid action targets. |
 | OpenVLA / openpi | Open VLA/policy baselines | Use after the project defines robot-compatible or action-token targets. |
 | Gemini Robotics | External reasoning reference | Use only for qualitative comparison or annotation support unless local trainable access exists. |

 | Understand one model input | [`results/episode_task_suite/feature_manifest.json`](results/episode_task_suite/feature_manifest.json), [`results/episode_task_suite/windows.csv`](results/episode_task_suite/windows.csv) |
 | Check multi-episode data status | [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
+## Download And Accounting
+This repository is a baseline/artifact model repository rather than a single
+Transformers checkpoint. The main weights live under task-specific paths such
+as `artifacts/**/model.npz` and
+`artifacts/episode_task_suite/neural_mlp/**/model.pt`.
+For Hugging Face Hub download accounting, this repo includes a root
+[`config.json`](config.json) as the canonical query file. The displayed
+monthly download count can lag behind actual file requests, and direct browser
+downloads of arbitrary nested files may not be reflected immediately.
+Recommended programmatic access:
+```python
+from huggingface_hub import snapshot_download
+local_dir = snapshot_download(
+    repo_id="cy0307/ropedia-xperience-10m-task-baselines",
+    allow_patterns=[
+        "config.json",
+        "artifacts/**/*.npz",
+        "artifacts/**/*.pt",
+        "artifacts/**/metrics.json",
+        "artifacts/**/*predictions*",
+    ],
+)
+```
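As a rough illustration of what the `allow_patterns` list above selects, the sketch below applies the same patterns to a few made-up file paths with `fnmatch`-style matching from the standard library. It assumes the Hub client matches patterns in this style; the example paths and the `select_paths` helper are hypothetical and not taken from the repository.

```python
from fnmatch import fnmatchcase

# Same patterns as in the snapshot_download call above.
ALLOW_PATTERNS = [
    "config.json",
    "artifacts/**/*.npz",
    "artifacts/**/*.pt",
    "artifacts/**/metrics.json",
    "artifacts/**/*predictions*",
]


def select_paths(paths, patterns=ALLOW_PATTERNS):
    """Return the paths that match at least one pattern (fnmatch-style)."""
    return [p for p in paths if any(fnmatchcase(p, pat) for pat in patterns)]


# Hypothetical repository paths, for illustration only.
example_paths = [
    "config.json",
    "README.md",
    "artifacts/task_a/model.npz",
    "artifacts/episode_task_suite/neural_mlp/task_a/model.pt",
    "artifacts/task_a/metrics.json",
    "artifacts/task_a/val_predictions.csv",
    "artifacts/model.npz",
    "docs/data/research_roadmap.json",
]

selected = select_paths(example_paths)
for path in selected:
    print(path)
```

Under this matching rule `*` also matches `/`, so `artifacts/**/*.npz` needs at least one directory between `artifacts/` and the file. In the sketch, the hypothetical `artifacts/model.npz` is therefore not selected, and neither are `README.md` or anything under `docs/`.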
 ## Research Project Overview
 | Theme | Current implementation |
 The generated research takeaways are at
 [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md) and
 [`docs/data/research_takeaways.json`](docs/data/research_takeaways.json).
+The research roadmap is at
 [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md) and
 [`docs/data/research_roadmap.json`](docs/data/research_roadmap.json).
 The foundation-model selection plan is at
 | Evaluation protocol | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json` | Defines the task unit, split, metrics, leakage controls, and current limitations |
 | Task surface integrity | `docs/data/task_surface_integrity.json` | Checks the public task cards, readable task names, representative modality thumbnails, and interactive walkthrough storyboard |
 | Rendered website check | `RENDERED_SITE_CHECK.md`, `docs/data/rendered_site_check.json` | Records the browser-level page load, tab navigation, walkthrough deep link, player interaction, and console-health result |
+| Research roadmap | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | Shows the path from sample-level task development to multi-episode and larger omni-model work |
 | Minimal heads | softmax, ridge projection/regression, multi-label logistic heads | Keeps every input/output contract visible and inspectable |
 | Neural heads | PyTorch MLP classifiers/regressors under `neural_mlp/` | Checks whether nonlinear heads improve each task without changing features |
 | Evidence | metrics, predictions, confusion matrices, diagrams, dashboard | Makes the single-episode task development inspectable without rerunning first |
   data/website_integrity.json       # machine-readable website integrity check
   data/project_manifest.json        # machine-readable public-surface metadata
   data/project_packet.json          # machine-readable project path and scope summary
+  data/research_roadmap.json        # multi-episode and omni-model roadmap
   data/research_directions.json     # four-track website data bundle
   data/research_direction_extensions.json # four extra probe data bundle
   data/task_walkthroughs.json       # human-readable task-card and walkthrough-storyboard data
 The figure shows the intended end-to-end training flow: raw valid episodes enter
 episode-level split validation, parallel media/sensor export creates Qwen-style
 JSONL records, Qwen3-Omni receives video/audio/text directly, the sensor bridge
+adds depth/pose/mocap/IMU features, LoRA adapters are trained on prepared
 train/val episodes, and sealed held-out test evaluation produces predictions,
 metrics, run reports, and upload-ready adapter artifacts.
 The current scale-up artifacts show that the export, manifest, sensor-feature,
 LoRA, and evaluation scripts can run on the available sample episode. They do
+not show a real multi-episode result. A real pilot requires valid prepared
 episodes, held-out episode splits, training metadata, predictions, metrics, and
 a run report; the current selected pilot target is 128 episodes.
 - gated_metadata_audit: 12,102 complete visible episodes across 802 complete sessions
 - selected_episode_plan: 128 metadata-balanced episodes, 96/16/16 train/val/test
 - selected_download_size: 277.71 GiB excluding `visualization.rrd`
+- ready_for_held_out_pilot: false until the selected episodes are fully prepared and checked
 - gated dataset: available for selected multi-episode data preparation
 - source_discovery: `results/omni_finetune/source_discovery.json`
 - data_status: `results/omni_finetune/DATA_ACCESS_STATUS.md`
 ### Progressive Train/Validation Pilot
 The selected 128-episode plan can be used before every episode has arrived by
+training only on prepared `train` episodes and monitoring prepared `val` episodes.
 The final `test` episodes stay sealed until the end, so early development does
 not contaminate held-out evaluation.
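The progressive rule described above can be sketched as a small filter over the split plan. This is a minimal illustration under assumptions, not the repository's exporter: the `progressive_splits` helper, the plan format (episode ID mapped to split name), and the example episode IDs are all hypothetical.

```python
def progressive_splits(plan, prepared):
    """Select the episodes usable for early train/val development.

    plan:     dict mapping episode ID -> "train", "val", or "test"
    prepared: set of episode IDs already available locally
    Test episodes are never returned, even when they are prepared.
    """
    usable = {"train": [], "val": []}
    for episode_id, split in sorted(plan.items()):
        if split == "test":
            continue  # stays sealed until the final evaluation
        if split not in usable:
            raise ValueError(f"unknown split {split!r} for {episode_id}")
        if episode_id in prepared:
            usable[split].append(episode_id)
    return usable


# Hypothetical plan and preparation state, for illustration only.
plan = {
    "session_a/ep1": "train",
    "session_b/ep1": "train",
    "session_c/ep1": "val",
    "session_d/ep1": "test",
}
prepared = {"session_a/ep1", "session_c/ep1", "session_d/ep1"}

usable = progressive_splits(plan, prepared)
print(usable)
```

In this example the prepared `test` episode is left out of the result, and the unprepared `train` episode is skipped until it becomes available.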
 and path-based split matching so repeated folder names such as `ep1` cannot
 collide across different sessions.
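One way to picture session-qualified IDs is to build the ID from both the session folder and the episode folder. The sketch below assumes a `<root>/<session>/<episode>` directory layout, which is an assumption made for illustration; the helper names and example paths are hypothetical and do not come from the exporter.

```python
from pathlib import PurePosixPath


def session_qualified_id(episode_path):
    """Build '<session>/<episode>' from the last two path components."""
    parts = PurePosixPath(episode_path).parts
    if len(parts) < 2:
        raise ValueError(f"expected <session>/<episode> in {episode_path!r}")
    return f"{parts[-2]}/{parts[-1]}"


def split_for_path(episode_path, split_by_id):
    """Look up the split for a path; None means the episode is not in the plan."""
    return split_by_id.get(session_qualified_id(episode_path))


# Hypothetical sessions that both contain a folder named `ep1`.
split_by_id = {"session_a/ep1": "train", "session_b/ep1": "test"}

print(split_for_path("data/session_a/ep1", split_by_id))
print(split_for_path("data/session_b/ep1", split_by_id))
```

Matching on the bare folder name `ep1` would make the two episodes indistinguishable; qualifying the ID with the session keeps their splits separate.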
+For larger prepared subsets, `scripts/omni/run_trainval_parallel_export_8gpu.sh`
 uses the same split guard, exports episodes in parallel CPU shards, skips and
 reports episodes that contain no labeled windows under the configured label
 rule, then launches Qwen3-Omni LoRA with `NUM_PROCESSES=8`.
 | Branch | Current role | When to use it |
 | --- | --- | --- |
 | Qwen3-Omni | First trainable multimodal LoRA pilot | Use for the selected 128-episode held-out baseline over video/audio/language plus sensor-bridge features. |
+| Cosmos 3 | First world-model/action-generation branch | Use after data preparation for future-window prediction, action-conditioned world modeling, and synthetic-data usefulness tests. |
 | GR00T | Humanoid/action-policy branch | Use after mocap/contact retargeting creates well-defined humanoid action targets. |
 | OpenVLA / openpi | Open VLA/policy baselines | Use after the project defines robot-compatible or action-token targets. |
 | Gemini Robotics | External reasoning reference | Use only for qualitative comparison or annotation support unless local trainable access exists. |