cy0307 committed
Commit 2780a37 · verified · 1 Parent(s): 038e937

Document model download entrypoint

Files changed (1)
  1. README.md +38 -9
README.md CHANGED
@@ -84,6 +84,35 @@ before the multi-episode omni-model stage becomes a real held-out evaluation.
84
  | Understand one model input | [`results/episode_task_suite/feature_manifest.json`](results/episode_task_suite/feature_manifest.json), [`results/episode_task_suite/windows.csv`](results/episode_task_suite/windows.csv) |
85
  | Check multi-episode data status | [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
86
 
87
  ## Research Project Overview
88
 
89
  | Theme | Current implementation |
@@ -179,7 +208,7 @@ The generated evaluation protocol is at
179
  The generated research takeaways are at
180
  [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md) and
181
  [`docs/data/research_takeaways.json`](docs/data/research_takeaways.json).
182
- The staged research roadmap is at
183
  [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md) and
184
  [`docs/data/research_roadmap.json`](docs/data/research_roadmap.json).
185
  The foundation-model selection plan is at
@@ -352,7 +381,7 @@ Hugging Face Space app:
352
  | Evaluation protocol | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json` | Defines the task unit, split, metrics, leakage controls, and current limitations |
353
  | Task surface integrity | `docs/data/task_surface_integrity.json` | Checks the public task cards, readable task names, representative modality thumbnails, and interactive walkthrough storyboard |
354
  | Rendered website check | `RENDERED_SITE_CHECK.md`, `docs/data/rendered_site_check.json` | Records the browser-level page load, tab navigation, walkthrough deep link, player interaction, and console-health result |
355
- | Research roadmap | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | Shows the staged path from sample-level task development to multi-episode and larger omni-model work |
356
  | Minimal heads | softmax, ridge projection/regression, multi-label logistic heads | Keeps every input/output contract visible and inspectable |
357
  | Neural heads | PyTorch MLP classifiers/regressors under `neural_mlp/` | Checks whether nonlinear heads improve each task without changing features |
358
  | Evidence | metrics, predictions, confusion matrices, diagrams, dashboard | Makes the single-episode task development inspectable without rerunning first |
@@ -481,7 +510,7 @@ docs/
481
  data/website_integrity.json # machine-readable website integrity check
482
  data/project_manifest.json # machine-readable public-surface metadata
483
  data/project_packet.json # machine-readable project path and scope summary
484
- data/research_roadmap.json # staged multi-episode and omni-model roadmap
485
  data/research_directions.json # four-track website data bundle
486
  data/research_direction_extensions.json # four extra probe data bundle
487
  data/task_walkthroughs.json # human-readable task-card and walkthrough-storyboard data
@@ -605,13 +634,13 @@ The useful distinction is:
605
  The figure shows the intended end-to-end training flow: raw valid episodes enter
606
  episode-level split validation, parallel media/sensor export creates Qwen-style
607
  JSONL records, Qwen3-Omni receives video/audio/text directly, the sensor bridge
608
- adds depth/pose/mocap/IMU features, LoRA adapters are trained on staged
609
  train/val episodes, and sealed held-out test evaluation produces predictions,
610
  metrics, run reports, and upload-ready adapter artifacts.
611
 
612
  The current scale-up artifacts show that the export, manifest, sensor-feature,
613
  LoRA, and evaluation scripts can run on the available sample episode. They do
614
- not show a real multi-episode result. A real pilot requires staged valid
615
  episodes, held-out episode splits, training metadata, predictions, metrics, and
616
  a run report; the current selected pilot target is 128 episodes.
617
 
@@ -653,7 +682,7 @@ Current status in this repo:
653
  - gated_metadata_audit: 12,102 complete visible episodes across 802 complete sessions
654
  - selected_episode_plan: 128 metadata-balanced episodes, 96/16/16 train/val/test
655
  - selected_download_size: 277.71 GiB excluding `visualization.rrd`
656
- - ready_for_held_out_pilot: false until the selected episodes are fully staged and checked
657
  - gated dataset: available for selected multi-episode data preparation
658
  - source_discovery: `results/omni_finetune/source_discovery.json`
659
  - data_status: `results/omni_finetune/DATA_ACCESS_STATUS.md`
@@ -668,7 +697,7 @@ episode per top-level session UUID.
668
  ### Progressive Train/Validation Pilot
669
 
670
  The selected 128-episode plan can be used before every episode has arrived by
671
- training only on staged `train` episodes and monitoring staged `val` episodes.
672
  The final `test` episodes stay sealed until the end, so early development does
673
  not contaminate held-out evaluation.
674
 
@@ -688,7 +717,7 @@ running final test evaluation. The exporter uses session-qualified episode IDs
688
  and path-based split matching so repeated folder names such as `ep1` cannot
689
  collide across different sessions.
690
 
691
- For larger staged subsets, `scripts/omni/run_trainval_parallel_export_8gpu.sh`
692
  uses the same split guard, exports episodes in parallel CPU shards, skips and
693
  reports episodes that contain no labeled windows under the configured label
694
  rule, then launches Qwen3-Omni LoRA with `NUM_PROCESSES=8`.
@@ -715,7 +744,7 @@ assuming one backbone solves every Xperience-10M objective.
715
  | Branch | Current role | When to use it |
716
  | --- | --- | --- |
717
  | Qwen3-Omni | First trainable multimodal LoRA pilot | Use for the selected 128-episode held-out baseline over video/audio/language plus sensor-bridge features. |
718
- | Cosmos 3 | First world-model/action-generation branch | Use after data staging for future-window prediction, action-conditioned world modeling, and synthetic-data usefulness tests. |
719
  | GR00T | Humanoid/action-policy branch | Use after mocap/contact retargeting creates well-defined humanoid action targets. |
720
  | OpenVLA / openpi | Open VLA/policy baselines | Use after the project defines robot-compatible or action-token targets. |
721
  | Gemini Robotics | External reasoning reference | Use only for qualitative comparison or annotation support unless local trainable access exists. |
 
84
  | Understand one model input | [`results/episode_task_suite/feature_manifest.json`](results/episode_task_suite/feature_manifest.json), [`results/episode_task_suite/windows.csv`](results/episode_task_suite/windows.csv) |
85
  | Check multi-episode data status | [`results/omni_finetune/DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md) |
86
 
87
+ ## Download And Accounting
88
+
89
+ This repository is a baseline/artifact model repository rather than a single
90
+ Transformers checkpoint. The main weights live under task-specific paths such
91
+ as `artifacts/**/model.npz` and
92
+ `artifacts/episode_task_suite/neural_mlp/**/model.pt`.
93
+
94
+ For Hugging Face Hub download accounting, this repo includes a root
95
+ [`config.json`](config.json) as the canonical query file. The displayed
96
+ monthly download count can lag behind actual file requests, and direct browser
97
+ downloads of arbitrary nested files may not be reflected immediately.
98
+
99
+ Recommended programmatic access:
100
+
101
+ ```python
102
+ from huggingface_hub import snapshot_download
103
+
104
+ local_dir = snapshot_download(
105
+     repo_id="cy0307/ropedia-xperience-10m-task-baselines",
106
+     allow_patterns=[
107
+         "config.json",
108
+         "artifacts/**/*.npz",
109
+         "artifacts/**/*.pt",
110
+         "artifacts/**/metrics.json",
111
+         "artifacts/**/*predictions*",
112
+     ],
113
+ )
114
+ ```
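
Once a snapshot is on disk, the task-specific weights can be located by walking the `artifacts/` tree. The following is a minimal sketch and not part of the repository: `find_weight_files` is a hypothetical helper, and it assumes only the `model.npz` and `model.pt` file names mentioned above.

```python
from pathlib import Path


def find_weight_files(local_dir):
    """Return sorted relative paths of model.npz / model.pt files under artifacts/.

    Hypothetical helper: it relies only on the file names named in this README.
    """
    root = Path(local_dir)
    artifacts = root / "artifacts"
    found = []
    for name in ("model.npz", "model.pt"):
        # rglob searches every nested task directory below artifacts/.
        for path in artifacts.rglob(name):
            found.append(path.relative_to(root).as_posix())
    return sorted(found)
```

Calling `find_weight_files(local_dir)` on the directory returned by `snapshot_download` lists the weight files to load. Assuming they were written with the usual NumPy and PyTorch serializers, `.npz` files would open with `numpy.load` and `.pt` files with `torch.load`; the keys inside each file are not documented here.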
115
+
116
  ## Research Project Overview
117
 
118
  | Theme | Current implementation |
 
208
  The generated research takeaways are at
209
  [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md) and
210
  [`docs/data/research_takeaways.json`](docs/data/research_takeaways.json).
211
+ The research roadmap is at
212
  [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md) and
213
  [`docs/data/research_roadmap.json`](docs/data/research_roadmap.json).
214
  The foundation-model selection plan is at
 
381
  | Evaluation protocol | `EVALUATION_PROTOCOL.md`, `docs/data/evaluation_protocol.json` | Defines the task unit, split, metrics, leakage controls, and current limitations |
382
  | Task surface integrity | `docs/data/task_surface_integrity.json` | Checks the public task cards, readable task names, representative modality thumbnails, and interactive walkthrough storyboard |
383
  | Rendered website check | `RENDERED_SITE_CHECK.md`, `docs/data/rendered_site_check.json` | Records the browser-level page load, tab navigation, walkthrough deep link, player interaction, and console-health result |
384
+ | Research roadmap | `RESEARCH_ROADMAP.md`, `docs/data/research_roadmap.json` | Shows the path from sample-level task development to multi-episode and larger omni-model work |
385
  | Minimal heads | softmax, ridge projection/regression, multi-label logistic heads | Keeps every input/output contract visible and inspectable |
386
  | Neural heads | PyTorch MLP classifiers/regressors under `neural_mlp/` | Checks whether nonlinear heads improve each task without changing features |
387
  | Evidence | metrics, predictions, confusion matrices, diagrams, dashboard | Makes the single-episode task development inspectable without rerunning first |
 
510
  data/website_integrity.json # machine-readable website integrity check
511
  data/project_manifest.json # machine-readable public-surface metadata
512
  data/project_packet.json # machine-readable project path and scope summary
513
+ data/research_roadmap.json # multi-episode and omni-model roadmap
514
  data/research_directions.json # four-track website data bundle
515
  data/research_direction_extensions.json # four extra probe data bundle
516
  data/task_walkthroughs.json # human-readable task-card and walkthrough-storyboard data
 
634
  The figure shows the intended end-to-end training flow: raw valid episodes enter
635
  episode-level split validation, parallel media/sensor export creates Qwen-style
636
  JSONL records, Qwen3-Omni receives video/audio/text directly, the sensor bridge
637
+ adds depth/pose/mocap/IMU features, LoRA adapters are trained on prepared
638
  train/val episodes, and sealed held-out test evaluation produces predictions,
639
  metrics, run reports, and upload-ready adapter artifacts.
640
 
641
  The current scale-up artifacts show that the export, manifest, sensor-feature,
642
  LoRA, and evaluation scripts can run on the available sample episode. They do
643
+ not show a real multi-episode result. A real pilot requires valid prepared
644
  episodes, held-out episode splits, training metadata, predictions, metrics, and
645
  a run report; the current selected pilot target is 128 episodes.
646
 
 
682
  - gated_metadata_audit: 12,102 complete visible episodes across 802 complete sessions
683
  - selected_episode_plan: 128 metadata-balanced episodes, 96/16/16 train/val/test
684
  - selected_download_size: 277.71 GiB excluding `visualization.rrd`
685
+ - ready_for_held_out_pilot: false until the selected episodes are fully prepared and checked
686
  - gated dataset: available for selected multi-episode data preparation
687
  - source_discovery: `results/omni_finetune/source_discovery.json`
688
  - data_status: `results/omni_finetune/DATA_ACCESS_STATUS.md`
 
697
  ### Progressive Train/Validation Pilot
698
 
699
  The selected 128-episode plan can be used before every episode has arrived by
700
+ training only on prepared `train` episodes and monitoring prepared `val` episodes.
701
  The final `test` episodes stay sealed until the end, so early development does
702
  not contaminate held-out evaluation.
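
The selection rule described above can be sketched as a small filter. This is an illustration under stated assumptions, not the repository's implementation: the `plan` mapping and the function name `select_progressive_episodes` are hypothetical.

```python
def select_progressive_episodes(plan, prepared_ids):
    """Pick prepared train/val episodes and never return test episodes.

    plan: dict mapping episode ID -> "train" | "val" | "test" (hypothetical layout).
    prepared_ids: iterable of episode IDs that are already prepared on disk.
    """
    prepared = set(prepared_ids)
    selected = {"train": [], "val": []}
    for episode_id, split in sorted(plan.items()):
        if split == "test":
            continue  # sealed until final evaluation, even if already prepared
        if split not in selected:
            raise ValueError(f"unknown split {split!r} for episode {episode_id}")
        if episode_id in prepared:
            selected[split].append(episode_id)
    return selected
```

Because `test` episodes are skipped before the prepared check, an early arrival of a test episode cannot leak into training or validation monitoring.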
703
 
 
717
  and path-based split matching so repeated folder names such as `ep1` cannot
718
  collide across different sessions.
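
As a hedged illustration of why session-qualified IDs avoid collisions, the sketch below derives an ID from the path below a data root. The `<session>/<episode>` directory layout and the name `session_qualified_id` are assumptions made for this example; the exporter's actual scheme may differ.

```python
from pathlib import PurePosixPath


def session_qualified_id(episode_path, data_root):
    """Build an ID from the path below data_root, e.g. '<session>/ep1'.

    Assumes episodes live at <data_root>/<session>/<episode> (hypothetical layout).
    """
    relative = PurePosixPath(episode_path).relative_to(PurePosixPath(data_root))
    if len(relative.parts) < 2:
        raise ValueError(
            f"expected <session>/<episode> under {data_root}: {episode_path}"
        )
    return "/".join(relative.parts)
```

Two folders that are both named `ep1` map to different IDs as long as their session directories differ, so a split lookup keyed on this ID cannot confuse them.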
719
 
720
+ For larger prepared subsets, `scripts/omni/run_trainval_parallel_export_8gpu.sh`
721
  uses the same split guard, exports episodes in parallel CPU shards, skips and
722
  reports episodes that contain no labeled windows under the configured label
723
  rule, then launches Qwen3-Omni LoRA with `NUM_PROCESSES=8`.
 
744
  | Branch | Current role | When to use it |
745
  | --- | --- | --- |
746
  | Qwen3-Omni | First trainable multimodal LoRA pilot | Use for the selected 128-episode held-out baseline over video/audio/language plus sensor-bridge features. |
747
+ | Cosmos 3 | First world-model/action-generation branch | Use after data preparation for future-window prediction, action-conditioned world modeling, and synthetic-data usefulness tests. |
748
  | GR00T | Humanoid/action-policy branch | Use after mocap/contact retargeting creates well-defined humanoid action targets. |
749
  | OpenVLA / openpi | Open VLA/policy baselines | Use after the project defines robot-compatible or action-token targets. |
750
  | Gemini Robotics | External reasoning reference | Use only for qualitative comparison or annotation support unless local trainable access exists. |