Robotics
PyTorch
Cosmos
xperience10m_task_baseline_suite
embodied-ai
multimodal
xperience-10m
baseline
evaluation
qwen3-omni
Refresh model README
README.md
CHANGED
|
@@ -74,7 +74,7 @@ before the multi-episode omni-model stage becomes a real held-out evaluation.
|
|
| 74 |
| Task suite | 12 human-readable embodied-AI task contracts with input, process, output, metrics, predictions, and case-study walkthroughs |
|
| 75 |
| Baselines | Minimal linear/ridge/logistic heads plus compact PyTorch MLP task heads over the same chronological split; companion simple/NN metadata baselines are also aligned to the selected 128-episode 96/16/16 split |
|
| 76 |
| Research directions | Task mapping and extension probes for human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling |
|
| 77 |
-
| Scale-up path |
|
| 78 |
| Public surfaces | GitHub repo, GitHub Pages dashboard, GHCR static-site package, HF Space, HF artifact dataset, HF baseline-model repo, and HF collection |
|
| 79 |
|
| 80 |
For the fastest interpretation of the current metrics, start with
|
|
@@ -111,7 +111,7 @@ This project is best read as a staged embodied-AI research study:
|
|
| 111 |
| Task suite | Twelve human-readable tasks cover action, procedure, contact, object, language, retrieval, reconstruction, order, and synchronization questions. | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json) |
|
| 112 |
| Baselines | Minimal heads and compact PyTorch MLP heads provide a first controlled comparison on the same chronological split; the selected 128-episode setup also has same-split simple/NN metadata baselines for JSON-supported tasks. | [`results/episode_task_suite/neural_mlp/`](results/episode_task_suite/neural_mlp/), [`results/omni_finetune/multi_episode_128_task_baselines/BASELINE_ALIGNMENT_REPORT.md`](results/omni_finetune/multi_episode_128_task_baselines/BASELINE_ALIGNMENT_REPORT.md) |
|
| 113 |
| Diagnostics | Audio contribution, modality ablations, timeline overlays, object labels, and alignment stress tests show which signals are useful and which tasks remain hard. | [`results/audio_ablation/AUDIO_ABLATION_SUMMARY.md`](results/audio_ablation/AUDIO_ABLATION_SUMMARY.md), [`docs/single_episode_explorer.html`](docs/single_episode_explorer.html) |
|
| 114 |
-
| Scale-up | The selected 128-episode Qwen3-Omni LoRA diagnostic
|
| 115 |
|
| 116 |
Detailed dataset notes, reproduction checks, and generated JSON reports are
|
| 117 |
included for readers who want to inspect the implementation, but they are
|
|
@@ -133,7 +133,7 @@ They give the current research state in one compact table:
|
|
| 133 |
| Dataset context | Official Xperience-10M links, sample-vs-gated-data boundary, modality coverage, and redistribution policy are documented |
|
| 134 |
| Evaluation protocol | Verified generated protocol for windowing, split policy, leakage controls, and per-task metrics |
|
| 135 |
| Website and Hub pages | Public dashboard, Hugging Face Space, artifact dataset, baseline model repo, and collection use the same project framing and links |
|
| 136 |
-
| Qwen3-Omni multi-episode pilot |
|
| 137 |
| Raw Xperience-10M data / full Qwen weights | Not redistributed |
|
| 138 |
|
| 139 |
## 90-Second Research Project Path
|
|
@@ -152,7 +152,7 @@ If you are reading the project cold, open these in order:
|
|
| 152 |
| 8 | What research directions does this support? | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`docs/data/research_directions.json`](docs/data/research_directions.json), [`docs/data/research_direction_extensions.json`](docs/data/research_direction_extensions.json) | The tasks are mapped to human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling. |
|
| 153 |
| 9 | Which foundation model comes next? | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json), [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md) | Qwen3-Omni is the first held-out LoRA baseline; Cosmos 3 is the first world-model branch; policy models wait for explicit action targets; Xperience-native pretraining is the full-corpus future goal. |
|
| 154 |
| 10 | How do I reproduce it? | [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md), [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md) | Public commands and expected outputs are documented for the sample-episode task suite. |
|
| 155 |
-
| 11 | What is still pending? | [`docs/data/omni_finetune_verified_result.json`](docs/data/omni_finetune_verified_result.json), [`DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md), [`MULTI_EPISODE_ACCESS_STATUS.md`](results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md) | The
|
| 156 |
|
| 157 |
A compact reader-path summary is available at
|
| 158 |
[`docs/data/project_packet.json`](docs/data/project_packet.json).
|
|
@@ -481,8 +481,8 @@ python scripts/train_all_modalities_model.py --workspace /path/to/workspace
|
|
| 481 |
|
| 482 |
This repo includes a first Qwen3-Omni fine-tuning path over Xperience-10M. The
|
| 483 |
repository separates public-sample evidence from multi-episode fine-tuning
|
| 484 |
-
artifacts. The
|
| 485 |
-
diagnostic
|
| 486 |
The useful distinction is:
|
| 487 |
|
| 488 |
- direct Qwen3-Omni inputs: RGB/fisheye video, embedded MP4 audio, and language
|
|
@@ -505,6 +505,11 @@ for public README, website, or Hugging Face updates only after the validator
|
|
| 505 |
passes and `scripts/omni/package_verified_omni_result.py` creates a
|
| 506 |
public-safe derived-artifact package. The current verified package is listed in
|
| 507 |
[`docs/data/omni_finetune_verified_result.json`](docs/data/omni_finetune_verified_result.json).
|
| 508 |
|
| 509 |
### Sample Count Decision
|
| 510 |
|
|
@@ -544,13 +549,16 @@ Current status in this repo:
|
|
| 544 |
- gated_metadata_audit: 12,102 complete visible episodes across 802 complete sessions
|
| 545 |
- selected_episode_plan: 128 source-balanced episodes, 96/16/16 train/val/test
|
| 546 |
- selected_download_size: 277.71 GiB excluding `visualization.rrd`
|
| 547 |
-
-
|
| 548 |
- selected_split: 96 train / 16 validation / 16 held-out test episodes
|
| 549 |
- exported_windows: 2,848 train / 512 validation / 448 test
|
| 550 |
- validation_samples_used: 512
|
| 551 |
- held_out_eval: 448 test windows from 14 exported test episodes
|
| 552 |
-
-
|
| 553 |
-
- current_quality_target: JSON validity
|
| 554 |
- gated dataset: available for selected multi-episode data preparation
|
| 555 |
- source_discovery: `results/omni_finetune/source_discovery.json`
|
| 556 |
- data_status: `results/omni_finetune/DATA_ACCESS_STATUS.md`
|
|
@@ -716,12 +724,12 @@ windows without depending on Qwen chat-message records.
|
|
| 716 |
The public-safe verified package intentionally excludes raw data, base Qwen
|
| 717 |
weights, LoRA weights, and full checkpoints. Adapter upload is a separate step:
|
| 718 |
use it only when the intended adapter directory is present and the model card
|
| 719 |
-
clearly distinguishes older smoke weights from the selected-episode
|
| 720 |
-
|
| 721 |
|
| 722 |
```bash
|
| 723 |
python3 scripts/omni/upload_qwen3_omni_lora_to_hf.py \
|
| 724 |
-
--repo-id cy0307/ropedia-qwen3-omni-lora-
|
| 725 |
--source-dir /path/to/adapter_upload_package \
|
| 726 |
--message "Upload Xperience-10M Qwen3-Omni LoRA pilot"
|
| 727 |
```
|
|
|
|
| 74 |
| Task suite | 12 human-readable embodied-AI task contracts with input, process, output, metrics, predictions, and case-study walkthroughs |
|
| 75 |
| Baselines | Minimal linear/ridge/logistic heads plus compact PyTorch MLP task heads over the same chronological split; companion simple/NN metadata baselines are also aligned to the selected 128-episode 96/16/16 split |
|
| 76 |
| Research directions | Task mapping and extension probes for human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling |
|
| 77 |
+
| Scale-up path | The selected-episode Qwen3-Omni LoRA final diagnostic result is verified on the 96/16/16 split; same-split simple/NN metadata baselines now cover the 12 task ids as a companion comparison. The Qwen result proves the multi-episode export/train/eval/package loop and meets the strict-JSON target, but weak action/subtask metrics make it a baseline for error analysis rather than a strong model. Cosmos3/world-model and VLA/policy branches reuse the same split and package contract after their targets are implemented. |
|
| 78 |
| Public surfaces | GitHub repo, GitHub Pages dashboard, GHCR static-site package, HF Space, HF artifact dataset, HF baseline-model repo, and HF collection |
|
| 79 |
|
| 80 |
For the fastest interpretation of the current metrics, start with
|
|
|
|
| 111 |
| Task suite | Twelve human-readable tasks cover action, procedure, contact, object, language, retrieval, reconstruction, order, and synchronization questions. | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json) |
|
| 112 |
| Baselines | Minimal heads and compact PyTorch MLP heads provide a first controlled comparison on the same chronological split; the selected 128-episode setup also has same-split simple/NN metadata baselines for JSON-supported tasks. | [`results/episode_task_suite/neural_mlp/`](results/episode_task_suite/neural_mlp/), [`results/omni_finetune/multi_episode_128_task_baselines/BASELINE_ALIGNMENT_REPORT.md`](results/omni_finetune/multi_episode_128_task_baselines/BASELINE_ALIGNMENT_REPORT.md) |
|
| 113 |
| Diagnostics | Audio contribution, modality ablations, timeline overlays, object labels, and alignment stress tests show which signals are useful and which tasks remain hard. | [`results/audio_ablation/AUDIO_ABLATION_SUMMARY.md`](results/audio_ablation/AUDIO_ABLATION_SUMMARY.md), [`docs/single_episode_explorer.html`](docs/single_episode_explorer.html) |
|
| 114 |
+
| Scale-up | The selected 128-episode Qwen3-Omni LoRA diagnostic path has a final verified held-out package: 96/16/16 selected episodes, 3,808 exported windows, 512 validation windows, 448 held-out test windows, and public-safe metrics/predictions. Same-split simple/NN metadata baselines are published for the 12 task ids, and the first Cosmos3-Nano future-window compatibility package is verified as a separate world-model branch. The final Qwen pass reaches 99.78% JSON validity, meeting the 98% target, while action/subtask quality remains weak and is the next error-analysis target. | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/omni_model_comparison.json`](docs/data/omni_model_comparison.json), [`docs/data/omni_finetune_verified_result.json`](docs/data/omni_finetune_verified_result.json), [`results/omni_finetune/OMNI_MODEL_COMPARISON.md`](results/omni_finetune/OMNI_MODEL_COMPARISON.md), [`results/omni_finetune/verified_public/`](results/omni_finetune/verified_public/), [`results/omni_finetune/multi_episode_128_task_baselines/BASELINE_ALIGNMENT_REPORT.md`](results/omni_finetune/multi_episode_128_task_baselines/BASELINE_ALIGNMENT_REPORT.md) |
|
| 115 |
|
| 116 |
Detailed dataset notes, reproduction checks, and generated JSON reports are
|
| 117 |
included for readers who want to inspect the implementation, but they are
|
|
|
|
| 133 |
| Dataset context | Official Xperience-10M links, sample-vs-gated-data boundary, modality coverage, and redistribution policy are documented |
|
| 134 |
| Evaluation protocol | Verified generated protocol for windowing, split policy, leakage controls, and per-task metrics |
|
| 135 |
| Website and Hub pages | Public dashboard, Hugging Face Space, artifact dataset, baseline model repo, and collection use the same project framing and links |
|
| 136 |
+
| Qwen3-Omni multi-episode pilot | Final verified diagnostic result package exists for the selected 96/16/16 episode split; JSON validity meets the target, while action/subtask metrics remain weak |
|
| 137 |
| Raw Xperience-10M data / full Qwen weights | Not redistributed |
|
| 138 |
|
| 139 |
## 90-Second Research Project Path
|
|
|
|
| 152 |
| 8 | What research directions does this support? | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`docs/data/research_directions.json`](docs/data/research_directions.json), [`docs/data/research_direction_extensions.json`](docs/data/research_direction_extensions.json) | The tasks are mapped to human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling. |
|
| 153 |
| 9 | Which foundation model comes next? | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json), [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md) | Qwen3-Omni is the first held-out LoRA baseline; Cosmos 3 is the first world-model branch; policy models wait for explicit action targets; Xperience-native pretraining is the full-corpus future goal. |
|
| 154 |
| 10 | How do I reproduce it? | [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md), [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md) | Public commands and expected outputs are documented for the sample-episode task suite. |
|
| 155 |
+
| 11 | What is still pending? | [`docs/data/omni_finetune_verified_result.json`](docs/data/omni_finetune_verified_result.json), [`DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md), [`MULTI_EPISODE_ACCESS_STATUS.md`](results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md) | The final held-out diagnostic Qwen pass is verified and the JSON-validity target is met; strong action/subtask model quality remains pending. |
|
| 156 |
|
| 157 |
A compact reader-path summary is available at
|
| 158 |
[`docs/data/project_packet.json`](docs/data/project_packet.json).
|
|
|
|
| 481 |
|
| 482 |
This repo includes a first Qwen3-Omni fine-tuning path over Xperience-10M. The
|
| 483 |
repository separates public-sample evidence from multi-episode fine-tuning
|
| 484 |
+
artifacts. The selected-episode held-out package is now verified as a
|
| 485 |
+
diagnostic result, not a strong final action/subtask model.
|
| 486 |
The useful distinction is:
|
| 487 |
|
| 488 |
- direct Qwen3-Omni inputs: RGB/fisheye video, embedded MP4 audio, and language
|
|
|
|
| 505 |
passes and `scripts/omni/package_verified_omni_result.py` creates a
|
| 506 |
public-safe derived-artifact package. The current verified package is listed in
|
| 507 |
[`docs/data/omni_finetune_verified_result.json`](docs/data/omni_finetune_verified_result.json).
|
| 508 |
+
The current cross-version comparison is generated at
|
| 509 |
+
[`docs/data/omni_model_comparison.json`](docs/data/omni_model_comparison.json)
|
| 510 |
+
and [`results/omni_finetune/OMNI_MODEL_COMPARISON.md`](results/omni_finetune/OMNI_MODEL_COMPARISON.md);
|
| 511 |
+
it separates the single-episode task suite, 128-episode aligned simple/NN
|
| 512 |
+
baselines, and verified Qwen3/Cosmos model-branch packages.
|
| 513 |
|
| 514 |
### Sample Count Decision
|
| 515 |
|
|
|
|
| 549 |
- gated_metadata_audit: 12,102 complete visible episodes across 802 complete sessions
|
| 550 |
- selected_episode_plan: 128 source-balanced episodes, 96/16/16 train/val/test
|
| 551 |
- selected_download_size: 277.71 GiB excluding `visualization.rrd`
|
| 552 |
+
- verified_final_diagnostic_package: true
|
| 553 |
- selected_split: 96 train / 16 validation / 16 held-out test episodes
|
| 554 |
- exported_windows: 2,848 train / 512 validation / 448 test
|
| 555 |
- validation_samples_used: 512
|
| 556 |
- held_out_eval: 448 test windows from 14 exported test episodes
|
| 557 |
+
- final_train_loss / final_val_loss: 0.0277 / 0.0278
|
| 558 |
+
- current_quality_target: JSON validity 99.78%, meeting the 98% target; action/subtask quality remains weak
|
| 559 |
+
- qwen3_lora_adapter_repo: https://huggingface.co/cy0307/ropedia-qwen3-omni-lora-128ep
|
| 560 |
+
- 128_aligned_baselines: 12 task ids, 8 simple metadata/text baselines, 6 neural metadata/text baselines
|
| 561 |
+
- cosmos3_branch: verified Cosmos3-Nano future-window compatibility package, 378 held-out future-window predictions from 14 test episodes
|
| 562 |
- gated dataset: available for selected multi-episode data preparation
|
| 563 |
- source_discovery: `results/omni_finetune/source_discovery.json`
|
| 564 |
- data_status: `results/omni_finetune/DATA_ACCESS_STATUS.md`
|
|
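The counts in the status list above can be cross-checked with a few lines of Python. This is a minimal sketch, not the repository's validator: the episode and window counts and the 98% target are copied from the list, while `json_validity` is a hypothetical helper that assumes "JSON validity" means the share of predictions that parse as a JSON object. The toy predictions are illustrative and are not model outputs.

```python
import json

# Counts copied from the status list above.
EPISODES = {"train": 96, "validation": 16, "test": 16}
WINDOWS = {"train": 2848, "validation": 512, "test": 448}
JSON_VALIDITY_TARGET = 0.98  # the 98% target quoted in the status list


def json_validity(predictions):
    """Fraction of prediction strings that parse as a JSON object.

    Assumed definition for illustration; the project's validator may be stricter.
    """
    if not predictions:
        return 0.0
    valid = 0
    for text in predictions:
        try:
            parsed = json.loads(text)
        except (json.JSONDecodeError, TypeError):
            continue  # unparseable or non-string output counts as invalid
        if isinstance(parsed, dict):
            valid += 1
    return valid / len(predictions)


total_episodes = sum(EPISODES.values())
total_windows = sum(WINDOWS.values())

# Toy predictions, not real model outputs: two objects, one non-JSON string, one array.
toy_predictions = ['{"action": "pick"}', '{"action": "place"}', "not json", "[1, 2]"]
toy_rate = json_validity(toy_predictions)
meets_target = toy_rate >= JSON_VALIDITY_TARGET

if __name__ == "__main__":
    print(f"episodes: {total_episodes}, windows: {total_windows}")
    print(f"toy JSON validity: {toy_rate:.2%}, meets target: {meets_target}")
```

The episode total matches the 128-episode plan, and the window total matches the 3,808 exported windows quoted in the Scale-up row.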
|
|
| 724 |
The public-safe verified package intentionally excludes raw data, base Qwen
|
| 725 |
weights, LoRA weights, and full checkpoints. Adapter upload is a separate step:
|
| 726 |
use it only when the intended adapter directory is present and the model card
|
| 727 |
+
clearly distinguishes older smoke weights from the final selected-episode
|
| 728 |
+
diagnostic run.
|
| 729 |
|
| 730 |
```bash
|
| 731 |
python3 scripts/omni/upload_qwen3_omni_lora_to_hf.py \
|
| 732 |
+
--repo-id cy0307/ropedia-qwen3-omni-lora-128ep \
|
| 733 |
--source-dir /path/to/adapter_upload_package \
|
| 734 |
--message "Upload Xperience-10M Qwen3-Omni LoRA pilot"
|
| 735 |
```
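Before running the upload command, it can help to confirm that the adapter directory is actually present and populated. The sketch below is an assumption-laden pre-check, not part of `upload_qwen3_omni_lora_to_hf.py`: `missing_adapter_files` is a hypothetical helper, and the file names follow the usual PEFT adapter layout (`adapter_config.json` plus `adapter_model.safetensors` or `adapter_model.bin`), which the real upload package may or may not use.

```python
import tempfile
from pathlib import Path

# Assumed PEFT-style adapter layout; the real upload package may differ.
CONFIG_NAME = "adapter_config.json"
WEIGHT_NAMES = ("adapter_model.safetensors", "adapter_model.bin")


def missing_adapter_files(source_dir):
    """Return a list of problems; an empty list means the directory looks uploadable."""
    root = Path(source_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    if not (root / CONFIG_NAME).is_file():
        problems.append(f"missing {CONFIG_NAME}")
    if not any((root / name).is_file() for name in WEIGHT_NAMES):
        problems.append("missing adapter weights (" + " or ".join(WEIGHT_NAMES) + ")")
    return problems


if __name__ == "__main__":
    # Demonstration on an empty temporary directory: both checks should fail.
    with tempfile.TemporaryDirectory() as tmp:
        for problem in missing_adapter_files(tmp):
            print(problem)
```

A check like this only confirms that files exist. It does not tell older smoke weights apart from the final selected-episode run, so the model card still has to make that distinction.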
|