# Project Brief

This project turns the public Ropedia Xperience-10M sample into a concrete
research task lab for embodied AI. It is designed to answer a practical
question: what can be built, measured, and extended from a richly synchronized
egocentric episode before scaling to held-out multi-episode training?

## Research Intent

The public sample is treated as a small but real research system. The project
does not try to inflate one episode into a final benchmark. Instead, it shows
the full path from data inspection to task design, baseline modeling,
evaluation, artifact packaging, and a guarded scale-up plan. A reader should be
able to trace one model input, understand each task, reproduce the public-sample
results, and see what remains before multi-episode model-quality claims.

## Capability Map

| Capability | Evidence in this project |
| --- | --- |
| Data understanding | `feature_manifest.json`, `available_modalities.json`, modality atlas, episode-window HF viewer |
| Task design | 20 unified task contracts, task cards, case-study walkthroughs, and four research-direction extension probes |
| Evaluation rigor | Chronological split, per-task metrics, predictions, confusion matrices, leakage notes, and generated takeaways |
| Scale-up planning | Final verified 96/16/16 Qwen3-Omni diagnostic result, same-split 128-episode baseline alignment, Cosmos3-Nano compatibility branch, and policy-model candidates after action-space conversion |
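
The chronological split listed under evaluation rigor can be sketched as follows. This is a minimal illustration, not the project's split code: the function name `chronological_split`, the 70/15/15 fractions, and the `gap` parameter are all assumptions. The gap drops a few windows at each boundary so that overlapping windows on either side of a boundary do not share frames, which is a common source of leakage in windowed time series.

```python
from typing import List, Tuple


def chronological_split(
    n_windows: int,
    train_frac: float = 0.7,   # assumed fraction, not taken from the project
    val_frac: float = 0.15,    # assumed fraction, not taken from the project
    gap: int = 0,              # hypothetical: windows dropped at each boundary
) -> Tuple[List[int], List[int], List[int]]:
    """Split window indices 0..n_windows-1 in time order, with no shuffling."""
    if n_windows <= 0 or gap < 0:
        raise ValueError("n_windows must be positive and gap non-negative")
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")

    train_end = int(n_windows * train_frac)
    val_end = int(n_windows * (train_frac + val_frac))

    train = list(range(0, train_end))
    val = list(range(train_end + gap, val_end))      # skip `gap` windows
    test = list(range(val_end + gap, n_windows))     # skip `gap` windows
    return train, val, test


train_idx, val_idx, test_idx = chronological_split(1161, gap=3)
print(len(train_idx), len(val_idx), len(test_idx))
```

With 20-frame windows and an assumed stride of 5 frames, windows whose indices differ by fewer than 4 share frames, so `gap=3` is the smallest value that fully separates neighbouring splits under that assumption.
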

## What Exists Now

| Layer | Current artifact |
| --- | --- |
| Data unit | 1 public sample episode, 5,821 frames, 1,161 synchronized 20-frame windows |
| Modalities | Video-derived features, audio, depth, pose/SLAM, mocap, IMU, calibration, and language-derived features |
| Task suite | 20 embodied-AI task contracts with inputs, targets, metrics, predictions, and setup alignment |
| Models | Minimal linear/ridge/logistic baselines plus compact PyTorch MLP heads for the unified 20-task public-sample suite |
| Research map | Four Ropedia research directions with direct, proxy, diagnostic, and extension-task coverage |
| Scale-up path | A selected 96/16/16 Qwen3-Omni LoRA final diagnostic result is verified; strict-JSON validity meets target, while weak action/subtask metrics guide the next error-analysis pass |
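
The frame and window counts in the data-unit row are consistent with a plain sliding window. The sketch below assumes no padding and a fixed stride. The source does not state the stride, so the value 5 is an inference from the counts rather than a documented setting, and `count_windows` and `window_bounds` are hypothetical helper names.

```python
from typing import Tuple


def count_windows(n_frames: int, window: int, stride: int) -> int:
    """Number of full windows in a sliding window with no padding."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if n_frames < window:
        return 0
    return (n_frames - window) // stride + 1


def window_bounds(index: int, window: int, stride: int) -> Tuple[int, int]:
    """Frame range [start, end) covered by the window at `index`."""
    start = index * stride
    return start, start + window


# 5,821 frames and 20-frame windows come from the table; stride 5 is assumed.
print(count_windows(5821, window=20, stride=5))

# Which integer strides reproduce the reported 1,161 windows?
print([s for s in range(1, 21) if count_windows(5821, 20, s) == 1161])
```

Under these assumptions, a stride of 5 is the only integer stride between 1 and 20 that yields 1,161 windows, which suggests consecutive windows overlap by 15 frames. A different windowing scheme (for example, padding or timestamp-based alignment) could produce the same count, so this should be checked against the project's own code.
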

## How To Read It

1. Start with `PUBLIC_READER_MAP.md` if you need to choose between GitHub,
   the website, Hugging Face artifacts, baseline weights, model branches, or
   release-health files.
2. Read the website or this brief to understand the project shape.
3. Open `RESEARCH_ROADMAP.md` to see how the work scales from the public
   sample to multi-episode modeling.
4. Open `EVALUATION_PROTOCOL.md` before comparing task scores.
5. Use `RESEARCH_TAKEAWAYS.md` for the current metric interpretation.
6. Inspect `results/episode_task_suite/feature_manifest.json` to understand one model input.
7. Use `TASK_SUITE_20.md` and `docs/data/task_suite_20.json` to read the unified 20-task suite; the historical `docs/data/tier2_task_suite.json` path stores the result bundle for tasks 13-20.
8. Use `docs/data/omni_finetune_verified_result.json` for the current multi-episode Qwen3-Omni pilot result.
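
For the JSON files named in the list above, a schema-agnostic summary is a reasonable first look. The sketch below makes no assumption about the manifest's fields: it only prints keys and value types down to a fixed depth. The helper name `summarize` is hypothetical, and the path is the one given in step 6, resolved relative to the repository root.

```python
import json
from pathlib import Path
from typing import Any, List


def summarize(obj: Any, depth: int = 0, max_depth: int = 2) -> List[str]:
    """Describe keys and value types without assuming any schema."""
    pad = "  " * depth
    lines: List[str] = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            if depth + 1 < max_depth:
                lines.extend(summarize(value, depth + 1, max_depth))
    elif isinstance(obj, list):
        lines.append(f"{pad}[{len(obj)} items]")
        if obj and depth + 1 < max_depth:
            # Only the first element is inspected, assuming a homogeneous list.
            lines.extend(summarize(obj[0], depth + 1, max_depth))
    return lines


manifest_path = Path("results/episode_task_suite/feature_manifest.json")
if manifest_path.exists():
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    print("\n".join(summarize(manifest)))
else:
    print(f"{manifest_path} not found; run this from the repository root")
```

The same helper can be pointed at `docs/data/task_suite_20.json` or `docs/data/omni_finetune_verified_result.json` by changing the path.
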

## What This Enables

The public sample is enough to build and verify task definitions, feature
contracts, metrics, visualization, and baseline code. It is not enough to
measure final model quality for a general embodied-AI model. The first
multi-episode Qwen3-Omni diagnostic pilot now verifies the held-out training
loop with validation loss recorded; the next research stage is to improve
JSON-format reliability and error analysis before larger robustness or
alternative backbone claims.
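
JSON-format reliability can be tracked with a simple validity rate over model outputs. The source does not define how its strict-JSON metric is computed, so the sketch below is one plausible reading: an output counts as valid only if the entire string parses as a JSON object, contains no non-standard constants such as `NaN`, and includes every required key. The function names and the `action` and `subtask` keys are hypothetical, chosen to echo the metric names in this brief.

```python
import json
from typing import Iterable, Sequence


def _reject_constant(name: str) -> None:
    # json.loads accepts NaN and Infinity by default; a strict check should not.
    raise ValueError(f"non-standard JSON constant: {name}")


def is_strict_json(text: str, required_keys: Sequence[str] = ()) -> bool:
    """True if `text` is exactly one JSON object containing all required keys."""
    try:
        parsed = json.loads(text, parse_constant=_reject_constant)
    except (ValueError, TypeError):  # JSONDecodeError is a ValueError subclass
        return False
    return isinstance(parsed, dict) and all(k in parsed for k in required_keys)


def validity_rate(outputs: Iterable[str], required_keys: Sequence[str] = ()) -> float:
    """Fraction of outputs that pass the strict check (0.0 for no outputs)."""
    results = [is_strict_json(o, required_keys) for o in outputs]
    return sum(results) / len(results) if results else 0.0


# Hypothetical model outputs, for illustration only.
samples = [
    '{"action": "pick", "subtask": "reach"}',   # valid
    'Sure! {"action": "pick", "subtask": "x"}', # extra prose around the object
    '{"action": NaN, "subtask": "x"}',          # non-standard constant
    '{"action": "place"}',                      # missing required key
]
print(validity_rate(samples, required_keys=("action", "subtask")))
```

Logging which check failed for each rejected output (parse error, wrong top-level type, or missing key) would feed directly into the error-analysis pass mentioned above.
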

## Best Entry Points

| Entry point | Link |
| --- | --- |
| Public reader map | https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite/blob/main/PUBLIC_READER_MAP.md |
| Visual dashboard | https://chaoyue0307.github.io/ropedia-xperience-10m-task-suite/ |
| Interactive HF Space | https://huggingface.co/spaces/cy0307/ropedia-xperience-10m-task-suite |
| Derived artifacts | https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts |
| Baseline model bundle | https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines |
| Official Xperience-10M dataset | https://huggingface.co/datasets/ropedia-ai/xperience-10m |