A multilingual public research surface for Xperience-10M: sample data, 20 embodied-AI tasks, baselines, Qwen3/Cosmos diagnostics, and foundation-model training directions.

English · 中文 · Español · Français · Deutsch · 日本語 · 한국어 · Português

**Ropedia Xperience-10M Task Suite** turns the public Xperience-10M sample into a readable embodied-AI benchmark surface. It keeps the evidence trail explicit: what is derived from the one public sample episode, what is evaluated on selected 128-episode held-out splits, what is mirrored to Hugging Face, and what still requires gated raw data or new model-specific evaluators.

**Updated:** 2026-06-18.

**Scope:** one public sample episode for the fully reproducible task suite; selected 128-episode public-safe artifacts for Qwen3-Omni, Cosmos3, metadata baselines, and raw-feature baselines. Raw Xperience-10M MP4/HDF5/RRD files, full Qwen weights, and gated data are not redistributed here.

## Contents

- [How To Read This Project](#how-to-read-this-project)
- [At A Glance](#at-a-glance)
- [Fast Reader Map](#fast-reader-map)
- [Why This Project Exists](#why-this-project-exists)
- [Start Here](#start-here)
- [Current Research Scope](#current-research-scope)
- [Evaluation Protocol](#evaluation-protocol)
- [Dataset Context](#dataset-context)
- [Reproducibility](#reproducibility)
- [Citation](#citation)

## How To Read This Project

Use the first two tables to orient yourself, then jump to the evidence artifact that matches your question. The dashboard is the best visual overview; the GitHub repo is the source of truth for scripts and generated JSON; Hugging Face mirrors contain public-safe cards, metrics, figures, and model artifacts.

The multilingual README files are reader guides. The canonical technical evidence is still the committed task contracts, result matrices, validation JSON, and public-safe result packages.

## At A Glance

| Signal | Current public state |
|---|---|
| 20 task contracts | Action, procedure, transition, trajectory, contact, objects, language, retrieval, reconstruction, order, sync, long-horizon forecasting, interaction text, action-object binding, sensor bridging, camera sync, and transition timing. |
| 180 method-task records | 9 methods x 20 tasks. The current public matrix is complete at 180/180 scored records, with proxy flags kept visible where a compact substitute target is used. |
| Public-sample baselines | Minimal and Neural MLP baselines cover all 20 tasks on the one public sample episode. |
| 128-episode comparison layer | Metadata/simple, metadata/NN, raw-feature simple, raw-feature NN, Qwen3-Omni, Cosmos3-Super, and Cosmos3-Nano branches are separated by evidence type. |
| Foundation directions | Spatial intelligence, human-video world modeling, and vision-language-action pipelines are documented as trainable directions with task mappings and model-evidence requirements. |
| Public mirrors | GitHub, GitHub Pages, HF Space, HF artifact dataset, HF baseline model repo, Qwen3/Cosmos model repos, and HF collection. |
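
The "9 methods x 20 tasks" accounting in the table above can be checked mechanically. The following is a minimal sketch, not the project's validator: the method identifiers, the record keys (`method`, `task`, `score`, `proxy`), and the demo records are all hypothetical, chosen only to mirror the nine branches and the proxy flags described in this README.

```python
from itertools import product

# Hypothetical method labels: the README lists nine branches, but these exact
# identifiers are assumptions, not names taken from the result files.
METHODS = [
    "minimal", "neural_mlp",
    "metadata_simple", "metadata_nn",
    "raw_feature_simple", "raw_feature_nn",
    "qwen3_omni", "cosmos3_super", "cosmos3_nano",
]
TASKS = list(range(1, 21))  # task ids 1..20


def coverage(records):
    """Summarize how many (method, task) cells carry a score.

    `records` is a list of dicts with the hypothetical keys
    "method", "task", "score", and "proxy".
    """
    expected = set(product(METHODS, TASKS))
    scored = {(r["method"], r["task"]) for r in records if r.get("score") is not None}
    proxies = sorted((r["method"], r["task"]) for r in records if r.get("proxy"))
    return {
        "expected": len(expected),
        "scored": len(scored & expected),
        "missing": sorted(expected - scored),
        "proxy_cells": proxies,
    }


# Synthetic records for illustration only; the scores are placeholders and the
# proxy pattern (raw-feature branches, tasks 15 and 19) is an assumption.
demo = [
    {
        "method": m,
        "task": t,
        "score": 0.0,
        "proxy": t in (15, 19) and m.startswith("raw_feature"),
    }
    for m, t in product(METHODS, TASKS)
]
report = coverage(demo)
print(report["scored"], "/", report["expected"])
```

Keeping `proxy_cells` in the report, rather than filtering those records out, matches the stated intent of leaving proxy flags visible next to a complete matrix.
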

## Fast Reader Map

| Reader goal | Start here | Then inspect |
|---|---|---|
| Understand quickly | Project brief Project status | Dashboard |
| Choose the public surface | Public reader map | public_reader_map.json |
| Inspect the 20 tasks | TASK_SUITE_20.md | task_suite_20.json task walkthroughs |
| Compare results | Research takeaways | 20-result matrix radar JSON gap audit |
| Understand one sample | Single-episode explorer | raw sample file map feature manifest |
| Read foundation directions | Three foundation pipelines | three_foundation_pipelines.json foundation model plan |
| Reproduce or audit | Reproducibility Evidence contract | quality gates publication audit mirror parity |

## Why This Project Exists

| Capability | What this project shows |
|---|---|
| Multimodal data understanding | Parses the public sample into synchronized windows across video, audio, depth, pose/SLAM, mocap, IMU, calibration, and language-derived signals. |
| Task design | Defines 20 human-readable tasks in one unified public-sample suite, plus four direction-extension probes with inputs, outputs, process modules, metrics, and case-study walkthroughs. |
| Model and evaluation discipline | Runs minimal and compact neural baselines, records predictions/metrics, keeps chronological split boundaries explicit, and separates sample evidence from held-out claims. |
| Scale-up planning | Connects the public-sample pipeline to 32/128-episode held-out pilots, Qwen3-Omni LoRA, Cosmos-style world-model branches, policy-model branches, and the future Xperience-native foundation-model pretraining goal. |

## Start Here

| Reader goal | Best entry point |
|---|---|
| Choose the right public surface | PUBLIC_READER_MAP.md public_reader_map.json |
| Understand the whole project quickly | PROJECT_BRIEF.md |
| See the visual research dashboard | GitHub Pages dashboard |
| Navigate the unified 20 tasks, four tracks, and scale-up plan | Interactive research roadmap TASK_SUITE_20.md task_suite_20.json research_roadmap_interactive.json |
| Compare current task metrics | RESEARCH_TAKEAWAYS.md summary_metrics.json |
| Compare possible foundation backbones | FOUNDATION_MODEL_PLAN.md foundation_model_plan.json |
| Understand the future native pretraining goal | XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md |
| See additional concrete project directions | ADDITIONAL_DEVELOPMENT_DIRECTIONS.md additional_development_directions.json |
| Understand one model input | feature_manifest.json windows.csv |
| Check multi-episode data status | DATA_ACCESS_STATUS.md |

| Surface | What it is for |
|---|---|
| GitHub repo | Source of truth for docs, scripts, generated JSON, validators, and commit history. |
| GitHub Pages dashboard | Best visual overview of the sample, 20 tasks, radar results, foundation directions, and resources. |
| Hugging Face Space | Hub-hosted copy of the dashboard and static app assets. |
| HF artifact dataset | Public-safe metrics, reports, website JSON, result packages, and derived evidence files. |
| HF baseline model repo | Minimal/neural baseline weights, figures, metrics, and mirrored task artifacts. |
| Qwen3/Cosmos model repos | Adapter-specific public weights or package cards when a model branch is verified and publishable. |

## Current Research Scope

| Theme | Current implementation |
|---|---|
| Dataset slice | One public Xperience-10M sample episode, 5,821 frames, 1,161 windows, and an 8,546-dimensional representation. |
| Modalities | Video, audio, depth, camera pose/SLAM, hand/body mocap, IMU, calibration, and language annotations. |
| Task suite | 20 human-readable tasks form one embodied-AI public-sample suite; tasks 1-12 are the original contracts and tasks 13-20 reuse the same windows, split discipline, and minimal/neural head pattern. |
| Baselines | Minimal linear/ridge/logistic heads plus compact PyTorch MLP task heads over the same chronological split; companion simple/NN metadata baselines are also aligned to the selected 128-episode 96/16/16 split. |
| Research directions | Task mapping and extension probes for human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling. |
| Scale-up path | |
| Public surfaces | GitHub repo, GitHub Pages dashboard, GHCR static-site package, HF Space, HF artifact dataset, HF baseline-model repo, and HF collection. |
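
The relationship between frames, windows, and the chronological split can be sketched in a few lines. This is an illustration under stated assumptions, not the project's exporter: the 20-frame window length, the 5-frame stride, and the 70/15/15 split fractions are hypothetical. The window geometry was picked only because it is one combination that turns 5,821 frames into 1,161 windows; the actual parameters are defined by the evaluation protocol.

```python
def window_starts(num_frames, window_len, stride):
    """Start indices of all full sliding windows over a frame sequence."""
    if num_frames < window_len:
        return []
    return list(range(0, num_frames - window_len + 1, stride))


def chronological_split(items, train_frac=0.7, val_frac=0.15):
    """Split an ordered sequence into contiguous train/val/test blocks.

    No shuffling is applied, so every training item precedes every
    validation item, which in turn precedes every test item.
    """
    n = len(items)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


# Assumed values: window_len=20 and stride=5 are NOT taken from the project;
# they are one combination consistent with 5,821 frames -> 1,161 windows.
starts = window_starts(5821, window_len=20, stride=5)
train, val, test = chronological_split(starts)
print(len(starts), len(train), len(val), len(test))
```

Splitting by position rather than at random is what keeps the comparison between minimal and neural heads on "the same chronological split" meaningful: both heads see identical past windows and are scored on identical later windows.
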

| Layer | Current scope | Where to start |
|---|---|---|
| Data understanding | One public Xperience-10M sample episode is converted into 5,821 frames, 1,161 aligned windows, and an 8,546-dimensional multimodal representation. | PROJECT_BRIEF.md PROJECT_STATUS.md |
| Task suite | Twenty human-readable tasks cover recognition, prediction, retrieval, reconstruction, synchronization, long-horizon forecasting, interaction text, action-object binding, sensor bridging, camera sync, and transition timing. Tasks 13-20 keep the historical `tier2_task_suite` artifact path for link stability, but they are part of the same suite. | TASK_SUITE_20.md task_suite_20.json RESEARCH_TAKEAWAYS.md summary_report.json TIER2_TASK_BASELINES.md |
| Baselines | Minimal heads and compact PyTorch MLP heads provide a controlled single-episode comparison on the same chronological split. The selected 128-episode setup adds same-split metadata simple/NN baselines for JSON-supported tasks and raw-feature simple/NN baselines on all 20 task axes. Tasks 15 and 19 are explicitly marked as compact-proxy completions. | neural_mlp/ BASELINE_ALIGNMENT_REPORT.md raw20 run summary |
| Diagnostics | Audio contribution, modality ablations, timeline overlays, object labels, and alignment stress tests show which signals are useful and which tasks remain hard. | AUDIO_ABLATION_SUMMARY.md single_episode_explorer.html |
| Scale-up | | RESEARCH_ROADMAP.md FOUNDATION_MODEL_PLAN.md XPERIENCE10M_128_EPISODE_FEATURE_INDEX.md xperience10m_128_episode_feature_index.json TASK_SUITE_ENHANCEMENT_128.md task_suite_enhancement_128.json omni_model_comparison.json omni_finetune_verified_result.json qwen3_v5_v6_comparison.json QWEN3_V5_V6_COMPARISON_20260614.md OMNI_MODEL_COMPARISON.md verified_public/ task_suite_enhancement_128_v1_20260608/ |

| Area | Current decision |
|---|---|
| Public-sample pipeline | Verified on one public sample episode: 5,821 frames, 1,161 windows, 8,546 dimensions. |
| 20-task suite | Verified minimal baselines with committed metrics, predictions, and manifests. |
| Neural heads | Verified compact PyTorch MLP heads over the same task contracts and chronological splits. |
| Dataset context | Official Xperience-10M links, sample-vs-gated-data boundary, modality coverage, and redistribution policy are documented. |
| Evaluation protocol | Verified generated protocol for windowing, split policy, leakage controls, and per-task metrics. |
| Website and Hub pages | Public dashboard, Hugging Face Space, artifact dataset, baseline model repo, and collection use the same project framing and links. |
| Qwen3-Omni multi-episode pilot | Final verified diagnostic result package exists for the selected 96/16/16 episode split; JSON validity meets the target, while action/subtask metrics remain weak. |
| Raw data / full Qwen weights | Raw Xperience-10M data and full Qwen weights are not redistributed. |
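
The 96/16/16 episode split mentioned above is an episode-level partition: every window from a given episode lands on exactly one side. The sketch below shows one deterministic way to build such a partition. It is an assumption for illustration, since this README does not describe how the 128 episodes were selected or ordered, and the episode ids and the hash-based ordering are hypothetical.

```python
import hashlib


def split_episodes(episode_ids, sizes=(96, 16, 16)):
    """Assign whole episodes to train/val/test using a stable hash order."""
    if len(set(episode_ids)) != len(episode_ids):
        raise ValueError("episode ids must be unique")
    if sum(sizes) != len(episode_ids):
        raise ValueError("split sizes must cover every episode exactly once")
    # Sorting by a hash of the id gives a fixed pseudo-random order that does
    # not depend on the input order or on a random seed.
    ordered = sorted(episode_ids, key=lambda e: hashlib.sha256(e.encode()).hexdigest())
    n_train, n_val, _ = sizes
    return {
        "train": ordered[:n_train],
        "val": ordered[n_train:n_train + n_val],
        "test": ordered[n_train + n_val:],
    }


# Hypothetical ids; the real selected episodes are not listed in this README.
episodes = [f"episode_{i:03d}" for i in range(128)]
splits = split_episodes(episodes)
print({name: len(ids) for name, ids in splits.items()})
```

Because membership depends only on the id, re-running the split on a reordered episode list yields the same partition, which makes held-out claims easier to audit.
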

| Step | Question | Primary artifacts | What should be true |
|---|---|---|---|
| 1 | What is this project? | PROJECT_BRIEF.md PROJECT_STATUS.md Dashboard | A public-sample Xperience-10M research project with 20 tasks, baselines, and a scale-up plan. |
| 2 | What data is used? | Dataset-card alignment Official HF dataset Sample HF dataset | The implemented suite uses one public sample episode; the gated dataset is reserved for selected multi-episode training. |
| 3 | What does one model input contain? | windows.csv feature_manifest.json available_modalities.json | Each window is an aligned multimodal unit with video, audio, depth, pose/SLAM, mocap, IMU, calibration, and language-derived signals. |
| 4 | What are the 20 tasks? | TASK_SUITE_20.md task_suite_20.json task walkthroughs task_walkthroughs.json | Every task has a human-readable name, input, output, metric, baseline scores, and an explicit artifact path. |
| 5 | How are tasks evaluated? | EVALUATION_PROTOCOL.md evaluation_protocol.json | The window unit, chronological split, leakage controls, task metrics, and current limitations are explicit. |
| 6 | What do current results mean? | RESEARCH_TAKEAWAYS.md research_takeaways.json summary_metrics.json | Current metrics describe sample-level task behavior and identify which signals need larger held-out experiments. |
| 7 | Which models are implemented? | summary_report.json neural_mlp/ HF baseline repo | Each task has minimal and neural-head evidence over the same feature windows. |
| 8 | What research directions does this support? | RESEARCH_ROADMAP.md research_directions.json research_direction_extensions.json task_suite_20.json | The unified tasks are mapped to human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling. |
| 9 | Which foundation model comes next? | FOUNDATION_MODEL_PLAN.md foundation_model_plan.json Native pretraining plan | Qwen3-Omni is the first held-out LoRA baseline; Cosmos 3 has Nano compatibility and Super forward-dynamics LoRA; policy models wait for robot-compatible action targets. |
| 10 | How can the 128-episode suite be pushed without more data? | TASK_SUITE_ENHANCEMENT_128.md task_suite_enhancement_128.json | The enhancement pack proposes dense windows, hierarchical action/subtask labels, raw-feature shard priorities, and multiscale_20s10_40s20_80s40 as the next export target. |
| 11 | How do I reproduce it? | REPRODUCIBILITY.md reproducibility_audit.md | Public commands and expected outputs are documented for the sample-episode task suite. |
| 12 | What is still pending? | omni_finetune_verified_result.json DATA_ACCESS_STATUS.md MULTI_EPISODE_ACCESS_STATUS.md | The final held-out diagnostic Qwen pass is verified and the JSON-validity target is met; strong action/subtask model quality remains pending. |
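
A JSON-validity figure such as the one referenced for the Qwen pass can be read as the fraction of generations that parse as a JSON object and contain the expected fields. The sketch below illustrates that idea only. The required keys `action` and `subtask`, the sample generations, and the exact validity rule are assumptions; the project's own definition lives in its evaluation artifacts.

```python
import json


def json_validity_rate(outputs, required_keys=()):
    """Fraction of model outputs that parse as a JSON object with the required keys."""
    if not outputs:
        return 0.0
    valid = 0
    for text in outputs:
        try:
            parsed = json.loads(text)
        except (json.JSONDecodeError, TypeError):
            continue  # unparseable or non-string output counts as invalid
        if isinstance(parsed, dict) and all(k in parsed for k in required_keys):
            valid += 1
    return valid / len(outputs)


# Synthetic generations; "action" and "subtask" are hypothetical field names.
samples = [
    '{"action": "pour", "subtask": "fill cup"}',    # valid, both keys present
    '{"action": "stir"}',                           # parses, but misses a key
    'action: pour',                                 # not JSON
    '{"action": "cut", "subtask": "slice bread"',   # truncated JSON
]
rate = json_validity_rate(samples, required_keys=("action", "subtask"))
print(rate)  # 0.25
```

Validity and task quality are separate axes: an output can be perfectly well-formed and still name the wrong action, which is consistent with the validity target being met while action/subtask metrics remain weak.
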

| Layer | What to inspect | Why it matters |
|---|---|---|
| Project status | PROJECT_STATUS.md project_status.json | Gives a one-table current project summary before reading the full artifact trail. |
| Data contract | windows.csv feature_manifest.json modality manifests | Confirms what each sample window contains before modeling. |
| Dataset context | XPERIENCE10M_DATASET_CARD_ALIGNMENT.md official dataset links | Explains the official dataset, public sample, modalities, access boundary, and what this repo uses. |
| Visual assets | FIGURE_INDEX.md docs/assets/ | Shows the task-suite graphic, modality thumbnails, pipeline diagrams, charts, and logo assets. |
| Evaluation protocol | EVALUATION_PROTOCOL.md evaluation_protocol.json | Defines the task unit, split, metrics, leakage controls, and current limitations. |
| Research roadmap | RESEARCH_ROADMAP.md research_roadmap.json | Shows the path from sample-level task development to multi-episode work, larger model branches, and the future native-pretraining goal. |
| Additional development directions | ADDITIONAL_DEVELOPMENT_DIRECTIONS.md additional_development_directions.json | Records concrete non-backbone tracks: taxonomy, benchmark protocol, representation learning, skill graphs, affordances, 3D/4D memory, QA, and policy transfer. |
| Xperience Embodied Foundation Model plan | XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md | Describes the long-term full-corpus pretraining goal, target modules, objectives, staged scale-up, hardware ranges, and evaluation protocol. |
| Minimal heads | softmax ridge projection/regression multi-label logistic heads | Keeps every input/output contract visible and inspectable. |
| Neural heads | PyTorch MLP classifiers/regressors under neural_mlp/ | Checks whether nonlinear heads improve each task without changing features. |
| Evidence | metrics predictions confusion matrices diagrams dashboard | Makes the single-episode task development inspectable without rerunning first. |
| Artifact guide | ARTIFACT_GUIDE.md | Groups the public evidence into research-project layers after the first-pass overview. |
| Reproducibility contract | REPRODUCIBILITY.md reproducibility_matrix.json | States public commands, expected outputs, exact-match reproduction evidence, and non-reproducible boundaries. |
| Citation metadata | CITATION.cff codemeta.json LICENSE | Makes the repo easier to cite, index, and reuse without confusing code license and dataset terms. |
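
One leakage risk with overlapping windows is that the last training windows share frames with the first evaluation windows. A common control, sketched below, is to drop training windows that reach into the evaluation range. This is an illustration of the general technique, not a description of the controls in EVALUATION_PROTOCOL.md; the window geometry and the 30/10 split are assumed values.

```python
def purge_boundary(train_windows, eval_windows):
    """Drop training windows whose frame range overlaps any evaluation window.

    Each window is a (start_frame, end_frame) pair with an exclusive end.
    Assumes a chronological split, so every evaluation window starts at or
    after every training window.
    """
    if not eval_windows:
        return list(train_windows)
    first_eval_start = min(start for start, _ in eval_windows)
    return [(s, e) for s, e in train_windows if e <= first_eval_start]


# Assumed geometry: 20-frame windows with a 5-frame stride (illustrative only).
windows = [(s, s + 20) for s in range(0, 200, 5)]
train, held_out = windows[:30], windows[30:]
clean_train = purge_boundary(train, held_out)
print(len(train), len(clean_train))  # 30 27
```

With a 20-frame window and a 5-frame stride, three training windows straddle the boundary, so the purge costs a small amount of training data in exchange for an evaluation set that shares no frames with training.
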