---
license: mit
library_name: pytorch
tags:
- embodied-ai
- robotics
- multimodal
- xperience-10m
- baseline
- evaluation
- qwen3-omni
- cosmos
datasets:
- ropedia-ai/xperience-10m-sample
- ropedia-ai/xperience-10m
metrics:
- accuracy
- f1
- precision
- recall
---

A multilingual public research surface for Xperience-10M: sample data, 20 embodied-AI tasks, baselines, Qwen3-Omni and Cosmos3 diagnostics, and foundation-model training directions.

English · 中文 · Español · Français · Deutsch · 日本語 · 한국어 · Português

**Ropedia Xperience-10M Task Suite** has two public evidence lines. **Line 1** is the 1-sample task lab for raw-file inspection, task construction, and reproducibility. **Line 2** is the selected-128 comparison surface for aligned metadata/raw baselines, Qwen3-Omni v6 LoRA, Cosmos3-Super Reasoner, and Cosmos3-Nano Future Window. Every score points to a source artifact and keeps direct-vs-proxy status visible.

**Updated:** 2026-06-21.

**Scope:** Line 1 uses one public sample episode. Line 2 uses selected 128-episode public-safe artifacts linked back to official gated episode paths. Raw Xperience-10M MP4/HDF5/RRD files, Qwen3 base weights, Cosmos3 base weights, and gated data are not redistributed here.

## Contents

- [How To Read This Project](#how-to-read-this-project)
- [At A Glance](#at-a-glance)
- [Two Evidence Lines](#two-evidence-lines)
- [Fast Reader Map](#fast-reader-map)
- [Why This Project Exists](#why-this-project-exists)
- [Start Here](#start-here)
- [Glossary](#glossary)
- [Current Research Scope](#current-research-scope)
- [Evaluation Protocol](#evaluation-protocol)
- [Dataset Context](#dataset-context)
- [Reproducibility](#reproducibility)
- [Citation](#citation)

## How To Read This Project

Use the two evidence lines first, then choose the artifact that answers your question. The dashboard is the best visual overview; the GitHub repo is the source of truth for scripts and generated JSON; Hugging Face mirrors contain public-safe cards, metrics, figures, and model artifacts.

Quick rule: use **Line 1** for “can I inspect and reproduce the task?” Use **Line 2** for “how do aligned baselines and model diagnostics compare on the selected 128 episodes?”

The multilingual README files are reader guides. The canonical technical evidence is still the committed task contracts, result matrices, validation JSON, and public-safe result packages.

## At A Glance

| Signal | Current public state |
|---|---|
| Project identity | The same logo mark is used across the GitHub README, GitHub Pages dashboard, Hugging Face Space, artifact dataset, model mirrors, favicon, and social preview. Reusable assets: logo mark and social card. |
| Two-line contract | Line 1: 1 sample episode for task construction and reproducibility. Line 2: 128 selected episodes for same-split metadata/raw baselines, Qwen3-Omni v6, and Cosmos3 diagnostics. |
| 180 method-task records | 9 methods × 20 tasks = 180/180 scored records. The ledger separates 174 direct scores from 6 compact-proxy scores. |
| 20 task contracts | Task areas include action, procedure, transition, trajectory, contact, objects, language, retrieval, reconstruction, order, sync, long-horizon forecasting, interaction text, action-object binding, sensor bridging, camera sync, and transition timing. |
| Line 1 methods | Minimal and Neural MLP baselines cover all 20 tasks on the one public sample episode: 40/40 direct scores. |
| Line 2 methods | Metadata simple/NN, raw-feature simple/NN, Qwen3-Omni v6 LoRA, Cosmos3-Super Reasoner, and Cosmos3-Nano Future Window cover all 20 selected-128 task axes: 140/140 scores. |
| Foundation directions | Spatial intelligence, human-video world modeling, and vision-language-action pipelines are documented as trainable directions with task mappings and model-evidence requirements. |
| Public mirrors | GitHub, GitHub Pages, HF Space, HF artifact dataset, HF baseline model repo, Qwen3-Omni and Cosmos3 model repos, and HF collection. |

## Two Evidence Lines

| Line | Data unit | Score statement | Best use | Read separately from |
|---|---|---|---|---|
| 1 sample episode | One public Xperience-10M sample episode: 5,821 frames, 1,161 aligned 20-frame windows, 8,546 feature dimensions. | 40/40 direct scores from Minimal and Neural MLP heads. | Inspect the raw sample, understand file organization, reproduce the 20 task targets, and compare Minimal vs Neural MLP behavior inside one episode. | The selected-128 comparison rows and any broader held-out model behavior. |
| 128 selected episodes | Selected held-out 96/16/16 split: 34,269 exported windows with public-safe processed features linked to official gated episode paths. The Hugging Face artifact dataset exposes these rows separately as selected_128_windows/selected_128; it is not mixed with the one-sample episode_sample/public_sample viewer. | 140/140 selected-128 scores: 134 direct + 6 compact-proxy. | Compare same-split metadata/raw baselines, Qwen3-Omni v6, Cosmos3-Super, and Cosmos3-Nano while keeping the 6 compact-proxy cells visible. | Direct raw-target measurements for the proxy-marked cells. |
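
The Line 1 window count can be cross-checked with sliding-window arithmetic. This minimal sketch makes three assumptions: windows are fixed at 20 frames, they start at frame 0 and advance by a constant stride, and trailing frames that cannot fill a window are dropped. The 5-frame stride is an assumption, not a documented parameter. Under these assumptions it is the only integer stride that turns 5,821 frames into 1,161 windows. The helper names are hypothetical.

```python
def num_windows(num_frames: int, window: int, stride: int) -> int:
    """Count full-length sliding windows; a partial window at the end is dropped."""
    if num_frames < window:
        return 0
    return (num_frames - window) // stride + 1


def window_bounds(num_frames: int, window: int, stride: int) -> list[tuple[int, int]]:
    """Return the [start, end) frame range of every full-length window."""
    return [(start, start + window) for start in range(0, num_frames - window + 1, stride)]


# Published Line 1 sample: 5,821 frames and 20-frame windows.
# STRIDE is an assumption chosen to match the published window count.
FRAMES, WINDOW, STRIDE = 5821, 20, 5
print(num_windows(FRAMES, WINDOW, STRIDE))  # 1161
print(window_bounds(FRAMES, WINDOW, STRIDE)[-1])  # (5800, 5820)
```

With a 5-frame stride, adjacent windows would share 15 of their 20 frames. Windows near a split boundary would therefore overlap, which is one reason explicit leakage controls matter.
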

| Line | Methods | Tasks | Scored records | Direct scores | Proxy scores |
|---|---|---|---|---|---|
| 1 sample episode | 2 | 20 | 40/40 | 40 | 0 |
| 128 selected episodes | 7 | 20 | 140/140 | 134 | 6 (compact-proxy; each is source-linked and has a stated reason) |
| Total public matrix | 9 | 20 | 180/180 | 174 | 6 |

| Evidence line | Method block | Methods | Score statement | Read as |
|---|---|---|---|---|
| 1 sample episode | Task-head baselines | Minimal; Neural MLP | 40/40 direct scores. | Task-lab reproducibility and simple-vs-neural behavior. |
| 128 selected episodes | Aligned baseline heads | Metadata simple/NN; raw-feature simple/NN | 80/80 scores: 74 direct + 6 compact-proxy. | Same-split metadata/raw-feature baseline comparison. |
| 128 selected episodes | Qwen3-Omni series | Qwen3-Omni v6 LoRA | 20/20 direct scores from verified selected-128 Qwen3-Omni LoRA and task-specific probes. | Trainable Qwen3-Omni diagnostic baseline on the selected-128 surface. |
| 128 selected episodes | Cosmos3 series | Cosmos3-Super Reasoner; Cosmos3-Nano Future Window | 40/40 direct scores from verified public-safe reasoner and future-window artifacts. | Cosmos3 reasoner and future-window diagnostics on the selected-128 surface. |
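
The count tables follow from methods × tasks, with compact-proxy cells subtracted from the direct totals. Below is a small consistency check that assumes every method is scored exactly once per task. The method lists and proxy counts are copied from the tables above. The `MethodBlock` structure is illustrative and is not the repo's ledger format.

```python
from dataclasses import dataclass

TASKS = 20


@dataclass(frozen=True)
class MethodBlock:
    line: str
    methods: tuple[str, ...]
    proxy_scores: int  # compact-proxy cells reported for this block


BLOCKS = [
    MethodBlock("line1", ("Minimal", "Neural MLP"), 0),
    MethodBlock("line2", ("Metadata simple", "Metadata NN", "Raw-feature simple", "Raw-feature NN"), 6),
    MethodBlock("line2", ("Qwen3-Omni v6 LoRA",), 0),
    MethodBlock("line2", ("Cosmos3-Super Reasoner", "Cosmos3-Nano Future Window"), 0),
]


def tally(blocks: list[MethodBlock], tasks: int = TASKS) -> dict[str, dict[str, int]]:
    """Aggregate methods, scored records, direct scores, and proxy scores per evidence line."""
    totals: dict[str, dict[str, int]] = {}
    for block in blocks:
        scored = len(block.methods) * tasks
        row = totals.setdefault(block.line, {"methods": 0, "scored": 0, "direct": 0, "proxy": 0})
        row["methods"] += len(block.methods)
        row["scored"] += scored
        row["proxy"] += block.proxy_scores
        row["direct"] += scored - block.proxy_scores
    # The public-matrix total is the column-wise sum over all evidence lines.
    lines = list(totals)
    totals["total"] = {
        key: sum(totals[name][key] for name in lines) for key in ("methods", "scored", "direct", "proxy")
    }
    return totals


for line, row in tally(BLOCKS).items():
    print(line, row)
```
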

| Run | Purpose | Main change | Eval signal (eval count; JSON validity; contact score) | Use now |
|---|---|---|---|---|
| v1 | Prove the selected-128 LoRA/eval/package loop. | First verified 96/16/16 selected-episode Qwen3-Omni LoRA run. | 448 eval; JSON 0.8750; contact 0.6451. | Lineage only. |
| v2 | Make answers schema-checked. | Structured-JSON contract with full-8-GPU LoRA on the same split. | 448 eval; JSON 0.9978; contact 0.7188. | Structured-output ablation. |
| v3 | Separate prompt/eval effects from training. | Strict-label prompt/eval over the v2 adapter; no new adapter training. | 448 eval; JSON 1.0000; contact 0.7210. | Prompt/eval ablation. |
| v4 | Test longer structured-JSON LoRA training. | New four-epoch full-8-GPU adapter on the same selected split. | 448 eval; JSON 1.0000; contact 0.7299. | Overfit/metric-tradeoff evidence. |
| v5 | Move to denser multiscale evaluation. | Multiscale cap96 export with 4,032 held-out predictions. | 4,032 eval; JSON 1.0000; contact 0.7865. | Pinned prior release; stronger on several non-contact metrics. |
| v6 | Publish the current Qwen 20-task row. | Rank64/lr5e-5 multiscale LoRA plus verified task-specific probes. | 4,032 eval; JSON 0.9990; contact 0.8177. | Current public 20-task Qwen3-Omni row. |
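
The JSON value in the eval signal reads as the fraction of held-out answers that parse as schema-valid JSON. Here is a minimal sketch of that rate. It assumes an answer counts as valid when it parses to a JSON object containing a set of required keys. The key names are hypothetical placeholders, not the repo's answer schema.

```python
import json
from collections.abc import Iterable

# Hypothetical required fields; the real structured-answer contract lives in the task contracts.
REQUIRED_KEYS = frozenset({"task", "answer"})


def is_valid_answer(text: str, required: frozenset[str] = REQUIRED_KEYS) -> bool:
    """True if `text` parses to a JSON object that contains every required key."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and required.issubset(obj.keys())


def json_validity(outputs: Iterable[str]) -> float:
    """Fraction of model outputs that pass `is_valid_answer` (0.0 when there are no outputs)."""
    outputs = list(outputs)
    if not outputs:
        return 0.0
    return sum(is_valid_answer(text) for text in outputs) / len(outputs)


samples = [
    '{"task": "contact", "answer": "left_hand"}',  # valid
    '{"task": "contact"}',  # missing "answer"
    "Sure! The contact is the left hand.",  # not JSON
]
print(round(json_validity(samples), 4))  # 0.3333
```

For scale, v1's 0.8750 rate on 448 evaluations corresponds to 392 valid answers. v6's 0.9990 rate on 4,032 evaluations rounds from 4,028 valid answers.
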

## Fast Reader Map

| Reader goal | Start here | Then inspect |
|---|---|---|
| Understand quickly | Project brief | Project status, Dashboard |
| Choose the public surface | Public reader map | public_reader_map.json |
| Decode project terms | Glossary | glossary.json |
| Inspect the 20 tasks | TASK_SUITE_20.md | task_suite_20.json, task walkthroughs |
| Compare results | Research takeaways | two-line result summary, 20-result matrix, radar JSON, score/proxy audit |
| Understand one sample | Single-episode explorer | raw sample file map, feature manifest |
| Read foundation directions | Three foundation pipelines | three_foundation_pipelines.json, foundation model plan |
| Reproduce or audit | Reproducibility | Evidence contract, quality gates, publication audit, mirror parity |

| Capability | What this project shows |
|---|---|
| Multimodal data understanding | Parses the public sample into synchronized windows across video, audio, depth, pose/SLAM, mocap, IMU, calibration, and language-derived signals. |
| Task design | Defines 20 human-readable tasks in one unified public-sample suite, plus four direction-extension probes. The tasks and probes are documented with inputs, outputs, process modules, metrics, and case-study walkthroughs. |
| Model and evaluation discipline | Runs minimal and compact neural baselines, records predictions/metrics, keeps chronological split boundaries explicit, and separates the sample readout from held-out comparison rows. |
| Scale-up planning | Connects the public-sample pipeline to 32/128-episode held-out pilots, Qwen3-Omni LoRA, Cosmos-style world-model tracks, policy/VLA tracks, and the future Xperience-native foundation-model pretraining goal. |
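
A chronological split keeps time order intact instead of shuffling windows. The sketch below is not the repo's split code. It assigns contiguous index ranges to train, validation, and test, and drops `gap` windows at each boundary. The 70/15/15 fractions and the gap of 3 are assumptions. With 20-frame windows and an assumed 5-frame stride, each window overlaps the next three, so a 3-window gap keeps the frame ranges on either side of a boundary disjoint. The actual split policy is recorded in EVALUATION_PROTOCOL.md and evaluation_protocol.json.

```python
def chronological_split(
    n_windows: int,
    fractions: tuple[float, float, float] = (0.70, 0.15, 0.15),  # hypothetical ratios
    gap: int = 0,
) -> tuple[list[int], list[int], list[int]]:
    """Split window indices [0, n_windows) into time-ordered train/val/test lists.

    `gap` windows are skipped after train and after val so that overlapping
    windows never straddle two splits (a purge-style leakage guard).
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    n_train = int(n_windows * fractions[0])
    n_val = int(n_windows * fractions[1])
    train = list(range(0, n_train))
    val_start = n_train + gap
    val = list(range(val_start, min(val_start + n_val, n_windows)))
    test_start = val_start + n_val + gap
    test = list(range(test_start, n_windows))
    return train, val, test


train, val, test = chronological_split(1161, gap=3)
print(len(train), len(val), len(test))  # 812 174 169
print(train[-1], val[0], val[-1], test[0])  # 811 815 988 992
```
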

| Reader goal | Best entry point |
|---|---|
| Choose the right public surface | PUBLIC_READER_MAP.md, public_reader_map.json |
| Resolve confusing terms and abbreviations | GLOSSARY.md, glossary.json |
| Understand the whole project quickly | PROJECT_BRIEF.md |
| See the visual research dashboard | GitHub Pages dashboard |
| Navigate the unified 20 tasks, four tracks, and scale-up plan | Interactive research roadmap, TASK_SUITE_20.md, task_suite_20.json, research_roadmap_interactive.json |
| Compare current task metrics | RESEARCH_TAKEAWAYS.md, summary_metrics.json |
| Compare possible foundation backbones | FOUNDATION_MODEL_PLAN.md, foundation_model_plan.json |
| Understand the future native pretraining goal | XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md |
| See additional concrete project directions | ADDITIONAL_DEVELOPMENT_DIRECTIONS.md, additional_development_directions.json |
| Understand one model input | feature_manifest.json, windows.csv |
| Check multi-episode data status | DATA_ACCESS_STATUS.md |

| Surface | What it is for |
|---|---|
| GitHub repo | Source of truth for docs, scripts, generated JSON, validators, and commit history. |
| GitHub Pages dashboard | Best visual overview of the sample, 20 tasks, radar results, foundation directions, and resources. |
| Hugging Face Space | Hub-hosted copy of the dashboard and static app assets. |
| HF artifact dataset | Public-safe metrics, reports, website JSON, result packages, and derived evidence files. |
| HF baseline model repo | Minimal/neural baseline weights, figures, metrics, and mirrored task artifacts. |
| Qwen3-Omni and Cosmos3 model repos | Adapter-specific public weights or package cards when Qwen3-Omni v6, Cosmos3-Super, or Cosmos3-Nano runs are verified and publishable. |

| Theme | Current implementation |
|---|---|
| Dataset slice | One public Xperience-10M sample episode, 5,821 frames, 1,161 windows, and an 8,546-dimensional representation. |
| Modalities | Video, audio, depth, camera pose/SLAM, hand/body mocap, IMU, calibration, and language annotations. |
| Task suite | 20 human-readable tasks form one embodied-AI public-sample suite with shared windowing, split discipline, leakage controls, and minimal/neural head pattern. |
| Baselines | Minimal linear/ridge/logistic heads plus compact PyTorch MLP task heads over the same chronological split; companion simple/NN metadata baselines are also aligned to the selected 128-episode 96/16/16 split. |
| Research directions | Task mapping and extension probes for human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling. |
| Scale-up path | |
| Public surfaces | GitHub repo, GitHub Pages dashboard, GHCR static-site package, HF Space, HF artifact dataset, HF baseline-model repo, and HF collection. |
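
The minimal heads are closed-form linear models. As an illustration only, here is one such head written in dependency-free Python: ridge regression onto one-hot class targets, decoded with argmax. This is one plausible reading of a "ridge projection" classifier. The repo's exact heads, regularization strength, and bias handling may differ, and `fit_ridge_classifier` and `predict` are hypothetical names.

```python
def _solve(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Solve a @ x = b (a: n x n, b: n x k) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + rhs[:] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_ridge_classifier(
    x: list[list[float]], y: list[int], n_classes: int, lam: float = 1.0
) -> list[list[float]]:
    """Return W ((d+1) x n_classes) minimizing ||XW - Y||^2 + lam * ||W||^2 for one-hot Y."""
    xb = [row + [1.0] for row in x]  # append a bias feature (regularized too, for simplicity)
    d = len(xb[0])
    targets = [[1.0 if c == label else 0.0 for c in range(n_classes)] for label in y]
    xtx = [[sum(r[i] * r[j] for r in xb) + (lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    xty = [[sum(r[i] * t[c] for r, t in zip(xb, targets)) for c in range(n_classes)] for i in range(d)]
    return _solve(xtx, xty)


def predict(w: list[list[float]], x: list[list[float]]) -> list[int]:
    """Score each class linearly and return the argmax class index for each row."""
    preds = []
    for row in x:
        xb = row + [1.0]
        scores = [sum(v * w[i][c] for i, v in enumerate(xb)) for c in range(len(w[0]))]
        preds.append(max(range(len(scores)), key=scores.__getitem__))
    return preds


x_train = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [2.9, 3.0]]
y_train = [0, 0, 1, 1]
w = fit_ridge_classifier(x_train, y_train, n_classes=2, lam=0.1)
print(predict(w, [[0.1, 0.1], [3.0, 3.0]]))  # [0, 1]
```

Because λ > 0 makes XᵀX + λI positive definite, the linear solve always has a unique solution. At 8,546 feature dimensions, a practical head would use a linear-algebra library instead of this pure-Python loop.
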

| Layer | Current scope | Where to start |
|---|---|---|
| Data understanding | One public Xperience-10M sample episode is converted into 5,821 frames, 1,161 aligned windows, and an 8,546-dimensional multimodal representation. | PROJECT_BRIEF.md, PROJECT_STATUS.md |
| Task suite | Twenty human-readable tasks cover recognition, prediction, retrieval, reconstruction, synchronization, long-horizon forecasting, interaction text, action-object binding, sensor bridging, camera sync, and transition timing. Historical tier2_task_suite artifact paths are kept for link stability, but they are provenance paths inside the same suite. | TASK_SUITE_20.md, task_suite_20.json, RESEARCH_TAKEAWAYS.md, summary_report.json, TIER2_TASK_BASELINES.md |
| Baselines | Minimal heads and compact PyTorch MLP heads provide a controlled single-episode comparison on the same chronological split. The selected 128-episode setup adds same-split metadata simple/NN baselines for JSON-supported tasks and raw-feature simple/NN baselines on all 20 task axes. Tasks 15 and 19 are explicitly marked as compact-proxy completions. | neural_mlp/, BASELINE_ALIGNMENT_REPORT.md, raw20 run summary |
| Diagnostics | Audio contribution, modality ablations, timeline overlays, object labels, and alignment stress tests show which signals are useful and which tasks remain hard. | AUDIO_ABLATION_SUMMARY.md, single_episode_explorer.html |
| Scale-up | | RESEARCH_ROADMAP.md, FOUNDATION_MODEL_PLAN.md, XPERIENCE10M_128_EPISODE_FEATURE_INDEX.md, xperience10m_128_episode_feature_index.json, TASK_SUITE_ENHANCEMENT_128.md, task_suite_enhancement_128.json, omni_model_comparison.json, omni_finetune_verified_result.json, qwen3_v5_v6_comparison.json, QWEN3_V5_V6_COMPARISON_20260614.md, OMNI_MODEL_COMPARISON.md, verified_public/, task_suite_enhancement_128_v1_20260608/ |
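
A modality ablation can be expressed as two steps: mask the feature columns that belong to one modality, then measure the change in a task metric. The sketch assumes a manifest that maps each modality to a contiguous `[start, end)` column range. The `MANIFEST` values and the toy evaluator are hypothetical stand-ins for feature_manifest.json and a trained head. Zero-masking at evaluation time is only one ablation style; retraining without the modality is another. This sketch does not assume which variant AUDIO_ABLATION_SUMMARY.md reports.

```python
from collections.abc import Callable, Iterable, Sequence

Rows = list[list[float]]

# Hypothetical layout; the real 8,546-dimensional layout is described by feature_manifest.json.
MANIFEST: dict[str, tuple[int, int]] = {"video": (0, 4), "audio": (4, 6), "imu": (6, 8)}


def ablate(features: Rows, manifest: dict[str, tuple[int, int]], drop: Iterable[str]) -> Rows:
    """Return a copy of `features` with the columns of every dropped modality set to 0.0."""
    masked = set()
    for name in drop:
        start, end = manifest[name]
        masked.update(range(start, end))
    return [[0.0 if i in masked else v for i, v in enumerate(row)] for row in features]


def modality_contribution(
    evaluate: Callable[[Rows, Sequence[int]], float],
    features: Rows,
    labels: Sequence[int],
    manifest: dict[str, tuple[int, int]],
) -> dict[str, float]:
    """Metric drop when each modality is masked on its own (positive means the modality helped)."""
    base = evaluate(features, labels)
    return {name: base - evaluate(ablate(features, manifest, [name]), labels) for name in manifest}


def toy_accuracy(x: Rows, y: Sequence[int]) -> float:
    """Toy evaluator: predicts class 1 when the first audio column is positive."""
    preds = [1 if row[4] > 0 else 0 for row in x]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


X = [[1.0] * 8, [1.0] * 4 + [-1.0] * 2 + [1.0] * 2]
Y = [1, 0]
print(modality_contribution(toy_accuracy, X, Y, MANIFEST))  # {'video': 0.0, 'audio': 0.5, 'imu': 0.0}
```
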

| Area | Current decision |
|---|---|
| Public-sample pipeline | Verified on one public sample episode: 5,821 frames, 1,161 windows, 8,546 dimensions. |
| 20-task suite | Verified minimal baselines with committed metrics, predictions, and manifests. |
| Neural heads | Verified compact PyTorch MLP heads over the same task contracts and chronological splits. |
| Dataset context | Official Xperience-10M links, sample-vs-gated-data boundary, modality coverage, and redistribution policy are documented. |
| Evaluation protocol | Verified generated protocol for windowing, split policy, leakage controls, and per-task metrics. |
| Website and Hub pages | Public dashboard, Hugging Face Space, artifact dataset, baseline model repo, and collection use the same project framing and links. |
| Qwen3-Omni multi-episode pilot | Final verified diagnostic result package exists for the selected 96/16/16 episode split; JSON validity meets the target, while action/subtask metrics remain weak. |
| Raw data / full Qwen weights | Raw Xperience-10M data and full Qwen weights are not redistributed. |
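
The per-task classification metrics named in the card metadata (accuracy, precision, recall, F1) can be computed without dependencies. This minimal sketch makes three assumptions: predictions are single-label, macro averaging is unweighted over the classes seen in either the labels or the predictions, and undefined ratios are reported as 0.0. EVALUATION_PROTOCOL.md remains the authority on which metric and averaging each task actually uses. The labels in the usage line are toy values.

```python
from collections.abc import Hashable, Sequence


def classification_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 for single-label predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred), key=str)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


metrics = classification_metrics(["grasp", "grasp", "place", "place"], ["grasp", "place", "place", "place"])
print({k: round(v, 4) for k, v in metrics.items()})
# {'accuracy': 0.75, 'macro_precision': 0.8333, 'macro_recall': 0.75, 'macro_f1': 0.7333}
```
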

| Step | Question | Primary artifacts | What should be true |
|---|---|---|---|
| 1 | What is this project? | PROJECT_BRIEF.md, PROJECT_STATUS.md, Dashboard | A public-sample Xperience-10M research project with 20 tasks, baselines, and a scale-up plan. |
| 2 | What data is used? | Dataset-card alignment, Official HF dataset, Sample HF dataset | The implemented suite uses one public sample episode; the gated dataset is reserved for selected multi-episode training. |
| 3 | What does one model input contain? | windows.csv, feature_manifest.json, available_modalities.json | Each window is an aligned multimodal unit with video, audio, depth, pose/SLAM, mocap, IMU, calibration, and language-derived signals. |
| 4 | What are the 20 tasks? | TASK_SUITE_20.md, task_suite_20.json, task walkthroughs, task_walkthroughs.json | Every task has a human-readable name, input, output, metric, baseline scores, and an explicit artifact path. |
| 5 | How are tasks evaluated? | EVALUATION_PROTOCOL.md, evaluation_protocol.json | The window unit, chronological split, leakage controls, task metrics, and current limitations are explicit. |
| 6 | What do current results mean? | RESEARCH_TAKEAWAYS.md, research_takeaways.json, summary_metrics.json | Current metrics describe sample-level task behavior and identify which signals need larger held-out experiments. |
| 7 | Which models are implemented? | summary_report.json, neural_mlp/, HF baseline repo | Each task has minimal and neural-head evidence over the same feature windows. |
| 8 | What research directions does this support? | RESEARCH_ROADMAP.md, research_directions.json, research_direction_extensions.json, task_suite_20.json | The unified tasks are mapped to human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling. |
| 9 | Which foundation model comes next? | FOUNDATION_MODEL_PLAN.md, foundation_model_plan.json, Native pretraining plan | Qwen3-Omni is the first held-out LoRA baseline; Cosmos3 has Nano compatibility and a Super forward-dynamics LoRA; policy models wait for robot-compatible action targets. |
| 10 | How can the 128-episode suite be pushed further without more data? | TASK_SUITE_ENHANCEMENT_128.md, task_suite_enhancement_128.json | The enhancement pack proposes dense windows, hierarchical action/subtask labels, raw-feature shard priorities, and multiscale_20s10_40s20_80s40 as the next export target. |
| 11 | How do I reproduce it? | REPRODUCIBILITY.md, reproducibility_audit.md | Public commands and expected outputs are documented for the sample-episode task suite. |
| 12 | What is still pending? | omni_finetune_verified_result.json, DATA_ACCESS_STATUS.md, MULTI_EPISODE_ACCESS_STATUS.md | The final held-out diagnostic Qwen pass is verified and the JSON-validity target is met; strong action/subtask model quality remains pending. |
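
The proposed export target multiscale_20s10_40s20_80s40 names three window scales. One reading, which is an assumption rather than a documented definition, treats each token as `<window>s<stride>`. Under that reading, windows of 20, 40, and 80 units advance by 10, 20, and 40 units, giving half-window overlap at every scale. Whether the units are frames or seconds is not stated here. The function names below are hypothetical.

```python
import re


def parse_multiscale(spec: str) -> list[tuple[int, int]]:
    """Parse 'multiscale_20s10_40s20_80s40' into [(20, 10), (40, 20), (80, 40)] (assumed format)."""
    scales = [(int(w), int(s)) for w, s in re.findall(r"(\d+)s(\d+)", spec)]
    if not scales:
        raise ValueError(f"no <window>s<stride> tokens in {spec!r}")
    return scales


def multiscale_windows(length: int, spec: str) -> dict[tuple[int, int], list[tuple[int, int]]]:
    """Enumerate full-length [start, end) windows for every (window, stride) scale."""
    return {
        (window, stride): [(start, start + window) for start in range(0, length - window + 1, stride)]
        for window, stride in parse_multiscale(spec)
    }


windows = multiscale_windows(160, "multiscale_20s10_40s20_80s40")
print({scale: len(spans) for scale, spans in windows.items()})
# {(20, 10): 15, (40, 20): 7, (80, 40): 3}
```
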

| View | What to inspect | Why it matters |
|---|---|---|
| Project status | PROJECT_STATUS.md, project_status.json | Gives a one-table current project summary before reading the full artifact trail. |
| Data contract | windows.csv, feature_manifest.json, modality manifests | Confirms what each sample window contains before modeling. |
| Dataset context | XPERIENCE10M_DATASET_CARD_ALIGNMENT.md, official dataset links | Explains the official dataset, public sample, modalities, access boundary, and what this repo uses. |
| Visual assets | FIGURE_INDEX.md, docs/assets/ | Shows the task-suite graphic, modality thumbnails, pipeline diagrams, charts, and logo assets. |
| Evaluation protocol | EVALUATION_PROTOCOL.md, evaluation_protocol.json | Defines the task unit, split, metrics, leakage controls, and current limitations. |
| Research roadmap | RESEARCH_ROADMAP.md, research_roadmap.json | Shows the path from sample-level task development to multi-episode work, larger model tracks, and the future native-pretraining goal. |
| Additional development directions | ADDITIONAL_DEVELOPMENT_DIRECTIONS.md, additional_development_directions.json | Records concrete non-backbone tracks: taxonomy, benchmark protocol, representation learning, skill graphs, affordances, 3D/4D memory, QA, and policy transfer. |
| Xperience Embodied Foundation Model plan | XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md | Describes the long-term full-corpus pretraining goal, target modules, objectives, staged scale-up, hardware ranges, and evaluation protocol. |
| Minimal heads | softmax ridge projection/regression multi-label logistic heads | Keeps every input/output contract visible and inspectable. |
| Neural heads | PyTorch MLP classifiers/regressors under neural_mlp/ | Checks whether nonlinear heads improve each task without changing features. |
| Evidence | metrics, predictions, confusion matrices, diagrams, dashboard | Makes the single-episode task development inspectable without rerunning first. |
| Artifact guide | ARTIFACT_GUIDE.md | Groups the public evidence into reader-facing views after the first-pass overview. |
| Reproducibility contract | REPRODUCIBILITY.md, reproducibility_matrix.json | States public commands, expected outputs, exact-match reproduction evidence, and non-reproducible boundaries. |
| Citation metadata | CITATION.cff, codemeta.json, LICENSE | Makes the repo easier to cite, index, and reuse without confusing code license and dataset terms. |
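
Exact-match reproduction and mirror parity both reduce to comparing file digests. This is a minimal sketch using SHA-256 manifests from the standard library. It is not the repo's audit tooling, and the formats of reproducibility_matrix.json and the mirror-parity report may differ. To use it, point `manifest` at a local checkout and at a downloaded mirror (for example, a local copy of the HF artifact dataset), then diff the two results with `compare`.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large artifacts are never loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's POSIX-style path relative to `root` to its SHA-256 hex digest."""
    return {p.relative_to(root).as_posix(): sha256_file(p) for p in sorted(root.rglob("*")) if p.is_file()}


def compare(a: dict[str, str], b: dict[str, str]) -> dict[str, list[str]]:
    """Report files missing from either side and files whose contents differ."""
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "mismatched": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


print(compare({"metrics.json": "aa", "report.md": "bb"}, {"metrics.json": "aa", "report.md": "cc", "extra.png": "dd"}))
# {'only_in_a': [], 'only_in_b': ['extra.png'], 'mismatched': ['report.md']}
```
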