two evidence lines / 180 scored records

Ropedia Xperience-10M Task Suite.

The public suite has two evidence lines. Line 1 uses one public sample episode to make the 20-task lab inspectable and reproducible. Line 2 uses 128 selected episodes to compare aligned baselines, Qwen3-Omni v6 LoRA, Cosmos3-Super Reasoner, and Cosmos3-Nano Future Window. The public matrix is complete at 180/180 scored method-task records, with six compact-proxy cells explicitly marked.

line 1 / public sample

1 sample episode: task lab

One public episode becomes aligned windows, task targets, Minimal heads, and Neural MLP heads.

best use

Inspect the sample files, task targets, and local baseline runs.

read separately from

Selected-128 comparison rows and held-out model behavior.

5,821 frames; 1,161 20-frame windows; 40/40 direct task scores
line 2 / 128 selected episodes

128 selected episodes: comparison surface

Seven methods share the selected-episode surface and the same 20 task axes.

best use

Compare same-split baselines, Qwen3-Omni v6, and Cosmos3 rows.

read separately from

Direct raw-target metrics for the 6 proxy-marked cells.

128 selected episodes; 34,269 exported windows; 140/140 scored (134 direct + 6 proxy)
5,821 frames in sample episode
1,161 20-frame windows
8,546 feature dimensions
20 unified task contracts
Radar comparison: 4 grouped panels / 180 scored records / 174 direct + 6 compact-proxy
Unified 20-task grouped radar board with method-family panels, task key, score counts, and raw128 proxy notes

Model comparison is grouped by method family.

The full SVG names every task axis, separates the nine methods into readable panels, and keeps source artifacts plus proxy notes attached to the same comparison view.

180 method-task records
174 direct scores
6 compact-proxy scores
34,269 128-episode windows
Panel 1: Minimal + Neural MLP. Single public-sample episode; 40/40 direct task scores.
Panel 2: 128ep metadata/text. Aligned JSONL and staged-target baselines; 40/40 scored with proxy flags retained.
Panel 3: 128ep raw features. Sensor-block simple/NN heads; 40/40 scored with task 15/19 compact proxies marked.
Panel 4: Qwen3 + Cosmos3. Qwen3-Omni v6 LoRA, Cosmos3-Super, and Cosmos3-Nano; 60/60 scored from verified artifacts.
All 20 radar task axes:
01 Action Recognition
02 Procedure Step Recognition
03 Action Boundary Detection
04 Next-Action Prediction
05 Hand Trajectory Forecasting
06 Contact State Prediction
07 Object Relevance Prediction
08 Language Grounding
09 Cross-Modal Retrieval
10 Cross-Modal Reconstruction
11 Temporal Order Verification
12 Multimodal Synchronization Detection
13 Long-Horizon Next-Action Forecasting
14 Long-Horizon Next-Subtask Forecasting
15 Interaction Text Prediction
16 Action-Object Relation Prediction
17 Future Object-Set Forecasting
18 IMU-to-Hand Pose Reconstruction
19 Camera-View Synchronization Retrieval
20 Time-to-Next-Transition Regression

Two evidence lines: 1 episode and 128 episodes.

Read the suite as two lines. Line 1 proves the task lab is inspectable and reproducible. Line 2 compares selected-128 metadata/raw baselines, Qwen3-Omni v6 LoRA, Cosmos3-Super Reasoner, and Cosmos3-Nano Future Window. Keep the lines separate when interpreting scores.

Ropedia Xperience-10M project logo
About this public surface / Ropedia Xperience-10M Task Suite

The logo identifies the shared public package across the GitHub repository, GitHub Pages dashboard, Hugging Face Space, artifact dataset, model mirrors, and social preview. Use it to confirm you are on a project surface before reading the 1-episode and selected-128 evidence lines.

Two evidence-line map showing 1 sample episode, 128 selected episodes, and the combined 180 scored method-task records
Line | Data unit | Score statement | Best use | Read separately from | Start here
1 sample episode | One public Xperience-10M sample episode; 5,821 frames; 1,161 aligned 20-frame windows; 8,546-dimensional feature contract. | 40/40 direct scores from Minimal and Neural MLP heads. | Raw sample inspection, file organization, task definitions, local reproduction, and controlled Minimal-vs-Neural baseline behavior. | The selected-128 comparison rows and broader held-out model behavior. | Raw browser; 1-episode radar JSON; result summary JSON; line JSON
128 selected episodes | Selected held-out 96/16/16 split; 34,269 exported windows; public-safe metadata/raw-feature artifacts linked to official gated episode paths. | 140/140 selected-128 scores: 134 direct + 6 compact-proxy. | Same-split comparison across metadata/raw baselines, Qwen3-Omni v6 LoRA, Cosmos3-Super, Cosmos3-Nano, and scale-up decisions. | Direct raw-target interpretation for the proxy-marked cells. | 128-episode radar JSON; feature index JSON; HF selected-128 windows; result summary JSON; line doc
Evidence line | Method block | Methods | Score statement | Read as
1 sample episode | Task-head baselines | Minimal; Neural MLP | 40/40 direct scores. | Task-lab reproducibility and simple-vs-neural behavior.
128 selected episodes | Aligned baseline heads | Metadata simple/NN; raw-feature simple/NN | 80/80 scores: 74 direct + 6 compact-proxy. | Same-split metadata/raw-feature baseline comparison.
128 selected episodes | Qwen3-Omni series | Qwen3-Omni v6 LoRA | 20/20 direct scores from verified selected-128 Qwen3-Omni LoRA and task-specific probes. | Trainable Qwen3-Omni diagnostic baseline on the selected-128 surface.
128 selected episodes | Cosmos3 series | Cosmos3-Super Reasoner; Cosmos3-Nano Future Window | 40/40 direct scores from verified public-safe reasoner and future-window artifacts. | Cosmos3 reasoner and future-window diagnostics on the selected-128 surface.

Cosmos3-Super Forward-Dynamics LoRA is published as a separate fine-tuned adapter with weights/results; it is not counted as a 20-task matrix method row.
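The headline counts follow from the method-block table: each method row contributes one record per task axis. A minimal tally sketch, assuming the method lists and the six compact-proxy cells exactly as the table states them; `METHOD_BLOCKS` and `tally` are hypothetical names, not part of the released scripts. The Forward-Dynamics LoRA is left out, matching its exclusion from the matrix.

```python
from collections import Counter

NUM_TASKS = 20  # every method row is scored on the same 20 task axes

# (evidence line, method block, methods, compact-proxy cells), copied from the table above.
# The Cosmos3-Super Forward-Dynamics LoRA is not a matrix row, so it is not listed.
METHOD_BLOCKS = [
    ("1 sample episode", "Task-head baselines", ["Minimal", "Neural MLP"], 0),
    ("128 selected episodes", "Aligned baseline heads",
     ["Metadata simple", "Metadata NN", "Raw-feature simple", "Raw-feature NN"], 6),
    ("128 selected episodes", "Qwen3-Omni series", ["Qwen3-Omni v6 LoRA"], 0),
    ("128 selected episodes", "Cosmos3 series",
     ["Cosmos3-Super Reasoner", "Cosmos3-Nano Future Window"], 0),
]


def tally(blocks, num_tasks=NUM_TASKS):
    """Return {line: (records, direct, proxy)} plus an 'all' total."""
    records = Counter()
    proxies = Counter()
    for line, _block, methods, proxy_cells in blocks:
        records[line] += len(methods) * num_tasks
        proxies[line] += proxy_cells
    summary = {line: (records[line], records[line] - proxies[line], proxies[line])
               for line in records}
    total, total_proxy = sum(records.values()), sum(proxies.values())
    summary["all"] = (total, total - total_proxy, total_proxy)
    return summary


for line, (n, direct, proxy) in tally(METHOD_BLOCKS).items():
    print(f"{line}: {n} records = {direct} direct + {proxy} compact-proxy")
```

Nine method rows times 20 axes gives the 180 records; the proxy cells only shift records from "direct" to "compact-proxy" without changing the total.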

Qwen run | Purpose | Main change | Eval signal | Use now
v1 | Prove the selected-128 LoRA/eval/package loop. | First verified 96/16/16 selected-episode Qwen3-Omni LoRA run. | 448 eval; JSON 0.8750; contact 0.6451. | Lineage only.
v2 | Make answers schema-checked. | Structured-JSON contract with full-8-GPU LoRA on the same split. | 448 eval; JSON 0.9978; contact 0.7188. | Structured-output ablation.
v3 | Separate prompt/eval effects from training. | Strict-label prompt/eval over the v2 adapter; no new adapter training. | 448 eval; JSON 1.0000; contact 0.7210. | Prompt/eval ablation.
v4 | Test longer structured-JSON LoRA training. | New four-epoch full-8-GPU adapter on the same selected split. | 448 eval; JSON 1.0000; contact 0.7299. | Overfit/metric-tradeoff evidence.
v5 | Move to denser multiscale evaluation. | Multiscale cap96 export with 4,032 held-out predictions. | 4,032 eval; JSON 1.0000; contact 0.7865. | Pinned prior release; stronger on several non-contact metrics.
v6 | Publish the current Qwen 20-task row. | Rank64/lr5e-5 multiscale LoRA plus verified task-specific probes. | 4,032 eval; JSON 0.9990; contact 0.8177. | Current public 20-task Qwen3-Omni row.

Qwen v1-v6 are run-lineage labels inside the selected-128 evidence line, not project evidence lines. Use v6 for the public 20-task Qwen3-Omni row; keep v5 as the pinned prior multiscale comparator; read v1-v4 as pipeline-hardening and ablation evidence. Full details: qwen3_omni_run_lineage.json and QWEN3_OMNI_RUN_LINEAGE.md.
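The JSON column in the run table is a validity rate. The page does not spell out the exact check, so this is a minimal sketch assuming a prediction counts as valid when it parses as a JSON object that carries the required answer keys; `json_validity` and the key names are hypothetical. At 0.9990 over the 4,032 v6 predictions, about 4 outputs would fail such a check.

```python
import json


def json_validity(predictions, required_keys=()):
    """Fraction of raw model outputs that parse as a JSON object containing every required key."""
    if not predictions:
        return 0.0
    valid = 0
    for text in predictions:
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            continue  # not parseable JSON at all
        if isinstance(obj, dict) and all(key in obj for key in required_keys):
            valid += 1
    return valid / len(predictions)


# Toy outputs, made up for illustration.
preds = ['{"action": "cut", "contact": true}', '{"action": "pour"}', "action: cut"]
print(json_validity(preds, required_keys=("action",)))             # 2 of 3 valid
print(json_validity(preds, required_keys=("action", "contact")))  # 1 of 3 valid
```

Requiring keys on top of parseability is what separates a schema-checked contract (v2 onward) from plain JSON parsing.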

01 understand / Start with scope and status

Use this route if you need the project story, what is public, and how to read each result family.

02 inspect / Follow the task evidence

Use the 20-task suite, radar, matrix, and source audit to compare methods without losing metric provenance.

03 reproduce / Run or verify the release

Use scripts, validators, mirrors, and checks when you want to rerun or trust the public package.

04 extend / Choose the next model track

Use directions and scale-up resources for spatial, world-model, VLA, Qwen3-Omni, and Cosmos3 follow-up work.

Public reader map

Choose the right entry point without losing the evidence trail.

The project keeps source code, visual explanation, derived artifacts, model outputs, and release checks on different public surfaces. This map shows what each surface is responsible for before you dive into the full file set.

overview / Understand the project quickly

Start with the brief and status files, then use the dashboard for the visual story.

benchmark / Inspect the 20-task suite

Use the task contract, protocol, walkthroughs, and radar matrix to follow each scored axis.

sample data / Understand one data sample

Open the sample explorer, raw-file manifest, and feature manifest before reading model scores.

terms / Decode terminology

Use the glossary when evidence lines, direct/proxy scores, Qwen v1-v6, Cosmos branches, or HF surfaces are unclear.

results / Compare methods cleanly

Single-episode baselines, 128-episode aligned baselines, Qwen3-Omni v6 LoRA, and Cosmos3-Super/Nano diagnostics stay separated by evidence type.

directions / Read the three foundation pipelines

Spatial intelligence, human-video world modeling, and vision-language-action are documented as trainable directions with task mappings.

release health / Verify public copies

Publication checks validate source alignment, package contents, mirror parity, and live URL/hash status.

Project brief

From one public episode to an extensible embodied-AI task lab.

Xperience-10M is much larger than the public sample. This project focuses on the sample available now, turns it into clear task contracts and baseline artifacts, and keeps the same data contract ready for held-out multi-episode training when more episodes are prepared.

What this is

A research-development lab for understanding synchronized egocentric multimodal data, defining embodied-AI tasks, and testing small baselines before omni-model fine-tuning.

What is implemented
  • 1,161 aligned windows from one public sample episode
  • 20 unified task contracts with minimal and neural evidence
  • One shared setup across all 20 task axes
  • Four research-direction maps and extension probes
What comes next

The next model-quality stage is stronger action/subtask modeling on the same held-out split, using dense/multiscale windows before requiring more raw episodes.

Data understanding

Maps one public episode into synchronized windows across video, audio, depth, pose/SLAM, mocap, IMU, calibration, and language-derived signals.

Task design

Defines embodied-AI inputs, process modules, outputs, metrics, and case-study walkthroughs instead of treating the sample as a generic classification file.

Evaluation discipline

Keeps chronological splits, predictions, confusion matrices, leakage notes, and single-episode limits visible before moving to broader model-quality reads.
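Confusion matrices are what expose the action/subtask failures discussed elsewhere on this page. A dependency-free sketch over hypothetical per-window label lists; the action names below are made up for illustration, and `confusion_matrix` and `per_class_recall` are not the project's own helpers.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Counts with rows = true label and columns = predicted label, in `labels` order."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_recall(matrix, labels):
    """Diagonal count divided by row total; None for labels absent from y_true."""
    recall = {}
    for i, label in enumerate(labels):
        total = sum(matrix[i])
        recall[label] = matrix[i][i] / total if total else None
    return recall


labels = ["reach", "grasp", "pour"]
m = confusion_matrix(["reach", "grasp", "grasp", "pour"],
                     ["reach", "reach", "grasp", "pour"], labels)
print(m)                            # [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
print(per_class_recall(m, labels))  # grasp is confused with reach half the time
```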

Scale-up readiness

Connects the same data contract to 128-episode baselines, a no-new-episode enhancement pack, Qwen3-Omni LoRA, Cosmos-style world modeling, policy/VLA tracks, and the later Xperience-native pretraining goal.

1-Episode 20-Task Radar

Minimal and Neural MLP baselines over the original public-sample episode, with 40/40 scored method-task records.

Single-episode 20-task radar comparing Minimal and Neural MLP across all 20 scored task axes

128-Episode 20-Task Radar

Metadata, raw-feature, Qwen3-Omni, and Cosmos3 methods on the aligned 128-episode surface, with all 140 method-task records scored and proxy/evidence notes kept explicit.

128-episode grouped 20-task radar comparing metadata baselines, raw-feature baselines, Qwen3-Omni, and Cosmos3 series with explicit score counts
featured

Interactive research roadmap

Use this as the front door for the project: it links the unified 20 tasks, four research tracks, current sample evidence, and the multi-episode Qwen3-Omni scale-up path.

tracks: 4; tasks: 20; setup: unified; roadmap phases: 5
verified

Multimodal episode pipeline

One Xperience-10M public sample episode is converted into aligned windows and a documented feature contract.

frames: 5,821; windows: 1,161; features: 8,546
verified

Task suite and baseline heads

The unified task suite has minimal and neural baseline evidence across one 20-axis task surface with shared windows, splits, and label discipline.

tasks: 20; minimal heads: 20; neural heads: 20
verified

Dataset source alignment

The public description is aligned to the official gated Xperience-10M dataset card, including modalities, scale, access, and current project coverage. The source snapshot records 31.9 TB on the HF surface; a full-scale storage statement of about 1 PB; 12,103 episode folders, which are upstream metadata rather than a local data inventory; the public-sample license, cc-by-nc-4.0; HOMIE Toolkit and Rerun 0.29.0 as source tooling; and the official limited-diversity note. See data/source_alignment_audit.json.

full dataset: gated; sample scope: 1 episode; raw data mirrored: no
verified

Public research artifacts

Metrics, figures, walkthroughs, baseline weights, Qwen3-Omni results, and Cosmos3 public-safe packages are staged across GitHub, GitHub Pages, and Hugging Face.

tasks: 20; baselines: minimal + neural; reader path: tabs
verified diagnostic

Qwen3-Omni held-out pilot

The selected-episode LoRA pilot is packaged with real held-out predictions and metrics. It proves the pipeline, while its weak action/subtask scores make it a baseline for error analysis.

split: 96 / 16 / 16; test windows: 4,032; JSON validity: 99.90%
current plan

No-new-episode stress plan

Shows how the current selected split can be stressed without more episodes: dense windows, hierarchical labels, raw-feature shards, and `multiscale_20s10_40s20_80s40` as the next export target.

current windows: 3,808; multiscale estimate: 106,095; data file: task_suite_enhancement_128.json
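The export name `multiscale_20s10_40s20_80s40` appears to encode three window scales. A hedged sketch, assuming each token reads as `<window length>s<stride>` (the units, frames or seconds, are not stated here) and that only full windows are kept; `parse_multiscale` and `count_windows` are hypothetical helpers and are not expected to reproduce the 106,095 estimate, which depends on per-episode lengths not listed on this page.

```python
import re


def parse_multiscale(spec):
    """Parse 'multiscale_20s10_40s20_80s40' into [(20, 10), (40, 20), (80, 40)].

    Assumption: each token is <window length>s<stride>.
    """
    prefix, _, rest = spec.partition("_")
    if prefix != "multiscale" or not rest:
        raise ValueError(f"unexpected spec: {spec!r}")
    scales = []
    for token in rest.split("_"):
        match = re.fullmatch(r"(\d+)s(\d+)", token)
        if match is None:
            raise ValueError(f"bad scale token: {token!r}")
        scales.append((int(match.group(1)), int(match.group(2))))
    return scales


def count_windows(timeline_len, window, stride):
    """Number of full windows that fit in a timeline of `timeline_len` units."""
    if timeline_len < window:
        return 0
    return (timeline_len - window) // stride + 1


scales = parse_multiscale("multiscale_20s10_40s20_80s40")
# Hypothetical timeline of 1,000 units, only to show the arithmetic.
print({f"{w}s{s}": count_windows(1000, w, s) for w, s in scales})
# {'20s10': 99, '40s20': 49, '80s40': 24}
```

Under this reading, longer windows use proportionally longer strides, so each coarser scale contributes roughly half as many windows as the one before it.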
not redistributed

Data governance

Raw MP4/HDF5/RRD files, private gated Xperience-10M data, and full Qwen weights are excluded from the public repo and HF mirrors.

raw Xperience-10M: excluded; full Qwen weights: excluded; derived artifacts: included

Research roadmap.

The project path moves from the current public-sample task lab to the latest verified Qwen3-Omni diagnostic run, same-split 128-episode baseline alignment, a no-new-episode enhancement pack, action/subtask error analysis, robustness runs, world/policy tracks, and the future Xperience Embodied Foundation Model pretraining goal.

implemented

Public-Sample Task Lab

One public episode is converted into aligned windows, task contracts, minimal baselines, neural heads, walkthroughs, and figures.

Entry

Public Xperience-10M sample episode available.

Evidence

Status, protocol, takeaways, summary metrics, and episode-task outputs.

implemented

Multi-Episode Data Preparation

Prepare official gated episodes while preserving episode-level separation and recording missing-view coverage. The first selected split is available for Qwen3-Omni diagnostics.

Entry

Gated data access and enough storage for selected episodes.

Evidence

Selected-episode plan, data boundary, preparation notes, and verified package summary.

verified latest run

Qwen3-Omni LoRA Latest Diagnostic Branch

Train lightweight adapters on selected prepared episodes and evaluate on held-out episodes with committed predictions, metrics, and run reports.

Entry

Selected episodes prepared with no train/test episode leakage.

Evidence

Verified result summary, v5/v6 comparison, dataset manifest, training metadata, progress logs, metrics, and predictions.

verified companion result

128-Episode Same-Split Simple/NN Baselines

Align simple metadata/text baselines, raw-feature proxies, and neural MLP baselines to the same selected 96/16/16 split and the unified 20-task axes used by the public result matrix.

Entry

Derived Qwen JSONL export for the selected 96/16/16 split.

Evidence

Baseline alignment report, summary metrics, task metrics, and the 128-episode baseline runner.

current no-new-episode plan

Selected-128 enhancement stage

Use the same selected split, estimate dense/multiscale window exports, define hierarchical action/subtask targets, and prioritize raw-feature shards for tasks that metadata baselines cannot cover.

Entry

Current 3,808-window selected 96/16/16 export and verified Qwen v4 metrics.

Evidence

TASK_SUITE_ENHANCEMENT_128.md, task_suite_enhancement_128.json, dense-window CSV, and the enhancement builder script.

active next step

Action/Subtask Error-Analysis Pass

Keep the 96/16/16 split, tighten JSON decoding or target formatting, and analyze action/subtask failures before presenting stronger model-quality numbers.

Entry

The final diagnostic package is verified, meets strict JSON validity, and exposes weak action/subtask quality.

Evidence

Updated quality-target report, error-analysis tables, held-out metrics, and public-safe package.

current

Foundation-Model Selection Matrix

Keep Qwen3-Omni as the first trainable held-out pilot, use Cosmos 3 for world modeling and forward-dynamics trainer development, and stage policy candidates after robot-compatible action targets are explicit.

Entry

Completed 128-episode preparation or a smaller 3-8 episode preprocessing dry run.

Evidence

Foundation model plan, source links, model-specific entry conditions, and evaluation additions.

partially implemented

64-128 Episode Robustness Run

Test whether pilot conclusions survive broader sessions, missing modalities, and stronger ablations.

Entry

Selected multi-episode pilot trains and evaluates cleanly.

Evidence

Metrics by session, task, modality, ablation, and failure type.

planned

Cosmos 3 and Policy-Model Extensions

Extend toward future-window prediction, action-conditioned world modeling, synthetic-data tests, policy-style next action, and affordance reasoning.

Entry

Enough multi-episode data, compute budget, and model-specific action or world-state targets.

Evidence

Task-specific held-out evaluations, qualitative inspection, and updated model cards.

future

Xperience Embodied Foundation Model Pretraining

Pretrain an Xperience-native domain model over synchronized video, audio, depth, pose, mocap, IMU, and language after smaller scaling stages prove value.

Entry

Full-corpus access, PB-scale storage path, multi-node compute, and positive scaling evidence.

Evidence

Pretraining manifests, scaling curves, held-out evaluations, checkpoint inventory, model card, and data-boundary report.

Additional development directions.

Beyond the current task heads, Qwen3-Omni fine-tuning path, Cosmos/world-model track, and future native pretraining goal, Xperience-10M can support three foundation pipeline tracks plus several concrete research-development tracks.

High-resolution slide diagram showing the Spatial intelligence models direction for Xperience-10M.

Spatial intelligence models

Train spatial-memory models from multiview RGB, egocentric video, depth, pose, calibration, object/contact cues, and language prompts; evaluate spatial QA, object permanence, counting, retrieval, and pose-aware consistency.

Sample input

Use windows.csv and shared_windows.npz to slice each 20-frame window, then join six MP4 RGB streams with annotation.hdf5 depth, camera pose, SLAM/calibration, object cues, contacts, and optional language questions.

Training output

Build targets such as camera-view match, object relevance, object-set memory, depth/pose reconstruction proxy, caption-grounded retrieval, and spatial QA answers derived from the same public annotation timeline.

High-resolution slide diagram showing the Human-video world models direction for Xperience-10M.

Human-video world models

Train future-prediction models from observed interaction windows to score next action, next subtask, future object set, contact transition, camera-motion delta, and latent future state, with Qwen-style probes and Cosmos-style dynamics kept separate.

Sample input

Take the current 20-frame observed window at time t from shared_windows.npz: RGB/audio/sensor summaries, hand/body motion, camera pose, current object/contact state, and current action/subtask context only.

Training output

Shift the same episode timeline forward to produce next-action, next-subtask, future object-set, contact-transition, time-to-transition, camera-motion delta, or latent/future-feature targets. Future labels stay out of the input.
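The shift-forward step can be sketched for one label stream. Assuming a hypothetical list with one current action label per window in time order, each window's target is the next different label and the number of windows until it appears; only the current label at each position acts as input, and everything returned comes from later windows. `future_targets` is illustrative, not the project's builder.

```python
def future_targets(actions):
    """For each window, return (next_action, windows_to_transition).

    next_action is the first later label that differs from the current one;
    windows_to_transition counts windows until that change. Both are None
    when no later transition exists in the episode.
    """
    n = len(actions)
    next_action = [None] * n
    steps = [None] * n
    # Walk backwards so each window can reuse the answer of its successor.
    for i in range(n - 2, -1, -1):
        if actions[i + 1] != actions[i]:
            next_action[i] = actions[i + 1]
            steps[i] = 1
        elif next_action[i + 1] is not None:
            next_action[i] = next_action[i + 1]
            steps[i] = steps[i + 1] + 1
    return list(zip(next_action, steps))


print(future_targets(["reach", "reach", "grasp", "grasp", "pour"]))
# [('grasp', 2), ('grasp', 1), ('pour', 2), ('pour', 1), (None, None)]
```

The same backward pass yields both a next-action target and a time-to-transition target in window units.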

High-resolution slide diagram showing the Vision-language-action models direction for Xperience-10M.

Vision-language-action models

Train VLA or policy-compatible heads only after converting egocentric video, captions, hand/body motion, contacts, objects, and procedures into traceable action tokens, chunks, and object-conditioned action targets.

Sample input

Use egocentric/fisheye video windows, caption/object context from annotation.hdf5, hand/body mocap, contact state, and current subtask text as the observation-language side of each training pair.

Training output

For the one-sample suite, output action-token proxies: current/next action, object-conditioned action relation, contact state, interaction-text class, subtask transition, or hand-trajectory/action-chunk proxy. Robot action chunks need a later retargeting converter.

Episode taxonomy and data engine

Build an episode atlas, category tags, balance report, and split builder across activities, objects, scenes, sessions, people, and missing modalities.

direction data

Standardized benchmark protocol

Version train/val/test manifests, task cards, leakage checks, metric scripts, and reference baselines so future model scores are comparable.

direction note
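Versioned manifests need an episode-level split with a leakage check. The page does not say how the 96/16/16 selection was drawn, so this sketch assumes a seeded shuffle of unique episode IDs; `split_episodes`, `check_no_leakage`, and the ID format are hypothetical.

```python
import random


def split_episodes(episode_ids, sizes=(96, 16, 16), seed=0):
    """Seeded shuffle of unique episode IDs, cut into train/val/test."""
    ids = sorted(set(episode_ids))  # sort first so the result does not depend on input order
    if len(ids) != sum(sizes):
        raise ValueError(f"expected {sum(sizes)} episodes, got {len(ids)}")
    random.Random(seed).shuffle(ids)
    n_train, n_val, _ = sizes
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


def check_no_leakage(split):
    """Raise ValueError if any episode ID appears in more than one split."""
    seen = {}
    for name, ids in split.items():
        for episode in ids:
            if episode in seen:
                raise ValueError(f"episode {episode!r} is in both {seen[episode]} and {name}")
            seen[episode] = name
    return True


split = split_episodes([f"ep{i:03d}" for i in range(128)])
print({name: len(ids) for name, ids in split.items()}, check_no_leakage(split))
```

Splitting at the episode level, before any windowing, is what keeps windows from one session out of both train and test.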

Multimodal representation learning

Train contrastive and masked-prediction encoders over synchronized video, audio, depth, pose, mocap, IMU, and language windows.

JSON plan
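One common contrastive objective for paired modality windows is symmetric InfoNCE, as popularized by CLIP. The sketch below is that generic loss in NumPy, not this project's trainer; it assumes row i of each embedding matrix comes from the same window (for example, a video encoder and an IMU encoder, both hypothetical here).

```python
import numpy as np


def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE between paired embeddings (row i of z_a matches row i of z_b)."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # cosine similarities, sharpened by temperature

    def diag_cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))  # the matching pair is the correct "class"

    # Average the a->b and b->a directions.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
print(info_nce(z, z + 0.01 * rng.normal(size=z.shape)))  # well-aligned pairs: low loss
print(info_nce(z, np.roll(z, 1, axis=0)))               # mismatched pairs: high loss
```

Every other window in the batch serves as a negative, so larger batches of synchronized windows give a harder, more informative objective.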

Skill and procedure graphs

Mine action steps, transitions, preconditions, effects, and temporal graphs that connect egocentric perception to planning.

current task map
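Mining step transitions can start from simple counts. A sketch, assuming hypothetical per-episode lists of step labels in time order; consecutive repeats (the same step spanning several windows) are collapsed before counting edges. `transition_graph` is illustrative only.

```python
from collections import Counter, defaultdict


def transition_graph(step_sequences):
    """Count step -> next-step edges across episodes, collapsing consecutive repeats."""
    counts = Counter()
    for seq in step_sequences:
        collapsed = [s for i, s in enumerate(seq) if i == 0 or s != seq[i - 1]]
        counts.update(zip(collapsed, collapsed[1:]))
    graph = defaultdict(dict)
    for (src, dst), n in counts.items():
        graph[src][dst] = n
    return dict(graph)


# Toy step sequences, made up for illustration.
episodes = [["open", "open", "pour", "close"], ["open", "pour", "pour", "close"], ["open", "close"]]
print(transition_graph(episodes))
```

Normalizing each row of counts would give empirical transition probabilities, a natural starting point for precondition/effect analysis.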

Human-object affordances

Add contact, reachable-object, tool-use, and next-affordance tasks using hands, mocap, objects, contacts, video, and language.

task walkthroughs

3D/4D scene and object memory

Fuse depth, pose/SLAM, multiview video, and object cues into persistent scene/object maps for spatial reasoning and object permanence.

model tracks

Quality and sync diagnostics

Track timestamp drift, missing streams, calibration consistency, corrupted files, and degraded-mode manifests before large training runs.

evidence contract
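A minimal drift check compares a stream's timestamps to a reference clock. This sketch assumes both streams are already paired sample by sample (for example, resampled to video frames) and share a time unit; `sync_report` and the tolerance value are hypothetical, not the project's diagnostics.

```python
def sync_report(ref_ts, other_ts, tolerance):
    """Offset statistics for a stream whose samples are paired 1:1 with a reference clock."""
    if not ref_ts or len(ref_ts) != len(other_ts):
        raise ValueError("streams must be non-empty and paired 1:1")
    offsets = [o - r for r, o in zip(ref_ts, other_ts)]
    return {
        "max_abs_offset": max(abs(d) for d in offsets),
        "drift": offsets[-1] - offsets[0],  # change in offset from first to last sample
        "flagged": [i for i, d in enumerate(offsets) if abs(d) > tolerance],
    }


# Toy timestamps in milliseconds, made up for illustration.
print(sync_report([0, 33, 66, 100], [0, 34, 68, 105], tolerance=3))
# {'max_abs_offset': 5, 'drift': 5, 'flagged': [3]}
```

Reporting drift separately from the maximum offset distinguishes a constant lag (fixable with one shift) from a clock that slowly walks away.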

Policy and simulation transfer

Convert mocap, hand trajectories, contacts, and object states into action tokens, robot-compatible targets, and imitation-learning examples.

foundation plan

Evaluation protocol is explicit.

The protocol is generated from committed metric artifacts so readers can see the exact data unit, split, task targets, leakage controls, and current limitations before comparing scores.

Data unit

One 20-frame aligned window from the public sample episode, stride 5 frames, 1,161 windows total, represented by 8,546 synchronized multimodal dimensions.

evaluation protocol
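The window count follows from the frame count, window length, and stride. Assuming windows start at frame 0 and a partial tail is dropped, floor((5,821 - 20) / 5) + 1 = 1,161, which matches the reported total; `window_starts` is a hypothetical helper.

```python
def window_starts(num_frames, window=20, stride=5):
    """Start frame of every full `window`-frame window taken every `stride` frames."""
    if num_frames < window:
        return []
    return list(range(0, num_frames - window + 1, stride))


starts = window_starts(5821)
print(len(starts), starts[:3], starts[-1])  # 1161 [0, 5, 10] 5800
```

With a 5-frame stride, consecutive 20-frame windows share 15 frames, so neighbouring windows are far from independent samples.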

Split policy

Single-episode chronological 70/30 train/test split. This avoids random future-window mixing; cross-episode generalization is measured in the later multi-episode pilot.

protocol document
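A chronological split keeps every test window after every train window. Because 20-frame windows at a 5-frame stride overlap by 15 frames, the last few train windows share frames with the first test windows; the optional `gap` below (an assumption, not something this protocol states) skips enough test windows to remove that overlap. Truncating the 70% cut toward zero is also an assumption; `chronological_split` and `overlap_gap` are hypothetical helpers.

```python
def chronological_split(num_windows, train_frac=0.7, gap=0):
    """Time-ordered split: the first train_frac of windows train, the rest test, skipping `gap`."""
    if not 0 < train_frac < 1:
        raise ValueError("train_frac must be in (0, 1)")
    cut = int(num_windows * train_frac)
    train = list(range(cut))
    test = list(range(min(cut + gap, num_windows), num_windows))
    return train, test


def overlap_gap(window=20, stride=5):
    """Windows to skip so no test window shares frames with a train window."""
    return -(-(window - stride) // stride)  # ceil((window - stride) / stride)


gap = overlap_gap(window=20, stride=5)  # 3
train, test = chronological_split(1161, train_frac=0.7, gap=gap)
print(len(train), len(test), train[-1], test[0])  # 812 346 811 815
```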

Metric contract

All 20 tasks list input, target, primary metric, baseline score, and source artifact path in the unified suite file.

task_suite_20.json
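The per-task fields listed above can be modeled as a small record with a completeness check. Field names here mirror that sentence but are hypothetical; the actual keys and layout in task_suite_20.json may differ, and `TaskContract` and `load_contracts` are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

REQUIRED_FIELDS = ("task_id", "name", "input", "target",
                   "primary_metric", "baseline_score", "source_artifact")


@dataclass(frozen=True)
class TaskContract:
    task_id: int
    name: str
    input: str
    target: str
    primary_metric: str
    baseline_score: Optional[float]
    source_artifact: str  # path to the artifact that produced the score


def load_contracts(records, expected_tasks=20):
    """Build TaskContract objects and check that IDs 1..expected_tasks each appear once."""
    contracts = []
    for rec in records:
        missing = [f for f in REQUIRED_FIELDS if f not in rec]
        if missing:
            raise ValueError(f"task record missing fields: {missing}")
        contracts.append(TaskContract(**{f: rec[f] for f in REQUIRED_FIELDS}))
    ids = sorted(c.task_id for c in contracts)
    if ids != list(range(1, expected_tasks + 1)):
        raise ValueError(f"expected task IDs 1..{expected_tasks}, got {ids}")
    return contracts
```

Failing loudly on a missing field or task ID keeps a score from being reported without its metric or source path.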

Leakage controls

Scalers fit on train windows only; future labels, target-side signals, caption/object labels, and contact labels stay on the target side unless explicitly queried.

builder script
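Fitting the scaler on train windows only means test statistics never leak into normalization. A NumPy sketch of that rule, assuming per-feature standardization (the page does not name the scaler type); `fit_scaler` and `apply_scaler` are hypothetical.

```python
import numpy as np


def fit_scaler(train_x):
    """Per-feature mean/std computed from training windows only."""
    mean = train_x.mean(axis=0)
    std = train_x.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # constant features: divide by 1 instead of 0
    return mean, std


def apply_scaler(x, mean, std):
    return (x - mean) / std


train_x = np.array([[0.0, 10.0], [2.0, 10.0]])
test_x = np.array([[4.0, 12.0]])
mean, std = fit_scaler(train_x)         # test_x is never seen here
print(apply_scaler(test_x, mean, std))  # [[3. 2.]]
```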

Audio ablation

Audio and no-audio variants are evaluated across the walkthrough-backed task contracts under the same chronological split.

audio summary
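Per-task audio contribution reduces to comparing the primary metric with and without audio while respecting each metric's direction. A sketch with hypothetical inputs; `audio_improved_tasks` is illustrative, and not counting ties as improvements is an assumption.

```python
def audio_improved_tasks(scores, higher_is_better):
    """Tasks where the with-audio variant beats the no-audio variant on the primary metric.

    scores: {task: (with_audio, without_audio)}
    higher_is_better: {task: True for accuracy-like metrics, False for error-like ones}
    """
    improved = []
    for task, (with_audio, without_audio) in scores.items():
        if higher_is_better[task]:
            better = with_audio > without_audio
        else:
            better = with_audio < without_audio
        if better:  # ties count as no improvement
            improved.append(task)
    return sorted(improved)


# Toy scores, made up for illustration.
scores = {"t1": (0.8, 0.7), "t2": (0.5, 0.6), "t3": (1.2, 1.5), "t4": (0.4, 0.4)}
direction = {"t1": True, "t2": True, "t3": False, "t4": True}
print(audio_improved_tasks(scores, direction))  # ['t1', 't3']
```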

Foundation track selection

Qwen3-Omni is the first trainable baseline, Cosmos 3 is the world-model track with a camera-pose proxy forward-dynamics contract ready for trainer work, policy models wait for robot-compatible action targets, and Xperience-native pretraining remains a later full-corpus goal.

backbone plan

Next evaluation stage

This public-sample run covers single-episode task development. The selected multi-episode Qwen3-Omni final diagnostic result is verified and meets the JSON-validity target; Cosmos3-Nano has a verified future-window compatibility package; and Cosmos3-Super has a verified base-weight JSON-task evaluation plus a fine-tuned forward-dynamics LoRA branch. The next stage is action/subtask error analysis, stronger held-out metrics, and policy-target conversion.

result comparison

Selected-128 next stressor

Before adding episodes, the suite should try `multiscale_20s10_40s20_80s40`, hierarchical action/subtask targets, label-normalized scoring, and compact raw-feature shards for unsupported tasks.

task_suite_enhancement_128.json

Public-safe scale-up gate

Future Omni, Cosmos, and policy tracks use the same episode split discipline, training metadata, held-out predictions, metrics, run report, and public-safe package gate.

scale-up status

Current experiments and next milestones.

The project shows the completed public-sample task suite and the first verified multi-episode Qwen3-Omni diagnostic pilot, then lays out the next quality-improvement and model-extension steps.

verified

Aligned Xperience-10M sample windows

5,821 frames become 1,161 synchronized 20-frame windows with an 8,546-dimensional representation.

verified

20 task contracts + 180 public results

The current release reports nine methods, grouped into four method-family panels, over the unified 20-task axes, with minimal, neural, 128-episode, Qwen3, Cosmos3, and proxy-scored rows kept source-linked.

verified

Audio contribution is measured task by task

Audio variants improve the primary metric on 6 walkthrough-backed task contracts in this single-episode setting.

verified

Four research directions are mapped by evidence type

The Ropedia directions are labeled as direct, proxy, or diagnostic coverage, plus one coded extension probe per direction.

current plan

Foundation backbones are separated by role

Qwen3-Omni stays first for held-out LoRA; Cosmos 3 is the world-model track with camera-pose proxy forward-dynamics targets ready for trainer work; OpenVLA/openpi/GR00T are policy candidates after robot-compatible action conversion; Xperience-native pretraining is the later full-corpus goal.

verified diagnostic

Qwen3-Omni and Cosmos3 series

The selected 96/16/16 episode split now has a verified Qwen3-Omni v6 package with 4,032 held-out test predictions and 99.90% JSON validity. Cosmos3-Nano has 378 held-out future-window predictions, Cosmos3-Super Reasoner has 448 held-out base-weight JSON-task predictions, and Cosmos3-Super Forward-Dynamics LoRA has 448 held-out loss records.

verified

Multi-episode pilot status is explicit

The Qwen3-Omni notes separate earlier diagnostic packages, the final 128-episode LoRA result, and the next action/subtask error-analysis pass.

verified

Figures are indexed

The visual set includes the logo, raw-sample stream thumbnails, task-suite figure, unified 20-task model radar, model-architecture figure, provenance baseline chart, and Qwen3-Omni LoRA training-flow figure.

verified

Brand assets are packaged consistently

The project logo is used consistently in the website header, favicon, README/HF cards, and social preview.

verified

Raw dataset files are not redistributed

The public project shares derived task artifacts, figures, reports, and lightweight baseline files. Raw Xperience-10M videos, HDF5 annotations, RRD visualizations, gated data, and full Qwen weights stay outside the repo.

verified

The dashboard is designed as the visual entry point

Tabs organize the sample data, 20 tasks, model method, results, research directions, and next-stage resources.

verified

Reproduction path is documented

The reproduction guide lists the public sample setup, task-suite rebuild, neural heads, figure generation, and expected outputs.

verified

Official dataset source is linked

The project keeps the official Xperience-10M dataset, public sample, dataset website, and HOMIE toolkit visible so readers can trace the data source.

Research reading path.

A newcomer should be able to move from the dataset sample to the task design, model baselines, current limitations, and scale-up plan without reading every file first.

02

Inspect one model input

Use the window table and feature manifest to see the aligned sample unit, modality sources, and leakage controls.

03

Compare minimal vs neural heads

Every task has a small interpretable baseline and a matching neural MLP head over the same feature contract and chronological split.

04

Check the scale-up gate

The multi-episode Qwen3-Omni path now has a final verified diagnostic package and public LoRA adapter. The native-pretraining plan shows how this can grow into a full-corpus research direction after action/subtask improvements and stronger task metrics.

Verified now: One public episode, 5,821 frames, 1,161 aligned windows, 8,546 dimensions, 20 unified task contracts, 12 original neural heads, and 4 direction-extension probes.
Next, no-new-episode scale: The selected 128-episode suite should next use dense/multiscale windows, hierarchical labels, and raw-feature shards before adding more episodes.
Next, error analysis: The selected 128-episode Qwen3-Omni LoRA result has a final verified diagnostic package; JSON validity meets the target, and the next pass should improve action/subtask quality.
Not redistributed: Raw videos, raw annotations, full Qwen weights, and private gated Xperience-10M data are not included in the public repo or HF bundles.

Aligned with the official dataset card.

The official Xperience-10M card describes a gated, large-scale 4D egocentric multimodal dataset. This project records that full upstream scope while focusing the implemented artifacts on one public sample episode. The source-alignment record, data/source_alignment_audit.json, keeps the upstream facts visible on the public site: 31.9 TB, about 1 PB at full scale, 12,103 episode folders (upstream metadata, not a local data inventory), the cc-by-nc-4.0 license, HOMIE Toolkit, Rerun 0.29.0, and the limited-diversity note.

Official dataset

Xperience-10M is a gated large-scale egocentric multimodal dataset for embodied AI, robotics, spatial intelligence, and world modeling.

official HF dataset

Line 1 public sample

The one-episode line builds the inspectable 20-task lab. Use Line 2 for selected-128 held-out comparison.

sample dataset

Sample streams

The raw browser is the canonical place to inspect synchronized video, embedded audio, HDF5 annotation groups, depth, pose/SLAM, mocap, IMU, calibration, and language-derived signals.

open raw browser; stream metadata

Multi-episode pilot

The selected 128-episode Qwen3-Omni LoRA v6 diagnostic run is verified with 4,032 held-out test predictions and 99.90% JSON validity. Action/subtask metrics are still weak, so this remains a baseline for error analysis.

LoRA adapter; v5/v6 comparison

Raw sample browser

The Data tab now exposes the official public sample files directly, including playable MP4 video streams and the audio track embedded in fisheye_cam0.mp4.

open raw browser; raw manifest

Data boundary

Raw MP4, HDF5, and RRD files are streamed from the official public sample source when opened here; private gated data and full Qwen weights are not redistributed in this project.

data notice

Current project subset

One public sample episode, 5,821 frames, 1,161 aligned windows, 8,546-dimensional task inputs, and direct links to the official raw sample files.

task suite; raw manifest

Covered now

Action/subtask labels, next-action prediction, temporal diagnostics, hand trajectory, contact, object relevance, caption grounding, retrieval, reconstruction, misalignment, long-horizon forecasting, interaction text, action-object relation, sensor bridging, camera sync, and transition timing.

summary metrics

Responsible use

This project is for research exploration and excludes identity recognition, surveillance, biometric profiling, sensitive-attribute inference, and safety-critical deployment.

use notes

Later milestones

Full audio-visual learning, caption generation, depth-pixel prediction, SLAM estimation, neural rendering, policy learning, cross-episode generalization, held-out Qwen3-Omni evaluation, and future Xperience-native pretraining.

native pretraining

Raw public sample browser.

Open each official Xperience-10M sample file from the project page. Video and audio use compact browser previews derived from the official MP4 files, with direct links beside them for the full raw Hugging Face sources. HDF5 and RRD files are shown with their role, size, organization, and direct source links.

fisheye_cam0.mp4

Fisheye camera 0 stream and the public sample audio source. This file can be played as video and as the embedded audio track.

video + audio

Playing a 12-second fast-start preview derived from the official raw MP4. Use the source link for the complete file.

Embedded audio preview from fisheye_cam0.mp4

Video features feed visual tasks; the embedded audio stream feeds audio ablation and acoustic feature blocks.

Sample folder organization

The official public sample is one episode folder. The task suite reads the HDF5 annotations and six synchronized MP4 streams, then writes 20-frame windows with a 5-frame stride.

xperience-10m-sample/
  annotation.hdf5
  fisheye_cam0.mp4
  fisheye_cam1.mp4
  fisheye_cam2.mp4
  fisheye_cam3.mp4
  stereo_left.mp4
  stereo_right.mp4
  visualization.rrd
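The window count follows directly from the frame count, window length, and stride. A short sketch, assuming windows must fit entirely inside the episode (no padding), which is consistent with the 1,161 windows reported for 5,821 frames:

```python
def window_starts(num_frames: int, window: int = 20, stride: int = 5) -> list[int]:
    """Start frames of every full window that fits inside the episode.

    ASSUMPTION: no padding, so a window may not run past the last frame.
    """
    if num_frames < window:
        return []
    return list(range(0, num_frames - window + 1, stride))


starts = window_starts(5821)
print(len(starts))                  # 1161
print(starts[-1], starts[-1] + 19)  # 5800 5819  (last window's first and last frame)
```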

annotation.hdf5 group map

The raw HDF5 is a binary container, so the browser shows its organization rather than loading the whole file into memory.

calibration: Camera intrinsics/extrinsics and static alignment values.
caption: JSON text with actions, objects, interactions, segments, and global summary.
depth: Depth maps and confidence channels aligned to the episode timeline.
full_body_mocap: Full-body joint and contact signals for human motion modeling.
hand_mocap: Left and right hand joint trajectories used by forecast tasks.
imu: Accelerometer and gyroscope streams sampled above video rate.
metadata: Episode metadata, frame indexing, and source bookkeeping.
slam: Camera trajectory, pose, and sparse SLAM point-cloud information.
video: Video metadata and per-frame alignment information.

Stream-to-feature use

The source streams are summarized once here, next to the playable files and HDF5 map.

Video: Six synchronized MP4 streams feed RGB, fisheye, stereo, and frame-statistic task inputs.
Audio: The embedded fisheye_cam0 audio feeds acoustic feature blocks and audio ablations.
Depth: Depth maps and confidence channels provide geometry signals for spatial and reconstruction probes.
Pose / SLAM: Trajectory, camera pose, and sparse map values become position and orientation features.
Motion capture: Body and hand joint tracks support motion, contact, hand forecast, and policy-style targets.
Inertial: Accelerometer and gyroscope streams become wearable-motion statistics.
Language: Object tags and caption-derived labels become semantic targets; raw caption text remains governed by the official sample.

Small derived modality thumbnails remain in modality_atlas.json; raw MP4, HDF5, and RRD files are not redistributed.

Ropedia Xperience-10M Unified 20-Task Suite.

The suite connects synchronized multimodal windows to 20 task contracts in one table, one radar surface, and one source-linked result matrix. Historical filenames remain only for stable artifact links.

Infographic showing Ropedia Xperience-10M task families with compact modality cards and visible thumbnails

Unified plus split radars

The unified radar keeps all nine methods in one comparison board, but groups them into small-multiple panels so each method family can be read directly. The split radars separate the 1-episode Minimal/NN baseline comparison from the 128-episode metadata/raw, Qwen3-Omni v6 LoRA, and Cosmos3-Super/Nano comparison.

Metric normalization

Higher-is-better metrics are normalized to 0-1; lower-is-better metrics are converted to best/value within the task. The SVG uses sqrt(normalized score) only for visual radius, while raw values, linear normalized scores, status reasons, sources, and compact proxy notes remain in the JSON mirrors.
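A sketch of the per-task normalization described above. The lower-is-better rule (best/value) comes from the text; the higher-is-better rule here (value divided by the task's best value, clipped at 0 so a negative R2 plots at the center) is an assumption, since the exact 0-1 mapping is not stated:

```python
import math


def normalize_task(values: dict[str, float], higher_is_better: bool) -> dict[str, float]:
    """Map one task's raw metric values (method -> value) to 0-1 radar scores."""
    if higher_is_better:
        # ASSUMPTION: value / best, clipped at 0 for negative metrics such as R2.
        best = max(values.values())
        if best <= 0:
            return {m: 0.0 for m in values}
        return {m: max(0.0, v / best) for m, v in values.items()}
    # Lower is better (MAE, MPJPE): best / value, with best = smallest value.
    # Assumes non-negative values; a perfect 0 error scores 1.0.
    best = min(values.values())
    return {m: best / v if v > 0 else 1.0 for m, v in values.items()}


def radar_radius(score: float) -> float:
    """Visual radius only; the value to cite stays the raw metric."""
    return math.sqrt(score)


# Ego-motion forecast MAE from the extension probes (lower is better).
scores = normalize_task({"minimal": 0.1989, "neural": 0.0989}, higher_is_better=False)
print({m: round(s, 4) for m, s in scores.items()})  # {'minimal': 0.4972, 'neural': 1.0}
```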

Score/proxy audit

The matrix has 180/180 scored method-task records: 174 direct scores and 6 compact-proxy scores. The audit records the source artifact, metric key, and proxy reason for each marked cell.
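A completeness check like the one the audit implies can be sketched as follows. The record layout (method, task, and status keys) and the method/task names are hypothetical; the real schema of task_method_20_result_matrix.json may differ:

```python
from collections import Counter


def audit_matrix(records: list[dict], methods: list[str], tasks: list[str]) -> Counter:
    """Require every method x task cell exactly once, then count score statuses."""
    cells = Counter((r["method"], r["task"]) for r in records)
    expected = {(m, t) for m in methods for t in tasks}
    missing = expected - set(cells)
    unknown = set(cells) - expected
    duplicated = [cell for cell, n in cells.items() if n > 1]
    if missing or unknown or duplicated:
        raise ValueError(f"missing={sorted(missing)} unknown={sorted(unknown)} duplicated={duplicated}")
    return Counter(r["status"] for r in records)


methods = [f"method_{i}" for i in range(9)]   # hypothetical names for the 9 methods
tasks = [f"task_{j:02d}" for j in range(20)]  # hypothetical names for the 20 tasks
records = [{"method": m, "task": t, "status": "direct"} for m in methods for t in tasks]
for record in records[40:46]:                 # six hypothetical proxy cells
    record["status"] = "compact_proxy"
counts = audit_matrix(records, methods, tasks)
print(len(records), counts["direct"], counts["compact_proxy"])  # 180 174 6
```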

Unified grouped 20-task radar comparing Minimal, Neural MLP, 128-episode metadata/raw baselines, Qwen3-Omni, and Cosmos3 with task names, method details, 20-record counts, score counts, and proxy notes

1-Episode 20-Task Radar

Minimal and Neural MLP are both scored on all 20 public-sample task contracts in one enlarged panel without 128-episode methods competing for attention.

Single-episode 20-task radar comparing Minimal and Neural MLP across all 20 scored task axes

128-Episode 20-Task Radar

Seven aligned 128-episode methods cover all 20 axes across metadata/text, raw-feature, and foundation-model panels. Proxy axes stay labeled in the SVG and JSON.

128-episode grouped 20-task radar comparing raw-feature baselines, metadata baselines, Qwen3-Omni, and Cosmos3 series with explicit score counts

From raw episode to research artifacts.

Every script works from one data contract: aligned multimodal windows, explicit labels, cached feature extraction, and a manifest that makes omitted modalities visible.

Verified Xperience-10M multimodal pipeline diagram

Qwen3-Omni LoRA training flow

Raw valid episodes move through split validation, parallel export, video/audio/text formatting, sensor-bridge features, LoRA training, and sealed held-out evaluation.

What the figure represents

It documents the selected 128-episode final diagnostic result and the action/subtask improvement path for stronger held-out metrics.

Detailed Qwen3-Omni LoRA training pipeline from raw Xperience-10M episodes to adapter outputs, predictions, metrics, and reports

What this project enables

It demonstrates the full development loop: reading Xperience-10M sample data, aligning modalities, converting them into model-ready windows, defining meaningful tasks, producing metrics, and packaging artifacts for continued research.

What still needs more data

General embodied-intelligence model quality requires many episodes and held-out episode splits; the public sample is the development harness for that next stage.

Results by evidence line.

Read results in this order: choose the line, open the matching radar, inspect the matrix row, then check proxy flags before interpreting totals.

01 Choose the line

Use 1 episode for the task lab. Use 128 episodes for the selected comparison surface.

02 Open the radar

The single-episode radar shows Minimal vs Neural MLP. The 128-episode radar shows metadata/raw baselines, Qwen3-Omni v6, Cosmos3-Super, and Cosmos3-Nano.

03 Inspect the matrix

Each score keeps method, task, metric key, source artifact, and status.

04 Check proxy cells

Six selected-128 scores are compact proxies and stay marked in the audit.

1 episode results

Task-lab evidence

Minimal and Neural MLP heads are both scored on all 20 public-sample task contracts. All 40 scores are direct task-target metrics.

best read as

A reproducible public task suite and baseline behavior check.

2 methods · 20 task axes · 40/40 scores
Open 1-episode radar
128 episode results

Scale-up evidence

Metadata/raw baselines, Qwen3-Omni v6 LoRA, Cosmos3-Super Reasoner, and Cosmos3-Nano Future Window use the aligned 128-episode surface. It has 134 direct scores plus 6 compact-proxy scores.

best read as

A same-split comparison table with explicit source and proxy status.

7 methods · 20 task axes · 140/140 scores
Open 128-episode radar
Line | Methods | Tasks | Scored records | Direct scores | Proxy scores | Machine-readable source
--- | --- | --- | --- | --- | --- | ---
1 sample episode | 2 | 20 | 40/40 | 40 | 0 | single-episode radar JSON
128 selected episodes | 7 | 20 | 140/140 | 134 | 6 (compact-proxy scores, each source-linked and reasoned) | 128-episode radar JSON
Total public matrix | 9 | 20 | 180/180 | 174 | 6 | two-line result summary JSON
Line | Block | Methods | Records | Evidence type | Primary artifact
--- | --- | --- | --- | --- | ---
1 sample episode | Task-head baselines | Minimal; Neural MLP | 40 direct | Direct target metrics on the public sample windows. | single-episode radar JSON
128 selected episodes | Aligned baseline heads | Metadata simple/NN; raw-feature simple/NN | 74 direct + 6 compact-proxy | Processed-target metrics where available; proxy cells remain source-linked. | score/proxy audit
128 selected episodes | Qwen3-Omni series | Qwen3-Omni v6 LoRA | 20 direct | Verified selected-128 LoRA and task-specific probe artifacts. | model comparison JSON
128 selected episodes | Cosmos3 series | Cosmos3-Super Reasoner; Cosmos3-Nano Future Window | 40 direct | Verified reasoner and future-window public-safe artifacts; forward-dynamics LoRA is a separate adapter artifact outside the 20-task method rows. | model comparison JSON
180-result table

All methods x all 20 tasks, in one source-linked table.

Each cell shows the raw metric value to cite, the normalized radar value, the metric key, and a direct/proxy badge. The table is generated from the same task_method_20_result_matrix.json used by the radar, so values stay aligned across GitHub, the website, and Hugging Face mirrors.

180 method-task records
9 method rows
20 task columns
174 direct scores
6 compact-proxy scores
Direct task score: A metric computed against the task target directly. This is the preferred score type in the 20-task matrix.
Compact-proxy score: A bounded proxy metric used when a direct raw target is not publicly available. It stays explicit so readers do not over-read it.
Raw value (citable): The original metric value emitted by the runner or verified package. This is the value to cite.
Normalized value (radar only): A 0-1 plotting value used only to draw comparable radar polygons across metrics with different scales.
Method: One named method family in the matrix, such as Minimal, 128ep Raw NN, Qwen3-Omni v6, or Cosmos3-Super.
Line: A reading lane for a group of results: Line 1 is one public sample episode; Line 2 is the selected-128 held-out comparison.
Records: One method evaluated on one task. 9 methods x 20 tasks gives 180 public result records.
Raw and normalized scores for 9 methods across 20 Xperience-10M tasks.

Best-practice reading rule: compare methods within the same evidence line first, then use the proxy badges before interpreting cross-method totals. Six compact-proxy cells are intentionally visible rather than blended into direct raw-target scores.

One episode becomes a benchmark contract

The public sample is converted into 5,821 frames, 1,161 aligned 20-frame windows, and an 8,546-dimensional representation for repeatable task evaluation.

research takeaways

Chronological split exposes class shift

All-feature action reaches 0.9829 macro-F1 on its local split, while the chronological action head in the core task suite is 0.0500 macro-F1 with four unseen later action labels.

takeaways
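Why a chronological split can collapse macro-F1: labels that first appear late in the episode never reach the training side. A toy sketch with a hypothetical action sequence and an assumed single cut point (the suite's exact split fractions are not stated here):

```python
def chronological_split(labels: list[str], train_frac: float = 0.7) -> tuple[list[str], list[str]]:
    """Earlier windows train, later windows test; no shuffling across time.

    ASSUMPTION: a single cut point at train_frac of the windows.
    """
    cut = int(len(labels) * train_frac)
    return labels[:cut], labels[cut:]


def unseen_test_labels(train: list[str], test: list[str]) -> set[str]:
    """Labels that only occur after the cut, so no head saw them during training."""
    return set(test) - set(train)


# Hypothetical per-window action labels for one coffee-making episode.
actions = ["grind beans"] * 2 + ["pour water"] * 2 + ["pour coffee"] * 2 + ["stir"] * 2
train, test = chronological_split(actions, train_frac=0.5)
print(sorted(unseen_test_labels(train, test)))  # ['pour coffee', 'stir']
```

Every test window here carries a label the classifier never trained on, so its per-class F1 is zero regardless of feature quality.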

Neural heads help dynamics

Hand MPJPE improves from 0.8647 to 0.1079; temporal-order F1 rises from 0.5400 to 0.8520; misalignment F1 rises from 0.5052 to 0.7153.

metrics

Retrieval and reconstruction remain open

Ridge/cosine retrieval remains stronger than the neural projection here, and cross-modal feature reconstruction still has negative R2.

retrieval metrics
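The retrieval metric is mean reciprocal rank (MRR). A minimal sketch of the cosine-ranking and MRR step, assuming query i's correct match is candidate i (the synchronized window). The ridge projection that maps one modality into the other before ranking is omitted, and ties are resolved in the query's favor:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_reciprocal_rank(queries: list[list[float]], candidates: list[list[float]]) -> float:
    """Average of 1/rank of the true (index-matched) candidate for each query."""
    total = 0.0
    for i, query in enumerate(queries):
        sims = [cosine(query, c) for c in candidates]
        # rank = 1 + number of candidates scored strictly above the true match
        rank = 1 + sum(s > sims[i] for s in sims)
        total += 1.0 / rank
    return total / len(queries)


queries = [[1.0, 0.0], [1.0, 0.5]]
candidates = [[1.0, 0.0], [0.0, 1.0]]
print(mean_reciprocal_rank(queries, candidates))  # 0.75 (ranks 1 and 2)
```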

Scale means held-out episodes

The next credible model-quality unit is a held-out multi-episode pilot across different sessions, not more adjacent windows from one sample.

scale-up status

Small baselines, no hidden machinery.

Motion-only and current all-feature classifiers use lightweight heads so the comparison stays readable on a laptop and easy to inspect. The neural run keeps the same features and splits, then swaps in PyTorch MLP heads.

Motion-only action

0.9688 macro-F1, 18 classes

Current all-feature action

0.9829 macro-F1, 8,546 dimensions

Motion-only subtask

0.9528 macro-F1, 14 classes

Current all-feature subtask

0.9173 macro-F1, chronological caveats
Macro-F1 comparison chart
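Macro-F1 averages per-class F1 with equal weight, which is why a few rare or unseen classes can move it sharply. A stdlib sketch, assuming the label set is the union of true and predicted labels (also scikit-learn's default for average='macro'):

```python
def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


y_true = ["pour", "pour", "stir", "stir"]
y_pred = ["pour", "pour", "pour", "stir"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333  (pour F1 = 0.8, stir F1 = 0.6667)
```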

Neural MLP heads, same task contracts.

The neural baseline uses small PyTorch MLP classifiers/regressors on the same 8,546-dimensional windows, chronological splits, and leakage filters. This isolates the value of a nonlinear head before moving to heavier Qwen/Omni experiments.

Neural hand forecast

0.1079 MPJPE, down from 0.8647 minimal

Neural temporal order

0.8520 F1, adjacent-window diagnostic

Neural misalignment

0.7153 F1, shifted motion/visual/audio pairs

Neural cross-modal retrieval

0.1300 MRR; ridge remains stronger here
Charts: neural MLP episode task scores; minimal versus neural MLP episode task scores.
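The misalignment task contrasts synchronized pairs with shifted ones. A hedged sketch of building such pairs from per-window feature lists; the shift size and the pairing rule are assumptions, not documented values:

```python
def build_alignment_pairs(motion: list, visual: list, shift: int) -> list[tuple]:
    """Return (motion, visual, label) tuples: 1 = synchronized, 0 = shifted.

    ASSUMPTION: a negative pairs window i's motion features with window
    i + shift's visual features; indices past the episode end are skipped.
    """
    if len(motion) != len(visual):
        raise ValueError("motion and visual must have one entry per window")
    if not 0 < shift < len(motion):
        raise ValueError("shift must be positive and shorter than the episode")
    positives = [(m, v, 1) for m, v in zip(motion, visual)]
    negatives = [(motion[i], visual[i + shift], 0) for i in range(len(motion) - shift)]
    return positives + negatives


motion = ["m0", "m1", "m2", "m3"]
visual = ["v0", "v1", "v2", "v3"]
pairs = build_alignment_pairs(motion, visual, shift=2)
print(pairs[4:])  # [('m0', 'v2', 0), ('m1', 'v3', 0)]
```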

The walkthrough-backed tasks organized into four research directions.

Each task is mapped as direct, proxy, or diagnostic evidence for the Ropedia research tracks. The mapping uses two current baselines: minimal interpretable heads and neural MLP heads over the same feature contract.

partially implemented

A. Human Modeling & Motion Understanding

Direct evidence comes from hand trajectory forecasting and contact prediction; action and object relevance are supporting proxies.

2 direct · 2 proxy · 0 diagnostic
proxy tasks only

B. 3D/4D Reconstruction & Neural Rendering

Cross-modal retrieval, modality reconstruction, and misalignment detection check reconstruction prerequisites, not full geometry.

0 direct · 2 proxy · 1 diagnostic
strongest implemented

C. Egocentric Vision & Interaction

Action, subtask, transition, next-action, object, caption, order, and alignment tasks directly stress egocentric understanding.

6 direct · 2 proxy · 3 diagnostic
early proxy tasks

D. Scene Reconstruction & World Modeling

Current probes cover task state, object relevance, retrieval, reconstruction, temporal order, and alignment but no persistent map yet.

0 direct · 6 proxy · 3 diagnostic
Coverage of the original Xperience-10M tasks across four research directions

Baseline 1: minimal heads

Softmax, logistic, ridge, and retrieval heads keep every input/output contract readable. They are the first sanity check for whether a task is well-posed.

Baseline 2: neural MLP heads

Small PyTorch MLP classifiers/regressors reuse the same features and splits. They test nonlinear gains before heavier Omni fine-tuning.

Unified 20-task evidence and provenance.

All 20 tasks live in the same task table, task-card grid, radar, and 180-record result matrix. Historical result paths are retained only for exact provenance links.

Unified task artifact package

The public task package has one 20-task JSON, per-task metrics, prediction/rank files, Markdown summaries, radar charts, and the 180-record method-task matrix.

Open unified 20-task JSON · Open 180-record matrix · Open unified radar

One setup, one task surface

Every task uses the same 20-frame window unit, 5-frame stride, 8,546-dimensional feature manifest, chronological split discipline, and minimal/neural comparison pattern unless a task-specific leakage rule removes target-side features.
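A leakage rule can be expressed as a column filter over the feature manifest. A sketch that assumes feature names carry their source as a prefix; the names below are hypothetical, and only the exclusion of camera translation and captions comes from the ego-motion task card:

```python
def leakage_filter(feature_names: list[str], blocked_prefixes: tuple[str, ...]) -> list[int]:
    """Indices of feature columns a task may use after dropping target-side blocks.

    ASSUMPTION: names look like "<source>/<block>/..."; the real manifest may differ.
    """
    return [i for i, name in enumerate(feature_names) if not name.startswith(blocked_prefixes)]


names = ["video/mean_rgb", "imu/gyro_std", "slam/translation/x", "slam/rotation/qw", "caption/objects"]
kept = leakage_filter(names, blocked_prefixes=("slam/translation/", "caption/"))
print([names[i] for i in kept])  # ['video/mean_rgb', 'imu/gyro_std', 'slam/rotation/qw']
```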

Historical provenance JSON and historical provenance chart remain available for exact source tracing.

Four Xperience-10M research-direction extension probes with minimal and neural metrics
A / motion

Body and Hand Motion Intensity

Case: classify fast reach/pour windows as high motion and steady holding windows as low motion.

Input: non-mocap video, depth, pose, IMU, SLAM, calibration, and language features.

Output: high_motion or low_motion.

0.7827 minimal macro-F1 · 0.7986 neural macro-F1
B / views

Multi-View Consistency Retrieval

Case: retrieve the synchronized stereo-left window from a fisheye-camera query.

Input: fisheye_cam0 video features against stereo_left candidate features.

Output: ranked synchronized view candidates.

0.5534 minimal MRR · 0.3469 neural MRR
C / phase

Action Phase Progress Estimation

Case: estimate whether a Pour coffee window is near the start, middle, or end of its action segment.

Input: non-caption multimodal features.

Output: 0-to-1 progress inside the current action.

0.3416 minimal MAE · 0.3038 neural MAE
D / world

Short-Horizon Ego-Motion Forecasting

Case: predict how the camera translation changes over the next 20 frames.

Input: current sensors excluding camera translation and captions.

Output: future camera-translation delta vector.

0.1989 minimal MAE · 0.0989 neural MAE

What changed

The four research directions now have coded extension probes, prediction/rank CSVs, JSON metrics, a Markdown summary, and a website chart generated from real sample-window features.

What still needs scale

A full research result still needs many Xperience-10M episodes, held-out episode splits, stronger encoders, and direction-specific models such as body priors, renderers, or persistent scene graphs.

The baseline task heads share four head families.

The diagram separates the shared episode-window representation from the task-specific heads, so the task contracts stay readable before scaling to larger models.

Verified minimal and neural architecture diagram for Ropedia Xperience-10M task heads

Interactive task walkthrough.

Each task uses a common research name and a concrete case study, then opens into the input, middle modules, output, modality evidence, metric, and current limitation.

Representative sample modality for the selected task
Action Recognition (Egocentric Action Recognition)

Input: inspect the 20-frame multimodal window before choosing the target.

supervised multiclass classifier

Action Recognition

In the coffee-making sample, a pouring window maps to the current action label.

Metric: macro-F1. Minimal 0.0500; neural MLP 0.0148.

Current limitation: single-episode chronological split.

Task cards and metrics.

All 20 task contracts are shown together with readable research names, representative modality thumbnails, explicit input-process-output contracts, and verified minimal versus neural scores. Rich interactive walkthroughs are available for the first 12 task cards; the remaining cards use the same unified task JSON contract.

Assigned visual language for the 20 tasks.

The overall generated atlas keeps the icon family visible, while each task card below uses its own crisp assigned SVG for reliable loading and public mirrors.

Generated 4 by 5 atlas of the 20 Ropedia Xperience-10M task icons

Every model input has a source.

The point is not hidden complexity. Every input group maps back to a source modality and a manifest entry.

All modality source chart
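A manifest check in the spirit of this rule might look like the sketch below. The manifest layout, block names, and per-block sizes are hypothetical (chosen only so they sum to 8,546); the allowed sources mirror the stream-to-feature list and HDF5 groups on the Data tab:

```python
KNOWN_SOURCES = {"video", "audio", "depth", "pose_slam", "mocap", "imu", "language", "calibration"}


def check_manifest(blocks: dict[str, dict], expected_dim: int) -> list[str]:
    """Return problems found in a feature manifest; an empty list means it passes.

    ASSUMPTION: each block is {"source": <modality>, "dim": <column count>}.
    """
    problems = [
        f"{name}: unknown source {spec.get('source')!r}"
        for name, spec in blocks.items()
        if spec.get("source") not in KNOWN_SOURCES
    ]
    total = sum(spec.get("dim", 0) for spec in blocks.values())
    if total != expected_dim:
        problems.append(f"dimension total {total} != expected {expected_dim}")
    return problems


manifest = {  # hypothetical blocks and sizes
    "rgb_stats": {"source": "video", "dim": 8000},
    "acoustic": {"source": "audio", "dim": 500},
    "gyro_accel": {"source": "imu", "dim": 46},
}
print(check_manifest(manifest, 8546))  # []
```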

Diagnostics separate memorization from signal.

The charts make the main lesson visible: within-episode supervised labels are easy under some splits, while retrieval, grounding, forecasting, and alignment remain the useful probes.

Charts: episode task-suite scores; cross-modal retrieval; neural MLP task scores; minimal versus neural scores; measured audio delta across walkthrough-backed task contracts.

Open the single-episode explorer to inspect window-level labels, predictions, modality statistics, object labels, and diagnostic scores. The audio ablation summary records the task-by-task audio contribution.

Glossary for overloaded terms.

These are the terms readers most often confuse when moving between the repo, website, Hugging Face mirrors, result matrices, and model-package cards. The full glossary is mirrored as Markdown and JSON.

Scope: separate data, result lanes, and mirrors

Evidence lines, public-safe artifacts, and gated upstream data are different objects. The glossary keeps those boundaries visible.

Results: read scores by source type

Direct scores, compact-proxy scores, gap audits, and task-method records should not be interpreted as the same kind of evidence.

Models: keep branches distinct

Minimal/NN heads, metadata/raw baselines, Qwen3-Omni v1-v6, Cosmos3-Super, Cosmos3-Nano, LoRA adapters, and full-parameter gates each mean something specific.

Term | Meaning here | Use it for | Do not confuse with
--- | --- | --- | ---
Evidence line | A reading lane for a group of results. | Line 1 is the public sample episode; Line 2 is selected-128 held-out comparison. | Qwen v1-v6 run versions.
Public sample episode | The one fully inspectable official sample episode. | Raw-file browsing, task construction, single-episode baselines. | The selected-128 comparison rows.
Selected 128 episodes | Public-safe derived features linked to official gated episode paths. | Same-split Line 2 baseline/model comparisons. | Redistributed raw MP4/HDF5/RRD files.
20-frame window | A fixed short clip slice used as a model input unit. | Feature rows, labels, tasks, and many baseline heads. | A full episode.
Task-method record | One method evaluated on one task. | The 9 x 20 public matrix, now 180 scored records. | A single prediction row.
Direct score | A metric computed against the task target directly. | Primary interpretation in the result matrix. | Compact-proxy score.
Compact-proxy score | A bounded proxy when the direct raw target is not public. | Explicitly marked cells in the gap audit and matrix. | A direct target measurement.
Raw metric value | The original value emitted by the runner or verified package. | The value to cite from the 180-result table. | Normalized radar value.
Normalized radar value | A 0-1 plotting value used only for comparable radar polygons. | Visual comparison across metrics with different scales. | The raw metric value to cite.
Minimal baseline | A simple non-neural task head; the "minimum" reference row in casual wording. | Single-episode lower-complexity comparison. | Selected-128 Simple baseline rows.
Simple baseline | A non-neural selected-128 baseline family. | Metadata/text and raw-feature 128-episode comparisons before NN/foundation rows. | The single-episode Minimal baseline.
Qwen3-Omni v6 | The current public Qwen 20-task row. | Qwen3-Omni LoRA plus task-specific probes. | All Qwen v1-v6 experiments.
Cosmos3-Super | The larger Cosmos-style branch. | Reasoner diagnostics and a verified forward-dynamics LoRA branch. | Cosmos3-Nano Future Window.
LoRA adapter | Lightweight trainable adapter weights. | Public model-branch artifacts when verified. | Full base-model weights.
HF artifact dataset | Hugging Face dataset repo for derived evidence. | Reports, metrics, website JSON, sanitized result packages. | The upstream Xperience-10M dataset.
Mirror parity | A check that public copies match source files. | Verifying GitHub, website, and HF mirrors. | A model-quality metric.

Research artifacts for the next experiments.

Metrics, predictions, manifests, lightweight model weights, and derived window artifacts are organized so the project can be inspected, extended, and scaled before rerunning the full pipeline. Raw Xperience-10M data and Qwen weights are not redistributed.

Download: find the right public surface

Open GitHub, HF Space, artifact dataset, baseline models, or consolidated weights/results without guessing.

Open public surfaces
Verify: check result parity

Use validators, source alignment, mirror parity, and live URL/hash checks before trusting a number.

Open checks
Reproduce: run the task pipeline

Start from scripts, windows, feature manifests, task contracts, and minimal/neural result outputs.

Open commands
Scale: continue model work

Use Qwen3-Omni v6, Cosmos3-Super/Nano packages, the 128-episode feature index, and foundation-model plans for the next runs.

Open scale-up
Research artifacts

From one episode to task heads

Start with the files that define the sample windows, modality inputs, task contracts, metrics, walkthroughs, and research-direction mapping.

Task results

Every task definition, split detail, feature dimension, and minimal/neural metric in one project output.

task results

Windows table

Window start/end frames and aligned action/subtask labels for the public sample episode.

window table

Feature inputs

Source map for the current modality inputs used by the task suite.

feature inputs

Neural MLP task results

Per-task PyTorch MLP metrics, predictions, histories, and checkpoints for the unified task contracts, with historical result-bundle paths retained for provenance.

neural MLP outputs

Four-direction taxonomy

Maps the walkthrough-backed task contracts to the four research tracks: human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling.

research direction outputs

Direction extension probes

Four coded probes, one per research direction, with minimal and neural metrics plus prediction/rank CSVs.

extension probe outputs

Task walkthroughs

Case studies for the walkthrough-backed task contracts, including input, middle process modules, output, metric, limitation, and task-player data.

walkthrough outputs

Audio ablation and raw upgrade

All 72 task/variant rows comparing current audio, no audio, raw audio, replacement, and combined-input settings.

audio ablation outputs

Single-episode explorer

Interactive window-level view of labels, predictions, modality statistics, object labels, and diagnostics.

open explorer

Cross-modal retrieval

The strongest self-supervised signal from the single episode.

retrieval metrics

Qwen3-Omni diagnostic run is verified.

The selected pilot uses 128 source-balanced episodes across 128 different session UUIDs. The latest v6 held-out package is verified, and its weak metrics define the next structured-output and error-analysis pass.

Selection

128 complete episodes selected from 128 unique top-level sessions, balanced across episode-size bands and split 96/16/16 for train/val/test.

source/feature index
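A sketch of a seeded, session-disjoint 96/16/16 split. The episode record fields and band names are hypothetical, and size-band balancing is approximated by shuffling within each band and interleaving bands before slicing; the project's actual selection rule may differ:

```python
import random
from collections import Counter


def split_by_session(
    episodes: list[dict], seed: int = 0, sizes: tuple[int, int, int] = (96, 16, 16)
) -> dict[str, list[dict]]:
    """Train/val/test lists with one episode per session, so no session crosses splits."""
    if any(n > 1 for n in Counter(ep["session"] for ep in episodes).values()):
        raise ValueError("expected exactly one episode per session")
    if sum(sizes) != len(episodes):
        raise ValueError("split sizes must cover every episode")
    rng = random.Random(seed)  # fixed seed keeps the held-out split reproducible
    bands: dict[str, list[dict]] = {}
    for ep in episodes:
        bands.setdefault(ep["size_band"], []).append(ep)
    queues = [bands[name] for name in sorted(bands)]
    for queue in queues:
        rng.shuffle(queue)
    # Round-robin across bands so each slice mixes size bands.
    ordered: list[dict] = []
    while any(queues):
        for queue in queues:
            if queue:
                ordered.append(queue.pop())
    n_train, n_val, _ = sizes
    return {
        "train": ordered[:n_train],
        "val": ordered[n_train:n_train + n_val],
        "test": ordered[n_train + n_val:],
    }


band_names = ["small", "medium", "large", "xlarge"]  # hypothetical size bands
episodes = [{"session": f"s{i:03d}", "size_band": band_names[i % 4]} for i in range(128)]
splits = split_by_session(episodes)
print({name: len(eps) for name, eps in splits.items()})  # {'train': 96, 'val': 16, 'test': 16}
```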

Transfer

Download raw episodes only from official gated sources, exclude visualization.rrd, validate files, then stage them for training.
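A sketch of the pre-staging check for one downloaded episode, assuming each episode folder has the same layout as the public sample folder. Here "validate" is reduced to present and non-empty; the real transfer step may also compare sizes or hashes:

```python
import tempfile
from pathlib import Path

# File names from the public sample folder layout; visualization.rrd is excluded on purpose.
REQUIRED = {
    "annotation.hdf5",
    "fisheye_cam0.mp4", "fisheye_cam1.mp4", "fisheye_cam2.mp4", "fisheye_cam3.mp4",
    "stereo_left.mp4", "stereo_right.mp4",
}
EXCLUDED_SUFFIXES = {".rrd"}


def files_to_stage(episode_dir: Path) -> list[Path]:
    """Required, non-empty files for one episode; raises if anything is missing or empty."""
    present = {
        p.name: p
        for p in episode_dir.iterdir()
        if p.is_file() and p.suffix not in EXCLUDED_SUFFIXES
    }
    missing = REQUIRED - set(present)
    empty = [n for n in REQUIRED & set(present) if present[n].stat().st_size == 0]
    if missing or empty:
        raise ValueError(f"{episode_dir.name}: missing={sorted(missing)} empty={sorted(empty)}")
    return sorted(present[n] for n in REQUIRED)


with tempfile.TemporaryDirectory() as tmp:
    episode = Path(tmp)
    for name in REQUIRED | {"visualization.rrd"}:
        (episode / name).write_bytes(b"\x00")
    print(len(files_to_stage(episode)))  # 7
```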

Current LoRA artifact

The current Qwen3-Omni LoRA artifact is the verified v6 selected 128-episode diagnostic adapter. The v5 row remains pinned as the prior release, and the 1-episode Qwen entry is only a sensor-adapter smoke test.

model groups

No-new-episode suite push

The next suite push does not need more episodes first: use `multiscale_20s10_40s20_80s40`, hierarchical action/subtask targets, and raw-feature shards while keeping the held-out split fixed.

task_suite_enhancement_128.json
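The scale spec name suggests one window/stride pair per scale. A sketch under that assumption ("20s10" read as a 20-frame window with a 10-frame stride), counting full windows per scale for the 5,821-frame sample episode:

```python
import re


def parse_multiscale(spec: str) -> list[tuple[int, int]]:
    """Parse "<W>s<S>" tokens into (window, stride) pairs.

    ASSUMPTION: W is the window length and S the stride, both in frames.
    """
    pairs = [(int(w), int(s)) for w, s in re.findall(r"(\d+)s(\d+)", spec)]
    if not pairs:
        raise ValueError(f"no <window>s<stride> tokens in {spec!r}")
    return pairs


def windows_per_scale(num_frames: int, spec: str) -> dict[tuple[int, int], int]:
    """Full (unpadded) windows each scale yields for one episode."""
    return {
        (w, s): (num_frames - w) // s + 1 if num_frames >= w else 0
        for w, s in parse_multiscale(spec)
    }


print(windows_per_scale(5821, "multiscale_20s10_40s20_80s40"))
# {(20, 10): 581, (40, 20): 290, (80, 40): 144}
```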

Backbone tracks

Qwen3-Omni uses a separate LoRA model repo; Cosmos3-Nano remains a compatibility package; Cosmos3-Super now has a verified forward-dynamics LoRA artifact with weights in a dedicated model repo.

Cosmos3-Super weights

Native foundation model

The long-term goal is a full-corpus Xperience Embodied Foundation Model trained on synchronized perception, geometry, motion, inertial, audio, and language streams after smaller scaling stages validate the approach.

pretraining plan

Reproduce the suite.

Raw Xperience-10M data is not redistributed here. The reproduction guide states the commands, expected outputs, exact-match reproduction record, and multi-episode requirements.

Reproducibility guide

Human-readable commands, expected artifacts, and current scope for the public single-episode pipeline.

reproducibility guide

Reproducibility matrix

Machine-readable command matrix covering sample download, baselines, the unified 20-task suite, figures, and validation.

reproducibility matrix

Exact-match reproduction record

The last metric rebuild reproduced the public-sample outputs from a fresh cache and matched the committed metrics.

reproduction audit
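An exact-match check can be framed as a recursive diff between the committed and rebuilt metric JSON. A sketch; the metric keys are illustrative, the second MPJPE value is deliberately perturbed, and the tolerance option is an assumption rather than the project's rule:

```python
import json
import math


def diff_metrics(committed, rebuilt, path: str = "", tol: float = 0.0) -> list[str]:
    """Every path where two metric trees disagree; an empty list means they match.

    tol=0.0 demands identical numbers; a small tol would allow nondeterministic kernels.
    """
    if isinstance(committed, dict) and isinstance(rebuilt, dict):
        out = []
        for key in sorted(committed.keys() | rebuilt.keys()):
            if key not in committed or key not in rebuilt:
                out.append(f"{path}/{key}: present on one side only")
            else:
                out.extend(diff_metrics(committed[key], rebuilt[key], f"{path}/{key}", tol))
        return out
    if isinstance(committed, list) and isinstance(rebuilt, list):
        if len(committed) != len(rebuilt):
            return [f"{path}: length {len(committed)} != {len(rebuilt)}"]
        out = []
        for i, (a, b) in enumerate(zip(committed, rebuilt)):
            out.extend(diff_metrics(a, b, f"{path}[{i}]", tol))
        return out
    numeric = all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in (committed, rebuilt))
    if numeric and math.isclose(committed, rebuilt, rel_tol=0.0, abs_tol=tol):
        return []
    return [] if committed == rebuilt else [f"{path}: {committed!r} != {rebuilt!r}"]


committed = json.loads('{"action": {"macro_f1": 0.05}, "hand": {"mpjpe": 0.8647}}')
rebuilt = json.loads('{"action": {"macro_f1": 0.05}, "hand": {"mpjpe": 0.8648}}')
print(diff_metrics(committed, rebuilt))  # ['/hand/mpjpe: 0.8647 != 0.8648']
```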

Project dashboard

The website organizes the dataset sample, tasks, methods, results, directions, and scale-up path in one tabbed reader flow.

project materials

Line 2 model status

The comparison JSON groups selected-128 baselines, Qwen3-Omni v6 LoRA, Cosmos3-Nano Future Window, and Cosmos3-Super Reasoner. Full Qwen v1-v6 detail stays in a separate lineage audit.

comparison · Qwen v1-v6

Minimal path: install the toolkit dependencies, download the official sample, run the task suite with neural heads, regenerate the historical provenance bundle, build the unified 20-task index, regenerate visualizations, then rebuild the supporting project reports.

    # Create one virtual environment for the HOMIE toolkit and this task suite.
    git clone https://github.com/Ropedia/HOMIE-toolkit.git
    python3.12 -m venv .venv
    source .venv/bin/activate
    pip install -r HOMIE-toolkit/requirements.txt huggingface_hub hf_xet
    git clone https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite.git
    pip install -r ropedia-xperience-10m-task-suite/requirements.txt
    pip install torch  # PyTorch backs the neural MLP heads (--include-neural)
    
    # Download the official public sample episode.
    hf download ropedia-ai/xperience-10m-sample \
      --repo-type dataset \
      --local-dir data/sample/xperience-10m-sample
    
    cd ropedia-xperience-10m-task-suite
    # Placeholder: set this to your own workspace directory.
    export WORKSPACE=/path/to/workspace
    # Task suite with neural heads, extension probes, historical provenance bundle, unified 20-task index.
    python scripts/episode_task_suite.py --workspace "$WORKSPACE" --include-neural
    python scripts/research_direction_extension_tasks.py
    python scripts/tier2_task_suite.py --workspace "$WORKSPACE"
    python scripts/build_unified_task_suite.py
    # Walkthroughs, evaluation protocol, figures, and modality atlas assets.
    python scripts/task_walkthroughs.py
    python scripts/build_evaluation_protocol.py
    python scripts/generate_visualizations.py
    python scripts/render_overview_figures.py
    python scripts/render_task_suite_infographic.py
    python scripts/export_modality_atlas_assets.py
    # Validation checks and the public artifact index.
    python scripts/validate_website_integrity.py
    python scripts/validate_scope_claims.py
    python scripts/build_artifact_index.py
    python scripts/validate_mirror_parity.py
    python scripts/validate_publication_package.py