| Xperience-10M | The upstream embodied human-interaction dataset. | The source dataset behind the public sample, selected-128 features, task suite, and model diagnostics. | This repo itself; the repo only redistributes public-safe derived artifacts. |
| Public sample episode | One officially available sample episode. | The fully inspectable Line 1 unit used for raw-file browsing, 20-frame windows, task construction, and single-episode baselines. | Multi-episode generalization. |
| Selected 128 episodes | A public-safe selected subset of official gated episode paths. | Line 2 uses derived windows/features and keeps links back to official episode ids and gated source paths. | Redistributed raw MP4/HDF5/RRD data. |
| Evidence line | A claim boundary for a group of results. | Line 1 is one public sample episode; Line 2 is selected-128 held-out comparison. | Qwen run versions v1-v6, which are model-run lineage, not evidence lines. |
| Official gated data | Upstream files that require official dataset access. | Raw Xperience-10M MP4/HDF5/RRD files and full source directories remain outside the public repo. | Public-safe metrics, derived features, figures, and manifests. |
| Public-safe artifact | A file that can be mirrored publicly without raw gated content. | Metrics, JSON summaries, model cards, figures, derived manifests, and approved lightweight weights/adapters. | Raw dataset redistribution. |
| Episode | One recorded interaction sequence. | The basic source unit behind windows, labels, and train/val/test splits. | A 20-frame window, which is a smaller model input slice. |
| 20-frame window | A fixed short clip slice. | The sample episode is converted into aligned 20-frame units for features, labels, and many task heads. | A full episode or an arbitrary video segment. |
| Window stride | The frame step between neighboring windows. | Used to create overlapping examples while preserving chronological order and leakage controls. | Video frame rate. |
| Feature manifest | A map from model-input columns to source modalities. | results/episode_task_suite/feature_manifest.json explains the feature groups and dimensions. | The raw annotation file. |
| Raw sample file map | A human-readable inventory of the sample episode files. | docs/data/raw_sample_files.json explains videos, annotations, calibration, motion, and derived previews. | A training manifest. |
| annotation.hdf5 | Upstream annotation container for the sample. | Contains original labels/metadata; some public derived files expose hashed or processed features rather than every raw text field. | summary_report.json or task result JSON. |
| Interaction text | Natural-language interaction/caption content. | Used by task 15 and some derived text features; public matrices record when text targets are direct or compact-proxy. | Numeric action ids or subtask ids. |
| Modality | A type of signal. | Video, audio, depth, pose/SLAM, motion capture, inertial, calibration, and language-derived signals. | A task target. |
| Task contract | The definition of one benchmark task. | Includes input, target/output, metric, split, source artifact, and limitation. | A model architecture. |
| Unified 20-task suite | The current task surface. | Tasks 1-12 plus tasks 13-20 are presented together and scored across methods where real artifacts exist. | The historical tier-2 label; tasks 13-20 are now part of the same 20-task suite. |
| Task-method record | One method evaluated on one task. | 9 methods x 20 tasks gives 180 public result records. | A single prediction row. |
| Direct score | A metric computed against the task target directly. | The preferred score type in the 20-task matrix. | Compact-proxy score. |
| Compact-proxy score | A bounded proxy metric when a direct raw target is not publicly available. | Kept explicit in the matrix and gap audit so readers do not over-read it. | A direct target measurement. |
| Gap audit | A coverage and source-status audit. | docs/data/task_method_20_gap_audit.json explains scored, proxy, and unsupported cells. | A performance leaderboard. |
| Leakage control | A split or feature rule that prevents using future/target information unfairly. | Chronological splits, held-out splits, and source audits protect task interpretation. | Lower training accuracy. |
| Minimal baseline | A simple non-neural task head. | Provides a reproducible lower-complexity comparison for task feasibility. | The metadata-only baseline family in the selected-128 matrix. |
| Neural MLP | A compact neural task head. | Used for single-episode and selected-128 baseline comparisons. | Foundation-model fine-tuning. |
| Metadata baseline | A selected-128 baseline using metadata/text-derived public-safe features. | Helps compare simple and neural heads on the held-out split. | Raw video/depth/audio feature baselines. |
| Raw-feature baseline | A selected-128 baseline using exported public-safe raw-feature groups. | Tracks what non-foundation heads can do with richer processed inputs. | Raw gated media redistribution. |
| Qwen3-Omni | The multimodal foundation-model family used for the Qwen branch. | The current public 20-task Qwen row is Qwen3-Omni v6 LoRA plus task-specific probes. | Cosmos3 or the single-episode task-head baselines. |
| Qwen v1-v6 | The Qwen3-Omni run lineage. | v1-v4 are earlier pipeline/ablation evidence, v5 is the prior pinned release, and v6 is the current public 20-task row. | Six different evidence lines. |
| Cosmos3-Super | The larger Cosmos3-style branch tracked in this project. | Published as Reasoner diagnostics and a separate forward-dynamics LoRA adapter/result branch when verified. | Cosmos3-Nano. |
| Cosmos3-Nano | A smaller Cosmos3 compatibility/future-window branch. | Used for the Nano Future Window row and related diagnostics. | Cosmos3-Super fine-tuned adapter. |
| LoRA adapter | A lightweight set of trainable adapter weights. | Published only when the package is verified and public-safe. | Full base-model weights. |
| Full-parameter fine-tuning | Updating the whole model rather than only adapters. | This project records feasibility gates and short pilots, but does not publish full checkpoints. | LoRA adapter publication. |
| Foundation pipeline | A high-level training direction. | Spatial intelligence, human-video world modeling, and vision-language-action are documented as trainable directions with task mappings. | A completed public result row. |
| Spatial intelligence | Learning geometry and spatial reasoning from egocentric data. | Uses video, depth, camera pose, and language tasks to target 3D/space reasoning. | World-model future prediction. |
| Human-video world model | Learning future frames, actions, and interaction dynamics from human video. | Uses temporal prediction, next-action, transition, and object-forecast tasks. | Robot policy execution. |
| Vision-language-action | Mapping perception and language to action chunks. | A future policy/VLA direction that needs action-target conversion and stronger policy packaging. | Qwen3-Omni diagnostic scoring. |
| HF Space | Hugging Face-hosted app/site surface. | Mirrors the dashboard and static website assets. | HF artifact dataset or model repo. |
| HF artifact dataset | Hugging Face dataset repo for derived evidence. | Stores public-safe reports, metrics, website JSON, and sanitized result packages. | Original Xperience-10M dataset. |
| HF baseline model repo | Hugging Face model repo for lightweight baseline artifacts. | Mirrors baseline weights, figures, metrics, and task artifacts. | Qwen/Cosmos adapter-specific repos. |
| HF weights/results repo | Consolidated public-safe model-result bundle. | Groups baseline weights, verified Qwen/Cosmos artifacts, analysis files, and manifests. | The upstream raw dataset. |
| Mirror parity | A check that public copies match the source files. | docs/data/mirror_parity.json records whether GitHub, website, and HF mirrors agree. | A model-quality metric. |
| Publication audit | A public-package validation report. | Confirms required files exist and forbidden raw/private assets are not included. | Scientific peer review. |
| Verified package | A result or artifact bundle that passed local/public validators. | Only verified packages are promoted to README, website, and HF surfaces as public evidence. | A running or exploratory experiment. |
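
The "20-frame window", "Window stride", and "Leakage control" entries can be made concrete with a minimal sketch. Only the 20-frame window length comes from the glossary. The function names, the example frame count, the stride of 10, the 60/20/20 split fractions, and the rule of dropping windows that share frames with an earlier split are all illustrative assumptions, not the project's actual pipeline.

```python
from typing import List, Tuple

WINDOW = 20  # window length in frames, as stated in the glossary

Window = Tuple[int, int]  # (start_frame, end_frame), end exclusive


def make_windows(num_frames: int, stride: int, window: int = WINDOW) -> List[Window]:
    """Slice an episode into fixed-length windows in chronological order.

    A stride smaller than the window length yields overlapping windows.
    """
    if stride <= 0 or window <= 0:
        raise ValueError("stride and window must be positive")
    return [(s, s + window) for s in range(0, num_frames - window + 1, stride)]


def chronological_split(
    windows: List[Window],
    train_frac: float = 0.6,  # assumed fraction, for illustration only
    val_frac: float = 0.2,    # assumed fraction, for illustration only
) -> Tuple[List[Window], List[Window], List[Window]]:
    """Split windows by time: train first, then val, then test.

    Hypothetical leakage rule: a later split discards any window that
    starts before the previous split's last frame, so no frame appears
    in two splits.
    """
    n = len(windows)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))

    train = windows[:train_end]
    boundary = train[-1][1] if train else 0
    val = [w for w in windows[train_end:val_end] if w[0] >= boundary]
    boundary = val[-1][1] if val else boundary
    test = [w for w in windows[val_end:] if w[0] >= boundary]
    return train, val, test


if __name__ == "__main__":
    # Hypothetical 200-frame episode with a stride of 10 frames.
    wins = make_windows(200, stride=10)
    train, val, test = chronological_split(wins)
    print(len(wins), len(train), len(val), len(test))
```

Under this rule, overlapping windows at each split boundary are discarded rather than assigned to either side, which trades a few examples for a clean separation between past and future frames.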