---
license: other
library_name: pytorch
tags:
- robotics
- embodied-ai
- multimodal
- ropedia
- xperience-10m
- baseline
- neural-network
- pytorch
- linear-model
- retrieval
metrics:
- accuracy
- f1
- mean-reciprocal-rank
- mean-squared-error
model-index:
- name: Ropedia Xperience-10M Task Baselines
  results:
  - task:
      type: robotics
      name: Cross-modal retrieval
    dataset:
      type: ropedia-ai/xperience-10m-sample
      name: Xperience-10M public sample episode
    metrics:
    - type: top_5_accuracy
      value: 0.3764
      name: top-5 retrieval accuracy
    - type: mrr
      value: 0.2634
      name: mean reciprocal rank
  - task:
      type: robotics
      name: Transition detection
    dataset:
      type: ropedia-ai/xperience-10m-sample
      name: Xperience-10M public sample episode
    metrics:
    - type: f1
      value: 0.6552
      name: macro-F1
  - task:
      type: robotics
      name: Temporal order
    dataset:
      type: ropedia-ai/xperience-10m-sample
      name: Xperience-10M public sample episode
    metrics:
    - type: f1
      value: 0.8718
      name: neural MLP F1
---

# Ropedia Xperience-10M Task Baselines

This repo stores the minimal baseline weights, neural MLP task-head checkpoints, and metrics for the 12-task Xperience-10M episode suite, plus four lightweight direction-extension probes. It is a baseline-model artifact repo for research development, not a robot foundation model.

![Ropedia Xperience-10M Task Suite logo](assets/brand/xperience10m-logo-social-card.png)

![12-task suite with sample modalities](assets/task_suite_infographic.png?v=xperience10m-taskfirst-v12-modality-xl)

The source Xperience-10M sample spans video, audio, depth, pose, motion capture, inertial sensing, and language annotation. The committed minimal and neural task heads use the current 8,378-d feature manifest; audio is documented in the figures but is not yet extracted into a model input feature block.
The tabbed research website, task-first 12-head map, responsive modality atlas, interactive scrub/play storyboard, website HTML mirrors, `brand_assets.json`, and `scripts/build_brand_assets.py` are included so this model repo stays aligned with the public Space and artifact dataset.

## Evidence Boundary

| Claim layer | Evidence | Boundary |
| --- | --- | --- |
| Project status | `PROJECT_STATUS.md`, `metrics/project_status.json` | compact verified/data-gated/not-redistributed decision table |
| Baseline weights | `artifacts/**/model.npz` | lightweight heads only |
| Neural checkpoints | `artifacts/episode_task_suite/neural_mlp/**/model.pt` | same single-episode windows and splits |
| Metrics | `artifacts/**/metrics.json`, prediction CSV/NPZ files | debugging and task-contract evidence |
| Feature contract | `artifacts/**/feature_manifest.json` | audio documented but not featurized |
| Evaluation protocol | `EVALUATION_PROTOCOL.md`, `metrics/evaluation_protocol.json` | windowing, chronological split, leakage controls, and task metrics |
| Qwen3-Omni | companion blocker and access-status reports | readiness-only until 32 valid episodes are available |
| Source alignment | `SOURCE_ALIGNMENT_AUDIT.md`, `metrics/source_alignment_audit.json`, `scripts/validate_source_alignment.py` | validates full-dataset facts, sample-card facts, API-listing caveats, and public-card boundary markers |
| Task surface integrity | `metrics/task_surface_integrity.json`, `scripts/validate_task_surface.py` | task cards use human-readable research names, modality thumbnails, and the interactive storyboard data contract |
| Public surface QA | `PUBLIC_SURFACE_QA.md`, `metrics/public_surface_qa.json`, `scripts/build_public_surface_qa.py` | repo, website, and Hugging Face cards preserve SEO/social metadata, accessible tab semantics, public links, QA links, and copy hygiene |
| Artifact index | `metrics/artifact_index.json` | compact catalog of project-critical supporting artifacts |
| Reproducibility | `REPRODUCIBILITY.md`, `metrics/reproducibility_matrix.json` | public commands, expected outputs, exact-match reproduction evidence, and non-reproducible boundaries |

## 90-Second Research Project Path

| Step | Question | Primary artifacts |
| --- | --- | --- |
| 1 | What has been implemented? | `PROJECT_STATUS.md`, `metrics/project_status.json`, `EVIDENCE_CONTRACT.md`, `ARTIFACT_GUIDE.md`, `QUALITY_GATES.md`, `PUBLIC_SURFACE_QA.md`, `FIGURE_INDEX.md`, `metrics/artifact_index.json`, `metrics/figure_index.json`, `metrics/live_publication_status.json`, `metrics/quality_gates.json`, `metrics/mirror_parity.json`, `metrics/public_surface_qa.json`, `metrics/scope_claims_audit.json`, `metrics/publication_audit.json`, `metrics/task_surface_integrity.json`, `metrics/website_integrity.json`, `metrics/project_manifest.json` |
| 2 | Are source facts consistently presented? | `SOURCE_ALIGNMENT_AUDIT.md`, `metrics/source_alignment_audit.json`, `scripts/validate_source_alignment.py` |
| 3 | How do I reproduce it? | `REPRODUCIBILITY.md`, `metrics/reproducibility_matrix.json`, companion GitHub `notes/reproducibility_audit.md` |
| 4 | What is one model input? | `artifacts/episode_task_suite/feature_manifest.json`, `artifacts/episode_task_suite/available_modalities.json`, companion artifact dataset `windows.csv` |
| 5 | Are the task results backed by files? | `artifacts/episode_task_suite/summary_report.json`, `artifacts/episode_task_suite/neural_mlp/`, `metrics/summary_metrics.json` |
| 6 | What is still pending? | companion GitHub `results/omni_finetune/DATA_BLOCKER_REPORT.md` and `MULTI_EPISODE_ACCESS_STATUS.md` |

## Official Dataset Alignment

The model card mirrors the official-source alignment artifact at `metrics/xperience10m_dataset_card_alignment.json` plus `XPERIENCE10M_DATASET_CARD_ALIGNMENT.md`.
That file records the official `ropedia-ai/xperience-10m` card scope, manually gated access, full-scale modalities, episode layout, intended uses, and the claims this small baseline repo does not make. It also records the public sample card (`cc-by-nc-4.0`, HOMIE Toolkit, Rerun 0.29.0 `.rrd` visualization) and the current HF API listing snapshot: 803 session folders and 12,103 episode folders with `annotation.hdf5`, plus the live HF 31.9 TB file-size display. The 31.9 TB display is tracked separately from the official card's about-1PB full-scale storage statement. Those are upstream metadata facts, not local downloads, raw-data redistribution, or model-quality evidence. The public model card also preserves the upstream responsible-use boundary that the dataset is limited in diversity.

## Qwen3-Omni LoRA Boundary

The companion GitHub repo includes scripts for Xperience-10M multi-episode access, staging, manifest building, and a Qwen3-Omni LoRA pilot path. The current LoRA checkpoint is a readiness artifact from one locally available episode and 128 train windows. It is not a full 32-episode result.

The next real model milestone is a 32-episode held-out-episode LoRA pilot after access to `ropedia-ai/xperience-10m` is approved. The staging plan selects 32 complete episodes from 32 different top-level session UUIDs, then builds held-out episode manifests for training and evaluation.

## Minimal and Neural Architecture

![Minimal 12-task architecture](assets/task_architectures.png)

The committed heads are intentionally small:

- z-score + linear softmax classifiers
- dual ridge regression/projection heads
- sigmoid multi-label logistic regression
- cosine ranking for retrieval tasks
- z-score + PyTorch MLP heads for all 12 human-readable task cards

## Metrics Snapshot

These are single-episode chronological-split metrics. They are useful for debugging task definitions and input contracts, not for claiming cross-episode generalization.

| Task | Neural MLP metric | Minimal metric |
| --- | ---: | ---: |
| Action Recognition macro-F1 | 0.0263 | 0.0500 |
| Procedure Step Recognition macro-F1 | 0.0175 | 0.0495 |
| Action Boundary Detection macro-F1 | 0.6485 | 0.6552 |
| Next-Action Prediction macro-F1 | 0.0235 | 0.0593 |
| Hand Trajectory Forecasting MPJPE, lower is better | 0.1116 | 0.8223 |
| Contact State Prediction macro-F1 | 1.0000 | 1.0000 |
| Object Relevance Prediction micro-F1 | 0.1798 | 0.1839 |
| Language Grounding MRR | 0.0178 | 0.0172 |
| Cross-Modal Retrieval MRR | 0.1530 | 0.2634 |
| Cross-Modal Reconstruction R2 | -0.0102 | -0.0160 |
| Temporal Order Verification F1 | 0.8718 | 0.5487 |
| Multimodal Synchronization Detection F1 | 0.7335 | 0.4866 |

## Included

- `artifacts/**/model.npz`: minimal baseline weights, scalers, and labels
- `artifacts/episode_task_suite/neural_mlp/**/model.pt`: neural MLP task-head checkpoints
- `artifacts/episode_task_suite/neural_mlp/**/history.json`: neural training traces
- `artifacts/**/metrics.json`: committed metrics
- `artifacts/**/feature_manifest.json`: feature block boundaries where relevant
- `assets/`: mirrored figures, modality thumbnails, and brand assets
- `metrics/`: mirrored project status, protocol, source-alignment, publication, and scope-claim JSON files
- `metrics/public_surface_qa.json`: public repo, website, and Hugging Face presentation QA
- `scripts/`: reproduction, visualization, and validation scripts

## Data Notice

This repo does not redistribute raw Xperience-10M videos or raw `annotation.hdf5`.
Download the original sample from Ropedia / Hugging Face and follow the dataset terms:

- https://huggingface.co/datasets/ropedia-ai/xperience-10m-sample
- https://huggingface.co/datasets/ropedia-ai/xperience-10m
- https://ropedia.com/dataset

## Links

| Resource | URL |
| --- | --- |
| Hugging Face Space | https://huggingface.co/spaces/cy0307/ropedia-xperience-10m-task-suite |
| Live Hugging Face app | https://cy0307-ropedia-xperience-10m-task-suite.static.hf.space/ |
| Artifact dataset | https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts |
| GitHub repo | https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite |
| GitHub Pages dashboard | https://chaoyue0307.github.io/ropedia-xperience-10m-task-suite/ |
| Xperience-10M website | https://ropedia.com/dataset |
| Xperience-10M release page | https://ropedia.com/blog/20260316_xperience_10m |
| Ropedia GitHub organization | https://github.com/Ropedia |
| HOMIE Toolkit | https://github.com/Ropedia/HOMIE-toolkit |
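
## Illustrative Retrieval Metric Sketch

The retrieval heads in this card rank candidates by cosine similarity and report MRR and top-k accuracy. The block below is a minimal sketch of how such numbers can be computed, not the repo's evaluation code. The function name `cosine_rank_metrics`, the toy vectors, and the convention that query `i` matches gallery row `i` are all illustrative assumptions; `EVALUATION_PROTOCOL.md` remains the reference for the actual protocol.

```python
import numpy as np


def cosine_rank_metrics(queries, gallery, k=5):
    """Return (MRR, top-k accuracy) for cosine-similarity ranking.

    Assumption (hypothetical pairing): the correct match for query i
    is gallery row i, so both arrays have the same number of rows.
    """
    # L2-normalise rows so that a dot product equals cosine similarity.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = q @ g.T  # shape: (n_queries, n_gallery)

    # Similarity of each query to its own (correct) gallery row.
    correct = np.diag(sims)[:, None]
    # 1-based rank: one plus the number of strictly better candidates.
    ranks = 1 + (sims > correct).sum(axis=1)

    mrr = float(np.mean(1.0 / ranks))
    top_k = float(np.mean(ranks <= k))
    return mrr, top_k


# Toy data: three gallery items and three queries whose correct
# items land at ranks 1, 2, and 3 respectively.
toy_gallery = np.eye(3)
toy_queries = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 0.5, 0.0],
    [1.0, 0.5, 0.2],
])
toy_mrr, toy_top2 = cosine_rank_metrics(toy_queries, toy_gallery, k=2)
print(f"MRR={toy_mrr:.3f} top-2={toy_top2:.3f}")  # MRR=0.611 top-2=0.667
```

Ties are broken optimistically here (only strictly higher similarities push the rank down); a different tie rule would shift the numbers slightly on real data.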