---
license: other
library_name: pytorch
tags:
- robotics
- embodied-ai
- multimodal
- ropedia
- xperience-10m
- baseline
- neural-network
- pytorch
- linear-model
- retrieval
metrics:
- accuracy
- f1
- mean-reciprocal-rank
- mean-squared-error
model-index:
- name: Ropedia Xperience-10M Task Baselines
  results:
  - task:
      type: robotics
      name: Cross-modal retrieval
    dataset:
      type: ropedia-ai/xperience-10m-sample
      name: Xperience-10M public sample episode
    metrics:
    - type: top_5_accuracy
      value: 0.3764
      name: top-5 retrieval accuracy
    - type: mrr
      value: 0.2634
      name: mean reciprocal rank
  - task:
      type: robotics
      name: Transition detection
    dataset:
      type: ropedia-ai/xperience-10m-sample
      name: Xperience-10M public sample episode
    metrics:
    - type: f1
      value: 0.6552
      name: macro-F1
  - task:
      type: robotics
      name: Temporal order
    dataset:
      type: ropedia-ai/xperience-10m-sample
      name: Xperience-10M public sample episode
    metrics:
    - type: f1
      value: 0.8718
      name: neural MLP F1
---

# Ropedia Xperience-10M Task Baselines

This repo stores the minimal baseline weights, neural MLP task-head checkpoints, and metrics for the 12-task Xperience-10M episode suite, plus four lightweight direction-extension probes. It is meant to be read as a model audit, not advertised as a robot foundation model.

![12-task suite with sample modalities](assets/task_suite_infographic.png?v=xperience10m-modalities-v9-large-atlas)

The source Xperience-10M sample spans video, audio, depth, pose, motion capture, inertial sensing, and language annotation. The committed minimal and neural task heads consume the 8,378-dimensional feature vector defined by the current feature manifest. Audio is documented in the figures but is not yet extracted into a model input feature block.

The companion dashboard and this model card mirror the responsive modality atlas metadata in `metrics/modality_atlas.json`, with standalone derived thumbnails in `assets/modalities/`.
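To make the feature contract concrete, here is a minimal sketch of how a flat feature row could be split back into per-modality blocks and checked against a manifest. The manifest schema (`total_dim`, `blocks`, and `name`/`modality`/`start`/`end` keys) and the function names are hypothetical stand-ins; the real `artifacts/**/feature_manifest.json` may be organized differently. The toy manifest uses 10 dimensions instead of 8,378.

```python
"""Sketch: split one flat feature row into named modality blocks.

Assumes a HYPOTHETICAL manifest schema; the real
feature_manifest.json may use different keys.
"""
from typing import Dict, List, Sequence

# Hypothetical manifest: contiguous [start, end) column ranges per block.
EXAMPLE_MANIFEST = {
    "total_dim": 10,
    "blocks": [
        {"name": "video", "modality": "video", "start": 0, "end": 4},
        {"name": "depth", "modality": "depth", "start": 4, "end": 7},
        {"name": "imu", "modality": "inertial", "start": 7, "end": 10},
    ],
}


def validate_manifest(manifest: dict) -> None:
    """Check that the blocks tile [0, total_dim) with no gaps or overlaps."""
    cursor = 0
    for block in sorted(manifest["blocks"], key=lambda b: b["start"]):
        if block["start"] != cursor or block["end"] <= block["start"]:
            raise ValueError(f"bad block boundary at {block['name']}")
        cursor = block["end"]
    if cursor != manifest["total_dim"]:
        raise ValueError(f"blocks cover {cursor} dims, expected {manifest['total_dim']}")


def split_features(row: Sequence[float], manifest: dict) -> Dict[str, List[float]]:
    """Return {block name: slice of the row} using the manifest ranges."""
    if len(row) != manifest["total_dim"]:
        raise ValueError(f"row has {len(row)} dims, expected {manifest['total_dim']}")
    return {b["name"]: list(row[b["start"]:b["end"]]) for b in manifest["blocks"]}


def modalities(manifest: dict) -> List[str]:
    """List the source modalities that actually own a feature block."""
    return sorted({b["modality"] for b in manifest["blocks"]})
```

Under these assumptions, a manifest for the committed heads would have `total_dim` equal to 8,378, and because audio is not yet featurized, `modalities()` should not list audio.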
The committed heads are intentionally small:

- z-score + linear softmax classifiers
- dual ridge regression/projection heads
- sigmoid multi-label logistic regression
- cosine ranking for retrieval tasks
- z-score + PyTorch MLP heads for all 12 task definitions

The included architecture and suite figures use the same Ropedia-inspired dark visual system as the public dashboard, but their text, dimensions, and metrics are generated from the committed artifacts rather than drawn by hand. Their purpose is to make every input/output contract auditable before scaling to many episodes.

## 90-Second Reviewer Path

| Step | Question | Primary artifacts |
| --- | --- | --- |
| 1 | What is actually claimed? | `EVIDENCE_CONTRACT.md`, `ARTIFACT_GUIDE.md`, `metrics/artifact_index.json`, `metrics/mirror_parity.json`, `metrics/scope_claims_audit.json`, `metrics/publication_audit.json`, `metrics/website_integrity.json`, `metrics/project_manifest.json` |
| 2 | How do I reproduce it? | `REPRODUCIBILITY.md`, `metrics/reproducibility_matrix.json`, companion GitHub `notes/reproducibility_audit.md` |
| 3 | What is one model input? | `artifacts/episode_task_suite/feature_manifest.json`, `artifacts/episode_task_suite/available_modalities.json`, companion artifact dataset `windows.csv` |
| 4 | Are the task results backed by files? | `artifacts/episode_task_suite/summary_report.json`, `artifacts/episode_task_suite/neural_mlp/`, `metrics/summary_metrics.json` |
| 5 | What is still pending? | companion GitHub `results/omni_finetune/DATA_BLOCKER_REPORT.md` and `A100_HF_RELAY_STATUS.md` |

- Human-readable artifact guide mirror: `ARTIFACT_GUIDE.md`
- Machine-readable reviewer packet mirror: `metrics/reviewer_packet.json`
- Source-of-truth artifact index mirror: `metrics/artifact_index.json`
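The artifact index records existence, size, and stable-file hashes for each proof artifact. A reviewer could re-check those three properties locally with a sketch like the one below. The index schema (an `artifacts` list whose entries carry `path`, `bytes`, and an optional `sha256`) and the `check_index` helper are assumptions for illustration, not the format written by `scripts/build_artifact_index.py`.

```python
"""Sketch: re-check an artifact index against files on disk.

The index schema used here is a HYPOTHETICAL stand-in for
metrics/artifact_index.json; adapt the keys to the real file.
"""
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large checkpoints never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_index(index: dict, root: Path) -> List[str]:
    """Return human-readable problems; an empty list means every entry matches."""
    problems = []
    for entry in index["artifacts"]:
        path = root / entry["path"]
        if not path.is_file():
            problems.append(f"missing: {entry['path']}")
            continue
        size = path.stat().st_size
        if size != entry["bytes"]:
            problems.append(f"size mismatch: {entry['path']} ({size} != {entry['bytes']})")
        # In this hypothetical schema, only stable files carry a hash.
        expected = entry.get("sha256")
        if expected is not None and sha256_of(path) != expected:
            problems.append(f"hash mismatch: {entry['path']}")
    return problems
```

Once the keys are adapted to the real index, pointing `root` at a local clone should yield an empty list when every indexed artifact is present and unchanged.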
## Evidence Boundary

| Claim layer | Evidence | Boundary |
| --- | --- | --- |
| Baseline weights | `artifacts/**/model.npz` | lightweight heads only |
| Neural checkpoints | `artifacts/episode_task_suite/neural_mlp/**/model.pt` | same single-episode windows and splits |
| Metrics | `artifacts/**/metrics.json`, prediction CSV/NPZ files | debugging and task-contract evidence |
| Feature contract | `artifacts/**/feature_manifest.json` | audio documented but not featurized |
| Qwen3-Omni | companion blocker and relay reports | smoke-only until 32 valid episodes are available |
| Scope claims guard | `metrics/scope_claims_audit.json` and `scripts/validate_scope_claims.py` | historical `32ep` path strings are provenance, not 32-episode results |
| Mirror parity | `metrics/mirror_parity.json` and `scripts/validate_mirror_parity.py` | prepared repo/HF mirrors carry matching critical files |
| Publication hygiene | `metrics/publication_audit.json` and validator script mirror | public bundles contain no raw data, generated caches, heavy archives, or token strings |
| Website integrity | `metrics/website_integrity.json` and validator script mirror | local links, anchors, JSON bundles, and referenced images only |
| Artifact index | `metrics/artifact_index.json` and `scripts/build_artifact_index.py` | compact catalog of the reviewer-critical proof artifacts |
| Artifact guide | `ARTIFACT_GUIDE.md` | human-readable map of proof boundary, task evidence, mirrors, and scale-up status |
| Reproducibility | `REPRODUCIBILITY.md`, `metrics/reproducibility_matrix.json` | public commands, expected outputs, exact-match audit evidence, and non-reproducible boundaries |
| Citation metadata | GitHub `CITATION.cff`, `codemeta.json`, `project_manifest.json`, and `reviewer_packet.json` | code license remains separate from Xperience-10M dataset terms |

## Qwen3-Omni LoRA Boundary

The companion GitHub repo now includes scripts for an A100-to-H20 Xperience-10M relay and a Qwen3-Omni LoRA pilot
path. The current LoRA checkpoint is a technical smoke artifact from one locally available episode and 128 train windows; it is not a full 32-episode result. The next real model milestone is a 32-episode LoRA pilot evaluated on held-out episodes, which depends on Hugging Face access to `ropedia-ai/xperience-10m` being approved. The staging plan selects 32 complete episodes from 32 different top-level session UUIDs, then transfers them to H20 for manifest building, training, and evaluation.

## What To Look At First

| Artifact | Why it is useful |
| --- | --- |
| `artifacts/**/model.npz` | stores the exact lightweight weights and scalers |
| `artifacts/episode_task_suite/neural_mlp/**/model.pt` | stores the neural MLP checkpoints |
| `artifacts/**/metrics.json` | records the committed metric values |
| `artifacts/**/feature_manifest.json` | maps feature blocks back to source modalities |
| `artifacts/episode_task_suite/research_directions/` | maps every task to the four Ropedia research directions with minimal-vs-neural readouts |
| `artifacts/episode_task_suite/research_direction_extensions/` | adds one coded extension probe per research direction |
| `artifacts/episode_task_suite/task_walkthroughs/` | explains every task with a case study, input, process modules, output, and limitation |
| `assets/task_architectures.png` | shows the shared pipeline and all 12 heads |
| `assets/task_suite_infographic.png` | presents the 12 heads with public-sample modality thumbnails and verified metrics |
| `assets/modalities/`, `metrics/modality_atlas.json` | responsive modality-card thumbnails and metadata for sample inspection |
| `metrics/artifact_index.json` | indexes proof artifacts with existence, size, and stable-file hashes |
| `metrics/mirror_parity.json` | verifies that prepared repo/HF mirrors have matching critical files before upload |
| `metrics/scope_claims_audit.json` | verifies that historical `32ep` smoke-run identifiers are not presented as real 32-episode results |
| `metrics/publication_audit.json` | records the latest public-bundle hygiene check |
| `metrics/website_integrity.json` | records the latest local website link, anchor, JSON, and image integrity check |
| `metrics/project_manifest.json` | mirrors the public URL and citation metadata bundle |

## Included

- `artifacts/**/model.npz`: minimal baseline weights, scalers, and labels
- `artifacts/episode_task_suite/neural_mlp/**/model.pt`: neural MLP task-head checkpoints
- `artifacts/episode_task_suite/neural_mlp/**/history.json`: neural training traces
- `artifacts/**/metrics.json`: committed metrics
- `artifacts/**/feature_manifest.json`: feature block boundaries where relevant
- `artifacts/episode_task_suite/research_directions/*.json|*.csv|*.md`: four-track task taxonomy
- `artifacts/episode_task_suite/research_direction_extensions/*.json|*.csv|*.md`: four extension-probe metrics and predictions
- `artifacts/episode_task_suite/task_walkthroughs/*.json|*.md`: beginner walkthroughs for all 12 tasks
- `scripts/*.py`: training and visualization scripts
- `scripts/validate_mirror_parity.py`: prepared mirror parity validator
- `scripts/validate_scope_claims.py`: Qwen3-Omni smoke/result claim-boundary validator
- `scripts/validate_publication_package.py`: publication hygiene validator
- `scripts/validate_website_integrity.py`: website local-reference validator
- `notes/*.md`: interpretation and reproducibility notes

The companion artifact dataset repo stores CSV/JSON predictions and dashboard assets: https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts

The public visual dashboard is here: https://huggingface.co/spaces/cy0307/ropedia-xperience-10m-task-suite

Direct static app: https://cy0307-ropedia-xperience-10m-task-suite.static.hf.space/

The full Hugging Face collection is here: https://huggingface.co/collections/cy0307/ropedia-xperience-10m-task-suite

## Minimal and Neural Architecture

![Minimal 12-task architecture](assets/task_architectures.png)

## Four Research Directions

The baselines are also grouped by the four Ropedia research tracks:

| Direction | Current status | Baseline evidence |
| --- | --- | --- |
| A. Human Modeling & Motion Understanding | partially implemented | hand trajectory forecasting improves from `0.8223` to `0.1116` MPJPE with the neural MLP; contact prediction is degenerate in this sample (both heads score `1.0000` macro-F1) |
| B. 3D/4D Reconstruction & Neural Rendering | proxy tasks only | cross-modal retrieval, feature reconstruction, and misalignment are prerequisites, not full neural rendering |
| C. Egocentric Vision & Interaction | strongest implemented track | action/subtask/transition/next-action/object/caption tasks plus alignment/order diagnostics |
| D. Scene Reconstruction & World Modeling | early proxy tasks | state, object, retrieval, reconstruction, and temporal tasks are first probes before scene graphs or maps |

Primary taxonomy file: `artifacts/episode_task_suite/research_directions/research_direction_taxonomy.json`

## Direction-Extension Probe Snapshot

| Direction | Extension task | Minimal | Neural MLP |
| --- | --- | ---: | ---: |
| A. Human Modeling & Motion Understanding | `body_motion_intensity` | 0.7827 macro-F1 | 0.7986 macro-F1 |
| B. 3D/4D Reconstruction & Neural Rendering | `multi_view_consistency_retrieval` | 0.5534 MRR | 0.3469 MRR |
| C. Egocentric Vision & Interaction | `action_phase_progress` | 0.3416 MAE | 0.3038 MAE |
| D. Scene Reconstruction & World Modeling | `ego_motion_forecast` | 0.1989 MAE | 0.0989 MAE |

Macro-F1 and MRR are higher-is-better; MAE is lower-is-better. These probes reuse the same 1,161-window feature tensor and the same chronological split scheme. They are direction-specific diagnostics, not full human-body, neural rendering, intent, or world-model solutions.
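The minimal heads rank retrieval candidates by cosine similarity and report MRR (plus top-5 accuracy in the model-index metadata). Below is a small sketch of that scoring under an assumed pairing convention in which query `i`'s correct match is gallery item `i`. The function names and the tie rule (only strictly higher scores push the true match down) are illustrative choices, not taken from the repo's scripts.

```python
"""Sketch: cosine-similarity ranking scored with MRR and top-k accuracy.

ASSUMPTION: queries[i] is paired with gallery[i] (e.g. two modalities
of the same window); the repo's actual pairing may differ.
"""
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def ranks_of_true_match(queries: List[Vector], gallery: List[Vector]) -> List[int]:
    """1-based rank of gallery[i] when the gallery is sorted by similarity to queries[i]."""
    ranks = []
    for i, query in enumerate(queries):
        scores = [cosine(query, item) for item in gallery]
        # Rank = 1 + number of items scoring strictly higher (optimistic under ties).
        ranks.append(1 + sum(s > scores[i] for s in scores))
    return ranks


def mean_reciprocal_rank(ranks: List[int]) -> float:
    return sum(1.0 / r for r in ranks) / len(ranks)


def top_k_accuracy(ranks: List[int], k: int) -> float:
    return sum(r <= k for r in ranks) / len(ranks)
```

For three queries whose true matches land at ranks 1, 1, and 3, MRR is (1 + 1 + 1/3) / 3 = 7/9, top-1 accuracy is 2/3, and top-5 accuracy is 1.0.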
## Metrics Snapshot

| Task | Neural MLP metric | Minimal metric |
| --- | ---: | ---: |
| `timeline_action` macro-F1 | 0.0263 | 0.0500 |
| `timeline_subtask` macro-F1 | 0.0175 | 0.0495 |
| `transition_detection` macro-F1 | 0.6485 | 0.6552 |
| `next_action` macro-F1 | 0.0235 | 0.0593 |
| `hand_trajectory_forecast` MPJPE, lower is better | 0.1116 | 0.8223 |
| `contact_prediction` macro-F1 | 1.0000 | 1.0000 |
| `object_relevance` micro-F1 | 0.1798 | 0.1839 |
| `caption_grounding` MRR | 0.0178 | 0.0172 |
| `cross_modal_retrieval` MRR | 0.1530 | 0.2634 |
| `modality_reconstruction` R2 | -0.0102 | -0.0160 |
| `temporal_order` F1 | 0.8718 | 0.5487 |
| `misalignment_detection` F1 | 0.7335 | 0.4866 |

## Data Notice

This repo does not redistribute raw Xperience-10M videos or raw `annotation.hdf5`. Download the original sample from Ropedia / Hugging Face and follow the dataset terms:

- https://huggingface.co/datasets/ropedia-ai/xperience-10m-sample
- https://ropedia.com/dataset

## Source

- GitHub: https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite
- GitHub Pages: https://chaoyue0307.github.io/ropedia-xperience-10m-task-suite/