---
license: other
library_name: pytorch
tags:
- robotics
- embodied-ai
- multimodal
- ropedia
- xperience-10m
- baseline
- neural-network
- pytorch
- linear-model
- retrieval
metrics:
- accuracy
- f1
- mean-reciprocal-rank
- mean-squared-error
model-index:
- name: Xperience-10M Minimal and Neural Task Baselines
  results:
  - task:
      type: robotics
      name: Cross-modal retrieval
    dataset:
      type: ropedia-ai/xperience-10m-sample
      name: Xperience-10M public sample episode
    metrics:
    - type: top_5_accuracy
      value: 0.3764
      name: top-5 retrieval accuracy
    - type: mrr
      value: 0.2634
      name: mean reciprocal rank
  - task:
      type: robotics
      name: Transition detection
    dataset:
      type: ropedia-ai/xperience-10m-sample
      name: Xperience-10M public sample episode
    metrics:
    - type: f1
      value: 0.6552
      name: macro-F1
  - task:
      type: robotics
      name: Temporal order
    dataset:
      type: ropedia-ai/xperience-10m-sample
      name: Xperience-10M public sample episode
    metrics:
    - type: f1
      value: 0.8718
      name: neural MLP F1
---

# Xperience-10M Minimal and Neural Task Baselines

This repo stores the minimal baseline weights, neural MLP task-head checkpoints, and metrics for the 12-task Xperience-10M episode suite. It is meant to be read like a model audit, not advertised as a robot foundation model.

The source Xperience-10M sample spans video, audio, depth, pose, motion capture, inertial sensing, and language annotation. The committed minimal and neural task heads use the current 8,378-d feature manifest; audio is documented in the figures but is not yet extracted into a model input feature block.

The committed heads are intentionally small:

- z-score + linear softmax classifiers,
- dual ridge regression/projection heads,
- sigmoid multi-label logistic regression,
- cosine ranking for retrieval tasks,
- z-score + PyTorch MLP heads for all 12 task definitions.

Their purpose is to make every input/output contract auditable before scaling to many episodes.
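To make the shape of these heads concrete, here is a minimal NumPy sketch of two of them: a z-score + linear softmax classifier and a cosine-ranking retrieval readout with MRR and top-k accuracy. It is an illustration under stated assumptions, not the code in `scripts/*.py`: the function names, the random stand-in features, the 16-d feature size, and the convention that query `i` matches candidate `i` are all hypothetical, and the keys stored in `model.npz` are not assumed here.

```python
import numpy as np


def fit_zscore(x, eps=1e-8):
    """Return per-feature mean and std; near-constant features get std 1."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    return mean, np.where(std < eps, 1.0, std)


def linear_softmax_predict(x, mean, std, weight, bias):
    """Standardize features, apply one linear layer, return class probabilities."""
    z = (x - mean) / std
    logits = z @ weight + bias
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def cosine_rank_metrics(queries, candidates, k=5, eps=1e-12):
    """Rank candidates by cosine similarity.

    Assumes (hypothetically) that query i's correct match is candidate i.
    """
    q = queries / (np.linalg.norm(queries, axis=1, keepdims=True) + eps)
    c = candidates / (np.linalg.norm(candidates, axis=1, keepdims=True) + eps)
    sims = q @ c.T
    true_scores = np.diag(sims)[:, None]
    # Rank of the true match = 1 + number of candidates scoring strictly higher.
    ranks = 1 + (sims > true_scores).sum(axis=1)
    return {
        "mrr": float((1.0 / ranks).mean()),
        "top_k": float((ranks <= k).mean()),
    }


# Random stand-in data; the committed heads use the 8,378-d feature manifest.
rng = np.random.default_rng(0)
features = rng.normal(size=(32, 16))
mean, std = fit_zscore(features)
weight = rng.normal(size=(16, 4))  # hypothetical 4-class head
bias = np.zeros(4)

probs = linear_softmax_predict(features, mean, std, weight, bias)
metrics = cosine_rank_metrics(features, features, k=5)
print(probs.shape, metrics)
```

Ranking a set of vectors against itself is only a sanity check: every query retrieves itself first, so both metrics come out perfect. With real paired modalities the same function would give values below 1.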
## Qwen3-Omni LoRA Boundary

The companion GitHub repo now includes scripts for an A100-to-H20 Xperience-10M relay and a Qwen3-Omni LoRA pilot path. The current LoRA checkpoint is a technical smoke artifact from one locally available episode and 128 train windows. It is not a full 32-episode result.

The next real model milestone is a 32-episode held-out-episode LoRA pilot after Hugging Face access to `ropedia-ai/xperience-10m` is approved. The staging plan selects 32 complete episodes from 32 different top-level session UUIDs, then transfers them to H20 for manifest building, training, and evaluation.

## What To Look At First

| Artifact | Why it is useful |
| --- | --- |
| `artifacts/**/model.npz` | stores the exact lightweight weights and scalers |
| `artifacts/episode_task_suite/neural_mlp/**/model.pt` | stores the neural MLP checkpoints |
| `artifacts/**/metrics.json` | records the committed metric values |
| `artifacts/**/feature_manifest.json` | maps feature blocks back to source modalities |
| `artifacts/episode_task_suite/research_directions/` | maps every task to the four Ropedia research directions with minimal-vs-neural readouts |
| `assets/task_architectures.png` | shows the shared pipeline and all 12 heads |
| `assets/task_suite_infographic.png` | presents the 12 heads with public-sample modality thumbnails and verified metrics |

## Included

- `artifacts/**/model.npz`: minimal baseline weights, scalers, and labels
- `artifacts/episode_task_suite/neural_mlp/**/model.pt`: neural MLP task-head checkpoints
- `artifacts/episode_task_suite/neural_mlp/**/history.json`: neural training traces
- `artifacts/**/metrics.json`: committed metrics
- `artifacts/**/feature_manifest.json`: feature block boundaries where relevant
- `artifacts/episode_task_suite/research_directions/*.json|*.csv|*.md`: four-track task taxonomy
- `scripts/*.py`: training and visualization scripts
- `notes/*.md`: interpretation and reproducibility notes

The companion artifact dataset repo
stores CSV/JSON predictions and dashboard assets:
https://huggingface.co/datasets/cy0307/ropedia-episode-task-suite-artifacts

The public visual dashboard is here:
https://huggingface.co/spaces/cy0307/ropedia-episode-task-suite

Direct static app:
https://cy0307-ropedia-episode-task-suite.static.hf.space/

The full Hugging Face collection is here:
https://huggingface.co/collections/cy0307/ropedia-episode-task-suite

## Minimal and Neural Architecture

![Minimal 12-task architecture](assets/task_architectures.png)

## Four Research Directions

The baselines are also grouped by the four Ropedia research tracks:

| Direction | Current status | Baseline evidence |
| --- | --- | --- |
| A. Human Modeling & Motion Understanding | partially implemented | hand trajectory forecasting improves from `0.8223` to `0.1116` MPJPE with the neural MLP; contact is degenerate in this sample |
| B. 3D/4D Reconstruction & Neural Rendering | proxy tasks only | cross-modal retrieval, feature reconstruction, and misalignment are prerequisites, not full neural rendering |
| C. Egocentric Vision & Interaction | strongest implemented track | action/subtask/transition/next-action/object/caption tasks plus alignment/order diagnostics |
| D. Scene Reconstruction & World Modeling | early proxy tasks | state, object, retrieval, reconstruction, and temporal tasks are first probes before scene graphs or maps |

Primary taxonomy file:
`artifacts/episode_task_suite/research_directions/research_direction_taxonomy.json`

## Metrics Snapshot

| Task | Neural MLP metric | Minimal metric |
| --- | ---: | ---: |
| `timeline_action` macro-F1 | 0.0263 | 0.0500 |
| `timeline_subtask` macro-F1 | 0.0175 | 0.0495 |
| `transition_detection` macro-F1 | 0.6485 | 0.6552 |
| `next_action` macro-F1 | 0.0235 | 0.0593 |
| `hand_trajectory_forecast` MPJPE, lower is better | 0.1116 | 0.8223 |
| `contact_prediction` macro-F1 | 1.0000 | 1.0000 |
| `object_relevance` micro-F1 | 0.1798 | 0.1839 |
| `caption_grounding` MRR | 0.0178 | 0.0172 |
| `cross_modal_retrieval` MRR | 0.1530 | 0.2634 |
| `modality_reconstruction` R2 | -0.0102 | -0.0160 |
| `temporal_order` F1 | 0.8718 | 0.5487 |
| `misalignment_detection` F1 | 0.7335 | 0.4866 |

## Data Notice

This repo does not redistribute raw Xperience-10M videos or raw `annotation.hdf5`. Download the original sample from Ropedia / Hugging Face and follow the dataset terms:

- https://huggingface.co/datasets/ropedia-ai/xperience-10m-sample
- https://ropedia.com/dataset

## Source

GitHub: https://github.com/ChaoYue0307/ropedia-episode-task-suite

GitHub Pages: https://chaoyue0307.github.io/ropedia-episode-task-suite/