---
license: other
library_name: pytorch
tags:
  - robotics
  - embodied-ai
  - multimodal
  - ropedia
  - xperience-10m
  - baseline
  - neural-network
  - pytorch
  - linear-model
  - retrieval
metrics:
  - accuracy
  - f1
  - mean-reciprocal-rank
  - mean-squared-error
model-index:
  - name: Xperience-10M Minimal and Neural Task Baselines
    results:
      - task:
          type: robotics
          name: Cross-modal retrieval
        dataset:
          type: ropedia-ai/xperience-10m-sample
          name: Xperience-10M public sample episode
        metrics:
          - type: top_5_accuracy
            value: 0.3764
            name: top-5 retrieval accuracy
          - type: mrr
            value: 0.2634
            name: minimal baseline mean reciprocal rank
      - task:
          type: robotics
          name: Transition detection
        dataset:
          type: ropedia-ai/xperience-10m-sample
          name: Xperience-10M public sample episode
        metrics:
          - type: f1
            value: 0.6552
            name: minimal baseline macro-F1
      - task:
          type: robotics
          name: Temporal order
        dataset:
          type: ropedia-ai/xperience-10m-sample
          name: Xperience-10M public sample episode
        metrics:
          - type: f1
            value: 0.8718
            name: neural MLP F1
---

# Xperience-10M Minimal and Neural Task Baselines

This repo stores the minimal baseline weights, neural MLP task-head checkpoints, and metrics for the 12-task Xperience-10M episode suite. It is meant to be read as a model audit, not as a robot foundation model.

The source Xperience-10M sample spans video, audio, depth, pose, motion
capture, inertial sensing, and language annotation. Both the minimal and the
neural task heads consume the 8,378-dimensional feature vector defined by the
current feature manifest. Audio appears in the figures but is not yet extracted
into a model input feature block.
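Because every head reads the same flat feature vector, the manifest is what ties each column range back to a modality. The sketch below slices a feature matrix into named blocks and checks that the blocks tile the vector with no gaps or overlaps. The manifest schema (`total_dim`, `blocks`, `start`, `end`) and the helper names are hypothetical; the real `feature_manifest.json` may use different keys, and the toy example uses 8 columns instead of 8,378.

```python
import json

import numpy as np

# Hypothetical schema for illustration only; the committed
# feature_manifest.json may be structured differently.
EXAMPLE_MANIFEST = {
    "total_dim": 8,
    "blocks": [
        {"name": "video", "start": 0, "end": 3},
        {"name": "pose", "start": 3, "end": 6},
        {"name": "imu", "start": 6, "end": 8},
    ],
}


def load_manifest(path):
    """Read a manifest JSON file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def split_blocks(features, manifest):
    """Slice a (n_windows, total_dim) matrix into named per-modality blocks.

    Raises ValueError if the width is wrong or the blocks leave gaps/overlaps.
    """
    features = np.asarray(features)
    if features.shape[-1] != manifest["total_dim"]:
        raise ValueError(
            f"expected {manifest['total_dim']} columns, got {features.shape[-1]}"
        )
    blocks, covered = {}, 0
    for block in sorted(manifest["blocks"], key=lambda b: b["start"]):
        if block["start"] != covered:
            raise ValueError(f"gap or overlap before block {block['name']!r}")
        blocks[block["name"]] = features[..., block["start"]:block["end"]]
        covered = block["end"]
    if covered != manifest["total_dim"]:
        raise ValueError("blocks do not cover the full feature vector")
    return blocks
```

Under this assumed schema, an audio block would simply be absent from `blocks` until it is extracted.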

The committed heads are intentionally small. The minimal baselines use:

- z-score + linear softmax classifiers
- dual ridge regression/projection heads
- sigmoid multi-label logistic regression
- cosine ranking for retrieval tasks

The neural baselines use z-score + PyTorch MLP heads for all 12 task definitions.

Their purpose is to make every input/output contract auditable before scaling to many episodes.
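A z-score + dual ridge head can be sketched in a few lines of NumPy. In the dual form the solver works with an n×n Gram matrix (n = training windows) instead of a d×d one (d = 8,378 features), which is cheaper when windows are far fewer than features, as with a single episode; the resulting weights equal the primal solution `(XᵀX + αI)⁻¹Xᵀy`. This is an illustrative sketch, not the repo's training script: `ZScoreDualRidge`, `alpha`, `eps`, and the intercept-by-centering choice are all assumptions.

```python
import numpy as np


class ZScoreDualRidge:
    """z-score scaler + ridge regression solved in the dual (n x n) form.

    Sketch only: alpha, eps and the intercept handling are assumptions, not
    the repo's committed settings.
    """

    def __init__(self, alpha=1.0, eps=1e-8):
        self.alpha = alpha
        self.eps = eps

    def fit(self, X, Y):
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float)
        # z-score statistics are fit on training windows only.
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0) + self.eps  # guard constant features
        Xs = (X - self.mean_) / self.std_
        self.y_mean_ = Y.mean(axis=0)  # unpenalized intercept via target centering
        # Gram matrix is n_windows x n_windows, small when n << n_features.
        K = Xs @ Xs.T
        dual = np.linalg.solve(K + self.alpha * np.eye(len(Xs)), Y - self.y_mean_)
        # Map dual coefficients back to one weight per feature.
        self.coef_ = Xs.T @ dual
        return self

    def predict(self, X):
        Xs = (np.asarray(X, dtype=float) - self.mean_) / self.std_
        return Xs @ self.coef_ + self.y_mean_
```

`Y` may be a vector or a matrix of targets, so the same class covers multi-output projection heads.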

## Qwen3-Omni LoRA Boundary

The companion GitHub repo now includes scripts for relaying Xperience-10M data
from an A100 host to an H20 host and for a Qwen3-Omni LoRA pilot. The current
LoRA checkpoint is a technical smoke-test artifact trained on one locally
available episode (128 training windows); it is not a full 32-episode result.

The next substantive model milestone is a 32-episode LoRA pilot evaluated on
held-out episodes, once Hugging Face access to `ropedia-ai/xperience-10m` is
approved. The staging plan selects 32 complete episodes, each from a different
top-level session UUID, then transfers them to H20 for manifest building,
training, and evaluation.
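The one-episode-per-session rule can be sketched as follows. Drawing each episode from a distinct session keeps near-duplicate recordings from the same session out of both the train and held-out sides. The episode record keys (`session_uuid`, `episode_id`, `complete`) and the function name are hypothetical; the actual staging scripts may index episodes differently.

```python
import random
from collections import defaultdict


def select_one_episode_per_session(episodes, n_sessions=32, seed=0):
    """Pick n_sessions complete episodes, each from a different session UUID.

    `episodes` is a list of dicts with hypothetical keys
    "session_uuid", "episode_id" and "complete" (bool).
    Returns a list of (session_uuid, episode_id) pairs.
    """
    by_session = defaultdict(list)
    for ep in episodes:
        if ep["complete"]:  # skip partially downloaded or truncated episodes
            by_session[ep["session_uuid"]].append(ep["episode_id"])
    if len(by_session) < n_sessions:
        raise ValueError(f"only {len(by_session)} sessions have a complete episode")
    rng = random.Random(seed)  # seeded so the staging plan is reproducible
    sessions = rng.sample(sorted(by_session), n_sessions)
    return [(s, rng.choice(sorted(by_session[s]))) for s in sessions]
```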

## What To Look At First

| Artifact | Why it is useful |
| --- | --- |
| `artifacts/**/model.npz` | stores the exact lightweight weights and scalers |
| `artifacts/episode_task_suite/neural_mlp/**/model.pt` | stores the neural MLP checkpoints |
| `artifacts/**/metrics.json` | records the committed metric values |
| `artifacts/**/feature_manifest.json` | maps feature blocks back to source modalities |
| `artifacts/episode_task_suite/research_directions/` | maps every task to the four Ropedia research directions with minimal-vs-neural readouts |
| `assets/task_architectures.png` | shows the shared pipeline and all 12 heads |
| `assets/task_suite_infographic.png` | presents the 12 heads with public-sample modality thumbnails and verified metrics |
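To audit a `model.npz` without guessing its layout, one option is to list every stored array with its shape and dtype. This sketch deliberately assumes no key names, since the exact keys written by the training scripts are not documented here. `allow_pickle=False` is a safety default: if labels were saved as object arrays, loading them will raise an error rather than execute pickled data.

```python
import numpy as np


def describe_npz(path):
    """Return {array_name: (shape, dtype_str)} for every array in an .npz file."""
    info = {}
    with np.load(path, allow_pickle=False) as archive:
        for name in archive.files:
            arr = archive[name]  # load each array once
            info[name] = (arr.shape, str(arr.dtype))
    return info
```

For example, `for p in sorted(Path("artifacts").rglob("model.npz")): print(p, describe_npz(p))` (with `from pathlib import Path`) prints an inventory of every committed minimal head.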

## Included

- `artifacts/**/model.npz`: minimal baseline weights, scalers, and labels
- `artifacts/episode_task_suite/neural_mlp/**/model.pt`: neural MLP task-head checkpoints
- `artifacts/episode_task_suite/neural_mlp/**/history.json`: neural training traces
- `artifacts/**/metrics.json`: committed metrics
- `artifacts/**/feature_manifest.json`: feature block boundaries where relevant
- `artifacts/episode_task_suite/research_directions/*.json`, `*.csv`, `*.md`: four-track task taxonomy
- `scripts/*.py`: training and visualization scripts
- `notes/*.md`: interpretation and reproducibility notes
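Since metrics are spread across many `artifacts/**/metrics.json` files, a small loader that gathers them into one dictionary makes cross-task comparison easier. This is a sketch using only the standard library; keying results by each file's parent directory is an illustrative choice, and the contents of each JSON file are passed through unchanged because their schema is not specified here.

```python
import json
from pathlib import Path


def collect_metrics(root="artifacts"):
    """Load every metrics.json under `root`, keyed by its parent directory.

    Keys are POSIX-style paths relative to `root`, e.g. "episode_task_suite/...".
    """
    root = Path(root)
    results = {}
    for path in sorted(root.rglob("metrics.json")):
        with path.open(encoding="utf-8") as f:
            results[path.parent.relative_to(root).as_posix()] = json.load(f)
    return results
```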

The companion artifact dataset repo stores CSV/JSON predictions and dashboard assets:

https://huggingface.co/datasets/cy0307/ropedia-episode-task-suite-artifacts

The public visual dashboard is here:

https://huggingface.co/spaces/cy0307/ropedia-episode-task-suite

Direct static app:

https://cy0307-ropedia-episode-task-suite.static.hf.space/

The full Hugging Face collection is here:

https://huggingface.co/collections/cy0307/ropedia-episode-task-suite

## Minimal and Neural Architecture

![Minimal 12-task architecture](assets/task_architectures.png)
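The retrieval heads in this pipeline rank candidates by cosine similarity and are scored with MRR and top-5 accuracy. A minimal sketch of that scoring follows, assuming query `i` is paired with candidate `i` (the actual pairing scheme used by the scripts is not documented here). Ties are counted optimistically: only candidates scoring strictly higher than the true match push its rank down.

```python
import numpy as np


def cosine_rank_mrr(queries, candidates):
    """Rank candidates by cosine similarity; query i's true match is candidate i.

    Returns (mean reciprocal rank, top-5 accuracy).
    """
    q = np.asarray(queries, dtype=float)
    c = np.asarray(candidates, dtype=float)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)  # unit-normalize rows
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    sims = q @ c.T  # (n_queries, n_candidates) cosine similarities
    correct = np.diag(sims)
    # Rank = 1 + number of candidates scoring strictly above the true match.
    ranks = 1 + (sims > correct[:, None]).sum(axis=1)
    mrr = float(np.mean(1.0 / ranks))
    top5 = float(np.mean(ranks <= 5))
    return mrr, top5
```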

## Four Research Directions

The baselines are also grouped by the four Ropedia research tracks:

| Direction | Current status | Baseline evidence |
| --- | --- | --- |
| A. Human Modeling & Motion Understanding | partially implemented | hand trajectory forecasting MPJPE drops from `0.8223` (minimal) to `0.1116` (neural MLP); contact prediction is degenerate in this sample (both heads score `1.0000` macro-F1) |
| B. 3D/4D Reconstruction & Neural Rendering | proxy tasks only | cross-modal retrieval, feature reconstruction, and misalignment are prerequisites, not full neural rendering |
| C. Egocentric Vision & Interaction | strongest implemented track | action/subtask/transition/next-action/object/caption tasks plus alignment/order diagnostics |
| D. Scene Reconstruction & World Modeling | early proxy tasks | state, object, retrieval, reconstruction, and temporal tasks are first probes before scene graphs or maps |

Primary taxonomy file:

`artifacts/episode_task_suite/research_directions/research_direction_taxonomy.json`

## Metrics Snapshot

| Task | Neural MLP metric | Minimal metric |
| --- | ---: | ---: |
| `timeline_action` macro-F1 | 0.0263 | 0.0500 |
| `timeline_subtask` macro-F1 | 0.0175 | 0.0495 |
| `transition_detection` macro-F1 | 0.6485 | 0.6552 |
| `next_action` macro-F1 | 0.0235 | 0.0593 |
| `hand_trajectory_forecast` MPJPE, lower is better | 0.1116 | 0.8223 |
| `contact_prediction` macro-F1 | 1.0000 | 1.0000 |
| `object_relevance` micro-F1 | 0.1798 | 0.1839 |
| `caption_grounding` MRR | 0.0178 | 0.0172 |
| `cross_modal_retrieval` MRR | 0.1530 | 0.2634 |
| `modality_reconstruction` R² | -0.0102 | -0.0160 |
| `temporal_order` F1 | 0.8718 | 0.5487 |
| `misalignment_detection` F1 | 0.7335 | 0.4866 |
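Reading the snapshot requires respecting each metric's direction: MPJPE is lower-is-better, while the F1, MRR, and R² rows are higher-is-better. The sketch below encodes the table values and reports which baseline wins per task. It shows that the neural MLP does not uniformly beat the minimal heads on this single episode, and small gaps such as `0.0178` vs `0.0172` on caption grounding should not be over-read. The function names are illustrative only.

```python
# Values copied from the Metrics Snapshot table:
# task -> (neural MLP, minimal, higher_is_better)
SNAPSHOT = {
    "timeline_action": (0.0263, 0.0500, True),
    "timeline_subtask": (0.0175, 0.0495, True),
    "transition_detection": (0.6485, 0.6552, True),
    "next_action": (0.0235, 0.0593, True),
    "hand_trajectory_forecast": (0.1116, 0.8223, False),  # MPJPE
    "contact_prediction": (1.0000, 1.0000, True),
    "object_relevance": (0.1798, 0.1839, True),
    "caption_grounding": (0.0178, 0.0172, True),
    "cross_modal_retrieval": (0.1530, 0.2634, True),
    "modality_reconstruction": (-0.0102, -0.0160, True),  # R²
    "temporal_order": (0.8718, 0.5487, True),
    "misalignment_detection": (0.7335, 0.4866, True),
}


def winner(neural, minimal, higher_is_better):
    """Return "neural", "minimal" or "tie" given the metric's direction."""
    if neural == minimal:
        return "tie"
    neural_better = neural > minimal if higher_is_better else neural < minimal
    return "neural" if neural_better else "minimal"


def summarize(snapshot):
    return {task: winner(*vals) for task, vals in snapshot.items()}
```

On these values the neural MLP wins 5 tasks, the minimal heads win 6, and `contact_prediction` is a tie.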

## Data Notice

This repo does not redistribute raw Xperience-10M videos or raw `annotation.hdf5`. Download the original sample from Ropedia / Hugging Face and follow the dataset terms:

- https://huggingface.co/datasets/ropedia-ai/xperience-10m-sample
- https://ropedia.com/dataset
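One way to fetch the public sample programmatically is `huggingface_hub.snapshot_download`. This is a sketch assuming `huggingface_hub` is installed, not a download script shipped by this repo; `download_sample` and the default `local_dir` are hypothetical names. If the dataset is gated, accept its terms on the Hub and authenticate first (for example with `huggingface-cli login`).

```python
def download_sample(local_dir="xperience-10m-sample"):
    """Download the Xperience-10M public sample from the Hugging Face Hub.

    Returns the local directory path. Requires `pip install huggingface_hub`.
    """
    # Imported lazily so this module can be imported without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="ropedia-ai/xperience-10m-sample",
        repo_type="dataset",
        local_dir=local_dir,
    )
```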

## Source

GitHub:

https://github.com/ChaoYue0307/ropedia-episode-task-suite

GitHub Pages:

https://chaoyue0307.github.io/ropedia-episode-task-suite/