# Project Brief

This project turns the public Ropedia Xperience-10M sample into a concrete
task lab for embodied-AI research. It is designed to answer a practical
question: what can be built, measured, and extended from a single richly
synchronized egocentric episode before scaling to held-out multi-episode training?

## Research Intent

The public sample is treated as a small but real research system. The project
does not try to inflate one episode into a final benchmark. Instead, it shows
the full path from data inspection to task design, baseline modeling,
evaluation, artifact packaging, and a guarded scale-up plan. A reader should be
able to trace one model input, understand each task, reproduce the public-sample
results, and see what work remains before multi-episode model-quality claims
can be made.

## Capability Map

| Capability | Evidence in this project |
| --- | --- |
| Data understanding | `feature_manifest.json`, `available_modalities.json`, modality atlas, episode-window HF viewer |
| Task design | 12 task contracts, task cards, case-study walkthroughs, and four research-direction extension probes |
| Evaluation rigor | chronological split, per-task metrics, predictions, confusion matrices, leakage notes, and generated takeaways |
| Scale-up planning | 128-episode selection/relay plan, Qwen3-Omni path, Cosmos 3 branch, and policy-model candidates after action-space conversion |
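
The chronological split named under evaluation rigor can be sketched as follows. This is a minimal illustration, not the project's actual split code: the function name `chronological_split`, the 80/20 fraction, and the `gap` parameter are all assumptions. The gap drops a few windows at the boundary so that, if consecutive windows share frames, no training window overlaps a test window.

```python
def chronological_split(n_windows, train_frac=0.8, gap=0):
    """Split time-ordered window indices into an early train range and a late test range.

    Hypothetical helper: `train_frac` and `gap` are illustrative defaults,
    not values taken from the project.
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be strictly between 0 and 1")
    if gap < 0:
        raise ValueError("gap must be non-negative")
    train_end = int(n_windows * train_frac)
    test_start = train_end + gap  # windows in [train_end, test_start) are discarded
    if train_end == 0 or test_start >= n_windows:
        raise ValueError("split leaves an empty train or test range")
    return range(0, train_end), range(test_start, n_windows)


# 1,161 is the window count reported for the public sample episode.
train_idx, test_idx = chronological_split(1161, train_frac=0.8, gap=3)
print(len(train_idx), len(test_idx))
```

Because every training index precedes every test index, the model is never scored on a window that occurred earlier than its training data.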

## What Exists Now

| Layer | Current artifact |
| --- | --- |
| Data unit | 1 public sample episode, 5,821 frames, 1,161 synchronized 20-frame windows |
| Modalities | Video-derived features, audio, depth, pose/SLAM, mocap, IMU, calibration, and language-derived features |
| Task suite | 12 embodied-AI task contracts with inputs, targets, metrics, predictions, and case-study walkthroughs |
| Models | Minimal linear/ridge/logistic baselines plus compact PyTorch MLP heads for the same 12 tasks |
| Research map | Four Ropedia research directions with direct, proxy, diagnostic, and extension-task coverage |
| Scale-up path | Qwen3-Omni LoRA code path prepared; the gated Xperience-10M dataset is available for a selected 128-episode pilot |
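
As a quick sanity check on the data-unit row, the frame and window counts are consistent with a sliding window that advances 5 frames at a time. The stride is not stated in this brief, so treat it as an inference; `count_windows` is a hypothetical helper written for this check.

```python
def count_windows(n_frames, window, stride):
    """Number of full windows of length `window` taken every `stride` frames."""
    if n_frames < window:
        return 0
    return (n_frames - window) // stride + 1


# 5,821 frames and 20-frame windows come from the table; the strides are candidates.
for stride in (1, 5, 20):
    print(stride, count_windows(5821, 20, stride))
```

Only the stride of 5 reproduces the reported 1,161 windows; strides of 1 and 20 give 5,802 and 291.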

## How To Read It

1. Start with the website or this brief to understand the project shape.
2. Open `RESEARCH_ROADMAP.md` to see how the work scales from the public
   sample to multi-episode modeling.
3. Open `EVALUATION_PROTOCOL.md` before comparing task scores.
4. Use `RESEARCH_TAKEAWAYS.md` for the current metric interpretation.
5. Inspect `results/episode_task_suite/feature_manifest.json` to understand one model input.
6. Use `results/omni_finetune/DATA_ACCESS_STATUS.md` for the multi-episode data status.
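
For step 5, a small script can give a first look at the manifest before reading it in full. The sketch below makes no assumption about the manifest's schema: `summarize_json` is a hypothetical helper that only lists top-level keys and their JSON types, and the path is the one named in the list above.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Map each top-level key of a JSON object to the Python type name of its value."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    # Non-object roots (for example a list) are reported under a single placeholder key.
    return {"<root>": type(data).__name__}


manifest = Path("results/episode_task_suite/feature_manifest.json")
if manifest.exists():
    for key, kind in summarize_json(manifest).items():
        print(f"{key}: {kind}")
```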

## What This Enables

The public sample is enough to build and verify task definitions, feature
contracts, metrics, visualization, and baseline code. It is not enough to
measure final model quality for a general embodied-AI model. The next research
stage is to run the same contracts on held-out episodes, then fine-tune and
evaluate an omni-model with train/test separation at the episode level.
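
Episode-level separation means every window from a given episode lands on one side of the split. A minimal sketch under stated assumptions: the episode identifiers (`ep_000` and so on), the 20% test fraction, and the `split_by_episode` helper are hypothetical and are not taken from the project's scale-up code.

```python
import random


def split_by_episode(episode_ids, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) so that no episode appears in both sets.

    `episode_ids[i]` is the episode that window `i` belongs to.
    """
    unique = sorted(set(episode_ids))
    if len(unique) < 2:
        raise ValueError("need at least two episodes for a held-out split")
    rng = random.Random(seed)  # seeded for a reproducible split
    rng.shuffle(unique)
    # Keep at least one episode on each side of the split.
    n_test = min(max(1, round(len(unique) * test_frac)), len(unique) - 1)
    test_episodes = set(unique[:n_test])
    train_idx = [i for i, ep in enumerate(episode_ids) if ep not in test_episodes]
    test_idx = [i for i, ep in enumerate(episode_ids) if ep in test_episodes]
    return train_idx, test_idx


# Toy data: 10 hypothetical episodes with 3 windows each.
episode_ids = [f"ep_{e:03d}" for e in range(10) for _ in range(3)]
train_idx, test_idx = split_by_episode(episode_ids)
print(len(train_idx), len(test_idx))
```

Splitting by window instead would let near-duplicate windows from the same recording appear in both sets, which inflates test scores.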

## Best Entry Points

| Entry point | Link |
| --- | --- |
| Visual dashboard | https://chaoyue0307.github.io/ropedia-xperience-10m-task-suite/ |
| Interactive HF Space | https://huggingface.co/spaces/cy0307/ropedia-xperience-10m-task-suite |
| Derived artifacts | https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts |
| Baseline model bundle | https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines |
| Official Xperience-10M dataset | https://huggingface.co/datasets/ropedia-ai/xperience-10m |