
Project Brief

This project turns the public Ropedia Xperience-10M sample into a concrete research task lab for embodied AI. It is designed to answer a practical question: what can be built, measured, and extended from a richly synchronized egocentric episode before scaling to held-out multi-episode training?

Research Intent

The public sample is treated as a small but real research system. The project does not try to inflate one episode into a final benchmark. Instead, it shows the full path from data inspection to task design, baseline modeling, evaluation, artifact packaging, and a guarded scale-up plan. A reader should be able to trace one model input, understand each task, reproduce the public-sample results, and see what remains before multi-episode model-quality claims.

Capability Map

| Capability | Evidence in this project |
| --- | --- |
| Data understanding | feature_manifest.json, available_modalities.json, modality atlas, episode-window HF viewer |
| Task design | 20 unified task contracts, task cards, case-study walkthroughs, and four research-direction extension probes |
| Evaluation rigor | Chronological split, per-task metrics, predictions, confusion matrices, leakage notes, and generated takeaways |
| Scale-up planning | Final verified 96/16/16 Qwen3-Omni diagnostic result, same-split 128-episode baseline alignment, Cosmos3-Nano compatibility branch, and policy-model candidates after action-space conversion |
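
The chronological split and leakage notes listed above can be illustrated with a minimal sketch. It is not the project's actual split code: the function name, the 70/15/15 fractions, and the `gap` parameter are assumptions made for illustration. The idea is that windows from a single episode are split in time order, and a few windows are dropped at each boundary so that overlapping windows do not land on both sides of a split.

```python
def chronological_split(n_windows, train_frac=0.70, val_frac=0.15, gap=0):
    """Split window indices 0..n_windows-1 in time order.

    `gap` windows are discarded at each boundary so that overlapping
    windows cannot appear in two splits. All defaults are illustrative
    assumptions, not values taken from the project.
    """
    if n_windows <= 0 or gap < 0:
        raise ValueError("n_windows must be positive and gap non-negative")
    if not (0 < train_frac < 1 and 0 < val_frac < 1 and train_frac + val_frac < 1):
        raise ValueError("fractions must be in (0, 1) and leave room for a test split")

    train_end = int(n_windows * train_frac)
    val_start = train_end + gap
    val_end = val_start + int(n_windows * val_frac)
    test_start = val_end + gap
    if test_start >= n_windows:
        raise ValueError("gap too large for this number of windows")

    train = list(range(0, train_end))
    val = list(range(val_start, val_end))
    test = list(range(test_start, n_windows))
    return train, val, test


# Example: 1,161 windows, dropping 3 windows at each boundary.
train, val, test = chronological_split(1161, gap=3)
print(len(train), len(val), len(test))
```

Every training window precedes every validation window, which in turn precedes every test window, so no future information reaches the training set.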

What Exists Now

| Layer | Current artifact |
| --- | --- |
| Data unit | 1 public sample episode, 5,821 frames, 1,161 synchronized 20-frame windows |
| Modalities | Video-derived features, audio, depth, pose/SLAM, mocap, IMU, calibration, and language-derived features |
| Task suite | 20 embodied-AI task contracts with inputs, targets, metrics, predictions, and setup alignment |
| Models | Minimal linear/ridge/logistic baselines plus compact PyTorch MLP heads for the unified 20-task public-sample suite |
| Research map | Four Ropedia research directions with direct, proxy, diagnostic, and extension-task coverage |
| Scale-up path | A selected 96/16/16 Qwen3-Omni LoRA final diagnostic result is verified; strict-JSON validity meets target, while weak action/subtask metrics guide the next error-analysis pass |
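
The data-unit figures can be sanity-checked with a short sliding-window calculation. The brief does not state the stride between windows. A stride of 5 frames is an assumption here, chosen because it reproduces the reported count of 1,161 windows from 5,821 frames; the project's real windowing logic may differ. The helper names are hypothetical.

```python
def count_windows(n_frames, window, stride):
    """Number of full windows that fit when sliding by `stride` frames."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if n_frames < window:
        return 0
    return (n_frames - window) // stride + 1


def window_bounds(index, window, stride):
    """Half-open frame range [start, end) covered by window `index`."""
    start = index * stride
    return start, start + window


# Assumed stride of 5 frames (not stated in the brief).
n = count_windows(5821, window=20, stride=5)
print(n)                                   # 1161
print(window_bounds(n - 1, window=20, stride=5))  # (5800, 5820)
```

Under this assumption, consecutive windows share 15 frames, which is why a split along the time axis needs care at its boundaries.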

How To Read It

  1. Start with PUBLIC_READER_MAP.md if you need to choose between GitHub, the website, Hugging Face artifacts, baseline weights, model branches, or release-health files.
  2. Read the website or this brief to understand the project shape.
  3. Open RESEARCH_ROADMAP.md to see how the work scales from the public sample to multi-episode modeling.
  4. Open EVALUATION_PROTOCOL.md before comparing task scores.
  5. Use RESEARCH_TAKEAWAYS.md for the current metric interpretation.
  6. Inspect results/episode_task_suite/feature_manifest.json to understand one model input.
  7. Use TASK_SUITE_20.md and docs/data/task_suite_20.json to read the unified 20-task suite; the historical docs/data/tier2_task_suite.json path stores the tasks 13-20 result bundle.
  8. Use docs/data/omni_finetune_verified_result.json for the current multi-episode Qwen3-Omni pilot result.
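
For the JSON artifacts named in the steps above, a schema-agnostic first look can help before reading the task documents. The sketch below makes no assumption about the fields inside feature_manifest.json; it only lists the top-level keys and the type and size of each value. The helper names are hypothetical.

```python
import json
from pathlib import Path


def describe(value):
    """Short type summary, with length for containers."""
    if isinstance(value, (list, dict)):
        return f"{type(value).__name__}[{len(value)}]"
    return type(value).__name__


def summarize_json(path):
    """Map each top-level key of a JSON file to a short description."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        # Non-object roots (for example a list) get a single summary entry.
        return {"<root>": describe(data)}
    return {key: describe(value) for key, value in data.items()}


# Usage, from the repository root:
# for key, kind in summarize_json(
#     "results/episode_task_suite/feature_manifest.json"
# ).items():
#     print(f"{key}: {kind}")
```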

What This Enables

The public sample is enough to build and verify task definitions, feature contracts, metrics, visualization, and baseline code. It is not enough to measure final model quality for a general embodied-AI model. The first multi-episode Qwen3-Omni diagnostic pilot now verifies the held-out training loop with validation loss recorded; the next research stage is to improve JSON-format reliability and error analysis before larger robustness or alternative backbone claims.
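
One way to make JSON-format reliability measurable is a strict validity rate over raw model outputs. The sketch below is an assumption about what "strict" could mean, not the project's metric: an output counts as valid only if the whole string parses as a single JSON object and contains every required key. The key names `action` and `subtask` are hypothetical placeholders.

```python
import json


def is_strict_json(text, required_keys=()):
    """True if `text` is exactly one JSON object holding all required keys."""
    try:
        parsed = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return False  # extra prose, truncation, or non-string input
    if not isinstance(parsed, dict):
        return False  # arrays and scalars do not count as a task answer
    return all(key in parsed for key in required_keys)


def strict_json_validity(outputs, required_keys=()):
    """Fraction of outputs that pass the strict check (0.0 if empty)."""
    if not outputs:
        return 0.0
    valid = sum(is_strict_json(text, required_keys) for text in outputs)
    return valid / len(outputs)


outputs = [
    '{"action": "pick", "subtask": "grasp"}',  # valid
    'Sure! {"action": "pick"}',                # prose before the object
    '{"action": "pick"}',                      # missing "subtask"
    '[1, 2]',                                  # not an object
]
print(strict_json_validity(outputs, required_keys=("action", "subtask")))  # 0.25
```

Keeping the invalid outputs alongside the rate gives the error-analysis pass concrete failure cases to categorize.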

Best Entry Points