metadata
license: other
library_name: pytorch
tags:
  - robotics
  - embodied-ai
  - multimodal
  - ropedia
  - xperience-10m
  - baseline
  - neural-network
  - pytorch
  - linear-model
  - retrieval
metrics:
  - accuracy
  - f1
  - mean-reciprocal-rank
  - mean-squared-error
model-index:
  - name: Ropedia Xperience-10M Task Baselines
    results:
      - task:
          type: robotics
          name: Cross-modal retrieval
        dataset:
          type: ropedia-ai/xperience-10m-sample
          name: Xperience-10M public sample episode
        metrics:
          - type: top_5_accuracy
            value: 0.3764
            name: top-5 retrieval accuracy
          - type: mrr
            value: 0.2634
            name: mean reciprocal rank
      - task:
          type: robotics
          name: Transition detection
        dataset:
          type: ropedia-ai/xperience-10m-sample
          name: Xperience-10M public sample episode
        metrics:
          - type: f1
            value: 0.6552
            name: macro-F1
      - task:
          type: robotics
          name: Temporal order
        dataset:
          type: ropedia-ai/xperience-10m-sample
          name: Xperience-10M public sample episode
        metrics:
          - type: f1
            value: 0.8718
            name: neural MLP F1

Ropedia Xperience-10M Task Baselines

This repo stores the minimal baseline weights, neural MLP task-head checkpoints, and metrics for the 12-task Xperience-10M episode suite, plus four lightweight direction-extension probes. It is meant to be read as a model audit, not as a robot foundation model.

12-task suite with sample modalities

The source Xperience-10M sample spans video, audio, depth, pose, motion capture, inertial sensing, and language annotation. The committed minimal and neural task heads consume the 8,378-dimensional feature vector defined by the current feature manifest. Audio is documented in the figures but is not yet extracted into a model input feature block. Both the companion dashboard and this model card open with the task-first map of the 12 heads. They then mirror the responsive modality-atlas metadata from metrics/modality_atlas.json, with standalone derived thumbnails in assets/modalities/.
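A feature manifest can be used to slice one modality out of a window's feature vector. The sketch below shows how, under an assumed schema: a list of contiguous blocks with hypothetical `name`, `modality`, `start`, and `end` fields. Check artifacts/episode_task_suite/feature_manifest.json for the real field names. Only the 8,378 total width comes from this card; the per-block widths are illustrative.

```python
import numpy as np

# Hypothetical manifest layout; the real feature_manifest.json may use different keys.
# Block widths are illustrative and only chosen to sum to the documented 8,378 dims.
MANIFEST = {
    "total_dim": 8378,
    "blocks": [
        {"name": "video_embed", "modality": "video", "start": 0, "end": 4096},
        {"name": "depth_stats", "modality": "depth", "start": 4096, "end": 6144},
        {"name": "pose_mocap", "modality": "pose", "start": 6144, "end": 8000},
        {"name": "imu_stats", "modality": "imu", "start": 8000, "end": 8378},
    ],
}


def check_manifest(manifest):
    """Assert blocks are contiguous, non-empty, and cover total_dim exactly."""
    cursor = 0
    for block in manifest["blocks"]:
        assert block["start"] == cursor, f"gap or overlap before {block['name']}"
        assert block["end"] > block["start"], f"empty block {block['name']}"
        cursor = block["end"]
    assert cursor == manifest["total_dim"], "blocks do not cover total_dim"


def slice_modality(features, manifest, modality):
    """Return the columns of `features` (n_windows x total_dim) that belong to one modality."""
    cols = [np.arange(b["start"], b["end"]) for b in manifest["blocks"] if b["modality"] == modality]
    if not cols:
        # e.g. audio: documented in the figures but not featurized
        raise KeyError(f"no feature block for modality {modality!r}")
    return features[:, np.concatenate(cols)]
```

With this toy manifest, `slice_modality(np.zeros((4, 8378)), MANIFEST, "imu")` returns a `(4, 378)` array. Asking for `"audio"` raises `KeyError`, which mirrors the "audio documented but not featurized" boundary.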

The model repo also mirrors the official-source alignment artifact at metrics/xperience10m_dataset_card_alignment.json, plus XPERIENCE10M_DATASET_CARD_ALIGNMENT.md. That file records the following from the official ropedia-ai/xperience-10m card:

  • scope,
  • gated access,
  • full-scale modalities,
  • episode layout,
  • intended uses, and
  • the claims this small baseline repo does not make.

The file also records the public sample card (cc-by-nc-4.0, HOMIE Toolkit, Rerun 0.29.0 .rrd visualization). It also keeps a snapshot of the current HF API listing: 803 session folders and 12,103 episode folders with annotation.hdf5, plus the live HF file-size display of 31.9 TB. The 31.9 TB display is tracked separately from the official card's statement that full-scale storage is about 1 PB. These are upstream metadata facts. They are not evidence of local downloads, raw-data redistribution, or model quality. The source note also preserves the official disclaimer about limited diversity and showcase-quality data. It excludes identity, surveillance, biometric, sensitive-attribute, and safety-critical uses.

The source-alignment audit is mirrored at SOURCE_ALIGNMENT_AUDIT.md and metrics/source_alignment_audit.json. It checks that the same full-dataset facts, public sample-card facts, API-listing caveats, and current-project boundary markers appear in each of these places: the repo, the website, the artifact dataset, the Space, and this model card.

For a first-pass model review, start with REVIEWER_SCORECARD.md and metrics/reviewer_scorecard.json. They state which baseline artifacts are verified, which Qwen3-Omni claims remain data-gated, and which raw data and weights are intentionally excluded. Read EVALUATION_PROTOCOL.md and metrics/evaluation_protocol.json before reading any scores. They define the window unit, chronological split, leakage controls, per-task metrics, and unsupported interpretations. To audit the public figures mirrored into this model repo, use FIGURE_INDEX.md and metrics/figure_index.json. They cover charts, modality thumbnails, dimensions, stable hashes, and source scripts.

The committed heads are intentionally small:

  • z-score + linear softmax classifiers,
  • dual ridge regression/projection heads,
  • sigmoid multi-label logistic regression,
  • cosine ranking for retrieval tasks, and
  • z-score + PyTorch MLP heads for all 12 task definitions.
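A z-score scaler followed by ridge regression solved in the dual form is sketched below. It assumes that "dual ridge" means the kernel (dual) form of linear ridge regression. That is an interpretation, not the repo's confirmed implementation, and the intercept handling is also an assumption. The dual form pays off here because the feature width (8,378) exceeds the number of windows, so it solves an n x n system instead of a d x d one.

```python
import numpy as np


def zscore_fit(X, eps=1e-8):
    """Per-feature mean and std, computed on the training split only."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return mu, np.where(sigma < eps, 1.0, sigma)  # guard constant features


def dual_ridge_fit(X, Y, lam=1.0):
    """Ridge regression in the dual: W = Xs^T (Xs Xs^T + lam I)^-1 (Y - mean(Y)).

    Equal to the primal (Xs^T Xs + lam I)^-1 Xs^T (Y - mean(Y)) by the push-through
    identity, but it solves an n x n system, which is cheaper when d >> n.
    """
    mu, sigma = zscore_fit(X)
    Xs = (X - mu) / sigma
    y_mean = Y.mean(axis=0)                       # intercept (assumed, not from the card)
    K = Xs @ Xs.T                                 # n x n Gram matrix
    alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), Y - y_mean)
    W = Xs.T @ alpha                              # d x k weight matrix
    return {"mu": mu, "sigma": sigma, "W": W, "b": y_mean}


def dual_ridge_predict(params, X):
    """Apply the stored scaler, then the linear head."""
    return ((X - params["mu"]) / params["sigma"]) @ params["W"] + params["b"]
```

For example, on a 20 x 50 random design the dual weights match the primal closed form to floating-point precision. The real scalers and weights are stored in artifacts/**/model.npz under whatever keys the training script chose.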

The included architecture and suite figures use the same Ropedia-inspired dark visual system as the public dashboard, but the text, dimensions, and metrics are generated from the committed artifacts rather than drawn by hand.

The purpose of these small heads is to make every input/output contract auditable before scaling to many episodes.

90-Second Reviewer Path

| Step | Question | Primary artifacts |
| --- | --- | --- |
| 1 | What is actually claimed? | REVIEWER_SCORECARD.md, metrics/reviewer_scorecard.json, EVIDENCE_CONTRACT.md, ARTIFACT_GUIDE.md, QUALITY_GATES.md, FIGURE_INDEX.md, metrics/artifact_index.json, metrics/figure_index.json, metrics/live_publication_status.json, metrics/quality_gates.json, metrics/mirror_parity.json, metrics/scope_claims_audit.json, metrics/publication_audit.json, metrics/website_integrity.json, metrics/project_manifest.json |
| 2 | Are source facts consistently presented? | SOURCE_ALIGNMENT_AUDIT.md, metrics/source_alignment_audit.json, scripts/validate_source_alignment.py |
| 3 | How do I reproduce it? | REPRODUCIBILITY.md, metrics/reproducibility_matrix.json, companion GitHub notes/reproducibility_audit.md |
| 4 | What is one model input? | artifacts/episode_task_suite/feature_manifest.json, artifacts/episode_task_suite/available_modalities.json, companion artifact dataset windows.csv |
| 5 | Are the task results backed by files? | artifacts/episode_task_suite/summary_report.json, artifacts/episode_task_suite/neural_mlp/, metrics/summary_metrics.json |
| 6 | What is still pending? | companion GitHub results/omni_finetune/DATA_BLOCKER_REPORT.md and A100_HF_RELAY_STATUS.md |

Mirrored review files in this repo:

  • Human-readable artifact guide: ARTIFACT_GUIDE.md
  • Reviewer scorecard: REVIEWER_SCORECARD.md and metrics/reviewer_scorecard.json
  • Official dataset-card alignment: XPERIENCE10M_DATASET_CARD_ALIGNMENT.md and metrics/xperience10m_dataset_card_alignment.json
  • Source-alignment audit: SOURCE_ALIGNMENT_AUDIT.md and metrics/source_alignment_audit.json
  • Publication quality gates: QUALITY_GATES.md and metrics/quality_gates.json
  • Live publication status: metrics/live_publication_status.json
  • Machine-readable reviewer packet: metrics/reviewer_packet.json
  • Source-of-truth artifact index: metrics/artifact_index.json
  • Source-of-truth figure index: FIGURE_INDEX.md and metrics/figure_index.json

Evidence Boundary

| Claim layer | Evidence | Boundary |
| --- | --- | --- |
| Reviewer scorecard | REVIEWER_SCORECARD.md, metrics/reviewer_scorecard.json | compact verified/data-gated/not-redistributed decision table |
| Baseline weights | artifacts/**/model.npz | lightweight heads only |
| Neural checkpoints | artifacts/episode_task_suite/neural_mlp/**/model.pt | same single-episode windows and splits |
| Metrics | artifacts/**/metrics.json, prediction CSV/NPZ files | debugging and task-contract evidence |
| Feature contract | artifacts/**/feature_manifest.json | audio documented but not featurized |
| Evaluation protocol | EVALUATION_PROTOCOL.md, metrics/evaluation_protocol.json | windowing, chronological split, leakage controls, and task metrics |
| Qwen3-Omni | companion blocker and relay reports | smoke-only until 32 valid episodes are available |
| Scope claims guard | metrics/scope_claims_audit.json and scripts/validate_scope_claims.py | historical 32ep path strings are provenance, not 32-episode results |
| Mirror parity | metrics/mirror_parity.json and scripts/validate_mirror_parity.py | prepared repo/HF mirrors carry matching critical data, figures, website HTML, and validator files |
| Publication hygiene | metrics/publication_audit.json and validator script mirror | public bundles contain no raw data, generated caches, heavy archives, token strings, or stale public-card figure references |
| Website integrity | metrics/website_integrity.json and validator script mirror | local links, anchors, JSON bundles, and referenced images only |
| Quality gates | QUALITY_GATES.md, metrics/quality_gates.json, and scripts/build_quality_gates.py | automated release gates plus live post-publish checks |
| Live publication | metrics/live_publication_status.json, scripts/verify_live_publication.py | last public GitHub/HF URL verification after upload |
| Official dataset card alignment | XPERIENCE10M_DATASET_CARD_ALIGNMENT.md, metrics/xperience10m_dataset_card_alignment.json | official source scope, public sample card, HF API listing, gated access, modality coverage, scale, and this repo's single-episode boundary |
| Source alignment audit | SOURCE_ALIGNMENT_AUDIT.md, metrics/source_alignment_audit.json, scripts/validate_source_alignment.py | validates full-dataset facts, sample-card facts, API-listing caveats, and public-card boundary markers |
| Figure index | FIGURE_INDEX.md, metrics/figure_index.json, scripts/build_figure_index.py | public figures, charts, modality thumbnails, dimensions, hashes, and generation provenance |
| Artifact index | metrics/artifact_index.json and scripts/build_artifact_index.py | compact catalog of the reviewer-critical proof artifacts |
| Artifact guide | ARTIFACT_GUIDE.md | human-readable map of proof boundary, task evidence, mirrors, and scale-up status |
| Reproducibility | REPRODUCIBILITY.md, metrics/reproducibility_matrix.json | public commands, expected outputs, exact-match audit evidence, and non-reproducible boundaries |
| Citation metadata | GitHub CITATION.cff, codemeta.json, project_manifest.json, and reviewer_packet.json | code license remains separate from Xperience-10M dataset terms |

Qwen3-Omni LoRA Boundary

The companion GitHub repo now includes scripts for an A100-to-H20 Xperience-10M relay and a Qwen3-Omni LoRA pilot path. The current LoRA checkpoint is a technical smoke artifact trained on 128 windows from the one locally available episode. It is not a full 32-episode result.

The next real model milestone is a 32-episode held-out-episode LoRA pilot after Hugging Face access to ropedia-ai/xperience-10m is approved. The staging plan selects 32 complete episodes from 32 different top-level session UUIDs, then transfers them to H20 for manifest building, training, and evaluation.
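The staging rule (32 complete episodes, each from a different top-level session UUID) can be sketched as a deterministic selection. This is an illustration, not the companion repo's staging script. Two parts are assumptions:

  • the `complete` flag, standing in for "all required files present", and
  • the sorted-order tie-breaking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    session_uuid: str   # top-level session folder
    episode_id: str     # episode folder inside that session
    complete: bool      # hypothetical flag: all required files are present


def select_one_per_session(episodes, k=32):
    """Pick at most one complete episode per session, deterministically, and return k of them."""
    chosen = {}
    for ep in sorted(episodes, key=lambda e: (e.session_uuid, e.episode_id)):
        if ep.complete and ep.session_uuid not in chosen:
            chosen[ep.session_uuid] = ep  # first complete episode (in sorted order) wins
    if len(chosen) < k:
        raise ValueError(f"only {len(chosen)} sessions have a complete episode; need {k}")
    return [chosen[s] for s in sorted(chosen)[:k]]
```

Drawing every episode from a distinct session keeps a held-out-episode split from also sharing a session between train and evaluation.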

What To Look At First

| Artifact | Why it is useful |
| --- | --- |
| REVIEWER_SCORECARD.md, metrics/reviewer_scorecard.json | gives the compact current decision boundary before reading the full audit trail |
| artifacts/**/model.npz | stores the exact lightweight weights and scalers |
| artifacts/episode_task_suite/neural_mlp/**/model.pt | stores the neural MLP checkpoints |
| artifacts/**/metrics.json | records the committed metric values |
| artifacts/**/feature_manifest.json | maps feature blocks back to source modalities |
| EVALUATION_PROTOCOL.md, metrics/evaluation_protocol.json | defines task-unit, split, metric, leakage-control, and unsupported-interpretation rules |
| artifacts/episode_task_suite/research_directions/ | maps every task to the four Ropedia research directions with minimal-vs-neural readouts |
| artifacts/episode_task_suite/research_direction_extensions/ | adds one coded extension probe per research direction |
| artifacts/episode_task_suite/task_walkthroughs/ | explains every task with case study, input, process modules, output, and limitation |
| assets/task_architectures.png | shows the shared pipeline and all 12 heads |
| assets/task_suite_infographic.png | presents the shared processing contract, 12 heads, verified metrics, and public-sample modality thumbnails |
| assets/modalities/, metrics/modality_atlas.json | responsive modality-card thumbnails and metadata for sample inspection |
| XPERIENCE10M_DATASET_CARD_ALIGNMENT.md, metrics/xperience10m_dataset_card_alignment.json | aligns public wording with the official gated Xperience-10M card, sample card, and HF API metadata |
| SOURCE_ALIGNMENT_AUDIT.md, metrics/source_alignment_audit.json | verifies source facts and boundary markers across GitHub, the website, and HF cards |
| FIGURE_INDEX.md, metrics/figure_index.json | verifies public figures, charts, thumbnails, dimensions, hashes, and source scripts |
| metrics/artifact_index.json | indexes proof artifacts with existence, size, and stable-file hashes |
| metrics/mirror_parity.json | verifies prepared repo/HF mirrors have matching critical data, figures, website HTML, and validator files before upload |
| metrics/scope_claims_audit.json | verifies historical 32ep smoke-run identifiers are not presented as real 32-episode results |
| QUALITY_GATES.md, metrics/quality_gates.json | summarizes the automated and post-publish release checks |
| metrics/live_publication_status.json | records the last live public URL verification after upload |
| metrics/publication_audit.json | records the latest public-bundle hygiene and public-card freshness check |
| metrics/website_integrity.json | records the latest local website link, anchor, JSON, and image integrity check |
| metrics/project_manifest.json | mirrors the public URL and citation metadata bundle |
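The artifact index records existence, size, and a stable hash for each file. The sketch below produces one such entry. It assumes SHA-256 as the hash and uses illustrative field names (`path`, `exists`, `size`, `sha256`). scripts/build_artifact_index.py may choose a different hash or schema.

```python
import hashlib
from pathlib import Path


def index_entry(path, chunk_size=1 << 20):
    """Existence, byte size, and SHA-256 for one artifact path (hash choice is an assumption)."""
    p = Path(path)
    if not p.is_file():
        return {"path": str(p), "exists": False}
    digest = hashlib.sha256()
    with p.open("rb") as fh:
        # Stream in chunks so large checkpoints do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return {"path": str(p), "exists": True, "size": p.stat().st_size, "sha256": digest.hexdigest()}


def build_index(paths):
    """Index a list of paths in sorted order so the output is stable across runs."""
    return {"artifacts": [index_entry(p) for p in sorted(paths)]}
```

Comparing two such indexes, one per mirror, is one simple way to check the kind of parity that metrics/mirror_parity.json reports.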

Included

  • artifacts/**/model.npz: minimal baseline weights, scalers, and labels
  • artifacts/episode_task_suite/neural_mlp/**/model.pt: neural MLP task-head checkpoints
  • artifacts/episode_task_suite/neural_mlp/**/history.json: neural training traces
  • artifacts/**/metrics.json: committed metrics
  • artifacts/**/feature_manifest.json: feature block boundaries where relevant
  • artifacts/episode_task_suite/research_directions/*.json|*.csv|*.md: four-track task taxonomy
  • artifacts/episode_task_suite/research_direction_extensions/*.json|*.csv|*.md: four extension-probe metrics and predictions
  • artifacts/episode_task_suite/task_walkthroughs/*.json|*.md: beginner walkthroughs for all 12 tasks
  • REVIEWER_SCORECARD.md, metrics/reviewer_scorecard.json: compact current decision table
  • scripts/*.py: training and visualization scripts
  • scripts/validate_mirror_parity.py: prepared mirror parity validator
  • scripts/validate_scope_claims.py: Qwen3-Omni smoke/result claim-boundary validator
  • scripts/validate_publication_package.py: publication hygiene validator
  • scripts/validate_website_integrity.py: website local-reference validator
  • notes/*.md: interpretation and reproducibility notes
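Before relying on specific key names in the committed .npz checkpoints, list what they contain. The minimal sketch below uses only NumPy and makes no assumption about the stored keys. It loads with `allow_pickle=False`. If labels were saved as object arrays, loading will fail, and `allow_pickle=True` should be used only for trusted files.

```python
import numpy as np


def describe_npz(path):
    """Map every array name in an .npz checkpoint to its (shape, dtype)."""
    with np.load(path, allow_pickle=False) as archive:
        return {name: (archive[name].shape, str(archive[name].dtype)) for name in archive.files}
```

For the neural MLP checkpoints, `torch.load(path, map_location="cpu")` returns whatever object the training script saved. Inspect that object before assuming it is a bare `state_dict`.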

The companion artifact dataset repo stores CSV/JSON predictions and dashboard assets:

https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts

The public visual dashboard is here:

https://huggingface.co/spaces/cy0307/ropedia-xperience-10m-task-suite

Direct static app:

https://cy0307-ropedia-xperience-10m-task-suite.static.hf.space/

The full Hugging Face collection is here:

https://huggingface.co/collections/cy0307/ropedia-xperience-10m-task-suite

Minimal and Neural Architecture

Minimal 12-task architecture

Four Research Directions

The baselines are also grouped by the four Ropedia research tracks:

| Direction | Current status | Baseline evidence |
| --- | --- | --- |
| A. Human Modeling & Motion Understanding | partially implemented | hand trajectory forecasting MPJPE drops from 0.8223 (minimal) to 0.1116 (neural MLP), where lower is better; contact prediction is degenerate in this sample |
| B. 3D/4D Reconstruction & Neural Rendering | proxy tasks only | cross-modal retrieval, feature reconstruction, and misalignment are prerequisites, not full neural rendering |
| C. Egocentric Vision & Interaction | strongest implemented track | action/subtask/transition/next-action/object/caption tasks plus alignment/order diagnostics |
| D. Scene Reconstruction & World Modeling | early proxy tasks | state, object, retrieval, reconstruction, and temporal tasks are first probes before scene graphs or maps |
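MPJPE (mean per-joint position error) is the plain Euclidean error per joint, averaged over all joints and samples; the sketch below computes it. The card does not state the units, the hand joint layout, or whether predictions are root-aligned first. This sketch assumes unaligned coordinates in one shared frame.

```python
import numpy as np


def mpjpe(pred, target):
    """Mean per-joint position error.

    pred, target: arrays of shape (..., n_joints, 3) in the same units and frame.
    No root alignment or Procrustes step is applied (an assumption).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {target.shape}")
    # Euclidean distance per joint, then the mean over every leading axis.
    return float(np.linalg.norm(pred - target, axis=-1).mean())
```

For one sample with two joints that are off by (3, 4, 0) and (0, 0, 0), the per-joint errors are 5 and 0, so the MPJPE is 2.5.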

Primary taxonomy file:

artifacts/episode_task_suite/research_directions/research_direction_taxonomy.json

Direction-Extension Probe Snapshot

| Direction | Extension task | Minimal | Neural MLP |
| --- | --- | --- | --- |
| A. Human Modeling & Motion Understanding | body_motion_intensity | 0.7827 macro-F1 | 0.7986 macro-F1 |
| B. 3D/4D Reconstruction & Neural Rendering | multi_view_consistency_retrieval | 0.5534 MRR | 0.3469 MRR |
| C. Egocentric Vision & Interaction | action_phase_progress | 0.3416 MAE | 0.3038 MAE |
| D. Scene Reconstruction & World Modeling | ego_motion_forecast | 0.1989 MAE | 0.0989 MAE |

These probes reuse the same 1,161-window feature tensor and chronological split style. They are direction-specific diagnostics, not full human-body, neural rendering, intent, or world-model solutions.
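A chronological split over the 1,161 windows can be sketched as follows: the earliest windows go to train, the next to validation, and the latest to test. An optional gap is dropped at each boundary so that overlapping windows do not straddle two splits. The 60/20/20 fractions and the gap size are assumptions. EVALUATION_PROTOCOL.md defines the actual split and leakage controls.

```python
import numpy as np


def chronological_split(n_windows, train_frac=0.6, val_frac=0.2, gap=0):
    """Split window indices by time: earliest -> train, then val, then test.

    `gap` windows are discarded after train and after val as a simple leakage
    control for overlapping windows. Fractions and gap are illustrative.
    """
    order = np.arange(n_windows)  # assumes windows are already sorted by start time
    n_train = int(n_windows * train_frac)
    n_val = int(n_windows * val_frac)
    train = order[:n_train]
    val = order[n_train + gap : n_train + n_val]
    test = order[n_train + n_val + gap :]
    return train, val, test
```

With `chronological_split(1161, gap=2)`, the splits hold 696, 230, and 231 windows, and two windows are discarded at each boundary.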

Metrics Snapshot

| Task | Metric | Neural MLP | Minimal |
| --- | --- | --- | --- |
| timeline_action | macro-F1 | 0.0263 | 0.0500 |
| timeline_subtask | macro-F1 | 0.0175 | 0.0495 |
| transition_detection | macro-F1 | 0.6485 | 0.6552 |
| next_action | macro-F1 | 0.0235 | 0.0593 |
| hand_trajectory_forecast | MPJPE (lower is better) | 0.1116 | 0.8223 |
| contact_prediction | macro-F1 | 1.0000 | 1.0000 |
| object_relevance | micro-F1 | 0.1798 | 0.1839 |
| caption_grounding | MRR | 0.0178 | 0.0172 |
| cross_modal_retrieval | MRR | 0.1530 | 0.2634 |
| modality_reconstruction | R2 | -0.0102 | -0.0160 |
| temporal_order | F1 | 0.8718 | 0.5487 |
| misalignment_detection | F1 | 0.7335 | 0.4866 |
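The retrieval rows (and the top-5 accuracy and MRR in the metadata) come from ranking candidates by cosine similarity. The sketch below computes both metrics under two assumptions. First, query i's correct match is gallery item i. Second, ties are resolved in favor of the correct item. The project's actual gallery composition and tie handling may differ.

```python
import numpy as np


def cosine_rank_metrics(queries, gallery, k=5):
    """Rank gallery rows by cosine similarity to each query row.

    Assumes row i of `queries` matches row i of `gallery`.
    Returns (top-k accuracy, mean reciprocal rank).
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = q @ g.T                                  # (n_queries, n_gallery)
    true_sim = np.diag(sims)[:, None]               # similarity to the correct item
    ranks = 1 + (sims > true_sim).sum(axis=1)       # 1-based rank; ties favor the correct item
    top_k = float((ranks <= k).mean())
    mrr = float((1.0 / ranks).mean())
    return top_k, mrr
```

If one query ranks its match first and another ranks it second, MRR is (1 + 1/2) / 2 = 0.75, while top-1 accuracy is 0.5.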

Data Notice

This repo does not redistribute raw Xperience-10M videos or raw annotation.hdf5 files. Download the original sample from Ropedia / Hugging Face and follow the dataset terms.

Source

GitHub:

https://github.com/ChaoYue0307/ropedia-xperience-10m-task-suite

GitHub Pages:

https://chaoyue0307.github.io/ropedia-xperience-10m-task-suite/