Glossary

This glossary defines project terms that are easy to confuse across the GitHub repo, website, Hugging Face Space, artifact dataset, model repos, and result matrices. Use it with PUBLIC_READER_MAP.md when choosing what to read first, and with docs/data/glossary.json when a tool needs the same terms in machine-readable form.
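
For tooling, a term lookup over the machine-readable file might look like the sketch below. The schema of docs/data/glossary.json is not described here, so the layout is an assumption: either a list of entries or an object with a `terms` list, each entry carrying a `term` key. The helper names `load_terms` and `lookup` are hypothetical.

```python
import json
from pathlib import Path


def load_terms(path):
    """Build a case-insensitive index of glossary entries.

    ASSUMPTION: the file holds either a list of entries or an object with a
    "terms" list, and each entry has a "term" key. The real schema of
    docs/data/glossary.json may differ.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    entries = data["terms"] if isinstance(data, dict) else data
    return {entry["term"].casefold(): entry for entry in entries}


def lookup(index, term):
    """Return the entry for a term, or None when the term is not defined."""
    return index.get(term.casefold())
```

With an index built by `load_terms("docs/data/glossary.json")`, a call such as `lookup(index, "Window stride")` returns that entry if the assumed keys match the real file.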

How To Read The Terms

| Category | What it clarifies |
| --- | --- |
| Dataset and scope | Which data is public, which data is gated upstream, and what each evidence line can support. |
| Files and features | How raw sample files, derived windows, feature manifests, and public-safe artifacts relate to each other. |
| Tasks and metrics | What a scored task row means, when a score is direct, and when a compact proxy is being used. |
| Models and runs | How simple/NN baselines, Qwen3-Omni, Cosmos3, LoRA adapters, and full-parameter gates differ. |
| Public surfaces | Which repo or Hub surface owns which part of the public package. |

Core Terms

| Term | Plain meaning | In this project | Do not confuse with |
| --- | --- | --- | --- |
| Xperience-10M | The upstream embodied human-interaction dataset. | The source dataset behind the public sample, selected-128 features, task suite, and model diagnostics. | This repo itself; the repo only redistributes public-safe derived artifacts. |
| Public sample episode | One officially available sample episode. | The fully inspectable Line 1 unit used for raw-file browsing, 20-frame windows, task construction, and single-episode baselines. | Multi-episode generalization. |
| Selected 128 episodes | A public-safe selected subset of official gated episode paths. | Line 2 uses derived windows/features and keeps links back to official episode ids and gated source paths. | Redistributed raw MP4/HDF5/RRD data. |
| Evidence line | A claim boundary for a group of results. | Line 1 is one public sample episode; Line 2 is selected-128 held-out comparison. | Qwen run versions v1-v6, which are model-run lineage, not evidence lines. |
| Official gated data | Upstream files that require official dataset access. | Raw Xperience-10M MP4/HDF5/RRD files and full source directories remain outside the public repo. | Public-safe metrics, derived features, figures, and manifests. |
| Public-safe artifact | A file that can be mirrored publicly without raw gated content. | Metrics, JSON summaries, model cards, figures, derived manifests, and approved lightweight weights/adapters. | Raw dataset redistribution. |
| Episode | One recorded interaction sequence. | The basic source unit behind windows, labels, and train/val/test splits. | A 20-frame window, which is a smaller model input slice. |
| 20-frame window | A fixed short clip slice. | The sample episode is converted into aligned 20-frame units for features, labels, and many task heads. | A full episode or an arbitrary video segment. |
| Window stride | The frame step between neighboring windows. | Used to create overlapping examples while preserving chronological order and leakage controls. | Video frame rate. |
| Feature manifest | A map from model-input columns to source modalities. | results/episode_task_suite/feature_manifest.json explains the feature groups and dimensions. | The raw annotation file. |
| Raw sample file map | A human-readable inventory of the sample episode files. | docs/data/raw_sample_files.json explains videos, annotations, calibration, motion, and derived previews. | A training manifest. |
| annotation.hdf5 | Upstream annotation container for the sample. | Contains original labels/metadata; some public derived files expose hashed or processed features rather than every raw text field. | summary_report.json or task result JSON. |
| Interaction text | Natural-language interaction/caption content. | Used by task 15 and some derived text features; public matrices record when text targets are direct or compact-proxy. | Numeric action ids or subtask ids. |
| Modality | A type of signal. | Video, audio, depth, pose/SLAM, motion capture, inertial, calibration, and language-derived signals. | A task target. |
| Task contract | The definition of one benchmark task. | Includes input, target/output, metric, split, source artifact, and limitation. | A model architecture. |
| Unified 20-task suite | The current task surface. | Tasks 1-12 plus tasks 13-20 are presented together and scored across methods where real artifacts exist. | The historical tier-2 label; tasks 13-20 are now part of the same 20-task suite. |
| Task-method record | One method evaluated on one task. | 9 methods x 20 tasks gives 180 public result records. | A single prediction row. |
| Direct score | A metric computed against the task target directly. | The preferred score type in the 20-task matrix. | Compact-proxy score. |
| Compact-proxy score | A bounded proxy metric when a direct raw target is not publicly available. | Kept explicit in the matrix and gap audit so readers do not over-read it. | A direct target measurement. |
| Gap audit | A coverage and source-status audit. | docs/data/task_method_20_gap_audit.json explains scored, proxy, and unsupported cells. | A performance leaderboard. |
| Leakage control | A split or feature rule that prevents using future/target information unfairly. | Chronological splits, held-out splits, and source audits protect task interpretation. | Lower training accuracy. |
| Minimal baseline | A simple non-neural task head. | Provides a reproducible lower-complexity comparison for task feasibility. | The metadata-only baseline family in the selected-128 matrix. |
| Neural MLP | A compact neural task head. | Used for single-episode and selected-128 baseline comparisons. | Foundation-model fine-tuning. |
| Metadata baseline | A selected-128 baseline using metadata/text-derived public-safe features. | Helps compare simple and neural heads on the held-out split. | Raw video/depth/audio feature baselines. |
| Raw-feature baseline | A selected-128 baseline using exported public-safe raw-feature groups. | Tracks what non-foundation heads can do with richer processed inputs. | Raw gated media redistribution. |
| Qwen3-Omni | The multimodal foundation-model family used for the Qwen branch. | The current public 20-task Qwen row is Qwen3-Omni v6 LoRA plus task-specific probes. | Cosmos3 or the single-episode task-head baselines. |
| Qwen v1-v6 | The Qwen3-Omni run lineage. | v1-v4 are earlier pipeline/ablation evidence, v5 is the prior pinned release, and v6 is the current public 20-task row. | Six different evidence lines. |
| Cosmos3-Super | The larger Cosmos3-style branch tracked in this project. | Published as Reasoner diagnostics and a separate forward-dynamics LoRA adapter/result branch when verified. | Cosmos3-Nano. |
| Cosmos3-Nano | A smaller Cosmos3 compatibility/future-window branch. | Used for the Nano Future Window row and related diagnostics. | Cosmos3-Super fine-tuned adapter. |
| LoRA adapter | A lightweight set of trainable adapter weights. | Published only when the package is verified and public-safe. | Full base-model weights. |
| Full-parameter fine-tuning | Updating the whole model rather than only adapters. | This project records feasibility gates and short pilots, but does not publish full checkpoints. | LoRA adapter publication. |
| Foundation pipeline | A high-level training direction. | Spatial intelligence, human-video world modeling, and vision-language-action are documented as trainable directions with task mappings. | A completed public result row. |
| Spatial intelligence | Learning geometry and spatial reasoning from egocentric data. | Uses video, depth, camera pose, and language tasks to target 3D/space reasoning. | World-model future prediction. |
| Human-video world model | Learning future frames, actions, and interaction dynamics from human video. | Uses temporal prediction, next-action, transition, and object-forecast tasks. | Robot policy execution. |
| Vision-language-action | Mapping perception and language to action chunks. | A future policy/VLA direction that needs action-target conversion and stronger policy packaging. | Qwen3-Omni diagnostic scoring. |
| HF Space | Hugging Face-hosted app/site surface. | Mirrors the dashboard and static website assets. | HF artifact dataset or model repo. |
| HF artifact dataset | Hugging Face dataset repo for derived evidence. | Stores public-safe reports, metrics, website JSON, and sanitized result packages. | Original Xperience-10M dataset. |
| HF baseline model repo | Hugging Face model repo for lightweight baseline artifacts. | Mirrors baseline weights, figures, metrics, and task artifacts. | Qwen/Cosmos adapter-specific repos. |
| HF weights/results repo | Consolidated public-safe model-result bundle. | Groups baseline weights, verified Qwen/Cosmos artifacts, analysis files, and manifests. | The upstream raw dataset. |
| Mirror parity | A check that public copies match the source files. | docs/data/mirror_parity.json records whether GitHub, website, and HF mirrors agree. | A model-quality metric. |
| Publication audit | A public-package validation report. | Confirms required files exist and forbidden raw/private assets are not included. | Scientific peer review. |
| Verified package | A result or artifact bundle that passed local/public validators. | Only verified packages are promoted to README, website, and HF surfaces as public evidence. | A running or exploratory experiment. |
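
The relationship between 20-frame windows, window stride, and leakage control can be illustrated with a minimal sketch. It is not the project's pipeline: the stride of 10 frames, the 70/15/15 split fractions, and the function names `make_windows` and `chronological_split` are assumptions chosen for illustration. The glossary only states that windows are 20 frames long, may overlap, and are split chronologically.

```python
def make_windows(num_frames, window=20, stride=10):
    """Return (start, end) frame ranges, end exclusive, in chronological order.

    ASSUMPTION: stride=10 is illustrative; the project's stride is not stated.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [(s, s + window) for s in range(0, num_frames - window + 1, stride)]


def chronological_split(windows, train_frac=0.7, val_frac=0.15):
    """Split windows by time and drop windows that share frames with an earlier split.

    ASSUMPTION: the fractions and the drop-overlap rule are one possible
    leakage control, not the rule documented by the project.
    """
    n_train = int(len(windows) * train_frac)
    n_val = int(len(windows) * val_frac)

    train = windows[:n_train]
    train_end = train[-1][1] if train else 0

    # Overlapping windows near the boundary contain training frames, so skip them.
    later = [w for w in windows[n_train:] if w[0] >= train_end]
    val = later[:n_val]
    val_end = val[-1][1] if val else train_end

    test = [w for w in later[n_val:] if w[0] >= val_end]
    return train, val, test


windows = make_windows(num_frames=200)
train, val, test = chronological_split(windows)
print(len(windows), len(train), len(val), len(test))  # 19 13 2 2
```

With a stride smaller than the window length, neighboring windows share frames, which is why the sketch discards boundary windows instead of assigning them to a later split.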

File Entry Points

| Need | Open |
| --- | --- |
| Reader navigation | PUBLIC_READER_MAP.md, docs/data/public_reader_map.json |
| Task definitions | TASK_SUITE_20.md, docs/data/task_suite_20.json |
| Result matrix | TASK_METHOD_20_RESULT_MATRIX.md, docs/data/task_method_20_result_matrix.json |
| Direct/proxy status | TASK_METHOD_20_GAP_AUDIT.md, docs/data/task_method_20_gap_audit.json |
| Qwen lineage | QWEN3_OMNI_RUN_LINEAGE.md, docs/data/qwen3_omni_run_lineage.json |
| 128-episode source/features | XPERIENCE10M_128_EPISODE_FEATURE_INDEX.md, docs/data/xperience10m_128_episode_feature_index.json |
| Public mirrors | PUBLIC_SURFACE_QA.md, docs/data/mirror_parity.json, docs/data/live_publication_status.json |
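
A mirror-parity style check over these entry points could be sketched as follows. This is an assumption about how such a check might work, not the validator that produces docs/data/mirror_parity.json: it compares SHA-256 digests of the same relative paths under two local checkouts. The names `sha256_of` and `parity_report` and the status strings are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parity_report(source_root, mirror_root, relative_paths):
    """Compare each relative path under two local roots.

    ASSUMPTION: both surfaces are available as local directories; the status
    strings "match", "mismatch", and "missing" are illustrative.
    """
    report = {}
    for rel in relative_paths:
        src = Path(source_root) / rel
        dst = Path(mirror_root) / rel
        if not src.is_file() or not dst.is_file():
            report[rel] = "missing"
        elif sha256_of(src) == sha256_of(dst):
            report[rel] = "match"
        else:
            report[rel] = "mismatch"
    return report
```

For example, `parity_report("github_checkout", "hf_checkout", ["docs/data/task_suite_20.json"])` would flag that file as `match`, `mismatch`, or `missing`; both directory names here are placeholders.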