	# Glossary

	This glossary defines project terms that are easy to confuse across the
	GitHub repo, website, Hugging Face Space, artifact dataset, model repos, and
	result matrices. Use it with `PUBLIC_READER_MAP.md` when choosing what to read
	first, and with `docs/data/glossary.json` when a tool needs the same terms in
	machine-readable form.

	## How To Read The Terms

	\| Category \| What it clarifies \|
	\| --- \| --- \|
	\| Dataset and scope \| Which data is public, which data is gated upstream, and what each evidence line can support. \|
	\| Files and features \| How raw sample files, derived windows, feature manifests, and public-safe artifacts relate to each other. \|
	\| Tasks and metrics \| What a scored task row means, when a score is direct, and when a compact proxy is being used. \|
	\| Models and runs \| How simple/NN baselines, Qwen3-Omni, Cosmos3, LoRA adapters, and full-parameter gates differ. \|
	\| Public surfaces \| Which repo or Hub surface owns which part of the public package. \|

	## Core Terms

	\| Term \| Plain meaning \| In this project \| Do not confuse with \|
	\| --- \| --- \| --- \| --- \|
	\| Xperience-10M \| The upstream embodied human-interaction dataset. \| The source dataset behind the public sample, selected-128 features, task suite, and model diagnostics. \| This repo itself; the repo only redistributes public-safe derived artifacts. \|
	\| Public sample episode \| One officially available sample episode. \| The fully inspectable Line 1 unit used for raw-file browsing, 20-frame windows, task construction, and single-episode baselines. \| Multi-episode generalization. \|
	\| Selected 128 episodes \| A public-safe selected subset of official gated episode paths. \| Line 2 uses derived windows/features and keeps links back to official episode ids and gated source paths. \| Redistributed raw MP4/HDF5/RRD data. \|
	\| Evidence line \| A claim boundary for a group of results. \| Line 1 is the single public sample episode; Line 2 is the selected-128 held-out comparison. \| Qwen run versions v1-v6, which are model-run lineage, not evidence lines. \|
	\| Official gated data \| Upstream files that require official dataset access. \| Raw Xperience-10M MP4/HDF5/RRD files and full source directories remain outside the public repo. \| Public-safe metrics, derived features, figures, and manifests. \|
	\| Public-safe artifact \| A file that can be mirrored publicly without raw gated content. \| Metrics, JSON summaries, model cards, figures, derived manifests, and approved lightweight weights/adapters. \| Raw dataset redistribution. \|
	\| Episode \| One recorded interaction sequence. \| The basic source unit behind windows, labels, and train/val/test splits. \| A 20-frame window, which is a smaller model input slice. \|
	\| 20-frame window \| A fixed short clip slice. \| The sample episode is converted into aligned 20-frame units for features, labels, and many task heads. \| A full episode or an arbitrary video segment. \|
	\| Window stride \| The frame step between neighboring windows. \| Used to create overlapping examples while preserving chronological order and leakage controls. \| Video frame rate. \|
	\| Feature manifest \| A map from model-input columns to source modalities. \| `results/episode_task_suite/feature_manifest.json` explains the feature groups and dimensions. \| The raw annotation file. \|
	\| Raw sample file map \| A human-readable inventory of the sample episode files. \| `docs/data/raw_sample_files.json` explains videos, annotations, calibration, motion, and derived previews. \| A training manifest. \|
	\| annotation.hdf5 \| Upstream annotation container for the sample. \| Contains original labels/metadata; some public derived files expose hashed or processed features rather than every raw text field. \| `summary_report.json` or task result JSON. \|
	\| Interaction text \| Natural-language interaction/caption content. \| Used by task 15 and some derived text features; public matrices record when text targets are direct or compact-proxy. \| Numeric action ids or subtask ids. \|
	\| Modality \| A type of signal. \| Video, audio, depth, pose/SLAM, motion capture, inertial, calibration, and language-derived signals. \| A task target. \|
	\| Task contract \| The definition of one benchmark task. \| Includes input, target/output, metric, split, source artifact, and limitation. \| A model architecture. \|
	\| Unified 20-task suite \| The current task surface. \| Tasks 1-12 plus tasks 13-20 are presented together and scored across methods where real artifacts exist. \| The historical tier-2 label; tasks 13-20 are now part of the same 20-task suite. \|
	\| Task-method record \| One method evaluated on one task. \| 9 methods x 20 tasks gives 180 public result records. \| A single prediction row. \|
	\| Direct score \| A metric computed against the task target directly. \| The preferred score type in the 20-task matrix. \| Compact-proxy score. \|
	\| Compact-proxy score \| A bounded proxy metric when a direct raw target is not publicly available. \| Kept explicit in the matrix and gap audit so readers do not over-read it. \| A direct target measurement. \|
	\| Gap audit \| A coverage and source-status audit. \| `docs/data/task_method_20_gap_audit.json` explains scored, proxy, and unsupported cells. \| A performance leaderboard. \|
	\| Leakage control \| A split or feature rule that prevents using future/target information unfairly. \| Chronological splits, held-out splits, and source audits protect task interpretation. \| Lower training accuracy. \|
	\| Minimal baseline \| A simple non-neural task head. \| Provides a reproducible lower-complexity comparison for task feasibility. \| The metadata-only baseline family in the selected-128 matrix. \|
	\| Neural MLP \| A compact neural task head. \| Used for single-episode and selected-128 baseline comparisons. \| Foundation-model fine-tuning. \|
	\| Metadata baseline \| A selected-128 baseline using metadata/text-derived public-safe features. \| Helps compare simple and neural heads on the held-out split. \| Raw video/depth/audio feature baselines. \|
	\| Raw-feature baseline \| A selected-128 baseline using exported public-safe raw-feature groups. \| Tracks what non-foundation heads can do with richer processed inputs. \| Raw gated media redistribution. \|
	\| Qwen3-Omni \| The multimodal foundation-model family used for the Qwen branch. \| The current public 20-task Qwen row is Qwen3-Omni v6 LoRA plus task-specific probes. \| Cosmos3 or the single-episode task-head baselines. \|
	\| Qwen v1-v6 \| The Qwen3-Omni run lineage. \| v1-v4 are earlier pipeline/ablation evidence, v5 is the prior pinned release, and v6 is the current public 20-task row. \| Six different evidence lines. \|
	\| Cosmos3-Super \| The larger Cosmos3-style branch tracked in this project. \| Published as Reasoner diagnostics and a separate forward-dynamics LoRA adapter/result branch when verified. \| Cosmos3-Nano. \|
	\| Cosmos3-Nano \| A smaller Cosmos3 compatibility/future-window branch. \| Used for the Nano Future Window row and related diagnostics. \| Cosmos3-Super fine-tuned adapter. \|
	\| LoRA adapter \| A lightweight set of trainable adapter weights. \| Published only when the package is verified and public-safe. \| Full base-model weights. \|
	\| Full-parameter fine-tuning \| Updating the whole model rather than only adapters. \| This project records feasibility gates and short pilots, but does not publish full checkpoints. \| LoRA adapter publication. \|
	\| Foundation pipeline \| A high-level training direction. \| Spatial intelligence, human-video world modeling, and vision-language-action are documented as trainable directions with task mappings. \| A completed public result row. \|
	\| Spatial intelligence \| Learning geometry and spatial reasoning from egocentric data. \| Uses video, depth, camera pose, and language tasks to target 3D/space reasoning. \| World-model future prediction. \|
	\| Human-video world model \| Learning future frames, actions, and interaction dynamics from human video. \| Uses temporal prediction, next-action, transition, and object-forecast tasks. \| Robot policy execution. \|
	\| Vision-language-action \| Mapping perception and language to action chunks. \| A future policy/VLA direction that needs action-target conversion and stronger policy packaging. \| Qwen3-Omni diagnostic scoring. \|
	\| HF Space \| Hugging Face-hosted app/site surface. \| Mirrors the dashboard and static website assets. \| HF artifact dataset or model repo. \|
	\| HF artifact dataset \| Hugging Face dataset repo for derived evidence. \| Stores public-safe reports, metrics, website JSON, and sanitized result packages. \| Original Xperience-10M dataset. \|
	\| HF baseline model repo \| Hugging Face model repo for lightweight baseline artifacts. \| Mirrors baseline weights, figures, metrics, and task artifacts. \| Qwen/Cosmos adapter-specific repos. \|
	\| HF weights/results repo \| Consolidated public-safe model-result bundle. \| Groups baseline weights, verified Qwen/Cosmos artifacts, analysis files, and manifests. \| The upstream raw dataset. \|
	\| Mirror parity \| A check that public copies match the source files. \| `docs/data/mirror_parity.json` records whether GitHub, website, and HF mirrors agree. \| A model-quality metric. \|
	\| Publication audit \| A public-package validation report. \| Confirms required files exist and forbidden raw/private assets are not included. \| Scientific peer review. \|
	\| Verified package \| A result or artifact bundle that passed local/public validators. \| Only verified packages are promoted to README, website, and HF surfaces as public evidence. \| A running or exploratory experiment. \|
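
The relationship between "20-frame window", "Window stride", and "Leakage control" can be illustrated with a minimal sketch. This is not the project's actual pipeline: the function names, the split fractions, and the rule of dropping windows that overlap a split boundary are all assumptions chosen for illustration. Only the 20-frame window length comes from the glossary.

```python
from typing import List, Tuple

WINDOW_LEN = 20  # frames per window, as defined in the glossary


def window_starts(num_frames: int, stride: int,
                  window_len: int = WINDOW_LEN) -> List[int]:
    """Return start indices of all full windows, in chronological order."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    return list(range(0, num_frames - window_len + 1, stride))


def chronological_split(starts: List[int],
                        window_len: int = WINDOW_LEN,
                        train_frac: float = 0.7,   # assumed fraction
                        val_frac: float = 0.15     # assumed fraction
                        ) -> Tuple[List[int], List[int], List[int]]:
    """Split windows by time so later splits never share frames with earlier ones.

    Hypothetical leakage rule: any window that overlaps the last window of
    the previous split is dropped rather than assigned.
    """
    n = len(starts)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)

    train = starts[:n_train]
    train_end = train[-1] + window_len if train else 0

    # Keep only windows that begin after the training frames end.
    rest = [s for s in starts[n_train:] if s >= train_end]
    val = rest[:n_val]
    val_end = val[-1] + window_len if val else train_end

    # Keep only windows that begin after the validation frames end.
    test = [s for s in rest[n_val:] if s >= val_end]
    return train, val, test


starts = window_starts(num_frames=200, stride=10)
train, val, test = chronological_split(starts)
print(len(starts), train[-1], val, test)
```

With a stride smaller than the window length, neighboring windows overlap, which is why the sketch discards boundary windows instead of placing them in the next split.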

	## File Entry Points

	\| Need \| Open \|
	\| --- \| --- \|
	\| Reader navigation \| `PUBLIC_READER_MAP.md`, `docs/data/public_reader_map.json` \|
	\| Task definitions \| `TASK_SUITE_20.md`, `docs/data/task_suite_20.json` \|
	\| Result matrix \| `TASK_METHOD_20_RESULT_MATRIX.md`, `docs/data/task_method_20_result_matrix.json` \|
	\| Direct/proxy status \| `TASK_METHOD_20_GAP_AUDIT.md`, `docs/data/task_method_20_gap_audit.json` \|
	\| Qwen lineage \| `QWEN3_OMNI_RUN_LINEAGE.md`, `docs/data/qwen3_omni_run_lineage.json` \|
	\| 128-episode source/features \| `XPERIENCE10M_128_EPISODE_FEATURE_INDEX.md`, `docs/data/xperience10m_128_episode_feature_index.json` \|
	\| Public mirrors \| `PUBLIC_SURFACE_QA.md`, `docs/data/mirror_parity.json`, `docs/data/live_publication_status.json` \|
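
As a rough illustration of what a "Mirror parity" check could involve, the sketch below compares SHA-256 digests of the same relative paths under two local directories. It is an assumption-laden example, not the project's validator: the function names and the report shape are hypothetical, and the real schema of `docs/data/mirror_parity.json` is not described in this glossary.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large artifacts do not load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parity_report(source_root: Path, mirror_root: Path,
                  rel_paths: Iterable[str]) -> Dict[str, bool]:
    """Map each relative path to True when both copies exist and match.

    Hypothetical helper: a missing file on either side counts as a mismatch.
    """
    report: Dict[str, bool] = {}
    for rel in rel_paths:
        src = source_root / rel
        dst = mirror_root / rel
        report[rel] = (
            src.is_file()
            and dst.is_file()
            and sha256_of(src) == sha256_of(dst)
        )
    return report
```

A caller would pass local checkouts of two surfaces (for example a GitHub clone and a downloaded Hub snapshot) together with the list of files expected to be identical, then treat any `False` entry as a parity failure.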
