# Artifact Guide

This guide is the human-readable map for the public Ropedia Xperience-10M task suite artifacts. It complements the machine-readable [`metrics/artifact_index.json`](metrics/artifact_index.json).

The project intentionally separates five layers:

1. **Proof boundary:** what is claimed, what is smoke-only, and what remains gated by data access.
2. **Data contract:** how one public Xperience-10M sample episode becomes aligned model windows and feature blocks.
3. **Task evidence:** minimal and neural results for the 12 task contracts plus four research-direction extension probes.
4. **Reproducibility:** public commands, expected outputs, and exact-match audit evidence for the single-episode pipeline.
5. **Scale-up status:** scripts and reports for the planned 32-episode Qwen3-Omni pilot, without claiming those results before data access lands.

## Start Here

| Artifact | Why to open it first |
| --- | --- |
| [`EVIDENCE_CONTRACT.md`](EVIDENCE_CONTRACT.md) | Defines which claims are verified and which are explicitly not claimed. |
| [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md) | Defines public reproduction commands, expected outputs, and non-reproducible boundaries. |
| [`metrics/artifact_index.json`](metrics/artifact_index.json) | Lists reviewer-critical files with existence, size, and stable hashes. |
| [`metrics/mirror_parity.json`](metrics/mirror_parity.json) | Confirms prepared HF Space, artifact, and model mirrors match the repo for critical files. |
| [`metrics/publication_audit.json`](metrics/publication_audit.json) | Confirms public bundles exclude raw data, Python caches, heavy archives, and token strings. |
| [`metrics/scope_claims_audit.json`](metrics/scope_claims_audit.json) | Confirms historical `32ep` smoke-run identifiers are not presented as real 32-episode results. |
| [`metrics/website_integrity.json`](metrics/website_integrity.json) | Confirms local site links, anchors, JSON bundles, and referenced images resolve. |
| [`metrics/reviewer_packet.json`](metrics/reviewer_packet.json) | Gives the shortest machine-readable reviewer route. |

## Data Contract

| Artifact | What it proves |
| --- | --- |
| [`artifacts/episode_task_suite/windows.csv`](artifacts/episode_task_suite/windows.csv) | The sample episode is converted into 1,161 aligned 20-frame windows. |
| [`artifacts/episode_task_suite/feature_manifest.json`](artifacts/episode_task_suite/feature_manifest.json) | The current input vector has 8,378 dimensions with explicit feature-block boundaries. |
| [`artifacts/episode_task_suite/available_modalities.json`](artifacts/episode_task_suite/available_modalities.json) | The sample modality coverage is recorded, including the current audio-featurization boundary. |
| [`metrics/modality_atlas.json`](metrics/modality_atlas.json) | The responsive website modality cards and derived thumbnail assets are documented without redistributing raw data. |
| [`assets/modalities/`](assets/modalities/) | Small public-sample thumbnails used by the readable modality atlas. |

## Task Evidence

| Artifact | What it proves |
| --- | --- |
| [`artifacts/episode_task_suite/summary_report.json`](artifacts/episode_task_suite/summary_report.json) | The 12 task contracts, chronological split, and minimal/neural metrics. |
| [`artifacts/episode_task_suite/neural_mlp/`](artifacts/episode_task_suite/neural_mlp/) | Matching PyTorch MLP heads for the same task contracts and feature windows. |
| [`artifacts/episode_task_suite/research_directions/`](artifacts/episode_task_suite/research_directions/) | Mapping from the 12 tasks to the four Ropedia research directions. |
| [`artifacts/episode_task_suite/research_direction_extensions/`](artifacts/episode_task_suite/research_direction_extensions/) | Four additional coded probes, one per research direction. |
| [`artifacts/episode_task_suite/task_walkthroughs/`](artifacts/episode_task_suite/task_walkthroughs/) | Junior-friendly case studies explaining input, process modules, output, metric, and limitation. |

## Reproducibility

| Artifact | What it proves |
| --- | --- |
| [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md) | Public commands, expected outputs, and non-reproducible boundaries are explicit. |
| [`metrics/reproducibility_matrix.json`](metrics/reproducibility_matrix.json) | Machine-readable command matrix for website and HF mirrors. |
| [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md) | The last exact metric audit rebuilt the public-sample metrics and matched committed artifacts. |

## Platform Mirrors

| Surface | Purpose |
| --- | --- |
| [GitHub Pages dashboard](https://chaoyue0307.github.io/ropedia-xperience-10m-task-suite/) | Primary public website and visual reviewer flow. |
| [Hugging Face Space](https://huggingface.co/spaces/cy0307/ropedia-xperience-10m-task-suite) | Static app mirror for HF users. |
| [HF artifact dataset](https://huggingface.co/datasets/cy0307/ropedia-xperience-10m-task-suite-artifacts) | Derived CSV/JSON/Markdown/figure artifacts without raw Xperience-10M data. |
| [HF baseline model repo](https://huggingface.co/cy0307/ropedia-xperience-10m-task-baselines) | Lightweight minimal and neural task-head model files. |
| [HF collection](https://huggingface.co/collections/cy0307/ropedia-xperience-10m-task-suite) | One grouped landing page for the Space, artifact dataset, and baseline model repo. |

## Scale-Up Boundary

| Artifact | Current status |
| --- | --- |
| Companion GitHub repo: `results/omni_finetune/DATA_BLOCKER_REPORT.md` | Documents why no real 32-episode Qwen3-Omni result is claimed yet. |
| Companion GitHub repo: `results/omni_finetune/A100_HF_RELAY_STATUS.md` | Documents the pending A100-to-H20 relay and selected 32-session pilot plan. |
| [`scripts/omni/discover_xperience10m_sources.py`](scripts/omni/discover_xperience10m_sources.py) | Discovery gate for valid multi-episode Xperience-10M sources. |
| [`scripts/omni/train_qwen3_omni_lora.py`](scripts/omni/train_qwen3_omni_lora.py) | Training entrypoint for the Qwen3-Omni LoRA pilot after the data gate passes. |

## What Is Not Included

The public repo and Hugging Face mirrors do not redistribute raw Xperience-10M videos, raw `annotation.hdf5`, gated private dataset files, full Qwen weights, or large full checkpoints. Dataset use remains governed by the official Ropedia/Xperience-10M terms.
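
A reader who wants to spot-check this exclusion list on a local checkout could use something like the sketch below. It is not the audit behind `metrics/publication_audit.json`, whose rules are not described in this guide. The function name `find_excluded_files` and every pattern set are hypothetical, chosen only to mirror the categories named above (raw videos, `annotation.hdf5`, Python caches, heavy archives).

```python
from pathlib import Path

# Assumed patterns for illustration only; the project's real audit rules
# are not shown in this guide and may differ.
BLOCKED_NAMES = {"annotation.hdf5"}
BLOCKED_SUFFIXES = {".mp4", ".mov", ".avi", ".mkv", ".pyc", ".zip", ".tar", ".gz"}
BLOCKED_DIRS = {"__pycache__"}


def find_excluded_files(root):
    """Return sorted relative paths under `root` that look like excluded material."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if (
            path.name in BLOCKED_NAMES
            or path.suffix.lower() in BLOCKED_SUFFIXES
            or any(part in BLOCKED_DIRS for part in rel.parts)
        ):
            # Use forward slashes so the report looks the same on every platform.
            hits.append(rel.as_posix())
    return sorted(hits)
```

Calling `find_excluded_files(".")` from the root of a checkout returns a list of suspicious paths; an empty list is consistent with the exclusions above but does not replace the committed audit files. The check is name-based only, so it cannot detect gated files that have been renamed or token strings embedded in text.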