cy0307 committed on
Commit
96e4573
·
verified ·
1 Parent(s): d9e465e

Add files using upload-large-folder tool

.gitattributes CHANGED
@@ -63,3 +63,5 @@ assets/foundation-pipelines/human-video-world-model-pipeline.png filter=lfs diff
  assets/foundation-pipelines/vision-language-action-pipeline.png filter=lfs diff=lfs merge=lfs -text
  assets/foundation-pipelines/spatial-intelligence-pipeline.png filter=lfs diff=lfs merge=lfs -text
  results/omni_finetune/xperience10m_128ep_dense_multiscale_hierarchical_v1_20260608/dense_multiscale_windows.jsonl filter=lfs diff=lfs merge=lfs -text
+ assets/foundation-pipelines/source-photos/human-video-world-model-source.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/foundation-pipelines/source-photos/vision-language-action-source.jpg filter=lfs diff=lfs merge=lfs -text
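
Review note: to confirm that newly added binaries are covered by an LFS rule, the `.gitattributes` entries can be listed with a few lines of Python. This is a minimal sketch; the helper name `lfs_tracked_patterns` is hypothetical, and the parser simply splits on whitespace rather than implementing Git's full attribute-matching and quoting rules.

```python
def lfs_tracked_patterns(gitattributes_text: str) -> list[str]:
    """Return the path patterns whose attributes include filter=lfs."""
    patterns = []
    for raw_line in gitattributes_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # First token is the pattern; the rest are attributes.
        pattern, *attributes = line.split()
        if "filter=lfs" in attributes:
            patterns.append(pattern)
    return patterns


if __name__ == "__main__":
    with open(".gitattributes", encoding="utf-8") as handle:
        for tracked in lfs_tracked_patterns(handle.read()):
            print(tracked)
```

Running it from the repository root prints one tracked pattern per line, which can then be compared against the files listed as ADDED in the commit.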
assets/foundation-pipelines/README.md CHANGED
@@ -1,21 +1,21 @@
- # Foundation Pipeline Task-Training Diagrams
+ # Foundation Pipeline Presentation Photos

- These three bitmap figures are task-training diagrams for the foundation
- pipeline tracks documented in `THREE_FOUNDATION_PIPELINES.md` and
+ These three public images are restored high-resolution photos from the
+ foundation-direction presentation slides. They are used for the pipeline tracks
+ documented in `THREE_FOUNDATION_PIPELINES.md` and
  `docs/data/three_foundation_pipelines.json`.

- They replace the earlier concept-art images. Each diagram spells out the
- direction, supported task targets, model-training route, and evaluation gates.
- They are still **pipeline communication assets**, not evidence of completed
- foundation-model quality. Exact technical claims live in the surrounding
- Markdown, JSON, and website labels.
+ They replace the earlier concept-art images and keep the public visuals tied to
+ the original direction slides. They are still **pipeline communication
+ assets**, not evidence of completed foundation-model quality. Exact technical
+ claims live in the surrounding Markdown, JSON, and website labels.

- | Track | Asset |
- | --- | --- |
- | Spatial intelligence models | `spatial-intelligence-pipeline.png` |
- | Human-video world models | `human-video-world-model-pipeline.png` |
- | Vision-language-action models | `vision-language-action-pipeline.png` |
+ | Track | Enhanced asset | Source photo |
+ | --- | --- | --- |
+ | Spatial intelligence models | `spatial-intelligence-pipeline.png` | `source-photos/spatial-intelligence-source.jpg` |
+ | Human-video world models | `human-video-world-model-pipeline.png` | `source-photos/human-video-world-model-source.jpg` |
+ | Vision-language-action models | `vision-language-action-pipeline.png` | `source-photos/vision-language-action-source.jpg` |

- The deterministic rendering script is
- `scripts/render_foundation_pipeline_diagrams.py`; prompt and image-generation
- notes are in `prompts.md`.
+ The deterministic restoration script is
+ `scripts/render_foundation_pipeline_diagrams.py`; restoration notes and source
+ photo mapping are in `prompts.md`.
assets/foundation-pipelines/human-video-world-model-pipeline.png CHANGED

Git LFS Details

  • SHA256: 220d234b91176cdbd904a66a55deaf096805fc955094f529e7c5d8f35b03bab1
  • Pointer size: 131 Bytes
  • Size of remote file: 250 kB

Git LFS Details

  • SHA256: b1fa6c17db40756557dbf45bbfd0bfaf4178cd06f2ddd87d4e03a39da18187c0
  • Pointer size: 132 Bytes
  • Size of remote file: 2.38 MB
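
Review note: the SHA256 and pointer-size values shown under "Git LFS Details" follow from the Git LFS pointer format (a `version` line, an `oid sha256:` line, and a `size` line). The sketch below rebuilds a pointer for arbitrary bytes and checks a payload against it. The function names are hypothetical, and the payloads are synthetic stand-ins, not the actual images from this commit.

```python
import hashlib

LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(data: bytes) -> str:
    """Build the text of a Git LFS pointer file for the given payload."""
    oid = hashlib.sha256(data).hexdigest()
    return f"version {LFS_SPEC_URL}\noid sha256:{oid}\nsize {len(data)}\n"


def parse_lfs_pointer(text: str) -> dict:
    """Parse pointer text into its version, oid (hex digest), and size."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    return {
        "version": fields["version"],
        "oid": fields["oid"].split(":", 1)[1],
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer_text: str) -> bool:
    """Check that a downloaded payload agrees with its pointer."""
    pointer = parse_lfs_pointer(pointer_text)
    return (
        pointer["size"] == len(data)
        and pointer["oid"] == hashlib.sha256(data).hexdigest()
    )


if __name__ == "__main__":
    # Synthetic payloads with six-digit and seven-digit byte counts.
    for payload in (b"x" * 250_000, b"x" * 2_380_000):
        pointer = lfs_pointer(payload)
        print(len(pointer.encode("utf-8")), matches_pointer(payload, pointer))
```

The pointer length depends only on the number of digits in the byte count, which is consistent with the listing above: a file of roughly 250 kB (six digits) gives a 131-byte pointer, and a file of a few megabytes (seven digits) gives 132 bytes.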
assets/foundation-pipelines/prompts.md CHANGED
@@ -1,49 +1,24 @@
- # Foundation Pipeline Diagram Prompts
+ # Foundation Pipeline Photo Restoration Notes

- The first public pass used ChatGPT image-generated concept visuals. The second
- pass uses the same direction prompts for visual exploration, then renders the
- final public PNGs with `scripts/render_foundation_pipeline_diagrams.py` so the
- task names, model-training route, and evaluation gates stay exact and readable.
+ The current public assets are not generated concept art. They are restored
+ high-resolution PNGs rebuilt from original presentation photos supplied by the
+ project owner. The filename is kept as `prompts.md` because older public
+ manifests and mirrors already link here as the provenance note.

- ## Spatial Intelligence
+ | Track | Source photo | Enhanced public PNG |
+ | --- | --- | --- |
+ | Spatial intelligence models | `source-photos/spatial-intelligence-source.jpg` | `spatial-intelligence-pipeline.png` |
+ | Human-video world models | `source-photos/human-video-world-model-source.jpg` | `human-video-world-model-pipeline.png` |
+ | Vision-language-action models | `source-photos/vision-language-action-source.jpg` | `vision-language-action-pipeline.png` |

- Use case: infographic-diagram. Asset type: 16:9 website figure for Ropedia
- Xperience-10M foundation pipeline track. Create a structured diagram, not
- concept art, for a spatial intelligence model training direction. Show four
- left-to-right zones: inputs, task targets, model training, and evaluation
- gates. The content should represent multiview RGB, egocentric video, depth,
- camera pose, calibration, object/contact/language cues, spatial QA, object
- counting, object permanence, relative location, multiview retrieval, 3D
- consistency, spatial-memory encoders, and held-out episode metrics. Use a
- premium dark research-product style, high contrast, crisp panels, clean
- technical linework, no decorative blobs, no logos, no watermark.
+ Restoration is deterministic and local:

- ## Human-Video World Models
+ - EXIF orientation normalization.
+ - Autocontrast and moderate brightness/color/contrast correction.
+ - Lanczos resize to a 2560-pixel public width.
+ - Gentle sharpening and unsharp masking.

- Use case: infographic-diagram. Asset type: 16:9 website figure for Ropedia
- Xperience-10M foundation pipeline track. Create a structured diagram, not
- concept art, for a human-video world-model training direction. Show four
- left-to-right zones: observed interaction inputs, future task targets, model
- training, and held-out future evaluation. The content should represent
- observed video/audio/sensor windows, hand/body motion, camera pose,
- object/contact state, action/subtask labels, next action, next subtask, future
- object set, contact transition, camera-motion delta, latent future state, Qwen
- structured future probes, Cosmos/dynamics branches, rollout or latent
- reconstruction, no future leakage, and future-task metrics. Use a premium dark
- research-product style, high contrast, crisp panels, clean technical linework,
- no decorative blobs, no logos, no watermark.
-
- ## Vision-Language-Action
-
- Use case: infographic-diagram. Asset type: 16:9 website figure for Ropedia
- Xperience-10M foundation pipeline track. Create a structured diagram, not
- concept art, for a vision-language-action model training direction. Show four
- left-to-right zones: observation/language inputs, action task targets,
- VLA/policy-compatible training, and held-out action evaluation. The content
- should represent egocentric video, captions, objects, contacts, procedures,
- hand/body motion windows, subtask labels, action-token vocabulary, next action,
- action chunks, object-conditioned action, contact state, subtask transition,
- action-space conversion, normalization, leakage and retargeting reports, VLA
- or policy heads, and held-out policy/action metrics. Use a premium dark
- research-product style, high contrast, crisp panels, clean technical linework,
- no decorative blobs, no logos, no watermark.
+ The restoration script deliberately does not synthesize, redraw, or hallucinate
+ slide text. Technical task/training/evaluation claims are maintained in
+ `THREE_FOUNDATION_PIPELINES.md` and
+ `docs/data/three_foundation_pipelines.json`.
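
Review note: the four restoration steps listed in the new `prompts.md` could be approximated with Pillow as sketched below. This is not the contents of `scripts/render_foundation_pipeline_diagrams.py`, which this diff does not show. The function name `restore_photo` and every numeric factor other than the 2560-pixel width are illustrative assumptions.

```python
from PIL import Image, ImageEnhance, ImageFilter, ImageOps


def restore_photo(image: Image.Image, target_width: int = 2560) -> Image.Image:
    """Apply a deterministic, non-generative cleanup to a slide photo."""
    # 1. Normalize orientation using the EXIF tag, if one is present.
    image = ImageOps.exif_transpose(image).convert("RGB")

    # 2. Autocontrast, then moderate brightness/color/contrast correction.
    #    The cutoff and enhancement factors are assumed values.
    image = ImageOps.autocontrast(image, cutoff=1)
    image = ImageEnhance.Brightness(image).enhance(1.03)
    image = ImageEnhance.Color(image).enhance(1.05)
    image = ImageEnhance.Contrast(image).enhance(1.08)

    # 3. Lanczos resize to the public width, preserving aspect ratio.
    target_height = round(image.height * target_width / image.width)
    image = image.resize((target_width, target_height), Image.LANCZOS)

    # 4. Gentle sharpening followed by an unsharp mask (assumed settings).
    image = image.filter(ImageFilter.SHARPEN)
    image = image.filter(ImageFilter.UnsharpMask(radius=2, percent=60, threshold=3))
    return image


if __name__ == "__main__":
    source = Image.open("source-photos/human-video-world-model-source.jpg")
    restore_photo(source).save("human-video-world-model-pipeline.png")
```

Every step is a fixed pixel-level transform, so the same input photo and parameters yield the same output, and no slide text is drawn or synthesized.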
assets/foundation-pipelines/source-photos/human-video-world-model-source.jpg ADDED

Git LFS Details

  • SHA256: 5cc1f72aea8da58a269c02e862b7ac8e473b1bf832e9093b3b40b710906b1552
  • Pointer size: 131 Bytes
  • Size of remote file: 124 kB
assets/foundation-pipelines/source-photos/vision-language-action-source.jpg ADDED

Git LFS Details

  • SHA256: eb5222e6f7be01f1f9e4950a1c30d9216149812e92c54cedbcadcecfbfe901e9
  • Pointer size: 131 Bytes
  • Size of remote file: 117 kB
assets/foundation-pipelines/spatial-intelligence-pipeline.png CHANGED

Git LFS Details

  • SHA256: 61b51641b4d2af8f87f02683fd6d2a578e3fd1ceabda5667c00c968e13b40ee7
  • Pointer size: 131 Bytes
  • Size of remote file: 253 kB

Git LFS Details

  • SHA256: db944bd538ed5dc70e2342fa523ce3543b8ae8017b8c9a572d3423e74e413f1c
  • Pointer size: 132 Bytes
  • Size of remote file: 2.13 MB
assets/foundation-pipelines/vision-language-action-pipeline.png CHANGED

Git LFS Details

  • SHA256: 2efa63a771a9f5abf119207022a6a64a2b6763e529327399dff901d44d9b52d9
  • Pointer size: 131 Bytes
  • Size of remote file: 256 kB

Git LFS Details

  • SHA256: d4704ee28f747067c440845905cf2cacf6cbbf3fd5d17418ba16993f617ade29
  • Pointer size: 132 Bytes
  • Size of remote file: 2.91 MB
docs/data/research_roadmap_interactive.json CHANGED
@@ -2222,7 +2222,7 @@
  ],
  "status": "planning_artifact"
  },
- "generated_at_utc": "2026-06-17T16:20:57+00:00",
+ "generated_at_utc": "2026-06-18T08:24:55+00:00",
  "omni_plan": {
  "adapter": "LoRA rank 16, alpha 32, dropout 0.05",
  "backbone": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
@@ -3307,7 +3307,7 @@
  "diagram_image": "docs/assets/foundation-pipelines/spatial-intelligence-pipeline.png",
  "first_pipeline": "Build a spatial-memory exporter, start with metric depth and pose consistency tasks, then evaluate spatial QA, object permanence, counting, retrieval, and pose-aware consistency.",
  "id": "spatial_intelligence",
- "image_alt": "Task-training diagram for the spatial intelligence pipeline: inputs, spatial task targets, model training route, and evaluation gates.",
+ "image_alt": "Restored presentation photo showing the Spatial intelligence models direction slide for Xperience-10M.",
  "intermediate_artifacts": [
  "synchronized camera window manifest",
  "pose and depth availability report",
@@ -3386,7 +3386,7 @@
  "diagram_image": "docs/assets/foundation-pipelines/human-video-world-model-pipeline.png",
  "first_pipeline": "Keep Qwen-style structured future probes for task interpretability, keep Cosmos-style dynamics branches separate, and add latent or feature-reconstruction metrics before claiming world-model quality.",
  "id": "human_video_world_models",
- "image_alt": "Task-training diagram for the human-video world model pipeline: observed-window inputs, future targets, model training route, and held-out evaluation gates.",
+ "image_alt": "Restored presentation photo showing the Human-video world models direction slide for Xperience-10M.",
  "intermediate_artifacts": [
  "observed and future window pairs",
  "future label targets",
@@ -3463,7 +3463,7 @@
  "diagram_image": "docs/assets/foundation-pipelines/vision-language-action-pipeline.png",
  "first_pipeline": "Define the action space, use existing 20-task next-action/contact/object-conditioned tasks first, then add hand-trajectory or policy-compatible action chunks after conversion is traceable.",
  "id": "vision_language_action",
- "image_alt": "Task-training diagram for the vision-language-action pipeline: observation and language inputs, action targets, VLA training route, and action evaluation gates.",
+ "image_alt": "Restored presentation photo showing the Vision-language-action models direction slide for Xperience-10M.",
  "intermediate_artifacts": [
  "action-token vocabulary",
  "action-chunk windows",
docs/data/scope_claims_audit.json CHANGED
@@ -1,6 +1,6 @@
  {
  "status": "pass",
- "generated_at_utc": "2026-06-18T07:15:50+00:00",
+ "generated_at_utc": "2026-06-18T08:27:01+00:00",
  "summary": {
  "qwen3_omni_verified_diagnostic_pilot": true,
  "dataset_manifest_num_episodes": 119,