# Foundation Pipeline Diagram Prompts

The first public pass used concept visuals generated with ChatGPT image
generation. The second pass uses the same direction prompts for visual
exploration only, then renders the final public PNGs with
`scripts/render_foundation_pipeline_diagrams.py` so that the task names,
model-training route, and evaluation gates stay exact and readable.
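
As an illustration of why a scripted render keeps labels exact, here is a minimal layout sketch. It is not the contents of `scripts/render_foundation_pipeline_diagrams.py`, which this file does not show. The `Box` type, the `zone_boxes` helper, and the 1920x1080 canvas, margin, and gap values are all assumptions chosen for the example. It computes four equal-width, left-to-right panels on a 16:9 canvas; a renderer could then draw each label as real text inside its panel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Pixel rectangle for one zone panel (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int


def zone_boxes(width=1920, height=1080, zones=4, margin=80, gap=40):
    """Split a canvas into equal-width, left-to-right zone panels.

    All default sizes are assumptions for illustration, not values taken
    from the real render script.
    """
    if zones < 1:
        raise ValueError("zones must be at least 1")
    usable = width - 2 * margin - (zones - 1) * gap
    if usable <= 0 or height <= 2 * margin:
        raise ValueError("canvas is too small for the requested layout")
    panel_width = usable // zones
    boxes = []
    for index in range(zones):
        left = margin + index * (panel_width + gap)
        boxes.append(Box(left, margin, left + panel_width, height - margin))
    return boxes


boxes = zone_boxes()
```

Because the panel coordinates are computed rather than drawn by an image model, the same zone titles and task names land in the same place on every render.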

## Spatial Intelligence

Use case: infographic-diagram. Asset type: 16:9 website figure for Ropedia
Xperience-10M foundation pipeline track. Create a structured diagram, not
concept art, for a spatial intelligence model training direction. Show four
left-to-right zones: inputs, task targets, model training, and evaluation
gates. The content should represent multiview RGB, egocentric video, depth,
camera pose, calibration, object/contact/language cues, spatial QA, object
counting, object permanence, relative location, multiview retrieval, 3D
consistency, spatial-memory encoders, and held-out episode metrics. Use a
premium dark research-product style, high contrast, crisp panels, clean
technical linework, no decorative blobs, no logos, no watermark.

## Human-Video World Models

Use case: infographic-diagram. Asset type: 16:9 website figure for Ropedia
Xperience-10M foundation pipeline track. Create a structured diagram, not
concept art, for a human-video world-model training direction. Show four
left-to-right zones: observed interaction inputs, future task targets, model
training, and held-out future evaluation. The content should represent
observed video/audio/sensor windows, hand/body motion, camera pose,
object/contact state, action/subtask labels, next action, next subtask, future
object set, contact transition, camera-motion delta, latent future state, Qwen
structured future probes, Cosmos/dynamics branches, rollout or latent
reconstruction, no future leakage, and future-task metrics. Use a premium dark
research-product style, high contrast, crisp panels, clean technical linework,
no decorative blobs, no logos, no watermark.
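
The "no future leakage" label in this prompt is read here as the usual requirement that every prediction target comes strictly after the observed window. A hypothetical sketch of such a check, assuming each sample is a list of timestamps and a single cutoff (the names `split_window` and `has_future_leakage` are illustrative, not taken from the pipeline):

```python
def split_window(timestamps, cutoff):
    """Split timestamps into an observed part (<= cutoff) and a future part."""
    observed = [t for t in timestamps if t <= cutoff]
    future = [t for t in timestamps if t > cutoff]
    return observed, future


def has_future_leakage(observed, future):
    """Return True if any observed timestamp is not strictly before the future."""
    if not observed or not future:
        return False
    return max(observed) >= min(future)


observed, future = split_window([0.0, 0.5, 1.0, 1.5, 2.0], cutoff=1.0)
leak_free = not has_future_leakage(observed, future)
```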

## Vision-Language-Action

Use case: infographic-diagram. Asset type: 16:9 website figure for Ropedia
Xperience-10M foundation pipeline track. Create a structured diagram, not
concept art, for a vision-language-action model training direction. Show four
left-to-right zones: observation/language inputs, action task targets,
VLA/policy-compatible training, and held-out action evaluation. The content
should represent egocentric video, captions, objects, contacts, procedures,
hand/body motion windows, subtask labels, action-token vocabulary, next action,
action chunks, object-conditioned action, contact state, subtask transition,
action-space conversion, normalization, leakage and retargeting reports, VLA
or policy heads, and held-out policy/action metrics. Use a premium dark
research-product style, high contrast, crisp panels, clean technical linework,
no decorative blobs, no logos, no watermark.
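
The three prompts above share one template: a fixed use case and asset type, a direction name, four zone titles, a content list, and a common style suffix. A minimal sketch of composing them from structured fields follows. The `build_prompt` and `join_items` helpers are hypothetical, and the content list in the usage example is abbreviated for space.

```python
STYLE = (
    "Use a premium dark research-product style, high contrast, crisp "
    "panels, clean technical linework, no decorative blobs, no logos, "
    "no watermark."
)


def join_items(items):
    """Join items as 'a, b, and c' (serial comma, as in the prompts above)."""
    items = list(items)
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def build_prompt(direction, zones, content):
    """Compose one diagram prompt; expects exactly four zone titles."""
    if len(zones) != 4:
        raise ValueError("the template expects exactly four zones")
    return " ".join([
        "Use case: infographic-diagram.",
        "Asset type: 16:9 website figure for Ropedia Xperience-10M "
        "foundation pipeline track.",
        "Create a structured diagram, not concept art, "
        f"for a {direction} training direction.",
        f"Show four left-to-right zones: {join_items(zones)}.",
        f"The content should represent {join_items(content)}.",
        STYLE,
    ])


# Usage example with an abbreviated content list.
spatial_prompt = build_prompt(
    direction="spatial intelligence model",
    zones=["inputs", "task targets", "model training", "evaluation gates"],
    content=["multiview RGB", "egocentric video", "held-out episode metrics"],
)
```

Keeping the shared wording in one place means a style change needs a single edit, and the three directions cannot drift apart.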