Foundation Pipeline Diagram Prompts

The first public pass used concept visuals generated with ChatGPT image generation. The second pass reuses the same direction prompts for visual exploration only; the final public PNGs are rendered with scripts/render_foundation_pipeline_diagrams.py so that the task names, model-training route, and evaluation gates stay exact and readable.
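As a rough illustration of why a scripted render keeps labels exact, the sketch below computes panel rectangles for a four-zone, left-to-right layout on a 16:9 canvas. It is not the content of scripts/render_foundation_pipeline_diagrams.py: the Panel class, the layout_zones function, the 1920x1080 canvas, and the margin and gap values are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Panel:
    """One zone of the diagram (hypothetical structure, for illustration only)."""
    title: str
    x: int
    y: int
    width: int
    height: int


def layout_zones(titles, canvas_w=1920, canvas_h=1080, margin=60, gap=30):
    """Place equal-width panels left to right on a 16:9 canvas.

    Canvas size, margin, and gap are assumed values, not taken from the
    real render script.
    """
    n = len(titles)
    if n == 0:
        raise ValueError("at least one zone title is required")
    # Width left over after the outer margins and the gaps between panels.
    usable_w = canvas_w - 2 * margin - gap * (n - 1)
    panel_w = usable_w // n
    panel_h = canvas_h - 2 * margin
    return [
        Panel(title, margin + i * (panel_w + gap), margin, panel_w, panel_h)
        for i, title in enumerate(titles)
    ]


# Zone titles copied from the Spatial Intelligence prompt.
zones = layout_zones(["inputs", "task targets", "model training", "evaluation gates"])
for panel in zones:
    print(panel)
```

Because the zone titles are ordinary strings passed to the layout, a drawing step built on these rectangles would print them verbatim rather than approximating them.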

Spatial Intelligence

Use case: infographic-diagram. Asset type: 16:9 website figure for Ropedia Xperience-10M foundation pipeline track. Create a structured diagram, not concept art, for a spatial intelligence model training direction. Show four left-to-right zones: inputs, task targets, model training, and evaluation gates. The content should represent multiview RGB, egocentric video, depth, camera pose, calibration, object/contact/language cues, spatial QA, object counting, object permanence, relative location, multiview retrieval, 3D consistency, spatial-memory encoders, and held-out episode metrics. Use a premium dark research-product style, high contrast, crisp panels, clean technical linework, no decorative blobs, no logos, no watermark.

Human-Video World Models

Use case: infographic-diagram. Asset type: 16:9 website figure for Ropedia Xperience-10M foundation pipeline track. Create a structured diagram, not concept art, for a human-video world-model training direction. Show four left-to-right zones: observed interaction inputs, future task targets, model training, and held-out future evaluation. The content should represent observed video/audio/sensor windows, hand/body motion, camera pose, object/contact state, action/subtask labels, next action, next subtask, future object set, contact transition, camera-motion delta, latent future state, Qwen structured future probes, Cosmos/dynamics branches, rollout or latent reconstruction, no future leakage, and future-task metrics. Use a premium dark research-product style, high contrast, crisp panels, clean technical linework, no decorative blobs, no logos, no watermark.

Vision-Language-Action

Use case: infographic-diagram. Asset type: 16:9 website figure for Ropedia Xperience-10M foundation pipeline track. Create a structured diagram, not concept art, for a vision-language-action model training direction. Show four left-to-right zones: observation/language inputs, action task targets, VLA/policy-compatible training, and held-out action evaluation. The content should represent egocentric video, captions, objects, contacts, procedures, hand/body motion windows, subtask labels, action-token vocabulary, next action, action chunks, object-conditioned action, contact state, subtask transition, action-space conversion, normalization, leakage and retargeting reports, VLA or policy heads, and held-out policy/action metrics. Use a premium dark research-product style, high contrast, crisp panels, clean technical linework, no decorative blobs, no logos, no watermark.
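The three prompts above share the same opening, four-zone structure, and closing style sentence, and differ only in the direction phrase, zone names, and content list. A minimal sketch of assembling them from one template follows. The build_prompt helper and its parameter names are hypothetical and not part of the repository; the fixed strings are copied from the prompts above, and the content list in the usage example is shortened for brevity.

```python
PREFIX = (
    "Use case: infographic-diagram. Asset type: 16:9 website figure for "
    "Ropedia Xperience-10M foundation pipeline track."
)
STYLE = (
    "Use a premium dark research-product style, high contrast, crisp panels, "
    "clean technical linework, no decorative blobs, no logos, no watermark."
)


def join_with_and(items):
    """Join items as 'a, b, and c' (expects at least two items)."""
    if len(items) < 2:
        raise ValueError("need at least two items")
    return ", ".join(items[:-1]) + ", and " + items[-1]


def build_prompt(direction_phrase, zones, content):
    """Assemble one diagram prompt (hypothetical helper, for illustration).

    direction_phrase: e.g. "spatial intelligence model training direction".
    zones: exactly four zone names, in left-to-right order.
    content: the items the diagram should represent.
    """
    if len(zones) != 4:
        raise ValueError("the prompts describe exactly four zones")
    return (
        f"{PREFIX} Create a structured diagram, not concept art, "
        f"for a {direction_phrase}. "
        f"Show four left-to-right zones: {join_with_and(zones)}. "
        f"The content should represent {join_with_and(content)}. "
        f"{STYLE}"
    )


# Usage with the Spatial Intelligence zones and a shortened content list.
prompt = build_prompt(
    "spatial intelligence model training direction",
    ["inputs", "task targets", "model training", "evaluation gates"],
    ["multiview RGB", "egocentric video", "depth", "held-out episode metrics"],
)
print(prompt)
```

Keeping the shared sentences in one place means a style change only needs to be made once, and the per-direction differences stay easy to compare.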