<svg xmlns="http://www.w3.org/2000/svg" width="1420" height="920" viewBox="0 0 1420 920">
<rect width="1420" height="920" fill="#020502"/>
<rect x="28" y="28" width="1364" height="864" rx="18" fill="#050905" stroke="#ccffa0" stroke-opacity="0.24"/>
<text x="66" y="88" font-size="32" font-weight="760" fill="#f4f8ef">Ropedia Xperience-10M: four research-direction extension probes</text>
<text x="66" y="122" font-size="17" font-weight="500" fill="#a5afa2">All four probes use the same 1,161-window feature tensor from the public sample; they are extension probes for later held-out studies.</text>
<rect x="66" y="166" width="620" height="160" rx="10" fill="#071207" stroke="#ccffa0" stroke-opacity="0.22"/>
<rect x="66" y="166" width="10" height="160" rx="5" fill="#ccffa0"/>
<circle cx="108" cy="206" r="24" fill="#ccffa0" opacity="0.14"/>
<text x="98" y="214" font-size="21" font-weight="760" fill="#ccffa0">A</text>
<text x="142" y="201" font-size="20" font-weight="760" fill="#f4f8ef">Body and Hand Motion Intensity</text>
<text x="142" y="228" font-size="13" font-weight="650" fill="#a5afa2">Human Modeling &amp; Motion Understanding</text>
<text x="142" y="260" font-size="16" font-weight="700" fill="#f4f8ef">Minimal: 0.7658 macro-F1</text>
<text x="366" y="260" font-size="16" font-weight="700" fill="#f4f8ef">Neural MLP: 0.8254 macro-F1</text>
<text x="142" y="291" font-size="13" font-weight="500" fill="#dce8d7">Binary label: high_motion or low_motion.</text>
<rect x="142" y="304" width="440" height="8" rx="4" fill="#ccffa0" opacity="0.14"/>
<rect x="142" y="304" width="337.0" height="8" rx="4" fill="#ccffa0" opacity="0.72"/>
<rect x="142" y="316" width="440" height="8" rx="4" fill="#ccffa0" opacity="0.14"/>
<rect x="142" y="316" width="363.2" height="8" rx="4" fill="#ffffff" opacity="0.78"/>
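<!--
A minimal sketch of how the macro-F1 in card A could be computed, assuming the ground truth and predictions are plain lists of "high_motion" / "low_motion" strings. The helper names (f1_for_class, macro_f1) are hypothetical; this is not the authors' evaluation code.

```python
from collections.abc import Sequence


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # Assumed convention: F1 is 0.0 if the class is absent from truth and prediction.
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Sequence[str] = ("high_motion", "low_motion")) -> float:
    """Unweighted mean of the per-class F1 scores over both motion labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


truth = ["high_motion", "high_motion", "low_motion", "low_motion"]
preds = ["high_motion", "low_motion", "low_motion", "low_motion"]
print(round(macro_f1(truth, preds), 4))
```

In this hypothetical example one high_motion window is predicted as low_motion, so the per-class F1 scores are 2/3 and 0.8 and macro-F1 is about 0.7333. Macro averaging weights both classes equally, so the score does not simply reward predicting the majority class.
-->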
<rect x="730" y="166" width="620" height="160" rx="10" fill="#071207" stroke="#ccffa0" stroke-opacity="0.22"/>
<rect x="730" y="166" width="10" height="160" rx="5" fill="#7ae5c3"/>
<circle cx="772" cy="206" r="24" fill="#7ae5c3" opacity="0.14"/>
<text x="762" y="214" font-size="21" font-weight="760" fill="#7ae5c3">B</text>
<text x="806" y="201" font-size="20" font-weight="760" fill="#f4f8ef">Multi-View Consistency Retrieval</text>
<text x="806" y="228" font-size="13" font-weight="650" fill="#a5afa2">3D/4D Reconstruction &amp; Neural Rendering</text>
<text x="806" y="260" font-size="16" font-weight="700" fill="#f4f8ef">Minimal: 0.5534 MRR</text>
<text x="1030" y="260" font-size="16" font-weight="700" fill="#f4f8ef">Neural MLP: 0.3469 MRR</text>
<text x="806" y="291" font-size="13" font-weight="500" fill="#dce8d7">Ranked candidate windows; the correct synchronized view should rank near the top.</text>
<rect x="806" y="304" width="440" height="8" rx="4" fill="#ccffa0" opacity="0.14"/>
<rect x="806" y="304" width="243.5" height="8" rx="4" fill="#7ae5c3" opacity="0.72"/>
<rect x="806" y="316" width="440" height="8" rx="4" fill="#ccffa0" opacity="0.14"/>
<rect x="806" y="316" width="152.6" height="8" rx="4" fill="#ffffff" opacity="0.78"/>
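<!--
Card B scores retrieval with mean reciprocal rank (MRR). A minimal sketch, assuming each query is a ranked list of candidate window IDs plus the ID of the one correct synchronized view; the function names and window IDs are hypothetical.

```python
from collections.abc import Hashable, Iterable, Sequence


def reciprocal_rank(ranked: Sequence[Hashable], correct: Hashable) -> float:
    """1 / (1-based rank of the correct candidate), or 0.0 if it is not ranked."""
    for rank, candidate in enumerate(ranked, start=1):
        if candidate == correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: Iterable[tuple[Sequence[Hashable], Hashable]]) -> float:
    """Average reciprocal rank over (ranked_candidates, correct_id) pairs."""
    scores = [reciprocal_rank(ranked, correct) for ranked, correct in queries]
    return sum(scores) / len(scores) if scores else 0.0


queries = [
    (["w3", "w7", "w1"], "w3"),  # correct view ranked first
    (["w2", "w5", "w9"], "w9"),  # correct view ranked third
]
print(round(mean_reciprocal_rank(queries), 4))
```

Here the reciprocal ranks are 1 and 1/3, so MRR is 2/3. Because only the position of the first correct match counts, MRR rewards putting the synchronized view near the top of the list.
-->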
<rect x="66" y="360" width="620" height="160" rx="10" fill="#071207" stroke="#ccffa0" stroke-opacity="0.22"/>
<rect x="66" y="360" width="10" height="160" rx="5" fill="#d8f4a5"/>
<circle cx="108" cy="400" r="24" fill="#d8f4a5" opacity="0.14"/>
<text x="98" y="408" font-size="21" font-weight="760" fill="#d8f4a5">C</text>
<text x="142" y="395" font-size="20" font-weight="760" fill="#f4f8ef">Action Phase Progress Estimation</text>
<text x="142" y="422" font-size="13" font-weight="650" fill="#a5afa2">Egocentric Vision & Interaction</text>
<text x="142" y="454" font-size="16" font-weight="700" fill="#f4f8ef">Minimal: 0.3267 MAE</text>
<text x="366" y="454" font-size="16" font-weight="700" fill="#f4f8ef">Neural MLP: 0.3015 MAE</text>
<text x="142" y="485" font-size="13" font-weight="500" fill="#dce8d7">A scalar progress value between 0.0 and 1.0 for the current action segment.</text>
<rect x="142" y="498" width="440" height="8" rx="4" fill="#ccffa0" opacity="0.14"/>
<rect x="142" y="498" width="296.2" height="8" rx="4" fill="#d8f4a5" opacity="0.72"/>
<rect x="142" y="510" width="440" height="8" rx="4" fill="#ccffa0" opacity="0.14"/>
<rect x="142" y="510" width="307.3" height="8" rx="4" fill="#ffffff" opacity="0.78"/>
<rect x="730" y="360" width="620" height="160" rx="10" fill="#071207" stroke="#ccffa0" stroke-opacity="0.22"/>
<rect x="730" y="360" width="10" height="160" rx="5" fill="#9bdfff"/>
<circle cx="772" cy="400" r="24" fill="#9bdfff" opacity="0.14"/>
<text x="762" y="408" font-size="21" font-weight="760" fill="#9bdfff">D</text>
<text x="806" y="395" font-size="20" font-weight="760" fill="#f4f8ef">Short-Horizon Ego-Motion Forecasting</text>
<text x="806" y="422" font-size="13" font-weight="650" fill="#a5afa2">Scene Reconstruction & World Modeling</text>
<text x="806" y="454" font-size="16" font-weight="700" fill="#f4f8ef">Minimal: 0.1700 MAE</text>
<text x="1030" y="454" font-size="16" font-weight="700" fill="#f4f8ef">Neural MLP: 0.0833 MAE</text>
<text x="806" y="485" font-size="13" font-weight="500" fill="#dce8d7">A future camera-translation delta vector.</text>
<rect x="806" y="498" width="440" height="8" rx="4" fill="#ccffa0" opacity="0.14"/>
<rect x="806" y="498" width="365.2" height="8" rx="4" fill="#9bdfff" opacity="0.72"/>
<rect x="806" y="510" width="440" height="8" rx="4" fill="#ccffa0" opacity="0.14"/>
<rect x="806" y="510" width="403.3" height="8" rx="4" fill="#ffffff" opacity="0.78"/>
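<!--
Cards C and D both report mean absolute error (MAE). A minimal sketch, assuming C averages over scalar progress values and D averages the absolute error over every component of every translation delta vector; the figure does not state how D aggregates vector components, so that averaging is an assumption. Function names are hypothetical.

```python
from collections.abc import Sequence


def mae_scalar(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE for scalar targets such as phase progress in [0.0, 1.0]."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae_vector(y_true: Sequence[Sequence[float]],
               y_pred: Sequence[Sequence[float]]) -> float:
    """MAE over all components of all delta vectors (assumed aggregation)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    errors = []
    for tv, pv in zip(y_true, y_pred):
        if len(tv) != len(pv):
            raise ValueError("vectors must have matching dimensions")
        errors.extend(abs(t - p) for t, p in zip(tv, pv))
    return sum(errors) / len(errors)


print(round(mae_scalar([0.0, 0.5, 1.0], [0.1, 0.5, 0.8]), 4))
print(round(mae_vector([[0.0, 0.0, 0.1]], [[0.1, 0.0, 0.0]]), 4))
```

With these hypothetical values the scalar MAE is 0.1 and the vector MAE is 0.2 / 3, about 0.0667. Lower is better for both.
-->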
<text x="66" y="570" font-size="24" font-weight="760" fill="#f4f8ef">How to read this</text>
<text x="66" y="604" font-size="16" font-weight="500" fill="#dce8d7">Each card adds one concrete task to a research direction using existing sample modalities.</text>
<text x="66" y="632" font-size="16" font-weight="500" fill="#dce8d7">Colored bar: minimal baseline. White bar: neural MLP. Length is the score on a 0 to 1 scale; MAE (lower is better) is drawn as 1 - MAE, so longer is always better.</text>
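<!--
The bar lengths follow directly from the printed metrics: each track rect is 440 units wide, a bar is 440 times its score, and MAE cards use 1 - MAE. A minimal sketch of that mapping (names are illustrative, not the generator's actual code):

```python
TRACK_WIDTH = 440.0  # width of each background track rect, in SVG user units


def bar_width(score: float, lower_is_better: bool) -> float:
    """Bar length in SVG units; MAE is drawn as 1 - MAE so longer is always better."""
    value = 1.0 - score if lower_is_better else score
    return round(TRACK_WIDTH * value, 1)


# (card, metric, minimal baseline, neural MLP) as printed in the figure
CARDS = [
    ("A", "macro-F1", 0.7658, 0.8254),
    ("B", "MRR", 0.5534, 0.3469),
    ("C", "MAE", 0.3267, 0.3015),
    ("D", "MAE", 0.1700, 0.0833),
]

for card, metric, minimal, neural in CARDS:
    lower = metric == "MAE"
    print(card, bar_width(minimal, lower), bar_width(neural, lower))
```

The computed widths match the rect widths in the figure to one decimal, except card C's minimal bar: 440 * (1 - 0.3267) rounds to 296.3, while the figure draws 296.2, probably because the printed MAE is itself rounded.
-->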
<line x1="66" y1="675" x2="1354" y2="675" stroke="#ccffa0" stroke-opacity="0.18"/>
<text x="66" y="724" font-size="22" font-weight="760" fill="#f4f8ef">Implementation boundary</text>
<text x="66" y="758" font-size="16" font-weight="500" fill="#dce8d7">A: motion-energy proxy, not a full human body model. B: view-feature retrieval, not neural rendering.</text>
<text x="66" y="786" font-size="16" font-weight="500" fill="#dce8d7">C: phase-progress regression, not open-world intent. D: ego-motion forecast, not a persistent map.</text>
<text x="66" y="835" font-size="16" font-weight="700" fill="#f4f8ef">All metrics are computed from held-out chronological windows of the same public sample episode.</text>
</svg>