<svg xmlns="http://www.w3.org/2000/svg" width="1420" height="920" viewBox="0 0 1420 920">
<rect width="1420" height="920" fill="#020502"/>
<rect x="28" y="28" width="1364" height="864" rx="18" fill="#050905" stroke="#ccffa0" stroke-opacity="0.24"/>
<text x="66" y="88" font-size="32" font-weight="760" fill="#f4f8ef">Ropedia Xperience-10M: four direction extension probes</text>
<text x="66" y="122" font-size="17" font-weight="500" fill="#a5afa2">Computed from the same 1,161-window public sample feature tensor; these are extension probes intended for later held-out studies.</text>
<rect x="66" y="166" width="620" height="160" rx="10" fill="#071207" stroke="#ccffa0" stroke-opacity="0.22"/>
<rect x="66" y="166" width="10" height="160" rx="5" fill="#ccffa0"/>
<circle cx="108" cy="206" r="24" fill="#ccffa0" opacity="0.14"/>
<text x="98" y="214" font-size="21" font-weight="760" fill="#ccffa0">A</text>
<text x="142" y="201" font-size="20" font-weight="760" fill="#f4f8ef">Body and Hand Motion Intensity</text>
<text x="142" y="228" font-size="13" font-weight="650" fill="#a5afa2">Human Modeling &amp; Motion Understanding</text>
<text x="142" y="260" font-size="16" font-weight="700" fill="#f4f8ef">Minimal: 0.7658 macro-F1</text>
<text x="366" y="260" font-size="16" font-weight="700" fill="#f4f8ef">Neural MLP: 0.8254 macro-F1</text>
<text x="142" y="291" font-size="13" font-weight="500" fill="#dce8d7">Binary label: high_motion or low_motion.</text>
<rect x="142" y="304" width="440" height="8" rx="4" fill="#ccffa0" opacity="0.14"/>
<rect x="142" y="304" width="337.0" height="8" rx="4" fill="#ccffa0" opacity="0.72"/>
<rect x="142" y="316" width="440" height="8" rx="4" fill="#ccffa0" opacity="0.14"/>
<rect x="142" y="316" width="363.2" height="8" rx="4" fill="#ffffff" opacity="0.78"/>
<rect x="730" y="166" width="620" height="160" rx="10" fill="#071207" stroke="#ccffa0" stroke-opacity="0.22"/>
<rect x="730" y="166" width="10" height="160" rx="5" fill="#7ae5c3"/>
<circle cx="772" cy="206" r="24" fill="#7ae5c3" opacity="0.14"/>
<text x="762" y="214" font-size="21" font-weight="760" fill="#7ae5c3">B</text>
<text x="806" y="201" font-size="20" font-weight="760" fill="#f4f8ef">Multi-View Consistency Retrieval</text>
<text x="806" y="228" font-size="13" font-weight="650" fill="#a5afa2">3D/4D Reconstruction &amp; Neural Rendering</text>
<text x="806" y="260" font-size="16" font-weight="700" fill="#f4f8ef">Minimal: 0.5534 MRR</text>
<text x="1030" y="260" font-size="16" font-weight="700" fill="#f4f8ef">Neural MLP: 0.3469 MRR</text>
<text x="806" y="291" font-size="13" font-weight="500" fill="#dce8d7">Ranked candidate windows; the correct synchronized view should rank near the top.</text>
<rect x="806" y="304" width="440" height="8" rx="4" fill="#ccffa0" opacity="0.14"/>
<rect x="806" y="304" width="243.5" height="8" rx="4" fill="#7ae5c3" opacity="0.72"/>
<rect x="806" y="316" width="440" height="8" rx="4" fill="#ccffa0" opacity="0.14"/>
<rect x="806" y="316" width="152.6" height="8" rx="4" fill="#ffffff" opacity="0.78"/>
<rect x="66" y="360" width="620" height="160" rx="10" fill="#071207" stroke="#ccffa0" stroke-opacity="0.22"/>
<rect x="66" y="360" width="10" height="160" rx="5" fill="#d8f4a5"/>
<circle cx="108" cy="400" r="24" fill="#d8f4a5" opacity="0.14"/>
<text x="98" y="408" font-size="21" font-weight="760" fill="#d8f4a5">C</text>
<text x="142" y="395" font-size="20" font-weight="760" fill="#f4f8ef">Action Phase Progress Estimation</text>
<text x="142" y="422" font-size="13" font-weight="650" fill="#a5afa2">Egocentric Vision &amp; Interaction</text>
<text x="142" y="454" font-size="16" font-weight="700" fill="#f4f8ef">Minimal: 0.3267 MAE</text>
<text x="366" y="454" font-size="16" font-weight="700" fill="#f4f8ef">Neural MLP: 0.3015 MAE</text>
<text x="142" y="485" font-size="13" font-weight="500" fill="#dce8d7">A scalar progress value between 0.0 and 1.0 for the current action segment.</text>
<rect x="142" y="498" width="440" height="8" rx="4" fill="#ccffa0" opacity="0.14"/>
<rect x="142" y="498" width="296.2" height="8" rx="4" fill="#d8f4a5" opacity="0.72"/>
<rect x="142" y="510" width="440" height="8" rx="4" fill="#ccffa0" opacity="0.14"/>
<rect x="142" y="510" width="307.3" height="8" rx="4" fill="#ffffff" opacity="0.78"/>
<rect x="730" y="360" width="620" height="160" rx="10" fill="#071207" stroke="#ccffa0" stroke-opacity="0.22"/>
<rect x="730" y="360" width="10" height="160" rx="5" fill="#9bdfff"/>
<circle cx="772" cy="400" r="24" fill="#9bdfff" opacity="0.14"/>
<text x="762" y="408" font-size="21" font-weight="760" fill="#9bdfff">D</text>
<text x="806" y="395" font-size="20" font-weight="760" fill="#f4f8ef">Short-Horizon Ego-Motion Forecasting</text>
<text x="806" y="422" font-size="13" font-weight="650" fill="#a5afa2">Scene Reconstruction &amp; World Modeling</text>
<text x="806" y="454" font-size="16" font-weight="700" fill="#f4f8ef">Minimal: 0.1700 MAE</text>
<text x="1030" y="454" font-size="16" font-weight="700" fill="#f4f8ef">Neural MLP: 0.0833 MAE</text>
<text x="806" y="485" font-size="13" font-weight="500" fill="#dce8d7">A future camera-translation delta vector.</text>
<rect x="806" y="498" width="440" height="8" rx="4" fill="#ccffa0" opacity="0.14"/>
<rect x="806" y="498" width="365.2" height="8" rx="4" fill="#9bdfff" opacity="0.72"/>
<rect x="806" y="510" width="440" height="8" rx="4" fill="#ccffa0" opacity="0.14"/>
<rect x="806" y="510" width="403.3" height="8" rx="4" fill="#ffffff" opacity="0.78"/>
<text x="66" y="570" font-size="24" font-weight="760" fill="#f4f8ef">How to read this</text>
<text x="66" y="604" font-size="16" font-weight="500" fill="#dce8d7">Each card adds one concrete task to a research direction using existing sample modalities.</text>
<text x="66" y="632" font-size="16" font-weight="500" fill="#dce8d7">Colored bar: minimal baseline normalized score. White bar: neural MLP normalized score. Lower-is-better MAE is shown as 1 - MAE for bar length only.</text>
<line x1="66" y1="675" x2="1354" y2="675" stroke="#ccffa0" stroke-opacity="0.18"/>
<text x="66" y="724" font-size="22" font-weight="760" fill="#f4f8ef">Implementation boundary</text>
<text x="66" y="758" font-size="16" font-weight="500" fill="#dce8d7">A: motion-energy proxy, not a full human body model. B: view-feature retrieval, not neural rendering.</text>
<text x="66" y="786" font-size="16" font-weight="500" fill="#dce8d7">C: phase-progress regression, not open-world intent. D: ego-motion forecast, not a persistent map.</text>
<text x="66" y="835" font-size="16" font-weight="700" fill="#f4f8ef">All metrics are computed from held-out chronological windows of the same public sample episode.</text>
</svg>
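The bar lengths in the figure follow the rule stated under "How to read this": the score is used directly for higher-is-better metrics (macro-F1, MRR) and as 1 - MAE for lower-is-better metrics, then scaled to the 440-unit track. The sketch below reproduces that arithmetic. The function name `bar_width`, the `PROBES` table layout, and the clamping to [0, 1] are illustrative assumptions, not taken from the code that generated the figure.

```python
# Hypothetical sketch of the bar-length rule described in the figure.
# TRACK_WIDTH matches the width="440" of the faint background tracks.
TRACK_WIDTH = 440.0


def bar_width(score, lower_is_better=False, track=TRACK_WIDTH):
    """Map a metric value to a bar length in SVG user units.

    Lower-is-better metrics (MAE) are flipped to 1 - score first.
    Clamping to [0, 1] is an assumption added here for safety.
    """
    normalized = 1.0 - score if lower_is_better else score
    normalized = min(max(normalized, 0.0), 1.0)
    return normalized * track


# (metric name, minimal baseline, neural MLP, lower_is_better),
# with values copied from the four cards in the figure.
PROBES = {
    "A": ("macro-F1", 0.7658, 0.8254, False),
    "B": ("MRR", 0.5534, 0.3469, False),
    "C": ("MAE", 0.3267, 0.3015, True),
    "D": ("MAE", 0.1700, 0.0833, True),
}


def all_widths():
    """Return {card: (minimal_width, mlp_width)} for every probe."""
    return {
        card: (bar_width(minimal, lower), bar_width(mlp, lower))
        for card, (_metric, minimal, mlp, lower) in PROBES.items()
    }


if __name__ == "__main__":
    for card, (w_min, w_mlp) in all_widths().items():
        print(f"{card}: minimal={w_min:.1f}  mlp={w_mlp:.1f}")
```

Because the cards show metrics rounded to four decimals, recomputed widths can differ from the drawn `width` attributes in the last decimal place.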