Unified 20-Task Provenance Baselines

This historical result bundle is part of the unified 20-task public-sample suite. The rows here reuse the same 20-frame windows, 5-frame stride, shared feature tensor, chronological split, and minimal/neural baseline discipline as the rest of the suite.
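The windowing and split described above can be sketched as follows. This is a minimal illustration, not the suite's actual code: the function names and the 80/20 train fraction are assumptions, and the sketch does not address how overlapping windows that straddle the split boundary are treated.

```python
def window_starts(num_frames, window=20, stride=5):
    """Return the start index of every full window in a recording."""
    if num_frames < window:
        return []
    return list(range(0, num_frames - window + 1, stride))


def chronological_split(starts, train_fraction=0.8):
    """Earlier windows go to train, later windows to test (no shuffling).

    train_fraction is a hypothetical value chosen for illustration.
    """
    cut = int(len(starts) * train_fraction)
    return starts[:cut], starts[cut:]


starts = window_starts(100)          # a 100-frame recording
train, test = chronological_split(starts)
```

Because the stride is shorter than the window, adjacent windows share 15 frames, which is why the split is made on time order rather than at random.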

The file and directory names still contain tier2_task_suite for backwards-compatible public links, but this is not a separate benchmark tier.

Setup Alignment

  • Unified task contracts: 20
  • Provenance rows in this historical bundle: 8
  • Long-horizon offset: 100 frames, about 5.0 seconds at 20 FPS
  • Raw public-sample HDF5 is required to regenerate the interaction/object targets; raw media/HDF5 files are not redistributed.
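As a rough sketch of how a 100-frame offset turns into a forecasting target, the snippet below pairs each window with the label 100 frames after the window's last frame. Measuring the offset from the last frame, and dropping windows whose target falls past the end of the recording, are both assumptions made for illustration; the names are hypothetical.

```python
FPS = 20
HORIZON_FRAMES = 100  # 100 / 20 = 5.0 seconds


def long_horizon_targets(labels, starts, window=20, horizon=HORIZON_FRAMES):
    """Pair each window start with the label `horizon` frames after the window ends.

    Assumption: the offset is counted from the window's last frame.
    Windows whose target index falls outside the recording are skipped.
    """
    pairs = []
    for start in starts:
        target_idx = start + window - 1 + horizon
        if target_idx < len(labels):
            pairs.append((start, labels[target_idx]))
    return pairs


# Toy per-frame labels: the label of frame i is simply i.
toy_labels = list(range(200))
pairs = long_horizon_targets(toy_labels, [0, 5, 80, 85])
```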

Results

| # | Task | Input | Output | Minimal | Neural MLP | Meaning |
|---|------|-------|--------|---------|------------|---------|
| 13 | Long-Horizon Next-Action Forecasting | Current 20-frame non-caption multimodal window. | Action label five seconds later. | 0.0750 macro-F1 | 0.0655 macro-F1 | Tests whether the current state carries enough procedure context to forecast beyond the one-second core next-action task. |
| 14 | Long-Horizon Next-Subtask Forecasting | Current 20-frame non-caption multimodal window. | Procedure subtask label five seconds later. | 0.0455 macro-F1 | 0.0507 macro-F1 | Moves from immediate action anticipation to higher-level procedure-state prediction. |
| 15 | Interaction Text Prediction | Current 20-frame sensor window with caption-text features removed. | Raw annotation interaction phrase for the same window. | 0.0444 macro-F1 | 0.0381 macro-F1 | Uses the raw caption JSON interaction field as a language target instead of only the hashed text feature. |
| 16 | Action-Object Relation Prediction | Current 20-frame sensor window with caption-text features removed. | Joint action plus active object-set relation. | 0.0000 macro-F1 | 0.0000 macro-F1 | Evaluates whether a model can bind what action is happening to which objects are involved. |
| 17 | Future Object-Set Forecasting | Current 20-frame sensor window with caption-text features removed. | Object set active five seconds later. | 0.1694 micro-F1 | 0.1972 micro-F1 | Predicts which objects will become relevant soon, not only which objects are relevant now. |
| 18 | IMU-to-Hand Pose Reconstruction | Current IMU acceleration/gyroscope feature block only. | Current left/right hand joint feature blocks. | 0.0420 MAE | 0.0426 MAE | A sensor-bridge probe for how much hand configuration can be recovered from inertial motion alone. |
| 19 | Camera-View Synchronization Retrieval | Fisheye camera-1 feature query projected into fisheye camera-3 feature space. | The synchronized held-out camera-3 window. | 0.4943 MRR | 0.2409 MRR | Stress-tests multi-camera time alignment beyond the core cross-modal retrieval task. |
| 20 | Time-to-Next-Transition Regression | Current 20-frame non-caption multimodal window. | Frames until the next action-label boundary, capped at 200 frames. | 10.5374 MAE frames | 10.5545 MAE frames | Turns boundary detection into a continuous timing estimate for procedural control. |
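Two of the quantities in the results may be easier to read with a small example: the capped time-to-next-transition target (task 20) and mean reciprocal rank (task 19). The sketch below is one plausible reading and not the suite's implementation. The function names are hypothetical, and assigning the cap to frames with no later boundary is an assumption.

```python
def time_to_next_transition(labels, cap=200):
    """Frames until the next action-label boundary, capped at `cap`.

    Assumption: frames with no later boundary receive the cap value.
    """
    n = len(labels)
    out = [cap] * n
    next_boundary = None  # index of the nearest later frame where the label changes
    for i in range(n - 2, -1, -1):
        if labels[i + 1] != labels[i]:
            next_boundary = i + 1
        if next_boundary is not None:
            out[i] = min(next_boundary - i, cap)
    return out


def mean_reciprocal_rank(score_rows, true_indices):
    """Average of 1/rank of the correct candidate over all queries.

    Rank is 1 plus the number of candidates scored strictly higher.
    """
    total = 0.0
    for scores, true_idx in zip(score_rows, true_indices):
        rank = 1 + sum(1 for s in scores if s > scores[true_idx])
        total += 1.0 / rank
    return total / len(true_indices)


targets = time_to_next_transition(["a", "a", "b", "b", "b", "c"])
mrr = mean_reciprocal_rank(
    [[0.9, 0.1, 0.2], [0.3, 0.8, 0.5], [0.2, 0.9, 0.4]],
    [0, 2, 0],
)
```

In the toy retrieval example the correct candidates rank 1st, 2nd, and 3rd, so the MRR is the mean of 1, 1/2, and 1/3.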

Interpretation Boundary

These sample-level baselines are part of the same unified public-sample suite. They show that the sample can support richer task contracts, but they do not establish cross-episode model quality.