ropedia-xperience-10m-task-baselines / TASK_SUITE_ENHANCEMENT_128.md
cy0307's picture
Publish Ropedia Xperience-10M task baseline cards
7c58b77 verified
|
Raw
History Blame
3.43 kB

128-Episode Task Suite Enhancement Pack

Run id: task_suite_enhancement_128_v1_20260608

This non-overwriting enhancement pack records how to push the current 128-episode task suite harder without adding more raw episodes.

Current Evidence

  • Current public export windows: 3808
  • Window split counts: train 2848 / val 512 / test 448
  • Selected episode split: train 96 / val 16 / test 16
  • Windowed episode ids in baseline CSV: train 89 / val 16 / test 14
  • Qwen3 v4 JSON validity: 1.0000
  • Qwen3 v4 action macro-F1: 0.001868
  • Qwen3 v4 subtask accuracy: 0.000000
  • Qwen3 v4 unseen-label sample share: 0.7076

Dense-Window Scenarios

scenario estimated windows multiplier role
current_export 3808 1.0 current public 128-episode JSON-task export
dense_20f_stride20 30422 7.99 non-overlap dense coverage over each observed episode frame span
dense_20f_stride10 60725 15.95 2x overlap action/subtask densification
dense_20f_stride5 121331 31.86 high-overlap action boundary and transition stress setting
medium_40f_stride20 30303 7.96 subtask/procedure context window
long_80f_stride40 15067 3.96 procedure and world-model context window
multiscale_20s10_40s20_80s40 106095 27.86 recommended no-new-episode v5 export: short action windows plus medium/long procedure context

Highest-Priority Bottlenecks

task priority simple score bottleneck next action
Next-Action Prediction highest 0.000200 fine-grained label explosion and held-out unseen labels add hierarchical action/subtask families plus label-normalized scoring
Action Recognition highest 0.000175 fine-grained label explosion and held-out unseen labels add hierarchical action/subtask families plus label-normalized scoring
Procedure Step Recognition highest 0.000000 fine-grained label explosion and held-out unseen labels add hierarchical action/subtask families plus label-normalized scoring
Cross-Modal Retrieval high missing raw 128-episode feature blocks export compact raw-feature shards for this task before model comparison
Hand Trajectory Forecasting high missing raw 128-episode feature blocks export compact raw-feature shards for this task before model comparison
Multimodal Synchronization Detection high missing raw 128-episode feature blocks export compact raw-feature shards for this task before model comparison
Cross-Modal Reconstruction high missing raw 128-episode feature blocks export compact raw-feature shards for this task before model comparison
Language Grounding medium 0.012786 weak public-safe metadata/text baseline add dense windows and stronger fusion baselines before interpreting model quality

Recommended Next Run

Use multiscale_20s10_40s20_80s40 as the next export target, then train a Qwen3 v5 hierarchical-target LoRA/partial-unfreeze run against the unchanged 96/16/16 episode split.

In parallel, export compact raw 128-episode feature shards for trajectory, retrieval, reconstruction, and synchronization tasks so the simple and neural baselines can be fully aligned beyond the JSON-supported labels.

The current artifacts remain the baseline; future runs should write new run ids and publish separate verified packages.