This is a lightweight baseline that uses all modalities.
RGB, stereo, fisheye, depth, point-cloud, calibration, and text inputs are compressed into handcrafted features.
It is not a deep multimodal model.
Do not treat random windows drawn from a single episode as a final generalization benchmark: windows from the same episode are correlated, so such a score can overstate how well the model generalizes to new episodes.
Label text was not included as input; only objects and interaction text were used.
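
The caution about random windows from one episode can be made concrete with an episode-level split. The sketch below is illustrative only and is not taken from this repository: the function name split_by_episode, the (episode_id, window) pair layout, and the test_fraction and seed parameters are all assumptions. It keeps every window of an episode on the same side of the split, so the test score reflects unseen episodes rather than unseen windows of a seen episode.

```python
import random
from collections import defaultdict


def split_by_episode(windows, test_fraction=0.25, seed=0):
    """Split windows so that no episode appears in both train and test.

    `windows` is a list of (episode_id, window) pairs. This layout and all
    names here are hypothetical, not the repository's actual data format.
    """
    # Group windows by the episode they were cut from.
    by_episode = defaultdict(list)
    for episode_id, window in windows:
        by_episode[episode_id].append(window)

    episode_ids = sorted(by_episode)
    if len(episode_ids) < 2:
        # With a single episode there is nothing to hold out.
        raise ValueError("need at least two episodes for a held-out split")

    # Shuffle episodes (not windows) with a fixed seed for repeatability.
    rng = random.Random(seed)
    rng.shuffle(episode_ids)

    # Hold out at least one episode, and leave at least one for training.
    n_test = max(1, round(len(episode_ids) * test_fraction))
    n_test = min(n_test, len(episode_ids) - 1)

    test = [(e, w) for e in episode_ids[:n_test] for w in by_episode[e]]
    train = [(e, w) for e in episode_ids[n_test:] for w in by_episode[e]]
    return train, test
```

With only one episode available the function raises an error instead of silently falling back to a window-level split, which is the situation the note above warns about.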