
Existing Model-Output Task Probes

Generated: 2026-06-18T22:52:18+00:00

This package scores only task targets already present in verified held-out prediction JSON. It does not run new inference and does not infer targets that are absent from a model branch.
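The scoring rule described above can be sketched as follows. This is a minimal illustration, not the package's own code: the record layout (a `predictions` and a `targets` mapping keyed by task name) and the function names `group_scorable_pairs`, `macro_f1`, and `score_or_na` are assumptions made for the example. The macro-F1 here averages per-label F1 over the union of labels seen in predictions and targets, which may differ from the convention the package uses.

```python
import json
from collections import defaultdict


def group_scorable_pairs(jsonl_lines, tasks):
    """Collect (prediction, target) pairs per task.

    A record contributes to a task only if it already carries both a
    prediction and a target for that task; nothing is inferred or re-run.
    The "predictions"/"targets" keys are a hypothetical record layout.
    """
    pairs = defaultdict(list)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        predictions = record.get("predictions", {})
        targets = record.get("targets", {})
        for task in tasks:
            if task in predictions and task in targets:
                pairs[task].append((predictions[task], targets[task]))
    return pairs


def macro_f1(pairs):
    """Unweighted mean of per-label F1 over all labels seen in the pairs."""
    labels = {p for p, _ in pairs} | {t for _, t in pairs}
    f1_values = []
    for label in labels:
        tp = sum(1 for p, t in pairs if p == label and t == label)
        fp = sum(1 for p, t in pairs if p == label and t != label)
        fn = sum(1 for p, t in pairs if p != label and t == label)
        denom = 2 * tp + fp + fn
        f1_values.append(2 * tp / denom if denom else 0.0)
    return sum(f1_values) / len(f1_values)


def score_or_na(pairs, task, metric):
    """Return the metric for a task, or "n/a" when no scorable pairs exist."""
    task_pairs = pairs.get(task, [])
    return metric(task_pairs) if task_pairs else "n/a"


# Tiny in-memory example with made-up task names and labels.
example_lines = [
    '{"predictions": {"a": "x"}, "targets": {"a": "x"}}',
    '{"predictions": {"a": "y"}, "targets": {"a": "x"}}',
    '{"predictions": {"a": "y"}, "targets": {"a": "y", "b": "z"}}',
]
example_pairs = group_scorable_pairs(example_lines, ["a", "b"])
score_a = score_or_na(example_pairs, "a", macro_f1)
score_b = score_or_na(example_pairs, "b", macro_f1)
```

In this example task `b` has a target but no prediction, so it is reported as `n/a` rather than scored as zero, which mirrors the distinction between an absent target and a wrong prediction.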

| Method | ID | Status | Scored tasks | Task 13 macro-F1 | Task 14 macro-F1 | Task 16 macro-F1 | Task 17 micro-F1 | Task 20 MAE | Task 8 IoU | Evidence |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen3-Omni v6 LoRA | qwen3_omni_v6_lora | scored | action_object_relation | n/a | n/a | 0.000222 | n/a | n/a | n/a | results/omni_finetune/verified_public/xperience10m_qwen3_omni_128ep_multiscale_cap96_v6_rank64_lr5e5_full8gpu_lora_eval_test_full/eval/predictions.jsonl |
| Cosmos3-Super Reasoner | cosmos3_super_reasoner | scored | action_object_relation, caption_grounding, long_horizon_next_action, time_to_transition | 0.008808 | n/a | 0.000000 | n/a | 52.946 | 0.306399 | results/omni_finetune/verified_public/xperience10m_cosmos3_super_reasoner_128ep_test_full_20260607/eval/predictions.jsonl |
| Cosmos3-Nano Future Window | cosmos3_nano_future_window | scored | action_object_relation, long_horizon_next_action, modality_reconstruction, next_subtask_forecast, object_set_forecast, time_to_transition | 0.002491 | 0.006615 | 0.002794 | 0.017820 | 33.810 | n/a | results/omni_finetune/verified_public/xperience10m_cosmos3_nano_128ep_future_window_h5_compat_adapter_eval_test_full/eval/future_predictions.jsonl |
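
The table mixes macro-F1, micro-F1, and MAE columns, which are not directly comparable. As a rough guide to how the latter two are usually computed, here is a minimal sketch. It assumes set-valued predictions for micro-F1 and scalar predictions for MAE; the function names `micro_f1_sets` and `mean_absolute_error` and the sample values are hypothetical, and the package's actual metric definitions and units are not shown in this report.

```python
def micro_f1_sets(pairs):
    """Micro-averaged F1 over (predicted_set, target_set) pairs.

    True/false positives and false negatives are pooled across all
    examples before computing a single F1 value.
    """
    tp = sum(len(pred & target) for pred, target in pairs)
    fp = sum(len(pred - target) for pred, target in pairs)
    fn = sum(len(target - pred) for pred, target in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_absolute_error(pairs):
    """Mean of |prediction - target| over scalar (prediction, target) pairs."""
    return sum(abs(pred - target) for pred, target in pairs) / len(pairs)


# Made-up values, for illustration only.
set_pairs = [({"cup", "knife"}, {"cup"}), ({"pan"}, {"pan", "lid"})]
scalar_pairs = [(3.0, 5.0), (10.0, 4.0)]
example_micro_f1 = micro_f1_sets(set_pairs)
example_mae = mean_absolute_error(scalar_pairs)
```

Under these definitions higher is better for the F1 and IoU columns and lower is better for MAE, so the Task 20 column should be read in the opposite direction from the others.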