rslxcvg committed 8b50a82 (verified) · 1 parent: e0ccbf0

Add eval artifact molmoact2_overnight_20260519_fullcoverage_v1_goal_audit_latest.md

eval/molmoact2_overnight_20260519_fullcoverage_v1_goal_audit_latest.md (new file, 124 lines)
# MolmoAct2 Overnight Goal Audit

main_runs=7 supplemental_runs=1 final_pairs=8
eligible_final_candidates=0
best=molmoact2_overnight_frombase_prod_r128_c010_rw3_w8_gpu5_20260519_fullcoverage_v1 ckpt=010000 eligible=False low_t=0.8888888888888888 pan_rank=1.0 pan5=18.936976432800293 outside=0.0 h_grip=1.2048346996307373
ARTIFACTS_COMPLETE=true
FINAL_RECOMMENDATION_READY=true
ROBOT_TEST_READY=false
GOAL_COMPLETE=true

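The summary lines pack the run counts, the best candidate's gate metrics, and the readiness flags into whitespace-separated `key=value` tokens. As a minimal sketch of reading them back, assuming values never contain spaces (true for every summary line in this report), the hypothetical helper `parse_summary_line` below coerces each value to a bool, int, float, or string. It is not part of the audit tooling, which this commit does not include.

```python
def _coerce(raw: str):
    """Convert one summary value to bool, int, float, or str."""
    # The report mixes Python-style (False) and lowercase (true) booleans.
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    if raw.isdecimal():
        # Zero-padded ids such as checkpoint '010000' stay strings.
        return raw if len(raw) > 1 and raw.startswith("0") else int(raw)
    try:
        return float(raw)
    except ValueError:
        return raw  # run names and other free-form identifiers


def parse_summary_line(line: str) -> dict:
    """Split 'a=1 b=foo' into {'a': 1, 'b': 'foo'} (hypothetical helper)."""
    fields = {}
    for token in line.split():
        key, sep, raw = token.partition("=")
        if not sep:
            raise ValueError(f"token without '=': {token!r}")
        fields[key] = _coerce(raw)
    return fields


summary = {}
for line in (
    "main_runs=7 supplemental_runs=1 final_pairs=8",
    "eligible_final_candidates=0",
    "ROBOT_TEST_READY=false",
):
    summary.update(parse_summary_line(line))

# Each run appears to contribute exactly one final run/checkpoint pair.
assert summary["final_pairs"] == summary["main_runs"] + summary["supplemental_runs"]

best = parse_summary_line(
    "best=molmoact2_overnight_frombase_prod_r128_c010_rw3_w8_gpu5_20260519_fullcoverage_v1 "
    "ckpt=010000 eligible=False low_t=0.8888888888888888 pan_rank=1.0 "
    "pan5=18.936976432800293 outside=0.0 h_grip=1.2048346996307373"
)
print(best["ckpt"], best["eligible"], best["pan_rank"])  # 010000 False 1.0
```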
| check | status | evidence |
|---|---:|---|
| main run list has 7 runs | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_main7_runs.txt` |
| supplemental run list has 1 run | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_supplemental_runs.txt` |
| main experiment plan has 7 rows | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_main7_plan.tsv` |
| supplemental experiment plan has 1 row | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_supplemental_plan.tsv` |
| main plan matches main run list | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_main7_plan.tsv` |
| supplemental plan matches supplemental run list | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_supplemental_plan.tsv` |
| all planned runs use no-replacement frame sampling | pass | `frame_replacement=false` |
| all planned runs rank shoulder-pan at action index 5 | pass | `rank_action_index=5` |
| rank-loss variants include control and positive rank losses | pass | `0.0,3.0,5.0` |
| pan/action weighting variants are planned | pass | `[5.0,2.0,2.0,1.0,1.0,1.0],[8.0,3.0,3.0,1.0,1.0,1.0]` |
| final pair list has 8 run/checkpoint rows | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_final_selection_pairs.tsv` |
| coverage audit file exists | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_coverage_audit.md` |
| main final samples cover full dataset | pass | `10000 * 16 >= 150084` |
| supplemental final samples cover full dataset | pass | `5000 * 32 >= 150084` |
| supplemental plan uses batch size 32 | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_supplemental_plan.tsv` |
| supplemental plan trains at least 5000 steps | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_supplemental_plan.tsv` |
| all training logs contain 'replacement': False | pass | `'replacement': False` |
| all training logs contain 'weight_column': 'sample_weight' | pass | `'weight_column': 'sample_weight'` |
| all training logs contain 'normalize_batch_mean': False | pass | `'normalize_batch_mean': False` |
| all training logs contain 'weight_column': 'loss_weight_normalized' | pass | `'weight_column': 'loss_weight_normalized'` |
| all training logs contain 'low_t_min': 0.001 | pass | `'low_t_min': 0.001` |
| all training logs contain 'low_t_max': 0.1 | pass | `'low_t_max': 0.1` |
| all training logs contain 'same_noise_across_colors': True | pass | `'same_noise_across_colors': True` |
| all training logs contain 'rank_action_index': 5 | pass | `'rank_action_index': 5` |
| all training logs contain 'rank_action_dim': 0 | pass | `'rank_action_dim': 0` |
| all training logs contain 'enable_lora_action_expert': True | pass | `'enable_lora_action_expert': True` |
| all training logs contain 'enable_lora_vlm': True | pass | `'enable_lora_vlm': True` |
| all training logs contain 'train_action_expert_only': False | pass | `'train_action_expert_only': False` |
| all training logs contain 'normalize_gripper': True | pass | `'normalize_gripper': True` |
| all training logs contain 'norm_tag': 'so100_so101_molmoact2' | pass | `'norm_tag': 'so100_so101_molmoact2'` |
| all training logs contain dataset.num_frames=150084 | pass | `dataset.num_frames=150084` |
| all training logs contain prompt_contrast: | pass | `prompt_contrast:` |
| all training logs have no fatal errors or non-finite loss fields | pass | `fatal_hits=0 nonfinite_loss_hits=0` |
| molmoact2_overnight_prod_r128_c010_rw3_w8_gpu0_20260519_fullcoverage_v1 checkpoint 001000 | pass | `ok` |
| molmoact2_overnight_prod_r128_c010_rw3_w8_gpu0_20260519_fullcoverage_v1 checkpoint 010000 | pass | `ok` |
| molmoact2_overnight_prod_r128_c010_rw3_w8_gpu0_20260519_fullcoverage_v1 progress probe decision_prompt_probe_branch_onset 001000 | pass |  |
| molmoact2_overnight_prod_r128_c010_rw3_w8_gpu0_20260519_fullcoverage_v1 final probe decision_prompt_probe_branch_onset 010000 | pass |  |
| molmoact2_overnight_prod_r128_c010_rw3_w8_gpu0_20260519_fullcoverage_v1 progress probe decision_low_t_prompt_loss_branch_onset 001000 | pass |  |
| molmoact2_overnight_prod_r128_c010_rw3_w8_gpu0_20260519_fullcoverage_v1 final probe decision_low_t_prompt_loss_branch_onset 010000 | pass |  |
| molmoact2_overnight_prod_r128_c010_rw3_w8_gpu0_20260519_fullcoverage_v1 progress probe offline_retention 001000 | pass |  |
| molmoact2_overnight_prod_r128_c010_rw3_w8_gpu0_20260519_fullcoverage_v1 final probe offline_retention 010000 | pass |  |
| molmoact2_overnight_control_r64_c010_rw0_w5_gpu1_20260519_fullcoverage_v1 checkpoint 001000 | pass | `ok` |
| molmoact2_overnight_control_r64_c010_rw0_w5_gpu1_20260519_fullcoverage_v1 checkpoint 010000 | pass | `ok` |
| molmoact2_overnight_control_r64_c010_rw0_w5_gpu1_20260519_fullcoverage_v1 progress probe decision_prompt_probe_branch_onset 001000 | pass |  |
| molmoact2_overnight_control_r64_c010_rw0_w5_gpu1_20260519_fullcoverage_v1 final probe decision_prompt_probe_branch_onset 010000 | pass |  |
| molmoact2_overnight_control_r64_c010_rw0_w5_gpu1_20260519_fullcoverage_v1 progress probe decision_low_t_prompt_loss_branch_onset 001000 | pass |  |
| molmoact2_overnight_control_r64_c010_rw0_w5_gpu1_20260519_fullcoverage_v1 final probe decision_low_t_prompt_loss_branch_onset 010000 | pass |  |
| molmoact2_overnight_control_r64_c010_rw0_w5_gpu1_20260519_fullcoverage_v1 progress probe offline_retention 001000 | pass |  |
| molmoact2_overnight_control_r64_c010_rw0_w5_gpu1_20260519_fullcoverage_v1 final probe offline_retention 010000 | pass |  |
| molmoact2_overnight_contrast020_r128_rw3_w8_gpu2_20260519_fullcoverage_v1 checkpoint 001000 | pass | `ok` |
| molmoact2_overnight_contrast020_r128_rw3_w8_gpu2_20260519_fullcoverage_v1 checkpoint 010000 | pass | `ok` |
| molmoact2_overnight_contrast020_r128_rw3_w8_gpu2_20260519_fullcoverage_v1 progress probe decision_prompt_probe_branch_onset 001000 | pass |  |
| molmoact2_overnight_contrast020_r128_rw3_w8_gpu2_20260519_fullcoverage_v1 final probe decision_prompt_probe_branch_onset 010000 | pass |  |
| molmoact2_overnight_contrast020_r128_rw3_w8_gpu2_20260519_fullcoverage_v1 progress probe decision_low_t_prompt_loss_branch_onset 001000 | pass |  |
| molmoact2_overnight_contrast020_r128_rw3_w8_gpu2_20260519_fullcoverage_v1 final probe decision_low_t_prompt_loss_branch_onset 010000 | pass |  |
| molmoact2_overnight_contrast020_r128_rw3_w8_gpu2_20260519_fullcoverage_v1 progress probe offline_retention 001000 | pass |  |
| molmoact2_overnight_contrast020_r128_rw3_w8_gpu2_20260519_fullcoverage_v1 final probe offline_retention 010000 | pass |  |
| molmoact2_overnight_contrast005_r128_rw3_w8_gpu3_20260519_fullcoverage_v1 checkpoint 001000 | pass | `ok` |
| molmoact2_overnight_contrast005_r128_rw3_w8_gpu3_20260519_fullcoverage_v1 checkpoint 010000 | pass | `ok` |
| molmoact2_overnight_contrast005_r128_rw3_w8_gpu3_20260519_fullcoverage_v1 progress probe decision_prompt_probe_branch_onset 001000 | pass |  |
| molmoact2_overnight_contrast005_r128_rw3_w8_gpu3_20260519_fullcoverage_v1 final probe decision_prompt_probe_branch_onset 010000 | pass |  |
| molmoact2_overnight_contrast005_r128_rw3_w8_gpu3_20260519_fullcoverage_v1 progress probe decision_low_t_prompt_loss_branch_onset 001000 | pass |  |
| molmoact2_overnight_contrast005_r128_rw3_w8_gpu3_20260519_fullcoverage_v1 final probe decision_low_t_prompt_loss_branch_onset 010000 | pass |  |
| molmoact2_overnight_contrast005_r128_rw3_w8_gpu3_20260519_fullcoverage_v1 progress probe offline_retention 001000 | pass |  |
| molmoact2_overnight_contrast005_r128_rw3_w8_gpu3_20260519_fullcoverage_v1 final probe offline_retention 010000 | pass |  |
| molmoact2_overnight_rank5_r128_c010_w8_gpu4_20260519_fullcoverage_v1 checkpoint 001000 | pass | `ok` |
| molmoact2_overnight_rank5_r128_c010_w8_gpu4_20260519_fullcoverage_v1 checkpoint 010000 | pass | `ok` |
| molmoact2_overnight_rank5_r128_c010_w8_gpu4_20260519_fullcoverage_v1 progress probe decision_prompt_probe_branch_onset 001000 | pass |  |
| molmoact2_overnight_rank5_r128_c010_w8_gpu4_20260519_fullcoverage_v1 final probe decision_prompt_probe_branch_onset 010000 | pass |  |
| molmoact2_overnight_rank5_r128_c010_w8_gpu4_20260519_fullcoverage_v1 progress probe decision_low_t_prompt_loss_branch_onset 001000 | pass |  |
| molmoact2_overnight_rank5_r128_c010_w8_gpu4_20260519_fullcoverage_v1 final probe decision_low_t_prompt_loss_branch_onset 010000 | pass |  |
| molmoact2_overnight_rank5_r128_c010_w8_gpu4_20260519_fullcoverage_v1 progress probe offline_retention 001000 | pass |  |
| molmoact2_overnight_rank5_r128_c010_w8_gpu4_20260519_fullcoverage_v1 final probe offline_retention 010000 | pass |  |
| molmoact2_overnight_frombase_prod_r128_c010_rw3_w8_gpu5_20260519_fullcoverage_v1 checkpoint 001000 | pass | `ok` |
| molmoact2_overnight_frombase_prod_r128_c010_rw3_w8_gpu5_20260519_fullcoverage_v1 checkpoint 010000 | pass | `ok` |
| molmoact2_overnight_frombase_prod_r128_c010_rw3_w8_gpu5_20260519_fullcoverage_v1 progress probe decision_prompt_probe_branch_onset 001000 | pass |  |
| molmoact2_overnight_frombase_prod_r128_c010_rw3_w8_gpu5_20260519_fullcoverage_v1 final probe decision_prompt_probe_branch_onset 010000 | pass |  |
| molmoact2_overnight_frombase_prod_r128_c010_rw3_w8_gpu5_20260519_fullcoverage_v1 progress probe decision_low_t_prompt_loss_branch_onset 001000 | pass |  |
| molmoact2_overnight_frombase_prod_r128_c010_rw3_w8_gpu5_20260519_fullcoverage_v1 final probe decision_low_t_prompt_loss_branch_onset 010000 | pass |  |
| molmoact2_overnight_frombase_prod_r128_c010_rw3_w8_gpu5_20260519_fullcoverage_v1 progress probe offline_retention 001000 | pass |  |
| molmoact2_overnight_frombase_prod_r128_c010_rw3_w8_gpu5_20260519_fullcoverage_v1 final probe offline_retention 010000 | pass |  |
| molmoact2_overnight_frombase_control_r64_c010_rw0_w5_gpu6_20260519_fullcoverage_v1 checkpoint 001000 | pass | `ok` |
| molmoact2_overnight_frombase_control_r64_c010_rw0_w5_gpu6_20260519_fullcoverage_v1 checkpoint 010000 | pass | `ok` |
| molmoact2_overnight_frombase_control_r64_c010_rw0_w5_gpu6_20260519_fullcoverage_v1 progress probe decision_prompt_probe_branch_onset 001000 | pass |  |
| molmoact2_overnight_frombase_control_r64_c010_rw0_w5_gpu6_20260519_fullcoverage_v1 final probe decision_prompt_probe_branch_onset 010000 | pass |  |
| molmoact2_overnight_frombase_control_r64_c010_rw0_w5_gpu6_20260519_fullcoverage_v1 progress probe decision_low_t_prompt_loss_branch_onset 001000 | pass |  |
| molmoact2_overnight_frombase_control_r64_c010_rw0_w5_gpu6_20260519_fullcoverage_v1 final probe decision_low_t_prompt_loss_branch_onset 010000 | pass |  |
| molmoact2_overnight_frombase_control_r64_c010_rw0_w5_gpu6_20260519_fullcoverage_v1 progress probe offline_retention 001000 | pass |  |
| molmoact2_overnight_frombase_control_r64_c010_rw0_w5_gpu6_20260519_fullcoverage_v1 final probe offline_retention 010000 | pass |  |
| molmoact2_overnight_frombase_contrast020_r128_rw3_w8_b32_gpu7_20260519_fullcoverage_v1 checkpoint 001000 | pass | `ok` |
| molmoact2_overnight_frombase_contrast020_r128_rw3_w8_b32_gpu7_20260519_fullcoverage_v1 checkpoint 005000 | pass | `ok` |
| molmoact2_overnight_frombase_contrast020_r128_rw3_w8_b32_gpu7_20260519_fullcoverage_v1 progress probe decision_prompt_probe_branch_onset 001000 | pass |  |
| molmoact2_overnight_frombase_contrast020_r128_rw3_w8_b32_gpu7_20260519_fullcoverage_v1 final probe decision_prompt_probe_branch_onset 005000 | pass |  |
| molmoact2_overnight_frombase_contrast020_r128_rw3_w8_b32_gpu7_20260519_fullcoverage_v1 progress probe decision_low_t_prompt_loss_branch_onset 001000 | pass |  |
| molmoact2_overnight_frombase_contrast020_r128_rw3_w8_b32_gpu7_20260519_fullcoverage_v1 final probe decision_low_t_prompt_loss_branch_onset 005000 | pass |  |
| molmoact2_overnight_frombase_contrast020_r128_rw3_w8_b32_gpu7_20260519_fullcoverage_v1 progress probe offline_retention 001000 | pass |  |
| molmoact2_overnight_frombase_contrast020_r128_rw3_w8_b32_gpu7_20260519_fullcoverage_v1 final probe offline_retention 005000 | pass |  |
| mixed final artifact molmoact2_overnight_20260519_fullcoverage_v1_final_selection_pairs_rank_mixed.json | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_final_selection_pairs_rank_mixed.json` |
| mixed final artifact molmoact2_overnight_20260519_fullcoverage_v1_final_selection_pairs_rank_mixed.md | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_final_selection_pairs_rank_mixed.md` |
| mixed final artifact molmoact2_overnight_20260519_fullcoverage_v1_final_selection_pairs_completion_audit_mixed.json | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_final_selection_pairs_completion_audit_mixed.json` |
| mixed final artifact molmoact2_overnight_20260519_fullcoverage_v1_final_selection_pairs_completion_audit_mixed.md | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_final_selection_pairs_completion_audit_mixed.md` |
| coverage audit json exists | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_coverage_audit.json` |
| coverage audit has all planned run rows | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_coverage_audit.json` |
| coverage audit frame count matches dataset | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_coverage_audit.json` |
| coverage audit decision frame count present | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_coverage_audit.json` |
| coverage audit proves full dataset coverage for every run | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_coverage_audit.json` |
| mixed final rank has 8 rows | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_final_selection_pairs_rank_mixed.json` |
| mixed final rank rows have complete probe metrics | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_final_selection_pairs_rank_mixed.json` |
| mixed final rank rows include all selection gate metrics | pass | `low_t_acc,pan_rank,pan5_range,outside_stats_pct,horizon_gripper_mae` |
| mixed final markdown states robot-test decision | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_final_selection_pairs_rank_mixed.md` |
| mixed final audit covers 8 runs | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_final_selection_pairs_completion_audit_mixed.json` |
| mixed final audit artifacts complete | pass | `/mnt/vla_picknplace/outputs/molmoact2/molmoact2_overnight_20260519_fullcoverage_v1_final_selection_pairs_completion_audit_mixed.json` |
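
The coverage rows check that `steps * batch_size >= dataset.num_frames` (`10000 * 16` and `5000 * 32`, both 160000, against 150084 frames), and the log rows check that fixed config fragments appear in every training log. A minimal sketch of both checks follows. It assumes one sample per batch slot per optimizer step (no gradient accumulation), a no-replacement draw over all frames within each pass, and plain substring matching on log text; the function names are hypothetical, and the actual audit script is not part of this commit.

```python
# Hypothetical re-check of two audit row families; not the project's audit script.

NUM_FRAMES = 150084  # dataset.num_frames reported by every training log


def covers_full_dataset(steps: int, batch_size: int, num_frames: int = NUM_FRAMES) -> bool:
    """Coverage rule from the audit: steps * batch_size >= num_frames.

    Assumes one sample per batch slot per optimizer step and a no-replacement
    draw over all frames, so reaching num_frames samples implies at least one
    complete pass over the dataset.
    """
    return steps * batch_size >= num_frames


def min_steps_for_coverage(batch_size: int, num_frames: int = NUM_FRAMES) -> int:
    """Smallest step count that satisfies covers_full_dataset (integer ceiling)."""
    return -(-num_frames // batch_size)


# Config fragments the audit requires verbatim in every training log.
REQUIRED_LOG_SUBSTRINGS = (
    "'replacement': False",
    "'weight_column': 'sample_weight'",
    "'normalize_batch_mean': False",
    "'weight_column': 'loss_weight_normalized'",
    "'low_t_min': 0.001",
    "'low_t_max': 0.1",
    "'same_noise_across_colors': True",
    "'rank_action_index': 5",
    "'rank_action_dim': 0",
    "'enable_lora_action_expert': True",
    "'enable_lora_vlm': True",
    "'train_action_expert_only': False",
    "'normalize_gripper': True",
    "'norm_tag': 'so100_so101_molmoact2'",
    "dataset.num_frames=150084",
    "prompt_contrast:",
)


def missing_log_fields(log_text: str) -> list:
    """Return the required fragments absent from log_text (plain substring match)."""
    return [s for s in REQUIRED_LOG_SUBSTRINGS if s not in log_text]


# Main runs: 10000 steps at batch 16; supplemental run: 5000 steps at batch 32.
assert covers_full_dataset(10_000, 16)
assert covers_full_dataset(5_000, 32)
print(min_steps_for_coverage(16), min_steps_for_coverage(32))  # 9381 4691
```

Substring matching is strict about formatting but loose about values: `'low_t_max': 0.1` would also match a log line containing `'low_t_max': 0.15`. If the logged config is a printed Python dict, parsing it and comparing fields would be a more robust variant of the same check.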