Robotics
ONNX
English
Chinese
real-world
dual-arm
whole body control
manipulation
lllliuxiao23 committed
Commit 005eeea · verified · 1 Parent(s): fc7d852

update g0tiny handover

G0Tiny_handover/config.yaml ADDED
@@ -0,0 +1,315 @@
+ tags: null
+ seed: 7
+ resume_ckpt: null
+ output_dir: ${hydra:runtime.output_dir}
+ dataset_stats_cache_dir: ${oc.env:GALAXEA_FM_DATASET_STATS_CACHE_DIR}
+ min_batch_size: 1
+ max_batch_size: 256
+ num_test_steps: 3
+ checkpointing_steps: 5000
+ logger:
+ type: swanlab
+ log_steps: 10
+ task: ${hydra:runtime.choices.task}
+ project: ${split:${logger.task},0}
+ experiment_name: ${split:${logger.task},-1}
+ mode: cloud
+ workspace: Galaxea-AI
+ dir: null
+ batch_size_val: 16
+ eval_episodes_num: 1
+ ckpt_path: null
+ env: R1ProBlocksStackEasy
+ target_controller_type: bimanual_relaxed_ik
+ edp:
+ card: null
+ training_time: ${now:%Y-%m-%d}_${now:%H-%M-%S}
+ git_branch: null
+ git_commit: null
+ root: null
+ repo_ids: null
+ save_dir: ${output_dir}
+ tags: ${tags}
+ max_steps: ${model.max_steps}
+ batch_size: ${model.batch_size}
+ EVALUATION:
+ task_suite_names:
+ - libero_10
+ - libero_spatial
+ - libero_object
+ - libero_goal
+ num_steps_wait: 10
+ replan_steps: 5
+ num_trials: 50
+ output_dir: ${output_dir}
+ run_id_note: null
+ env_num: 50
+ data:
+ dataset:
+ _target_: galaxea_fm.data.galaxea_lerobot_dataset.GalaxeaLerobotDataset
+ dataset_dirs:
+ - /efm-nas/efm-nas/group-pxj/hairui.ren/dataset/gift/data_1225/Beijing_Demo_Handover_Gift_And_Box_Delay_Hands_v2.0_251224_6011_B1_007/Beijing_Demo_Handover_Gift_And_Box_Delay_Hands_v2.0_251224_6011_B1_007
+ - /efm-nas/efm-nas/group-pxj/hairui.ren/dataset/gift/data_1225/Beijing_Demo_Handover_Gift_And_Box_Normal_Grab_Gifts_251224_v2.0_6011_B1_007/Beijing_Demo_Handover_Gift_And_Box_Normal_Grab_Gifts_251224_v2.0_6011_B1_007
+ - /efm-nas/efm-nas/group-pxj/hairui.ren/dataset/gift/data1225/Beijing_Demo_Handover_Gift_And_Box_Delay_Hands_v2.0_251225_6011_B1_007_v20251226_101622
+ - /efm-nas/efm-nas/group-pxj/hairui.ren/dataset/gift/data1225/Beijing_Demo_Handover_Gift_And_Box_Moving_Hands_251225_v2.0_6011_B1_007_v20251226_101627
+ - /efm-nas/efm-nas/group-pxj/hairui.ren/dataset/gift/data_1226/Beijing_Demo_Handover_Gift_And_Box_Moving_Gifts_251226_v2.0_6011_B1_007/Beijing_Demo_Handover_Gift_And_Box_Moving_Gifts_251226_v2.0_6011_B1_007
+ - /efm-nas/efm-nas/group-pxj/hairui.ren/dataset/gift/data_1227/Beijing_Demo_Handover_Gift_And_Box_Fallen_Gifts_251227_v2.0_6011_B1_007/Beijing_Demo_Handover_Gift_And_Box_Fallen_Gifts_251227_v2.0_6011_B1_007
+ - /efm-nas/efm-nas/group-pxj/hairui.ren/dataset/gift/data_1229/Beijing_Demo_Handover_Gift_And_Box_Normal_Grab_Gifts_251229_v2.0_6011_B1_007/Beijing_Demo_Handover_Gift_And_Box_Normal_Grab_Gifts_251229_v2.0_6011_B1_007
+ - /efm-nas/efm-nas/group-pxj/hairui.ren/dataset/gift/data_1230/Beijing_Demo_Handover_Gift_And_Box_Normal_Grab_Gifts_251230_v2.0_6011_B1_007/Beijing_Demo_Handover_Gift_And_Box_Normal_Grab_Gifts_251230_v2.0_6011_B1_007
+ - /efm-nas/efm-nas/group-pxj/hairui.ren/dataset/gift/data_1230/Beijing_Demo_Handover_Gift_And_Box_Normal_Grab_Gifts_251230_v2.0_6011_B1_007-2/Beijing_Demo_Handover_Gift_And_Box_Normal_Grab_Gifts_251230_v2.0_6011_B1_007-2
+ shape_meta:
+ action:
+ - key: left_ee_pose
+ raw_shape: 7
+ shape: 9
+ - key: left_gripper
+ raw_shape: 1
+ shape: 1
+ - key: right_ee_pose
+ raw_shape: 7
+ shape: 9
+ - key: right_gripper
+ raw_shape: 1
+ shape: 1
+ - key: torso
+ raw_shape: 4
+ shape: 4
+ state:
+ - key: left_ee_pose
+ raw_shape: 7
+ shape: 9
+ - key: left_gripper
+ raw_shape: 1
+ shape: 1
+ - key: right_ee_pose
+ raw_shape: 7
+ shape: 9
+ - key: right_gripper
+ raw_shape: 1
+ shape: 1
+ - key: torso
+ raw_shape: 4
+ shape: 4
+ images:
+ - key: head_rgb
+ raw_shape:
+ - 3
+ - 360
+ - 640
+ shape:
+ - 3
+ - ${model.model_arch.input_image_size.0}
+ - ${model.model_arch.input_image_size.1}
+ - key: left_wrist_rgb
+ raw_shape:
+ - 3
+ - 480
+ - 640
+ shape:
+ - 3
+ - ${model.model_arch.input_image_size.0}
+ - ${model.model_arch.input_image_size.1}
+ - key: right_wrist_rgb
+ raw_shape:
+ - 3
+ - 480
+ - 640
+ shape:
+ - 3
+ - ${model.model_arch.input_image_size.0}
+ - ${model.model_arch.input_image_size.1}
+ action_size: 32
+ past_action_size: 0
+ obs_size: 1
+ ee_start_moving_thresh: 0.002
+ val_set_proportion: 0.05
+ processor:
+ _target_: galaxea_fm.processors.base_processor.BaseProcessor
+ shape_meta: ${data.dataset.shape_meta}
+ num_obs_steps: ${data.dataset.obs_size}
+ action_state_transforms:
+ - _target_: galaxea_fm.transforms.relative_action.RelativePoseTransform
+ keys:
+ - left_ee_pose
+ - right_ee_pose
+ - _target_: galaxea_fm.transforms.relative_action.RelativeJointTransform
+ keys:
+ - torso
+ - _target_: galaxea_fm.transforms.rotation.PoseRotationTransform
+ rotation_type: rotation_6d
+ category_keys:
+ action:
+ - left_ee_pose
+ - right_ee_pose
+ state:
+ - left_ee_pose
+ - right_ee_pose
+ use_stepwise_action_norm: true
+ norm_default_mode: q01/q99
+ norm_exception_mode:
+ action:
+ left_gripper: 0/100
+ right_gripper: 0/100
+ action_state_merger:
+ _target_: galaxea_fm.transforms.action_state_merger.ConcatLeftAlign
+ train_transforms:
+ head_rgb:
+ - _target_: torchvision.transforms.Resize
+ size: ${model.model_arch.input_image_size}
+ - _target_: galaxea_fm.transforms.image.ToTensor
+ - _target_: torchvision.transforms.Normalize
+ mean:
+ - 0.5
+ - 0.5
+ - 0.5
+ std:
+ - 0.5
+ - 0.5
+ - 0.5
+ left_wrist_rgb: ${data.processor.train_transforms.head_rgb}
+ right_wrist_rgb: ${data.processor.train_transforms.head_rgb}
+ val_transforms:
+ head_rgb:
+ - _target_: torchvision.transforms.Resize
+ size: ${model.model_arch.input_image_size}
+ - _target_: galaxea_fm.transforms.image.ToTensor
+ - _target_: torchvision.transforms.Normalize
+ mean:
+ - 0.5
+ - 0.5
+ - 0.5
+ std:
+ - 0.5
+ - 0.5
+ - 0.5
+ left_wrist_rgb: ${data.processor.val_transforms.head_rgb}
+ right_wrist_rgb: ${data.processor.val_transforms.head_rgb}
+ drop_high_level_prob: 1.0
+ use_zh_instruction: false
+ num_output_images: 3
+ action_output_dim: 24
+ proprio_output_dim: 24
+ model:
+ pretrained_ckpt: /efm-nas/efm-nas/group-yaq/ziyang.jiao/model_res/real/r1pro_g0tiny_pretrain/2026-01-20_10-12-35/checkpoints/step_390000.pt
+ use_pretrained_norm_stats: true
+ model_weights_to_bf16: false
+ enable_bf16_training: true
+ use_torch_compile: false
+ find_unused_parameters: true
+ batch_size: 20
+ num_workers: 12
+ pin_memory: true
+ persistent_workers: true
+ max_epochs: null
+ max_steps: 50000
+ grad_accumulation_steps: 1
+ use_8bit_optimizer: false
+ learning_rate: 6.0e-05
+ weight_decay: 0.001
+ betas:
+ - 0.9
+ - 0.95
+ lr_scheduler_type: cosine
+ warmup_steps: 480
+ max_grad_norm: 1.0
+ use_ema: false
+ ema:
+ update_after_step: 0
+ power: 0.67
+ use_sync_bn: false
+ model_arch:
+ _target_: galaxea_fm.models.galaxea_zero.galaxea_zero_policy.GalaxeaZeroPolicy
+ model_name: galaxea_fm.models.galaxea_zero.galaxea_zero_policy.GalaxeaZero
+ tokenizer:
+ _target_: galaxea_fm.models.vla_tiny.smolvlm2.tokenizer.SmolVLM2Tokenizer
+ tokenizer_params:
+ pretrained_model_name_or_path: /efm-nas/efm-nas/efm-shared/pretrained_model/smolvlm2-500m-video-instruct
+ local_files_only: true
+ pad_token_id: ${model.model_arch.pad_token_id}
+ image_token_index: ${model.model_arch.image_token_index}
+ max_text_tokens: ${model.model_arch.max_text_tokens}
+ num_tokens_per_image: ${model.model_arch.vision.num_image_tokens}
+ num_input_images: ${model.model_arch.num_input_images}
+ pretrained_model_path: /efm-nas/efm-nas/efm-shared/pretrained_model/smolvlm2-500m-video-instruct
+ vla_training_strategy: vla-full-train
+ backbone_lr_multiplier: 0.1
+ image_token_index: 49190
+ pad_token_id: 2
+ vocab_size: 49280
+ fill_padded_with_token: true
+ embed_token_key_prefix: model.text_model.embed_tokens
+ cond_steps: ${data.dataset.obs_size}
+ horizon_steps: ${data.dataset.action_size}
+ max_text_tokens: 55
+ max_image_text_tokens: ${eval:'${model.model_arch.num_input_images} * (${model.model_arch.vision.num_image_tokens}
+ + 3) + ${model.model_arch.max_text_tokens}'}
+ num_input_images: ${eval:'${model.model_arch.cond_steps} * ${data.processor.num_output_images}'}
+ input_image_size:
+ - ${model.model_arch.vision.image_size}
+ - ${model.model_arch.vision.image_size}
+ final_action_clip_value: null
+ action_dim: ${data.processor.action_output_dim}
+ proprio_dim: ${data.processor.proprio_output_dim}
+ action_decoder_layers: 2
+ action_expert_adaptive_mode: null
+ flow_sampling: beta
+ num_inference_steps: 10
+ vision:
+ name: galaxea_fm.models.vla_tiny.smolvlm2.smolvlm2_vision.SmolVLMVisionTransformer
+ key_prefix: model.vision_model
+ hidden_size: 768
+ intermediate_size: 3072
+ num_hidden_layers: 12
+ num_attention_heads: 12
+ num_channels: 3
+ image_size: 512
+ patch_size: 16
+ layer_norm_eps: 1.0e-06
+ attention_dropout: 0.0
+ num_image_tokens: 64
+ vision_projector:
+ name: galaxea_fm.models.vla_tiny.smolvlm2.modules.SmolVLMConnector
+ key_prefix: model.connector
+ vision_config:
+ scale_factor: 4
+ hidden_size: 768
+ projection_dim: ${model.model_arch.joint.mixture.vlm.hidden_size}
+ num_input_images: ${model.model_arch.num_input_images}
+ text_config:
+ hidden_size: ${model.model_arch.joint.mixture.vlm.hidden_size}
+ joint:
+ name: galaxea_fm.models.galaxea_zero.joint_model.JointModel
+ key_prefix: model.text_model
+ action_expert_adaptive_mode: null
+ module_names:
+ mlp: galaxea_fm.models.vla_tiny.smolvlm2.modules.SmolVLMTextMLP
+ norm: galaxea_fm.models.vla_tiny.smolvlm2.modules.SmolVLMTextRMSNorm
+ rope: galaxea_fm.models.vla_tiny.smolvlm2.modules.SmolVLMTextRotaryEmbedding
+ mixture:
+ vlm:
+ hidden_size: 960
+ intermediate_size: 2560
+ use_final_norm: true
+ cache: true
+ proprio:
+ hidden_size: 720
+ intermediate_size: 2048
+ use_final_norm: true
+ cache: true
+ adaptive_mode: null
+ action:
+ hidden_size: 720
+ intermediate_size: 2048
+ use_final_norm: true
+ cache: false
+ adaptive_mode: null
+ time_hidden_size: 256
+ num_hidden_layers: 16
+ num_attention_heads: 15
+ num_key_value_heads: 5
+ head_dim: 64
+ max_position_embeddings: 8192
+ rms_norm_eps: 1.0e-05
+ rope_theta: 100000.0
+ attention_bias: false
+ attention_dropout: 0.0
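Several values in config.yaml are derived from others through `${...}` interpolation, so a quick arithmetic check can catch inconsistencies before training. The sketch below hard-codes numbers copied from the config. The variable names are illustrative, and the assumption that the connector merges `scale_factor` x `scale_factor` patches into one image token is an interpretation that the config itself does not state.

```python
# Numbers copied from G0Tiny_handover/config.yaml; names are illustrative only.

# Per-key processed action shapes from data.dataset.shape_meta.action.
action_shapes = {
    "left_ee_pose": 9,
    "left_gripper": 1,
    "right_ee_pose": 9,
    "right_gripper": 1,
    "torso": 4,
}
# Should agree with data.processor.action_output_dim.
action_dim = sum(action_shapes.values())

# Vision settings from model.model_arch.vision and vision_projector.
image_size, patch_size, scale_factor = 512, 16, 4
patches_per_image = (image_size // patch_size) ** 2
# Assumption: each scale_factor x scale_factor group of patches becomes one token.
num_image_tokens = patches_per_image // scale_factor**2

# Token budget, following the ${eval:...} expressions in model.model_arch.
cond_steps, num_output_images, max_text_tokens = 1, 3, 55
num_input_images = cond_steps * num_output_images
max_image_text_tokens = num_input_images * (num_image_tokens + 3) + max_text_tokens

# Attention width implied by the joint model's head settings.
num_attention_heads, head_dim = 15, 64
attention_width = num_attention_heads * head_dim

print("action_dim:", action_dim)
print("num_image_tokens:", num_image_tokens)
print("max_image_text_tokens:", max_image_text_tokens)
print("attention_width:", attention_width)
```

Under these assumptions the results match the literal values in the config: `action_output_dim: 24`, `num_image_tokens: 64`, and a VLM `hidden_size` of 960.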
G0Tiny_handover/dataset_stats.json ADDED
The diff for this file is too large to render. See raw diff
 
G0Tiny_handover/model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b59d254062860d20dc079f0d0a4262dfcf80d940cd50dbdaa00f8c5fe2379e0e
+ size 1621350495
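The three lines added for model.pt are a Git LFS pointer, not the weights themselves. A downloaded copy of the file can be checked against the pointer's size and SHA-256 digest. This is a minimal sketch: `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any Git LFS tooling, and the local path passed to `matches_pointer` is up to the caller.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:b59d254062860d20dc079f0d0a4262dfcf80d940cd50dbdaa00f8c5fe2379e0e
size 1621350495
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer file into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the pointer's size and digest."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        # Read in chunks so a file of about 1.6 GB is never held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Checking the size first is a cheap way to reject a truncated download before hashing the whole file.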