---
license: apache-2.0
base_model: nvidia/GR00T-N1.6-3B
pipeline_tag: robotics
tags: [robotics, vla, gr00t, gr00t-n1.6, manipulation, maniguard, franka]
---

# GR00T-N1.6 - Lid-Transport-Food (joint, 3-cam)

NVIDIA Isaac **GR00T-N1.6-3B** fine-tuned on the ManiGuard **lid-transport-food** base task (sim Franka Panda). Part of the ManiGuard VLA benchmark - GR00T vs pi0.5 on the same task families with identical data, cameras, and controller.

## Model

- **Base:** [nvidia/GR00T-N1.6-3B](https://huggingface.co/nvidia/GR00T-N1.6-3B) - Cosmos-Reason VLM + flow-matching DiT action head
- **Embodiment:** NEW_EMBODIMENT - Franka Panda, **8-D joint** state/action (7 arm joints + 1 gripper)
- **Cameras (3):** image_left, image_right, wrist (256x256)
- **Action:** arm = state-relative chunks, gripper = absolute; 16-step horizon; NON_EEF (joint space)
- **Tuning:** GR00T-N1.6 default - VLM (LLM + visual) **frozen**, train projector + diffusion action head (**no LoRA**)

## Training

- 1x H100, bf16, global batch 64, 1924 steps (~10 epochs over 12,312 frames), cosine LR (peak 1e-4)
- Data: [IDEAS-Lab-Northwestern/sim-lid-transport-food-30-joint-3cam](https://huggingface.co/datasets/IDEAS-Lab-Northwestern/sim-lid-transport-food-30-joint-3cam); videos decoded as H.264 for GR00T's torchcodec loader

## Usage

Load with `Gr00tPolicy` from [Isaac-GR00T (n1.6-release)](https://github.com/NVIDIA/Isaac-GR00T/tree/n1.6-release), `--embodiment-tag NEW_EMBODIMENT`. The included `experiment_cfg/` carries the modality config + normalization stats.

> WARNING - Convention (must match at eval): joint-space JointController (absolute joint targets, NON_EEF) + 3 cameras (image_left, image_right, wrist). A mismatched controller or camera set silently feeds an out-of-distribution input.
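
Because a convention mismatch fails silently, it may help to run a small pre-flight check before evaluation. The sketch below is not part of Isaac-GR00T: `check_convention`, its arguments, and the `"joint"` controller label are hypothetical names chosen for illustration, and the image shapes are assumed to be laid out as (height, width, channels). Only the expected values (3 camera names, 256x256 images, 8-D state) come from this card.

```python
# Hypothetical pre-flight check; not an Isaac-GR00T API.
# Expected values are taken from this model card.
EXPECTED_CAMERAS = ("image_left", "image_right", "wrist")
IMAGE_HW = (256, 256)   # assumed (height, width) layout
STATE_DIM = 8           # 7 arm joints + 1 gripper
CONTROLLER = "joint"    # hypothetical label for the joint-space controller


def check_convention(camera_shapes, state_dim, controller):
    """Return a list of mismatch messages; an empty list means the setup matches.

    camera_shapes: dict mapping camera name -> image shape tuple (H, W, C).
    state_dim:     length of the proprioceptive state vector.
    controller:    label of the controller used by the eval environment.
    """
    problems = []

    # Camera set must match exactly: no missing and no extra views.
    missing = [c for c in EXPECTED_CAMERAS if c not in camera_shapes]
    extra = [c for c in camera_shapes if c not in EXPECTED_CAMERAS]
    if missing:
        problems.append(f"missing cameras: {missing}")
    if extra:
        problems.append(f"unexpected cameras: {extra}")

    # Each expected camera that is present must have the training resolution.
    for name in EXPECTED_CAMERAS:
        shape = camera_shapes.get(name)
        if shape is not None and tuple(shape[:2]) != IMAGE_HW:
            problems.append(f"{name} has shape {tuple(shape)}, expected {IMAGE_HW}")

    if state_dim != STATE_DIM:
        problems.append(f"state has {state_dim} dims, expected {STATE_DIM}")
    if controller != CONTROLLER:
        problems.append(f"controller is {controller!r}, expected {CONTROLLER!r}")

    return problems


if __name__ == "__main__":
    shapes = {name: (256, 256, 3) for name in EXPECTED_CAMERAS}
    for message in check_convention(shapes, state_dim=8, controller="joint"):
        print("MISMATCH:", message)
```

Returning a list of messages rather than raising on the first failure is a design choice here: it reports every mismatch in one pass, which is convenient when both the camera set and the controller are misconfigured.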