---
license: apache-2.0
base_model: nvidia/GR00T-N1.6-3B
tags:
- robotics
- gr00t
- vla
- manipulation
- diffusion-policy
datasets:
- hi-space/SO-ARM101-PICK-BANANA
library_name: gr00t
pipeline_tag: robotics
---

# GR00T-N1.6-3B-Pick-Banana-Real

A fine-tuned version of [nvidia/GR00T-N1.6-3B](https://huggingface.co/nvidia/GR00T-N1.6-3B) for a banana pick-and-place task on a real SO-ARM101 robot, trained on the [hi-space/SO-ARM101-PICK-BANANA](https://huggingface.co/datasets/hi-space/SO-ARM101-PICK-BANANA) dataset.

## Model Description

GR00T-N1.6 (Gr00tN1d6) is a vision-language-action (VLA) model for robot manipulation. This checkpoint is fine-tuned for a pick-and-place task in which a real SO-ARM101 robot arm picks up a banana and places it on a plate.

- **Architecture:** Gr00tN1d6 with Eagle-Block2A-2B-v2 vision-language backbone + diffusion policy action head
- **Base model:** nvidia/GR00T-N1.6-3B
- **Task:** Pick banana and place on plate (real robot)
- **Robot:** SO-ARM101
- **Action horizon:** 50 steps
- **Inference timesteps:** 4 (diffusion)
- **Model dtype:** bfloat16

### Fine-tuning Configuration

| Parameter | Value |
|-----------|-------|
| Tuned components | Diffusion model, projector, top 4 LLM layers, VL-LN |
| Frozen components | Vision encoder, LLM backbone (all but the top 4 layers) |
| Training steps | 6000 |
| Epochs | 1 |
| Final training loss | ~0.017 |
| Action representation | Relative actions |
| Attention | Flash Attention 2 |

## Training Details

- **Dataset:** [hi-space/SO-ARM101-PICK-BANANA](https://huggingface.co/datasets/hi-space/SO-ARM101-PICK-BANANA)
- **Episodes:** 58
- **Total frames:** 35,859
- **Max steps:** 6000 (1 epoch)
- **Loss curve:** Started at ~1.13, converged to ~0.017

## Usage

```python
from gr00t.model.gr00t_n1 import GR00TPolicy

# Load the fine-tuned checkpoint from the Hugging Face Hub.
policy = GR00TPolicy.from_pretrained("hi-space/GR00T-N1.6-3B-Pick-Banana-Real")
```

Refer to the [NVIDIA Isaac GR00T](https://github.com/NVIDIA/Isaac-GR00T) repository for full inference and deployment instructions.

## Intended Use

This model is fine-tuned for a real-world robotic banana pick-and-place task using the SO-ARM101 robot arm. It is trained on real robot demonstrations and intended for deployment on the same hardware setup.

## License

This model inherits the license from the base model [nvidia/GR00T-N1.6-3B](https://huggingface.co/nvidia/GR00T-N1.6-3B). Please refer to NVIDIA's terms for usage restrictions.
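
## Example: Applying a Relative Action Chunk

The model description lists a 50-step action horizon and a relative action representation. The sketch below illustrates one way such a chunk could be turned into absolute joint targets before being sent to the arm. It is not taken from the Isaac GR00T codebase: the helper `relative_to_absolute`, the 6-dimensional joint vector (`NUM_JOINTS`), and the convention that every step is an offset from the state observed when the chunk was predicted are all assumptions made for illustration. Check the actual action convention and dimensions against the dataset and the Isaac GR00T repository before using anything like this on hardware.

```python
import numpy as np

ACTION_HORIZON = 50  # action horizon stated in this model card
NUM_JOINTS = 6       # assumption: 5 arm joints + gripper on the SO-ARM101


def relative_to_absolute(current_state, relative_chunk):
    """Convert a chunk of relative actions into absolute joint targets.

    Assumption (hypothetical convention): each row of `relative_chunk` is an
    offset from `current_state`, the joint state observed when the chunk was
    predicted, rather than a step-to-step delta.
    """
    current_state = np.asarray(current_state, dtype=np.float64)
    relative_chunk = np.asarray(relative_chunk, dtype=np.float64)

    expected_shape = (ACTION_HORIZON, current_state.shape[0])
    if relative_chunk.shape != expected_shape:
        raise ValueError(
            f"expected chunk of shape {expected_shape}, got {relative_chunk.shape}"
        )

    # Broadcast the reference state across all steps of the horizon.
    return current_state[None, :] + relative_chunk


# Placeholder data standing in for a real observation and model output.
state = np.zeros(NUM_JOINTS)
chunk = np.full((ACTION_HORIZON, NUM_JOINTS), 0.01)
targets = relative_to_absolute(state, chunk)
print(targets.shape)  # (50, 6)
```

If the relative actions are instead step-to-step deltas, the last line of the helper would need a cumulative sum over the horizon axis (`np.cumsum(relative_chunk, axis=0)`) before adding the reference state.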