GR00T-N1.6-3B-Pick-Banana-Real
A fine-tuned version of nvidia/GR00T-N1.6-3B for a banana pick-and-place task on a real SO-ARM101 robot, trained on the hi-space/SO-ARM101-PICK-BANANA dataset.
Model Description
GR00T-N1.6 (Gr00tN1d6) is a vision-language-action (VLA) model for robot manipulation. This checkpoint is fine-tuned for a pick-and-place task where the robot picks up a banana and places it on a plate using a real SO-ARM101 robot arm.
- Architecture: Gr00tN1d6 with Eagle-Block2A-2B-v2 vision-language backbone + diffusion policy action head
- Base model: nvidia/GR00T-N1.6-3B
- Task: Pick banana and place on plate (real robot)
- Robot: SO-ARM101
- Action horizon: 50 steps
- Inference timesteps: 4 denoising steps per predicted action chunk
- Model dtype: bfloat16
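The 50-step action horizon means each inference call returns a chunk of 50 actions rather than a single command. A minimal sketch of how such a chunk might be consumed on the robot is shown here. The function names `predict_chunk`, `read_observation`, and `send_action` are hypothetical placeholders, not part of the GR00T API, and the option to execute only part of a chunk before re-planning is an assumption about deployment, not something this card specifies.

```python
from typing import Callable, List, Sequence

ACTION_HORIZON = 50  # from the model card: actions returned per inference call

Observation = Sequence[float]
Action = List[float]


def run_chunked_control(
    predict_chunk: Callable[[Observation], List[Action]],  # hypothetical policy wrapper
    read_observation: Callable[[], Observation],           # hypothetical robot read
    send_action: Callable[[Action], None],                 # hypothetical robot write
    total_steps: int,
    execute_per_chunk: int = ACTION_HORIZON,
) -> int:
    """Execute `total_steps` actions, re-querying the policy after every
    `execute_per_chunk` actions. Returns the number of policy calls made."""
    if not 1 <= execute_per_chunk <= ACTION_HORIZON:
        raise ValueError("execute_per_chunk must be between 1 and ACTION_HORIZON")

    policy_calls = 0
    steps_done = 0
    while steps_done < total_steps:
        chunk = predict_chunk(read_observation())
        if not chunk:
            raise RuntimeError("policy returned an empty action chunk")
        policy_calls += 1
        # Execute only the first part of the chunk, then re-plan.
        for action in chunk[:execute_per_chunk]:
            if steps_done >= total_steps:
                break
            send_action(action)
            steps_done += 1
    return policy_calls
```

Executing fewer actions per chunk trades more inference calls for faster reaction to changes in the scene; the right setting depends on the control rate and inference latency of the actual setup.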
Fine-tuning Configuration
| Parameter | Value |
|---|---|
| Tuned components | Diffusion model, projector, top 4 LLM layers, VL-LN |
| Frozen components | Vision encoder, LLM backbone (all layers except the top 4) |
| Training steps | 6000 |
| Epochs | 1 |
| Final training loss | ~0.017 |
| Action representation | Relative actions |
| Attention | Flash Attention 2 |
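Because the checkpoint predicts relative actions, the outputs generally need to be turned into absolute joint targets before they are sent to the arm. The sketch below assumes each action in a chunk is an offset from the joint state observed when the chunk was predicted (not a cumulative step-to-step delta). That convention, and the helper name `relative_to_absolute`, are assumptions for illustration; check the data configuration used for fine-tuning before relying on it.

```python
from typing import List, Sequence


def relative_to_absolute(
    reference_state: Sequence[float],
    relative_chunk: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Convert a chunk of relative actions into absolute joint targets.

    Assumption: every action is an offset from `reference_state`, the joint
    state at the time the chunk was predicted.
    """
    targets = []
    for offsets in relative_chunk:
        if len(offsets) != len(reference_state):
            raise ValueError("action and state dimensions differ")
        targets.append([s + d for s, d in zip(reference_state, offsets)])
    return targets


# Example with a two-joint toy state.
targets = relative_to_absolute([10.0, 20.0], [[1.0, -1.0], [2.0, -2.0]])
```

If the offsets were instead cumulative deltas between consecutive steps, the conversion would need a running sum rather than a fixed reference.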
Training Details
- Dataset: hi-space/SO-ARM101-PICK-BANANA
- Episodes: 58
- Total frames: 35,859
- Max steps: 6000 (1 epoch)
- Loss curve: Started at ~1.13, converged to ~0.017
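A few derived figures help put these numbers in context. The short calculation below uses only the values listed above. The samples-per-step figure assumes that one epoch means a single pass over every frame; the card does not state the batch size, so treat that value as an implied estimate, not a reported setting.

```python
# Values taken from the training details above.
episodes = 58
total_frames = 35_859
training_steps = 6_000
initial_loss = 1.13
final_loss = 0.017

# Average demonstration length in frames.
frames_per_episode = total_frames / episodes

# Assumption: one epoch is a single pass over all frames, so this is the
# implied number of frames consumed per optimizer step (not a reported batch size).
implied_samples_per_step = total_frames / training_steps

# Factor by which the training loss dropped from start to finish.
loss_reduction_factor = initial_loss / final_loss

print(f"frames per episode: {frames_per_episode:.1f}")
print(f"implied samples per step: {implied_samples_per_step:.2f}")
print(f"loss reduction factor: {loss_reduction_factor:.1f}")
```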
Usage
```python
from gr00t.model.gr00t_n1 import GR00TPolicy

# Load the fine-tuned checkpoint from the Hugging Face Hub
policy = GR00TPolicy.from_pretrained("hi-space/GR00T-N1.6-3B-Pick-Banana-Real")
```
Refer to the NVIDIA Isaac GR00T repository for full inference and deployment instructions.
Intended Use
This model is fine-tuned for a real-world robotic banana pick-and-place task using the SO-ARM101 robot arm. It is trained on real robot demonstrations and intended for deployment on the same hardware setup.
License
This model inherits the license from the base model nvidia/GR00T-N1.6-3B. Please refer to NVIDIA's terms for usage restrictions.