GR00T-N1.6-3B-Pick-Banana-Real

A fine-tuned version of nvidia/GR00T-N1.6-3B for a banana pick-and-place task on a real SO-ARM101 robot, trained on the hi-space/SO-ARM101-PICK-BANANA dataset.

Model Description

GR00T-N1.6 (Gr00tN1d6) is a vision-language-action (VLA) model for robot manipulation. This checkpoint is fine-tuned for a pick-and-place task where the robot picks up a banana and places it on a plate using a real SO-ARM101 robot arm.

Architecture: Gr00tN1d6 with Eagle-Block2A-2B-v2 vision-language backbone + diffusion policy action head
Base model: nvidia/GR00T-N1.6-3B
Task: Pick banana and place on plate (real robot)
Robot: SO-ARM101
Action horizon: 50 steps
Inference timesteps: 4 (diffusion)
Model dtype: bfloat16
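
The interaction between the action horizon and the number of inference timesteps can be illustrated with a toy sampler: start from Gaussian noise shaped like a 50-step action chunk and refine it in 4 iterations. This is only a sketch under stated assumptions. `toy_velocity` is a hypothetical stand-in for the learned action head, `ACTION_DIM = 6` (five arm joints plus gripper) is an assumption, and the Euler-style update is a generic scheme that may differ from the sampler GR00T actually uses.

```python
import random

ACTION_HORIZON = 50       # from the model card
NUM_INFERENCE_STEPS = 4   # from the model card
ACTION_DIM = 6            # assumption: 5 arm joints + gripper


def toy_velocity(chunk, t, target):
    """Hypothetical stand-in for the learned network.

    Returns a velocity pointing from the current sample toward a fixed
    target chunk, scaled so that integrating up to t = 1 lands on the target.
    """
    return [
        [(tg - x) / (1.0 - t) for x, tg in zip(row, target_row)]
        for row, target_row in zip(chunk, target)
    ]


def sample_action_chunk(target, seed=0):
    """Refine Gaussian noise into an action chunk with a few Euler steps."""
    rng = random.Random(seed)
    # Start from pure noise with shape (ACTION_HORIZON, ACTION_DIM).
    chunk = [
        [rng.gauss(0.0, 1.0) for _ in range(ACTION_DIM)]
        for _ in range(ACTION_HORIZON)
    ]
    dt = 1.0 / NUM_INFERENCE_STEPS
    for i in range(NUM_INFERENCE_STEPS):
        t = i * dt  # t takes the values 0.0, 0.25, 0.5, 0.75
        velocity = toy_velocity(chunk, t, target)
        chunk = [
            [x + dt * v for x, v in zip(row, v_row)]
            for row, v_row in zip(chunk, velocity)
        ]
    return chunk


# Illustrative target: the same joint vector repeated for every step.
target = [[0.1 * j for j in range(ACTION_DIM)] for _ in range(ACTION_HORIZON)]
chunk = sample_action_chunk(target)
print(len(chunk), len(chunk[0]))  # 50 6
```

Each forward pass of the real model therefore yields a whole chunk of 50 future actions rather than a single command, and the small number of refinement steps keeps the cost per chunk low.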

Fine-tuning Configuration

Parameter	Value
Tuned components	Diffusion model, projector, top 4 LLM layers, VL-LN
Frozen components	Vision encoder, LLM backbone (except the top 4 layers)
Training steps	6000
Epochs	1
Final training loss	~0.017
Action representation	Relative actions
Attention	Flash Attention 2
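
The "Relative actions" entry can be illustrated with a minimal sketch. It assumes that a relative action is the offset of each target joint position from the robot state at the start of the chunk. The exact convention used by the GR00T data pipeline is not stated here and may differ (for example, some setups keep the gripper absolute). The function names are hypothetical.

```python
def to_relative(current_state, absolute_targets):
    """Express each absolute joint target as an offset from the current state."""
    return [
        [a - s for a, s in zip(target, current_state)]
        for target in absolute_targets
    ]


def to_absolute(current_state, relative_actions):
    """Invert to_relative: add the current state back to each offset."""
    return [
        [r + s for r, s in zip(action, current_state)]
        for action in relative_actions
    ]


# Two-joint toy example with values chosen to be exact in floating point.
state = [0.5, -0.25]
targets = [[0.75, 0.0], [1.0, 0.25]]

relative = to_relative(state, targets)
print(relative)                      # [[0.25, 0.25], [0.5, 0.5]]
print(to_absolute(state, relative))  # [[0.75, 0.0], [1.0, 0.25]]
```

Under this assumption, the policy output has to be converted back to absolute joint targets using the robot state observed at inference time before it is sent to the arm.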

Training Details

Dataset: hi-space/SO-ARM101-PICK-BANANA
Episodes: 58
Total frames: 35,859
Max steps: 6000 (1 epoch)
Loss curve: Started at ~1.13, converged to ~0.017
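
As a quick consistency check on these figures, the snippet below derives the average episode length and the number of frames per optimizer step that a single pass over the data would imply. The batch size is not stated in this card, so the second number is an inference from the listed values, not a reported setting.

```python
episodes = 58
total_frames = 35_859
max_steps = 6_000

# Average demonstration length in frames.
frames_per_episode = total_frames / episodes

# If every frame were sampled exactly once across all steps (1 epoch),
# each step would consume roughly this many frames.
frames_per_step = total_frames / max_steps

print(round(frames_per_episode, 1))  # 618.3
print(round(frames_per_step, 2))     # 5.98
```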

Usage

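# Requires the Isaac GR00T package; the import path and class name
# depend on the installed release.
from gr00t.model.gr00t_n1 import GR00TPolicy

# Load the fine-tuned checkpoint from the Hugging Face Hub.
policy = GR00TPolicy.from_pretrained("hi-space/GR00T-N1.6-3B-Pick-Banana-Real")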

Refer to the NVIDIA Isaac GR00T repository for full inference and deployment instructions.

Intended Use

This model is fine-tuned for a real-world robotic banana pick-and-place task using the SO-ARM101 robot arm. It is trained on real robot demonstrations and intended for deployment on the same hardware setup.

License

This model inherits the license from the base model nvidia/GR00T-N1.6-3B. Please refer to NVIDIA's terms for usage restrictions.
