---
license: apache-2.0
base_model: nvidia/GR00T-N1.6-3B
tags:
  - robotics
  - gr00t
  - vla
  - manipulation
  - diffusion-policy
datasets:
  - hi-space/SO-ARM101-PICK-BANANA
library_name: gr00t
pipeline_tag: robotics
---

# GR00T-N1.6-3B-Pick-Banana-Real

A fine-tuned version of [nvidia/GR00T-N1.6-3B](https://huggingface.co/nvidia/GR00T-N1.6-3B) for a banana pick-and-place task on a real SO-ARM101 robot, trained on the [hi-space/SO-ARM101-PICK-BANANA](https://huggingface.co/datasets/hi-space/SO-ARM101-PICK-BANANA) dataset.

## Model Description

GR00T-N1.6 (Gr00tN1d6) is a vision-language-action (VLA) model for robot manipulation. This checkpoint is fine-tuned for a real-world pick-and-place task in which an SO-ARM101 robot arm picks up a banana and places it on a plate.

- **Architecture:** Gr00tN1d6 with Eagle-Block2A-2B-v2 vision-language backbone + diffusion policy action head
- **Base model:** nvidia/GR00T-N1.6-3B
- **Task:** Pick banana and place on plate (real robot)
- **Robot:** SO-ARM101
- **Action horizon:** 50 steps
- **Inference timesteps:** 4 (diffusion)
- **Model dtype:** bfloat16
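
The action head turns noise into a 50-step action chunk in 4 inference steps. The GR00T N1 paper describes this head as flow matching integrated with forward Euler steps, and the toy sketch below assumes that scheme. `toy_velocity` is a hypothetical stand-in for the learned network (it simply points straight at a fixed target), and `ACTION_DIM = 6` assumes five arm joints plus a gripper; neither comes from this checkpoint.

```python
import numpy as np

ACTION_HORIZON = 50  # from this card
NUM_STEPS = 4        # inference timesteps, from this card
ACTION_DIM = 6       # assumption: 5 arm joints + 1 gripper on SO-ARM101


def toy_velocity(actions, t, target):
    """Hypothetical stand-in for the learned action head.

    Returns the ideal flow-matching velocity for a straight-line path from
    the current sample to `target` at time t (t=0 is noise, t=1 is clean).
    """
    return (target - actions) / max(1.0 - t, 1e-6)


def sample_action_chunk(velocity_fn, rng, num_steps=NUM_STEPS):
    """Integrate from Gaussian noise to an action chunk with forward Euler."""
    actions = rng.standard_normal((ACTION_HORIZON, ACTION_DIM))
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        actions = actions + dt * velocity_fn(actions, t)
    return actions


rng = np.random.default_rng(0)
target = np.zeros((ACTION_HORIZON, ACTION_DIM))
chunk = sample_action_chunk(lambda a, t: toy_velocity(a, t, target), rng)
print(chunk.shape)  # (50, 6)
```

With this idealized velocity field, the four Euler steps land exactly on the target; the trained network only approximates such a field.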

### Fine-tuning Configuration

| Parameter | Value |
|-----------|-------|
| Tuned components | Diffusion model, projector, top 4 LLM layers, VL-LN |
| Frozen components | Vision encoder, remaining LLM layers (all except the top 4) |
| Training steps | 6000 |
| Epochs | 1 |
| Final training loss | ~0.017 |
| Action representation | Relative actions |
| Attention | Flash Attention 2 |
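
The table lists relative actions as the action representation. One common reading, assumed here rather than taken from the GR00T code, is that each predicted chunk is stored as offsets from the joint state observed when the chunk is predicted, and converted back to absolute targets before being sent to the robot. Which joints use this representation (for example, whether the gripper stays absolute) is not stated on this card; the function names below are hypothetical.

```python
import numpy as np


def to_relative(abs_actions, current_state):
    """Express an absolute joint-target chunk as offsets from the current state.

    abs_actions: (horizon, dof) absolute joint targets
    current_state: (dof,) joint positions when the chunk is predicted
    """
    return abs_actions - current_state[None, :]


def to_absolute(rel_actions, current_state):
    """Inverse of to_relative: recover absolute joint targets for the robot."""
    return rel_actions + current_state[None, :]


state = np.array([0.10, -0.20, 0.30, 0.00, 0.50, 0.02])
# Illustrative absolute chunk that drifts by 0.01 (arbitrary units) per step.
abs_chunk = state + 0.01 * np.arange(1, 51)[:, None]
rel_chunk = to_relative(abs_chunk, state)
print(rel_chunk.shape)  # (50, 6)
```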

## Training Details

- **Dataset:** [hi-space/SO-ARM101-PICK-BANANA](https://huggingface.co/datasets/hi-space/SO-ARM101-PICK-BANANA)
- **Episodes:** 58
- **Total frames:** 35,859
- **Max steps:** 6000 (1 epoch)
- **Loss curve:** Started at ~1.13, converged to ~0.017
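
The dataset figures above imply the average length of a demonstration. The recording frame rate is not given on this card; the 30 fps below is an assumption, so scale the duration if the dataset was recorded at a different rate.

```python
EPISODES = 58
TOTAL_FRAMES = 35_859
FPS = 30  # assumption: frame rate is not stated on this card

frames_per_episode = TOTAL_FRAMES / EPISODES
seconds_per_episode = frames_per_episode / FPS

print(f"{frames_per_episode:.1f} frames/episode")          # 618.3 frames/episode
print(f"{seconds_per_episode:.1f} s/episode at {FPS} fps")  # 20.6 s/episode at 30 fps
```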

## Usage

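```python
# Import path and class name as given on this card; they may differ between
# Isaac-GR00T releases, so check the repository for your installed version.
from gr00t.model.gr00t_n1 import GR00TPolicy

# Load the fine-tuned checkpoint from the Hugging Face Hub.
policy = GR00TPolicy.from_pretrained("hi-space/GR00T-N1.6-3B-Pick-Banana-Real")
```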

Refer to the [NVIDIA Isaac GR00T](https://github.com/NVIDIA/Isaac-GR00T) repository for full inference and deployment instructions.
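
Because the policy predicts 50 actions per call, a common deployment pattern is to execute only a prefix of each chunk before observing again and re-planning. The sketch below shows that receding-horizon loop with hypothetical callables (`get_observation`, `predict_chunk`, `send_action`) standing in for the robot interface and the policy call; `EXECUTE_STEPS = 8` is an illustrative choice, not a value from this card or the Isaac-GR00T repository.

```python
import numpy as np

ACTION_HORIZON = 50  # chunk length predicted per policy call (from this card)
EXECUTE_STEPS = 8    # hypothetical: actions executed before re-querying


def run_episode(get_observation, predict_chunk, send_action, total_steps):
    """Predict a chunk, execute its first EXECUTE_STEPS actions, then re-plan."""
    steps = 0
    while steps < total_steps:
        obs = get_observation()
        chunk = predict_chunk(obs)  # expected shape: (ACTION_HORIZON, dof)
        if chunk.shape[0] != ACTION_HORIZON:
            raise ValueError(f"expected {ACTION_HORIZON} actions, got {chunk.shape[0]}")
        for action in chunk[:EXECUTE_STEPS]:
            if steps >= total_steps:
                break
            send_action(action)
            steps += 1
    return steps


# Usage with stub callables in place of a real robot and policy.
sent = []
state = np.zeros(6)
executed = run_episode(
    get_observation=lambda: state.copy(),
    predict_chunk=lambda obs: np.tile(obs + 0.01, (ACTION_HORIZON, 1)),
    send_action=sent.append,
    total_steps=20,
)
print(executed, len(sent))  # 20 20
```

Executing a short prefix trades extra policy calls for more frequent feedback from fresh observations; the right prefix length depends on inference latency and the control rate of your setup.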

## Intended Use

This model is fine-tuned for a real-world robotic banana pick-and-place task using the SO-ARM101 robot arm. It is trained on real robot demonstrations and intended for deployment on the same hardware setup.

## License

This model inherits the license from the base model [nvidia/GR00T-N1.6-3B](https://huggingface.co/nvidia/GR00T-N1.6-3B). Please refer to NVIDIA's terms for usage restrictions.