---
license: other
library_name: gr00t
pipeline_tag: robotics
base_model: nvidia/GR00T-N1.6-3B
datasets:
  - LightwheelAI/leisaac-pick-orange
language:
  - en
tags:
  - gr00t
  - gr00t-n1.6
  - nvidia
  - eagle
  - rectified-flow
  - so101
  - leisaac
  - pick-and-place
  - isaac-sim
---

# GR00T-N1.6-3B-PickOrange (self-trained, ckpt-6500)

_An NVIDIA GR00T N1.6 policy (Eagle 2.5 VLM + cross-attention DiT action head, ~3B params) fine-tuned from [nvidia/GR00T-N1.6-3B](https://huggingface.co/nvidia/GR00T-N1.6-3B) for the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task._

**🔗 Project repos**:
- [vitorcen/isaaclab-experience](https://github.com/vitorcen/isaaclab-experience): multi-policy comparison on Isaac Lab + LeIsaac (parent project)
- [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training): LeIsaac fork with training scripts and design docs

## Highlights

![ckpt-6500 successful pick-and-place](gr00t-n1.6-ckpt-6500.jpg)
_ckpt-6500: 3/3 oranges placed, robot returned to rest pose — env reports success_

![ckpt-3500 awkward early-phase failure](gr00t-n1.6-ckpt-3500.jpg)
_ckpt-3500 (earlier checkpoint, kept on the `ckpt-3500` branch for reference): the policy has not yet learned reliable placement, and the orange is dropped off the edge_

## TL;DR

- **Task**: SO-101 single-arm picks 3 oranges sequentially and places each in a plate (LeIsaac PickOrange).
- **Architecture**: GR00T N1.6 — Eagle 2.5 VLM (frozen) + cross-attention DiT action head (trainable). chunk_size=50, n_action_steps=16, 4-step rectified-flow denoising.
- **Training**: 6500 steps / global batch 16 (micro-batch 2 × grad_accum 8) / adafactor / bf16 / gradient checkpointing with `use_reentrant=False`.
- **Hardware**: a single **RTX 4090 24GB** (with `DISABLE_ADDMM_CUDA_LT=1` and a watchdog that auto-resumes after intermittent CUDA asserts).
- **🏆 Benchmark-aligned eval** (3 rounds, 120 s of sim time per round, 180 s wall-clock cap per round) vs the LeIsaac leaderboard:

| Model | Strict rounds | Oranges placed |
|---|---|---|
| hi-space N1.6 (public SOTA) | 2/3 | 6/9 |
| ACT | 1/3 | 6/9 |
| X-VLA best | 0/3 | 4/9 |
| **🏆 This ckpt-6500** | **2/3** | **8/9** ⭐ |

## Architecture / training recipe

```
base_model              nvidia/GR00T-N1.6-3B
tune_llm                False
tune_visual             False
tune_projector          True
tune_diffusion_model    True
tune_top_llm_layers     4 (default, kept)
backbone_trainable_params_fp32   False     ← 4090 squeeze
optim                   adafactor          ← 4090 squeeze
gradient_checkpointing  True (use_reentrant=False, custom monkey-patch)
bf16                    True
DISABLE_ADDMM_CUDA_LT   1                  ← workaround for torch 2.7.1 cuBLASLt bf16 bug
global_batch_size       16
gradient_accumulation_steps   8            ← per-step micro-batch = 2
max_steps               8000 (best ckpt at step 6500)
save_steps              100 (with custom keep-multiples-of-500 prune callback)
```
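The recipe saves every 100 steps and prunes with a custom "keep multiples of 500" callback, which is not published here. The following is a minimal bash sketch of an equivalent retention rule, assuming Hugging Face Trainer-style `checkpoint-<step>` directories. `prune_checkpoints` is a hypothetical helper, and always keeping the newest checkpoint (so a crash can resume from it) is our own assumption, not necessarily what the original callback does.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: keep checkpoint-<step> dirs whose step is a multiple of
# keep_every (default 500), plus the newest one; delete everything else.
prune_checkpoints() {
  local out_dir="$1" keep_every="${2:-500}"
  local latest=-1 step d

  # Pass 1: find the newest step so the resume point is never deleted.
  for d in "$out_dir"/checkpoint-*; do
    [ -d "$d" ] || continue
    step="${d##*checkpoint-}"
    [[ "$step" =~ ^[0-9]+$ ]] || continue
    if (( step > latest )); then
      latest=$step
    fi
  done

  # Pass 2: delete dirs that are neither a multiple of keep_every nor the newest.
  for d in "$out_dir"/checkpoint-*; do
    [ -d "$d" ] || continue
    step="${d##*checkpoint-}"
    [[ "$step" =~ ^[0-9]+$ ]] || continue
    if (( step % keep_every != 0 && step != latest )); then
      rm -rf -- "$d"
    fi
  done
  return 0
}

# Usage (placeholder output dir): prune_checkpoints ./outputs/pick_orange 500
```

With checkpoints at steps 100 through 1100, this leaves `checkpoint-500`, `checkpoint-1000` and the newest `checkpoint-1100`.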

## Training notes / known issues

- **4090 24GB is the hard limit**: fine-tuning N1.6 on 24GB requires every memory hack stacked: bf16 + grad-ckpt with `use_reentrant=False` + adafactor + `backbone_trainable_params_fp32=False` + `DISABLE_ADDMM_CUDA_LT=1`. Dropping any one of them, we hit either OOM or `RuntimeError: d.is_cuda() INTERNAL ASSERT FAILED at CUDAGuardImpl.h:34`.
- **Random CUDA asserts** still occur every ~500-700 steps despite these patches. We wrap training in a watchdog that auto-resumes from the latest checkpoint after each crash; net throughput is ~70% of a crash-free run.
- **Score variance**: per-checkpoint quality oscillates wildly (e.g. ckpt-5000 scored 16/18 in one 6-round eval, ckpt-5500 scored 0/18 in the next). We attribute this to running the optimization at the absolute memory edge, where gradients and optimizer states may quantize inconsistently. The 8/9 result here comes from a single benchmark-aligned 3-round run; expect ±20% noise on any individual run.
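The watchdog itself is not part of this release. A minimal bash sketch of the idea, assuming the wrapped training command resumes on its own from the newest checkpoint in its output directory when relaunched (for example by calling the Hugging Face `Trainer.train(resume_from_checkpoint=True)`). `run_with_watchdog`, `WATCHDOG_SLEEP_S` and the launch command in the usage comment are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical watchdog: rerun "$@" until it exits 0 or max_restarts is used up.
# Assumes the wrapped command resumes from its latest checkpoint on each launch.
run_with_watchdog() {
  local max_restarts="$1"; shift
  local restarts=0 status

  while true; do
    if "$@"; then
      return 0                       # training finished cleanly
    else
      status=$?                      # e.g. the process died on the CUDA assert
    fi
    restarts=$((restarts + 1))
    if (( restarts > max_restarts )); then
      echo "watchdog: giving up after $max_restarts restarts (exit $status)" >&2
      return "$status"
    fi
    echo "watchdog: exit $status, restart $restarts/$max_restarts" >&2
    sleep "${WATCHDOG_SLEEP_S:-10}"  # short pause before relaunching
  done
}

# Usage (hypothetical launch command):
#   export DISABLE_ADDMM_CUDA_LT=1
#   run_with_watchdog 50 python finetune_pick_orange.py --output-dir ./outputs/pick_orange
```

Capping restarts keeps a deterministic failure (e.g. a real OOM) from looping forever, while still absorbing the sporadic asserts.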

## Inference

Use [Isaac-GR00T's `run_gr00t_server.py`](https://github.com/NVIDIA/Isaac-GR00T) directly:

```bash
cd /path/to/Isaac-GR00T
uv run --extra=gpu python gr00t/eval/run_gr00t_server.py \
    --embodiment-tag NEW_EMBODIMENT \
    --model-path wsagi/GR00T-N1.6-PickOrange \
    --host 0.0.0.0 --port 5555
```

Then on the Isaac Sim eval side (LeIsaac):

```bash
POLICY_PORT=5555 \
ACTION_HORIZON=16 \
EVAL_ROUNDS=3 EPISODE_LENGTH=120 MAX_ROUND_WALL_S=180 \
PROMPT="Pick up the orange and put it in the plate" \
bash server/eval_gr00t.sh
```
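Loading a 3B checkpoint can take a while, so if the server and the eval are launched from one script, the eval may start before port 5555 is listening. A minimal bash sketch that polls the port first, using bash's built-in `/dev/tcp` redirection. `wait_for_port` is a hypothetical helper; the eval invocation in the usage comment simply mirrors the command above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll host:port until a TCP connect succeeds or timeout_s elapses.
wait_for_port() {
  local host="$1" port="$2" timeout_s="${3:-300}"
  local deadline=$((SECONDS + timeout_s))

  while (( SECONDS < deadline )); do
    # bash-only /dev/tcp pseudo-path; the subshell closes fd 3 when it exits.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  echo "wait_for_port: $host:$port not reachable after ${timeout_s}s" >&2
  return 1
}

# Usage: wait for the GR00T server, then start the LeIsaac eval.
#   wait_for_port 127.0.0.1 5555 600 && POLICY_PORT=5555 ACTION_HORIZON=16 bash server/eval_gr00t.sh
```

On loopback a closed port is refused immediately; against an unreachable remote host each connect attempt may instead block until the OS TCP timeout.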

## Branches

| branch | step | 3-round benchmark (strict rounds, oranges placed, avg round time) | notes |
|---|---|---|---|
| **main** | **6500** | **2/3 strict, 8/9 oranges, 115s avg** | best |
| `ckpt-3500` | 3500 | 0/3, 2/9, 180s | first transition out of the destruction phase |
| `ckpt-5000` | 5000 | 0/3, 4/9, 180s | strong in a 6-round eval (16/18) but volatile under the 3-round benchmark |
| `ckpt-7000` | 7000 | 1/3, 6/9, 146s | secondary peak |

## License

Apache-2.0 / NVIDIA Open Model License (inherited from the base model nvidia/GR00T-N1.6-3B). See the base model card.