---
license: apache-2.0
library_name: lerobot
pipeline_tag: robotics
model_name: pi05
base_model: lerobot/pi05_base
datasets:
- CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps
tags:
- robotics
- lerobot
- pi05
- vision-language-action
- imitation-learning
- safetensors
- ur7e
---

# Model Card for π0.5 — UR7e PickandPlace (30 epoch)

**π₀.₅ (Pi05) Policy**

π₀.₅ is a Vision-Language-Action model with open-world generalization, from
Physical Intelligence. The LeRobot implementation is adapted from their
open-source OpenPI repository. See the
[Physical Intelligence π₀.₅ blog post](https://www.physicalintelligence.company/blog/pi05).

This checkpoint is a **fine-tune of [`lerobot/pi05_base`](https://huggingface.co/lerobot/pi05_base)**
on the [`CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps`](https://huggingface.co/datasets/CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps)
dataset for a UR7e single-arm pick-and-place task.

This policy has been trained and pushed to the Hub using
[LeRobot](https://github.com/huggingface/lerobot). See the full documentation at
[LeRobot Docs](https://huggingface.co/docs/lerobot/index).

---

## Training Summary

| Field | Value |
|---|---|
| Base model | `lerobot/pi05_base` |
| Dataset | `CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps` (100 episodes, 35,878 frames, 10 fps) |
| Robot | UR7e single-arm, 7-DoF (6 joints + gripper) |
| Cameras | `realsense_topview`, `realsense_wrist` (renamed → `base_0_rgb`/`left_wrist_0_rgb`) |
| Steps | 4,300 (≈ 30 epochs; 35,878 × 30 / 256 ≈ 4,204) |
| Batch | 32 per GPU × 2 GPUs × 4 gradient-accumulation steps = 256 samples per optimizer step |
| VLM / Action expert | PaliGemma `gemma_2b` / `gemma_300m`, `bfloat16` |
| Optimizer | AdamW (lr 1e-4, betas (0.9, 0.95), weight decay 1e-10), cosine decay with 1,000 warmup steps |
| Chunk / Action steps | 50 / 50 |
| Memory | `gradient_checkpointing=true`, `compile_model=false` |
| Normalization | ACTION/STATE = `MEAN_STD`, VISUAL = `IDENTITY` |
| Image augmentation | brightness, contrast, saturation, hue, sharpness, affine (max 3, random order) |
| Hardware | 2× NVIDIA RTX PRO 6000 Blackwell |

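The step count in the table follows from the effective batch size. The arithmetic can be checked with a minimal sketch; the helper names are hypothetical, and it assumes that one dataset frame corresponds to one training sample:

```python
import math


def effective_batch(per_gpu_batch: int, num_gpus: int, grad_accum: int) -> int:
    """Samples consumed per optimizer step."""
    return per_gpu_batch * num_gpus * grad_accum


def steps_for_epochs(num_frames: int, epochs: int, batch: int) -> int:
    """Optimizer steps needed to see `num_frames` samples `epochs` times."""
    return math.ceil(num_frames * epochs / batch)


def epochs_for_steps(num_frames: int, steps: int, batch: int) -> float:
    """Epochs covered by `steps` optimizer steps."""
    return steps * batch / num_frames


batch = effective_batch(per_gpu_batch=32, num_gpus=2, grad_accum=4)
print(batch)                                              # 256
print(steps_for_epochs(35_878, 30, batch))                # 4205
print(round(epochs_for_steps(35_878, 4_300, batch), 2))   # 30.68
```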
`action` and `observation.state` are 7-dimensional and are automatically zero-padded to π0.5's `max_action_dim=32` and `max_state_dim=32`.
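As an illustration of what zero-padding a 7-dimensional vector to 32 dimensions means, here is a minimal pure-Python sketch. The `zero_pad` helper is hypothetical and is not the LeRobot implementation, which operates on batched tensors inside the policy:

```python
def zero_pad(vector, target_dim):
    """Return a copy of `vector` extended with trailing zeros to `target_dim`."""
    if len(vector) > target_dim:
        raise ValueError(f"vector has {len(vector)} dims, more than {target_dim}")
    return list(vector) + [0.0] * (target_dim - len(vector))


# 6 joint values + 1 gripper value (placeholder numbers, not real data).
state = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 1.0]
padded = zero_pad(state, target_dim=32)

print(len(padded))   # 32
print(padded[:8])    # the original 7 values followed by the first padded zero
```

Only the first 7 entries carry information; the remaining 25 are always zero, so the policy's outputs beyond index 6 can be ignored for this robot.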

---

## How to Get Started

### Inference (load + step)

```python
import torch
from lerobot.policies.pi05.modeling_pi05 import PI05Policy

policy = PI05Policy.from_pretrained("CoRL2026-CSI/pi05-UR7e-PickandPlace-30epoch")
policy.to("cuda").eval()

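# `observation` is not defined in this snippet: supply the observation batch
# for the current timestep. Its camera keys must match the names used during
# training (`observation.images.base_0_rgb`,
# `observation.images.left_wrist_0_rgb`).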
with torch.inference_mode():
    action = policy.select_action(observation)
```
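If the robot stack emits frames under the original camera names, the keys need to be renamed before calling the policy. A minimal sketch follows; the source keys (`observation.images.realsense_topview`, `observation.images.realsense_wrist`) and the pairing of top view with `base_0_rgb` and wrist with `left_wrist_0_rgb` are assumptions inferred from the order given in the training summary, and `rename_camera_keys` is a hypothetical helper:

```python
# Assumed mapping from raw camera keys to the names used during training.
CAMERA_RENAME = {
    "observation.images.realsense_topview": "observation.images.base_0_rgb",
    "observation.images.realsense_wrist": "observation.images.left_wrist_0_rgb",
}


def rename_camera_keys(observation, rename_map=CAMERA_RENAME):
    """Return a new dict with camera keys renamed; other keys are untouched."""
    return {rename_map.get(key, key): value for key, value in observation.items()}


# Placeholder values stand in for image and state tensors.
raw = {
    "observation.images.realsense_topview": "top_frame",
    "observation.images.realsense_wrist": "wrist_frame",
    "observation.state": [0.0] * 7,
}
renamed = rename_camera_keys(raw)
print(sorted(renamed))
```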

### Continue fine-tuning

```bash
lerobot-train \
  --policy.path=CoRL2026-CSI/pi05-UR7e-PickandPlace-30epoch \
  --dataset.repo_id=CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps \
  --output_dir=outputs/train/pi05_ur7e_pickandplace_ft \
  --job_name=pi05_ur7e_pickandplace_ft \
  --batch_size=32 --gradient_accumulation_steps=4 --steps=1000 \
  --policy.device=cuda --policy.dtype=bfloat16 \
  --policy.gradient_checkpointing=true --wandb.enable=true
```

The original training script is `scripts/cap/pi05_cap_ur7e_pickandplace.sh`, and
the exact hyperparameters can also be reconstructed from this repository's `train_config.json`.
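One way to compare a downloaded `train_config.json` against the training summary is to flatten it into dotted keys that resemble the CLI flags above. This is a sketch only: the actual layout of `train_config.json` is an assumption, the inline example JSON stands in for the real file, and `flatten` is a hypothetical helper:

```python
import json


def flatten(config, prefix=""):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


# Stand-in for json.load(open("train_config.json")); key names are assumed.
example = json.loads(
    '{"steps": 4300, "policy": {"dtype": "bfloat16", "gradient_checkpointing": true}}'
)

for key, value in sorted(flatten(example).items()):
    print(f"{key} = {value}")
```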

---

## Model Details

- **License:** apache-2.0
- **Base model:** [`lerobot/pi05_base`](https://huggingface.co/lerobot/pi05_base)
- **Library:** [LeRobot](https://github.com/huggingface/lerobot)
- **Trained by:** CoRL2026-CSI