---
license: apache-2.0
library_name: lerobot
pipeline_tag: robotics
model_name: pi05
base_model: lerobot/pi05_base
datasets:
- CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps
tags:
- robotics
- lerobot
- pi05
- vision-language-action
- imitation-learning
- safetensors
- ur7e
---

# Model Card for π0.5: UR7e PickandPlace (30 epochs)

**π₀.₅ (Pi05) Policy**

π₀.₅ is a Vision-Language-Action model with open-world generalization, from
Physical Intelligence. The LeRobot implementation is adapted from their
open-source OpenPI repository. See the
[Physical Intelligence π₀.₅ blog post](https://www.physicalintelligence.company/blog/pi05).

This checkpoint is a **fine-tune of [`lerobot/pi05_base`](https://huggingface.co/lerobot/pi05_base)**
on the [`CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps`](https://huggingface.co/datasets/CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps)
dataset for a UR7e single-arm pick-and-place task.

This policy has been trained and pushed to the Hub using
[LeRobot](https://github.com/huggingface/lerobot). See the full documentation at
[LeRobot Docs](https://huggingface.co/docs/lerobot/index).

---

## Training Summary

| Field | Value |
|---|---|
| Base model | `lerobot/pi05_base` |
| Dataset | `CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps` (100 eps, 35,878 frames, 10 fps) |
| Robot | UR7e single-arm, 7-DoF (6 joints + gripper) |
| Cameras | `realsense_topview`, `realsense_wrist` (renamed to `base_0_rgb` / `left_wrist_0_rgb`) |
| Steps | 4,300 (≈ 30 epochs; 35,878 × 30 / 256 ≈ 4,204) |
| Batch | 32 × 2 GPUs × 4 grad_accum = 256 samples per optimizer step |
| VLM / Action expert | PaliGemma `gemma_2b` / `gemma_300m`, `bfloat16` |
| Optimizer | AdamW (lr 1e-4, betas (0.9, 0.95), wd 1e-10), cosine decay w/ warmup 1000 |
| Chunk / Action steps | 50 / 50 |
| Memory | `gradient_checkpointing=true`, `compile_model=false` |
| Normalization | ACTION/STATE = `MEAN_STD`, VISUAL = `IDENTITY` |
| Image augmentation | brightness, contrast, saturation, hue, sharpness, affine (max 3, random order) |
| Hardware | 2× NVIDIA RTX PRO 6000 Blackwell |

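The step count and batch size in the table can be cross-checked with a few lines of arithmetic. This is only a sketch using the numbers listed above; it assumes each dataset frame counts as one training sample, and the variable names are illustrative rather than LeRobot configuration keys.

```python
# Values copied from the training summary table.
per_gpu_batch = 32
num_gpus = 2
grad_accum_steps = 4
num_frames = 35_878
steps = 4_300

# Samples consumed per optimizer step.
effective_batch = per_gpu_batch * num_gpus * grad_accum_steps

# Optimizer steps needed for 30 full passes over the dataset
# (assumes one frame == one training sample).
steps_for_30_epochs = num_frames * 30 / effective_batch

# Epochs actually covered by the 4,300 steps that were run.
epochs_covered = steps * effective_batch / num_frames

print(effective_batch)             # 256
print(round(steps_for_30_epochs))  # 4204
print(round(epochs_covered, 1))    # 30.7
```
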
The `action` and `observation.state` vectors are 7-dimensional and are automatically zero-padded to π0.5's `max_action_dim=32` and `max_state_dim=32`.
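
To illustrate what the padding amounts to, here is a minimal pure-Python sketch. It assumes the zeros are appended after the 7 real values; `zero_pad` is a hypothetical helper written for this example, not a LeRobot function, and the state values are made up.

```python
from typing import List, Sequence

MAX_STATE_DIM = 32  # `max_state_dim` from the note above


def zero_pad(vector: Sequence[float], target_dim: int) -> List[float]:
    """Append zeros to `vector` until it has `target_dim` entries (hypothetical helper)."""
    if len(vector) > target_dim:
        raise ValueError(f"vector has {len(vector)} dims, more than {target_dim}")
    return list(vector) + [0.0] * (target_dim - len(vector))


# Made-up 7-dim state: 6 joint positions followed by the gripper value.
state = [0.1, -1.2, 1.5, -0.4, 0.9, 0.0, 1.0]
padded_state = zero_pad(state, MAX_STATE_DIM)

print(len(padded_state))         # 32
print(padded_state[:7] == state)  # True: the original values are untouched
```

Only the first 7 entries carry information, so a consumer of the policy output would presumably read just those first 7 action dimensions.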

---

## How to Get Started

### Inference (load + step)

```python
import torch
from lerobot.policies.pi05.modeling_pi05 import PI05Policy

policy = PI05Policy.from_pretrained("CoRL2026-CSI/pi05-UR7e-PickandPlace-30epoch")
policy.to("cuda").eval()

# The camera keys in `observation` must match the names used during training
# (`observation.images.base_0_rgb`, `observation.images.left_wrist_0_rgb`).
with torch.inference_mode():
    action = policy.select_action(observation)
```
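
If observations arrive under the dataset's original camera names, they need to be renamed before calling the policy. The sketch below shows one way to do that with a plain dictionary; the `observation.images.` prefix on the source keys and the pairing of top view with `base_0_rgb` and wrist with `left_wrist_0_rgb` are assumptions inferred from the table order, and `rename_camera_keys` is a hypothetical helper, not part of LeRobot.

```python
# Assumed mapping from the dataset's camera keys to the keys used in training.
CAMERA_KEY_MAP = {
    "observation.images.realsense_topview": "observation.images.base_0_rgb",
    "observation.images.realsense_wrist": "observation.images.left_wrist_0_rgb",
}


def rename_camera_keys(observation: dict) -> dict:
    """Return a copy of `observation` with camera keys renamed; other keys are kept."""
    return {CAMERA_KEY_MAP.get(key, key): value for key, value in observation.items()}


# Placeholder values stand in for image tensors and the 7-dim state.
raw_observation = {
    "observation.images.realsense_topview": "top_frame",
    "observation.images.realsense_wrist": "wrist_frame",
    "observation.state": [0.0] * 7,
}
renamed = rename_camera_keys(raw_observation)
print(sorted(renamed))
```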

### Continue fine-tuning

```bash
lerobot-train \
  --policy.path=CoRL2026-CSI/pi05-UR7e-PickandPlace-30epoch \
  --dataset.repo_id=CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps \
  --output_dir=outputs/train/pi05_ur7e_pickandplace_ft \
  --job_name=pi05_ur7e_pickandplace_ft \
  --batch_size=32 --gradient_accumulation_steps=4 --steps=1000 \
  --policy.device=cuda --policy.dtype=bfloat16 \
  --policy.gradient_checkpointing=true --wandb.enable=true
```

The original training script is `scripts/cap/pi05_cap_ur7e_pickandplace.sh`;
the exact hyperparameters can also be reconstructed from this repo's `train_config.json`.
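
As a rough sketch of how the hyperparameters could be read back, the snippet below loads a local copy of `train_config.json` and pulls out selected fields by dotted path. The field names (`steps`, `batch_size`, `policy.dtype`) are assumptions that mirror the CLI flags above and may not match the file's actual layout; `read_config_fields` is a hypothetical helper.

```python
import json
from pathlib import Path


def read_config_fields(config_path, dotted_keys):
    """Look up dotted keys (e.g. "policy.dtype") in a JSON config; missing keys map to None."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    result = {}
    for dotted in dotted_keys:
        node = config
        for part in dotted.split("."):
            # Descend one level; stop as soon as the path cannot be followed.
            node = node.get(part) if isinstance(node, dict) else None
            if node is None:
                break
        result[dotted] = node
    return result


# Assumes train_config.json has been downloaded next to this script.
if Path("train_config.json").exists():
    fields = read_config_fields(
        "train_config.json", ["steps", "batch_size", "policy.dtype"]
    )
    for name, value in fields.items():
        print(f"{name}: {value}")
```

Fields that come back as `None` are simply absent under that name, which is a cue to inspect the file directly.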

---

## Model Details

- **License:** apache-2.0
- **Base model:** [`lerobot/pi05_base`](https://huggingface.co/lerobot/pi05_base)
- **Library:** [LeRobot](https://github.com/huggingface/lerobot)
- **Trained by:** CoRL2026-CSI