---
license: apache-2.0
library_name: lerobot
pipeline_tag: robotics
model_name: pi05
base_model: lerobot/pi05_base
datasets:
- CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps
tags:
- robotics
- lerobot
- pi05
- vision-language-action
- imitation-learning
- safetensors
- ur7e
---

# Model Card for π0.5: UR7e PickandPlace (30 epoch)

**π₀.₅ (Pi05) Policy**

π₀.₅ is a Vision-Language-Action model with open-world generalization, from Physical Intelligence. The LeRobot implementation is adapted from their open-source OpenPI repository. See the [Physical Intelligence π₀.₅ blog post](https://www.physicalintelligence.company/blog/pi05).

This checkpoint is a **fine-tune of [`lerobot/pi05_base`](https://huggingface.co/lerobot/pi05_base)** on the [`CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps`](https://huggingface.co/datasets/CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps) dataset for a UR7e single-arm pick-and-place task.

This policy has been trained and pushed to the Hub using [LeRobot](https://github.com/huggingface/lerobot). See the full documentation at [LeRobot Docs](https://huggingface.co/docs/lerobot/index).


---

## Training Summary

| Field | Value |
|---|---|
| Base model | `lerobot/pi05_base` |
| Dataset | `CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps` (100 episodes, 35,878 frames, 10 fps) |
| Robot | UR7e single-arm, 7-DoF (6 joints + gripper) |
| Cameras | `realsense_topview`, `realsense_wrist` (renamed to `base_0_rgb` and `left_wrist_0_rgb`) |
| Steps | 4,300 (≈ 30 epochs; 35,878 × 30 / 256 ≈ 4,204) |
| Batch | 32 per GPU × 2 GPUs × 4 gradient-accumulation steps = 256 samples per optimizer step |
| VLM / Action expert | PaliGemma `gemma_2b` / `gemma_300m`, `bfloat16` |
| Optimizer | AdamW (lr 1e-4, betas (0.9, 0.95), weight decay 1e-10), cosine decay with 1,000 warmup steps |
| Chunk / Action steps | 50 / 50 |
| Memory | `gradient_checkpointing=true`, `compile_model=false` |
| Normalization | ACTION/STATE = `MEAN_STD`, VISUAL = `IDENTITY` |
| Image augmentation | brightness, contrast, saturation, hue, sharpness, affine (at most 3 applied at a time, in random order) |
| Hardware | 2× NVIDIA RTX PRO 6000 Blackwell |

The `action` and `observation.state` dimensions are both 7; they are automatically zero-padded to π0.5's `max_action_dim=32` and `max_state_dim=32`.

---

## How to Get Started

### Inference (load + step)

```python
import torch
from lerobot.policies.pi05.modeling_pi05 import PI05Policy

policy = PI05Policy.from_pretrained("CoRL2026-CSI/pi05-UR7e-PickandPlace-30epoch")
policy.to("cuda").eval()

# `observation` is a dict that the caller must build; it is not defined here.
# Its camera keys must match the names used during training
# (`observation.images.base_0_rgb`, `observation.images.left_wrist_0_rgb`).
with torch.inference_mode():
    action = policy.select_action(observation)
```

### Continue fine-tuning

```bash
lerobot-train \
  --policy.path=CoRL2026-CSI/pi05-UR7e-PickandPlace-30epoch \
  --dataset.repo_id=CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps \
  --output_dir=outputs/train/pi05_ur7e_pickandplace_ft \
  --job_name=pi05_ur7e_pickandplace_ft \
  --batch_size=32 --gradient_accumulation_steps=4 --steps=1000 \
  --policy.device=cuda --policy.dtype=bfloat16 \
  --policy.gradient_checkpointing=true --wandb.enable=true
```

The original training script is `scripts/cap/pi05_cap_ur7e_pickandplace.sh`; the exact hyperparameters can also be reconstructed from this repository's `train_config.json`.

---

## Model Details

- **License:** apache-2.0
- **Base model:** [`lerobot/pi05_base`](https://huggingface.co/lerobot/pi05_base)
- **Library:** [LeRobot](https://github.com/huggingface/lerobot)
- **Trained by:** CoRL2026-CSI
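
---

## Sanity-Checking the Training Summary

The batch, step, and padding figures in the Training Summary can be checked with a few lines of plain Python. This is a minimal sketch rather than code from the training script: the numeric inputs are copied from the table, `zero_pad` is a hypothetical helper (not a LeRobot function), the example state values are made up, and the sketch assumes the padding zeros are appended after the 7 real dimensions.

```python
import math

# Values copied from the Training Summary table.
num_frames = 35_878
per_gpu_batch = 32
num_gpus = 2
grad_accum = 4
steps = 4_300

# Samples consumed per optimizer step.
effective_batch = per_gpu_batch * num_gpus * grad_accum

# Optimizer steps needed to see every frame once, and 30 times.
steps_per_epoch = num_frames / effective_batch
steps_for_30_epochs = math.ceil(30 * steps_per_epoch)

# Epochs actually covered by the 4,300 steps that were run.
epochs_at_4300_steps = steps * effective_batch / num_frames


def zero_pad(vector, target_dim):
    """Right-pad a list with zeros up to target_dim (hypothetical helper)."""
    if len(vector) > target_dim:
        raise ValueError("vector is longer than target_dim")
    return list(vector) + [0.0] * (target_dim - len(vector))


# Made-up 7-dim state: 6 joint values followed by the gripper.
state = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 1.0]
padded = zero_pad(state, 32)  # 32 = max_state_dim from the model card

print(effective_batch, steps_for_30_epochs, round(epochs_at_4300_steps, 2))
print(len(padded), padded[:7] == state)
```

With these inputs the effective batch is 256 and 30 full epochs need 4,205 optimizer steps, so the 4,300 steps used here cover slightly more than 30 epochs.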