license: apache-2.0
tags:
  - robotics
  - lerobot
  - pi0
  - vla
  - imitation-learning
  - so101
datasets:
  - abdul004/so101_ball_in_cup_v5
pipeline_tag: robotics

SO-101 Ball-in-Cup Pi0.5 Policy

A fine-tuned Pi0.5 (π₀.₅) Vision-Language-Action model for the ball-in-cup task using the SO-101 robot arm.

Task Description

Goal: Pick up an orange ball from the table and place it into a pink cup.

Robot: SO-101 - 6-DOF robot arm with gripper

Cameras: Dual camera setup (overhead + wrist-mounted)

Model Architecture

Pi0.5 is a Vision-Language-Action (VLA) model from Physical Intelligence:

Component	Description
Vision Encoder	SigLIP 400M - processes camera images
Language Model	Gemma 2B - scene understanding & task grounding
Action Expert	Flow Matching head - generates smooth action trajectories
Total Parameters	~3B

The model takes a natural-language instruction and camera images as input and outputs continuous joint actions.
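As a rough illustration of how a flow-matching head turns noise into an action, here is a toy sampler. It is a sketch under stated assumptions, not the OpenPi code: it assumes time runs from t = 1 (pure noise) to t = 0 (action), uses 10 Euler steps as an arbitrary choice, and swaps the learned network for a hypothetical `oracle_velocity` that already knows the target action.

```python
import random


def euler_flow_sample(velocity_fn, noise, num_steps=10):
    """Integrate dx/dt = v(x, t) with Euler steps from t=1 (noise) to t=0 (action)."""
    x = list(noise)
    dt = -1.0 / num_steps  # negative: time runs from 1 down to 0
    t = 1.0
    for _ in range(num_steps):
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        t += dt
    return x


# Hypothetical 6-value joint action; a real target comes from demonstrations.
target = [0.1, -0.4, 0.7, 0.0, 0.3, 1.0]


def oracle_velocity(x, t):
    # Toy stand-in for the learned action expert (assumption, not the real model).
    # On the straight path x_t = t * noise + (1 - t) * target, the velocity is
    # noise - target, which equals (x_t - target) / t.
    return [(xi - ai) / t for xi, ai in zip(x, target)]


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in target]
action = euler_flow_sample(oracle_velocity, noise)
print([round(a, 3) for a in action])
```

In the real model the velocity would come from the action expert conditioned on the images, instruction, and robot state; the integration loop is the only part this sketch is meant to convey.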

Training Details

Parameter	Value
Base Model	Pi0.5 (Physical Intelligence)
Dataset	abdul004/so101_ball_in_cup_v5
Episodes	72 teleoperated demonstrations
Frames	25,045
Fine-tuning Steps	5,000
Hardware	A100 80GB on RunPod
Training Time	~3-4 hours
Cost	~$6-8 USD
Framework	OpenPi (JAX/Flax)

Inference Performance

JPEG Compression Optimization

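We implemented JPEG compression (quality 80) to reduce network transfer time for remote inference. The first table gives per-inference latency by server location; the second gives payload size and control rate before and after compression: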

Location	Raw Images	JPEG (Q80)	Speedup
EU Spot	1448ms	375ms	3.9x
US On-Demand	600ms	270ms	2.2x

Metric	Before	After
Payload Size	1.8 MB	71 KB
Control Rate (US)	1.7 Hz	3.7 Hz
Compression Ratio	-	25x
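The speedup and control-rate figures above follow from the latencies. The sketch below recomputes them, assuming the control loop issues one request at a time so that the control rate is simply the inverse of the per-inference latency; the function names are illustrative.

```python
# Per-inference latencies in milliseconds, copied from the table above.
latency_ms = {
    "EU Spot": {"raw": 1448, "jpeg": 375},
    "US On-Demand": {"raw": 600, "jpeg": 270},
}


def speedup(raw_ms, jpeg_ms):
    """How many times faster the JPEG path is than the raw path."""
    return raw_ms / jpeg_ms


def control_rate_hz(step_ms):
    """Control rate, assuming one blocking request per control step."""
    return 1000.0 / step_ms


summary = {}
for location, ms in latency_ms.items():
    summary[location] = {
        "speedup": round(speedup(ms["raw"], ms["jpeg"]), 1),
        "raw_hz": round(control_rate_hz(ms["raw"]), 1),
        "jpeg_hz": round(control_rate_hz(ms["jpeg"]), 1),
    }
    print(location, summary[location])
```

For the US pod this reproduces the 2.2x speedup and the 1.7 Hz and 3.7 Hz control rates reported above. The same formula would give roughly 0.7 Hz and 2.7 Hz for the EU pod, though those rates are not reported here.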

Architecture

[RunPod GPU Server]              [Robot Mac]
┌─────────────────┐              ┌──────────────┐
│ Pi0.5 Model     │◄─── WSS ────►│ run_pi05.py  │
│ (RTX 4090)      │     JPEG     │ (Robot ctrl) │
└─────────────────┘              └──────────────┘

Demo

With JPEG Compression (~270ms latency)

Side-by-side: Overhead camera (left) + Wrist camera (right) - Smooth 3.7 Hz control

Without JPEG Compression (~600ms latency)

Side-by-side: Same task but with raw image transfer - 1.7 Hz control

Sample Evaluation

JPEG Compression (Fast)

5-frame composite: Start → Approach → Grasp → Transport → Final

Raw Images (Slow)

Same task without JPEG optimization

Usage

Server Setup (RunPod)

# Clone OpenPi fork with JPEG support
git clone https://github.com/abdulrahman004/openpi.git
cd openpi
uv sync

# Download checkpoint
uv run huggingface-cli download abdul004/pi05_so101_checkpoint \
    --include "4999/**" \
    --local-dir checkpoints/pi05_so101

# Start server
uv run scripts/serve_policy.py --port 8000 \
    policy:checkpoint \
    --policy.config=pi05_so101 \
    --policy.dir=checkpoints/pi05_so101/4999

Client (Robot Mac)

pip install openpi-client

# Run inference with JPEG compression
python run_pi05.py --server wss://YOUR-POD-8000.proxy.runpod.net

# Or without compression (slower)
python run_pi05.py --server wss://YOUR-POD-8000.proxy.runpod.net --no-jpeg
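`run_pi05.py` itself is not reproduced in this card. As a hypothetical sketch of how a client could expose the two flags used above, assuming the RunPod proxy hostname follows the `POD-PORT.proxy.runpod.net` pattern seen in the commands:

```python
import argparse


def runpod_proxy_url(pod_id, port=8000, scheme="wss"):
    """Build a proxy URL in the pattern used by the commands above (assumed)."""
    return f"{scheme}://{pod_id}-{port}.proxy.runpod.net"


def build_parser():
    # Hypothetical reconstruction of the client's CLI; the real script may differ.
    parser = argparse.ArgumentParser(description="Pi0.5 remote inference client (sketch)")
    parser.add_argument("--server", required=True, help="WebSocket URL of the policy server")
    parser.add_argument("--no-jpeg", action="store_true", help="send raw images instead of JPEG")
    return parser


# "YOUR-POD" is the same placeholder as in the commands above.
args = build_parser().parse_args(["--server", runpod_proxy_url("YOUR-POD")])
use_jpeg = not args.no_jpeg
print(args.server, "| JPEG enabled:", use_jpeg)
```

Making compression the default and `--no-jpeg` the opt-out matches the commands above, where the faster path needs no extra flag.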

Comparison with ACT Policy

Trained on the same dataset:

Policy	Architecture	Inference	Grasp	Generalization
Pi0.5	VLA (3B params)	Remote GPU	✅	✅ Edge positions
ACT	Transformer (25M)	Local	✅	⚠️ Center only

Key advantage: Pi0.5 picked up the ball from edge positions that ACT could not handle, which suggests that VLA pre-training improves generalization.

Infrastructure Notes

Remote Inference Setup:

Server: RunPod RTX 4090 24GB (~$0.40/hr on-demand)
Client: Mac Mini M4 controlling SO-101 robot
Protocol: WebSocket with msgpack serialization
Optimization: JPEG compression reduces the payload from 1.8 MB to 71 KB per inference
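The 1.8 MB raw payload is consistent with two uncompressed 640×480 RGB frames, although the camera resolution is not stated in this card and is an assumption here. A minimal sketch of that arithmetic, ignoring msgpack framing overhead and treating 71 KB as 71 × 1024 bytes (also an assumption):

```python
def raw_payload_bytes(num_cameras, width, height, channels=3, bytes_per_channel=1):
    """Size of uncompressed uint8 image data for one inference request."""
    return num_cameras * width * height * channels * bytes_per_channel


# Assumed setup: two cameras (overhead + wrist) at 640x480, RGB, 8 bits per channel.
raw_bytes = raw_payload_bytes(num_cameras=2, width=640, height=480)

# JPEG payload size reported in this card, read as binary kilobytes (assumption).
jpeg_bytes = 71 * 1024

compression_ratio = raw_bytes / jpeg_bytes
print(f"raw: {raw_bytes / 1e6:.2f} MB, ratio: {compression_ratio:.1f}x")
```

Under these assumptions the ratio comes out at about 25x, in line with the figure reported in the performance table.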

Known Issues:

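The RTX 4090 (24 GB) is borderline on memory; model loading occasionally fails with an out-of-memory (OOM) error
US datacenters preferred: EU latency was about 2.4x higher with raw images (1448 ms vs 600 ms) and 1.4x higher with JPEG (375 ms vs 270 ms)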
First inference takes 30-60s (JAX JIT compilation)
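One way to keep the 30-60 s first-call delay out of the control loop is to send a throwaway request before the robot starts moving. The sketch below shows the idea with a hypothetical `FakePolicy` standing in for the remote server; the `infer` method name, the observation structure, and the sleep durations are placeholders, not measurements.

```python
import time


class FakePolicy:
    """Hypothetical stand-in for the remote policy: slow first call, fast afterwards."""

    def __init__(self, first_call_s=0.2, later_call_s=0.01):
        self.first_call_s = first_call_s
        self.later_call_s = later_call_s
        self.calls = 0

    def infer(self, observation):
        # The first call simulates JIT compilation; later calls simulate steady state.
        time.sleep(self.first_call_s if self.calls == 0 else self.later_call_s)
        self.calls += 1
        return {"actions": [0.0] * 6}  # placeholder action


def timed_infer(policy, observation):
    """Run one inference call and return (result, elapsed seconds)."""
    start = time.monotonic()
    result = policy.infer(observation)
    return result, time.monotonic() - start


policy = FakePolicy()
dummy_observation = {"prompt": "warm-up", "images": {}}  # hypothetical structure

# Warm-up request: its result is discarded and never sent to the robot.
_, warmup_s = timed_infer(policy, dummy_observation)
_, steady_s = timed_infer(policy, dummy_observation)
print(f"warm-up call: {warmup_s:.2f}s, next call: {steady_s:.2f}s")
```

With a real server, the warm-up observation would need the same image shapes and keys as live requests so that the compiled function is reused.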

Limitations

Requires GPU server for inference (not yet optimized for edge deployment)
Sensitive to lighting changes
72 training episodes may limit extreme edge case handling

Citation

@misc{so101_pi05_ball_in_cup,
  author = {Abdul},
  title = {SO-101 Ball-in-Cup Pi0.5 Fine-tuning},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/abdul004/pi05_so101_checkpoint}
}

Acknowledgments

Physical Intelligence for Pi0.5 and OpenPi
LeRobot by Hugging Face
SO-101 robot design community