Observations and Actions

This policy is an ACT model trained for SC connector insertion using 3 RGB cameras and a compact robot/task state.

Observation

At each control step, the model receives:

1. Multi-view RGB images

  • observation.images.left_camera
  • observation.images.center_camera
  • observation.images.right_camera

Each image has shape:

  • 3 x 1024 x 1152

These views provide visual information about:

  • the cable and plug
  • the task board and target port
  • the robot end-effector relative to the insertion target

2. Low-dimensional state

observation.state has 16 dimensions:

  1. tcp_pose.position.x
  2. tcp_pose.position.y
  3. tcp_pose.position.z
  4. tcp_pose.orientation.x
  5. tcp_pose.orientation.y
  6. tcp_pose.orientation.z
  7. tcp_pose.orientation.w
  8. task.target_valid
  9. task.cable_type_id
  10. task.plug_type_id
  11. task.port_type_id
  12. task.target_module_id
  13. task.target_port_id
  14. task.target_module_index
  15. task.target_port_index
  16. task.time_limit

This state provides:

  • the current tool-center-point pose
  • numeric task conditioning describing what to insert and where
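
As a minimal sketch of how the two observation parts could be assembled and checked before inference, the snippet below packs the 16 state values in the order listed above and verifies the three camera shapes. The helper names pack_state and check_image_shapes are hypothetical and not part of LeRobot; only the keys, the field order, and the 3 x 1024 x 1152 shape come from this card.

```python
# Field order copied from the observation.state list in this card.
STATE_FIELDS = (
    "tcp_pose.position.x",
    "tcp_pose.position.y",
    "tcp_pose.position.z",
    "tcp_pose.orientation.x",
    "tcp_pose.orientation.y",
    "tcp_pose.orientation.z",
    "tcp_pose.orientation.w",
    "task.target_valid",
    "task.cable_type_id",
    "task.plug_type_id",
    "task.port_type_id",
    "task.target_module_id",
    "task.target_port_id",
    "task.target_module_index",
    "task.target_port_index",
    "task.time_limit",
)

IMAGE_KEYS = (
    "observation.images.left_camera",
    "observation.images.center_camera",
    "observation.images.right_camera",
)
IMAGE_SHAPE = (3, 1024, 1152)  # per-image shape as stated in this card


def pack_state(values):
    """Return the 16 state values as floats, in the documented order.

    Hypothetical helper: `values` maps each name in STATE_FIELDS to a number.
    """
    missing = [name for name in STATE_FIELDS if name not in values]
    if missing:
        raise KeyError(f"missing state fields: {missing}")
    return [float(values[name]) for name in STATE_FIELDS]


def check_image_shapes(shapes):
    """Raise ValueError unless every camera key maps to IMAGE_SHAPE.

    Hypothetical helper: `shapes` maps each image key to a shape tuple.
    """
    for key in IMAGE_KEYS:
        if tuple(shapes.get(key, ())) != IMAGE_SHAPE:
            raise ValueError(f"{key}: expected shape {IMAGE_SHAPE}, got {shapes.get(key)}")
```

Keeping the field order in one tuple makes it harder for the deployment code to drift from the order used during training, which matters because the model only sees positions in the vector, not names.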

Action

The model predicts an action vector of 7 dimensions:

  1. cartesian.pose.position.x
  2. cartesian.pose.position.y
  3. cartesian.pose.position.z
  4. cartesian.pose.orientation.x
  5. cartesian.pose.orientation.y
  6. cartesian.pose.orientation.z
  7. cartesian.pose.orientation.w
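
The layout above can be unpacked as in the following sketch, which splits a 7-D action into a position and an (x, y, z, w) quaternion. The function name split_action is hypothetical. Renormalizing the quaternion is an assumption added here, since a regressed quaternion is generally not exactly unit length; the card does not say whether the deployment code does this.

```python
import math


def split_action(action):
    """Split a 7-D action into (position xyz, unit quaternion xyzw).

    Hypothetical helper; the index layout follows the action list in this card.
    """
    if len(action) != 7:
        raise ValueError(f"expected 7 action values, got {len(action)}")
    position = tuple(float(v) for v in action[:3])
    quat = [float(v) for v in action[3:]]
    norm = math.sqrt(sum(q * q for q in quat))
    if norm < 1e-8:
        raise ValueError("quaternion has near-zero norm and cannot be normalized")
    # Assumption: renormalize so downstream code receives a valid rotation.
    return position, tuple(q / norm for q in quat)
```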

Action semantics

This is an absolute Cartesian pose policy.

The model directly predicts:

  • the target Cartesian position of the tool center point
  • the target orientation as a quaternion

So instead of outputting a correction relative to the previous command, the model outputs a complete target pose in the robot base frame.

Why absolute pose actions?

This formulation is simple and direct:

  • the model observes the current scene
  • the model predicts where the end effector should go next
  • the controller receives that target pose directly

This can work well when:

  • the task geometry is consistent
  • the frame definition is stable
  • demonstrations are precise and repeatable

Control interpretation

During deployment:

  1. the model predicts a 7D Cartesian pose target
  2. the pose is interpreted as an absolute command
  3. that pose is sent directly to the robot controller

So the policy acts as a vision-conditioned Cartesian pose predictor for the insertion task.
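
The three deployment steps above can be sketched as a simple loop. This is an illustration under stated assumptions, not the deployment code used for this model: predict, read_observation, and send_pose are hypothetical callables standing in for the policy, the sensor stack, and the robot controller.

```python
def run_episode(predict, read_observation, send_pose, max_steps):
    """Run a fixed number of control steps and return the poses that were sent.

    All three callables are hypothetical stand-ins:
      predict(observation) -> 7 numbers (x, y, z, qx, qy, qz, qw)
      read_observation()   -> observation for the current step
      send_pose(pose)      -> forwards the absolute pose to the controller
    """
    sent = []
    for _ in range(max_steps):
        observation = read_observation()
        pose = [float(v) for v in predict(observation)]  # step 1: 7D pose target
        if len(pose) != 7:
            raise ValueError(f"expected a 7D pose, got {len(pose)} values")
        # Step 2: the pose is absolute, so no previous command is added to it.
        send_pose(pose)  # step 3: pass the target to the controller unchanged
        sent.append(pose)
    return sent
```

Because the pose is absolute, the loop keeps no state between steps; a delta-action policy would instead need to carry the previous command forward.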

Model Card for act

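Action Chunking with Transformers (ACT) is an imitation-learning method that predicts a short sequence of future actions (a chunk) at each inference step instead of a single action. It is trained on teleoperated demonstrations and has been reported to reach high success rates on fine manipulation tasks.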
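
To illustrate the chunking idea, here is a minimal sketch of one common way to consume chunks: run the model once, execute the first few actions of the chunk one per control step, and query the model again only when the queue is empty. The class ChunkedActionQueue and its parameters are hypothetical names for this example, and the sketch does not claim to reproduce LeRobot's internal implementation.

```python
from collections import deque


class ChunkedActionQueue:
    """Serve one action per control step from model-predicted chunks.

    Hypothetical helper: `predict_chunk(observation)` returns a sequence of
    actions, and `n_action_steps` is how many of them are executed before
    the model is queried again.
    """

    def __init__(self, predict_chunk, n_action_steps):
        if n_action_steps < 1:
            raise ValueError("n_action_steps must be at least 1")
        self._predict_chunk = predict_chunk
        self._n_action_steps = n_action_steps
        self._queue = deque()

    def next_action(self, observation):
        # Only run the model when the previously predicted actions are used up.
        if not self._queue:
            chunk = list(self._predict_chunk(observation))
            if not chunk:
                raise ValueError("predict_chunk returned an empty chunk")
            self._queue.extend(chunk[: self._n_action_steps])
        return self._queue.popleft()
```

Executing fewer actions than the chunk contains trades extra inference calls for more frequent use of fresh observations.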

This policy has been trained and pushed to the Hub using LeRobot. See the full documentation at LeRobot Docs.


How to Get Started with the Model

For a complete walkthrough, see the training guide. The short version of how to train the policy and run inference or evaluation is:

Train from scratch

lerobot-train \
  --dataset.repo_id=${HF_USER}/<dataset> \
  --policy.type=act \
  --output_dir=outputs/train/<desired_policy_repo_id> \
  --job_name=lerobot_training \
  --policy.device=cuda \
  --policy.repo_id=${HF_USER}/<desired_policy_repo_id> \
  --wandb.enable=true

Checkpoints are written to outputs/train/<desired_policy_repo_id>/checkpoints/.

Evaluate the policy/run inference

lerobot-record \
  --robot.type=so100_follower \
  --dataset.repo_id=<hf_user>/eval_<dataset> \
  --policy.path=<hf_user>/<desired_policy_repo_id> \
  --episodes=10

Prefix the dataset repo with eval_ and supply --policy.path pointing to a local or hub checkpoint.


Model Details

  • License: apache-2.0
  • Model size: 51.6M params
  • Tensor type: F32
  • Format: Safetensors
Paper for rangers-intrinsic/SC-only-connector-insertion-72successes-simplified