Observations and Actions

This policy is an ACT model trained for SC connector insertion. It conditions on three RGB camera views and a compact 16-dimensional robot/task state vector.

Observation

At each control step, the model receives:

1. Multi-view RGB images

observation.images.left_camera
observation.images.center_camera
observation.images.right_camera

Each image has shape 3 x 1024 x 1152 (channels x height x width).

These views provide visual information about:

the cable and plug
the task board and target port
the robot end-effector relative to the insertion target
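A minimal preprocessing sketch, assuming raw camera frames arrive as height x width x 3 uint8 arrays and that the policy expects LeRobot's usual channel-first float32 layout scaled to [0, 1]. The helper names `to_policy_image` and `build_image_observation` are hypothetical, so check the dataset's feature spec before relying on this. It also makes the input size concrete: each view holds 3 x 1024 x 1152 = 3,538,944 values, about 14.2 MB in float32, so the three views total roughly 42.5 MB per step before any batching.

```python
import numpy as np

# Feature keys and per-view shape (C, H, W) as listed in this card.
CAMERA_KEYS = (
    "observation.images.left_camera",
    "observation.images.center_camera",
    "observation.images.right_camera",
)
IMAGE_SHAPE = (3, 1024, 1152)


def to_policy_image(frame_hwc: np.ndarray) -> np.ndarray:
    """Hypothetical helper: H x W x 3 uint8 frame -> C x H x W float32 in [0, 1]."""
    if frame_hwc.dtype != np.uint8 or frame_hwc.ndim != 3:
        raise ValueError("expected an H x W x 3 uint8 frame")
    chw = np.transpose(frame_hwc, (2, 0, 1)).astype(np.float32) / 255.0
    if chw.shape != IMAGE_SHAPE:
        raise ValueError(f"expected shape {IMAGE_SHAPE}, got {chw.shape}")
    return chw


def build_image_observation(frames: dict) -> dict:
    """Map each camera key to a preprocessed image with a leading batch dim."""
    return {key: to_policy_image(frames[key])[None, ...] for key in CAMERA_KEYS}


# Usage with synthetic frames (real frames would come from the three cameras).
frames = {k: np.zeros((1024, 1152, 3), dtype=np.uint8) for k in CAMERA_KEYS}
obs = build_image_observation(frames)
bytes_per_step = sum(v.nbytes for v in obs.values())
print(obs[CAMERA_KEYS[0]].shape, bytes_per_step)  # (1, 3, 1024, 1152) 42467328
```

Validating the shape inside the helper catches a common mistake early: feeding a frame captured at a different resolution, or one that is already channel-first.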

2. Low-dimensional state

observation.state has 16 dimensions:

tcp_pose.position.x
tcp_pose.position.y
tcp_pose.position.z
tcp_pose.orientation.x
tcp_pose.orientation.y
tcp_pose.orientation.z
tcp_pose.orientation.w
task.target_valid
task.cable_type_id
task.plug_type_id
task.port_type_id
task.target_module_id
task.target_port_id
task.target_module_index
task.target_port_index
task.time_limit

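This state provides:

the current tool-center-point (TCP) pose: position plus orientation quaternion
numeric task conditioning that encodes what to insert and where (cable, plug, and port types; target module and port; time limit)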
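The sketch below packs the 16 state values in the order listed above, assuming that list order is the order used during training (the card does not say so explicitly). `build_state` and its flat-dict input are hypothetical, and casting the categorical IDs, indices, and `task.target_valid` flag to plain floats is an assumption about how they were encoded.

```python
import numpy as np

# Field order copied from the observation.state list in this card.
STATE_FIELDS = (
    "tcp_pose.position.x", "tcp_pose.position.y", "tcp_pose.position.z",
    "tcp_pose.orientation.x", "tcp_pose.orientation.y",
    "tcp_pose.orientation.z", "tcp_pose.orientation.w",
    "task.target_valid", "task.cable_type_id", "task.plug_type_id",
    "task.port_type_id", "task.target_module_id", "task.target_port_id",
    "task.target_module_index", "task.target_port_index", "task.time_limit",
)


def build_state(values: dict) -> np.ndarray:
    """Hypothetical helper: flat {field name: number} dict -> (16,) float32 vector."""
    missing = [f for f in STATE_FIELDS if f not in values]
    if missing:
        raise KeyError(f"missing state fields: {missing}")
    # Assumption: IDs, indices and the validity flag are cast to float like the pose.
    return np.array([float(values[f]) for f in STATE_FIELDS], dtype=np.float32)


# Usage with made-up values (identity orientation, arbitrary task fields).
example = {f: 0.0 for f in STATE_FIELDS}
example.update({"tcp_pose.orientation.w": 1.0, "task.target_valid": 1,
                "task.time_limit": 30.0})
state = build_state(example)
print(state.shape)  # (16,)
```

Keeping the field order in one tuple means the same constant can be reused when logging or debugging, so a silent reordering between training and deployment is less likely.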

Action

The model predicts a 7-dimensional action vector:

cartesian.pose.position.x
cartesian.pose.position.y
cartesian.pose.position.z
cartesian.pose.orientation.x
cartesian.pose.orientation.y
cartesian.pose.orientation.z
cartesian.pose.orientation.w
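A minimal post-processing sketch, assuming the 7 outputs follow the order listed above with the quaternion stored as (x, y, z, w). A regression head is not guaranteed to emit a unit quaternion, so the sketch renormalizes the orientation before use; that step and the `split_action` name are assumptions, not something this card states about the deployed pipeline.

```python
import numpy as np


def split_action(action, eps: float = 1e-8):
    """Hypothetical helper: 7-D action -> (position xyz, unit quaternion xyzw)."""
    action = np.asarray(action, dtype=np.float64)
    if action.shape != (7,):
        raise ValueError(f"expected a 7-D action, got shape {action.shape}")
    position = action[:3]
    quat = action[3:]
    norm = np.linalg.norm(quat)
    if norm < eps:
        # A (near-)zero quaternion encodes no rotation and cannot be rescaled.
        raise ValueError("predicted quaternion is (near) zero; cannot normalize")
    return position, quat / norm


# Usage: a slightly non-unit quaternion is rescaled to unit length.
pos, q = split_action([0.4, -0.1, 0.25, 0.0, 0.0, 0.0, 0.98])
print(pos, q)
```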

Action semantics

This is an absolute Cartesian pose policy.

The model directly predicts:

the target Cartesian position of the tool center point
the target orientation as a quaternion

Rather than outputting a correction relative to the previous command, the model outputs a complete target pose expressed in the robot base frame.
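For comparison only, the sketch below converts one absolute target into the relative correction a delta-action policy would have to output instead. It assumes (x, y, z, w) quaternions and a delta expressed in the base frame: a position difference, plus an orientation delta q_delta = q_target * conj(q_current) that is applied on the left. This policy never computes such a delta, and the helper names are hypothetical.

```python
import numpy as np


def quat_mul(q1, q2):
    """Hamilton product of two quaternions stored as (x, y, z, w)."""
    x1, y1, z1, w1 = q1
    x2, y2, z2, w2 = q2
    return np.array([
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
    ])


def quat_conj(q):
    """Conjugate; equals the inverse for a unit quaternion."""
    x, y, z, w = q
    return np.array([-x, -y, -z, w])


def absolute_to_delta(target_pose, current_pose):
    """Hypothetical helper: absolute 7-D target -> base-frame (dp, dq) correction."""
    target_pose = np.asarray(target_pose, dtype=float)
    current_pose = np.asarray(current_pose, dtype=float)
    dp = target_pose[:3] - current_pose[:3]
    dq = quat_mul(target_pose[3:], quat_conj(current_pose[3:]))
    return dp, dq


# Usage: starting from the identity orientation, dq equals the 90-degree yaw target.
s = np.sqrt(0.5)
current = [0.40, 0.00, 0.30, 0.0, 0.0, 0.0, 1.0]
target = [0.42, 0.01, 0.25, 0.0, 0.0, s, s]
dp, dq = absolute_to_delta(target, current)
print(dp, dq)
```

The absolute formulation avoids this bookkeeping entirely: the model's output is the command, so errors in one step are not carried forward by accumulating deltas.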

Why absolute pose actions?

This formulation is simple and direct:

the model observes the current scene
the model predicts where the end effector should go next
the controller receives that target pose directly

This can work well when:

the task geometry is consistent
the frame definition is stable
demonstrations are precise and repeatable

Control interpretation

During deployment:

the model predicts a 7D Cartesian pose target
the pose is interpreted as an absolute command
that pose is sent directly to the robot controller

So the policy acts as a vision-conditioned Cartesian pose predictor for the insertion task.
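A sketch of that loop, assuming the policy returns a chunk of absolute 7-D poses per forward pass (ACT's action chunking, described in the model card below) and that the first few poses of each chunk are executed one per control step before the model is queried again. `predict_chunk`, `get_observation`, `send_pose`, and `steps_per_chunk` are hypothetical stand-ins; LeRobot's ACT implementation manages its own action queue internally, so this illustrates the control flow rather than reproducing that code.

```python
from collections import deque
from typing import Callable, List, Sequence

Pose = Sequence[float]  # 7-D absolute pose: x, y, z, qx, qy, qz, qw


def run_episode(
    get_observation: Callable[[], dict],
    predict_chunk: Callable[[dict], List[Pose]],
    send_pose: Callable[[Pose], None],
    num_steps: int,
    steps_per_chunk: int,
) -> None:
    """Hypothetical control loop: query the model when the queue is empty,
    then send one absolute pose per control step, unchanged."""
    if steps_per_chunk < 1:
        raise ValueError("steps_per_chunk must be at least 1")
    queue: deque = deque()
    for _ in range(num_steps):
        obs = get_observation()  # read every step; used only when refilling
        if not queue:
            # Keep only the first steps_per_chunk poses of each predicted chunk.
            queue.extend(predict_chunk(obs)[:steps_per_chunk])
        # Absolute command: nothing is accumulated onto the previous target.
        send_pose(queue.popleft())


# Usage with stubs: the fake model encodes its query count in the x coordinate.
calls = {"n": 0}


def fake_predict(obs):
    calls["n"] += 1
    return [[float(calls["n"]), 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]] * 5


sent = []
run_episode(lambda: {}, fake_predict, sent.append, num_steps=6, steps_per_chunk=3)
print(calls["n"], [p[0] for p in sent])  # 2 [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
```

Executing only part of each chunk before re-querying trades compute for responsiveness: a smaller `steps_per_chunk` lets the policy react to new camera frames sooner.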

Model Card for act

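Action Chunking with Transformers (ACT) is an imitation-learning method that predicts short chunks of future actions instead of a single action per step. It learns from teleoperated demonstrations; the original ACT paper (arXiv 2304.13705) reports high success rates on fine-grained bimanual manipulation tasks.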

This policy has been trained and pushed to the Hub using LeRobot. See the full documentation at LeRobot Docs.

How to Get Started with the Model

For a complete walkthrough, see the training guide. Below is the short version of how to train the policy and run inference/evaluation:

Train from scratch

lerobot-train \
  --dataset.repo_id=${HF_USER}/<dataset> \
  --policy.type=act \
  --output_dir=outputs/train/<desired_policy_repo_id> \
  --job_name=lerobot_training \
  --policy.device=cuda \
  --policy.repo_id=${HF_USER}/<desired_policy_repo_id> \
  --wandb.enable=true

Writes checkpoints to outputs/train/<desired_policy_repo_id>/checkpoints/.

Evaluate the policy/run inference

lerobot-record \
  --robot.type=so100_follower \
  --dataset.repo_id=<hf_user>/eval_<dataset> \
  --policy.path=<hf_user>/<desired_policy_repo_id> \
  --dataset.num_episodes=10

Prefix the dataset repo with eval_ and supply --policy.path pointing to a local or hub checkpoint.

Model Details

License: apache-2.0


Format: Safetensors

Model size: 51.6M params

Tensor type: F32


Paper for rangers-intrinsic/SC-only-connector-insertion-72successes-simplified

Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware (arXiv 2304.13705, published Apr 23, 2023)