---
license: apache-2.0
tags:
- reinforcement-learning
- robotics
- quadruped
- manipulation
- recovery
- isaac-lab
- ppo
- rsl-rl
library_name: rsl-rl
pipeline_tag: reinforcement-learning
---

# Go2+Z1 Standup Recovery Policy (RL, full 18-DOF)

PPO policy that lets the **Unitree Go2 + Z1** composite robot recover from arbitrary fallen poses by using its **legs and Z1 arm together as a self-righting kinematic chain**, then automatically folds the Z1 back to the carry pose once the trunk is upright.

## Behaviour

1. Reset spawns the robot at `trunk_z = 0.18 m` with random orientation (full quaternion sampled by `reset_root_state_with_random_orientation`)
2. Policy commands all 18 joints (12 leg + 6 arm) for up to 5 s
3. Reward favours: trunk lifted, projected gravity aligned with -Z, Z1 close to its `Z1_FOLDED_DEFAULT` pose, sparse +10 success bonus when all three are satisfied
4. Episode ends if `trunk_z < 0.05 m` (collapsed) or time-out

## Highlights

- 4096 parallel envs × 3000 PPO iters
- Mean reward climbs from 0.6 (random) → 89+ (final)
- `standup_success` sparse bonus reaches 8.6 / episode (≈86 % of timesteps satisfy the success criterion)
- `trunk_collapsed` termination rate ≈ 0: the robot does not give up

## Architecture

Same rsl-rl actor-critic shape as our walking policies (3-layer MLP 512-256-128 ELU). Action dim = 18.
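
As a quick sanity check against the architecture described above, the expected actor weight shapes can be listed in plain Python. This is a minimal sketch, not part of the released code: the helper names are illustrative, the `mlp.<index>` key pattern is inferred from the loading snippet in this card, and the observation size of 60 used in the example is a hypothetical placeholder (the real value should be read from the checkpoint).

```python
from typing import Dict, Sequence, Tuple

HIDDEN_DIMS = (512, 256, 128)  # hidden widths stated in this card
ACT_DIM = 18                   # 12 leg joints + 6 arm joints


def expected_actor_shapes(
    obs_dim: int,
    hidden_dims: Sequence[int] = HIDDEN_DIMS,
    act_dim: int = ACT_DIM,
) -> Dict[str, Tuple[int, ...]]:
    """Map assumed state-dict keys to tensor shapes for a Linear/ELU stack."""
    dims = [obs_dim, *hidden_dims, act_dim]
    shapes: Dict[str, Tuple[int, ...]] = {}
    for i, (fan_in, fan_out) in enumerate(zip(dims[:-1], dims[1:])):
        idx = 2 * i  # Linear layers at even indices, ELU activations at odd ones
        shapes[f"mlp.{idx}.weight"] = (fan_out, fan_in)
        shapes[f"mlp.{idx}.bias"] = (fan_out,)
    return shapes


def count_parameters(shapes: Dict[str, Tuple[int, ...]]) -> int:
    """Total number of scalar parameters across all listed tensors."""
    total = 0
    for shape in shapes.values():
        n = 1
        for d in shape:
            n *= d
        total += n
    return total


if __name__ == "__main__":
    # obs_dim=60 is a hypothetical placeholder, not the policy's real input size.
    for key, shape in expected_actor_shapes(obs_dim=60).items():
        print(key, shape)
```

Comparing this listing with the shapes of the `mlp.*` tensors in a loaded checkpoint is one way to confirm that the hidden widths and the 18-dimensional action head match before rebuilding the network.
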
## Reward composition

```python
trunk_height_reward        weight +5.0     # clamp(z / 0.32, 0, 1)
upright_alignment          weight +3.0     # clamp(-projected_gravity_b[2], 0, 1)
z1_fold                    weight +2.0     # exp(-||z1_pos - Z1_FOLDED_DEFAULT||)
standup_success (sparse)   weight +10.0    # 1 if z>0.28 ∧ upright>0.92 ∧ fold_err<0.3
action_rate_l2             weight -0.005
joint_acc_l2               weight -2.5e-7
joint_torques_l2           weight -1e-5
```

## Files

- `standup_v1.pt`: rsl-rl `OnPolicyRunner` checkpoint

## Usage

```python
import torch
import torch.nn as nn

# weights_only=False unpickles arbitrary objects; only load checkpoints you trust.
state = torch.load("standup_v1.pt", map_location="cuda:0", weights_only=False)
sd = state["actor_state_dict"]

# Linear layers sit at even indices of the saved `mlp` Sequential (0, 2, 4, 6).
weights = [sd[f"mlp.{i}.weight"] for i in (0, 2, 4, 6)]
obs_dim = weights[0].shape[1]
act_dim = weights[-1].shape[0]  # 18 for full-body recovery

# Rebuild the MLP from the stored shapes so the unequal hidden widths
# (512-256-128) load without a size mismatch.
layers = []
for w in weights:
    layers += [nn.Linear(w.shape[1], w.shape[0]), nn.ELU()]
actor = nn.Sequential(*layers[:-1]).cuda().eval()  # drop the ELU after the output layer
actor.load_state_dict(
    {k.replace("mlp.", "", 1): v for k, v in sd.items() if k.startswith("mlp.")}
)
```

For a full integration example (drop fallen robot into warehouse, run standup, verify Z1 auto-folds), see [`stage4_joint_eval/standup_recovery.py`](https://github.com/aws300/go2_z1_warehouse/blob/main/go2_z1_warehouse/stage4_joint_eval/standup_recovery.py).

## Training data

On-policy RL, so there is no offline dataset. The Isaac Lab task is registered as `Isaac-Standup-Go2Z1-v0` and lives at:

- Repo:
- Task config: `go2_z1_warehouse/stage5_standup/standup_env_cfg.py`
- Training launcher: `go2_z1_warehouse/stage5_standup/train_launcher.py`

## Citation

```bibtex
@misc{go2z1-standup-v1,
  title  = {Go2+Z1 Standup Recovery Policy (RL, full 18-DOF)},
  author = {m3},
  year   = {2026},
  url    = {https://huggingface.co/m3/go2z1-standup-rl-v1}
}
```