---
license: apache-2.0
tags:
  - reinforcement-learning
  - robotics
  - quadruped
  - manipulation
  - recovery
  - isaac-lab
  - ppo
  - rsl-rl
library_name: rsl-rl
pipeline_tag: reinforcement-learning
---

# Go2+Z1 Standup Recovery Policy (RL, full 18-DOF)

PPO policy that lets the **Unitree Go2 + Z1** composite robot recover from arbitrary fallen poses by using its **legs and Z1 arm together as a self-righting kinematic chain**, then automatically folds the Z1 back to the carry pose once the trunk is upright.

## Behaviour

1. Reset spawns the robot at `trunk_z = 0.18 m` with random orientation (full quaternion sampled by `reset_root_state_with_random_orientation`)
2. Policy commands all 18 joints (12 leg + 6 arm) for up to 5 s
3. Reward favours a lifted trunk, projected gravity aligned with -Z (upright trunk) and the Z1 close to its `Z1_FOLDED_DEFAULT` pose, plus a sparse +10 success bonus when all three are satisfied
4. Episode ends if `trunk_z < 0.05 m` (collapsed) or on time-out (5 s)
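The reset, orientation and termination logic can be illustrated with a small standalone sketch (plain Python, no Isaac Lab). It assumes Isaac Lab's (w, x, y, z) quaternion order and samples a uniform orientation by normalising a 4-D Gaussian; that is a standard technique, but we have not checked that `reset_root_state_with_random_orientation` is implemented exactly this way. The helper names are hypothetical. The second function also shows why the `upright_alignment` reward reduces to a simple expression in the trunk quaternion.

```python
import math
import random


def random_unit_quaternion(rng: random.Random) -> tuple:
    """Uniformly random orientation as a (w, x, y, z) unit quaternion (hypothetical helper).

    A 4-D isotropic Gaussian sample, once normalised, is uniform on the unit
    3-sphere, which gives a uniform distribution over 3-D rotations."""
    while True:
        q = [rng.gauss(0.0, 1.0) for _ in range(4)]
        norm = math.sqrt(sum(c * c for c in q))
        if norm > 1e-8:  # reject the (practically impossible) near-zero sample
            return tuple(c / norm for c in q)


def upright_alignment(q) -> float:
    """clamp(-projected_gravity_b[2], 0, 1) for a (w, x, y, z) trunk quaternion.

    Rotating world gravity (0, 0, -1) into the body frame gives a z-component of
    -(1 - 2(x^2 + y^2)), so -projected_gravity_b[2] = 1 - 2(x^2 + y^2)."""
    _, x, y, _ = q
    return min(max(1.0 - 2.0 * (x * x + y * y), 0.0), 1.0)


def trunk_collapsed(trunk_z: float, threshold: float = 0.05) -> bool:
    """Early-termination check from the Behaviour list: trunk below 5 cm."""
    return trunk_z < threshold


rng = random.Random(0)
q0 = random_unit_quaternion(rng)                # a random reset orientation
print(upright_alignment((1.0, 0.0, 0.0, 0.0)))  # upright trunk: 1.0
print(upright_alignment((0.0, 1.0, 0.0, 0.0)))  # upside down (180 deg about x): 0.0
print(trunk_collapsed(0.18))                    # spawn height: False
```

Because the term only depends on x and y, any yaw (rotation about the world Z axis) leaves it unchanged, which is what an "upright" signal should do.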

## Highlights

- 4096 parallel envs × 3000 PPO iters
- Mean reward climbs from 0.6 (random initial policy) to 89+ (final iteration)
- `standup_success` sparse bonus reaches 8.6 per episode, i.e. ≈86 % of timesteps satisfy the success criterion given its +10 weight
- `trunk_collapsed` termination rate ≈ 0: the trunk almost never drops below 0.05 m, so episodes run to time-out
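The 8.6 to ≈86 % conversion holds if the logged value is computed the way we assume Isaac Lab's reward manager does it: each step adds `weight × value × dt`, and the episode sum is divided by the maximum episode length in seconds. A sparse term with weight +10 then logs `10 × (fraction of the episode spent in success)`. The sketch below reproduces that arithmetic with a hypothetical `dt` and a hypothetical episode; `logged_episode_reward` is an illustrative helper, not an Isaac Lab function.

```python
def logged_episode_reward(per_step_values, weight, dt, max_episode_length_s):
    """Assumed logging convention: accumulate weight * value * dt every step, then
    divide the episode sum by the maximum episode length in seconds."""
    return sum(weight * v * dt for v in per_step_values) / max_episode_length_s


dt = 0.02                       # hypothetical control step in seconds (not stated in this card)
episode_s = 5.0                 # episode budget from the Behaviour section
steps = round(episode_s / dt)   # 250 control steps per episode

# Hypothetical episode: the success criterion holds from step 35 onwards (215 of 250 steps)
success = [0.0] * 35 + [1.0] * (steps - 35)
value = logged_episode_reward(success, weight=10.0, dt=dt, max_episode_length_s=episode_s)
print(f"{value:.2f}")           # 8.60, i.e. 10 x 215/250
```

Strictly, this is a fraction of the 5 s episode budget; it equals the fraction of timesteps only when episodes run to time-out, which is the case here given the ≈0 collapse rate.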

## Architecture

Same rsl-rl actor-critic shape as our walking policies: an MLP with three hidden layers (512-256-128, ELU activations). Action dim = 18 (12 leg + 6 arm joints).
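The layer layout also explains the key names used in the Usage snippet: in an `nn.Sequential` of Linear/ELU pairs, the Linear layers sit at indices 0, 2 and 4 and the 18-way output at index 6, hence `mlp.6.weight`. The hypothetical pure-Python helper below derives those indices and the actor's parameter count. The observation dimension is not stated in this card, so it stays an argument; the count works out to 512·obs_dim + 167,058.

```python
def mlp_layout(obs_dim: int, hidden=(512, 256, 128), act_dim: int = 18):
    """Return (indices of the Linear layers inside nn.Sequential, total parameter count)."""
    dims = [obs_dim, *hidden, act_dim]
    # Every Linear except the last is followed by an ELU, so Linear layers sit at even indices
    linear_idx = [2 * i for i in range(len(dims) - 1)]
    # Each Linear(d_in, d_out) holds a d_out x d_in weight matrix plus a d_out bias vector
    params = sum(d_in * d_out + d_out for d_in, d_out in zip(dims[:-1], dims[1:]))
    return linear_idx, params


idx, n_params = mlp_layout(obs_dim=100)  # obs_dim=100 is only an example value
print(idx)       # [0, 2, 4, 6]
print(n_params)  # 218258 = 512 * 100 + 167058
```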

## Reward composition

```python
# Reward terms and their weights; each term's expression is given as a comment
REWARD_WEIGHTS = {
    "trunk_height_reward": +5.0,     # clamp(z / 0.32, 0, 1)
    "upright_alignment":   +3.0,     # clamp(-projected_gravity_b[2], 0, 1)
    "z1_fold":             +2.0,     # exp(-||z1_pos - Z1_FOLDED_DEFAULT||)
    "standup_success":     +10.0,    # sparse: 1 if z > 0.28 and upright > 0.92 and fold_err < 0.3
    "action_rate_l2":      -0.005,
    "joint_acc_l2":        -2.5e-7,
    "joint_torques_l2":    -1e-5,
}
```
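The positive terms above can be sketched per environment in plain Python. Assumptions: `z1_pos` and `Z1_FOLDED_DEFAULT` are the six Z1 joint positions and the fold error is their Euclidean distance (the ‖·‖ suggests this, but the card does not say which space); the actual folded-pose values are not given, so the example uses a zero placeholder; the three penalty terms are omitted. `reward_terms` and `positive_reward` are hypothetical names, not functions from the task config.

```python
import math

WEIGHTS = {"trunk_height_reward": 5.0, "upright_alignment": 3.0,
           "z1_fold": 2.0, "standup_success": 10.0}


def reward_terms(trunk_z, proj_grav_z, z1_pos, z1_folded_default):
    """Unweighted positive reward terms for one environment step."""
    height = min(max(trunk_z / 0.32, 0.0), 1.0)
    upright = min(max(-proj_grav_z, 0.0), 1.0)
    fold_err = math.dist(z1_pos, z1_folded_default)  # assumed: L2 error in joint space
    fold = math.exp(-fold_err)
    # Sparse bonus: all three conditions must hold at the same time
    success = 1.0 if (trunk_z > 0.28 and upright > 0.92 and fold_err < 0.3) else 0.0
    return {"trunk_height_reward": height, "upright_alignment": upright,
            "z1_fold": fold, "standup_success": success}


def positive_reward(terms):
    """Weighted sum of the positive terms (penalties omitted)."""
    return sum(WEIGHTS[name] * value for name, value in terms.items())


folded = [0.0] * 6  # placeholder: the real Z1_FOLDED_DEFAULT values are not given here
t = reward_terms(trunk_z=0.30, proj_grav_z=-0.98, z1_pos=[0.05] * 6, z1_folded_default=folded)
print(t["standup_success"])  # 1.0: z > 0.28, upright 0.98 > 0.92, fold_err ~0.12 < 0.3
```

With every positive term saturated the per-step positive reward is 5 + 3 + 2 + 10 = 20, so the sparse bonus alone is worth as much as the three dense terms combined.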

## Files

- `standup_v1.pt` — rsl-rl `OnPolicyRunner` checkpoint

## Usage

```python
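import torch
import torch.nn as nn

state = torch.load("standup_v1.pt", map_location="cuda:0", weights_only=False)
sd = state["actor_state_dict"]

# Read every layer size from the checkpoint: the hidden layers are 512-256-128,
# so they cannot all share a single width
obs_dim = sd["mlp.0.weight"].shape[1]
h1 = sd["mlp.0.weight"].shape[0]
h2 = sd["mlp.2.weight"].shape[0]
h3 = sd["mlp.4.weight"].shape[0]
act_dim = sd["mlp.6.weight"].shape[0]   # 18 for full-body recovery (12 leg + 6 arm)

actor = nn.Sequential(
    nn.Linear(obs_dim, h1), nn.ELU(),
    nn.Linear(h1, h2), nn.ELU(),
    nn.Linear(h2, h3), nn.ELU(),
    nn.Linear(h3, act_dim),
).cuda().eval()
# Strip the "mlp." prefix so keys match the nn.Sequential indices ("0.weight", "2.bias", ...)
actor.load_state_dict({k[len("mlp."):]: v for k, v in sd.items() if k.startswith("mlp.")})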
```
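If the hidden sizes ever change, a loader that reads the whole layer layout from the checkpoint avoids hard-coding indices. The sketch below assumes the `mlp.<index>.weight` / `mlp.<index>.bias` naming and the Linear/ELU alternation used in the snippet above. `build_actor` is a hypothetical helper, and the 64-dim observation in the self-check is a placeholder, since this card does not state the observation size.

```python
import re

import torch
import torch.nn as nn


def build_actor(sd: dict, prefix: str = "mlp.") -> nn.Sequential:
    """Rebuild a Linear/ELU actor from a state dict with keys like 'mlp.<idx>.weight'."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)\.weight")
    idx = sorted(int(m.group(1)) for m in map(pattern.fullmatch, sd) if m)
    # Expect Linear layers at 0, 2, 4, ... (each hidden Linear followed by an ELU)
    assert idx == list(range(0, 2 * len(idx), 2)), f"unexpected layer layout: {idx}"
    layers = []
    for n, i in enumerate(idx):
        out_f, in_f = sd[f"{prefix}{i}.weight"].shape
        layers.append(nn.Linear(in_f, out_f))
        if n < len(idx) - 1:  # no activation after the output layer
            layers.append(nn.ELU())
    actor = nn.Sequential(*layers)
    actor.load_state_dict({k[len(prefix):]: v for k, v in sd.items() if k.startswith(prefix)})
    return actor.eval()


# Self-check with a randomly initialised 512-256-128 network standing in for the checkpoint
torch.manual_seed(0)
ref = nn.Sequential(nn.Linear(64, 512), nn.ELU(), nn.Linear(512, 256), nn.ELU(),
                    nn.Linear(256, 128), nn.ELU(), nn.Linear(128, 18))
fake_sd = {f"mlp.{k}": v for k, v in ref.state_dict().items()}
actor = build_actor(fake_sd)
obs = torch.randn(4, 64)             # batch of 4 placeholder observations
with torch.no_grad():
    actions = actor(obs)             # shape (4, 18): one 18-D action per environment
```

For the real checkpoint, the same helper would be called on `torch.load("standup_v1.pt", map_location="cpu", weights_only=False)["actor_state_dict"]`.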

For a full integration example (drop a fallen robot into the warehouse, run the standup policy, and verify that the Z1 auto-folds), see [`stage4_joint_eval/standup_recovery.py`](https://github.com/aws300/go2_z1_warehouse/blob/main/go2_z1_warehouse/stage4_joint_eval/standup_recovery.py).

## Training data

On-policy RL — no offline dataset. The Isaac Lab task is registered as `Isaac-Standup-Go2Z1-v0` and lives at:

- Repo: <https://github.com/aws300/go2_z1_warehouse>
- Task config: `go2_z1_warehouse/stage5_standup/standup_env_cfg.py`
- Training launcher: `go2_z1_warehouse/stage5_standup/train_launcher.py`

## Citation

```bibtex
@misc{go2z1-standup-v1,
  title  = {Go2+Z1 Standup Recovery Policy (RL, full 18-DOF)},
  author = {m3},
  year   = {2026},
  url    = {https://huggingface.co/m3/go2z1-standup-rl-v1}
}
```