---
license: mit
tags:
  - reinforcement-learning
  - robotics
  - isaac-lab
  - amp
  - humanoid
  - motion-imitation
  - unitree-g1
library_name: skrl
pipeline_tag: reinforcement-learning
---

# G1 Humanoid Motion Imitation (AMP) - Step 2 Physics Calibration

Reinforcement learning policy for the **Unitree G1 humanoid robot** to imitate 494 human motion capture sequences from the AMASS dataset, trained using **Adversarial Motion Priors (AMP)** in **Isaac Lab**.

This is the **Step 2 (Physics Calibration)** checkpoint — pure tracking mode with curriculum domain randomization, before style injection.

## Model Details

| Parameter | Value |
|---|---|
| **Robot** | Unitree G1 (37 DOFs, 23 active via DOF mask) |
| **Algorithm** | AMP (Adversarial Motion Priors) via skrl |
| **Framework** | Isaac Lab 2.3.0 / Isaac Sim 5.1.0 |
| **Motion Dataset** | 494 AMASS motions (~113 min, 196,642 frames) |
| **Training Mode** | Pure tracking (tracking=1.0, style=0.0, discriminator OFF) |
| **Training Hardware** | NVIDIA RTX 4080 SUPER (16GB VRAM) |
| **Training Duration** | ~5.5 days (~12.4M timesteps total, best at 3.85M) |

## Performance (Best Checkpoint)

| Metric | Value |
|---|---|
| **Total Reward (mean)** | 160.29 |
| **Total Reward (max)** | 285.12 |
| **Episode Length (mean)** | 396.9 / 400 steps |
| **Best Checkpoint Step** | 3,850,240 |

### Tracking Reward

The tracking reward applies an exponential kernel, `exp(-error / 0.2)`, to a weighted combination of pose errors. It approaches 0.0 for poor tracking and equals 1.0 for a perfect match.

| Metric | At Best Checkpoint | Peak (all time) |
|---|---|---|
| **Instantaneous Reward (mean)** | 0.400 | 0.402 (step 5.5M) |
| **Instantaneous Reward (max)** | 0.596 | 0.673 (step 8.8M) |
| **Tracking Reward (mean)** | 0.317 | 0.576 (step 0 — easy motions) |
| **Tracking Reward (max)** | 0.585 | 0.623 (step 12M) |

The mean instantaneous reward of ~0.40 indicates that, averaged over all 494 motions, the policy follows the reference with reasonable fidelity. The maximum of ~0.60 shows noticeably closer tracking on the easier motions.

## Architecture

- **Policy**: Gaussian MLP (1024 → 512 → 23), fixed log_std = -2.9
- **Value**: Deterministic MLP (1024 → 512 → 1)
- **Discriminator**: Deterministic MLP (1024 → 512 → 1) with ELU (disabled in Step 2)
- **Observation Space**: 216 dimensions (joint pos/vel, root state, future reference targets)
- **Action Space**: 23 dimensions (joint position targets, scaled by 0.5)
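
As a rough sanity check on the layer sizes above, the sketch below counts the parameters of a fully connected network with these widths. It assumes the 216-dimensional observation feeds the first 1024-unit layer directly and that every layer carries a bias; neither detail is stated in this card. `mlp_param_count` is a hypothetical helper, not part of skrl or Isaac Lab.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected MLP (assumes one bias per unit)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


OBS_DIM, ACT_DIM = 216, 23  # from the Observation Space and Action Space entries

# Assumed layout: observation -> 1024 -> 512 -> output
policy_params = mlp_param_count([OBS_DIM, 1024, 512, ACT_DIM])
value_params = mlp_param_count([OBS_DIM, 1024, 512, 1])

print(policy_params)  # 758807
print(value_params)   # 747521

# Approximate float32 footprint of the policy weights, in MiB
policy_mib = policy_params * 4 / 2**20
print(round(policy_mib, 2))
```

Under these assumptions the policy weights take roughly 2.9 MiB in float32, which is in line with the size listed for `policy_jit.pt` under Files.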

## Training Configuration

- **Environments**: 1024 parallel
- **Rollouts**: 16 steps
- **Learning Rate**: 2.5e-5
- **Discount Factor**: 0.99
- **GAE Lambda**: 0.95
- **Mini-batches**: 2
- **Learning Epochs**: 6
- **PPO Clip**: 0.2
- **Physics dt**: 0.005 s (200 Hz), decimation = 4, giving a 50 Hz control rate
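
The timing and batch figures in this list are related by simple arithmetic, sketched below. The 400-step episode limit comes from the Performance table. The mini-batch size assumes the rollout buffer is split evenly across the two mini-batches, which this card does not state explicitly.

```python
PHYSICS_DT = 0.005        # seconds per physics step
DECIMATION = 4            # physics steps per policy action
NUM_ENVS = 1024
ROLLOUT_STEPS = 16
MINI_BATCHES = 2
MAX_EPISODE_STEPS = 400   # from the Performance table

control_dt = PHYSICS_DT * DECIMATION          # seconds per policy action
control_hz = 1.0 / control_dt                 # policy actions per second
episode_seconds = MAX_EPISODE_STEPS * control_dt

# Transitions collected across all environments before each update
transitions_per_rollout = NUM_ENVS * ROLLOUT_STEPS
# Assumption: the rollout is divided evenly among mini-batches
mini_batch_size = transitions_per_rollout // MINI_BATCHES

print(control_hz, episode_seconds, transitions_per_rollout, mini_batch_size)
```

A full-length episode therefore covers about 8 seconds of simulated motion.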

### Domain Randomization (Curriculum)

Each randomization range is linearly interpolated from its initial to its target value over 240k iterations:

| Parameter | Initial Range | Target Range |
|---|---|---|
| Mass | (0.95, 1.05) | (0.8, 1.2) |
| Friction | (0.9, 1.1) | (0.6, 1.4) |
| PD Gains | (0.9, 1.1) | (0.7, 1.3) |
| Action Delay | (0, 1) | (0, 2) steps |
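
The schedule in the table can be sketched as a linear interpolation between the two ranges. This is a minimal illustration, not the environment's actual code: `lerp_range` and `ranges_at` are hypothetical names, holding the target range after 240k iterations is an assumption, and the integer action delay is left out because its rounding rule is not described here.

```python
CURRICULUM_ITERATIONS = 240_000

# (initial range, target range) copied from the table above
CURRICULUM = {
    "mass": ((0.95, 1.05), (0.8, 1.2)),
    "friction": ((0.9, 1.1), (0.6, 1.4)),
    "pd_gains": ((0.9, 1.1), (0.7, 1.3)),
}


def lerp_range(initial, target, iteration, total=CURRICULUM_ITERATIONS):
    """Linearly blend a (low, high) range; clamps progress to [0, 1] (assumed)."""
    alpha = min(max(iteration / total, 0.0), 1.0)
    low = initial[0] + alpha * (target[0] - initial[0])
    high = initial[1] + alpha * (target[1] - initial[1])
    return low, high


def ranges_at(iteration):
    """Return the randomization range of every parameter at a given iteration."""
    return {name: lerp_range(init, tgt, iteration) for name, (init, tgt) in CURRICULUM.items()}


halfway = ranges_at(120_000)
print(halfway["mass"], halfway["friction"])
```

Halfway through the schedule each range sits midway between its initial and target bounds, so the policy sees gradually harder dynamics rather than the full spread from the start.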

### Reward Weights

| Component | Weight |
|---|---|
| Tracking | 1.0 |
| Action Rate Penalty | 0.01 |
| Termination (height < 0.6m) | -200.0 |
| Style (discriminator) | 0.0 (disabled) |
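
One plausible way these weights combine into a per-step reward is sketched below. Several details are assumptions rather than facts from this card: the action-rate term is taken to be the summed squared change in action and is subtracted, and the termination value is added once on the step where the root height drops below 0.6 m. `step_reward` is a hypothetical helper.

```python
W_TRACKING = 1.0
W_ACTION_RATE = 0.01        # assumed to be subtracted, since it is listed as a penalty
W_STYLE = 0.0               # discriminator disabled in Step 2
TERMINATION_REWARD = -200.0
MIN_ROOT_HEIGHT = 0.6       # metres


def step_reward(tracking, action, prev_action, root_height, style=0.0):
    """Combine the reward terms for one control step (hypothetical composition)."""
    # Assumed action-rate term: squared change in action, summed over joints
    action_rate = sum((a - b) ** 2 for a, b in zip(action, prev_action))
    reward = W_TRACKING * tracking + W_STYLE * style - W_ACTION_RATE * action_rate
    terminated = root_height < MIN_ROOT_HEIGHT
    if terminated:
        reward += TERMINATION_REWARD
    return reward, terminated


steady, steady_done = step_reward(0.4, [0.0] * 23, [0.0] * 23, root_height=0.75)
fallen, fallen_done = step_reward(0.4, [0.1] * 23, [0.0] * 23, root_height=0.5)
print(steady, steady_done)
print(fallen, fallen_done)
```

With the style weight at 0.0, the discriminator output has no influence on the reward in this step, which matches the pure tracking mode described above.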

### Tracking Metric Weights

| Component | Weight |
|---|---|
| Root Rotation | 0.4 |
| End Effector Position | 0.3 |
| Joint Position | 0.2 |
| Root Position XY | 0.1 |
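
Putting the exponential kernel from the Tracking Reward section together with these weights gives the following minimal sketch. It assumes each component error is already a non-negative scalar; how the individual errors are measured (norms, units, which end effectors) is not specified in this card, and `tracking_reward` is a hypothetical name.

```python
import math

# Weights copied from the table above
TRACKING_WEIGHTS = {
    "root_rotation": 0.4,
    "end_effector_position": 0.3,
    "joint_position": 0.2,
    "root_position_xy": 0.1,
}
KERNEL_SCALE = 0.2  # denominator in exp(-error / 0.2)


def tracking_reward(errors):
    """Apply exp(-weighted_error / 0.2) to a dict of per-component errors."""
    weighted_error = sum(TRACKING_WEIGHTS[name] * errors[name] for name in TRACKING_WEIGHTS)
    return math.exp(-weighted_error / KERNEL_SCALE)


perfect = tracking_reward({name: 0.0 for name in TRACKING_WEIGHTS})
uniform = tracking_reward({name: 0.2 for name in TRACKING_WEIGHTS})
print(perfect, uniform)
```

Because the kernel scale is 0.2, a reward of 0.4 corresponds to a weighted error of roughly 0.18, and the reward falls to about 0.37 once the weighted error reaches 0.2.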

## Usage

### Evaluation

```bash
# Inside Isaac Lab Docker container
cd /workspace/isaaclab
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/play.py \
    --task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
    --algorithm AMP --num_envs 16 \
    --checkpoint /path/to/best_agent.pt
```

### Record Video

```bash
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/play.py \
    --task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
    --algorithm AMP --num_envs 16 \
    --checkpoint /path/to/best_agent.pt \
    --video --video_length 500
```

### Resume Training

```bash
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/train.py \
    --task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
    --algorithm AMP --num_envs 1024 --headless \
    --checkpoint /path/to/best_agent.pt
```

## Files

```
├── best_agent.pt          # Full checkpoint — policy + value + discriminator + optimizer (25 MB)
├── policy_jit.pt          # JIT-traced policy only — for deployment/inference (2.9 MB)
├── params/
│   ├── agent.yaml         # skrl agent configuration
│   └── env.yaml           # Environment configuration
└── README.md              # This model card
```

### JIT Model

`policy_jit.pt` is a TorchScript-traced policy network (input: 216-dim observation, output: 23-dim joint targets). It runs without skrl or Isaac Lab dependencies:

```python
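import torch

model = torch.jit.load("policy_jit.pt")
model.eval()

obs = torch.randn(1, 216)  # [batch, obs_dim]; random placeholder, not a real observation
with torch.no_grad():      # inference only, no gradients needed
    actions = model(obs)   # [batch, 23], joint position targets (scale by 0.5)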
```

Use `best_agent.pt` to resume training. Use `policy_jit.pt` for deployment or sim-to-real transfer.
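
For deployment, the 23 policy outputs still have to be turned into joint position commands. The sketch below shows one common convention, an offset from a default pose plus the 0.5 action scale. Whether this environment uses exactly that rule is an assumption; check `params/env.yaml` before relying on it. `actions_to_joint_targets` and the placeholder default pose are hypothetical.

```python
ACTION_SCALE = 0.5   # from the Action Space entry
NUM_ACTIONS = 23


def actions_to_joint_targets(actions, default_joint_pos, scale=ACTION_SCALE):
    """Map raw policy outputs to joint position targets (assumed offset-plus-scale rule)."""
    if len(actions) != len(default_joint_pos):
        raise ValueError("actions and default_joint_pos must have the same length")
    return [q0 + scale * a for q0, a in zip(default_joint_pos, actions)]


# Placeholder values; a real default pose would come from the environment configuration
default_pose = [0.0] * NUM_ACTIONS
raw_actions = [0.2] * NUM_ACTIONS
targets = actions_to_joint_targets(raw_actions, default_pose)
print(targets[:3])
```

On hardware, the resulting targets would typically also be clamped to the joint limits of the active DOFs before being sent to the PD controllers.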

## Three-Step Training Strategy

This checkpoint is from **Step 2** of a three-step curriculum:

| Step | Goal | Discriminator | Status |
|---|---|---|---|
| 1. Verification | Physics check (50 easy motions) | OFF | Complete |
| **2. Physics Calibration** | **Master all 494 motions** | **OFF** | **This checkpoint** |
| 3. Style Injection | Add natural motion style | ON | Pending |

## Citation

```bibtex
@misc{pathonai2026g1imitate,
  title={G1 Humanoid Motion Imitation with AMP in Isaac Lab},
  author={PathOn-AI},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/PathOn-AI/g1-imitate-isaaclab-amp}
}
```

## License

MIT