---
license: mit
tags:
- reinforcement-learning
- robotics
- isaac-lab
- amp
- humanoid
- motion-imitation
- unitree-g1
library_name: skrl
pipeline_tag: reinforcement-learning
---
# G1 Humanoid Motion Imitation (AMP) - Step 2 Physics Calibration
Reinforcement learning policy for the **Unitree G1 humanoid robot** to imitate 494 human motion capture sequences from the AMASS dataset, trained using **Adversarial Motion Priors (AMP)** in **Isaac Lab**.
This is the **Step 2 (Physics Calibration)** checkpoint: pure tracking mode with curriculum domain randomization, before style injection.
## Model Details
| Parameter | Value |
|---|---|
| **Robot** | Unitree G1 (37 DOFs, 23 active via DOF mask) |
| **Algorithm** | AMP (Adversarial Motion Priors) via skrl |
| **Framework** | Isaac Lab 2.3.0 / Isaac Sim 5.1.0 |
| **Motion Dataset** | 494 AMASS motions (~113 min, 196,642 frames) |
| **Training Mode** | Pure tracking (tracking=1.0, style=0.0, discriminator OFF) |
| **Training Hardware** | NVIDIA RTX 4080 SUPER (16GB VRAM) |
| **Training Duration** | ~5.5 days (~12.4M timesteps total, best at 3.85M) |
## Performance (Best Checkpoint)
| Metric | Value |
|---|---|
| **Total Reward (mean)** | 160.29 |
| **Total Reward (max)** | 285.12 |
| **Episode Length (mean)** | 396.9 / 400 steps |
| **Best Checkpoint Step** | 3,850,240 |
### Tracking Reward
The tracking reward is an exponential kernel (`exp(-error / 0.2)`) over a weighted combination of pose errors. Range is 0.0 (poor) to 1.0 (perfect match).
| Metric | At Best Checkpoint | Peak (all time) |
|---|---|---|
| **Instantaneous Reward (mean)** | 0.400 | 0.402 (step 5.5M) |
| **Instantaneous Reward (max)** | 0.596 | 0.673 (step 8.8M) |
| **Tracking Reward (mean)** | 0.317 | 0.576 (step 0, easy motions) |
| **Tracking Reward (max)** | 0.585 | 0.623 (step 12M) |
The mean instantaneous reward of ~0.40 is averaged over all 494 motions, so it indicates reasonable tracking fidelity across the full, diverse dataset rather than on a curated subset. The maximum of ~0.60 shows noticeably stronger tracking on the easier motions.
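To make the reward scale easier to read, the kernel can be inverted to find the weighted pose error that a given reward implies. The snippet below is a minimal sketch that assumes the reported tracking rewards are direct outputs of `exp(-error / 0.2)`; the function names are illustrative and are not taken from the training code.

```python
import math

SIGMA = 0.2  # kernel scale from exp(-error / 0.2)


def tracking_reward(error: float) -> float:
    """Map a non-negative weighted pose error to a reward in (0, 1]."""
    return math.exp(-error / SIGMA)


def implied_error(reward: float) -> float:
    """Invert the kernel: the weighted error that would yield this reward."""
    return -SIGMA * math.log(reward)


# Tracking reward (mean, max) at the best checkpoint, from the table above.
for r in (0.317, 0.585):
    print(f"reward {r:.3f} -> weighted error {implied_error(r):.3f}")
```

Under that assumption, the mean tracking reward of 0.317 corresponds to a weighted error of roughly 0.23, and the maximum of 0.585 to roughly 0.11.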
## Architecture
- **Policy**: Gaussian MLP (1024 → 512 → 23), fixed log_std = -2.9
- **Value**: Deterministic MLP (1024 → 512 → 1)
- **Discriminator**: Deterministic MLP (1024 → 512 → 1) with ELU (disabled in Step 2)
- **Observation Space**: 216 dimensions (joint pos/vel, root state, future reference targets)
- **Action Space**: 23 dimensions (joint position targets, scaled by 0.5)
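As a rough size check on the networks listed above, the sketch below counts weights and biases for the policy and value MLPs. It assumes plain fully connected layers with biases, a 216-dimensional input for both networks, and float32 storage; none of these details are confirmed by the model card beyond the layer widths, and `mlp_param_count` is a hypothetical helper.

```python
OBS_DIM = 216          # observation size from the model card
HIDDEN = (1024, 512)   # hidden layer widths from the model card


def mlp_param_count(in_dim: int, hidden: tuple, out_dim: int) -> int:
    """Count weights and biases of a fully connected MLP."""
    dims = (in_dim, *hidden, out_dim)
    return sum(a * b + b for a, b in zip(dims[:-1], dims[1:]))


policy_params = mlp_param_count(OBS_DIM, HIDDEN, 23)  # 23 joint targets
value_params = mlp_param_count(OBS_DIM, HIDDEN, 1)    # scalar value estimate

print(f"policy parameters: {policy_params:,}")
print(f"value parameters:  {value_params:,}")
print(f"policy as float32: {policy_params * 4 / 1e6:.2f} MB")
```

With these assumptions the policy comes to about 0.76M parameters, or about 3 MB as float32, which is in the same range as the 2.9 MB listed for `policy_jit.pt`.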
## Training Configuration
- **Environments**: 1024 parallel
- **Rollouts**: 16 steps
- **Learning Rate**: 2.5e-5
- **Discount Factor**: 0.99
- **GAE Lambda**: 0.95
- **Mini-batches**: 2
- **Learning Epochs**: 6
- **PPO Clip**: 0.2
- **Physics dt**: 0.005s (200Hz), decimation=4, 50Hz control
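The timing and batch sizes implied by the list above follow from simple arithmetic. The sketch below derives them, assuming the 400-step episode limit is counted in control steps and that each update splits the collected rollout evenly into mini-batches; the variable names are illustrative.

```python
PHYSICS_DT = 0.005    # seconds per physics step (200 Hz)
DECIMATION = 4        # physics steps per policy action
NUM_ENVS = 1024       # parallel environments
ROLLOUT_STEPS = 16    # steps collected per environment before each update
MINI_BATCHES = 2      # mini-batches per learning epoch
EPISODE_STEPS = 400   # episode limit from the performance table

control_dt = PHYSICS_DT * DECIMATION             # seconds per policy action
control_hz = 1.0 / control_dt                    # policy rate in Hz
episode_seconds = EPISODE_STEPS * control_dt     # simulated time per episode
transitions_per_update = NUM_ENVS * ROLLOUT_STEPS
mini_batch_size = transitions_per_update // MINI_BATCHES

print(f"control rate: {control_hz:.0f} Hz, episode: {episode_seconds:.1f} s")
print(f"transitions per update: {transitions_per_update}")
print(f"mini-batch size: {mini_batch_size}")
```

Under these assumptions, a full episode covers 8 seconds of simulated time and each update consumes 16,384 transitions in two mini-batches of 8,192.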
### Domain Randomization (Curriculum)
Linearly interpolated from initial to target ranges over 240k iterations:
| Parameter | Initial Range | Target Range |
|---|---|---|
| Mass | (0.95, 1.05) | (0.8, 1.2) |
| Friction | (0.9, 1.1) | (0.6, 1.4) |
| PD Gains | (0.9, 1.1) | (0.7, 1.3) |
| Action Delay | (0, 1) | (0, 2) steps |
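The linear schedule described above can be sketched as follows. This is an illustration only: `lerp_range` is a hypothetical helper, holding the range at its target after 240k iterations is an assumption, and the integer-valued action delay is left out because its rounding rule is not stated.

```python
def lerp_range(initial, target, iteration, total_iterations=240_000):
    """Linearly interpolate a (low, high) range, clamped to [initial, target]."""
    t = min(max(iteration / total_iterations, 0.0), 1.0)
    low = initial[0] + t * (target[0] - initial[0])
    high = initial[1] + t * (target[1] - initial[1])
    return (low, high)


# (initial range, target range) pairs from the table above.
CURRICULUM = {
    "mass": ((0.95, 1.05), (0.8, 1.2)),
    "friction": ((0.9, 1.1), (0.6, 1.4)),
    "pd_gains": ((0.9, 1.1), (0.7, 1.3)),
}

# Ranges halfway through the schedule.
for name, (initial, target) in CURRICULUM.items():
    low, high = lerp_range(initial, target, iteration=120_000)
    print(f"{name}: ({low:.3f}, {high:.3f})")
```

Halfway through the schedule, each range sits midway between its initial and target bounds, so the randomization widens gradually rather than in a single jump.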
### Reward Weights
| Component | Weight |
|---|---|
| Tracking | 1.0 |
| Action Rate Penalty | 0.01 |
| Termination (height < 0.6m) | -200.0 |
| Style (discriminator) | 0.0 (disabled) |
### Tracking Metric Weights
| Component | Weight |
|---|---|
| Root Rotation | 0.4 |
| End Effector Position | 0.3 |
| Joint Position | 0.2 |
| Root Position XY | 0.1 |
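The weights above can be combined into a single tracking reward roughly as follows. The model card does not state how each component error is measured (units, norms, or normalization), so this sketch assumes each one is already a non-negative scalar; the error values in the example are hypothetical.

```python
import math

# Weights from the table above; they sum to 1.0.
TRACKING_WEIGHTS = {
    "root_rotation": 0.4,
    "end_effector_position": 0.3,
    "joint_position": 0.2,
    "root_position_xy": 0.1,
}


def weighted_tracking_reward(errors: dict, sigma: float = 0.2) -> float:
    """Weight and sum the component errors, then apply exp(-error / sigma)."""
    combined = sum(w * errors[name] for name, w in TRACKING_WEIGHTS.items())
    return math.exp(-combined / sigma)


# Hypothetical per-component errors, for illustration only.
example_errors = {
    "root_rotation": 0.1,
    "end_effector_position": 0.2,
    "joint_position": 0.3,
    "root_position_xy": 0.4,
}
print(f"tracking reward: {weighted_tracking_reward(example_errors):.3f}")
```

Because root rotation carries the largest weight, an error there lowers the reward four times as much as the same error in root XY position.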
## Usage
### Evaluation
```bash
# Inside Isaac Lab Docker container
cd /workspace/isaaclab
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/play.py \
--task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
--algorithm AMP --num_envs 16 \
--checkpoint /path/to/best_agent.pt
```
### Record Video
```bash
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/play.py \
--task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
--algorithm AMP --num_envs 16 \
--checkpoint /path/to/best_agent.pt \
--video --video_length 500
```
### Resume Training
```bash
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/train.py \
--task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
--algorithm AMP --num_envs 1024 --headless \
--checkpoint /path/to/best_agent.pt
```
## Files
```
├── best_agent.pt      # Full checkpoint: policy + value + discriminator + optimizer (25 MB)
├── policy_jit.pt      # JIT-traced policy only, for deployment/inference (2.9 MB)
├── params/
│   ├── agent.yaml     # skrl agent configuration
│   └── env.yaml       # Environment configuration
└── README.md          # This model card
```
### JIT Model
`policy_jit.pt` is a TorchScript-traced policy network (input: 216-dim observation, output: 23-dim joint targets). It runs without skrl or Isaac Lab dependencies:
```python
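import torch

# Load the TorchScript policy; skrl and Isaac Lab are not required.
model = torch.jit.load("policy_jit.pt")
model.eval()
obs = torch.randn(1, 216)  # [batch, obs_dim]
with torch.no_grad():      # inference only, no gradients needed
    actions = model(obs)   # [batch, 23]: joint position targets (scale by 0.5)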
```
Use `best_agent.pt` to resume training. Use `policy_jit.pt` for deployment or sim-to-real transfer.
## Three-Step Training Strategy
This checkpoint is from **Step 2** of a three-step curriculum:
| Step | Goal | Discriminator | Status |
|---|---|---|---|
| 1. Verification | Physics check (50 easy motions) | OFF | Complete |
| **2. Physics Calibration** | **Master all 494 motions** | **OFF** | **This checkpoint** |
| 3. Style Injection | Add natural motion style | ON | Pending |
## Citation
```bibtex
@misc{pathonai2026g1imitate,
title={G1 Humanoid Motion Imitation with AMP in Isaac Lab},
author={PathOn-AI},
year={2026},
publisher={Hugging Face},
url={https://huggingface.co/PathOn-AI/g1-imitate-isaaclab-amp}
}
```
## License
MIT