---
license: mit
tags:
  - reinforcement-learning
  - robotics
  - isaac-lab
  - amp
  - humanoid
  - motion-imitation
  - unitree-g1
library_name: skrl
pipeline_tag: reinforcement-learning
---

# G1 Humanoid Motion Imitation (AMP) - Step 2 Physics Calibration

Reinforcement learning policy for the **Unitree G1 humanoid robot** to imitate 494 human motion capture sequences from the AMASS dataset, trained using **Adversarial Motion Priors (AMP)** in **Isaac Lab**.

This is the **Step 2 (Physics Calibration)** checkpoint: pure tracking mode with curriculum domain randomization, before style injection.

## Model Details

| Parameter | Value |
|---|---|
| **Robot** | Unitree G1 (37 DOFs, 23 active via DOF mask) |
| **Algorithm** | AMP (Adversarial Motion Priors) via skrl |
| **Framework** | Isaac Lab 2.3.0 / Isaac Sim 5.1.0 |
| **Motion Dataset** | 494 AMASS motions (~113 min, 196,642 frames) |
| **Training Mode** | Pure tracking (tracking=1.0, style=0.0, discriminator OFF) |
| **Training Hardware** | NVIDIA RTX 4080 SUPER (16GB VRAM) |
| **Training Duration** | ~5.5 days (~12.4M timesteps total, best at 3.85M) |

## Performance (Best Checkpoint)

| Metric | Value |
|---|---|
| **Total Reward (mean)** | 160.29 |
| **Total Reward (max)** | 285.12 |
| **Episode Length (mean)** | 396.9 / 400 steps |
| **Best Checkpoint Step** | 3,850,240 |

### Tracking Reward

The tracking reward applies an exponential kernel, `exp(-error / 0.2)`, to a weighted combination of pose errors. It ranges from 0.0 (poor, approached as the error grows) to 1.0 (perfect match).
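As a rough illustration of the kernel, the sketch below evaluates `exp(-error / 0.2)` and inverts it to recover the weighted error implied by a given reward. The function names are hypothetical; the scale of 0.2 comes from the formula above, and how `error` itself is computed is left to the environment and not reproduced here.

```python
import math

SCALE = 0.2  # kernel width from exp(-error / 0.2)


def tracking_kernel(error: float, scale: float = SCALE) -> float:
    """Map a non-negative weighted pose error to a reward in (0, 1]."""
    return math.exp(-error / scale)


def implied_error(reward: float, scale: float = SCALE) -> float:
    """Invert the kernel: the error that would produce a given reward."""
    return -scale * math.log(reward)


if __name__ == "__main__":
    for err in (0.0, 0.1, 0.2, 0.5):
        print(f"error={err:.2f} -> reward={tracking_kernel(err):.3f}")
    print(f"reward=0.317 -> error={implied_error(0.317):.3f}")
```

An error equal to the scale (0.2) gives a reward of about 0.37, so rewards between 0.3 and 0.6 correspond to weighted errors of roughly 0.24 down to 0.10.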

| Metric | At Best Checkpoint | Peak (all time) |
|---|---|---|
| **Instantaneous Reward (mean)** | 0.400 | 0.402 (step 5.5M) |
| **Instantaneous Reward (max)** | 0.596 | 0.673 (step 8.8M) |
| **Tracking Reward (mean)** | 0.317 | 0.576 (step 0, easy motions) |
| **Tracking Reward (max)** | 0.585 | 0.623 (step 12M) |

The instantaneous reward mean of ~0.40 indicates the policy tracks the average motion with reasonable fidelity across all 494 diverse motions. The max of ~0.60 shows strong tracking on easier motions.
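A quick arithmetic check ties the episode-level and per-step numbers together. It assumes the total reward is the plain sum of per-step rewards over an episode, which this card does not state explicitly.

```python
# Values copied from the Performance table above.
total_reward_mean = 160.29
episode_length_mean = 396.9  # steps, out of a 400-step maximum

# Assumption: total reward = sum of per-step rewards over the episode.
per_step_reward = total_reward_mean / episode_length_mean
print(f"mean reward per step: {per_step_reward:.3f}")

# Fraction of the maximum episode length reached on average.
completion_fraction = episode_length_mean / 400
print(f"mean episode completion: {completion_fraction:.1%}")
```

Under that assumption the result is about 0.40 per step, consistent with the reported instantaneous reward mean of 0.400.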

## Architecture

- **Policy**: Gaussian MLP (1024 → 512 → 23), fixed log_std = -2.9
- **Value**: Deterministic MLP (1024 → 512 → 1)
- **Discriminator**: Deterministic MLP (1024 → 512 → 1) with ELU (disabled in Step 2)
- **Observation Space**: 216 dimensions (joint pos/vel, root state, future reference targets)
- **Action Space**: 23 dimensions (joint position targets, scaled by 0.5)
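For a sense of model size, the sketch below counts parameters for the policy and value networks. It assumes fully connected layers with biases, a 216-dimensional input (the observation space), and hidden sizes of 1024 and 512 as listed above. The helper name is hypothetical, and the discriminator is omitted because its input dimension is not stated here.

```python
def mlp_param_count(layer_sizes: list[int]) -> int:
    """Count weights and biases of a fully connected MLP."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Assumed layouts: observation (216) -> 1024 -> 512 -> output.
policy_params = mlp_param_count([216, 1024, 512, 23])
value_params = mlp_param_count([216, 1024, 512, 1])

print(f"policy: {policy_params:,} parameters")
print(f"value:  {value_params:,} parameters")
# float32 storage is 4 bytes per parameter.
print(f"policy weights: {policy_params * 4 / 1e6:.2f} MB")
```

Under these assumptions the policy has roughly 0.76M parameters, about 3 MB in float32, which is in the same range as the 2.9 MB listed for `policy_jit.pt`.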

## Training Configuration

- **Environments**: 1024 parallel
- **Rollouts**: 16 steps
- **Learning Rate**: 2.5e-5
- **Discount Factor**: 0.99
- **GAE Lambda**: 0.95
- **Mini-batches**: 2
- **Learning Epochs**: 6
- **PPO Clip**: 0.2
- **Physics dt**: 0.005s (200Hz), decimation=4, 50Hz control
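Several useful quantities follow directly from the settings above. The sketch below derives them; the variable names are illustrative, and the mini-batch size assumes each update splits the collected rollout evenly across the configured number of mini-batches.

```python
num_envs = 1024
rollout_steps = 16
mini_batches = 2
learning_epochs = 6
physics_dt = 0.005        # seconds per physics step (200 Hz)
decimation = 4            # physics steps per control step
max_episode_steps = 400   # from the Performance table

control_dt = physics_dt * decimation
control_hz = 1.0 / control_dt

# Transitions collected per policy update across all environments.
samples_per_update = num_envs * rollout_steps
# Assumption: the rollout is split evenly into mini-batches.
mini_batch_size = samples_per_update // mini_batches
gradient_steps_per_update = mini_batches * learning_epochs

# Wall-clock duration simulated by one full-length episode.
episode_seconds = max_episode_steps * control_dt

print(f"control rate: {control_hz:.0f} Hz")
print(f"samples per update: {samples_per_update}")
print(f"mini-batch size: {mini_batch_size}")
print(f"gradient steps per update: {gradient_steps_per_update}")
print(f"episode duration: {episode_seconds:.1f} s")
```

A full 400-step episode therefore covers about 8 seconds of simulated motion at the 50 Hz control rate.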

### Domain Randomization (Curriculum)

Linearly interpolated from initial to target ranges over 240k iterations:

| Parameter | Initial Range | Target Range |
|---|---|---|
| Mass | (0.95, 1.05) | (0.8, 1.2) |
| Friction | (0.9, 1.1) | (0.6, 1.4) |
| PD Gains | (0.9, 1.1) | (0.7, 1.3) |
| Action Delay | (0, 1) | (0, 2) steps |
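The linear curriculum in the table can be sketched as follows. This is a minimal illustration, not the environment's actual code: the function and dictionary names are hypothetical, progress is clamped to the 240k-iteration window, and the way the fractional action-delay bound is rounded to whole steps is not specified here and is left untouched.

```python
def lerp_range(initial, target, iteration, total_iterations=240_000):
    """Linearly interpolate a (low, high) range, with progress clamped to [0, 1]."""
    progress = min(max(iteration / total_iterations, 0.0), 1.0)
    low = initial[0] + progress * (target[0] - initial[0])
    high = initial[1] + progress * (target[1] - initial[1])
    return low, high


# (initial range, target range) pairs copied from the table above.
CURRICULUM = {
    "mass": ((0.95, 1.05), (0.8, 1.2)),
    "friction": ((0.9, 1.1), (0.6, 1.4)),
    "pd_gains": ((0.9, 1.1), (0.7, 1.3)),
    "action_delay_steps": ((0, 1), (0, 2)),
}

if __name__ == "__main__":
    for iteration in (0, 120_000, 240_000):
        ranges = {
            name: lerp_range(initial, target, iteration)
            for name, (initial, target) in CURRICULUM.items()
        }
        print(iteration, ranges)
```

Halfway through the schedule, for example, the mass range sits midway between its initial and target bounds, at approximately (0.875, 1.125).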

### Reward Weights

| Component | Weight |
|---|---|
| Tracking | 1.0 |
| Action Rate Penalty | 0.01 |
| Termination (height < 0.6m) | -200.0 |
| Style (discriminator) | 0.0 (disabled) |

### Tracking Metric Weights

| Component | Weight |
|---|---|
| Root Rotation | 0.4 |
| End Effector Position | 0.3 |
| Joint Position | 0.2 |
| Root Position XY | 0.1 |
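One way to read these weights together with the `exp(-error / 0.2)` kernel is sketched below. It assumes each component error is already a non-negative scalar; the units and any normalization of the individual errors are not given in this card, and the dictionary keys and function name are hypothetical.

```python
import math

# Weights from the table above; they sum to 1.0.
TRACKING_WEIGHTS = {
    "root_rotation": 0.4,
    "end_effector_position": 0.3,
    "joint_position": 0.2,
    "root_position_xy": 0.1,
}


def tracking_reward(errors: dict[str, float], scale: float = 0.2) -> float:
    """Weighted sum of per-component errors passed through exp(-error / scale)."""
    weighted_error = sum(
        weight * errors[name] for name, weight in TRACKING_WEIGHTS.items()
    )
    return math.exp(-weighted_error / scale)


if __name__ == "__main__":
    # Hypothetical per-component errors, for illustration only.
    example = {
        "root_rotation": 0.1,
        "end_effector_position": 0.2,
        "joint_position": 0.3,
        "root_position_xy": 0.4,
    }
    print(f"tracking reward: {tracking_reward(example):.3f}")
```

Because root rotation carries the largest weight, the same raw error costs four times as much there as in the root XY position term.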

## Usage

### Evaluation

```bash
# Inside Isaac Lab Docker container
cd /workspace/isaaclab
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/play.py \
    --task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
    --algorithm AMP --num_envs 16 \
    --checkpoint /path/to/best_agent.pt
```

### Record Video

```bash
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/play.py \
    --task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
    --algorithm AMP --num_envs 16 \
    --checkpoint /path/to/best_agent.pt \
    --video --video_length 500
```

### Resume Training

```bash
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/train.py \
    --task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
    --algorithm AMP --num_envs 1024 --headless \
    --checkpoint /path/to/best_agent.pt
```

## Files

```
├── best_agent.pt          # Full checkpoint: policy + value + discriminator + optimizer (25 MB)
├── policy_jit.pt          # JIT-traced policy only, for deployment/inference (2.9 MB)
├── params/
│   ├── agent.yaml         # skrl agent configuration
│   └── env.yaml           # Environment configuration
└── README.md              # This model card
```

### JIT Model

`policy_jit.pt` is a TorchScript-traced policy network (input: 216-dim observation, output: 23-dim joint targets). It runs without skrl or Isaac Lab dependencies:

```python
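import torch

model = torch.jit.load("policy_jit.pt")
model.eval()

obs = torch.randn(1, 216)  # [batch, obs_dim]; random input as a placeholder
with torch.no_grad():      # inference only, no gradients needed
    actions = model(obs)   # [batch, 23], joint position targets (scale by 0.5)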
```

Use `best_agent.pt` to resume training. Use `policy_jit.pt` for deployment or sim-to-real transfer.

## Three-Step Training Strategy

This checkpoint is from **Step 2** of a three-step curriculum:

| Step | Goal | Discriminator | Status |
|---|---|---|---|
| 1. Verification | Physics check (50 easy motions) | OFF | Complete |
| **2. Physics Calibration** | **Master all 494 motions** | **OFF** | **This checkpoint** |
| 3. Style Injection | Add natural motion style | ON | Pending |

## Citation

```bibtex
@misc{pathonai2026g1imitate,
  title={G1 Humanoid Motion Imitation with AMP in Isaac Lab},
  author={PathOn-AI},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/PathOn-AI/g1-imitate-isaaclab-amp}
}
```

## License

MIT