---
license: mit
tags:
- reinforcement-learning
- robotics
- isaac-lab
- amp
- humanoid
- motion-imitation
- unitree-g1
library_name: skrl
pipeline_tag: reinforcement-learning
---

# G1 Humanoid Motion Imitation (AMP) - Step 2 Physics Calibration

Reinforcement learning policy for the **Unitree G1 humanoid robot** to imitate 494 human motion capture sequences from the AMASS dataset, trained using **Adversarial Motion Priors (AMP)** in **Isaac Lab**.

This is the **Step 2 (Physics Calibration)** checkpoint: pure tracking mode with curriculum domain randomization, before style injection.

## Model Details

| Parameter | Value |
|---|---|
| **Robot** | Unitree G1 (37 DOFs, 23 active via DOF mask) |
| **Algorithm** | AMP (Adversarial Motion Priors) via skrl |
| **Framework** | Isaac Lab 2.3.0 / Isaac Sim 5.1.0 |
| **Motion Dataset** | 494 AMASS motions (~113 min, 196,642 frames) |
| **Training Mode** | Pure tracking (tracking=1.0, style=0.0, discriminator OFF) |
| **Training Hardware** | NVIDIA RTX 4080 SUPER (16GB VRAM) |
| **Training Duration** | ~5.5 days (~12.4M timesteps total, best at 3.85M) |

## Performance (Best Checkpoint)

| Metric | Value |
|---|---|
| **Total Reward (mean)** | 160.29 |
| **Total Reward (max)** | 285.12 |
| **Episode Length (mean)** | 396.9 / 400 steps |
| **Best Checkpoint Step** | 3,850,240 |

### Tracking Reward

The tracking reward is an exponential kernel (`exp(-error / 0.2)`) over a weighted combination of pose errors. Range is 0.0 (poor) to 1.0 (perfect match).

| Metric | At Best Checkpoint | Peak (all time) |
|---|---|---|
| **Instantaneous Reward (mean)** | 0.400 | 0.402 (step 5.5M) |
| **Instantaneous Reward (max)** | 0.596 | 0.673 (step 8.8M) |
| **Tracking Reward (mean)** | 0.317 | 0.576 (step 0, easy motions) |
| **Tracking Reward (max)** | 0.585 | 0.623 (step 12M) |

The instantaneous reward mean of ~0.40 indicates the policy tracks the average motion with reasonable fidelity across all 494 diverse motions.
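
As a rough illustration of the kernel described in this section, the sketch below combines per-component errors using the weights from the Tracking Metric Weights table and applies `exp(-error / 0.2)`. The function name `tracking_reward`, the dictionary keys, and the assumption that each component error is a single non-negative scalar are hypothetical; the card does not specify how the environment measures or normalizes these errors.

```python
import math

# Weights copied from the "Tracking Metric Weights" table in this card.
TRACKING_WEIGHTS = {
    "root_rotation": 0.4,
    "end_effector_position": 0.3,
    "joint_position": 0.2,
    "root_position_xy": 0.1,
}
KERNEL_SCALE = 0.2  # denominator in exp(-error / 0.2)


def tracking_reward(errors):
    """Hypothetical helper: map per-component errors to a reward in (0, 1].

    Assumes every entry of `errors` is a non-negative scalar. The real
    environment may compute and combine the errors differently.
    """
    combined = sum(
        weight * errors[name] for name, weight in TRACKING_WEIGHTS.items()
    )
    return math.exp(-combined / KERNEL_SCALE)


# A perfect match (all errors zero) gives the maximum reward.
perfect = tracking_reward({name: 0.0 for name in TRACKING_WEIGHTS})
print(perfect)  # 1.0
```

Under these assumptions, a combined weighted error equal to the kernel scale (0.2) gives `exp(-1)`, roughly 0.37, so the reward falls off quickly as the weighted error grows past a few tenths.
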
The instantaneous reward max of ~0.60 shows strong tracking on easier motions.

## Architecture

- **Policy**: Gaussian MLP (1024 → 512 → 23), fixed log_std = -2.9
- **Value**: Deterministic MLP (1024 → 512 → 1)
- **Discriminator**: Deterministic MLP (1024 → 512 → 1) with ELU (disabled in Step 2)
- **Observation Space**: 216 dimensions (joint pos/vel, root state, future reference targets)
- **Action Space**: 23 dimensions (joint position targets, scaled by 0.5)

## Training Configuration

- **Environments**: 1024 parallel
- **Rollouts**: 16 steps
- **Learning Rate**: 2.5e-5
- **Discount Factor**: 0.99
- **GAE Lambda**: 0.95
- **Mini-batches**: 2
- **Learning Epochs**: 6
- **PPO Clip**: 0.2
- **Physics dt**: 0.005s (200Hz), decimation=4, 50Hz control

### Domain Randomization (Curriculum)

Linearly interpolated from initial to target ranges over 240k iterations:

| Parameter | Initial Range | Target Range |
|---|---|---|
| Mass | (0.95, 1.05) | (0.8, 1.2) |
| Friction | (0.9, 1.1) | (0.6, 1.4) |
| PD Gains | (0.9, 1.1) | (0.7, 1.3) |
| Action Delay (steps) | (0, 1) | (0, 2) |

### Reward Weights

| Component | Weight |
|---|---|
| Tracking | 1.0 |
| Action Rate Penalty | 0.01 |
| Termination (height < 0.6m) | -200.0 |
| Style (discriminator) | 0.0 (disabled) |

### Tracking Metric Weights

| Component | Weight |
|---|---|
| Root Rotation | 0.4 |
| End Effector Position | 0.3 |
| Joint Position | 0.2 |
| Root Position XY | 0.1 |

## Usage

### Evaluation

```bash
# Inside Isaac Lab Docker container
cd /workspace/isaaclab
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/play.py \
    --task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
    --algorithm AMP --num_envs 16 \
    --checkpoint /path/to/best_agent.pt
```

### Record Video

```bash
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/play.py \
    --task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
    --algorithm AMP --num_envs 16 \
    --checkpoint /path/to/best_agent.pt \
    --video --video_length 500
```

### Resume Training

```bash
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/train.py \
    --task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
    --algorithm AMP --num_envs 1024 --headless \
    --checkpoint /path/to/best_agent.pt
```

## Files

```
├── best_agent.pt          # Full checkpoint: policy + value + discriminator + optimizer (25 MB)
├── policy_jit.pt          # JIT-traced policy only, for deployment/inference (2.9 MB)
├── params/
│   ├── agent.yaml         # skrl agent configuration
│   └── env.yaml           # Environment configuration
└── README.md              # This model card
```

### JIT Model

`policy_jit.pt` is a TorchScript-traced policy network (input: 216-dim observation, output: 23-dim joint targets). It runs without skrl or Isaac Lab dependencies:

```python
import torch

model = torch.jit.load("policy_jit.pt")
model.eval()  # put the loaded TorchScript module in evaluation mode

obs = torch.randn(1, 216)  # [batch, obs_dim]; random placeholder observation
with torch.no_grad():  # gradients are not needed for inference
    actions = model(obs)  # [batch, 23] joint position targets (scale by 0.5)
```

Use `best_agent.pt` to resume training. Use `policy_jit.pt` for deployment or sim-to-real transfer.

## Three-Step Training Strategy

This checkpoint is from **Step 2** of a three-step curriculum:

| Step | Goal | Discriminator | Status |
|---|---|---|---|
| 1. Verification | Physics check (50 easy motions) | OFF | Complete |
| **2. Physics Calibration** | **Master all 494 motions** | **OFF** | **This checkpoint** |
| 3. Style Injection | Add natural motion style | ON | Pending |

## Citation

```bibtex
@misc{pathonai2026g1imitate,
  title={G1 Humanoid Motion Imitation with AMP in Isaac Lab},
  author={PathOn-AI},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/PathOn-AI/g1-imitate-isaaclab-amp}
}
```

## License

MIT
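
## Appendix: Curriculum Interpolation Sketch

The Domain Randomization (Curriculum) section states that ranges are linearly interpolated from initial to target values over 240k iterations. The following is a minimal sketch of what such a schedule might look like. The helper name `curriculum_range`, the dictionary keys, and the clamping of progress to [0, 1] are assumptions for illustration and are not taken from the training code.

```python
# Initial and target ranges from the "Domain Randomization (Curriculum)" table.
CURRICULUM = {
    "mass": ((0.95, 1.05), (0.8, 1.2)),
    "friction": ((0.9, 1.1), (0.6, 1.4)),
    "pd_gains": ((0.9, 1.1), (0.7, 1.3)),
    # How fractional delay bounds are discretized is not specified in the card.
    "action_delay_steps": ((0, 1), (0, 2)),
}
CURRICULUM_ITERATIONS = 240_000  # "240k iterations" as stated in the card


def curriculum_range(name, iteration):
    """Hypothetical helper: linearly interpolate a randomization range."""
    (lo_start, hi_start), (lo_end, hi_end) = CURRICULUM[name]
    # Assumption: progress is clamped so ranges hold at the target afterwards.
    t = min(max(iteration / CURRICULUM_ITERATIONS, 0.0), 1.0)
    low = lo_start + t * (lo_end - lo_start)
    high = hi_start + t * (hi_end - hi_start)
    return low, high


# Print the friction range at the start, midpoint, and end of the curriculum.
for iteration in (0, 120_000, 240_000):
    print(iteration, curriculum_range("friction", iteration))
```

With this schedule the randomization starts narrow, reaches the midpoint of the initial and target bounds halfway through, and stays at the target range once the curriculum is complete.
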