m3 committed on
Commit f3dbc05 · verified · 1 parent: 7d1a128

Add English README with intro, usage, and training data link

Files changed (1)
  1. README.md +89 -13
README.md CHANGED
@@ -1,19 +1,95 @@
1
  ---
2
- license: bsd-3-clause
3
  tags:
4
- - reinforcement-learning
5
- - isaac-lab
6
- - unitree-go2
7
- - unitree-z1
8
- - locomotion
9
  ---
10
 
11
- # Go2+Z1 Walking Policy (RSL-RL PPO, 1500 iters)
12
 
13
- Trained in NVIDIA Isaac Sim 6.0.0.0 + Isaac Lab develop with PPO on
14
- `Isaac-Velocity-Flat-Go2Z1-v0`. Robot: Unitree Go2 quadruped with a Z1 6-DOF
15
- arm folded on its back (startFlat pose).
16
 
17
- - 4096 parallel envs
18
- - 1500 iters, ~33 min wall time on RTX Pro 6000 Blackwell (96 GB)
19
- - Final mean reward 5.4, success rate ~100% at iter 700+
1
  ---
2
+ license: apache-2.0
3
  tags:
4
+ - reinforcement-learning
5
+ - robotics
6
+ - quadruped
7
+ - locomotion
8
+ - isaac-lab
9
+ - ppo
10
+ - rsl-rl
11
+ library_name: rsl-rl
12
+ pipeline_tag: reinforcement-learning
13
  ---
14
 
15
+ # Go2+Z1 Walking Policy (V1, state-only PPO)
16
 
17
+ PPO walking policy for the **Unitree Go2 + Z1** composite robot (12 leg DOFs + 6 arm DOFs = 18 DOF), trained in Isaac Lab on flat ground while holding the Z1 arm folded on the back.
18
 
19
+ ## Highlights
20
+
21
+ - Backbone: rsl-rl `OnPolicyRunner` actor-critic (MLP 512-256-128, ELU)
22
+ - Task: `Isaac-Velocity-Flat-Go2Z1-v0` (forward/lateral linear vel + small yaw rate commands)
23
+ - 4096 parallel envs × 1500 PPO iters on a single RTX PRO 6000 Blackwell (96 GB)
24
+ - Z1 arm forced to remain in the folded "startFlat" pose during locomotion
25
+ - Verified: walks 10 m inside the real `Simple_Warehouse/warehouse.usd` (3/3 episodes)
26
+
27
+ ## Files
28
+
29
+ - `model_*.pt` — checkpoint dictionaries with `actor_state_dict` / `critic_state_dict`
30
+
31
+ ## Architecture
32
+
33
+ ```
34
+ Actor MLP : Linear(obs→512) ELU Linear(512→256) ELU Linear(256→128) ELU Linear(128→12)
35
+ Critic MLP: same shape, single value head
36
+ Inputs : base lin_vel + ang_vel + projected_gravity + commands + joint_pos + joint_vel + last_action
37
+ Outputs : 12 leg joint position deltas (Go2 hip/thigh/calf × 4)
38
+ ```
39
+
40
+ ## Usage
41
+
42
+ ```python
43
+ import torch, torch.nn as nn
44
+
45
+ # Load checkpoint
46
+ state = torch.load("model_1499.pt", map_location="cuda:0", weights_only=False)
47
+ sd = state["actor_state_dict"]
48
+
49
+ # Rebuild actor (3 hidden layers + output)
50
+ obs_dim = sd["mlp.0.weight"].shape[1]
51
+ h1, h2, h3, act_dim = (sd[f"mlp.{i}.weight"].shape[0] for i in (0, 2, 4, 6))  # 512, 256, 128, 12 per Architecture
52
+ actor = nn.Sequential(
53
+     nn.Linear(obs_dim, h1), nn.ELU(),
54
+     nn.Linear(h1, h2), nn.ELU(),
55
+     nn.Linear(h2, h3), nn.ELU(),
56
+     nn.Linear(h3, act_dim),
57
+ ).cuda().eval()
58
+ actor.load_state_dict({k.replace("mlp.", ""): v for k, v in sd.items() if k.startswith("mlp.")})
59
+
60
+ # obs: float tensor of shape (num_envs, obs_dim) on the GPU, from Isaac Lab's Isaac-Velocity-Flat-Go2Z1-Play-v0 env
61
+ with torch.inference_mode():
62
+ action = actor(obs)
63
+ ```
64
+
65
+ For end-to-end inference inside Isaac Sim, see [`stage4_joint_eval/walk_in_real_warehouse.py`](https://github.com/aws300/go2_z1_warehouse/blob/main/go2_z1_warehouse/stage4_joint_eval/walk_in_real_warehouse.py).
66
+
67
+ ## Training data
68
+
69
+ This is an **on-policy RL** model — no offline dataset is used. The policy is trained from scratch by interacting with the simulator. The full task definition (rewards, observations, terminations) lives in:
70
+
71
+ - Repo: <https://github.com/aws300/go2_z1_warehouse>
72
+ - Task config: `go2_z1_warehouse/stage1_walking/{flat_env_cfg.py, rough_env_cfg.py}`
73
+
74
+ ## Eval results
75
+
76
+ | Scenario | Episodes | Success | Mean traveled |
77
+ |---|---|---|---|
78
+ | Flat plane | 10 | 100 % | — |
79
+ | 4 cuboid shelves | 5 | 80 % | 11.21 m |
80
+ | Real `warehouse.usd` | 3 | 100 % | 10.00 m |
81
+
82
+ ## Citation
83
+
84
+ ```bibtex
85
+ @misc{go2z1-walking-v1,
86
+ title = {Go2+Z1 Warehouse Walking Policy V1 (state-only PPO)},
87
+ author = {m3},
88
+ year = {2026},
89
+ url = {https://huggingface.co/m3/go2z1-walking-rsl-rl-v1}
90
+ }
91
+ ```
92
+
93
+ ## Successor
94
+
95
+ - V2 (rotation-capable + heading-tracking): [m3/go2z1-walking-rsl-rl-v2](https://huggingface.co/m3/go2z1-walking-rsl-rl-v2)
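
Review note on the Usage snippet: the README describes the actor as an MLP with hidden widths 512-256-128, so a rebuilt network has to take each layer width from the checkpoint rather than reuse a single hidden size. The sketch below shows one way to recover the width list from parameter shapes alone. It assumes the `mlp.<index>.weight` key naming used in the README snippet. The helper name `infer_mlp_widths` and the observation size of 48 in the example are hypothetical placeholders, not values taken from the checkpoint.

```python
import re


def infer_mlp_widths(shapes):
    """Return [in_dim, hidden..., out_dim] from {'mlp.N.weight': (out, in)} shapes.

    `shapes` maps state-dict keys to shape tuples; non-weight keys are ignored.
    """
    pattern = re.compile(r"^mlp\.(\d+)\.weight$")
    layers = []
    for key, shape in shapes.items():
        match = pattern.match(key)
        if match:
            layers.append((int(match.group(1)), tuple(shape)))
    if not layers:
        raise ValueError("no 'mlp.<index>.weight' entries found")
    layers.sort()  # order layers by their index inside the Sequential

    widths = [layers[0][1][1]]  # input width of the first Linear layer
    for index, (out_dim, in_dim) in layers:
        if in_dim != widths[-1]:
            raise ValueError(
                f"mlp.{index}: expects {in_dim} inputs, "
                f"but the previous layer produces {widths[-1]}"
            )
        widths.append(out_dim)
    return widths


# Hypothetical shapes mirroring the README's 512-256-128 actor with 12 outputs.
# The observation size (48) is a placeholder, not read from a real checkpoint.
example_shapes = {
    "mlp.0.weight": (512, 48), "mlp.0.bias": (512,),
    "mlp.2.weight": (256, 512), "mlp.2.bias": (256,),
    "mlp.4.weight": (128, 256), "mlp.4.bias": (128,),
    "mlp.6.weight": (12, 128), "mlp.6.bias": (12,),
}
widths = infer_mlp_widths(example_shapes)
print(widths)
```

With a real checkpoint, the shapes could be collected as `{k: tuple(v.shape) for k, v in sd.items()}`. Consecutive pairs of the returned list then give the `(in_features, out_features)` arguments for each `nn.Linear`. The consistency check raises early if the layers do not chain, which is easier to diagnose than a shape-mismatch error from `load_state_dict`.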