---
tags:
  - reinforcement-learning
  - robotics
  - mujoco
  - locomotion
  - unitree
  - g1
  - humanoid
  - sac
  - stable-baselines3
  - strands-robots
library_name: stable-baselines3
model-index:
  - name: SAC-Unitree-G1-MuJoCo
    results:
      - task:
          type: reinforcement-learning
          name: Humanoid Locomotion
        dataset:
          type: custom
          name: MuJoCo LocomotionEnv
        metrics:
          - type: mean_reward
            value: 530
            name: Best Mean Reward
          - type: mean_distance
            value: 2.65
            name: Mean Forward Distance (m)
---

# SAC Unitree G1 — MuJoCo Locomotion Policy

A **Soft Actor-Critic (SAC)** policy trained for the Unitree G1 humanoid in MuJoCo simulation. It is currently **learning to balance**: it stays upright for about 4 seconds and stumbles forward.

Trained entirely on a MacBook (CPU, no GPU, no Isaac Gym) using [strands-robots](https://github.com/cagataycali/strands-gtc-nvidia).

## Results

| Metric | Value |
|--------|-------|
| Algorithm | SAC (Soft Actor-Critic) |
| Training steps | 1.91M |
| Training time | ~60 min (MacBook M-series, CPU) |
| Parallel envs | 8 |
| Network | MLP [256, 256] |
| Best reward | **530** |
| Mean distance | **2.65m** |
| Episode length | ~200 of 1,000 max steps (~4 seconds upright) |
| Status | Balancing + stumbling forward |

## Demo Video

<video src="https://huggingface.co/cagataydev/sac-unitree-g1-mujoco/resolve/main/g1_balancing.mp4" controls autoplay loop muted></video>

## Why It's Hard

The G1 has **29 DOF**, versus 12 for the Unitree Go2 quadruped. Bipedal balance is fundamentally harder: the robot must coordinate hips, knees, ankles, and torso simultaneously while keeping its center of mass over a tiny support polygon.

With more training (~5-10M steps, roughly 3-5 hours at the current throughput), it should learn to walk.
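
The time estimate can be sanity-checked against the throughput in the Results table. The sketch below is only back-of-the-envelope arithmetic: it assumes the reported rate (1.91M steps in about 60 minutes) stays constant, and the function name `estimated_hours` is made up for illustration.

```python
# Figures reported in the Results table.
STEPS_DONE = 1.91e6      # training steps so far
MINUTES_TAKEN = 60.0     # approximate wall-clock time


def estimated_hours(target_steps, steps_done=STEPS_DONE, minutes_taken=MINUTES_TAKEN):
    """Wall-clock hours to run target_steps from scratch at the reported throughput."""
    steps_per_minute = steps_done / minutes_taken
    return target_steps / steps_per_minute / 60.0


for target in (5e6, 10e6):
    print(f"{target / 1e6:.0f}M steps: about {estimated_hours(target):.1f} h")
```

At that rate, 5M steps take about 2.6 hours and 10M steps about 5.2 hours in total, including the steps already trained.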

## Usage

```python
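from stable_baselines3 import SAC

# Load the best checkpoint (SB3 appends ".zip" automatically).
model = SAC.load("best/best_model")

# `env` must be a Gymnasium-compatible G1 locomotion environment with the
# same observation (87-dim) and action (29-dim) spaces used in training;
# its construction is not shown here.
obs, _ = env.reset()
for _ in range(1000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        # Start a new episode once the robot falls or the time limit is hit.
        obs, _ = env.reset()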
```

## Reward Function

```
reward = forward_vel × 5.0       # primary: move forward
       + alive_bonus × 1.0       # stay upright
       + upright_reward × 0.3    # orientation bonus
       - ctrl_cost × 0.001       # minimize energy
       - lateral_penalty × 0.3   # don't drift sideways
       - smoothness × 0.0001     # discourage jerky motion
```
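
As a minimal sketch, the weighted sum above can be written as a plain Python function. The weights come from the block above; how each term (for example `upright_reward` or `smoothness`) is computed from the simulator state is not specified here, so the function simply takes the six terms as inputs. The names `REWARD_WEIGHTS` and `locomotion_reward` are illustrative and not part of the training code.

```python
# Signed weights taken from the reward definition above.
REWARD_WEIGHTS = {
    "forward_vel": 5.0,        # primary: move forward
    "alive_bonus": 1.0,        # stay upright
    "upright_reward": 0.3,     # orientation bonus
    "ctrl_cost": -0.001,       # minimize energy
    "lateral_penalty": -0.3,   # don't drift sideways
    "smoothness": -0.0001,     # discourage jerky motion
}


def locomotion_reward(terms):
    """Combine per-step reward terms (a dict of floats) into a scalar reward."""
    missing = set(REWARD_WEIGHTS) - set(terms)
    if missing:
        raise KeyError(f"missing reward terms: {sorted(missing)}")
    return sum(weight * terms[name] for name, weight in REWARD_WEIGHTS.items())
```

Because `forward_vel` carries a weight 5 times larger than `alive_bonus`, a policy moving forward at 1 unit of velocity earns most of its per-step reward from progress rather than from merely staying alive.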

## Files

- `best/best_model.zip` — Best checkpoint
- `checkpoints/` — All 100K-step checkpoints
- `logs/evaluations.npz` — Evaluation metrics
- `g1_balancing.mp4` — Demo video
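
The evaluation log can be inspected with NumPy. This sketch assumes `logs/evaluations.npz` was written by Stable-Baselines3's `EvalCallback`, which stores the arrays `timesteps`, `results`, and `ep_lengths`; if the file was produced another way, the keys may differ. The helper name `summarize_evaluations` is illustrative.

```python
from pathlib import Path

import numpy as np


def summarize_evaluations(path):
    """Return (timesteps, mean_reward, mean_ep_length) arrays from an eval log."""
    data = np.load(path)
    timesteps = data["timesteps"]                   # shape: (n_evals,)
    mean_reward = data["results"].mean(axis=1)      # results: (n_evals, n_episodes)
    mean_length = data["ep_lengths"].mean(axis=1)   # ep_lengths: (n_evals, n_episodes)
    return timesteps, mean_reward, mean_length


if __name__ == "__main__":
    log_path = Path("logs/evaluations.npz")
    if log_path.exists():
        steps, rewards, lengths = summarize_evaluations(log_path)
        best = int(rewards.argmax())
        print(f"best mean reward {rewards[best]:.1f} at step {steps[best]}")
```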

## Environment

- **Simulator**: MuJoCo (via its Python bindings)
- **Robot**: Unitree G1 (29 DOF) from MuJoCo Menagerie
- **Observation**: joint positions, velocities, torso orientation, height (87-dim)
- **Action**: joint torques (29-dim, continuous)

## License

Apache-2.0