---
tags:
- reinforcement-learning
- robotics
- mujoco
- locomotion
- unitree
- g1
- humanoid
- sac
- stable-baselines3
- strands-robots
library_name: stable-baselines3
model-index:
- name: SAC-Unitree-G1-MuJoCo
  results:
  - task:
      type: reinforcement-learning
      name: Humanoid Locomotion
    dataset:
      type: custom
      name: MuJoCo LocomotionEnv
    metrics:
    - type: mean_reward
      value: 530
      name: Best Mean Reward
    - type: mean_distance
      value: 2.65
      name: Mean Forward Distance (m)
---
# SAC Unitree G1 — MuJoCo Locomotion Policy
A **Soft Actor-Critic (SAC)** policy trained for the Unitree G1 humanoid in MuJoCo simulation. It is still **learning to balance**: it stays upright for about 4 seconds (~200 steps) and stumbles forward.
Trained entirely on a MacBook (CPU, no GPU, no Isaac Gym) using [strands-robots](https://github.com/cagataycali/strands-gtc-nvidia).
## Results
| Metric | Value |
|--------|-------|
| Algorithm | SAC (Soft Actor-Critic) |
| Training steps | 1.91M |
| Training time | ~60 min (MacBook M-series, CPU) |
| Parallel envs | 8 |
| Network | MLP [256, 256] |
| Best reward | **530** |
| Mean distance | **2.65 m** |
| Episode length | ~200 / 1,000 steps (~4 s upright) |
| Status | Balancing + stumbling forward |
## Demo Video
<video src="https://huggingface.co/cagataydev/sac-unitree-g1-mujoco/resolve/main/g1_balancing.mp4" controls autoplay loop muted></video>
## Why It's Hard
The G1 has **29 DOF**, versus 12 on the Unitree Go2 quadruped. Bipedal balance is fundamentally harder: the robot must coordinate hip, knee, ankle, and torso simultaneously while keeping its center of mass over a tiny support polygon.
With more training (~5-10M steps, roughly 3-5 hours at the current ~1.9M steps/hour), it should learn to walk.
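The time estimate follows from the throughput in the Results table. A back-of-the-envelope sketch, assuming throughput stays constant at 1.91M steps per ~60 minutes on the same MacBook with 8 parallel envs:

```python
# Throughput observed in the Results table: 1.91M steps in ~60 minutes.
OBSERVED_STEPS = 1.91e6
OBSERVED_HOURS = 1.0


def hours_for(target_steps: float) -> float:
    """Wall-clock hours to reach `target_steps`, assuming constant throughput."""
    return target_steps * OBSERVED_HOURS / OBSERVED_STEPS


for target in (5e6, 10e6):
    print(f"{target / 1e6:.0f}M steps -> ~{hours_for(target):.1f} h")
# 5M steps -> ~2.6 h
# 10M steps -> ~5.2 h
```

So "~3 hours" corresponds to the low end (~5M steps); 10M steps at this rate would take roughly 5 hours from scratch, or about 1 hour less when resuming from the current 1.91M-step checkpoint.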
## Usage
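```python
from stable_baselines3 import SAC

# `env` must be the MuJoCo LocomotionEnv used in training (87-dim observations,
# 29-dim actions, Gymnasium API); it comes from strands-robots and is not built here.
model = SAC.load("best/best_model")  # loads best/best_model.zip from a local copy of this repo

obs, _ = env.reset()
for _ in range(1000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:  # fell over or reached the episode limit
        obs, _ = env.reset()
```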
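Metrics such as mean reward and episode length (~200 of 1,000 steps here) come from rolling the policy out over several episodes. SB3 ships `stable_baselines3.common.evaluation.evaluate_policy` for this (pass `return_episode_rewards=True` to get per-episode rewards and lengths). The dependency-free sketch below shows the same idea for any Gymnasium-style env; `evaluate` and `GymLikeEnv` are illustrative names, not part of strands-robots or SB3.

```python
from __future__ import annotations

from statistics import mean
from typing import Any, Callable, Optional, Protocol


class GymLikeEnv(Protocol):
    """Minimal Gymnasium-style interface: reset() -> (obs, info), step() -> 5-tuple."""

    def reset(self, seed: Optional[int] = None) -> tuple[Any, dict]: ...

    def step(self, action: Any) -> tuple[Any, float, bool, bool, dict]: ...


def evaluate(
    policy: Callable[[Any], Any],
    env: GymLikeEnv,
    n_episodes: int = 5,
    max_steps: int = 1000,
) -> dict[str, float]:
    """Roll out `policy` for `n_episodes` and report mean return and mean length."""
    returns, lengths = [], []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        total, steps = 0.0, 0
        for steps in range(1, max_steps + 1):
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            total += float(reward)
            if terminated or truncated:  # fell over or hit the time limit
                break
        returns.append(total)
        lengths.append(steps)
    return {"mean_return": mean(returns), "mean_length": mean(lengths)}
```

With the model and `env` from the usage example, `evaluate(lambda o: model.predict(o, deterministic=True)[0], env)` returns the mean return and mean episode length.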
## Reward Function
```
reward = forward_vel × 5.0 # primary: move forward
+ alive_bonus × 1.0 # stay upright
+ upright_reward × 0.3 # orientation bonus
- ctrl_cost × 0.001 # minimize energy
- lateral_penalty × 0.3 # don't drift sideways
- smoothness × 0.0001 # discourage jerky motion
```
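The weighted sum above can be written as a runnable Python sketch. The weights are the ones listed in the block; how each term is computed is not documented here, so `terms_from_state` is a hypothetical helper with assumed definitions (squared action magnitudes for `ctrl_cost`, absolute sideways velocity for `lateral_penalty`, squared action change for `smoothness`, the torso up-axis z-component for `upright_reward`) that may differ from the strands-robots environment.

```python
from dataclasses import dataclass
from typing import Sequence

# Weights taken from the reward function in this model card (penalties are negative).
WEIGHTS = {
    "forward_vel": 5.0,
    "alive_bonus": 1.0,
    "upright_reward": 0.3,
    "ctrl_cost": -0.001,
    "lateral_penalty": -0.3,
    "smoothness": -0.0001,
}


@dataclass
class RewardTerms:
    forward_vel: float
    alive_bonus: float
    upright_reward: float
    ctrl_cost: float
    lateral_penalty: float
    smoothness: float


def total_reward(terms: RewardTerms) -> float:
    """Weighted sum of the reward terms."""
    return sum(w * getattr(terms, name) for name, w in WEIGHTS.items())


def terms_from_state(
    forward_vel: float,
    lateral_vel: float,
    torso_up_z: float,
    healthy: bool,
    action: Sequence[float],
    prev_action: Sequence[float],
) -> RewardTerms:
    """HYPOTHETICAL term definitions; the real environment may compute them differently."""
    return RewardTerms(
        forward_vel=forward_vel,
        alive_bonus=1.0 if healthy else 0.0,
        upright_reward=torso_up_z,  # 1.0 when the torso is perfectly vertical
        ctrl_cost=sum(a * a for a in action),
        lateral_penalty=abs(lateral_vel),
        smoothness=sum((a - p) ** 2 for a, p in zip(action, prev_action)),
    )
```

For example, a healthy, perfectly upright robot moving forward at 1 m/s with zero actions earns 5.0 + 1.0 + 0.3 = 6.3 per step, so forward progress outweighs the alive bonus by a factor of five.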
## Files
- `best/best_model.zip` — Best checkpoint
- `checkpoints/` — All 100K-step checkpoints
- `logs/evaluations.npz` — Evaluation metrics
- `g1_balancing.mp4` — Demo video
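`best/best_model.zip` and `logs/evaluations.npz` match the file layout written by SB3's `EvalCallback`, which stores `timesteps`, `results`, and `ep_lengths` arrays. Assuming these files were produced that way, a short sketch that finds the evaluation round with the highest mean reward (`summarize_evaluations` is an illustrative helper, not an SB3 function):

```python
import numpy as np


def summarize_evaluations(data) -> dict:
    """Summarize an EvalCallback-style log (assumed keys: timesteps, results, ep_lengths)."""
    timesteps = np.asarray(data["timesteps"])    # shape (n_evals,)
    results = np.asarray(data["results"])        # shape (n_evals, n_eval_episodes)
    ep_lengths = np.asarray(data["ep_lengths"])  # shape (n_evals, n_eval_episodes)
    mean_rewards = results.mean(axis=1)          # mean reward per evaluation round
    best = int(np.argmax(mean_rewards))
    return {
        "n_evals": len(timesteps),
        "best_timestep": int(timesteps[best]),
        "best_mean_reward": float(mean_rewards[best]),
        "mean_length_at_best": float(ep_lengths[best].mean()),
    }
```

Usage: `with np.load("logs/evaluations.npz") as data: print(summarize_evaluations(data))`.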
## Environment
- **Simulator**: MuJoCo (via the official `mujoco` Python bindings)
- **Robot**: Unitree G1 (29 DOF) from MuJoCo Menagerie
- **Observation**: joint positions, velocities, torso orientation, height (87-dim)
- **Action**: joint torques (29-dim, continuous)
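SB3's SAC squashes actions with tanh into [-1, 1]; when the action space is a `Box`, `model.predict` rescales them to the space's `low`/`high` bounds before they are passed to `env.step`. A minimal sketch of that affine rescaling for the 29 torque commands; the symmetric ±10 limits in the example are hypothetical, and the real bounds come from the environment's action space (likely the actuator control ranges of the Menagerie model).

```python
from typing import List, Sequence


def unscale_action(
    scaled: Sequence[float], low: Sequence[float], high: Sequence[float]
) -> List[float]:
    """Map actions from [-1, 1] to [low, high], element-wise.

    Uses the affine map low + 0.5 * (a + 1) * (high - low).
    """
    if not (len(scaled) == len(low) == len(high)):
        raise ValueError("action and bounds must have the same length")
    out = []
    for a, lo, hi in zip(scaled, low, high):
        a = max(-1.0, min(1.0, a))  # guard against values slightly outside [-1, 1]
        out.append(lo + 0.5 * (a + 1.0) * (hi - lo))
    return out


# Hypothetical symmetric torque limits of +/-10 for each of the 29 joints.
N_JOINTS = 29
low, high = [-10.0] * N_JOINTS, [10.0] * N_JOINTS
torques = unscale_action([0.5] * N_JOINTS, low, high)
print(torques[0])  # 5.0
```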
## License
Apache-2.0