---
tags:
- reinforcement-learning
- robotics
- mujoco
- locomotion
- unitree
- g1
- humanoid
- sac
- stable-baselines3
- strands-robots
library_name: stable-baselines3
model-index:
- name: SAC-Unitree-G1-MuJoCo
  results:
  - task:
      type: reinforcement-learning
      name: Humanoid Locomotion
    dataset:
      type: custom
      name: MuJoCo LocomotionEnv
    metrics:
    - type: mean_reward
      value: 530
      name: Best Mean Reward
    - type: mean_distance
      value: 2.65
      name: Mean Forward Distance (m)
---

# SAC Unitree G1: MuJoCo Locomotion Policy

A **Soft Actor-Critic (SAC)** policy trained for the Unitree G1 humanoid in MuJoCo simulation. It is currently **learning to balance**: it stays upright for ~4 seconds and stumbles forward.

Trained entirely on a MacBook (CPU, no GPU, no Isaac Gym) using [strands-robots](https://github.com/cagataycali/strands-gtc-nvidia).

## Results

| Metric | Value |
|--------|-------|
| Algorithm | SAC (Soft Actor-Critic) |
| Training steps | 1.91M |
| Training time | ~60 min (MacBook M-series, CPU) |
| Parallel envs | 8 |
| Network | MLP [256, 256] |
| Best mean reward | **530** |
| Mean forward distance | **2.65 m** |
| Episode length | ~200 of 1,000 steps (~4 seconds upright) |
| Status | Balancing + stumbling forward |

## Demo Video

<video src="https://huggingface.co/cagataydev/sac-unitree-g1-mujoco/resolve/main/g1_balancing.mp4" controls autoplay loop muted></video>

## Why It's Hard

The G1 has **29 DOF**, compared with 12 for the Go2 quadruped. Bipedal balance is fundamentally harder: the robot must coordinate its hips, knees, ankles, and torso simultaneously while balancing over a tiny support polygon.

With more training (~5-10M steps, ~3 hours), it should learn to walk.
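
As a rough sanity check on that estimate, the wall-clock time can be extrapolated from the figures in the Results table (1.91M steps in about 60 minutes). This is only a sketch: it assumes throughput stays constant, which may not hold as the replay buffer and episode lengths grow, and `estimated_hours` is a hypothetical helper, not part of this repository.

```python
# Figures taken from the Results table.
TRAINED_STEPS = 1_910_000
TRAINED_MINUTES = 60


def estimated_hours(target_steps, steps=TRAINED_STEPS, minutes=TRAINED_MINUTES):
    """Extrapolate wall-clock hours, assuming throughput stays constant."""
    steps_per_minute = steps / minutes
    return target_steps / steps_per_minute / 60


for target in (5_000_000, 10_000_000):
    print(f"{target / 1e6:.0f}M steps: ~{estimated_hours(target):.1f} h")
```

Under that assumption, 5M steps take about 2.6 hours and 10M steps about 5.2 hours, so the ~3 hour figure corresponds to the lower end of the step range.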
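
## Usage

```python
from stable_baselines3 import SAC

# `env` must be a Gymnasium-compatible G1 locomotion environment matching the
# training setup (87-dim observations, 29-dim actions); it is not created here.
model = SAC.load("best/best_model")

obs, _ = env.reset()
for _ in range(1000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)
    if done or truncated:  # start a new episode after a fall or a timeout
        obs, _ = env.reset()
```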
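
To compare a checkpoint against the numbers in the Results table, it can help to record how long the robot survives and how much reward it collects in one episode. The helper below is a minimal sketch, not code from this repository: `run_episode` is a hypothetical name, and it assumes `env` follows the Gymnasium five-value `step` API and `model` exposes the Stable-Baselines3 `predict` method.

```python
def run_episode(model, env, max_steps=1000):
    """Roll out one episode and return (steps_survived, total_reward)."""
    obs, _ = env.reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += float(reward)
        if terminated or truncated:
            # The episode ended early (for example, the robot fell).
            return step, total_reward
    return max_steps, total_reward
```

Calling `steps, episode_return = run_episode(model, env)` with the loaded policy should give an episode length in the neighborhood of the ~200 steps reported above if the environment matches the training setup.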

## Reward Function

```
reward = forward_vel × 5.0        # primary: move forward
       + alive_bonus × 1.0        # stay upright
       + upright_reward × 0.3     # orientation bonus
       - ctrl_cost × 0.001        # minimize energy
       - lateral_penalty × 0.3    # don't drift sideways
       - smoothness × 0.0001      # discourage jerky motion
```
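
The weighted sum above can be sketched in Python as follows. Only the weights come from this model card; how each term is computed inside the environment is not documented here, so the function takes the already-computed term values as input, and the numbers in `example` are made up purely for illustration.

```python
# Weights copied from the reward listing above; penalties carry a negative sign.
WEIGHTS = {
    "forward_vel": 5.0,
    "alive_bonus": 1.0,
    "upright_reward": 0.3,
    "ctrl_cost": -0.001,
    "lateral_penalty": -0.3,
    "smoothness": -0.0001,
}


def total_reward(terms):
    """Weighted sum of the six reward terms; missing terms count as zero."""
    return sum(weight * terms.get(name, 0.0) for name, weight in WEIGHTS.items())


# Hypothetical per-step term values, chosen only to illustrate the arithmetic.
example = {
    "forward_vel": 0.5,
    "alive_bonus": 1.0,
    "upright_reward": 0.9,
    "ctrl_cost": 40.0,
    "lateral_penalty": 0.1,
    "smoothness": 200.0,
}
print(round(total_reward(example), 3))  # 2.5 + 1.0 + 0.27 - 0.04 - 0.03 - 0.02 = 3.68
```

With these weights, forward velocity dominates the reward, which is consistent with the policy stumbling forward before it has learned a stable gait.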

## Files

- `best/best_model.zip`: Best checkpoint
- `checkpoints/`: All 100K-step checkpoints
- `logs/evaluations.npz`: Evaluation metrics
- `g1_balancing.mp4`: Demo video
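
The evaluation log can be inspected with NumPy. The sketch below assumes `logs/evaluations.npz` was written by Stable-Baselines3's `EvalCallback` (the file name matches that callback's default output, but this card does not confirm it), in which case it contains `timesteps`, `results`, and `ep_lengths` arrays. `summarize_evaluations` and `load_evaluations` are hypothetical helper names.

```python
from statistics import mean


def summarize_evaluations(timesteps, results):
    """Return (timestep, mean_reward) pairs, one per evaluation round."""
    return [
        (int(t), mean(float(r) for r in row))
        for t, row in zip(timesteps, results)
    ]


def load_evaluations(path="logs/evaluations.npz"):
    """Load the arrays, assuming the EvalCallback key names."""
    import numpy as np  # imported lazily so the summary helper stays dependency-free

    data = np.load(path)
    return data["timesteps"], data["results"]
```

For example, `summarize_evaluations(*load_evaluations())` yields one mean reward per evaluation round, which can be compared with the best mean reward of 530 reported above.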

## Environment

- **Simulator**: MuJoCo (via mujoco-python)
- **Robot**: Unitree G1 (29 DOF) from MuJoCo Menagerie
- **Observation**: joint positions, velocities, torso orientation, height (87-dim)
- **Action**: joint torques (29-dim, continuous)

## License

Apache-2.0