OpenArm RL best policy

Best PPO teacher policy chain from the OpenArm MuJoCo cube-in-box campaign.

This is not a single monolithic checkpoint. The best teacher is a chained controller:

  1. Grasp leg: checkpoints/grasp_hover_v3_vm_ppo_1000000_steps.zip
  2. Place leg: checkpoints/place_fixed_v5_ppo_1000000_steps.zip
  3. Runtime gate/controller: code/eval_chained_gated.py
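The chain above can be sketched as a two-phase controller: the grasp leg drives the arm until a handover gate fires, then the place leg takes over for the rest of the episode. This is a minimal sketch under stated assumptions, not the contents of code/eval_chained_gated.py. `ChainedController`, `handover_ready`, the phase names and the per-step `info` mapping are all hypothetical, and each leg is modelled as a plain callable from observation to action.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping

# Hypothetical type aliases: a policy maps an observation to an action,
# and a gate inspects per-step info and decides whether to hand over.
Policy = Callable[[Any], Any]
Gate = Callable[[Mapping[str, Any]], bool]


@dataclass
class ChainedController:
    """Run the grasp leg until the handover gate fires, then run the
    place leg for the rest of the episode (no switching back)."""

    grasp_policy: Policy
    place_policy: Policy
    handover_ready: Gate  # hypothetical predicate over per-step info
    phase: str = "grasp"

    def reset(self) -> None:
        # Start every episode in the grasp phase.
        self.phase = "grasp"

    def act(self, obs: Any, info: Mapping[str, Any]) -> Any:
        # One-way handover: once in "place", the controller stays there.
        if self.phase == "grasp" and self.handover_ready(info):
            self.phase = "place"
        policy = self.grasp_policy if self.phase == "grasp" else self.place_policy
        return policy(obs)
```

If the two .zip files are Stable-Baselines3 checkpoints, as the `_ppo_<N>_steps.zip` naming suggests (an assumption), each leg could be wrapped as `lambda obs: model.predict(obs, deterministic=True)[0]` after `model = PPO.load(path)`. Keeping the handover one-way avoids the controller oscillating between legs if the cube briefly dips during transport.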

Final validated numbers, from BEST_POLICY.md:

  • In-box: 43/50 = 86.0%
  • Gentle: 42/50 = 84.0%
  • Eval seeds: 1000-1049
  • Handover: switch from the grasp leg to the place leg after 10 consecutive grasped/lifted steps (STREAK=10, ZTHR=0.52)
  • Release gate: keep the gripper closed until the cube is above the box footprint and its horizontal speed is low
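The two gates listed above can be sketched as small stateful checks. This is a hedged illustration, not the logic in code/eval_chained_gated.py: STREAK=10 and ZTHR=0.52 come from BEST_POLICY.md, but treating ZTHR as a cube-height threshold in metres, the strict `>` comparison, the axis-aligned box footprint and the `max_xy_speed` value are all assumptions. `HandoverGate` and `ReleaseGate` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Tuple

STREAK = 10  # from BEST_POLICY.md
ZTHR = 0.52  # from BEST_POLICY.md; assumed here to be a cube-height threshold (m)


@dataclass
class HandoverGate:
    """Fire once the cube has been grasped AND lifted for STREAK consecutive steps."""

    streak_needed: int = STREAK
    z_threshold: float = ZTHR
    count: int = 0

    def update(self, grasped: bool, cube_z: float) -> bool:
        # Any step that is not grasped-and-lifted resets the streak to zero.
        if grasped and cube_z > self.z_threshold:
            self.count += 1
        else:
            self.count = 0
        return self.count >= self.streak_needed


@dataclass
class ReleaseGate:
    """Allow the gripper to open only over the box footprint and at low XY speed."""

    # Hypothetical box footprint: axis-aligned rectangle in the world XY plane (m).
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    max_xy_speed: float = 0.05  # hypothetical limit (m/s)

    def may_open(self, cube_xy: Tuple[float, float], cube_vxy: Tuple[float, float]) -> bool:
        x, y = cube_xy
        over_box = self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max
        slow = math.hypot(cube_vxy[0], cube_vxy[1]) <= self.max_xy_speed
        return over_box and slow
```

Requiring a consecutive streak, rather than a single lifted frame, filters out momentary contacts that look like a grasp for one step; the release gate's speed check keeps the cube from being flung past the box when the gripper opens.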

See BEST_POLICY.md for exact reproduction command, known failures, and history.

Source working tree at upload time: /home/nvidia/.openclaw/workspace/projects/openarmmujoco.
