OpenArm RL best policy

Best PPO teacher policy chain from the OpenArm MuJoCo cube-in-box campaign.

This is not a single monolithic checkpoint. The best teacher is a chained controller, with control handed over between stages under the handover rule listed below.

Final validated numbers, from BEST_POLICY.md:

In-box: 43/50 = 86.0%
Gentle: 42/50 = 84.0%
Eval seeds: 1000-1049
Handover: switch after 10 consecutive grasped/lifted steps (STREAK=10, ZTHR=0.52)
Release gate: keep the gripper closed until the cube is above the box footprint and its horizontal speed is low
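
The handover and release-gate rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code from the working tree: it assumes ZTHR is a cube-height threshold, that "grasped/lifted" means a grasp flag is set while the cube is above that height, and that the box footprint is an axis-aligned rectangle. The names `HandoverSwitch`, `release_allowed`, and the `max_speed` default are hypothetical. BEST_POLICY.md remains the reference for the actual behavior.

```python
import math
from dataclasses import dataclass

STREAK = 10  # from this README: consecutive steps required before handover
ZTHR = 0.52  # from this README; ASSUMED here to be a cube-height threshold


@dataclass
class HandoverSwitch:
    """Latches once `streak` consecutive grasped-and-lifted steps are seen."""

    streak: int = STREAK
    zthr: float = ZTHR
    count: int = 0
    switched: bool = False

    def update(self, grasped: bool, cube_z: float) -> bool:
        """Feed one control step; returns True once the handover has fired."""
        if self.switched:
            return True  # the switch is one-way in this sketch
        if grasped and cube_z > self.zthr:
            self.count += 1
        else:
            self.count = 0  # any miss resets the streak
        if self.count >= self.streak:
            self.switched = True
        return self.switched


def release_allowed(cube_xy, cube_vel_xy, box_center_xy, box_half_extents_xy,
                    max_speed=0.05):
    """Return True when the gripper may open.

    ASSUMPTIONS: the footprint is an axis-aligned rectangle given by its
    center and half extents, and `max_speed` is a placeholder value,
    not the threshold used in the campaign.
    """
    dx = abs(cube_xy[0] - box_center_xy[0])
    dy = abs(cube_xy[1] - box_center_xy[1])
    over_box = dx <= box_half_extents_xy[0] and dy <= box_half_extents_xy[1]
    speed = math.hypot(cube_vel_xy[0], cube_vel_xy[1])
    return over_box and speed <= max_speed
```

In a rollout loop, `HandoverSwitch.update` would be called once per step to decide which policy in the chain acts, and `release_allowed` would mask the gripper-open action until both conditions hold.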

See BEST_POLICY.md for exact reproduction command, known failures, and history.

Source working tree at upload time: /home/nvidia/.openclaw/workspace/projects/openarmmujoco.
