Use from the LeRobot library
# See https://github.com/huggingface/lerobot?tab=readme-ov-file#installation for more details
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[smolvla]"
# Launch finetuning on your dataset
python lerobot/scripts/train.py \
--policy.path=OpenRAL/rskill-smolvla-metaworld \
--dataset.repo_id=lerobot/svla_so101_pickplace \
--batch_size=64 \
--steps=20000 \
--output_dir=outputs/train/my_smolvla \
--job_name=my_smolvla_training \
--policy.device=cuda \
--wandb.enable=true
# Run the policy using the record function.
# Replace these values with your own before running:
#   --robot.port            your serial port
#   --robot.id              your robot id
#   --robot.cameras         your camera configuration
#   --dataset.single_task   the same task description you used in your dataset recording
#   --dataset.repo_id       this will be the dataset name on the HF Hub
python -m lerobot.record \
  --robot.type=so101_follower \
  --robot.port=/dev/ttyACM0 \
  --robot.id=my_blue_follower_arm \
  --robot.cameras="{ front: {type: opencv, index_or_path: 8, width: 640, height: 480, fps: 30}}" \
  --dataset.single_task="Grasp a lego block and put it in the bin." \
  --dataset.repo_id=HF_USER/dataset_name \
  --dataset.episode_time_s=50 \
  --dataset.num_episodes=10 \
  --policy.path=OpenRAL/rskill-smolvla-metaworld

rskill-smolvla-metaworld

OpenRAL rSkill — SmolVLA (0.45 B) finetuned on the MetaWorld MT50 benchmark (50 manipulation tasks, Rethink Sawyer arm).

Quick start

Load the package manifest in Python:

from openral_rskill.loader import rSkill
pkg = rSkill.from_yaml("rskills/smolvla-metaworld/rskill.yaml")

Run benchmarks from the shell:

# Single demo scene (BenchmarkScene tier, paper protocol):
openral benchmark scene --config scenes/benchmark/metaworld_push.yaml \
    --rskill rskills/smolvla-metaworld

# Full headline suites (write eval/<suite>.json with reproduced_locally=true):
openral benchmark run --suite metaworld_mt10 --rskill rskills/smolvla-metaworld
openral benchmark run --suite metaworld_mt50 --rskill rskills/smolvla-metaworld

Upstream model

| Field | Value |
|---|---|
| Source repo | lerobot/smolvla_metaworld |
| Base model | lerobot/smolvla_base |
| Paper | arxiv:2506.01844 (SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics) |
| License | Apache-2.0 |
| Parameters | ~450 M |
| Benchmark | MetaWorld MT50 (50 tasks, Rethink Sawyer) |
| Training data | lerobot/metaworld_mt50 |

The checkpoint is multi-task: a single set of weights covers the whole MetaWorld family, so the manifest gates it with the family entry evaluated_tasks: ["metaworld"] (covers every metaworld/<task>-v3 task id and the bare metaworld scene id). The 4-D proprio state, the single observation.images.camera1 RGB input, and the normalisation statistics are verified against the lerobot checkpoint — see docs/reference/vla_compatibility.md §3.2.
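The family gate described above can be illustrated with a minimal sketch. The function `task_allowed` and the non-MetaWorld task id are hypothetical and only mirror the stated rule (a family entry covers the bare scene id and every `<family>/<task>` id); this is not the OpenRAL implementation.

```python
def task_allowed(task_id: str, evaluated_tasks: list[str]) -> bool:
    """Return True if task_id is covered by an exact or family entry.

    Hypothetical helper: it mirrors the rule stated in this card,
    not the actual OpenRAL gating code.
    """
    for entry in evaluated_tasks:
        # Exact match covers the bare scene id, e.g. "metaworld".
        if task_id == entry:
            return True
        # Family match covers "<family>/<task>" ids, e.g. "metaworld/push-v3".
        if task_id.startswith(entry + "/"):
            return True
    return False


gate = ["metaworld"]  # evaluated_tasks from the manifest
print(task_allowed("metaworld/push-v3", gate))  # covered by the family entry
print(task_allowed("metaworld", gate))          # bare scene id
print(task_allowed("libero/example-task", gate))  # hypothetical id outside the family
```

Matching on `entry + "/"` rather than on the bare prefix keeps an unrelated id such as `metaworld_extra/foo` from slipping through the gate.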

Supported robots

| Robot | Embodiment tag | Status | Notes |
|---|---|---|---|
| Rethink Sawyer (MetaWorld sim) | sawyer | ✓ matches | Native training embodiment. |
| Franka Panda / SO-100 | (none) | does not match | The libero / so100_follower tags are intentionally excluded; MetaWorld uses a different task distribution and camera setup. |

Sensors required

| Key | Modality | Min resolution | Notes |
|---|---|---|---|
| observation.images.camera1 | RGB | 224 × 224 | Mapped from MetaWorld's corner camera (corner2, 480×480 native). No adapter image flip is applied: lerobot's MetaworldEnv already corrects the corner camera's 180° inversion. |
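A quick pre-flight check of observation shapes against the inputs listed in this card might look like the sketch below. The key name `observation.state` and the (height, width, channels) layout are assumptions; only `observation.images.camera1`, the 4-D state, and the 224 × 224 minimum come from the card.

```python
STATE_KEY = "observation.state"  # assumed key name, not stated in this card
CAMERA_KEY = "observation.images.camera1"
STATE_DIM = 4
MIN_HEIGHT, MIN_WIDTH = 224, 224


def check_observation_shapes(shapes: dict) -> list:
    """Return a list of problems; an empty list means the shapes look compatible.

    Image shapes are assumed to be (height, width, channels).
    """
    problems = []

    state_shape = shapes.get(STATE_KEY)
    if state_shape is None:
        problems.append(f"missing {STATE_KEY}")
    elif tuple(state_shape[-1:]) != (STATE_DIM,):
        problems.append(f"{STATE_KEY}: expected last dim {STATE_DIM}, got {state_shape}")

    image_shape = shapes.get(CAMERA_KEY)
    if image_shape is None:
        problems.append(f"missing {CAMERA_KEY}")
    elif len(image_shape) != 3 or image_shape[2] != 3:
        problems.append(f"{CAMERA_KEY}: expected (H, W, 3), got {image_shape}")
    elif image_shape[0] < MIN_HEIGHT or image_shape[1] < MIN_WIDTH:
        problems.append(f"{CAMERA_KEY}: below {MIN_HEIGHT}x{MIN_WIDTH}, got {image_shape}")

    return problems


# The native corner2 render (480x480 RGB) with a 4-D state passes the check.
print(check_observation_shapes({STATE_KEY: (4,), CAMERA_KEY: (480, 480, 3)}))
```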

Manifest summary

| Field | Value |
|---|---|
| name | OpenRAL/rskill-smolvla-metaworld |
| version | 0.1.0 |
| license | apache-2.0 |
| role | s1 |
| runtime / quantization.dtype | pytorch / bf16 |
| weights_uri | hf://lerobot/smolvla_metaworld |
| latency_budget.per_chunk_ms | 150 ms |
| evaluated_tasks | ["metaworld"] (family gate) |
| commercial_use_allowed | true |

Full schema: openral_core.RSkillManifest.
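As a rough illustration of how the summary fields could be consumed, the sketch below mirrors the table in a plain dataclass. `ManifestSummary`, `hub_repo_id`, and `within_latency_budget` are hypothetical names for this example; the real schema is `openral_core.RSkillManifest` and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestSummary:
    """Hypothetical mirror of the summary table, not openral_core.RSkillManifest."""
    name: str
    version: str
    license: str
    role: str
    runtime: str
    dtype: str
    weights_uri: str
    per_chunk_ms: float
    evaluated_tasks: tuple
    commercial_use_allowed: bool


SUMMARY = ManifestSummary(
    name="OpenRAL/rskill-smolvla-metaworld",
    version="0.1.0",
    license="apache-2.0",
    role="s1",
    runtime="pytorch",
    dtype="bf16",
    weights_uri="hf://lerobot/smolvla_metaworld",
    per_chunk_ms=150.0,
    evaluated_tasks=("metaworld",),
    commercial_use_allowed=True,
)


def hub_repo_id(weights_uri: str) -> str:
    """Strip the hf:// scheme to get the Hub repo id."""
    prefix = "hf://"
    if not weights_uri.startswith(prefix):
        raise ValueError(f"not an hf:// URI: {weights_uri}")
    return weights_uri[len(prefix):]


def within_latency_budget(manifest: ManifestSummary, measured_ms: float) -> bool:
    """Compare a measured per-chunk latency with the manifest budget."""
    return measured_ms <= manifest.per_chunk_ms


print(hub_repo_id(SUMMARY.weights_uri))       # lerobot/smolvla_metaworld
print(within_latency_budget(SUMMARY, 120.0))  # True
```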

Evaluation

Locally reproduced on the MetaWorld MT50 suite via openral benchmark run --suite metaworld_mt50 (reproduced_locally: true, see eval/metaworld_mt50.json).

| Suite | Tasks | Protocol | Result |
|---|---|---|---|
| MT50 | 50 | 1 episode per task, seed 0, max_steps=200 | 15/50 solved, average success 0.30 |

The MT50 run also covers all 10 MT10 tasks; for a dedicated MT10 reproduction, run openral benchmark run --suite metaworld_mt10, which writes eval/metaworld_mt10.json. Raise n_episodes for a paper-equivalent number (50 goals per task).

Solved at seed 0 (success_rate 1.0): assembly-v3, button-press-topdown-v3, button-press-v3, coffee-button-v3, door-close-v3, door-lock-v3, drawer-close-v3, faucet-close-v3, handle-press-side-v3, handle-press-v3, pick-place-v3, pick-place-wall-v3, plate-slide-back-side-v3, plate-slide-side-v3, push-v3.

Single-episode, seed-0 numbers are a cheap smoke test of the headline set, not a paper claim: per-task success on harder tasks is seed-sensitive.
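With one episode per task, each per-task success rate is either 0 or 1, so the suite average is simply the number of solved tasks divided by the number of tasks. A minimal sketch of that arithmetic, using the solved list above (the helper name `average_success` is illustrative only):

```python
# Tasks listed above as solved at seed 0 (success_rate 1.0).
SOLVED_AT_SEED_0 = [
    "assembly-v3", "button-press-topdown-v3", "button-press-v3",
    "coffee-button-v3", "door-close-v3", "door-lock-v3",
    "drawer-close-v3", "faucet-close-v3", "handle-press-side-v3",
    "handle-press-v3", "pick-place-v3", "pick-place-wall-v3",
    "plate-slide-back-side-v3", "plate-slide-side-v3", "push-v3",
]
NUM_TASKS = 50  # MT50


def average_success(num_solved: int, num_tasks: int) -> float:
    """Mean of per-task success rates when each rate is 0 or 1."""
    return num_solved / num_tasks


solved = len(SOLVED_AT_SEED_0)
print(f"{solved}/{NUM_TASKS} solved, avg {average_success(solved, NUM_TASKS):.2f}")
# prints "15/50 solved, avg 0.30"
```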

Demo scenes

Five single-task BenchmarkScene entries (website-demo tier, 500-step horizon, 50 episodes) live under scenes/benchmark/: metaworld_push.yaml, metaworld_pick_place.yaml, metaworld_button_press.yaml, metaworld_door_open.yaml, metaworld_drawer_open.yaml. The full sweeps live in benchmarks/metaworld_mt10.yaml (10 tasks) and benchmarks/metaworld_mt50.yaml (50 tasks).
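To run all five demo scenes and both sweeps in one go, the commands can be generated from the file names above. This sketch only builds and prints the command lines, reusing the flags shown in the Quick start; the helper names are illustrative, and actually running the commands (for example with `subprocess.run`) is left to the reader.

```python
import shlex

RSKILL_DIR = "rskills/smolvla-metaworld"
SCENES = [
    "metaworld_push.yaml",
    "metaworld_pick_place.yaml",
    "metaworld_button_press.yaml",
    "metaworld_door_open.yaml",
    "metaworld_drawer_open.yaml",
]
SUITES = ["metaworld_mt10", "metaworld_mt50"]


def scene_command(scene_file: str) -> list:
    """argv for one single-task demo scene under scenes/benchmark/."""
    return [
        "openral", "benchmark", "scene",
        "--config", f"scenes/benchmark/{scene_file}",
        "--rskill", RSKILL_DIR,
    ]


def suite_command(suite: str) -> list:
    """argv for one full sweep."""
    return ["openral", "benchmark", "run", "--suite", suite, "--rskill", RSKILL_DIR]


for scene in SCENES:
    print(shlex.join(scene_command(scene)))
for suite in SUITES:
    print(shlex.join(suite_command(suite)))
```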

License

This rSkill package (rskill.yaml, README.md, eval/metaworld_mt50.json) is Apache-2.0. The wrapped weights are also Apache-2.0. Commercial use is allowed.
