How to use OpenRAL/rskill-smolvla-metaworld with LeRobot:
# See https://github.com/huggingface/lerobot?tab=readme-ov-file#installation for more details
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[smolvla]"
# Launch finetuning on your dataset
python lerobot/scripts/train.py \
--policy.path=OpenRAL/rskill-smolvla-metaworld \
--dataset.repo_id=lerobot/svla_so101_pickplace \
--batch_size=64 \
--steps=20000 \
--output_dir=outputs/train/my_smolvla \
--job_name=my_smolvla_training \
--policy.device=cuda \
--wandb.enable=true

# Run the policy using the record function.
# Replace these placeholders before running:
#   --robot.port, --robot.id, --robot.cameras: your port, robot id, and cameras
#   --dataset.single_task: the same task description you used in your dataset recording
#   --dataset.repo_id: the dataset name on HF Hub
python -m lerobot.record \
--robot.type=so101_follower \
--robot.port=/dev/ttyACM0 \
--robot.id=my_blue_follower_arm \
--robot.cameras="{ front: {type: opencv, index_or_path: 8, width: 640, height: 480, fps: 30}}" \
--dataset.single_task="Grasp a lego block and put it in the bin." \
--dataset.repo_id=HF_USER/dataset_name \
--dataset.episode_time_s=50 \
--dataset.num_episodes=10 \
--policy.path=OpenRAL/rskill-smolvla-metaworld

rskill-smolvla-metaworld
OpenRAL rSkill — SmolVLA (0.45 B) finetuned on the MetaWorld MT50 benchmark (50 manipulation tasks, Rethink Sawyer arm).
Quick start
# Python: load the rSkill package from its manifest.
from openral_rskill.loader import rSkill
pkg = rSkill.from_yaml("rskills/smolvla-metaworld/rskill.yaml")

# Shell: single demo scene (BenchmarkScene tier, paper protocol):
openral benchmark scene --config scenes/benchmark/metaworld_push.yaml \
    --rskill rskills/smolvla-metaworld

# Shell: full headline suites (write eval/<suite>.json with reproduced_locally=true):
openral benchmark run --suite metaworld_mt10 --rskill rskills/smolvla-metaworld
openral benchmark run --suite metaworld_mt50 --rskill rskills/smolvla-metaworld
Upstream model
| Field | Value |
|---|---|
| Source repo | lerobot/smolvla_metaworld |
| Base model | lerobot/smolvla_base |
| Paper | arxiv:2506.01844 — SmolVLA: Efficient Vision-Language-Action Model |
| License | Apache-2.0 |
| Parameters | ~450 M |
| Benchmark | MetaWorld MT50 (50 tasks, Rethink Sawyer) |
| Training data | lerobot/metaworld_mt50 |
The checkpoint is multi-task: a single set of weights covers the whole
MetaWorld family, so the manifest gates it with the family entry
evaluated_tasks: ["metaworld"] (covers every metaworld/<task>-v3 task id
and the bare metaworld scene id). The 4-D proprio state, the single
observation.images.camera1 RGB input, and the normalisation statistics are
verified against the lerobot checkpoint — see docs/reference/vla_compatibility.md §3.2.
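The family gate described above can be illustrated with a short sketch. This is not the OpenRAL implementation: the function name `task_allowed` and the matching rule (exact id, or the entry followed by `/`) are assumptions chosen to reproduce the behaviour the paragraph describes.

```python
def task_allowed(task_id: str, evaluated_tasks: list[str]) -> bool:
    """Return True if task_id is covered by an entry in evaluated_tasks.

    Assumed rule (illustrative only): an entry matches the bare id
    itself or any id of the form "<entry>/<task>".
    """
    for entry in evaluated_tasks:
        if task_id == entry or task_id.startswith(entry + "/"):
            return True
    return False


gate = ["metaworld"]  # the family entry from the manifest
print(task_allowed("metaworld/push-v3", gate))   # True: task in the family
print(task_allowed("metaworld", gate))           # True: bare scene id
print(task_allowed("othersuite/task-v1", gate))  # False: hypothetical id outside the family
```

Matching on `entry + "/"` rather than a plain prefix keeps a lookalike id such as `metaworldx/...` from slipping through the gate.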
Supported robots
| Robot | Embodiment tag | Status | Notes |
|---|---|---|---|
| Rethink Sawyer (MetaWorld sim) | sawyer | ✓ matches | Native training embodiment. |
| Franka Panda / SO-100 | — | does not match | The libero / so100_follower tags are intentionally excluded; MetaWorld uses a different task distribution and camera setup. |
Sensors required
| Key | Modality | Min resolution | Notes |
|---|---|---|---|
| observation.images.camera1 | RGB | 224 × 224 | Mapped from MetaWorld's corner camera (corner2, 480×480 native). No adapter image flip is needed: lerobot's MetaworldEnv already corrects the corner camera's 180° inversion. |
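A pre-flight check on the camera input might look like the sketch below. The helper `check_camera_shape` is hypothetical, and the (height, width, channels) layout is an assumption; only the key name and the 224 × 224 minimum come from the table.

```python
MIN_SIDE = 224  # minimum height and width from the sensors table
CAMERA_KEY = "observation.images.camera1"


def check_camera_shape(shapes: dict[str, tuple[int, ...]]) -> None:
    """Raise ValueError if the camera entry is missing or too small.

    `shapes` maps observation keys to array shapes. A (height, width,
    channels) layout is assumed here for illustration.
    """
    if CAMERA_KEY not in shapes:
        raise ValueError(f"missing required input: {CAMERA_KEY}")
    height, width, channels = shapes[CAMERA_KEY]
    if channels != 3:
        raise ValueError(f"expected 3 RGB channels, got {channels}")
    if height < MIN_SIDE or width < MIN_SIDE:
        raise ValueError(f"{height}x{width} is below {MIN_SIDE}x{MIN_SIDE}")


# MetaWorld's corner2 camera renders 480x480 natively, so this passes.
check_camera_shape({CAMERA_KEY: (480, 480, 3)})
```

Working on shapes rather than arrays keeps the sketch free of any array-library dependency; in practice the shapes would come from whatever observation dict the environment returns.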
Manifest summary
| Field | Value |
|---|---|
| name | OpenRAL/rskill-smolvla-metaworld |
| version | 0.1.0 |
| license | apache-2.0 |
| role | s1 |
| runtime / quantization.dtype | pytorch / bf16 |
| weights_uri | hf://lerobot/smolvla_metaworld |
| latency_budget.per_chunk_ms | 150 ms |
| evaluated_tasks | ["metaworld"] (family gate) |
| commercial_use_allowed | true |
Full schema: openral_core.RSkillManifest.
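The `weights_uri` value appears to name the same Hub repository as the "Source repo" row under Upstream model. Assuming the `hf://` scheme simply prefixes an `<org>/<name>` repo id, a hypothetical helper (`hf_repo_id`, not part of `openral_core`) could recover it like this:

```python
def hf_repo_id(weights_uri: str) -> str:
    """Strip the "hf://" scheme and return the remaining "<org>/<name>" id.

    Assumes the URI has exactly one "/" after the scheme; this is an
    illustrative rule, not the documented OpenRAL behaviour.
    """
    prefix = "hf://"
    if not weights_uri.startswith(prefix):
        raise ValueError(f"not an hf:// URI: {weights_uri}")
    repo_id = weights_uri[len(prefix):]
    if repo_id.count("/") != 1 or not all(repo_id.split("/")):
        raise ValueError(f"expected hf://<org>/<name>, got {weights_uri}")
    return repo_id


print(hf_repo_id("hf://lerobot/smolvla_metaworld"))  # lerobot/smolvla_metaworld
```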
Evaluation
Locally reproduced on the MetaWorld MT50 suite via
openral benchmark run --suite metaworld_mt50 (reproduced_locally: true,
see eval/metaworld_mt50.json).
| Suite | Tasks | Protocol | Result |
|---|---|---|---|
| MT50 | 50 | 1 episode / seed 0 / max_steps=200 | 15/50 solved · avg 0.30 |
The MT50 run also covers all 10 MT10 tasks. A dedicated MT10 reproduction can
be written to eval/metaworld_mt10.json by running
openral benchmark run --suite metaworld_mt10. Raise n_episodes to get a number
comparable to the paper protocol (50 goals per task).
Solved at seed 0 (success_rate 1.0): assembly-v3, button-press-topdown-v3,
button-press-v3, coffee-button-v3, door-close-v3, door-lock-v3,
drawer-close-v3, faucet-close-v3, handle-press-side-v3, handle-press-v3,
pick-place-v3, pick-place-wall-v3, plate-slide-back-side-v3,
plate-slide-side-v3, push-v3.
Single-episode, seed-0 numbers are a cheap smoke test of the headline set, not a paper claim; per-task success on harder tasks is seed-sensitive.
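As a sanity check on how the headline figures relate, the sketch below recomputes the solved count and the average from the solved-task list in this section. It assumes one episode per task, so each task's success rate is either 0 or 1 and the average is simply solved / total. The function name `summarize` is illustrative.

```python
# Tasks listed above as solved at seed 0 (success_rate 1.0).
SOLVED = [
    "assembly-v3", "button-press-topdown-v3", "button-press-v3",
    "coffee-button-v3", "door-close-v3", "door-lock-v3",
    "drawer-close-v3", "faucet-close-v3", "handle-press-side-v3",
    "handle-press-v3", "pick-place-v3", "pick-place-wall-v3",
    "plate-slide-back-side-v3", "plate-slide-side-v3", "push-v3",
]
NUM_TASKS = 50  # size of the MT50 suite


def summarize(solved: list[str], num_tasks: int) -> tuple[int, float]:
    """Return (number solved, average success) for one episode per task."""
    n_solved = len(set(solved))  # de-duplicate defensively
    return n_solved, n_solved / num_tasks


n_solved, avg = summarize(SOLVED, NUM_TASKS)
print(f"{n_solved}/{NUM_TASKS} solved, avg {avg:.2f}")  # 15/50 solved, avg 0.30
```

With more than one episode per task the per-task rates become fractional, and the average would need to be taken over those rates instead of a solved count.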
Demo scenes
Five single-task BenchmarkScene entries (website-demo tier, 500-step horizon,
50 episodes) live under scenes/benchmark/:
metaworld_push.yaml, metaworld_pick_place.yaml, metaworld_button_press.yaml,
metaworld_door_open.yaml, metaworld_drawer_open.yaml. The full sweeps live in
benchmarks/metaworld_mt10.yaml (10 tasks) and benchmarks/metaworld_mt50.yaml
(50 tasks).
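To run all five demo scenes in turn, the command line for each can be generated from the file names above. This sketch only builds and prints the commands, reusing the `openral benchmark scene` form shown under Quick start; the helper name `scene_command` is hypothetical.

```python
import shlex

# Scene files listed above, all under scenes/benchmark/.
SCENES = [
    "metaworld_push.yaml",
    "metaworld_pick_place.yaml",
    "metaworld_button_press.yaml",
    "metaworld_door_open.yaml",
    "metaworld_drawer_open.yaml",
]


def scene_command(scene_file: str, rskill_dir: str = "rskills/smolvla-metaworld") -> str:
    """Build the shell command for one demo scene (same form as Quick start)."""
    return shlex.join([
        "openral", "benchmark", "scene",
        "--config", f"scenes/benchmark/{scene_file}",
        "--rskill", rskill_dir,
    ])


for scene in SCENES:
    print(scene_command(scene))
```

The printed lines can be pasted into a shell or handed to `subprocess.run` after splitting with `shlex.split`.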
License
This rSkill package (rskill.yaml, README.md,
eval/metaworld_mt50.json) is Apache-2.0. The wrapped weights are also
Apache-2.0. Commercial use is allowed.
See also
- rskills/smolvla-libero/README.md: gold-standard LIBERO finetune (locally verified).
- robots/sawyer/README.md.
- docs/reference/vla_compatibility.md §3.2.