How to fine-tune this SmolVLA with my own dataset?

by Xiaoyan97 - opened Jun 29, 2025

Jun 29, 2025

Has your model worked? In my attempt, I recorded 40 demonstration episodes with the object at the same position and trained for 20,000 steps using the official fine-tuning code. However, the resulting policy was unable to complete the task.

python lerobot/scripts/train.py \
  --policy.path=lerobot/smolvla_base \
  --dataset.repo_id=Xiaoyan97/pick_up_scissors_v1 \
  --dataset.root=Xiaoyan97/pick_up_scissors_v1 \
  --batch_size=64 \
  --steps=20000 \
  --output_dir=outputs/smolvla_v1 \
  --job_name=my_smolvla_training \
  --policy.device=cuda

Are my training steps insufficient, or is there some other cause?
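One rough way to judge whether the step count is the bottleneck is to estimate how many passes over the data 20,000 steps at batch size 64 amount to. The sketch below is plain arithmetic; the episode length of 300 frames is a hypothetical value, since the thread does not state it, and `passes_over_dataset` is an illustrative helper, not part of any library.

```python
def passes_over_dataset(steps: int, batch_size: int, num_frames: int) -> float:
    """Return how many times, on average, training sees each frame."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return steps * batch_size / num_frames


# Assumption: 40 episodes of 300 frames each (episode length is hypothetical).
num_frames = 40 * 300
epochs = passes_over_dataset(steps=20_000, batch_size=64, num_frames=num_frames)
print(f"{epochs:.1f} passes over the data")
```

If the real frame count gives a number in this range, the policy has seen every frame many times, which would point toward causes other than too few steps.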

kaku

Owner Jun 30, 2025

edited Jun 30, 2025

@Xiaoyan97
・The number of training steps should be sufficient: the policy trained for 4,000 steps can already grasp the candy to some extent.
・The wrist and top camera placement also seems appropriate.

Did you also provide language instructions during data collection and when running the policy?
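To check whether language instructions were actually stored with the recorded data, one option is to inspect the task strings in the dataset metadata. The following is a minimal sketch that assumes the dataset keeps its tasks in a JSON Lines file (for example `meta/tasks.jsonl`) with `task_index` and `task` fields; that layout depends on the dataset format version and should be verified against your own files. The helper name and the demo task string are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def find_missing_tasks(tasks_file: Path) -> list:
    """Return task_index values whose task string is empty or missing."""
    missing = []
    with tasks_file.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            task = record.get("task") or ""
            if not str(task).strip():
                missing.append(record.get("task_index"))
    return missing


# Demo on a temporary file with made-up content (assumed layout).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "tasks.jsonl"
    demo.write_text(
        '{"task_index": 0, "task": "pick up the scissors"}\n'
        '{"task_index": 1, "task": ""}\n',
        encoding="utf-8",
    )
    demo_missing = find_missing_tasks(demo)
    print(demo_missing)  # → [1]
```

An empty result on the real metadata file would suggest every episode carries an instruction; the same instruction text should then be passed when running the policy.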

Xiaoyan97

Jun 30, 2025

I used top and wrist cameras; the details can be found in my dataset. Before your reply, I had already increased the number of training steps to see whether that helps. I will report back once I have results.
