Model Card for SO101-GR00T-N1 Vials V2.1

This model is a task-specific fine-tuning of NVIDIA's GR00T-N1.6-3B policy for robotic manipulation. The policy was trained to detect, grasp, and place laboratory-style vials into a yellow vial rack using an SO-101 robotic arm.

The training pipeline follows NVIDIA's Sim-to-Real workflow and combines teleoperated demonstrations collected in both Isaac Sim and the real world.

Model Details

Model Description

Data was collected following NVIDIA's official Sim-to-Real Workshop, which combines demonstrations collected in simulation (Isaac Sim) with real-world teleoperation data.

To reproduce this training workflow, an adjusted fork of the original repository was used:

Repository: https://github.com/CursedRock17/Sim-to-Real-SO-101-Workshop/tree/gb10_current

The final training dataset is available at:

Dataset: https://huggingface.co/datasets/CursedRock17/so101_teleop_vials_sim_and_real_v21

The V3.0 dataset can be browsed with LeRobot's Dataset Visualizer.

Additional datasets generated during development:

Simulation Only: https://huggingface.co/datasets/CursedRock17/so101_teleop_vials_sim_dr_train_trimmed
Real Only: https://huggingface.co/datasets/CursedRock17/so101_teleop_vials_rack_real
Combined Sim + Real: https://huggingface.co/datasets/CursedRock17/so101_teleop_vials_sim_and_real

Developed by: UMD MATRIX Lab
License: Apache-2.0
Finetuned from model: https://huggingface.co/nvidia/GR00T-N1.6-3B

Uses

This model was trained as a benchmark for evaluating robotic foundation models on a constrained pick-and-place task.

The task consists of locating a vial, grasping it, and placing it into an empty position within a yellow rack.

Downstream Use

Potential downstream uses include:

Further sim-to-real experiments
Alternative rack configurations
Robotic manipulation research
Fine-tuning for laboratory automation tasks
Evaluation of dataset quality and teleoperation strategies

Bias, Risks, and Limitations

The training dataset contains demonstrations from a single robotic platform, a single object category (vials), and a highly structured environment.

Observed limitations include:

Preference for a subset of rack positions during placement.
Hovering behavior before releasing the vial.
Reduced performance outside the training distribution.
Sensitivity to camera placement and environmental configuration.

Recommendations

For best performance:

Use an SO-101 robotic arm.
Maintain camera placements similar to the training setup.
Use controlled lighting conditions.
Keep the workspace dimensions similar to those used during training.
Fine-tune further before deploying to significantly different environments.

How to Get Started with the Model

Follow NVIDIA's Sim-to-Real workflow:

https://docs.nvidia.com/learning/physical-ai/sim-to-real-so-101/latest/index.html

For deployment and evaluation, use the modified workshop repository:

https://github.com/CursedRock17/Sim-to-Real-SO-101-Workshop/tree/gb10_current

Training Details

Training Data

Training data consists of teleoperated demonstrations collected in both simulation and the real world.

Task objective:

Pick up randomly positioned vials and place them into empty holes in a yellow rack.

Data collection characteristics:

Approximately 125 simulation episodes
15 additional real-world teleoperation episodes
Domain randomization enabled during simulation
Poor-quality episodes removed before training using the Doctor tab of the LeRobot Dataset Visualizer
Dual-camera observations (external + gripper camera)
Consistent grasping strategy across demonstrations
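The composition of the training mix can be summarized with a small sketch. It assumes each episode carries its source (`sim` or `real`) and a boolean quality flag; the `Episode` record, its `flagged` field, and `build_training_mix` are hypothetical and are not part of LeRobot's dataset schema. With roughly 125 simulation and 15 real episodes, real-world data makes up about 11% of the mix.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Hypothetical per-episode record for illustration; not LeRobot's actual schema.
    episode_index: int
    source: str            # "sim" or "real"
    flagged: bool = False  # True if a review (e.g. the Doctor tab) marked it as poor quality


def build_training_mix(episodes):
    """Drop flagged episodes and report how many sim/real episodes remain."""
    kept = [ep for ep in episodes if not ep.flagged]
    counts = {"sim": 0, "real": 0}
    for ep in kept:
        counts[ep.source] += 1
    real_fraction = counts["real"] / len(kept) if kept else 0.0
    return kept, counts, real_fraction


if __name__ == "__main__":
    # Approximate episode counts from this card: ~125 sim, 15 real.
    episodes = [Episode(i, "sim") for i in range(125)]
    episodes += [Episode(125 + i, "real") for i in range(15)]
    kept, counts, real_fraction = build_training_mix(episodes)
    print(counts, f"real fraction = {real_fraction:.3f}")
```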

Dataset:

https://huggingface.co/datasets/CursedRock17/so101_teleop_vials_sim_and_real_v21

Training Procedure

The model was fine-tuned from NVIDIA's GR00T-N1.6-3B policy using NVIDIA's standard fine-tuning pipeline.

Preprocessing

Data collection emphasized:

Smooth teleoperation trajectories
Consistent grasping motions
Removal of failed demonstrations
Low cross-episode action variance
Short task horizons

Problematic episodes were manually removed prior to training.
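"Low cross-episode action variance" can be made concrete in several ways. The sketch below uses one assumed definition, not necessarily the one applied to this dataset: resample each episode's trajectory for a single action dimension (for example, one joint) to a common length, then average the per-timestep variance across episodes. The function names are hypothetical.

```python
from statistics import pvariance


def resample(traj, n):
    """Linearly interpolate a 1-D trajectory onto n evenly spaced points (n >= 2)."""
    if len(traj) == 1:
        return [float(traj[0])] * n
    out = []
    for i in range(n):
        t = i * (len(traj) - 1) / (n - 1)  # fractional index into traj
        lo = int(t)
        hi = min(lo + 1, len(traj) - 1)
        frac = t - lo
        out.append(traj[lo] * (1 - frac) + traj[hi] * frac)
    return out


def cross_episode_action_variance(episodes, n=50):
    """Mean, over normalized time, of the variance across episodes.

    episodes: list of 1-D action trajectories (one action dimension each),
    possibly of different lengths. Lower values mean more consistent demos.
    """
    resampled = [resample(ep, n) for ep in episodes]
    per_step = [pvariance([ep[i] for ep in resampled]) for i in range(n)]
    return sum(per_step) / n
```

Because episodes are compared on normalized time, a demonstration and a uniformly time-stretched copy of it score near zero, while demonstrations that follow different paths score higher.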

Desired Movement

Demonstrations followed a consistent motion pattern:

Lift the head of the arm and open the gripper at the same time; as the arm rises, begin a straight trajectory toward the vial.
Grasp the vial just underneath the cap. Take a full grasp; otherwise the vial drops in an awkward orientation.
After grasping the vial, pull back toward the base while keeping the gripper elevated, pan to the left (this consistency is visible in the dataset visualizer), arc back, and then drop the vial in.
Release only once the vial is directly over the hole. Stabilizing at this point can be tricky because there is no accurate depth perception.
Avoid bumping the arm into anything, including the rack.
Stabilize the teleoperator arm with something else so that only one hand is used for teleoperation.

Training Hyperparameters

Hyperparameters were taken from the base fine-tuning file:

Base Model: GR00T-N1.6-3B
Training Steps: 30,000
Action Horizon: 16
Control Rate: 30 Hz
Training Regime: bf16 mixed precision
Final Loss: < 0.01
warmup_ratio: 0.05
weight_decay: 1e-5
learning_rate: 1e-4
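These values imply a warmup of 0.05 × 30,000 = 1,500 steps, and, if all 16 actions of a chunk are executed at the 30 Hz control rate, each chunk covers 16 / 30 ≈ 0.53 s of motion. The sketch below assumes a linear warmup followed by cosine decay to zero, mirroring the shape of the cosine-with-warmup scheduler in Hugging Face `transformers`; whether the GR00T fine-tuning pipeline uses exactly this schedule is an assumption, and the function names are hypothetical.

```python
import math


def warmup_steps(total_steps=30_000, warmup_ratio=0.05):
    # Round up, as the Hugging Face Trainer does when converting a warmup ratio to steps.
    return math.ceil(total_steps * warmup_ratio)


def lr_at_step(step, total_steps=30_000, base_lr=1e-4, warmup_ratio=0.05):
    """Linear warmup to base_lr, then cosine decay to zero (assumed schedule)."""
    w = warmup_steps(total_steps, warmup_ratio)
    if step < w:
        return base_lr * step / max(1, w)
    progress = (step - w) / max(1, total_steps - w)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


def chunk_duration_s(action_horizon=16, control_rate_hz=30.0):
    """Wall-clock time covered by one predicted action chunk if executed in full."""
    return action_horizon / control_rate_hz


if __name__ == "__main__":
    print(warmup_steps())                  # warmup length in steps
    print(lr_at_step(15_750))              # learning rate halfway through the decay phase
    print(f"{chunk_duration_s():.3f} s")   # time span of one action chunk
```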

Speeds, Sizes, Times

Training hardware:

Dell Pro Max with NVIDIA GB10

Training durations:

5,000 Steps: 3h 43m
30,000 Steps: 21h 40m
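These durations work out to roughly 2.6 to 2.7 seconds per training step, and the 30,000-step run corresponds to the 21.67 hours listed under Environmental Impact. A small worked calculation (the helper name is illustrative):

```python
def seconds_per_step(hours, minutes, steps):
    """Average wall-clock seconds per training step for a run."""
    return (hours * 3600 + minutes * 60) / steps


# Durations reported on this card.
runs = {
    "5,000 steps": (3, 43, 5_000),     # 3h 43m
    "30,000 steps": (21, 40, 30_000),  # 21h 40m
}

if __name__ == "__main__":
    for name, (h, m, steps) in runs.items():
        print(f"{name}: {seconds_per_step(h, m, steps):.3f} s/step, {h + m / 60:.2f} h total")
```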

Evaluation

Testing Data, Factors & Metrics

External Tools

To monitor training loss, I used the Weights & Biases dashboard.
To check for viable episodes, I used both the "Action Insights" and "Doctor" tabs of the LeRobot Dataset Visualizer.
To inspect attention, I started using the lerobot_attention_visualizer.

Note: the scripts can be found in the helper_scripts section of my repository and are still a work in progress.

Testing Data

Evaluation was performed using held-out task executions in the same physical environment.

Factors

The following factors were evaluated:

Object localization success
Grasp success
Placement success
Out-of-distribution lighting robustness

Metrics

Primary metrics:

Vials Located
Vials Grasped
Vials Successfully Placed

Results

30K Step Checkpoint

Evaluation Episodes: 10

Vials Located: 10/10
Vials Grasped: 9/10
Vials Placed: 8/10

Placement Success Rate: 80% :)

5K Step Checkpoint

Evaluation Episodes: 10

Vials Located: 7/10
Vials Grasped: 2/10
Vials Placed: 1/10

Placement Success Rate: 10% :(

OOD Lighting Evaluation

Lighting Conditions:

0%
25%
75%
100%

Evaluation Episodes: 10

Vials Located: 10/10
Vials Grasped: 9/10
Vials Placed: 7/10

Placement Success Rate: 70% :)
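Because each stage depends on the previous one, the counts above can also be read as a funnel. The sketch below (function name hypothetical) divides each stage by the one before it: for the 30K checkpoint, 9 of 10 located vials were grasped and 8 of 9 grasped vials were placed, while the 5K checkpoint lost most episodes at the grasp stage (2 of 7 located vials grasped).

```python
def stage_rates(located, grasped, placed, episodes=10):
    """Per-stage success rates; each conditional rate uses the previous stage as denominator."""
    return {
        "located": located / episodes,
        "grasped_given_located": grasped / located if located else 0.0,
        "placed_given_grasped": placed / grasped if grasped else 0.0,
        "placement_success_rate": placed / episodes,
    }


# (located, grasped, placed) out of 10 episodes, as reported above.
results = {
    "30K": (10, 9, 8),
    "5K": (7, 2, 1),
    "OOD lighting": (10, 9, 7),
}

if __name__ == "__main__":
    for name, counts in results.items():
        rates = stage_rates(*counts)
        print(name, {k: round(v, 3) for k, v in rates.items()})
```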

Summary

Increasing training from 5,000 to 30,000 steps raised placement success from 10% (1/10) to 80% (8/10).

The model performs well in this constrained environment (80% placement success) and retains 70% placement success under lighting variation.

Model Examination

Qualitative observations:

The policy often hovers over the rack before releasing a vial.
The model occasionally recovers from failed grasp attempts.
Placement behavior tends to favor a subset of rack positions.

Future work may include attention visualization and policy interpretability analysis.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact Calculator:

https://mlco2.github.io/impact

Hardware Type: Dell Pro Max with GB10
Hours Used: 21.67
Carbon Emitted: Not estimated
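The calculator linked above essentially multiplies the energy used by the carbon intensity of the local grid. A minimal sketch of that arithmetic follows; the average power draw and grid intensity are placeholders that must be measured or looked up, since neither is reported on this card, and the function name is hypothetical.

```python
def estimate_emissions_kg(avg_power_watts, hours, grid_kg_co2_per_kwh):
    """Estimated CO2-equivalent emissions in kg for a training run.

    avg_power_watts and grid_kg_co2_per_kwh must be supplied by the user;
    they are not reported on this model card.
    """
    energy_kwh = avg_power_watts * hours / 1000.0
    return energy_kwh * grid_kg_co2_per_kwh


if __name__ == "__main__":
    HOURS_USED = 21.67     # from this card
    AVG_POWER_W = 150.0    # hypothetical placeholder, not a measured value
    GRID_INTENSITY = 0.4   # hypothetical placeholder, kg CO2e per kWh
    kg = estimate_emissions_kg(AVG_POWER_W, HOURS_USED, GRID_INTENSITY)
    print(f"{kg:.2f} kg CO2e (illustrative only)")
```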

Technical Specifications

Model Architecture and Objective

The model is based on NVIDIA's GR00T-N1.6-3B Vision-Language-Action architecture.

Objective:

Locate vial
Grasp vial
Transport vial
Place vial into rack

Compute Infrastructure

Hardware

Dell Pro Max with NVIDIA GB10
SO-101 Robot Arm

Software

Isaac Sim 5.1.0
LeRobot v0.4.3 (installed at commit e670ac5daf9b76)
Python 3.10
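Because the card pins specific versions, a quick environment check can catch mismatches before attempting to reproduce the results. The sketch below uses only the standard library (`sys`, `importlib.metadata`); `check_environment` is a hypothetical helper. A LeRobot install from a git commit may report a version string other than `0.4.3`, so a mismatch is a prompt to check rather than a hard failure. Isaac Sim is not checked here.

```python
import sys
from importlib import metadata

# Versions listed on this card; Isaac Sim 5.1.0 is installed separately and not checked here.
EXPECTED_PACKAGES = {"lerobot": "0.4.3"}
EXPECTED_PYTHON = (3, 10)


def check_environment(expected=EXPECTED_PACKAGES, python=EXPECTED_PYTHON):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    if sys.version_info[:2] != tuple(python):
        have = ".".join(map(str, sys.version_info[:2]))
        want = ".".join(map(str, python))
        problems.append(f"Python {have} found, {want} expected")
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if installed != wanted:
            problems.append(f"{package} {installed} found, {wanted} expected")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```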

Model Card Authors

Lucas Wendland

University of Maryland MATRIX Lab

Model Card Contact

Please open an issue on the Hugging Face Hub repository (CursedRock17/so101_teleop_vials_sim_and_real_finetune) for questions, bug reports, or reproduction issues.
