| datasets: | |
| - lehome/dataset_challenge | |
| - lehome/dataset_challenge_merged | |
| base_model: | |
| - HuggingFaceTB/SmolVLM2-500M-Video-Instruct | |
| pipeline_tag: robotics | |
| tags: | |
| - robotics | |
| - lerobot | |
| - lehome-challenge | |
| - smolvla | |
| - residual-rl | |
| # LeHome Challenge 2026 - Submission | |
| **Method**: SmolVLA (frozen four_types 30K backbone) + state-only residual MLP, trained with sparse-reward residual RL on 40 Seen garments. Single model, deterministic inference. | |
| ## File Layout | |
| ``` | |
| submission_v4_global/ | |
| ├── README.md                        # this file | |
| ├── residual_v4_global.py            # the policy module (drop into eval_policy/) | |
| └── submission_models/ | |
|     ├── vla_backbone/                # SmolVLA four_types 30K (LeRobot pretrained_model, ~865M) | |
|     ├── residual_averaged.pt         # 40-garment averaged residual MLP (~286K) | |
|     ├── dataset_meta/                # LeRobot dataset metadata (stats.json etc.) | |
|     └── hf_cache/                    # bundled SmolVLM2 weights for offline VLM load (~1.9G) | |
|         └── hub/models--HuggingFaceTB--SmolVLM2-500M-Video-Instruct/ | |
|             ├── snapshots/<commit>/  # tokenizer + processor + model.safetensors | |
|             └── refs/main            # commit hash file | |
| ``` | |
| The wrapper detects `submission_models/hf_cache/` next to `vla_backbone/` and | |
| sets `HF_HOME` to it during `__init__`, so the SmolVLM2 backbone load | |
| (`vlm_model_name = "HuggingFaceTB/SmolVLM2-500M-Video-Instruct"`, | |
| `load_vlm_weights = true`) resolves entirely offline against the bundled cache. | |
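The redirect described above could look roughly like the sketch below. It is an illustration only, not the wrapper's actual code: `redirect_hf_cache` is a hypothetical helper, and it assumes the cache lives in a sibling `hf_cache/` directory of the policy path. The variable names `HF_HOME`, `HF_HUB_CACHE` and `HF_HUB_OFFLINE` are the standard Hugging Face ones.

```python
import os
from pathlib import Path


def redirect_hf_cache(policy_path, env=None):
    """Point Hugging Face at a bundled cache next to the policy directory.

    Hypothetical helper: returns the cache directory if one was found,
    otherwise None (in which case env is left untouched).
    """
    env = os.environ if env is None else env
    # Assumption: hf_cache/ is a sibling of the vla_backbone/ directory.
    cache_dir = Path(policy_path).resolve().parent / "hf_cache"
    if not cache_dir.is_dir():
        return None
    env["HF_HOME"] = str(cache_dir)
    env["HF_HUB_CACHE"] = str(cache_dir / "hub")
    env["HF_HUB_OFFLINE"] = "1"  # fail fast instead of trying the network
    return cache_dir
```

A helper like this only has an effect if it runs before `huggingface_hub` or `transformers` is first imported, because those libraries read the variables at import time.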
| ## How To Run (evaluator side) | |
| 1. Drop `residual_v4_global.py` into `/opt/lehome-challenge/scripts/eval_policy/`. | |
| 2. Add to `scripts/eval_policy/__init__.py`: | |
| ```python | |
| from .residual_v4_global import ResidualV4GlobalPolicy | |
| ``` | |
| 3. Set environment variables: | |
| ```bash | |
| export LEHOME_VLA_POLICY_PATH=<path to submission_models/vla_backbone> | |
| export LEHOME_VLA_DATASET_ROOT=<path to submission_models/dataset_meta or any LeRobot dataset> | |
| export LEHOME_RESIDUAL_CHECKPOINT=<path to submission_models/residual_averaged.pt> | |
| export LEHOME_RESIDUAL_SCALE=0.03 | |
| ``` | |
| The wrapper sets `HF_HOME` automatically to the bundled `hf_cache/` when | |
| it sees `LEHOME_VLA_POLICY_PATH`, so no network access is required even on | |
| a fully offline evaluator. | |
| **Belt-and-suspenders**: if the evaluator's launcher imports | |
| `huggingface_hub` before our wrapper module loads (rare but possible), | |
| the redirect may be too late. To be safe, set HF env vars **before** | |
| invoking `python -m scripts.eval`: | |
| ```bash | |
| export HF_HOME="$LEHOME_VLA_POLICY_PATH/../hf_cache" | |
| export HF_HUB_CACHE="$HF_HOME/hub" | |
| export HF_HUB_OFFLINE=1 | |
| export TRANSFORMERS_OFFLINE=1 | |
| ``` | |
| 4. Invoke evaluator: | |
| ```bash | |
| python -m scripts.eval \ | |
|     --policy_type residual_v4_global \ | |
|     --policy_path "$LEHOME_VLA_POLICY_PATH" \ | |
|     --dataset_root "$LEHOME_VLA_DATASET_ROOT" \ | |
|     --garment_type <top_long|top_short|pant_long|pant_short> \ | |
|     --num_episodes 5 --max_steps 600 \ | |
|     --enable_cameras --device cpu --headless | |
| ``` | |
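Before launching the evaluator, it can help to check that the variables from step 3 are set and point at real paths. The sketch below is one possible pre-flight check. `preflight` is a hypothetical helper, and treating all four variables as required is an assumption, since the wrapper may supply defaults.

```python
import os
from pathlib import Path

# Variables from step 3 that are expected to hold filesystem paths.
REQUIRED_PATH_VARS = (
    "LEHOME_VLA_POLICY_PATH",
    "LEHOME_VLA_DATASET_ROOT",
    "LEHOME_RESIDUAL_CHECKPOINT",
)


def preflight(env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_PATH_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).exists():
            problems.append(f"{name} points to a missing path: {value}")
    scale = env.get("LEHOME_RESIDUAL_SCALE")
    if scale is None:
        problems.append("LEHOME_RESIDUAL_SCALE is not set")
    else:
        try:
            float(scale)
        except ValueError:
            problems.append(f"LEHOME_RESIDUAL_SCALE is not a number: {scale}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```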
| ## Method Summary | |
| - **Backbone**: SmolVLA, jointly trained on 4 garment types for 30K steps. Frozen during residual RL. | |
| - **Residual**: small state-only MLP (state_dim=12 → 256 → 256 → action_dim=12, 3 Linear layers with ReLU activations). | |
| - **Final action**: `clip(base_action + 0.03 * residual_mlp(state))`. | |
| - **Training signal**: sparse reward (1 if folding success at episode end, else 0). | |
| - **Training data**: 40 Seen garments (10 per type × 4 types), 30 episodes per garment, on-policy PPO updates. | |
| - **Aggregation**: weights averaged across 40 per-garment training runs to get a single global residual. | |
| - **Inference**: deterministic, with no exploration noise and no online updates. | |
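The final-action rule can be illustrated with plain Python lists standing in for tensors. This is a sketch rather than the submission's code: `compose_action` is a hypothetical name, and the clip bounds of `[-1, 1]` are an assumption, since the summary does not state them.

```python
def compose_action(base_action, residual, scale=0.03, low=-1.0, high=1.0):
    """Add a scaled residual to the base action and clip each dimension.

    The bounds low/high are assumed for illustration; the real policy may
    clip to different limits.
    """
    if len(base_action) != len(residual):
        raise ValueError("base_action and residual must have the same length")
    return [
        min(high, max(low, b + scale * r))
        for b, r in zip(base_action, residual)
    ]


# With scale=0.03 the residual can shift each dimension by at most
# 0.03 * |residual|, so the backbone's action dominates.
action = compose_action([0.5, 0.99, -1.0], [1.0, 1.0, -1.0])
```

Keeping the scale small means the residual acts as a bounded correction on top of the frozen backbone rather than a replacement for it.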
| ## Key Hyperparameters | |
| | Parameter | Value | | |
| |---|---| | |
| | residual hidden dims | (256, 256) | | |
| | residual scale | 0.03 | | |
| | state_dim | 12 | | |
| | action_dim | 12 | | |
| | training reward | sparse (1 on success) | | |
| | episodes per garment | 30 | | |
| | training garments | 40 (10 Seen × 4 types) | | |
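As a sanity check on these dimensions, the parameter count of a 12 → 256 → 256 → 12 MLP can be worked out directly. The sketch below assumes fully connected layers with biases and float32 storage, neither of which is stated explicitly here.

```python
def mlp_param_count(dims):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(dims, dims[1:]))


dims = (12, 256, 256, 12)          # state_dim, hidden dims, action_dim
n_params = mlp_param_count(dims)   # 3328 + 65792 + 3084
size_bytes = n_params * 4          # assumption: 4 bytes per float32 weight
print(f"{n_params} parameters, {size_bytes} bytes")
# prints "72204 parameters, 288816 bytes"
```

Under those assumptions the raw weights take about 282 KiB, which is in line with the roughly 286K checkpoint size once serialization overhead is included.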
| ## Evaluation Results | |
| Run on `lehome3 / 120.209.70.195:30239`, 4× NVIDIA L40S, 4-GPU parallel. | |
| 48 garments × 5 episodes = 240 episodes total. | |
| | Metric | Value | | |
| |---|---| | |
| | **Total** | **150/240 = 62.50%** | | |
| | Seen (40 garments × 5 ep) | 136/200 = 68.00% | | |
| | Unseen (8 garments × 5 ep) | 14/40 = 35.00% | | |
| | Top_Long | 43/60 = 71.67% (seen 74.0%, unseen 60.0%) | | |
| | Top_Short | 25/60 = 41.67% (seen 48.0%, unseen 10.0%) | | |
| | Pant_Long | 28/60 = 46.67% (seen 54.0%, unseen 10.0%) | | |
| | **Pant_Short** | **54/60 = 90.00%** (seen 96.0%, unseen 60.0%) | | |
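The headline figures can be re-derived from the per-type rows. In the sketch below the success counts are back-computed from the reported percentages, assuming 50 Seen and 10 Unseen episodes per garment type (60 per type in total, 50 of them from the 10 Seen garments).

```python
SEEN_EPISODES, UNSEEN_EPISODES = 50, 10  # per garment type (derived)

# (seen successes, unseen successes), derived from the percentages in the table.
results = {
    "Top_Long": (37, 6),    # 74.0% of 50, 60.0% of 10
    "Top_Short": (24, 1),   # 48.0% of 50, 10.0% of 10
    "Pant_Long": (27, 1),   # 54.0% of 50, 10.0% of 10
    "Pant_Short": (48, 6),  # 96.0% of 50, 60.0% of 10
}


def rate(successes, episodes):
    """Success rate as a percentage."""
    return 100.0 * successes / episodes


n_types = len(results)
seen = sum(s for s, _ in results.values())
unseen = sum(u for _, u in results.values())
total = seen + unseen

print(f"seen   {seen}/{SEEN_EPISODES * n_types} = {rate(seen, SEEN_EPISODES * n_types):.2f}%")
print(f"unseen {unseen}/{UNSEEN_EPISODES * n_types} = {rate(unseen, UNSEEN_EPISODES * n_types):.2f}%")
print(f"total  {total}/240 = {rate(total, 240):.2f}%")
```

The per-type counts add up to 136/200 Seen, 14/40 Unseen and 150/240 overall, matching the summary rows.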
| ### Reference baselines | |
| | Method | Total | Notes | | |
| |---|---|---| | |
| | **This submission** (v4 global, deterministic) | **62.50%** | 240 ep, 4 types, single model | | |
| | SmolVLA four_types 30K (no residual) | 60.42% | 96 ep, baseline backbone alone | | |
| | Historical v4 global with `explore=True` | 58.75% | 240 ep, non-deterministic | | |
| | ACT (single-type, top_long only) | 87.50% | 24 ep, not comparable across types | | |
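| The +2.08pp gain over the SmolVLA backbone alone (62.50% vs. 60.42%) suggests the residual carries useful signal, although the backbone baseline was measured on 96 episodes rather than 240. | |
| The +3.75pp gain over the historical `explore=True` run (62.50% vs. 58.75%) suggests that deterministic inference helps. | |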
| ### Notes on the run | |
| - One garment (`Top_Long_Seen_9`) initially failed in the main parallel sweep with an Isaac Sim | |
| `TiledCamera._annotators` AttributeError (unrelated to the policy). It was retried in a | |
| fresh single-garment process and produced 4/5 successes; that retry is included in the | |
| `150/240` figure above. | |
| - Episode-level data per garment is in `FINAL_SUMMARY.json` and per-garment stdout logs | |
| are in `eval_raw/`. | |
| ## Artefact Hashes | |
| | Artefact | sha256 | | |
| |---|---| | |
| | residual_averaged.pt | `9d695e278b4361509ac7e35f7d66eb251ec7e7f1f7c53878d453ef2b8aa0ce74` | | |
| | vla_backbone/model.safetensors | `7ff3915571622bf7530e9ba35540abf5c14f62d8c6a57491664b65a23869e6bc` | | |
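The hashes can be checked with Python's standard `hashlib`. This is a minimal sketch: `sha256_of` and `verify` are hypothetical helper names, and the relative paths assume the files sit under `submission_models/` as in the file layout.

```python
import hashlib
from pathlib import Path

# Expected digests, copied from the table above.
EXPECTED = {
    "residual_averaged.pt":
        "9d695e278b4361509ac7e35f7d66eb251ec7e7f1f7c53878d453ef2b8aa0ce74",
    "vla_backbone/model.safetensors":
        "7ff3915571622bf7530e9ba35540abf5c14f62d8c6a57491664b65a23869e6bc",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(models_dir, expected=EXPECTED):
    """Return the relative paths that are missing or whose hash does not match."""
    bad = []
    for rel_path, want in expected.items():
        path = Path(models_dir) / rel_path
        if not path.is_file() or sha256_of(path) != want:
            bad.append(rel_path)
    return bad
```

Calling `verify("submission_models")` would return an empty list when both artefacts match.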
| ## Reproducibility | |
| Inference is fully deterministic: two runs with the same backbone, residual checkpoint, and seed (`seed=42` by default) yield identical action sequences. Runs with different seeds differ only through Isaac Sim particle initialization, which `--seed` controls. | |
| ## Notes | |
| - The `residual_averaged.pt` follows the format: | |
| ```python | |
| torch.save({ | |
|     "state_dim": 12, | |
|     "action_dim": 12, | |
|     "hidden_dims": (256, 256), | |
|     "model_state_dict": residual_mlp.state_dict(),  # state-only MLP weights | |
| }, path) | |
| ``` | |
| - This submission is a **single model** (one residual checkpoint) handling all four garment types, not a per-type specialist ensemble. | |
| - Inference path: `LeRobotPolicy.select_action(observation)` returns the base action, to which the wrapper adds `0.03 * residual_mlp(observation['observation.state'])`. | |
| ## Contact | |
| vita / realvitacai@gmail.com | |
| klein / kleinlau17@gmail.com |