LeHome Challenge 2026 — Submission
Method: SmolVLA (frozen four_types 30K backbone) + state-only residual MLP, trained with sparse-reward residual RL on 40 Seen garments. Single model, deterministic inference.
File Layout
    submission_v4_global/
    ├── README.md                  # this file
    ├── residual_v4_global.py      # the policy module (drop into eval_policy/)
    └── submission_models/
        ├── vla_backbone/          # SmolVLA four_types 30K (LeRobot pretrained_model, ~865M)
        ├── residual_averaged.pt   # 40-garment averaged residual MLP (~286K)
        ├── dataset_meta/          # LeRobot dataset metadata (stats.json etc.)
        └── hf_cache/              # bundled SmolVLM2 weights for offline VLM load (~1.9G)
            └── hub/models--HuggingFaceTB--SmolVLM2-500M-Video-Instruct/
                ├── snapshots/<commit>/   # tokenizer + processor + model.safetensors
                └── refs/main             # commit hash file

The wrapper detects submission_models/hf_cache/ next to vla_backbone/ and
sets HF_HOME to it during __init__, so the SmolVLM2 backbone load
(vlm_model_name = "HuggingFaceTB/SmolVLM2-500M-Video-Instruct",
load_vlm_weights = true) resolves entirely offline against the bundled cache.
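
The redirect described above can be sketched roughly as follows. This is not the wrapper's actual code, which is not reproduced in this README: `redirect_hf_cache` is a hypothetical helper, and setting the three extra variables alongside `HF_HOME` is an assumption borrowed from the manual fallback in the run instructions.

```python
import os
from pathlib import Path
from typing import Optional


def redirect_hf_cache(policy_path: str) -> Optional[Path]:
    """Point Hugging Face caches at an hf_cache/ directory next to the backbone.

    Hypothetical helper. Returns the cache directory if one is found,
    otherwise None (the environment is left untouched).
    """
    cache_dir = Path(policy_path).resolve().parent / "hf_cache"
    if not (cache_dir / "hub").is_dir():
        return None
    os.environ["HF_HOME"] = str(cache_dir)
    os.environ["HF_HUB_CACHE"] = str(cache_dir / "hub")
    # Assumption: also force offline mode so no network lookup is attempted.
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"
    return cache_dir
```

For the redirect to take effect, a helper like this would need to run before `huggingface_hub` or `transformers` is first imported, since those libraries read the cache location at import time.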
How To Run (evaluator side)
1. Drop `residual_v4_global.py` into `/opt/lehome-challenge/scripts/eval_policy/`.

2. Add to `scripts/eval_policy/__init__.py`:

       from .residual_v4_global import ResidualV4GlobalPolicy

3. Set environment variables:

       export LEHOME_VLA_POLICY_PATH=<path to submission_models/vla_backbone>
       export LEHOME_VLA_DATASET_ROOT=<path to submission_models/dataset_meta or any LeRobot dataset>
       export LEHOME_RESIDUAL_CHECKPOINT=<path to submission_models/residual_averaged.pt>
       export LEHOME_RESIDUAL_SCALE=0.03

   The wrapper sets `HF_HOME` automatically to the bundled `hf_cache/` when it sees `LEHOME_VLA_POLICY_PATH`, so no network access is required even on a fully offline evaluator.

   Belt-and-suspenders: if the evaluator's launcher imports `huggingface_hub` before our wrapper module loads (rare but possible), the redirect may be too late. To be safe, set the HF env vars before invoking `python -m scripts.eval`:

       export HF_HOME="$LEHOME_VLA_POLICY_PATH/../hf_cache"
       export HF_HUB_CACHE="$HF_HOME/hub"
       export HF_HUB_OFFLINE=1
       export TRANSFORMERS_OFFLINE=1

4. Invoke the evaluator:

       python -m scripts.eval \
         --policy_type residual_v4_global \
         --policy_path "$LEHOME_VLA_POLICY_PATH" \
         --dataset_root "$LEHOME_VLA_DATASET_ROOT" \
         --garment_type <top_long|top_short|pant_long|pant_short> \
         --num_episodes 5 --max_steps 600 \
         --enable_cameras --device cpu --headless
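
Before launching the evaluator, it can help to confirm that the environment variables listed above are set and point at real paths. The following is a minimal sketch of such a check. It is not part of the submission, and `preflight` is a hypothetical helper name.

```python
import os
from pathlib import Path
from typing import Dict, List

# Variables from the run instructions that are expected to hold filesystem paths.
REQUIRED_PATH_VARS = (
    "LEHOME_VLA_POLICY_PATH",
    "LEHOME_VLA_DATASET_ROOT",
    "LEHOME_RESIDUAL_CHECKPOINT",
)


def preflight(env: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name in REQUIRED_PATH_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).exists():
            problems.append(f"{name} points to a missing path: {value}")
    scale = env.get("LEHOME_RESIDUAL_SCALE")
    if scale is None:
        problems.append("LEHOME_RESIDUAL_SCALE is not set")
    else:
        try:
            float(scale)
        except ValueError:
            problems.append(f"LEHOME_RESIDUAL_SCALE is not a number: {scale!r}")
    return problems


if __name__ == "__main__":
    for line in preflight(dict(os.environ)) or ["environment looks complete"]:
        print(line)
```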
Method Summary
- Backbone: SmolVLA, jointly-trained on 4 garment types for 30K steps. Frozen during residual RL.
- Residual: small state-only MLP (state_dim=12 → 256 → 256 → action_dim=12, 3 Linear+ReLU layers).
- Final action: `clip(base_action + 0.03 * residual_mlp(state))`.
- Training signal: sparse reward (1 if folding success at episode end, else 0).
- Training data: 40 Seen garments (10 per type × 4 types), 30 episodes per garment, on-policy PPO updates.
- Aggregation: weights averaged across 40 per-garment training runs to get a single global residual.
- Inference: deterministic — no exploration noise, no online updates.
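
The final-action and aggregation steps in the list above can be illustrated with a small NumPy sketch. It is not the submission's code: NumPy stands in for PyTorch to keep the example light, the weights are random placeholders rather than trained values, the ReLU placement (after every layer except the last) is an assumption, and the clip bounds `[-1, 1]` are hypothetical because the README does not state them.

```python
import numpy as np

STATE_DIM, ACTION_DIM, HIDDEN = 12, 12, (256, 256)
RESIDUAL_SCALE = 0.03


def init_mlp(rng):
    """Random weights for a 12 -> 256 -> 256 -> 12 MLP (illustrative only)."""
    sizes = (STATE_DIM, *HIDDEN, ACTION_DIM)
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def residual_mlp(params, state):
    """Forward pass; ReLU after every layer except the last (an assumption)."""
    x = state
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


def final_action(base_action, state, params, low=-1.0, high=1.0):
    """clip(base_action + scale * residual(state)); bounds are placeholders."""
    correction = RESIDUAL_SCALE * residual_mlp(params, state)
    return np.clip(base_action + correction, low, high)


def average_params(runs):
    """Element-wise mean of weights and biases across per-garment runs."""
    return [
        (
            np.mean([run[i][0] for run in runs], axis=0),
            np.mean([run[i][1] for run in runs], axis=0),
        )
        for i in range(len(runs[0]))
    ]


rng = np.random.default_rng(0)
per_garment_runs = [init_mlp(rng) for _ in range(40)]  # stand-ins for 40 runs
global_params = average_params(per_garment_runs)
action = final_action(np.zeros(ACTION_DIM), np.ones(STATE_DIM), global_params)
print(action.shape)
```

Because the residual is scaled by 0.03 before it is added, the base policy's action dominates and the MLP can only nudge it.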
Key Hyperparameters
| Parameter | Value |
|---|---|
| residual hidden dims | (256, 256) |
| residual scale | 0.03 |
| state_dim | 12 |
| action_dim | 12 |
| training reward | sparse (1 on success) |
| episodes per garment | 30 |
| training garments | 40 (10 Seen × 4 types) |
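
As a quick sanity check on these dimensions, the parameter count of the residual MLP can be worked out directly. The sketch below assumes plain fully connected layers with biases and float32 storage; under those assumptions the raw weights come to roughly 282 KiB, in the same range as the ~286K listed for `residual_averaged.pt` in the file layout.

```python
def mlp_param_count(sizes):
    """Weights plus biases for consecutive fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


sizes = (12, 256, 256, 12)  # state_dim -> hidden -> hidden -> action_dim
n_params = mlp_param_count(sizes)
n_bytes = n_params * 4  # assuming 4-byte float32 parameters

print(n_params)  # 72204
print(n_bytes)   # 288816
```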
Evaluation Results
Run on lehome3 / 120.209.70.195:30239, 4× NVIDIA L40S, 4-GPU parallel.
48 garments × 5 episodes = 240 episodes total.
| Metric | Value |
|---|---|
| Total | 150/240 = 62.50% |
| Seen (40 garments × 5 ep) | 136/200 = 68.00% |
| Unseen (8 garments × 5 ep) | 14/40 = 35.00% |
| Top_Long | 43/60 = 71.67% (seen 74.0%, unseen 60.0%) |
| Top_Short | 25/60 = 41.67% (seen 48.0%, unseen 10.0%) |
| Pant_Long | 28/60 = 46.67% (seen 54.0%, unseen 10.0%) |
| Pant_Short | 54/60 = 90.00% (seen 96.0%, unseen 60.0%) |
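
The per-type rows can be cross-checked against the pooled totals. In the sketch below, the success counts are derived from the percentages in the table (for example, 74.0% of 50 seen episodes is 37), assuming each type has 50 seen and 10 unseen episodes; they are not copied from `FINAL_SUMMARY.json`.

```python
# (successes, episodes) per garment type, split into seen / unseen.
# Counts are derived from the table's percentages, not from raw logs.
results = {
    "Top_Long":   {"seen": (37, 50), "unseen": (6, 10)},
    "Top_Short":  {"seen": (24, 50), "unseen": (1, 10)},
    "Pant_Long":  {"seen": (27, 50), "unseen": (1, 10)},
    "Pant_Short": {"seen": (48, 50), "unseen": (6, 10)},
}


def pooled(split_names):
    """Sum successes and episodes over all types for the given splits."""
    wins = sum(results[t][s][0] for t in results for s in split_names)
    eps = sum(results[t][s][1] for t in results for s in split_names)
    return wins, eps


for label, splits in [
    ("Total", ("seen", "unseen")),
    ("Seen", ("seen",)),
    ("Unseen", ("unseen",)),
]:
    wins, eps = pooled(splits)
    print(f"{label}: {wins}/{eps} = {100 * wins / eps:.2f}%")
# Total: 150/240 = 62.50%
# Seen: 136/200 = 68.00%
# Unseen: 14/40 = 35.00%
```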
Reference baselines
| Method | Total | Notes |
|---|---|---|
| This submission (v4 global, deterministic) | 62.50% | 240 ep, 4 types, single model |
| SmolVLA four_types 30K (no residual) | 60.42% | 96 ep, baseline backbone alone |
| Historical v4 global with `explore=True` | 58.75% | 240 ep, non-deterministic |
| ACT (single-type, top_long only) | 87.50% | 24 ep, not comparable across types |
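The +2.08pp gain over the SmolVLA backbone alone (62.50% on 240 episodes vs. 60.42% on 96 episodes) suggests the residual carries useful signal, although the two figures come from different episode counts. The +3.75pp gain over the historical run with exploration noise (58.75%) suggests that deterministic inference matters.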
Notes on the run
- One garment (`Top_Long_Seen_9`) initially failed in the main parallel sweep with an Isaac Sim `TiledCamera._annotators` `AttributeError` (unrelated to the policy). It was retried in a fresh single-garment process and produced 4/5 success; that retry is included in the `150/240` figure above.
- Episode-level data per garment is in `FINAL_SUMMARY.json`, and per-garment stdout logs are in `eval_raw/`.
Artefact Hashes
| Artefact | sha256 |
|---|---|
| residual_averaged.pt | 9d695e278b4361509ac7e35f7d66eb251ec7e7f1f7c53878d453ef2b8aa0ce74 |
| vla_backbone/model.safetensors | 7ff3915571622bf7530e9ba35540abf5c14f62d8c6a57491664b65a23869e6bc |
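
A downloaded copy can be checked against these hashes with a few lines of standard-library Python. This is a minimal sketch. `verify` is a hypothetical helper, and it assumes both artefacts sit under `submission_models/` as shown in the file layout.

```python
import hashlib
from pathlib import Path

# Expected digests copied from the table above, keyed by path relative to
# submission_models/ (the relative layout is an assumption).
EXPECTED = {
    "residual_averaged.pt":
        "9d695e278b4361509ac7e35f7d66eb251ec7e7f1f7c53878d453ef2b8aa0ce74",
    "vla_backbone/model.safetensors":
        "7ff3915571622bf7530e9ba35540abf5c14f62d8c6a57491664b65a23869e6bc",
}


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so large weights need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(models_dir):
    """Map each artefact to True/False (hash match) or None (file missing)."""
    status = {}
    for rel_path, expected in EXPECTED.items():
        path = Path(models_dir) / rel_path
        status[rel_path] = (sha256_of(path) == expected) if path.is_file() else None
    return status
```

Calling `verify("submission_models")` from the submission root returns a dictionary with one entry per artefact.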
Reproducibility
Inference is fully deterministic. Two runs with the same backbone, residual checkpoint, and seed=42 (default) yield identical action sequences. Variability across runs comes only from Isaac Sim particle initialization (seeded by --seed).
Notes
- The `residual_averaged.pt` follows the format:

      torch.save({
          "state_dim": 12,
          "action_dim": 12,
          "hidden_dims": (256, 256),
          "model_state_dict": <state-only MLP weights>,
      }, path)

- This submission is a single model (one residual checkpoint) handling all four garment types, not a per-type specialist ensemble.
- Inference path: `LeRobotPolicy.select_action(observation)` → adds `0.03 * residual_mlp(observation['observation.state'])`.
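
The checkpoint format noted above can be validated when loading. The sketch below is an illustration rather than the policy module's actual loader: `check_checkpoint` and `load_checkpoint` are hypothetical names, and the contents of `model_state_dict` are left unchecked because its key names are not documented here.

```python
from typing import Any, Dict

# Metadata values stated in this README.
EXPECTED_META = {"state_dim": 12, "action_dim": 12, "hidden_dims": (256, 256)}


def check_checkpoint(ckpt: Dict[str, Any]) -> None:
    """Raise ValueError if the dict does not match the documented format."""
    for key, expected in EXPECTED_META.items():
        if key not in ckpt:
            raise ValueError(f"missing key: {key}")
        actual = ckpt[key]
        if isinstance(expected, tuple):
            actual = tuple(actual)  # tolerate hidden_dims stored as a list
        if actual != expected:
            raise ValueError(f"{key}: expected {expected}, got {actual}")
    if "model_state_dict" not in ckpt:
        raise ValueError("missing key: model_state_dict")


def load_checkpoint(path: str) -> Dict[str, Any]:
    """Load the checkpoint on CPU and validate its metadata."""
    import torch  # imported lazily so check_checkpoint works without PyTorch

    ckpt = torch.load(path, map_location="cpu")
    check_checkpoint(ckpt)
    return ckpt
```

Validating the metadata up front gives a clear error if a checkpoint with different dimensions is passed by mistake, rather than a shape mismatch deeper in the policy.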
Contact
vita / realvitacai@gmail.com
klein / kleinlau17@gmail.com