---
datasets:
- lehome/dataset_challenge
- lehome/dataset_challenge_merged
base_model:
- HuggingFaceTB/SmolVLM2-500M-Video-Instruct
pipeline_tag: robotics
tags:
- robotics
- lerobot
- lehome-challenge
- smolvla
- residual-rl
---

# LeHome Challenge 2026: Submission

**Method**: SmolVLA (frozen four_types 30K backbone) + state-only residual MLP, trained with sparse-reward residual RL on 40 Seen garments. Single model, deterministic inference.

## File Layout

```
submission_v4_global/
├── README.md                     # this file
├── residual_v4_global.py         # the policy module (drop into eval_policy/)
└── submission_models/
    ├── vla_backbone/             # SmolVLA four_types 30K (LeRobot pretrained_model, ~865M)
    ├── residual_averaged.pt      # 40-garment averaged residual MLP (~286K)
    ├── dataset_meta/             # LeRobot dataset metadata (stats.json etc.)
    └── hf_cache/                 # bundled SmolVLM2 weights for offline VLM load (~1.9G)
        └── hub/models--HuggingFaceTB--SmolVLM2-500M-Video-Instruct/
            ├── snapshots//       # tokenizer + processor + model.safetensors
            └── refs/main         # commit hash file
```

The wrapper detects `submission_models/hf_cache/` next to `vla_backbone/` and sets `HF_HOME` to it during `__init__`, so the SmolVLM2 backbone load (`vlm_model_name = "HuggingFaceTB/SmolVLM2-500M-Video-Instruct"`, `load_vlm_weights = true`) resolves entirely offline against the bundled cache.

## How To Run (evaluator side)

1. Drop `residual_v4_global.py` into `/opt/lehome-challenge/scripts/eval_policy/`.
2. Add to `scripts/eval_policy/__init__.py`:

   ```python
   from .residual_v4_global import ResidualV4GlobalPolicy
   ```

3. Set environment variables:

   ```bash
   export LEHOME_VLA_POLICY_PATH=
   export LEHOME_VLA_DATASET_ROOT=
   export LEHOME_RESIDUAL_CHECKPOINT=
   export LEHOME_RESIDUAL_SCALE=0.03
   ```

   The wrapper sets `HF_HOME` automatically to the bundled `hf_cache/` when it sees `LEHOME_VLA_POLICY_PATH`, so no network access is required even on a fully offline evaluator.

   **Belt-and-suspenders**: if the evaluator's launcher imports `huggingface_hub` before our wrapper module loads (rare but possible), the redirect may be too late. To be safe, set HF env vars **before** invoking `python -m scripts.eval`:

   ```bash
   export HF_HOME="$LEHOME_VLA_POLICY_PATH/../hf_cache"
   export HF_HUB_CACHE="$HF_HOME/hub"
   export HF_HUB_OFFLINE=1
   export TRANSFORMERS_OFFLINE=1
   ```

4. Invoke evaluator:

   ```bash
   python -m scripts.eval \
     --policy_type residual_v4_global \
     --policy_path "$LEHOME_VLA_POLICY_PATH" \
     --dataset_root "$LEHOME_VLA_DATASET_ROOT" \
     --garment_type \
     --num_episodes 5 --max_steps 600 \
     --enable_cameras --device cpu --headless
   ```

## Method Summary

- **Backbone**: SmolVLA, jointly-trained on 4 garment types for 30K steps. Frozen during residual RL.
- **Residual**: small state-only MLP (state_dim=12 → 256 → 256 → action_dim=12, 3 Linear+ReLU layers).
- **Final action**: `clip(base_action + 0.03 * residual_mlp(state))`.
- **Training signal**: sparse reward (1 if folding success at episode end, else 0).
- **Training data**: 40 Seen garments (10 per type × 4 types), 30 episodes per garment, on-policy PPO updates.
- **Aggregation**: weights averaged across 40 per-garment training runs to get a single global residual.
- **Inference**: deterministic, with no exploration noise and no online updates.

## Key Hyperparameters

| Parameter | Value |
|---|---|
| residual hidden dims | (256, 256) |
| residual scale | 0.03 |
| state_dim | 12 |
| action_dim | 12 |
| training reward | sparse (1 on success) |
| episodes per garment | 30 |
| training garments | 40 (10 Seen × 4 types) |

## Evaluation Results

Run on `lehome3 / 120.209.70.195:30239`, 4× NVIDIA L40S, 4-GPU parallel. 48 garments × 5 episodes = 240 episodes total.

| Metric | Value |
|---|---|
| **Total** | **150/240 = 62.50%** |
| Seen (40 garments × 5 ep) | 136/200 = 68.00% |
| Unseen (8 garments × 5 ep) | 14/40 = 35.00% |
| Top_Long | 43/60 = 71.67% (seen 74.0%, unseen 60.0%) |
| Top_Short | 25/60 = 41.67% (seen 48.0%, unseen 10.0%) |
| Pant_Long | 28/60 = 46.67% (seen 54.0%, unseen 10.0%) |
| **Pant_Short** | **54/60 = 90.00%** (seen 96.0%, unseen 60.0%) |

### Reference baselines

| Method | Total | Notes |
|---|---|---|
| **This submission** (v4 global, deterministic) | **62.50%** | 240 ep, 4 types, single model |
| SmolVLA four_types 30K (no residual) | 60.42% | 96 ep, baseline backbone alone |
| Historical v4 global with `explore=True` | 58.75% | 240 ep, non-deterministic |
| ACT (single-type, top_long only) | 87.50% | 24 ep, not comparable across types |

The +2.08pp gain over the SmolVLA backbone alone (62.50% vs. 60.42%) suggests the residual carries useful signal, although the backbone-only baseline was measured on 96 episodes rather than 240. The +3.75pp gain over the historical noisy run (62.50% vs. 58.75%, both on 240 episodes) suggests that deterministic inference matters.

### Notes on the run

- One garment (`Top_Long_Seen_9`) initially failed in the main parallel sweep with an Isaac Sim `TiledCamera._annotators` AttributeError (unrelated to the policy). It was retried in a fresh single-garment process and produced 4/5 successes; that retry is included in the `150/240` figure above.
- Episode-level data per garment is in `FINAL_SUMMARY.json` and per-garment stdout logs are in `eval_raw/`.

## Artefact Hashes

| Artefact | sha256 |
|---|---|
| residual_averaged.pt | `9d695e278b4361509ac7e35f7d66eb251ec7e7f1f7c53878d453ef2b8aa0ce74` |
| vla_backbone/model.safetensors | `7ff3915571622bf7530e9ba35540abf5c14f62d8c6a57491664b65a23869e6bc` |

## Reproducibility

Inference is fully deterministic. Two runs with the same backbone, residual checkpoint, and `seed=42` (default) yield identical action sequences. Variability across runs comes only from Isaac Sim particle initialization (seeded by `--seed`).
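
Because identical action sequences depend on loading exactly the same backbone and residual checkpoint, it may help to check the files against the Artefact Hashes table before a run. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `verify_artefacts` are hypothetical and not part of the submission, and the paths assume the layout shown under File Layout.

```python
import hashlib
from pathlib import Path

# Expected digests, copied from the Artefact Hashes table.
# Keys are paths relative to submission_models/ (assumed layout).
EXPECTED_SHA256 = {
    "residual_averaged.pt":
        "9d695e278b4361509ac7e35f7d66eb251ec7e7f1f7c53878d453ef2b8aa0ce74",
    "vla_backbone/model.safetensors":
        "7ff3915571622bf7530e9ba35540abf5c14f62d8c6a57491664b65a23869e6bc",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artefacts(models_dir, expected=EXPECTED_SHA256):
    """Map each relative path to True if the file exists and its digest matches."""
    results = {}
    for relative_path, want in expected.items():
        target = Path(models_dir) / relative_path
        results[relative_path] = target.is_file() and sha256_of(target) == want
    return results
```

Calling `verify_artefacts("submission_v4_global/submission_models")` on an intact copy should map both entries to `True`; a `False` value points to a missing or modified file.
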

## Notes

- The `residual_averaged.pt` follows the format:

  ```python
  torch.save({
      "state_dim": 12,
      "action_dim": 12,
      "hidden_dims": (256, 256),
      "model_state_dict": ,
  }, path)
  ```

- This submission is a **single model** (one residual checkpoint) handling all four garment types, not a per-type specialist ensemble.
- Inference path: `LeRobotPolicy.select_action(observation)` → adds `0.03 * residual_mlp(observation['observation.state'])`.

## Contact

vita / realvitacai@gmail.com

klein / kleinlau17@gmail.com