---
datasets:
- lehome/dataset_challenge
- lehome/dataset_challenge_merged
base_model:
- HuggingFaceTB/SmolVLM2-500M-Video-Instruct
pipeline_tag: robotics
tags:
- robotics
- lerobot
- lehome-challenge
- smolvla
- residual-rl
---
# LeHome Challenge 2026: Submission
**Method**: SmolVLA (frozen four_types 30K backbone) + state-only residual MLP, trained with sparse-reward residual RL on 40 Seen garments. Single model, deterministic inference.
## File Layout
```
submission_v4_global/
├── README.md                  # this file
├── residual_v4_global.py      # the policy module (drop into eval_policy/)
└── submission_models/
    ├── vla_backbone/          # SmolVLA four_types 30K (LeRobot pretrained_model, ~865M)
    ├── residual_averaged.pt   # 40-garment averaged residual MLP (~286K)
    ├── dataset_meta/          # LeRobot dataset metadata (stats.json etc.)
    └── hf_cache/              # bundled SmolVLM2 weights for offline VLM load (~1.9G)
        └── hub/models--HuggingFaceTB--SmolVLM2-500M-Video-Instruct/
            ├── snapshots/<commit>/   # tokenizer + processor + model.safetensors
            └── refs/main             # commit hash file
```
The wrapper detects `submission_models/hf_cache/` next to `vla_backbone/` and
sets `HF_HOME` to it during `__init__`, so the SmolVLM2 backbone load
(`vlm_model_name = "HuggingFaceTB/SmolVLM2-500M-Video-Instruct"`,
`load_vlm_weights = true`) resolves entirely offline against the bundled cache.
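
A minimal sketch of that redirect, assuming it only needs to update `os.environ` before any Hugging Face library is imported. `use_bundled_hf_cache` is a hypothetical name; the actual logic in `residual_v4_global.py` may differ.

```python
import os
from pathlib import Path


def use_bundled_hf_cache(vla_backbone_dir: str) -> bool:
    """Point the Hugging Face cache at submission_models/hf_cache/ if present.

    Hypothetical helper: returns True if the bundled cache was found and the
    environment was updated, False otherwise. It must run before
    huggingface_hub / transformers are imported, because those libraries
    read these variables at import time.
    """
    hf_cache = Path(vla_backbone_dir).resolve().parent / "hf_cache"
    if not hf_cache.is_dir():
        return False
    os.environ["HF_HOME"] = str(hf_cache)
    os.environ["HF_HUB_CACHE"] = str(hf_cache / "hub")
    # Forbid network lookups so a missing file fails fast instead of hanging.
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"
    return True
```

Checking for the sibling directory (rather than hard-coding an absolute path) keeps the submission relocatable: the evaluator can unpack `submission_models/` anywhere.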
## How To Run (evaluator side)
1. Drop `residual_v4_global.py` into `/opt/lehome-challenge/scripts/eval_policy/`.
2. Add to `scripts/eval_policy/__init__.py`:
```python
from .residual_v4_global import ResidualV4GlobalPolicy
```
3. Set environment variables:
```bash
export LEHOME_VLA_POLICY_PATH=<path to submission_models/vla_backbone>
export LEHOME_VLA_DATASET_ROOT=<path to submission_models/dataset_meta or any LeRobot dataset>
export LEHOME_RESIDUAL_CHECKPOINT=<path to submission_models/residual_averaged.pt>
export LEHOME_RESIDUAL_SCALE=0.03
```
The wrapper sets `HF_HOME` automatically to the bundled `hf_cache/` when
it sees `LEHOME_VLA_POLICY_PATH`, so no network access is required even on
a fully offline evaluator.
**Belt-and-suspenders:** if the evaluator's launcher imports
`huggingface_hub` before our wrapper module loads (rare but possible),
the redirect may come too late, because `huggingface_hub` resolves its cache
paths when it is first imported. To be safe, set the HF environment variables
**before** invoking `python -m scripts.eval`:
```bash
export HF_HOME="$LEHOME_VLA_POLICY_PATH/../hf_cache"
export HF_HUB_CACHE="$HF_HOME/hub"
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
```
4. Invoke evaluator:
```bash
python -m scripts.eval \
--policy_type residual_v4_global \
--policy_path "$LEHOME_VLA_POLICY_PATH" \
--dataset_root "$LEHOME_VLA_DATASET_ROOT" \
--garment_type <top_long|top_short|pant_long|pant_short> \
--num_episodes 5 --max_steps 600 \
--enable_cameras --device cpu --headless
```
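
Steps 3 and 4 can also be driven from one Python launcher that passes the environment to each child process explicitly, so the offline variables are in place before anything imports `huggingface_hub`. This is a sketch under assumptions: `build_env` and `build_eval_command` are hypothetical helpers, the path is a placeholder, and only the CLI flags already shown in step 4 are used.

```python
import os
from pathlib import Path

GARMENT_TYPES = ("top_long", "top_short", "pant_long", "pant_short")


def build_env(submission_models: str, residual_scale: float = 0.03) -> dict:
    """Copy os.environ and add the LEHOME_* and offline HF variables (hypothetical helper)."""
    root = Path(submission_models)
    env = dict(os.environ)
    env.update({
        "LEHOME_VLA_POLICY_PATH": str(root / "vla_backbone"),
        "LEHOME_VLA_DATASET_ROOT": str(root / "dataset_meta"),
        "LEHOME_RESIDUAL_CHECKPOINT": str(root / "residual_averaged.pt"),
        "LEHOME_RESIDUAL_SCALE": str(residual_scale),
        "HF_HOME": str(root / "hf_cache"),
        "HF_HUB_CACHE": str(root / "hf_cache" / "hub"),
        "HF_HUB_OFFLINE": "1",
        "TRANSFORMERS_OFFLINE": "1",
    })
    return env


def build_eval_command(env: dict, garment_type: str) -> list:
    """Assemble the step-4 command line for one garment type."""
    if garment_type not in GARMENT_TYPES:
        raise ValueError(f"unknown garment type: {garment_type}")
    return [
        "python", "-m", "scripts.eval",
        "--policy_type", "residual_v4_global",
        "--policy_path", env["LEHOME_VLA_POLICY_PATH"],
        "--dataset_root", env["LEHOME_VLA_DATASET_ROOT"],
        "--garment_type", garment_type,
        "--num_episodes", "5", "--max_steps", "600",
        "--enable_cameras", "--device", "cpu", "--headless",
    ]

# Usage (run from the lehome-challenge checkout so `scripts.eval` resolves):
#   import subprocess
#   env = build_env("/path/to/submission_models")   # placeholder path
#   for garment in GARMENT_TYPES:
#       subprocess.run(build_eval_command(env, garment), env=env, check=True)
```

Passing `env=` to `subprocess.run` sidesteps the import-order problem entirely, since the child interpreter starts with the variables already set.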
## Method Summary
- **Backbone**: SmolVLA, jointly trained on 4 garment types for 30K steps. Frozen during residual RL.
- **Residual**: small state-only MLP (state_dim=12 → 256 → 256 → action_dim=12, 3 Linear+ReLU layers).
- **Final action**: `clip(base_action + 0.03 * residual_mlp(state))`.
- **Training signal**: sparse reward (1 if the garment is successfully folded at episode end, else 0).
- **Training data**: 40 Seen garments (10 per type × 4 types), 30 episodes per garment, on-policy PPO updates.
- **Aggregation**: weights averaged across 40 per-garment training runs to get a single global residual.
- **Inference**: deterministic (no exploration noise, no online updates).
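
A minimal PyTorch sketch of the residual head and the action composition described above. Assumptions: the ReLU placement (after the two hidden layers only), the names `ResidualMLP` and `compose_action`, and the clip bounds are illustrative; the actual module in `residual_v4_global.py` and the evaluator's action limits may differ.

```python
import torch
from torch import nn


class ResidualMLP(nn.Module):
    """State-only residual head, 12 -> 256 -> 256 -> 12 (hypothetical layout)."""

    def __init__(self, state_dim: int = 12, action_dim: int = 12,
                 hidden_dims: tuple = (256, 256)):
        super().__init__()
        layers, in_dim = [], state_dim
        for h in hidden_dims:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, action_dim))  # signed residual output
        self.net = nn.Sequential(*layers)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


@torch.no_grad()
def compose_action(base_action, state, residual_mlp, low, high, scale=0.03):
    """final = clip(base_action + scale * residual_mlp(state), low, high).

    The frozen backbone produces base_action; only the small correction
    depends on the 12-D proprioceptive state. low/high are placeholders.
    """
    return torch.clamp(base_action + scale * residual_mlp(state), low, high)
```

With `scale=0.03` the residual can only nudge the backbone's action, which keeps the combined policy close to the pretrained behaviour while RL adjusts it.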
## Key Hyperparameters
| Parameter | Value |
|---|---|
| residual hidden dims | (256, 256) |
| residual scale | 0.03 |
| state_dim | 12 |
| action_dim | 12 |
| training reward | sparse (1 on success) |
| episodes per garment | 30 |
| training garments | 40 (10 Seen × 4 types) |
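
The 40 training garments correspond to 40 per-garment residual runs that are merged into one checkpoint. A sketch of that aggregation as uniform, parameter-wise averaging of state dicts, assuming every run uses the architecture in the table above and contributes equal weight; `average_state_dicts` is a hypothetical helper, not code from this submission.

```python
from collections import OrderedDict

import torch


def average_state_dicts(state_dicts):
    """Uniformly average state dicts that share identical keys and shapes."""
    if not state_dicts:
        raise ValueError("need at least one state dict")
    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("state dicts have mismatched keys")
    averaged = OrderedDict()
    for key in keys:
        # Stack along a new leading dim, average in float32, restore the dtype.
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        averaged[key] = stacked.mean(dim=0).to(state_dicts[0][key].dtype)
    return averaged
```

Parameter averaging of this kind generally relies on the runs starting from a shared initialization, so that hidden units line up across runs; whether that holds here is not stated above.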
## Evaluation Results
Evaluated on `lehome3 / 120.209.70.195:30239`, 4× NVIDIA L40S, 4-GPU parallel.
48 garments × 5 episodes = 240 episodes total.
| Metric | Value |
|---|---|
| **Total** | **150/240 = 62.50%** |
| Seen (40 garments Γ— 5 ep) | 136/200 = 68.00% |
| Unseen (8 garments Γ— 5 ep) | 14/40 = 35.00% |
| Top_Long | 43/60 = 71.67% (seen 74.0%, unseen 60.0%) |
| Top_Short | 25/60 = 41.67% (seen 48.0%, unseen 10.0%) |
| Pant_Long | 28/60 = 46.67% (seen 54.0%, unseen 10.0%) |
| **Pant_Short** | **54/60 = 90.00%** (seen 96.0%, unseen 60.0%) |
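
The aggregate rows follow from the per-type counts. The snippet below re-derives them; the per-type seen/unseen success counts (e.g. 37/50 and 6/10 for Top_Long) are back-computed from the percentages in the table, assuming 10 Seen and 2 Unseen garments per type at 5 episodes each.

```python
# (seen_successes, unseen_successes) per garment type, back-computed from the
# table: 10 Seen garments x 5 ep = 50 episodes, 2 Unseen x 5 ep = 10 episodes.
COUNTS = {
    "Top_Long": (37, 6),
    "Top_Short": (24, 1),
    "Pant_Long": (27, 1),
    "Pant_Short": (48, 6),
}
SEEN_EPS, UNSEEN_EPS = 50, 10


def summarize(counts):
    """Return success rates in percent (2 d.p.) per type, per split, and overall."""
    rows = {}
    for name, (seen, unseen) in counts.items():
        rows[name] = round(100 * (seen + unseen) / (SEEN_EPS + UNSEEN_EPS), 2)
    n_types = len(counts)
    seen_total = sum(s for s, _ in counts.values())
    unseen_total = sum(u for _, u in counts.values())
    rows["Seen"] = round(100 * seen_total / (SEEN_EPS * n_types), 2)
    rows["Unseen"] = round(100 * unseen_total / (UNSEEN_EPS * n_types), 2)
    rows["Total"] = round(
        100 * (seen_total + unseen_total) / ((SEEN_EPS + UNSEEN_EPS) * n_types), 2)
    return rows
```

`summarize(COUNTS)["Total"]` gives 62.5, matching 150/240.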
### Reference baselines
| Method | Total | Notes |
|---|---|---|
| **This submission** (v4 global, deterministic) | **62.50%** | 240 ep, 4 types, single model |
| SmolVLA four_types 30K (no residual) | 60.42% | 96 ep, baseline backbone alone |
| Historical v4 global with `explore=True` | 58.75% | 240 ep, non-deterministic |
| ACT (single-type, top_long only) | 87.50% | 24 ep, not comparable across types |
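The +2.08pp gain over the SmolVLA backbone alone (62.50% vs 60.42%) suggests the residual carries useful signal, though the backbone-only figure comes from a smaller 96-episode run.
The +3.75pp gain over the historical `explore=True` run suggests that deterministic inference matters.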
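
The percentage-point gaps can be re-derived from the baseline table. The success counts below are inferred from the reported rates and episode counts (60.42% of 96 episodes is 58 successes; 58.75% of 240 is 141), not taken from logs.

```python
def rate(successes: int, episodes: int) -> float:
    """Success rate in percent."""
    return 100 * successes / episodes


submission = rate(150, 240)  # v4 global, deterministic
backbone = rate(58, 96)      # SmolVLA four_types 30K, no residual (inferred count)
noisy = rate(141, 240)       # historical v4 global with explore=True (inferred count)

gap_backbone = round(submission - backbone, 2)  # percentage points
gap_noisy = round(submission - noisy, 2)        # percentage points
```

This yields gaps of 2.08pp and 3.75pp respectively.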
### Notes on the run
- One garment (`Top_Long_Seen_9`) initially failed in the main parallel sweep with an Isaac Sim
`TiledCamera._annotators` AttributeError (unrelated to the policy). It was retried in a
fresh single-garment process and produced 4/5 success β€” that retry is included in the
`150/240` figure above.
- Episode-level data per garment is in `FINAL_SUMMARY.json` and per-garment stdout logs
are in `eval_raw/`.
## Artefact Hashes
| Artefact | sha256 |
|---|---|
| residual_averaged.pt | `9d695e278b4361509ac7e35f7d66eb251ec7e7f1f7c53878d453ef2b8aa0ce74` |
| vla_backbone/model.safetensors | `7ff3915571622bf7530e9ba35540abf5c14f62d8c6a57491664b65a23869e6bc` |
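
A stdlib-only sketch for checking downloaded artefacts against these hashes. `EXPECTED` copies the table above with paths relative to `submission_models/`; `verify_artifacts` and `sha256_of` are hypothetical helpers.

```python
import hashlib
from pathlib import Path

EXPECTED = {
    "residual_averaged.pt":
        "9d695e278b4361509ac7e35f7d66eb251ec7e7f1f7c53878d453ef2b8aa0ce74",
    "vla_backbone/model.safetensors":
        "7ff3915571622bf7530e9ba35540abf5c14f62d8c6a57491664b65a23869e6bc",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so large weight files don't fill RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(root: str, expected: dict = EXPECTED) -> dict:
    """Map each relative path to True if its hash matches; missing files map to False."""
    results = {}
    for rel, want in expected.items():
        path = Path(root) / rel
        results[rel] = path.is_file() and sha256_of(path) == want
    return results
```

For example, `verify_artifacts("submission_models")` should return `True` for both entries on an intact copy.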
## Reproducibility
Inference is fully deterministic. Two runs with the same backbone, residual checkpoint, and `seed=42` (the default) yield identical action sequences. Across different `--seed` values, the only source of variability is Isaac Sim particle initialization.
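
A sketch of the kind of PyTorch-side seeding and inference setup that supports this, assuming a CPU or CUDA run. `seed_everything` and `deterministic_residual` are hypothetical helpers; they do not touch Isaac Sim, whose particle initialization is controlled separately by `--seed`.

```python
import random

import torch
from torch import nn


def seed_everything(seed: int = 42) -> None:
    """Seed Python and PyTorch RNGs and request deterministic kernels."""
    random.seed(seed)
    torch.manual_seed(seed)  # also seeds all CUDA devices
    # Error out if an op only has a nondeterministic implementation.
    # On CUDA, some cuBLAS ops also need CUBLAS_WORKSPACE_CONFIG=:4096:8.
    torch.use_deterministic_algorithms(True)


@torch.no_grad()
def deterministic_residual(residual_mlp: nn.Module, state: torch.Tensor) -> torch.Tensor:
    """Plain forward pass in eval mode: no sampled exploration noise."""
    residual_mlp.eval()
    return residual_mlp(state)
```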
## Notes
- The `residual_averaged.pt` follows the format:
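```python
import torch

# residual_mlp: the trained state-only residual MLP (a torch.nn.Module)
# path: destination file, e.g. "submission_models/residual_averaged.pt"
torch.save({
    "state_dim": 12,
    "action_dim": 12,
    "hidden_dims": (256, 256),
    "model_state_dict": residual_mlp.state_dict(),  # state-only MLP weights
}, path)
```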
- This submission is a **single model** (one residual checkpoint) handling all four garment types β€” not a per-type specialist ensemble.
- Inference path: `LeRobotPolicy.select_action(observation)` produces the base action, then `0.03 * residual_mlp(observation['observation.state'])` is added and the sum is clipped.
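
A sketch of loading a checkpoint in the format above and applying it on top of the backbone action. Assumptions: the MLP is rebuilt as an `nn.Sequential` whose state-dict keys match the saved ones (the real key names are not documented here, so `load_state_dict` may fail against the actual file), `base_action` stands in for the output of `LeRobotPolicy.select_action`, and the clip bounds are placeholders.

```python
import torch
from torch import nn


def build_mlp(state_dim: int, action_dim: int, hidden_dims) -> nn.Sequential:
    """Hypothetical layout: Linear+ReLU per hidden layer, linear output."""
    layers, in_dim = [], state_dim
    for h in hidden_dims:
        layers += [nn.Linear(in_dim, h), nn.ReLU()]
        in_dim = h
    layers.append(nn.Linear(in_dim, action_dim))
    return nn.Sequential(*layers)


def load_residual(path: str) -> nn.Sequential:
    """Read the checkpoint dict and restore the residual MLP in eval mode."""
    ckpt = torch.load(path, map_location="cpu")
    mlp = build_mlp(ckpt["state_dim"], ckpt["action_dim"], ckpt["hidden_dims"])
    mlp.load_state_dict(ckpt["model_state_dict"])
    return mlp.eval()


@torch.no_grad()
def apply_residual(base_action, observation, mlp, scale=0.03, low=-1.0, high=1.0):
    """clip(base_action + scale * mlp(observation.state)); bounds are placeholders."""
    residual = mlp(observation["observation.state"])
    return torch.clamp(base_action + scale * residual, low, high)
```

Storing `state_dim`, `action_dim`, and `hidden_dims` next to the weights is what lets a loader like this rebuild the network without hard-coding its shape.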
## Contact
- vita: realvitacai@gmail.com
- klein: kleinlau17@gmail.com