---
library_name: transformers
base_model: Qwen/Qwen3.5-2B
tags:
- vla
- robotics
- vision-language-action
- qwen3.5
license: cc-by-nc-4.0
---

# Qwen3.5-2B-LoRA-LAP-UR5e-PyAV

[VLA-0](https://github.com/omron-sinicx/vla0) checkpoint: **Qwen/Qwen3.5-2B** fine-tuned with **LoRA** on the **UR5e (RoboVerse)** dataset. VLA-0 represents robot actions directly as text tokens, with no architectural changes to the base VLM.

## Quick Start

### 1. Download

```bash
pip install huggingface_hub
huggingface-cli download denkiwakame/Qwen3.5-2B-LoRA-LAP-UR5e-PyAV --local-dir ./Qwen3.5-2B-LoRA-LAP-UR5e-PyAV
```

### 2. Load with PEFT + transformers

```python
import pickle

import torch
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration, Qwen2_5_VLProcessor

ckpt_dir = "./Qwen3.5-2B-LoRA-LAP-UR5e-PyAV"

# Load base model + LoRA adapter
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3.5-2B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, f"{ckpt_dir}/model_final")
processor = Qwen2_5_VLProcessor.from_pretrained(f"{ckpt_dir}/model_final")

# Load dataset stats (required for action denormalization)
with open(f"{ckpt_dir}/dataset_stats.pkl", "rb") as f:
    dataset_stats = pickle.load(f)
```

### 3. Load with VLA-0 framework

```python
from rv_train.train import get_pretrained_model

model, cfg = get_pretrained_model("./Qwen3.5-2B-LoRA-LAP-UR5e-PyAV", device=0)
model.eval()
```

## `dataset_stats.pkl`

Action normalization statistics computed from the training dataset. Required at inference time to denormalize model outputs back to the original action space.

```python
import pickle

with open("dataset_stats.pkl", "rb") as f:
    stats = pickle.load(f)
# stats contains mean/std for action dimensions
```

## Intermediate Checkpoints

`main` holds the recommended/final weights. Earlier training-step snapshots are published as **branches** named `step-<N>` (e.g., `step-17000`, `step-18000`).
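
The snapshot branches can also be discovered programmatically. The following is a minimal sketch, assuming the `huggingface_hub` client is installed and that snapshot branches follow the `step-` plus integer pattern seen in the examples above; `step_branches` and `list_step_branches` are hypothetical helper names, not part of VLA-0.

```python
import re


def step_branches(branch_names):
    """Keep names of the form 'step-<integer>' and sort them by step number."""
    pattern = re.compile(r"^step-(\d+)$")
    matched = []
    for name in branch_names:
        m = pattern.match(name)
        if m:
            matched.append((int(m.group(1)), name))
    return [name for _, name in sorted(matched)]


def list_step_branches(repo_id):
    """Query the Hub for the repo's branches (requires network access)."""
    # Imported lazily so step_branches() stays usable without huggingface_hub.
    from huggingface_hub import HfApi

    refs = HfApi().list_repo_refs(repo_id)
    return step_branches(branch.name for branch in refs.branches)
```

Calling `list_step_branches("denkiwakame/Qwen3.5-2B-LoRA-LAP-UR5e-PyAV")` should return the snapshot branch names in training order.
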
Load any of them by passing the branch name as `revision`:

```bash
# Download a specific revision
huggingface-cli download denkiwakame/Qwen3.5-2B-LoRA-LAP-UR5e-PyAV --revision step-18000 --local-dir ./Qwen3.5-2B-LoRA-LAP-UR5e-PyAV-step-18000
```

```python
# Or load directly via transformers
from transformers import Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "denkiwakame/Qwen3.5-2B-LoRA-LAP-UR5e-PyAV",
    revision="step-18000",
    subfolder="model_final",
)
```

See the [repository branches tab](https://huggingface.co/denkiwakame/Qwen3.5-2B-LoRA-LAP-UR5e-PyAV/refs) for the full list.

## Training Details

- **Base Model**: `Qwen/Qwen3.5-2B`
- **Method**: LoRA
- **Dataset**: UR5e (RoboVerse)
- **Framework**: [VLA-0](https://github.com/omron-sinicx/vla0)

### Training Config

```yaml
DATALOADER:
  ROBOVERSE:
    cfg_opts: IMAGE.crop_img:0.9:IMAGE.img_size:224:IMAGE.cam_list:('3p1','wrist_right1')
    cfg_path: libs/RoboVerse/roboverse/configs/ur5e_cluttered_pick_3obj_120.yaml
  batch_size: 16
  num_workers: 8
EXP:
  AMP: true
  DATASET: roboverse
  EXP_ID: lap_qwen3_5_2b_fft_ur5e_cluttered_pick_3obj_120_lora
  LOSS: {}
  LR_SCHED: none
  MODEL: qwen
  OPTIMIZER: adamw
  SEED: 0
EXP_EXTRA:
  no_test: true
  no_track: true
  no_val: true
  save_at_steps:
  - 2000
  - 4000
  - 6000
  - 8000
  save_ckp: 0
  save_last_ckpt: true
  test_eval_freq: 1
  val_eval_freq: 1
LR_SCHED:
  lr_clip: 1.0e-08
  lr_decay_factor: 0.5
  lr_patience: 4
MODEL:
  QWEN:
    action_mask_aug_per: 0.4
    action_type: original
    add_vision_id: true
    attention_dropout: 0.0
    enable_thinking: true
    grad_checkpoint: false
    history: 1
    horizon: 8
    lap_action_is_absolute: true
    lap_emit_holds: false
    lap_rotation_precision: 1
    lap_sum_decimal: 1f
    lora_config: default
    lora_rank: 8
    num_bins_actions: 1000
    num_cam: 2
    original_action_dim: 7
    qwen_model_id: Qwen/Qwen3.5-2B
    reasoning: true
    rgb_img_size:
    - 224
    - 224
    rgb_input: true
    tiled_rgb_imgs: true
    use_flash_attention_2: true
    use_lora: true
    use_qlora: false
TRAIN:
  clip_grad_norm: 0.0
  l2: 1.0e-10
  lr: 1.0e-05
  num_epochs: 100
  num_iters: 10000
  save_iter_ckp: 2500
WANDB:
  enable: true
  entity: ''
  log_interval: 100
  mode: online
  project: vla0
  resume_id: ''
  run_name: ''
  tags: ''
```

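The `MODEL.QWEN` values in this config (`num_bins_actions: 1000`, `horizon: 8`, `original_action_dim: 7`) hint at how generated text could map back to continuous actions. The sketch below is illustrative only. It assumes the model emits whitespace-separated integer bins in `[0, num_bins_actions]` and that per-dimension bounds `low` and `high` are available. Both are assumptions: the real output format under the `lap_*` options and the actual layout of `dataset_stats.pkl` are not documented in this card, and `decode_action_text` is a hypothetical helper, not a VLA-0 function.

```python
def decode_action_text(text, horizon=8, action_dim=7, num_bins=1000,
                       low=None, high=None):
    """Map whitespace-separated integer bins to a horizon x action_dim list of floats.

    `low` and `high` are hypothetical per-dimension bounds; they default to
    [-1, 1] here and would come from the dataset statistics in practice.
    """
    tokens = text.split()
    expected = horizon * action_dim
    if len(tokens) != expected:
        raise ValueError(f"expected {expected} values, got {len(tokens)}")
    low = list(low) if low is not None else [-1.0] * action_dim
    high = list(high) if high is not None else [1.0] * action_dim

    actions = []
    for t in range(horizon):
        step = []
        for d in range(action_dim):
            # Clamp to the valid bin range before rescaling.
            b = min(max(int(tokens[t * action_dim + d]), 0), num_bins)
            step.append(low[d] + (b / num_bins) * (high[d] - low[d]))
        actions.append(step)
    return actions
```

With the defaults, a generated string of 56 integers yields 8 future steps of 7-dimensional actions. For real inference, prefer the decoding path in the VLA-0 framework, which applies the stored statistics itself.
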
## Files

| File | Description |
|------|-------------|
| `model_final/adapter_config.json` | PEFT adapter configuration (includes `base_model_name_or_path`) |
| `model_final/adapter_model.safetensors` | LoRA adapter weights |
| `model_final/tokenizer.json` | Tokenizer |
| `dataset_stats.pkl` | Action normalization statistics (required for inference) |
| `config.yaml` | Training configuration |

## License

CC-BY-NC-4.0 (following the upstream VLA-0 license). Subject to [Qwen License](https://huggingface.co/Qwen/Qwen3.5-2B) for the base model.