---
library_name: peft
base_model: OpenGVLab/InternVL3-2B
tags:
- internvl
- internvl3
- vision
- image-text-to-text
- lora
- blind-assist
- walk-vlm
datasets:
- blind-assist/walk-train
---

# internvl3-2b-walk-lora-v1

## Model Description

This is a **LoRA adapter** for **InternVL3-2B**, fine-tuned on the **WalkVLM** dataset to help visually impaired people detect navigation hazards.

## How to Use

### Method 1: Using PEFT (Recommended)

```python
import torch
from peft import PeftModel
from transformers import AutoModel, AutoTokenizer

# Load the base model (InternVL ships custom modeling code, hence trust_remote_code)
base_model = AutoModel.from_pretrained(
    "OpenGVLab/InternVL3-2B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "OpenGVLab/InternVL3-2B", trust_remote_code=True
)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "blind-assist/internvl3-2b-walk-lora-v1")

# Optional: fold the adapter into the base weights for faster inference
model = model.merge_and_unload()

# Run inference.
# `pixel_values` is NOT defined in this snippet: supply your own preprocessed
# image tensor, in the same dtype and on the same device as the model
# (the base model card shows one way to build it).
response = model.chat(
    tokenizer=tokenizer,
    pixel_values=pixel_values,
    question="Describe any obstacles in this scene.",
    generation_config=dict(max_new_tokens=256),
)
print(response)
```

### Method 2: Manual LoRA Merge

If loading through PEFT fails because of the model's custom architecture, merge the adapter weights manually:

```python
# See our inference script at:
# https://github.com/Blind-Assist/InternVL/blob/walkvlm/internvl_chat/test_finetuned_model.py
```

## Training Details

- **Base Model:** [OpenGVLab/InternVL3-2B](https://huggingface.co/OpenGVLab/InternVL3-2B)
- **Method:** LoRA (Low-Rank Adaptation)
- **LoRA Rank:** 128
- **Dataset:** [blind-assist/walk-train](https://huggingface.co/datasets/blind-assist/walk-train)
- **Task:** Navigation hazard detection for visually impaired users

## Files

- `adapter_config.json` - PEFT LoRA configuration
- `adapter_model.safetensors` - LoRA weights only (~50 MB)

## License

Same as the base model (OpenGVLab/InternVL3-2B).
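
## Appendix: What a LoRA Merge Computes

For readers wondering what "merging" in Method 1 and Method 2 means, here is a minimal sketch of the standard LoRA update, `W' = W + (alpha / r) * B @ A`, written in plain Python on a toy matrix. It is an illustration only, not the code from the linked inference script: the function names `matmul` and `merge_lora` are hypothetical, the toy values are made up, and the scaling assumes the default LoRA convention (`lora_alpha / r`).

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a).

    weight: out x in, lora_a: rank x in, lora_b: out x rank.
    Hypothetical helper for illustration; assumes the default LoRA scaling.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update, shape out x in
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: 2 outputs, 3 inputs, rank 1 (all values are made up)
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
lora_a = [[1.0, 2.0, 3.0]]
lora_b = [[0.5], [-0.5]]

merged = merge_lora(weight, lora_a, lora_b, alpha=2.0, rank=1)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why a merged model runs at the same speed as the base model.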