---
library_name: peft
base_model: OpenGVLab/InternVL3-2B
tags:
- internvl
- internvl3
- vision
- image-text-to-text
- lora
- blind-assist
- walk-vlm
datasets:
- blind-assist/walk-train
---

# internvl3-2b-walk-lora-v1

## Model Description
This is a **LoRA adapter** for **InternVL3-2B**, fine-tuned on the **WalkVLM** dataset to help visually impaired users detect navigation hazards.

## How to Use

### Method 1: Using PEFT (Recommended)
```python
import torch
from peft import PeftModel
from transformers import AutoModel, AutoTokenizer

# Load Base Model
base_model = AutoModel.from_pretrained(
    "OpenGVLab/InternVL3-2B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("OpenGVLab/InternVL3-2B", trust_remote_code=True)

# Load LoRA Adapter
model = PeftModel.from_pretrained(base_model, "blind-assist/internvl3-2b-walk-lora-v1")

# Merge for faster inference (optional)
model = model.merge_and_unload()

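# Run inference. `pixel_values` is not defined in this snippet: supply your own
# preprocessed image tensor (bfloat16, on the same device as the model).
response = model.chat(
    tokenizer=tokenizer,
    pixel_values=pixel_values,
    question="Describe any obstacles in this scene.",
    generation_config=dict(max_new_tokens=256)
)
print(response)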
```
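
The `pixel_values` argument in Method 1 has to be prepared separately. The following is a minimal sketch, assuming a single 448×448 tile with ImageNet mean/std normalization; the base model card describes a more elaborate dynamic-tiling routine, which this sketch does not reproduce. The helper name `preprocess_single_tile` is hypothetical and is not part of this repository.

```python
import numpy as np
import torch
from PIL import Image

# Assumed normalization constants (standard ImageNet statistics).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def preprocess_single_tile(image, input_size=448):
    """Hypothetical helper: turn a PIL image into a (1, 3, H, W) bfloat16 tensor."""
    # Force three channels and resize to one square tile (assumed tile size: 448).
    image = image.convert("RGB").resize((input_size, input_size), Image.BICUBIC)
    # Scale to [0, 1], then normalize each channel; layout is still H x W x C.
    array = np.asarray(image, dtype=np.float32) / 255.0
    mean = np.array(IMAGENET_MEAN, dtype=np.float32)
    std = np.array(IMAGENET_STD, dtype=np.float32)
    array = (array - mean) / std
    # Reorder to C x H x W and add a leading tile/batch dimension.
    tensor = torch.from_numpy(array).permute(2, 0, 1).unsqueeze(0).contiguous()
    # Match the dtype the base model is loaded with in Method 1.
    return tensor.to(torch.bfloat16)
```

A call such as `pixel_values = preprocess_single_tile(Image.open("street.jpg")).to(model.device)` would then feed `model.chat`; the file name is a placeholder.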

### Method 2: Manual LoRA Merge
If PEFT cannot load the adapter because of the custom InternVL architecture, merge the LoRA weights into the base model manually:
```python
# See our inference script at:
# https://github.com/Blind-Assist/InternVL/blob/walkvlm/internvl_chat/test_finetuned_model.py
```
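
For orientation, a manual merge adds the low-rank update to each adapted weight matrix: `W_merged = W + (lora_alpha / r) * B @ A`. The sketch below shows that arithmetic for a single layer. It assumes standard LoRA scaling (no rsLoRA or DoRA) and PEFT's usual shapes (`lora_A` is `(r, in_features)`, `lora_B` is `(out_features, r)`); the function name `merge_lora_weight` is hypothetical, and this is not necessarily how the linked script implements the merge.

```python
import torch


def merge_lora_weight(weight, lora_A, lora_B, lora_alpha, r):
    """Hypothetical helper: fold one LoRA update into one base weight matrix.

    weight: (out_features, in_features) base weight
    lora_A: (r, in_features) down-projection
    lora_B: (out_features, r) up-projection
    """
    scaling = lora_alpha / r  # standard LoRA scaling (assumption)
    # Compute the update in float32 to limit rounding error with bf16 weights.
    delta = (lora_B.float() @ lora_A.float()) * scaling
    # Cast back so the merged weight keeps the base model's dtype.
    return (weight.float() + delta).to(weight.dtype)
```

In practice this would be applied to every layer named in `target_modules`, using the `r` and `lora_alpha` values from `adapter_config.json`.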

## Training Details
- **Base Model:** [OpenGVLab/InternVL3-2B](https://huggingface.co/OpenGVLab/InternVL3-2B)
- **Method:** LoRA (Low-Rank Adaptation)
- **LoRA Rank:** 128
- **Dataset:** [blind-assist/walk-train](https://huggingface.co/datasets/blind-assist/walk-train)
- **Task:** Navigation hazard detection for visually impaired users

## Files
- `adapter_config.json`: PEFT LoRA configuration
- `adapter_model.safetensors`: LoRA weights only (~50 MB)
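
To check what the adapter was trained with before loading it, the configuration file can be read directly. This is a small sketch using only the standard library; it assumes the file has been downloaded locally and uses the usual PEFT LoRA keys (`peft_type`, `base_model_name_or_path`, `r`, `lora_alpha`, `target_modules`). The function name `summarize_adapter_config` is hypothetical.

```python
import json
from pathlib import Path


def summarize_adapter_config(path):
    """Hypothetical helper: return the main LoRA settings from adapter_config.json."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    # .get() returns None for missing keys instead of raising.
    return {
        "peft_type": config.get("peft_type"),
        "base_model": config.get("base_model_name_or_path"),
        "rank": config.get("r"),
        "lora_alpha": config.get("lora_alpha"),
        "target_modules": config.get("target_modules"),
    }
```

For this adapter, the reported `rank` should match the LoRA rank listed under Training Details.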

## License
Same as the base model, [OpenGVLab/InternVL3-2B](https://huggingface.co/OpenGVLab/InternVL3-2B).