library_name: peft
license: apache-2.0
base_model: Qwen/Qwen2.5-VL-7B-Instruct
datasets:
  - EasonFan/AirCopBench
tags:
  - lora
  - peft
  - multimodal
  - vision-language
  - uav
  - aerial
  - visual-question-answering
  - multi-agent-perception
pipeline_tag: image-text-to-text

aircop-7b — Qwen2.5-VL-7B fine-tuned on AirCopBench

LoRA adapter for Qwen/Qwen2.5-VL-7B-Instruct, supervised fine-tuned on the training split of AirCopBench, a multi-UAV collaborative aerial perception VQA benchmark.

Paper: https://arxiv.org/pdf/2511.11025

Task

Each sample shows one scene captured at the same moment by 2–6 UAV cameras from different viewpoints and poses a four-option multiple-choice question (object grounding, counting, matching, causal or collaboration assessment, etc.). The model answers with a single option letter.
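The prompt layout shown in the Usage section extends naturally to any number of views. The following is a minimal sketch, assuming the `UAV{i}:` label convention from that example applies to every view; the helper names `build_messages` and `parse_letter` are hypothetical and not part of the released code.

```python
import re

def build_messages(num_views, question, options):
    """Build one user message with a labelled image slot per UAV view."""
    if not 2 <= num_views <= 6:
        raise ValueError("AirCopBench questions use 2-6 UAV views")
    if len(options) != 4:
        raise ValueError("expected exactly four answer options")
    content = []
    for i in range(1, num_views + 1):
        content.append({"type": "text", "text": f"UAV{i}:"})
        content.append({"type": "image"})
    option_lines = "\n".join(
        f"{letter}. {text}" for letter, text in zip("ABCD", options)
    )
    content.append({
        "type": "text",
        "text": f"Question: {question}\nOptions:\n{option_lines}\n"
                "Answer with only the letter.",
    })
    return [{"role": "user", "content": content}]

def parse_letter(reply):
    """Return the first standalone A-D letter in the reply, or None on a parse failure."""
    match = re.search(r"\b([A-D])\b", reply.strip())
    return match.group(1) if match else None
```

The regular expression is deliberately strict (uppercase, standalone letter), so a reply that does not follow the single-letter instruction is reported as `None` rather than guessed.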

Results (AirCopBench test, 1025 questions)

Subset	Accuracy
Overall	0.7532 (772/1025)
Real2 (2 real UAVs)	0.5785
Sim3 (3 sim UAVs)	0.8244
Sim5 (5 sim UAVs)	0.7551
Sim6 (6 sim UAVs)	0.7634

Parse failures: 0.
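The overall figure is a plain ratio of correct answers to questions, with unparseable replies tracked separately. The sketch below illustrates that bookkeeping under the assumption that a parse failure counts as an incorrect answer (the card does not say how failures would be scored); the function `score` is illustrative only.

```python
def score(predictions, gold):
    """Return (accuracy, correct, parse_failures) for two parallel lists.

    predictions holds an option letter, or None when no letter could be parsed.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    failures = sum(p is None for p in predictions)
    return correct / len(gold), correct, failures

# Sanity check on the reported overall number: 772 correct out of 1025.
print(f"{772 / 1025:.4f}")  # prints 0.7532
```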

Training

Method: LoRA SFT (rank 16, lora_target: all), 1 epoch, bf16, flash-attn 2
Batch and schedule: effective batch size 16 (per-device 8 × grad-accum 2), lr 1e-4 with cosine schedule, image_max_pixels 262144
Framework: LLaMA-Factory, template qwen2_vl
Data: ~12.7k multi-image samples (Real2 / Sim3 / Sim5 / Sim6)
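Some of these numbers can be cross-checked with simple arithmetic. The sketch below assumes a single training device (the card does not state the device count, but 8 × 2 = 16 only holds for one device) and treats "~12.7k" as exactly 12,700 samples, so the step count is approximate.

```python
import math

per_device_batch = 8
grad_accum = 2
num_devices = 1          # assumption: not stated in the card
num_samples = 12_700     # assumption: "~12.7k" taken as a round number

effective_batch = per_device_batch * grad_accum * num_devices
steps_per_epoch = math.ceil(num_samples / effective_batch)

# image_max_pixels is a per-image pixel budget; 262144 equals a 512 x 512 image.
image_max_pixels = 262_144
side = math.isqrt(image_max_pixels)

print(effective_batch)                        # 16
print(steps_per_epoch)                        # 794
print(side, side * side == image_max_pixels)  # 512 True
```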

Usage

import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from peft import PeftModel

base = "Qwen/Qwen2.5-VL-7B-Instruct"
model = AutoModelForImageTextToText.from_pretrained(base, dtype=torch.bfloat16, device_map="cuda")
model = PeftModel.from_pretrained(model, "EasonFan/aircop-7b")
processor = AutoProcessor.from_pretrained(base)

messages = [{"role": "user", "content": [
    {"type": "text", "text": "UAV1:"}, {"type": "image"},
    {"type": "text", "text": "UAV2:"}, {"type": "image"},
    {"type": "text", "text": "Question: ...\nOptions:\nA. ...\nB. ...\nC. ...\nD. ...\nAnswer with only the letter."},
]}]
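# Load one image per {"type": "image"} slot, in the same order as the UAV labels.
# The file names below are placeholders, not files shipped with the adapter.
from PIL import Image
images = [Image.open(p).convert("RGB") for p in ["uav1.jpg", "uav2.jpg"]]

# Render the chat template to text, then let the processor pair it with the images.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=images, return_tensors="pt").to(model.device)

# Greedy decoding; a few tokens are enough for a single option letter.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=8, do_sample=False)
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0].strip())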