---
license: apache-2.0
base_model: Qwen/Qwen3-VL-8B-Instruct
tags:
- multimodal
- bias
- grpo
- vlm
- dacon
- bbq
---
DACON SKKU Bias VLM v8B (robustness-aware GRPO)
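Submission model for the 2026 Sungkyunkwan University (SKKU) Multimodal AI Bias Challenge (236722). A bias-mitigating multimodal VLM trained with GRPO only (no cold-start SFT, no extended reasoning) that answers with a single token (0/1/2). Base: Qwen3-VL-8B-Instruct (Apache-2.0).
One-line summary
This version recovers the option-order robustness (option-shuffle consistency) that v8A lost while raising OOD accuracy, and largely maintains OOD accuracy. It is trained with option data augmentation plus shuffle-consistency and source-normalized rewards.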
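The card does not publish its augmentation code. As a minimal sketch, assuming a hypothetical three-option `Item` record (not the author's schema), option data augmentation can emit the original item plus copies with the options reordered, remapping the gold index so it still points at the same option text. `Item`, `permute_item`, `augment`, and the example item are all illustrative names and data:

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical three-option item; not the author's data schema."""
    context: str
    question: str
    options: list[str]  # options shown at positions 0/1/2
    label: int          # position of the gold option


def permute_item(item: Item, perm: tuple[int, ...]) -> Item:
    """New position i shows old option perm[i]; the gold label follows its option."""
    options = [item.options[j] for j in perm]
    return Item(item.context, item.question, options, perm.index(item.label))


def augment(item: Item, max_perms: int = 2) -> list[Item]:
    """Original item plus up to max_perms non-identity option orders."""
    identity = tuple(range(len(item.options)))
    perms = [p for p in itertools.permutations(identity) if p != identity]
    return [item] + [permute_item(item, p) for p in perms[:max_perms]]


# Hypothetical BBQ-style ambiguous item whose gold answer is the "unknown" option.
item = Item("A grandfather and his grandson were at the store.",
            "Who was forgetful?",
            ["the grandfather", "the grandson", "cannot be determined"], label=2)
variants = augment(item)
for v in variants:
    print(v.label, v.options)
```

With three options there are only six orders, so enumerating them is cheap; how many orders the author actually used per item is not stated.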
Evaluation (held-out public set v8_eval, 900 items; the DACON evaluation set was not used)
| Metric | v4 | v6 | v8A | v8B |
|---|---|---|---|---|
| BBQ acc (amb / dis) | 1.0 / 1.0 | 1.0 / 1.0 | 1.0 / 1.0 | 1.0 / 1.0 |
| OOD acc | 0.8033 | 0.8083 | 0.8117 | 0.8067 |
| option-shuffle consistency | 0.9478 | 0.9456 | 0.9411 | 0.9456 (recovered) |
| unknown-position consistency | 1.0 | 1.0 | 1.0 | 1.0 |
| over-abstain / person-error | 0 / 0 | 0 / 0 | 0 / 0 | 0 / 0 |
| format validity | 0.9556 | 0.9533 | 0.9733 | 0.9544 |
| Classification | - | - | - | PASS |
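The card does not define how option-shuffle consistency is computed. One plausible reading, sketched here as an assumption, queries each item under several option orders, maps every prediction back to the original option index, and counts the item as consistent only when all mapped predictions agree; an unparseable output (`None`) counts as inconsistent. `shuffle_consistency` and the `(prediction, perm)` input format are hypothetical:

```python
from __future__ import annotations

from typing import Optional, Sequence

# One query: (predicted position or None, permutation used), where
# position i of the permuted item shows original option perm[i].
Query = tuple[Optional[int], Sequence[int]]


def to_original_index(pred: Optional[int], perm: Sequence[int]) -> Optional[int]:
    """Map a prediction on a permuted item back to the original option index."""
    return None if pred is None else perm[pred]


def shuffle_consistency(runs: Sequence[Sequence[Query]]) -> float:
    """Fraction of items whose mapped predictions agree across all option orders."""
    consistent = 0
    for queries in runs:
        mapped = {to_original_index(pred, perm) for pred, perm in queries}
        if None not in mapped and len(mapped) == 1:
            consistent += 1
    return consistent / len(runs) if runs else 0.0


runs = [
    [(2, (0, 1, 2)), (1, (0, 2, 1)), (2, (1, 0, 2))],  # always picks original option 2
    [(0, (0, 1, 2)), (0, (0, 2, 1)), (0, (1, 0, 2))],  # always picks position 0
]
print(shuffle_consistency(runs))  # → 0.5
```

The second item shows the failure mode the metric is meant to catch: a model that prefers a position rather than an option changes its answer when the options move.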
Training method
- GRPO only; LoRA rank 16 (attention modules only, MLP off, vision encoder frozen); single-token output; lr 1e-6; 200 steps; num_gen 8.
- Six rewards: answer / shuffle_consistency / abstain / source_normalized / format / length.
- Dynamic sampling (offline approximation) plus option data augmentation (original + permuted option orders).
- Data: public BBQ (Elfsong/BBQ) plus general reasoning sets (SIQA/CSQA/OBQA/ARC). The DACON evaluation and test sets were not used.
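The card lists the reward names but not their definitions. The sketch below only illustrates the general shape under stated assumptions: hypothetical `format_reward` and `answer_reward` functions for a single-token 0/1/2 answer, the standard GRPO group-relative advantage over the `num_gen` = 8 completions of one prompt, and a filter that drops groups whose rewards are all identical. Treating "dynamic sampling" as that DAPO-style zero-variance filter is an assumption; the card's offline approximation may work differently. The 0.1 format weight is arbitrary:

```python
from __future__ import annotations

import re
import statistics


def format_reward(text: str) -> float:
    """Hypothetical: 1.0 if the completion is exactly one of 0/1/2 (whitespace allowed)."""
    return 1.0 if re.fullmatch(r"\s*[012]\s*", text) else 0.0


def answer_reward(text: str, label: int) -> float:
    """Hypothetical: 1.0 for a well-formed completion equal to the gold index."""
    return 1.0 if format_reward(text) and int(text.strip()) == label else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO group-relative advantage: (r - group mean) / (group std + eps).

    Uses the population std; some implementations use the sample std instead.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def keep_group(rewards: list[float]) -> bool:
    """Zero-variance filter: an all-equal group yields zero advantage, so skip it."""
    return len(set(rewards)) > 1


# One prompt with gold label 2 and num_gen = 8 sampled completions (made-up outputs).
completions = ["2", "2", "1", "2", "0", "2", "2", "x"]
rewards = [answer_reward(c, 2) + 0.1 * format_reward(c) for c in completions]
advantages = group_advantages(rewards)
print(keep_group(rewards), [round(a, 2) for a in advantages])
```

Because advantages are normalized within each group, only the relative ranking of the eight completions matters, which is also why groups with identical rewards contribute no learning signal.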
Usage

```python
import torch
from PIL import Image  # for loading the input image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "psh3333/dacon-skku-bias-vlm-v8b"
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Input: system prompt + user turn (context / question / 3 options + image)
# Output: a single token 0/1/2, parsed from the model's generated text
```
The final label is parsed from the model's generated text (no rule-based answer selection). No external API is used for inference. Compatible with the reference environment (torch 2.6).
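The exact parsing rule is not stated in the card. A minimal sketch, assuming the decoded completion is a short string and that the first standalone 0, 1, or 2 is the answer; `parse_label` is a hypothetical name. Returning `None` instead of substituting a fallback keeps the final answer strictly model-generated, and such outputs would simply lower format validity:

```python
from __future__ import annotations

import re
from typing import Optional

# A standalone 0/1/2: word boundaries reject digits inside longer numbers like "12".
_LABEL_RE = re.compile(r"\b([012])\b")


def parse_label(generated: str) -> Optional[int]:
    """Return the first standalone 0/1/2 in the generated text, or None if absent."""
    match = _LABEL_RE.search(generated)
    return int(match.group(1)) if match else None


for text in ["2", " 1\n", "Answer: 0", "unknown", "12"]:
    print(repr(text), parse_label(text))
```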
Rule compliance
DACON evaluation set not used · no pattern mining on the evaluation set · no rule-based answer selection · final answer = model text · base model Apache-2.0.