psh3333 committed on
Commit ebd3f79 · verified · 1 Parent(s): 7ef52c0

Delete README.md with huggingface_hub

Files changed (1)
  1. README.md +0 -33
README.md DELETED
@@ -1,33 +0,0 @@
- ---
- license: apache-2.0
- base_model: Qwen/Qwen3-VL-8B-Instruct
- tags: [multimodal, bias, grpo, vlm, dacon, bbq, negative-result]
- ---
-
- # DACON SKKU Bias VLM - v7 (cold-start SFT + reasoning GRPO) - NEGATIVE RESULT
-
- Experimental model for the 2026 Sungkyunkwan University Multimodal AI Bias Challenge (236722). Base model: Qwen3-VL-8B-Instruct (Apache-2.0).
- **v7 is a deliberately preserved failure (negative result).** It attempted cold-start SFT + reasoning GRPO, but
- overfitting and catastrophic forgetting caused OOD generalization to collapse.
-
- ## Configuration
- - Cold-start SFT (BBQ 100%) + reasoning-label template + reasoning GRPO (difficulty mix), LoRA rank 64.
-
- ## Evaluation (held-out public v8_eval 900 set; the DACON evaluation set was not used)
- | Metric | v4 (baseline) | **v7** |
- |---|---|---|
- | BBQ ambiguous acc | 1.0 | **0.7067** ⬇ |
- | BBQ disambiguated acc | 1.0 | 1.0 |
- | OOD acc | 0.8033 | **0.6733** ⬇ |
- | option-shuffle consistency | 0.9478 | **0.7944** ⬇ |
- | ambiguous person-selection error | 0.0 | **0.2933** ⬇ |
- | bias s_AMB | 0.0 | **-0.0667** (reverse bias) |
-
- (On a separate OOD-only 800 set, OOD acc falls as low as 0.416.)
-
- ## Lessons
- - Ablation: with cold-start SFT alone, OOD acc is already 0.42 before GRPO → **the collapse is caused by cold-start SFT** (GRPO is not to blame).
- - Consistent with the prediction of Chu et al. 2025, "SFT Memorizes, RL Generalizes": "the more a model specializes and memorizes, the more generalization it loses."
- - The follow-ups v8A/v8B **remove the cold start** (GRPO-only) and retain generalization → the right prescription.
-
- The repo is public for reproduction and as a negative-result contrast. The submission candidate is v8B (`psh3333/dacon-skku-bias-vlm-v8b`).
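
The three ambiguous-context rows of the deleted README's table (ambiguous acc, person-selection error, s_AMB) are closely related, and a short sketch can make the relationship concrete. The README does not state how s_AMB was computed; the code below assumes a BBQ-style definition (the share of stereotype-aligned picks among non-"unknown" answers, rescaled to [-1, 1] and multiplied by the error rate). The `AmbiguousItem` schema, the function name, and the toy predictions are all hypothetical and are not taken from the author's evaluation code or data.

```python
from dataclasses import dataclass


@dataclass
class AmbiguousItem:
    """One ambiguous-context question (hypothetical schema, not the author's)."""
    prediction: str          # option label the model chose, e.g. "A"
    unknown_option: str      # the "cannot be determined" option (the correct answer)
    stereotyped_option: str  # the person the social stereotype points at


def ambiguous_metrics(items):
    """Return accuracy, person-selection error, and an assumed BBQ-style s_AMB."""
    n = len(items)
    if n == 0:
        raise ValueError("need at least one ambiguous item")

    correct = sum(it.prediction == it.unknown_option for it in items)
    biased = sum(it.prediction == it.stereotyped_option for it in items)
    person = n - correct  # answers that picked a person instead of "unknown"

    accuracy = correct / n
    # +1 if every person pick follows the stereotype, -1 if every pick goes against it.
    s_dis = (2 * biased / person - 1) if person else 0.0
    return {
        "ambiguous_acc": accuracy,
        "person_selection_error": person / n,
        "s_amb": (1 - accuracy) * s_dis,  # assumed definition, see lead-in
    }


# Toy data only: 7 correct "unknown" answers, 1 stereotype-aligned pick,
# 2 counter-stereotype picks.
toy_items = (
    [AmbiguousItem("C", "C", "A")] * 7
    + [AmbiguousItem("A", "C", "A")] * 1
    + [AmbiguousItem("B", "C", "A")] * 2
)
print(ambiguous_metrics(toy_items))
```

On this toy input the accuracy is 0.7, the person-selection error is 0.3, and s_amb is about -0.1. Under this definition accuracy and person-selection error always sum to 1, as the v7 column does (0.7067 + 0.2933), and a negative s_amb means the wrong answers lean against the stereotype, which is what the README labels reverse bias.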