SNU Thunder-LLM Korean Benchmark Suite - a thunder-research-group Collection

Updated Apr 20

thunder-research-group/SNU_Ko-LAMBADA

Viewer • Updated 8 days ago • 2.26k • 149
thunder-research-group/SNU_Ko-WinoGrande

Viewer • Updated 8 days ago • 1.27k • 135
thunder-research-group/SNU_Ko-ARC

Viewer • Updated 8 days ago • 3.54k • 187
thunder-research-group/SNU_Ko-GSM8K

Viewer • Updated 8 days ago • 1.32k • 128 • 1
thunder-research-group/SNU_Ko-IFEval

Viewer • Updated 8 days ago • 841 • 123
thunder-research-group/SNU_Ko-EQ-Bench

Viewer • Updated 8 days ago • 171 • 88
skt/kobest_v1

Viewer • Updated Mar 28, 2024 • 23.4k • 3.97k • 54

Note: We use the test set of the hellaswag subset for evaluation.
HAERAE-HUB/KMMLU

Viewer • Updated Mar 5, 2024 • 244k • 15.4k • 98
HYU-NLP/KR-HumanEval

Viewer • Updated Jun 3, 2025 • 328 • 23

Note: We use v1 for evaluation.
LGCNS/KorQuAD_2.0

Viewer • Updated Aug 7, 2025 • 93.7k • 487 • 2
thunder-research-group/SNU_Ko-MuSR

Viewer • Updated Nov 24, 2025 • 750 • 133
thunder-research-group/SNU_Thunder-KoNUBench

Viewer • Updated Apr 22 • 4.78k • 20 • 1
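
The note on skt/kobest_v1 says the hellaswag test set is used for evaluation. A minimal sketch of how that selection might be expressed with the Hugging Face `datasets` library follows. It assumes that "hellaswag" is the dataset's configuration name and "test" its split name; the constants and the helper functions `kobest_load_args` and `load_kobest_hellaswag` are hypothetical names introduced here for illustration, not part of the collection.

```python
# Repo id taken from the listing above; config and split names are
# assumptions inferred from the note "hellaswag > test set".
KOBEST_REPO = "skt/kobest_v1"
KOBEST_CONFIG = "hellaswag"   # assumed configuration (subset) name
KOBEST_SPLIT = "test"         # assumed split name


def kobest_load_args():
    """Return the positional and keyword arguments for datasets.load_dataset."""
    return (KOBEST_REPO, KOBEST_CONFIG), {"split": KOBEST_SPLIT}


def load_kobest_hellaswag():
    """Download and return the evaluation split (hypothetical helper).

    The import is done lazily so that this module can be imported even
    when the `datasets` package is not installed.
    """
    from datasets import load_dataset

    args, kwargs = kobest_load_args()
    return load_dataset(*args, **kwargs)
```

Calling `load_kobest_hellaswag()` requires the `datasets` package and network access to the Hugging Face Hub. Keeping the arguments in one place makes it easy to check which subset and split an evaluation run actually used.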