ZHANGYUXUAN-zR nielsr HF Staff committed on
Commit e32aaf0 · 1 Parent(s): 5378302

Add community evaluation results for DEEP-SWE, GPQA, HLE, SWE-BENCH_PRO (#12)

- Add community evaluation results for DEEP-SWE, GPQA, HLE, SWE-BENCH_PRO (dc7bbb10eb1b42e53e457acab4484cacd0ac56b5)
- Update .eval_results/hle.yaml (fa33c2590d0dcbac5756903d2163910eb822a8c9)
- Update .eval_results/hle.yaml (7f96c1d4edb724d93f7c34d172add9a32be27c63)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

.eval_results/deep-swe.yaml ADDED
@@ -0,0 +1,7 @@
+ - dataset:
+     id: datacurve/deep-swe
+     task_id: deep_swe
+   value: 46.2
+   source:
+     url: https://huggingface.co/zai-org/GLM-5.2
+     name: Model Card
.eval_results/gpqa.yaml ADDED
@@ -0,0 +1,7 @@
+ - dataset:
+     id: Idavidrein/gpqa
+     task_id: diamond
+   value: 91.2
+   source:
+     url: https://huggingface.co/zai-org/GLM-5.2
+     name: Model Card
.eval_results/hle.yaml ADDED
@@ -0,0 +1,16 @@
+ - dataset:
+     id: cais/hle
+     task_id: hle
+   value: 40.5
+   source:
+     url: https://huggingface.co/zai-org/GLM-5.2
+     name: Model Card
+
+ - dataset:
+     id: cais/hle
+     task_id: hle
+   value: 54.7
+   source:
+     url: https://huggingface.co/zai-org/GLM-5.2
+     name: Model Card
+   notes: "With tools"
.eval_results/swe-bench_pro.yaml ADDED
@@ -0,0 +1,7 @@
+ - dataset:
+     id: ScaleAI/SWE-bench_Pro
+     task_id: SWE_Bench_Pro
+   value: 62.1
+   source:
+     url: https://huggingface.co/zai-org/GLM-5.2
+     name: Model Card
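
A reviewer might want to sanity-check that every added entry has the same shape. The following is a minimal sketch, not an official validator: it assumes each entry nests `id` and `task_id` under `dataset`, and `url` and `name` under `source`, with `value` and an optional `notes` at the entry level (the nesting is inferred from the diff, not confirmed by it). `check_entry` and `HLE_ENTRIES` are hypothetical names, and the entries are written as Python literals so no YAML parser is needed.

```python
from typing import Any

# The two entries from .eval_results/hle.yaml in this commit, as Python literals.
# The nesting shown here is an assumption about the intended YAML structure.
HLE_ENTRIES: list[dict[str, Any]] = [
    {
        "dataset": {"id": "cais/hle", "task_id": "hle"},
        "value": 40.5,
        "source": {"url": "https://huggingface.co/zai-org/GLM-5.2", "name": "Model Card"},
    },
    {
        "dataset": {"id": "cais/hle", "task_id": "hle"},
        "value": 54.7,
        "source": {"url": "https://huggingface.co/zai-org/GLM-5.2", "name": "Model Card"},
        "notes": "With tools",
    },
]


def check_entry(entry: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one entry (empty if none). Hypothetical helper."""
    problems: list[str] = []

    dataset = entry.get("dataset")
    if not isinstance(dataset, dict):
        problems.append("missing dataset mapping")
    else:
        for key in ("id", "task_id"):
            if not isinstance(dataset.get(key), str):
                problems.append(f"dataset.{key} must be a string")

    # bool is a subclass of int in Python, so reject it explicitly.
    value = entry.get("value")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        problems.append("value must be a number")

    source = entry.get("source")
    if not isinstance(source, dict):
        problems.append("missing source mapping")
    else:
        for key in ("url", "name"):
            if not isinstance(source.get(key), str):
                problems.append(f"source.{key} must be a string")

    if "notes" in entry and not isinstance(entry["notes"], str):
        problems.append("notes must be a string")

    return problems


if __name__ == "__main__":
    for index, entry in enumerate(HLE_ENTRIES):
        print(index, check_entry(entry) or "ok")
```

The same check could be run over the entries in the other three files; only `hle.yaml` carries a `notes` field in this commit.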