[Figure: six models compared on twelve benchmarks, one panel per benchmark, axis range 0 to 100.]
Legend: Agents-A1, Qwen3.6-35B-A3B, Step-3.5-Flash, Kimi-K2.6, DeepSeek-V4-pro(Max), gpt-5.5(xhigh)

Value labels by panel:
HLE: 36.2, 23.1, 54.0, 48.2, 52.2, 47.6
HiPhO: 37.7, 38.3, 41.1, 38.7, 43.3, 46.4
FrontierScience-Olympiad: 60.3, 61.0, 73.0, 76.0, 78.0, 79.0
FrontierScience-Research: 2.9, 6.7, 17.9, 13.3, 26.7, 40.0
BrowseComp: 67.9, 69.0, 83.2, 83.4, 84.4, 75.5
XBench: 71.0, 56.3, 90.0, 90.0, 84.0, 86.0
SEAL-0: 38.7, 36.9, 50.5, 55.0, 42.3, 56.4
GAIA: 78.6, 84.5, 80.6, 98.1, 87.4, 96.0
IFBench: 64.4, 64.6, 71.8, 73.5, 75.9, 80.6
IFEval: 91.3, 93.5, 94.5, 93.3, 93.3, 94.8
SciCode: 35.8, 40.4, 53.5, 50.0, 56.1, 44.3
MolBench-Bind: 48.7, 46.0, 21.6, 37.8, 62.2, 56.8