[Figure: bar charts, one panel per benchmark, score axis from 0 to 100.]
Legend: Agents-A1, Qwen3.6-35B-A3B, Step-3.5-Flash, Kimi-K2.6, DeepSeek-V4-pro(Max), gpt-5.5(xhigh)
Bar values per panel, in the order they appear in the figure:
HLE: 36.2, 23.1, 54.0, 48.2, 52.2, 47.6
HiPhO: 37.7, 38.3, 41.1, 38.7, 43.3, 46.4
FrontierScience-Olympiad: 60.3, 61.0, 73.0, 76.0, 78.0, 79.0
FrontierScience-Research: 2.9, 6.7, 17.9, 13.3, 26.7, 40.0
BrowseComp: 67.9, 69.0, 83.2, 83.4, 84.4, 75.5
XBench: 71.0, 56.3, 90.0, 90.0, 84.0, 86.0
SEAL-0: 38.7, 36.9, 50.5, 55.0, 42.3, 56.4
GAIA: 78.6, 84.5, 80.6, 98.1, 87.4, 96.0
IFBench: 64.4, 64.6, 71.8, 73.5, 75.9, 80.6
IFEval: 91.3, 93.5, 94.5, 93.3, 93.3, 94.8
SciCode: 35.8, 40.4, 53.5, 50.0, 56.1, 44.3
MolBench-Bind: 48.7, 46.0, 21.6, 37.8, 62.2, 56.8