Open Agent Leaderboard
An open benchmark for comparing full AI agent systems across diverse real-world tasks. Reports both quality and cost.
Unlike model-only benchmarks, we evaluate the complete agent — the model, the tools, the planning strategy, the error recovery — as a single system. The same model can produce very different results depending on the agent wrapped around it.
- Website: exgentic.ai
- Results: open-agent-leaderboard/results
- Leaderboard: open-agent-leaderboard/leaderboard
- Blog: open-agent-leaderboard/blog
- Framework: Exgentic
- Paper: arXiv:2602.22953
Submit results
Run your evaluations with Exgentic, then open a pull request on the results dataset (open-agent-leaderboard/results).