Open Agent Leaderboard

An open benchmark for comparing full AI agent systems across diverse real-world tasks. It reports both quality and cost.

Unlike model-only benchmarks, we evaluate the complete agent (the model, the tools, the planning strategy, and the error recovery) as a single system. The same model can produce very different results depending on the agent wrapped around it.
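
To make the "agent as a single system" idea concrete, here is a minimal Python sketch of how results keyed by agent and model might be compared. Everything in it is an assumption for illustration: the `Result` fields, the agent and model names, and the numbers are placeholders, not the leaderboard's actual schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    agent: str       # hypothetical agent system name
    model: str       # hypothetical underlying model name
    quality: float   # placeholder quality score in [0, 1]
    cost_usd: float  # placeholder average cost per task


# Placeholder values for illustration only; not real leaderboard data.
RESULTS = [
    Result("agent_a", "model_x", 0.60, 0.10),
    Result("agent_b", "model_x", 0.75, 0.40),
    Result("agent_a", "model_y", 0.50, 0.05),
]


def by_model(results):
    """Group results by the underlying model."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r.model].append(r)
    return dict(grouped)


def quality_spread(results):
    """For each model, the gap between its best and worst agent."""
    return {
        model: max(r.quality for r in rs) - min(r.quality for r in rs)
        for model, rs in by_model(results).items()
    }


if __name__ == "__main__":
    for model, spread in quality_spread(RESULTS).items():
        print(f"{model}: quality spread across agents = {spread:.2f}")
```

A nonzero spread for a model means the agent around it, not the model alone, changed the outcome. Keeping cost next to quality in each record lets the two be read together.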

Submit results

Run evaluations using Exgentic and open a PR on the results dataset.