Routing, voting, and cascading all hand back one model's answer, so your ceiling is set by how often every model is wrong at once. Call that β. It is not the same thing as how often models agree, which is what pairwise correlation ρ measures.
Picture a panel of experts where you can only return one expert's answer. Choosing the best one helps, right up until a question lands on a blind spot they all share. Then no rule wins, because the right answer was never in the room.
That is the ceiling, and it is exact. Give a query to a pool of m models. If every one is wrong, no selection policy (router, weighted vote, cascade, debate) can be right, since each returns one member's answer. Accuracy is capped at 1−β, where β = P(all m wrong).
The field reports pairwise correlation ρ instead, and ρ is provably blind to β. You can hold the entire pairwise law fixed and still move β, a Fréchet-class fact we make exact in the paper. A single-factor copula calibrated on ρ underprices the co-failure tail, a bias that grows with pool size, driven by a common-mode atom that no pairwise number represents.
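To see how the pairwise law can stay fixed while β moves, here is a toy sketch. It is not the paper's construction, only the classic XOR example: three hypothetical models, each wrong half the time, with every pair independent. The names `law`, `pairwise`, and `beta` are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations, product


def law(third):
    """Joint law of (w1, w2, w3), where w = 1 means 'wrong'.

    w1 and w2 are fair independent coins; w3 = third(w1, w2, coin),
    with `coin` a third fair coin that `third` may use or ignore.
    """
    probs = {}
    for w1, w2, coin in product((0, 1), repeat=3):
        outcome = (w1, w2, third(w1, w2, coin))
        probs[outcome] = probs.get(outcome, Fraction(0)) + Fraction(1, 8)
    return probs


def pairwise(probs):
    """All three 2x2 pairwise tables implied by a joint law."""
    tables = {}
    for i, j in combinations(range(3), 2):
        table = {}
        for outcome, p in probs.items():
            key = (outcome[i], outcome[j])
            table[key] = table.get(key, Fraction(0)) + p
        tables[(i, j)] = table
    return tables


def beta(probs):
    """P(all three models wrong)."""
    return probs.get((1, 1, 1), Fraction(0))


pools = {
    "independent": law(lambda w1, w2, coin: coin),
    "xor": law(lambda w1, w2, coin: w1 ^ w2),
    "xnor": law(lambda w1, w2, coin: 1 - (w1 ^ w2)),
}

reference = pairwise(pools["independent"])
for name, probs in pools.items():
    same_pairs = pairwise(probs) == reference
    print(f"{name:12s} same pairwise law: {same_pairs}  beta = {beta(probs)}")
```

All three pools have identical pairwise tables (every cell is 1/4, so ρ = 0), yet β is 1/8, 0, and 1/4 respectively. No statistic computed from pairs can tell them apart.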
Grade the models once on a held-out set and count the questions all of them missed. That count alone caps what any router could add. No training, no cost.
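The counting step can be sketched in a few lines. This assumes the grades are already available as a boolean matrix with one row per question and one column per model; the function name `co_failure_ceiling` and the sample grades are hypothetical, not taken from the released code.

```python
from typing import Sequence, Tuple


def co_failure_ceiling(correct: Sequence[Sequence[bool]]) -> Tuple[float, float]:
    """Estimate beta and the selection ceiling from graded answers.

    correct[q][j] is True if model j answered question q correctly.
    Returns (beta_hat, ceiling), where beta_hat is the fraction of
    questions every model missed and ceiling = 1 - beta_hat.
    """
    if not correct:
        raise ValueError("need at least one graded question")
    all_wrong = sum(1 for row in correct if not any(row))
    beta_hat = all_wrong / len(correct)
    return beta_hat, 1.0 - beta_hat


# Hypothetical grades: 5 questions x 3 models.
grades = [
    [True, False, True],
    [False, False, False],   # every model wrong
    [False, True, False],
    [True, True, True],
    [False, False, False],   # every model wrong
]

beta_hat, ceiling = co_failure_ceiling(grades)
best_single = max(sum(col) for col in zip(*grades)) / len(grades)

print(f"beta_hat    = {beta_hat:.2f}")
print(f"ceiling     = {ceiling:.2f}")
print(f"best single = {best_single:.2f}")
print(f"headroom    = {ceiling - best_single:.2f}")
```

The gap between the ceiling and the best single model is the most a perfect selector could add on this set. It is an in-sample estimate, so on a small held-out set it will be noisy.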
The usual estimate reads joint failure off pairwise agreement. It runs low, and runs lower as the pool grows, because the models share blind spots that no pair reveals.
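A toy version of that gap, under loudly stated assumptions: the "true" pool here is a made-up mixture in which, with probability `A`, a question hits a shared blind spot and every model fails, and otherwise each model fails independently with probability `Q`. A one-factor Gaussian copula is then calibrated to reproduce the same marginal and pairwise failure rates, which for fixed marginals is equivalent to matching the pairwise correlation of failures. The parameters and function names are illustrative and are not the paper's calibration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal, stdlib only


def true_joint(a, q, k):
    """P(k given models all fail) under the toy common-mode mixture."""
    return a + (1 - a) * q ** k


def copula_joint(r, p, k, steps=2000, span=8.0):
    """P(k models all fail) under a one-factor Gaussian copula.

    p is the marginal failure rate, r the latent correlation. The
    common factor z is integrated out with the trapezoid rule.
    """
    t = STD.inv_cdf(p)
    h = 2 * span / steps
    total = 0.0
    for i in range(steps + 1):
        z = -span + i * h
        cond = STD.cdf((t - sqrt(r) * z) / sqrt(1 - r))  # P(fail | z)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * cond ** k * exp(-0.5 * z * z) / sqrt(2 * pi)
    return total * h


def calibrate(p, pair_target, iters=50):
    """Bisect on r so the copula matches the pairwise co-failure rate."""
    lo, hi = 1e-9, 1 - 1e-9
    for _ in range(iters):
        mid = (lo + hi) / 2
        if copula_joint(mid, p, 2) < pair_target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


A, Q = 0.10, 0.30                 # hypothetical atom mass and residual error
P = true_joint(A, Q, 1)           # marginal failure rate
PAIR = true_joint(A, Q, 2)        # pairwise co-failure rate
R = calibrate(P, PAIR)

print(f"calibrated latent correlation r = {R:.3f}")
for m in (2, 3, 5, 10, 20):
    truth = true_joint(A, Q, m)
    model = copula_joint(R, P, m)
    print(f"m={m:2d}  true beta={truth:.4f}  copula beta={model:.4f}")
```

By construction the two agree at m = 2. For larger pools the true β flattens out near the atom mass `A`, while the copula's estimate keeps shrinking, because nothing in a pairwise-calibrated smooth factor model stands in for a question that defeats everyone at once.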
On open-ended math and code, every model trips on some of the same questions, so the ceiling bites. On multiple-choice, someone always lands the answer, so β is near zero and combining only breaks ties.
Take hard science questions. As multiple-choice, models can guess or eliminate, so someone is always right. Remove the options, make them answer cold, and 10 of 79 now stump every model at once.
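As a worked instance of the ceiling, assuming those 79 open-ended questions are the whole graded set for this comparison:

```latex
\hat{\beta} = \frac{10}{79} \approx 0.127,
\qquad
1 - \hat{\beta} = \frac{69}{79} \approx 0.873
```

Under that assumption, no router, vote, or cascade over this pool could score above roughly 87% on the open-ended version, however it picks.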
Every number here recomputes live over one 2026 OpenRouter pool, from $30/Mtok flagships down to $0.03/Mtok open weights. The roster, the matrices, the grading, and the code are all released, so you can rerun everything.