Model Scorecards

A permanent record of how each model performs on the panel — how often it answers, how fast, how closely it tracks the verified consensus, and where it’s strongest. Every number is earned on live verification traffic.

Last updated: Jul 29, 2026

Claude (Active)

Responded: 100%
p95: 23.4s
Agreement: 82%


GPT (Active)

Responded: 100%
p95: 16.2s
Agreement: 81%


Grok (Active)

Responded: 100%
p95: 20.5s
Agreement: 82%


Perplexity (Active)

Responded: 100%
p95: 22.4s
Agreement: 74%


Gemini (Active)

Responded: 100%
p95: 9.0s
Agreement: 85%


DeepSeek (Active)

Responded: 100%
p95: 25.5s
Agreement: 78%


Kimi (Benched)

Gathering data — the first numbers appear after the next daily refresh.


Scorecards are rebuilt daily from live verification traffic. The same per-task scores shown here are what the router weighs when the number of capable models exceeds the number a panel needs, so the strongest model for a task is the one most likely to answer it.
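
The weighting described above can be illustrated with a minimal sketch. It assumes the router samples panel members without replacement, with probability proportional to each model's per-task score. That sampling rule, the function name `pick_panel`, and the scores in the usage example are all hypothetical; the page does not say how the router actually combines scores.

```python
import random
from typing import Dict, List, Optional


def pick_panel(
    scores: Dict[str, float],
    panel_size: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick up to `panel_size` models, favouring higher per-task scores.

    Hypothetical sketch: score-proportional sampling without replacement
    is an assumption, not the documented router behaviour.
    """
    rng = rng or random.Random()
    if panel_size <= 0:
        return []

    remaining = dict(scores)

    # No surplus of models: every candidate joins the panel, best first.
    if len(remaining) <= panel_size:
        return sorted(remaining, key=remaining.get, reverse=True)

    panel: List[str] = []
    while len(panel) < panel_size:
        names = list(remaining)
        # Negative scores are clamped to zero so they never win a draw.
        weights = [max(remaining[name], 0.0) for name in names]
        if sum(weights) == 0:
            # All remaining scores are zero: fall back to a uniform pick.
            choice = rng.choice(names)
        else:
            choice = rng.choices(names, weights=weights, k=1)[0]
        panel.append(choice)
        del remaining[choice]
    return panel


if __name__ == "__main__":
    # Illustrative scores only; these are not figures from the scorecards.
    example_scores = {"model_a": 0.9, "model_b": 0.7, "model_c": 0.5, "model_d": 0.1}
    print(pick_panel(example_scores, panel_size=2, rng=random.Random(0)))
```

Under this assumption a higher-scoring model is picked more often but not always, which matches the "most likely to answer" wording. A strict top-k cut would instead always pick the same models.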