Xyntherium
Grok
Active · Last updated: Jun 10, 2026
Routed for
✓ General reasoning · ✓ Documents & images · ✓ Live URL fetch · ✓ Deep research
Grok is routed to questions that play to these strengths. Where a task needs a capability it doesn’t have, the question goes to the models that do, and Grok sits that one out.
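As a rough illustration of the capability gating described above, the sketch below treats each model's flags as a set and keeps only the models whose flags cover what a task needs. The `Model` class, the `eligible_models` helper, the flag spellings, and the second model with its `code_execution` flag are all hypothetical; the scorecard does not say how the router stores or checks flags.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Model:
    name: str
    capabilities: frozenset[str] = field(default_factory=frozenset)


def eligible_models(models, required):
    """Return the models whose capability flags cover everything the task needs."""
    needed = frozenset(required)
    return [m for m in models if needed <= m.capabilities]


# Hypothetical flag names and a hypothetical second model, for illustration only.
grok = Model(
    "Grok",
    frozenset({"general_reasoning", "documents_images", "live_url_fetch", "deep_research"}),
)
other = Model("other-model", frozenset({"general_reasoning", "code_execution"}))

# A task that needs live URL fetch can go to Grok.
print([m.name for m in eligible_models([grok, other], {"live_url_fetch"})])

# A task that needs a flag Grok lacks goes elsewhere, and Grok sits it out.
print([m.name for m in eligible_models([grok, other], {"code_execution"})])
```

Under these assumptions, eligibility is a hard yes or no decided by the hand-set flags, separate from any performance score.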
Across all tasks
| Metric | 7d | 30d | All-time |
|---|---|---|---|
| Response rate | 100% | 100% | 100% |
| p50 latency | 8.4s | 8.4s | 8.4s |
| p95 latency | 8.7s | 8.7s | 8.7s |
| Avg cost / query | — | — | — |
| Agreement w/ verdict | 64% | 64% | 64% |
| Consensus flip rate | 0% | 0% | 0% |
Routed on 2 of the last 30 days’ queries it was eligible for, answering 2.
By task type · 30-day
Score = router weight

| Task | Responded | p95 | Agreement | Flip | Score |
|---|---|---|---|---|---|
| General | 100% | 8.7s | 64% | 0% | — |
The score blends agreement with the verified verdict, response rate, and speed over the last 30 days. When a task has more capable models than a panel needs, the router prefers the higher scores — a soft preference, never a hard exclusion.
Rebuilt daily from live verification traffic. Capability flags are set by hand; the performance numbers are earned.
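The score blend and the soft preference described above could look something like the following minimal sketch. The scorecard names the three inputs (agreement with the verified verdict, response rate, speed) but publishes neither the weights nor how latency becomes a speed term, and the Score column is currently blank. The weights, the 30-second latency budget, the function names, and the two comparison models here are therefore assumptions chosen only to make the example run.

```python
def router_score(agreement, response_rate, p95_seconds,
                 weights=(0.5, 0.3, 0.2), latency_budget=30.0):
    """Blend agreement, response rate, and speed into one score in [0, 1].

    The weights and the latency budget are assumed values, not published ones.
    """
    w_agree, w_resp, w_speed = weights
    # Map p95 latency to a 0..1 speed term: 1.0 is instant, 0.0 is at or past the budget.
    speed = max(0.0, 1.0 - p95_seconds / latency_budget)
    return w_agree * agreement + w_resp * response_rate + w_speed * speed


def pick_panel(candidates, panel_size):
    """Soft preference: rank capable models by score and take the top panel_size.

    No model is excluded outright; a low scorer still serves when seats remain.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [name for name, _ in ranked[:panel_size]]


# 30-day figures from the table above: 64% agreement, 100% response rate, 8.7s p95.
grok_score = router_score(agreement=0.64, response_rate=1.0, p95_seconds=8.7)

# Hypothetical competing models with made-up scores.
candidates = [("Grok", grok_score), ("model-b", 0.90), ("model-c", 0.40)]

print(pick_panel(candidates, panel_size=2))  # ['model-b', 'Grok']
print(pick_panel(candidates, panel_size=3))  # ['model-b', 'Grok', 'model-c']
```

With a two-seat panel the lowest scorer is passed over, but with three seats it is still selected, which is the sense in which the preference is soft rather than a hard exclusion.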
