Reader Questions
What this page is trying to separate
Leaderboard
Score, solve time, and cost on the selected test
This section ranks measured stock routes, projected unlocked Mythos rows, Opus, Codex, and open-weight systems on one benchmark at a time. CyBT-CTF is the primary blinded read; BOTSv3 is a public-benchmark cross-check.
Solve Rate
Solve rates on the selected test
Solid rows are measured; dotted Mythos rows estimate unlocked Fable behavior from same-task Fable-versus-Opus evidence. Bars are sorted so better results sit farther right.
Tradeoff Map
Interactive score, time, and cost scatter
Use the toggles to compare accuracy, solve time, median solve time, and cost. Axes for lower-is-better metrics (time and cost) are flipped, so better points always sit farther right or higher up.
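One way to implement this orientation rule is to map every metric to a plotting coordinate where larger always means better. The sketch below is illustrative only: the metric names (`accuracy`, `solve_time_s`, `median_time_s`, `cost_usd`) and the helpers `oriented` and `to_point` are hypothetical, not the page's actual plotting code, and lower-is-better values are simply negated.

```python
from dataclasses import dataclass

# Hypothetical metric registry: True marks lower-is-better metrics.
LOWER_IS_BETTER = {
    "accuracy": False,
    "solve_time_s": True,
    "median_time_s": True,
    "cost_usd": True,
}


def oriented(metric: str, value: float) -> float:
    """Return a plotting coordinate where a larger value always means better."""
    return -value if LOWER_IS_BETTER[metric] else value


@dataclass
class Point:
    system: str
    x: float
    y: float


def to_point(system: str, row: dict, x_metric: str, y_metric: str) -> Point:
    """Place one system on the scatter using the oriented x and y metrics."""
    return Point(system, oriented(x_metric, row[x_metric]), oriented(y_metric, row[y_metric]))
```

Negation keeps the rule easy to read; a real chart would more likely reverse the axis so tick labels stay positive while points still move right or up as they improve.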
Pass-through Risk
How often Fable answered versus passed to Opus
Pass-throughs and no-model outcomes are measured only for Fable here, so this chart does not plot comparison systems. It shows how often Fable answered on its own, passed the attempt to Opus, or produced no answer across the available summary runs.
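The chart reduces to three shares per attempt. A minimal sketch, assuming each attempt is labelled with one of three hypothetical outcome strings (`fable_answered`, `passed_to_opus`, `no_answer`); the real run records may name these differently.

```python
from collections import Counter

# Assumed outcome labels; the underlying run records may use other names.
OUTCOMES = ("fable_answered", "passed_to_opus", "no_answer")


def answer_path_shares(outcomes: list[str]) -> dict[str, float]:
    """Fraction of attempts that ended in each answer-path bucket."""
    counts = Counter(outcomes)
    unexpected = set(counts) - set(OUTCOMES)
    if unexpected:
        raise ValueError(f"unexpected outcome labels: {sorted(unexpected)}")
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in OUTCOMES}
    return {label: counts[label] / total for label in OUTCOMES}


def pooled_answer_path_shares(runs: list[list[str]]) -> dict[str, float]:
    """Pool attempts across summary runs so each attempt carries equal weight."""
    return answer_path_shares([outcome for run in runs for outcome in run])
```

Pooling attempts, rather than averaging per-run shares, keeps a small run from counting as much as a large one; whether the page does this is an assumption.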
Coverage
Blinded score coverage and public BOTSv3 ATT&CK cross-check
Proprietary Systems
Fable vs Opus and Codex
Open-Weight Systems
Fable vs opencode open-weight models
Refusals and Opus Pass-throughs
How often Fable declined regular cyber tasks
This is the operational caveat. Some attempts are confirmed refusal pass-throughs to Opus; others ended as provider/no-model failures before a clean Fable answer existed. BOTSv3 runs record pass-through counts, but older records do not preserve the exact reason, and refusal counts by ATT&CK tactic are not available yet.
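Because older records lack a stored reason, a reason breakdown needs an explicit bucket for them. A minimal sketch, assuming hypothetical record fields `path` and `reason` (not the page's actual schema): pass-throughs without a reason are counted as `unrecorded` instead of being dropped.

```python
from collections import Counter


def pass_through_reasons(records: list[dict]) -> Counter:
    """Tally why attempts were passed to Opus.

    Assumes each record has a hypothetical "path" field and an optional
    "reason" field; pass-throughs with no stored reason go to "unrecorded".
    """
    reasons: Counter = Counter()
    for record in records:
        if record.get("path") != "passed_to_opus":
            continue  # only pass-throughs carry a pass-through reason
        reasons[record.get("reason") or "unrecorded"] += 1
    return reasons
```

Keeping the `unrecorded` bucket means the reason totals still add up to the pass-through count, so missing provenance stays visible rather than silently shrinking the chart.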
Answer Path
Fable answers vs Opus pass-throughs vs no answer
Reasons
Why attempts were passed to Opus
Benchmark Validity
CyBT-CTF versus public BOTSv3
CyBT-CTF is the primary read because it is blinded. BOTSv3 is public and useful only as a contamination check: if a model's BOTSv3 score jumps while its CyBT-CTF score does not, treat that as a warning sign rather than a clean capability gain.
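The contamination rule can be written as a simple comparison of gains between two versions of a model. This is a sketch under stated assumptions: scores are fractions in 0 to 1, and the `Scores` type, the `contamination_warning` helper, and the 0.10 margin are hypothetical choices, not thresholds the page defines.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    cybt_ctf: float  # blinded benchmark score, assumed 0-1
    botsv3: float    # public benchmark score, assumed 0-1


def contamination_warning(old: Scores, new: Scores, margin: float = 0.10) -> bool:
    """Flag an update whose public BOTSv3 gain outpaces its blinded CyBT-CTF gain.

    The 0.10 default margin is an illustrative assumption.
    """
    public_gain = new.botsv3 - old.botsv3
    blinded_gain = new.cybt_ctf - old.cybt_ctf
    return public_gain - blinded_gain > margin
```

For example, a move from 0.50 to 0.70 on BOTSv3 alongside 0.40 to 0.41 on CyBT-CTF would be flagged, while matched gains on both benchmarks would not.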