Claude Mythos Preview Benchmarks
How does Claude Mythos Preview compare to its predecessors and competitors? Here are the benchmark results from Anthropic's system card and independent reports.
Key finding: Claude Mythos Preview achieves 93.9% on SWE-bench Verified, 100% on Cybench CTF (pass@1), and 97.6% on USAMO 2026, results that observers describe as a "step-change" over previous frontier models in software engineering, cybersecurity, and mathematical reasoning.
Software Engineering Benchmarks
| Benchmark | Mythos Preview | Opus 4.6 | GPT-5 | Gemini 2.5 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 93.9% | 80.8% | ~85% | ~78% |
| SWE-bench Pro | 77.8% | 53.4% | ~60% | ~52% |
Sources: Anthropic system card (April 2026), independent analyses. Values prefixed with ~ are estimated from public reports.
Cybersecurity Benchmarks
| Benchmark | Mythos Preview | Opus 4.6 | GPT-5 |
|---|---|---|---|
| Cybench (CTF) | 100% (pass@1) | 75% | ~70% |
| CyberGym | 0.83 | 0.67 | ~0.62 |
Sources: Anthropic system card (April 2026). Cybench measures capture-the-flag performance; CyberGym measures defensive/offensive capability.
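The "pass@1" qualifier on the Cybench score means single-attempt success rate. For context, a minimal sketch of the standard unbiased pass@k estimator widely used in code and CTF evaluations (this is a generic illustration, not Anthropic's evaluation harness):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one
    of k samples, drawn without replacement from n generations of
    which c are correct, solves the task."""
    if n - c < k:
        # Fewer incorrect samples than k draws: success is guaranteed.
        return 1.0
    # 1 minus the probability that all k drawn samples are incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k=1 this reduces to c/n, the plain fraction of correct attempts,
# so "100% pass@1" means every single-shot attempt succeeded.
```

Reporting pass@1 rather than pass@k with larger k is the stricter convention, since the model gets no benefit from retries.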
Reasoning & General Benchmarks
| Benchmark | Mythos Preview | Opus 4.6 | GPT-5 |
|---|---|---|---|
| USAMO 2026 | 97.6% | 42.3% | ~55% |
| OSWorld | 79.6% | 72.7% | ~74% |
Sources: Anthropic system card (April 2026), USAMO 2026 competition results. OSWorld measures desktop automation.
Methodology Note
Benchmark values for Claude Mythos Preview and Claude Opus 4.6 are sourced from Anthropic's published system card (April 7, 2026). Values for GPT-5 and Gemini 2.5 Pro are estimated from publicly available benchmarks and may not reflect identical testing conditions. Values marked with "~" indicate estimates.
This data is provided for informational comparison only. Different evaluation protocols, prompting strategies, and scoring methods can significantly affect reported results. We encourage readers to consult primary sources for definitive benchmarking.