Opus 4.5 puts the world roughly back on track for the red line 😬 Every ~4 months, the length of coding tasks AI agents can perform (compared to human professionals) *doubles* More context on this finding in @METR_Evals thread x.com/METR_Evals/sta…
We estimate that, on our tasks, Claude Opus 4.5 has a 50%-time horizon of around 4 hrs 49 mins (95% confidence interval of 1 hr 49 mins to 20 hrs 25 mins). While we're still working through evaluations for other recent models, this is our highest published time horizon to date.