Date: 2025-11-25

Soon after the launches of GPT-5.1 and Gemini 3, Anthropic has released its Claude Opus 4.5 model. The AI startup claims that the new model is the best in the world for coding, agents, and computer use.

Where does it rank? Claude Opus 4.5 scores 80.9% on SWE-bench Verified, a real-world software engineering benchmark, making it the first model to cross the 80% mark on that test. In comparison, Google's newly released Gemini 3 Pro scored 76.2%, while OpenAI's GPT-5.1 Codex Max scored 77.9%.

The new model also scored higher than any human candidate on the two-hour take-home test that Anthropic gives to prospective performance engineering candidates.

“The take-home test is designed to assess technical ability and judgment under time pressure. It doesn’t test for other crucial skills candidates may possess, like collaboration, communication, or the instincts that develop over years. But this result—where an AI model outperforms strong candidates on important technical skills—raises questions about how AI will change engineering as a profession,” the company says.

Agentic AI capabilities: Anthropic claims the new model outperforms rivals on τ2-bench, a benchmark that measures how agents perform in real-world, multi-turn tasks. In one scenario, the model has to act as an airline service agent helping a distressed customer. The benchmark expects models to refuse a modification to a basic economy booking, because the airline doesn't allow changes to that class of booking.

The company says Opus 4.5 “found an insightful (and legitimate) way to solve the problem: upgrade the cabin first, then modify the flights.”
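The reasoning behind that workaround can be illustrated with a small Python sketch. This is a toy model only, not the actual τ2-bench environment or its API: the `Booking` class, the `upgrade_cabin` and `modify_flight` helpers, the cabin names, and the flight numbers are all hypothetical, and the sketch assumes that cabin upgrades are permitted for every fare class.

```python
from dataclasses import dataclass, replace

# Hypothetical toy policy, not the real tau2-bench rules or API.
NON_MODIFIABLE_CABINS = {"basic_economy"}


@dataclass(frozen=True)
class Booking:
    cabin: str
    flight: str


def upgrade_cabin(booking: Booking, new_cabin: str) -> Booking:
    # Assumption: the airline allows cabin upgrades for every fare class.
    return replace(booking, cabin=new_cabin)


def modify_flight(booking: Booking, new_flight: str) -> Booking:
    # Flight changes are refused for cabins the policy marks non-modifiable.
    if booking.cabin in NON_MODIFIABLE_CABINS:
        raise PermissionError(f"{booking.cabin} bookings cannot be modified")
    return replace(booking, flight=new_flight)


booking = Booking(cabin="basic_economy", flight="FL100")

# Changing the flight directly is rejected by the policy check.
try:
    modify_flight(booking, "FL200")
    direct_change_allowed = True
except PermissionError:
    direct_change_allowed = False

# Upgrading the cabin first makes the same change permissible.
rebooked = modify_flight(upgrade_cabin(booking, "economy"), "FL200")
```

In this sketch the direct change fails while the two-step sequence succeeds, which is the ordering the company describes: the rule blocks modifications to a fare class, not the upgrade that moves the booking out of it.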

Safer than previous models: Anthropic also claims that Claude Opus 4.5 is its “most robustly aligned model” so far.

“With Opus 4.5, we’ve made substantial progress in robustness against prompt injection attacks, which smuggle in deceptive instructions to fool the model into harmful behaviour. Opus 4.5 is harder to trick with prompt injection than any other frontier model in the industry,” the company said in its blog post.