OpenAI released GPT-5.4 on Thursday, its most capable and efficient model for professional work to date. It comes in three versions: a standard build, a reasoning variant called GPT-5.4 Thinking, and a high-performance GPT-5.4 Pro. The API version ships with a 1 million token context window and a reworked tool-calling system that cuts token overhead in complex agentic workflows.
The release landed in the middle of another turbulent week for OpenAI. ChatGPT uninstalls in the US were up 295% following OpenAI's deal with the Department of War. Meanwhile, Claude hit number one on the US App Store, overtaking ChatGPT.
What can GPT-5.4 do?
GPT-5.4 comes in three variants: standard, GPT-5.4 Thinking for reasoning tasks, and GPT-5.4 Pro for high-performance use cases.
The API ships with a 1 million token context window, the largest OpenAI has offered at this tier.
It posted record scores on computer-use benchmarks and topped Mercor's APEX-Agents test for professional tasks in law and finance.
Tool Search lets the model look up tool definitions on demand rather than front-loading them all, cutting token costs in large agentic systems (see the sketch below).
A new chain-of-thought safety evaluation found the Thinking variant less likely to misrepresent its reasoning, supporting the case for CoT monitoring as a safety tool.
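To make the token saving concrete, here is a minimal Python sketch of the deferred-loading pattern Tool Search describes. Everything in it, from the tool registry to the search_tools meta-tool and its matching logic, is a hypothetical illustration of the idea rather than OpenAI's actual API surface.

# Hypothetical sketch of the deferred tool-loading pattern behind Tool Search.
# The registry, tool names, and search step are illustrative assumptions,
# not OpenAI's real API.
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    description: str
    schema: dict  # JSON schema for the tool's arguments

# A large tool registry that would be expensive to serialize into every prompt.
REGISTRY = {
    "search_contracts": Tool(
        "search_contracts",
        "Full-text search over a contract repository.",
        {"type": "object", "properties": {"query": {"type": "string"}}},
    ),
    "fetch_filing": Tool(
        "fetch_filing",
        "Retrieve a regulatory filing by accession number.",
        {"type": "object", "properties": {"accession": {"type": "string"}}},
    ),
    # ...hundreds more tools in a real agentic system.
}

def search_tools(query: str, limit: int = 3) -> list[dict]:
    """Return only the matching tool definitions, instead of all of them."""
    q = query.lower()
    matches = [
        t for t in REGISTRY.values()
        if q in t.name or q in t.description.lower()
    ]
    return [
        {"name": t.name, "description": t.description, "parameters": t.schema}
        for t in matches[:limit]
    ]

# Upfront, the model sees one cheap meta-tool instead of the whole registry.
prompt_tools = [{
    "name": "search_tools",
    "description": "Find tool definitions relevant to the current step.",
    "parameters": {"type": "object", "properties": {"query": {"type": "string"}}},
}]

# When the model calls search_tools("filing"), only the relevant definitions
# are appended to the conversation.
print(search_tools("filing"))

Because the model only ever pays for the one meta-tool plus whatever definitions it actually retrieves, prompt size stays roughly flat as the registry grows, which is where the savings in large agentic systems come from.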
Why does this matter?
A 1 million token context window means the model can take in roughly 750,000 words, the equivalent of several long novels, in a single query. For businesses running complex document workflows or building AI agents that need to hold large amounts of context, that is a practical step change.
AI models that show their reasoning can also misrepresent it, working through a problem one way while displaying something tidier to the user. OpenAI's new evaluation tests for exactly this, and the results suggest the Thinking variant is less prone to it. That matters as these models get used for higher-stakes decisions.
Our take
GPT-5.4 is genuinely capable and in a quieter week would have led this edition of the newsletter without contest. The benchmarks are strong, the tool-calling update addresses a real developer pain point, and the safety evaluation shows OpenAI is at least aware of the trust questions swirling around reasoning models. On the product side, this is a good week for the company.
After a bizarre previous week, though, the broader picture is harder for OpenAI to manage. Sam Altman admitted the DoD announcement was rushed and has since tweaked the deal's wording. Meanwhile, ChatGPT users are leaving in the belief that Anthropic not only offers a better model but also has better values than OpenAI.
And another big thing... Anthropic designated a supply chain risk
At the end of last week, the Pentagon made its threats official and gave Anthropic a supply chain risk designation. This unprecedented action led hundreds of employees from OpenAI, Google, IBM, Salesforce Ventures and others to sign an open letter urging the DoD to withdraw the designation. The letter also called on Congress to examine whether invoking these authorities against an American technology company was appropriate. But in our view, it’s not just inappropriate: it’s an incredibly stupid thing to do.
Anthropic is the only American company ever to be publicly named a supply chain risk, a label that has traditionally been applied to foreign adversaries. Critics noted the DoD pointed to no technical failing or security breach to justify it. Anthropic has said it will challenge the designation in court, and negotiations between the two sides have reportedly resumed. The bizarre ongoing saga was well summed up by Nicholas Thompson, the CEO of The Atlantic.