OpenAI released GPT-5.4 on Thursday, introducing a standard version alongside GPT-5.4 Thinking and GPT-5.4 Pro variants. The company described the model as its most capable and efficient frontier model for professional work.

The API version supports context windows up to 1 million tokens, the largest available from OpenAI. The model also demonstrates improved token efficiency, solving problems with significantly fewer tokens than its predecessor.

GPT-5.4 achieved record scores on computer use benchmarks OSWorld-Verified and WebArena Verified. It also scored 83% on OpenAI’s GDPval test for knowledge work tasks.

The model led Mercor’s APEX-Agents benchmark, which tests professional skills in law and finance, according to Mercor CEO Brendan Foody. Foody stated that GPT-5.4 excels at creating long-horizon deliverables like slide decks and financial models, delivering top performance faster and at lower cost than competitors.

OpenAI said the model is 33% less likely to make errors in individual claims compared to GPT-5.2. Overall responses are 18% less likely to contain errors.

The company introduced Tool Search to manage tool calling in the API. The system looks up tool definitions as needed, reducing token use and cost in systems with many tools.
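The general idea of looking up tool definitions on demand can be illustrated with a minimal sketch. This is not OpenAI's Tool Search API. The catalog, the `ToolDef` class, and the `search_tools` and `build_tool_payload` functions are hypothetical names invented for illustration, and the keyword-overlap ranking is a stand-in for whatever matching the real system uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDef:
    name: str
    summary: str   # short description, cheap to index
    schema: dict   # full definition, only sent when the tool is selected


# Hypothetical catalog; a real deployment might hold hundreds of entries.
CATALOG = [
    ToolDef("get_invoice", "look up an invoice by id",
            {"type": "object", "properties": {"invoice_id": {"type": "string"}}}),
    ToolDef("issue_refund", "refund a paid invoice",
            {"type": "object", "properties": {"invoice_id": {"type": "string"},
                                              "amount": {"type": "number"}}}),
    ToolDef("get_weather", "current weather for a city",
            {"type": "object", "properties": {"city": {"type": "string"}}}),
]


def search_tools(query, catalog, limit=2):
    """Rank tools by how many query words appear in their name or summary."""
    words = set(query.lower().split())
    scored = []
    for tool in catalog:
        text = tool.name.replace("_", " ") + " " + tool.summary
        score = len(words & set(text.lower().split()))
        if score > 0:
            scored.append((score, tool))
    # Highest score first; ties keep catalog order because the sort is stable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]


def build_tool_payload(query, catalog, limit=2):
    """Include full schemas only for the tools that matched the query."""
    return [{"name": t.name, "parameters": t.schema}
            for t in search_tools(query, catalog, limit)]


if __name__ == "__main__":
    for entry in build_tool_payload("refund invoice", CATALOG, limit=1):
        print(entry["name"])
```

In this sketch only the matching definitions are placed in the request, so the payload grows with the number of relevant tools rather than with the size of the whole catalog.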

OpenAI added a new safety evaluation to test chain-of-thought monitoring. The evaluation showed that deception was less likely in the GPT-5.4 Thinking version, suggesting the model lacks the ability to hide its reasoning.

