OpenAI has announced GPT-5.4, a new foundation model designed to deliver higher performance and greater efficiency for professional use cases. The company describes it as its "most capable and efficient frontier model for professional work" and is releasing it in three variants: the standard GPT-5.4, GPT-5.4 Thinking for complex reasoning tasks, and GPT-5.4 Pro, optimized for the most demanding workloads.

One of the most notable upgrades is a context window of up to 1 million tokens through the API. This greatly expands how much information developers can process in a single request, enabling workflows such as analyzing long documents, building detailed reports, or working with large datasets. OpenAI also reports improved token efficiency: GPT-5.4 can complete the same tasks with fewer tokens than previous versions, which lowers the operating cost of AI-powered applications.

Benchmark results have also improved across several areas. GPT-5.4 set records on computer-interaction benchmarks such as OSWorld-Verified and WebArena Verified, and it reached 83% on OpenAI's GDPval test, which measures a model's ability to perform knowledge-work tasks. According to Mercor CEO Brendan Foody, the model also leads the APEX-Agents benchmark, which evaluates professional capabilities in fields such as law and finance. In practice, OpenAI says, this means stronger performance on complex outputs such as financial models, legal analysis, and strategic presentations.

Reliability was another major focus of the release. OpenAI says individual claims made by GPT-5.4 are 33% less likely to be false than those made by GPT-5.2, and full responses are 18% less likely to contain any errors. The company has also introduced a feature called Tool Search to improve how models work with external tools through the API. Instead of loading every tool definition into the prompt up front, the model can look up the definitions it needs on demand, which reduces token usage and makes tool-heavy systems faster and cheaper to run.
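To illustrate why on-demand lookup saves tokens, here is a minimal Python sketch of the general idea. It does not use OpenAI's API: `ToolDef`, `ToolRegistry`, `approx_tokens`, `prompt_cost`, the sample tools, and the keyword-overlap scoring are all hypothetical stand-ins, and word counts substitute for real tokenizer tokens. It compares the prompt cost of sending every tool definition with sending only the definitions a search returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDef:
    """Hypothetical tool definition: a name plus a natural-language description."""
    name: str
    description: str


def approx_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words, not real tokenizer tokens.
    return len(text.split())


def prompt_cost(tools: list[ToolDef]) -> int:
    # Approximate cost of serializing these tool definitions into a prompt.
    return sum(approx_tokens(f"{t.name}: {t.description}") for t in tools)


class ToolRegistry:
    """Hypothetical registry that returns only the tools relevant to a query."""

    def __init__(self, tools: list[ToolDef]) -> None:
        self.tools = tools

    def search(self, query: str, limit: int = 3) -> list[ToolDef]:
        # Score each tool by how many query words appear in its name or description.
        words = set(query.lower().split())
        scored = []
        for tool in self.tools:
            text = tool.name.replace("_", " ") + " " + tool.description
            score = len(words & set(text.lower().split()))
            if score > 0:
                scored.append((score, tool.name, tool))
        # Highest score first; ties broken alphabetically by name.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [tool for _, _, tool in scored[:limit]]


# Illustrative usage with made-up tools.
tools = [
    ToolDef("get_weather", "Return the current weather forecast for a city"),
    ToolDef("convert_currency", "Convert an amount between two currencies using live exchange rates"),
    ToolDef("search_contracts", "Search legal contracts for a clause or keyword"),
    ToolDef("build_financial_model", "Create a spreadsheet financial model from revenue and cost inputs"),
]
registry = ToolRegistry(tools)
selected = registry.search("convert this amount to another currency")
print([t.name for t in selected])                    # ['convert_currency']
print(prompt_cost(tools), prompt_cost(selected))     # 40 11
```

With four tools the saving is small, but the gap grows with the size of the tool catalog, since the prompt carries only the matches rather than every definition. A production system would presumably use a stronger retrieval method than keyword overlap.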

Finally, OpenAI has expanded its safety evaluations, particularly around chain-of-thought reasoning: the intermediate, step-by-step reasoning a model produces while solving complex tasks. Researchers have long worried that reasoning models could misrepresent how they actually arrive at an answer. Early tests indicate that such deceptive behavior is less likely in GPT-5.4 Thinking, suggesting the model has limited ability to deliberately manipulate its reasoning under normal conditions.

With larger context windows, improved efficiency, and stronger reasoning capabilities, GPT-5.4 represents another step toward AI systems capable of handling complex professional workflows at scale.

Source