OpenAI announces o3 and o4-mini, its most capable models with state-of-the-art reasoning

As expected, OpenAI today announced o3 and o4-mini, its latest reasoning models that deliver state-of-the-art (SOTA) results on several AI benchmarks. For the first time, these reasoning models can also access external tools, including web browsing, a Python interpreter, and more.

OpenAI emphasized that the new models are trained to reason about when and how to use tools, producing detailed responses in the correct output format. With state-of-the-art reasoning and full tool access, users can expect better answers to their queries than from any previous OpenAI model.

OpenAI o3 is the company's most powerful reasoning model, setting new SOTA benchmarks on Codeforces, SWE-bench, and MMMU. Since it supports image uploads, users can use it to analyze images, charts, and graphics. According to external experts, o3 makes 20% fewer major errors than OpenAI o1 on complex, real-world tasks.

OpenAI o4-mini is a smaller model designed for fast, cost-efficient reasoning. Despite its size, it achieves performance comparable to o3 in math, coding, and visual tasks. For example, on AIME 2025, o4-mini scored 99.5% when combined with a Python interpreter. Because it’s more efficient than o3, users can expect significantly higher usage limits, making it ideal for high-volume reasoning tasks.

OpenAI claims that these two new models should feel more natural and conversational, as they can reference memory and past conversations. Under the hood, OpenAI used large-scale reinforcement learning to improve performance and trained both models to use tools through reinforcement learning as well.

For most real-world use cases, the new o3 and o4-mini models will be both smarter and more cost-effective than o1 and o3-mini, respectively. On the safety side, OpenAI reports that both o3 and o4-mini remain below the "High" threshold across all three tracked categories of its Preparedness Framework.

Alongside these models, OpenAI also announced a new experiment called Codex CLI—a lightweight coding agent for developers to use directly from their PCs. In addition, the company is launching a $1 million initiative to support projects that leverage Codex CLI and OpenAI models.

The new o4-mini, o4-mini-high, and o3 models are now available for ChatGPT Plus, Pro, and Team users through the model selector. These models replace o1, o3-mini, and o3-mini-high. ChatGPT Enterprise and Edu users will gain access next week. ChatGPT Free users can try o4-mini by selecting "Think" in the text composer. OpenAI plans to release OpenAI o3-pro in a few weeks, with full tool support.

The o3 model is priced at $10 per million input tokens and $40 per million output tokens. The o4-mini model maintains the same pricing as o3-mini: $1.10 per million input tokens and $4.40 per million output tokens.
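Based on those listed rates, estimating the cost of a request is simple arithmetic. A quick sketch (the per-million-token prices come from the figures above; the function name is illustrative):

```python
# Per-million-token prices quoted above (USD).
PRICES = {
    "o3":      {"input": 10.00, "output": 40.00},
    "o4-mini": {"input": 1.10,  "output": 4.40},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request from its token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 2,000-token prompt with a 500-token answer on o3
# costs 2000*10/1e6 + 500*40/1e6 = 0.02 + 0.02 = $0.04.
```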

Developers can now access these new models via the Chat Completions API and the Responses API. The Responses API now supports reasoning summaries and the ability to preserve reasoning tokens around function calls for improved performance. Soon, OpenAI will add support for first-party tools like web search, file search, and the code interpreter within the model’s reasoning process.
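As a rough sketch of what a Responses API call involves, the snippet below assembles the JSON body a developer would send to the `/v1/responses` endpoint. The top-level fields mirror the API's documented shape, but the reasoning-summary option shown here is illustrative of the feature described above and may differ from the live API:

```python
import json

def build_responses_request(model: str, prompt: str) -> dict:
    """Assemble the JSON body for a POST to /v1/responses.

    Field names mirror the Responses API's documented shape; the
    reasoning-summary option is an illustrative sketch of the
    feature described in the article, not a verified parameter.
    """
    return {
        "model": model,                    # e.g. "o3" or "o4-mini"
        "input": prompt,                   # plain-text input is accepted
        "reasoning": {"summary": "auto"},  # request a reasoning summary (assumed option)
    }

body = build_responses_request("o4-mini", "Summarize the SWE-bench results.")
print(json.dumps(body, indent=2))
```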
