Date: 2024-12-20

On the final day of the '12 Days of OpenAI' event, OpenAI today announced the upcoming o3 family of reasoning models. Similar to the existing o1 family, the o3 family will include o3 and o3 mini models.

OpenAI also shared some benchmark numbers for the o3 models.

o3 scored a breakthrough 75.7% on the ARC-AGI Semi-Private Evaluation set. In a high-compute configuration, it scored 87.5% on the same set.

On Epoch AI's FrontierMath benchmark, o3 solved 25.2% of problems, while existing models solved only 2%.

On SWE-bench Verified, o3 scored 71.7%, which is 22.8 percentage points higher than o1 (implying an o1 score of 48.9%).

On Codeforces, o3 achieved an Elo rating of 2727.

On AIME 2024, o3 achieved a score of 96.7%. For comparison, o1 scored 83.3%.

On GPQA Diamond, o3 scored 87.7%. In comparison, o1 scored 78%.
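
The head-to-head figures above can be put side by side with a little arithmetic. The following is a minimal sketch that uses only the scores quoted in this article. The names `O3_SCORES`, `O1_SCORES`, and `point_gaps` are illustrative, and o1's SWE-bench Verified score is not quoted directly but derived from the reported 22.8-point gap.

```python
# Scores quoted in the article, in percent.
O3_SCORES = {
    "SWE-bench Verified": 71.7,
    "AIME 2024": 96.7,
    "GPQA Diamond": 87.7,
}

# o1's SWE-bench Verified score is an inference: o3's score minus the
# reported 22.8-point gap. The other two values are quoted directly.
O1_SCORES = {
    "SWE-bench Verified": round(71.7 - 22.8, 1),
    "AIME 2024": 83.3,
    "GPQA Diamond": 78.0,
}


def point_gaps(new, old):
    """Return the percentage-point gap (new minus old) per shared benchmark."""
    return {name: round(new[name] - old[name], 1) for name in new if name in old}


gaps = point_gaps(O3_SCORES, O1_SCORES)
for name, gap in gaps.items():
    print(f"{name}: o3 {O3_SCORES[name]}% vs o1 {O1_SCORES[name]}% (+{gap} points)")
```

These gaps are differences in percentage points, not relative improvements. The Codeforces Elo rating and the ARC-AGI and FrontierMath results are left out because the article gives no o1 figure for them.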

The ARC Prize team wrote the following about OpenAI's new o3 models:

OpenAI's new o3 model represents a significant leap forward in AI's ability to adapt to novel tasks. This is not merely incremental improvement, but a genuine breakthrough, marking a qualitative shift in AI capabilities compared to the prior limitations of LLMs. o3 is a system capable of adapting to tasks it has never encountered before, arguably approaching human-level performance in the ARC-AGI domain.

The o3 mini model will let users choose among three reasoning levels: High, Medium, and Low. The Low level will be the fastest but least accurate, while the High level will be the slowest but most accurate.

OpenAI has not yet released the o3 models. Starting today, however, it is making them available for safety and security testing, and interested safety and security researchers can apply for access before the public launch. The o3 models are expected to be available to the public in 2025.