Apple's AI models still trail behind OpenAI's GPT-4o despite latest update

Apple Intelligence Model

At WWDC 2025, Apple announced several Apple Intelligence updates for both developers and consumers. With the new Foundation Models framework, developers can now build AI experiences into their apps that work offline, preserve privacy, and are available free of charge. The framework is built on Apple’s own in-house AI models.

Apple also unveiled a new generation of language foundation models. According to Apple, these updated models are faster, more efficient, and offer improved tool use, better reasoning capabilities, multimodal support for image and text inputs, and support for 15 languages.

Apple Intelligence includes two foundation models:

  • A 3-billion-parameter model that runs on-device using Apple Silicon.
  • A server-based mixture-of-experts model optimized for Private Cloud Compute.

Apple noted that the on-device 3B language model is not designed to be a general-purpose chatbot. Instead, it is intended to perform text-related tasks such as summarization, entity extraction, text understanding, refinement, short dialogues, and creative content generation, among others.

The big question is how well Apple’s models perform compared to other leading models on the market. Rather than using standard AI benchmarks, Apple shared results from its own internal evaluations of fundamental language and reasoning capabilities.

According to Apple’s text-based evaluations, its on-device 3B model performs favorably against the slightly larger Qwen-2.5-3B and competitively against the larger Qwen-3-4B and Gemma-3-4B in English. Its server-based model performs slightly better than Llama-4-Scout but falls short of Qwen-3-235B and OpenAI’s proprietary GPT-4o.

In evaluations involving image input, Apple’s on-device model outperforms InternVL and Qwen, and performs competitively against Gemma. While Apple’s server model beats Qwen-2.5-VL, it underperforms when compared to Llama-4-Scout and GPT-4o.

These results highlight how far Apple still has to go in foundational AI capabilities. It seems Apple compared its models to GPT-4o to make their performance appear relatively decent. If Apple were to compare its results against OpenAI’s latest o-series models or Google’s Gemini 2.5 Pro, the gap would likely appear much wider. It will be interesting to see how Apple navigates the AI era with its in-house capabilities in the years ahead.
