Elon Musk's xAI today announced Grok 4, its latest flagship multimodal AI model. xAI claims that Grok 4 is a top-tier model with state-of-the-art performance on academic, mathematical, and reasoning benchmarks. The Grok 4 Heavy version, which adds multi-agent tools, scores higher still on several popular AI benchmarks.
Academic and Reasoning Benchmarks:
- Humanity’s Last Exam (HLE): Grok 4 (no tools) achieved 25.4%, outperforming Google’s Gemini 2.5 Pro (21.6%) and OpenAI’s o3-high (21%). Grok 4 Heavy (multi-agent + tools) reached 44.4%, compared to Gemini 2.5 Pro with tools at 26.9%.
- ARC-AGI-2: Grok 4 scored 16.2%, nearly double the next-highest model (Claude Opus 4).
- MMLU-style evaluations: Grok 4 scored 0.866 (86.6%) on MMLU, with an overall Intelligence Index of 73.
STEM and Coding Benchmarks:
- GPQA: Grok 4 scored 87.5%, while the more powerful Grok 4 Heavy variant reached 88.9%.
- AIME: Grok 4 Heavy achieved a perfect 100% score, while Grok 4 achieved 98.8%.
- SWE-bench: Grok 4 Code, a specialized variant scheduled for release in August 2025, has scored 72-75% on SWE-bench.
According to Artificial Analysis, Grok 4 has achieved an Artificial Analysis Intelligence Index of 73, which is ahead of OpenAI o3 at 70 and Google Gemini 2.5 Pro at 70.
"Grok 4 is at the point where it essentially never gets math/physics exam questions wrong, unless they are skillfully adversarial. It can identify errors or ambiguities in questions, then fix the error in the question or answer each variant of an ambiguous question." https://t.co/vB6NUOZTOX
Elon Musk (@elonmusk), July 10, 2025
Grok 4's API pricing is the same as Grok 3's: $3 per 1M input tokens and $15 per 1M output tokens, with cached input tokens billed at $0.75 per 1M.
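To make the per-token rates concrete, here is a minimal Python sketch that estimates the cost of a single request from the prices quoted above. The function name `estimate_cost` and its parameters are hypothetical, not part of any xAI SDK. The sketch also assumes that cached input tokens are a subset of the input tokens and are billed at the cached rate instead of the standard input rate. Actual billing may differ.

```python
# Prices in USD per 1M tokens, as quoted in the article.
INPUT_PER_M = 3.00
CACHED_INPUT_PER_M = 0.75
OUTPUT_PER_M = 15.00


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> float:
    """Return an estimated request cost in USD.

    Assumption (not confirmed by the article): cached_input_tokens is the
    part of input_tokens billed at the cached rate rather than the full rate.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")

    uncached_tokens = input_tokens - cached_input_tokens
    total = (uncached_tokens * INPUT_PER_M
             + cached_input_tokens * CACHED_INPUT_PER_M
             + output_tokens * OUTPUT_PER_M)
    return total / 1_000_000


if __name__ == "__main__":
    # 200k input tokens (half of them cached) and 50k output tokens.
    print(f"${estimate_cost(200_000, 50_000, cached_input_tokens=100_000):.3f}")
```

Under these assumptions, output tokens dominate the bill for generation-heavy workloads, since each one costs five times as much as an uncached input token.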
For consumers, xAI has introduced new subscription tiers. The basic free tier will have only limited access to Grok 3. The SuperGrok plan costs $30/month and provides increased access to the Grok 4 and Grok 3 models. The $300/month SuperGrok Heavy plan will offer access to Grok 4 Heavy, Grok 4, and Grok 3.