Recently, Meta released Llama 4, a new family of large language models consisting of Scout, Maverick, and Behemoth. On the LMArena leaderboard, Llama 4 Maverick (Llama-4-Maverick-03-26-Experimental) placed 2nd, beating models like OpenAI's GPT-4o and Google's Gemini 2.0 Flash and trailing only Gemini 2.5 Pro.
But pretty soon, the cracks began to show as users noticed differences in behavior between the Maverick version used in the benchmark and the one available to the public. This led to accusations that Meta was cheating, prompting a response from a Meta executive on X:
We're glad to start getting Llama 4 in all your hands. We're already hearing lots of great results people are getting with these models. That said, we're also hearing some reports of mixed quality across different services. Since we dropped the models as soon as they were…
— Ahmad Al-Dahle (@Ahmad_Al_Dahle) April 7, 2025
LMArena acknowledged that Meta failed to abide by its policies, apologized to the public, and issued a policy update.
We've seen questions from the community about the latest release of Llama-4 on Arena. To ensure full transparency, we're releasing 2,000+ head-to-head battle results for public review. This includes user prompts, model responses, and user preferences. (link in next tweet)
Early…
— lmarena.ai (formerly lmsys.org) (@lmarena_ai) April 8, 2025
Now, the unmodified release version of the model (Llama-4-Maverick-17B-128E-Instruct) has been added to LMArena, and it ranks 32nd. For the record, older models like Claude 3.5 Sonnet, released last June, and Gemini-1.5-Pro-002, released last September, rank higher.

In a statement to TechCrunch, a Meta spokesperson said that Llama-4-Maverick-03-26-Experimental was specially tuned for chat and performed well on the LMArena benchmark, adding that the company is "excited" to see what developers will build now that an open-source version of Llama 4 has been released.