Large technology companies rarely make announcements over the weekend. However, Meta surprised everyone by unveiling its Llama 4 series of models this past weekend. The Llama 4 series includes three models: Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth.

Llama 4 Scout is the smallest model in the series, featuring 17 billion active parameters with 16 experts. Meta claims Scout is the best multimodal model in its class, outperforming Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 in standard AI benchmarks. Despite its impressive performance, this model can run on a single NVIDIA H100 GPU (with Int4 quantization, according to Meta). Notably, it supports an industry-leading context window of 10 million tokens, although the real-world effectiveness of such a large context window remains to be seen.

Llama 4 Maverick is the mainstream model, also with 17 billion active parameters but scaled up to 128 experts. Meta claims Maverick is the best multimodal model in its category, surpassing the widely used GPT-4o and Gemini 2.0 Flash in industry benchmarks. Its experimental chat version has scored 1417 on LMArena, ranking No. 2 among all leading LLMs.

Meta also announced Llama 4 Behemoth, the largest model in the lineup, which is still in training. Behemoth features 288 billion active parameters with 16 experts. According to Meta, this massive model outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several standard AI benchmarks.
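For readers unfamiliar with the "active parameters" and "experts" terminology used for all three models, the following is a toy sketch of the general mixture-of-experts idea: a router picks a small number of experts per token, so only a fraction of the total expert weights is used for any one token. It is not Meta's architecture. The class name `ToyMoELayer`, the sizes, the random weights, and the top-1 routing choice are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMoELayer:
    """Hypothetical toy layer: each 'expert' is just one weight vector."""

    def __init__(self, num_experts, dim, top_k=1, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.top_k = top_k
        # Router: one scoring vector per expert.
        self.router = [[rng.uniform(-1, 1) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Experts: an elementwise scale vector stands in for a real
        # feed-forward block.
        self.experts = [[rng.uniform(-1, 1) for _ in range(dim)]
                        for _ in range(num_experts)]

    def route(self, token):
        """Return the indices of the top_k experts and all router probabilities."""
        scores = [sum(w * x for w, x in zip(row, token)) for row in self.router]
        probs = softmax(scores)
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        return ranked[:self.top_k], probs

    def forward(self, token):
        """Apply only the chosen experts, weighted by renormalised gate values."""
        chosen, probs = self.route(token)
        norm = sum(probs[i] for i in chosen)
        out = [0.0] * self.dim
        for i in chosen:
            gate = probs[i] / norm
            for d in range(self.dim):
                out[d] += gate * self.experts[i][d] * token[d]
        return out, chosen

    def parameter_counts(self):
        """(total, active) expert weights per token; router weights are not counted."""
        per_expert = self.dim
        return len(self.experts) * per_expert, self.top_k * per_expert


layer = ToyMoELayer(num_experts=16, dim=8, top_k=1)
output, chosen_experts = layer.forward([1.0] * 8)
total_params, active_params = layer.parameter_counts()
```

In this toy setup, 16 experts hold 128 expert weights in total, but each token touches only the 8 weights of the single expert the router selects. That gap between total and per-token weights is the distinction the "active parameters" figures are drawing.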

Llama 4 Scout and Llama 4 Maverick are now available for download on llama.com and Hugging Face. For general consumers, these models are already powering Meta AI across WhatsApp, Messenger, Instagram Direct, and the web.

Microsoft today announced that the new Llama 4 Scout and Maverick models are now available in Azure AI Foundry as managed compute offerings. Developers can find them as Llama-4-Scout-17B-16E, Llama-4-Scout-17B-16E-Instruct, and Llama-4-Maverick-17B-128E-Instruct-FP8. You can learn more about Microsoft's Llama offerings on Azure here.
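The catalog names appear to encode the figures quoted earlier in the article. The sketch below parses them under several assumptions: that every name uses the fully hyphenated form `Llama-4-...`, that `17B` is the active parameter count in billions, that `16E` or `128E` is the number of experts, and that a trailing tag such as `FP8` marks a numeric precision. The `LlamaModelName` class and `parse_model_name` function are hypothetical helpers, not part of any Azure or Meta SDK.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class LlamaModelName:
    """Hypothetical container for the fields read out of a catalog name."""
    variant: str
    active_params_billions: int
    num_experts: int
    instruct: bool
    precision: Optional[str]


# Assumed layout: Llama-4-<Variant>-<N>B-<M>E[-Instruct][-<Precision>]
_NAME_PATTERN = re.compile(
    r"^Llama-4-(?P<variant>[A-Za-z]+)"
    r"-(?P<params>\d+)B"
    r"-(?P<experts>\d+)E"
    r"(?P<instruct>-Instruct)?"
    r"(?:-(?P<precision>[A-Za-z0-9]+))?$"
)


def parse_model_name(name: str) -> LlamaModelName:
    """Split a Llama 4 catalog name into its assumed components."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised model name: {name!r}")
    return LlamaModelName(
        variant=match.group("variant"),
        active_params_billions=int(match.group("params")),
        num_experts=int(match.group("experts")),
        instruct=match.group("instruct") is not None,
        precision=match.group("precision"),
    )


for model_name in (
    "Llama-4-Scout-17B-16E",
    "Llama-4-Scout-17B-16E-Instruct",
    "Llama-4-Maverick-17B-128E-Instruct-FP8",
):
    print(parse_model_name(model_name))
```

Read this way, the names line up with the article's figures: Scout and Maverick both carry 17 billion active parameters, with 16 and 128 experts respectively.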