Meta has officially released Llama 4, the latest generation of its AI model family, now powering the Meta AI assistant across platforms like WhatsApp, Messenger, Instagram, and the web.

This new release includes two models:

Llama 4 Scout: A compact, high-performance model that can run on a single Nvidia H100 GPU.
Llama 4 Maverick: A more advanced model designed to rival OpenAI’s GPT-4o and Google’s Gemini 2.0 Flash in terms of performance and efficiency.

Both models are available for download from Meta directly or through Hugging Face.

Performance Highlights

Meta claims Llama 4 Scout offers a 10-million-token context window (the amount of text the model can take into account at once) and outperforms models such as Google’s Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 across several benchmarks, all while remaining efficient enough to run on a single GPU.

Similarly, Meta says Llama 4 Maverick competes with top-tier models like GPT-4o and DeepSeek-V3, especially on coding and reasoning tasks, while using fewer than half as many active parameters as DeepSeek-V3.

Looking Ahead: Llama 4 Behemoth

Still in training, the upcoming Llama 4 Behemoth is described by CEO Mark Zuckerberg as “the highest performing base model in the world.” It has 288 billion active parameters out of 2 trillion in total. Meta says it already outperforms GPT-4.5 and Claude Sonnet 3.7 on multiple STEM-related benchmarks.
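
To put the two parameter counts in proportion, here is a quick back-of-the-envelope calculation in Python. It uses only the figures quoted above; the function and constant names are illustrative, and the result says nothing about actual speed or memory use.

```python
def active_fraction(active_params: float, total_params: float) -> float:
    """Share of a model's parameters that are active for a given input."""
    return active_params / total_params

# Figures as reported for Llama 4 Behemoth in this article.
BEHEMOTH_ACTIVE = 288e9   # 288 billion active parameters
BEHEMOTH_TOTAL = 2e12     # 2 trillion total parameters

fraction = active_fraction(BEHEMOTH_ACTIVE, BEHEMOTH_TOTAL)
print(f"Active share: {fraction:.1%}")  # Active share: 14.4%
```

In other words, by these numbers only about one-seventh of Behemoth's parameters would be in use for any given input.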

Under the Hood: A New Architecture

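Llama 4 models are built on a Mixture of Experts (MoE) architecture, in which a routing mechanism activates only a subset of the model’s parameters (the “experts”) for each input token instead of running the whole network. This reduces the compute needed per token and improves both speed and resource usage.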
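
For readers who want to see the idea in code, the following is a minimal, generic sketch of top-k expert routing in plain Python. It is not Meta's implementation: the article does not describe Llama 4's router, and every name here (route_token, moe_layer, the toy experts and router weights) is a hypothetical chosen for illustration.

```python
import math

def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_scores, top_k):
    """Pick the top_k experts and renormalize their weights to sum to 1."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_layer(token, experts, router_weights, top_k=1):
    """Run one token through only its selected experts; the rest stay idle."""
    # One router score per expert: dot product of the token with that expert's row.
    scores = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    routes = route_token(scores, top_k)
    output = [0.0] * len(token)
    for index, weight in routes:
        expert_out = experts[index](token)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, [index for index, _ in routes]

# Toy setup (assumed for illustration): four "experts" that just scale the input.
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

output, used = moe_layer([0.5, 2.0], experts, router_weights, top_k=1)
print(used)    # [1]
print(output)  # [1.0, 4.0]
```

Only one of the four toy experts runs for this token, which is the source of the efficiency gain: the total parameter count can grow while the work done per token stays roughly fixed.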

Open Source? Not Quite.

Meta describes Llama 4 as “open-source,” but licensing restrictions still apply. For example, companies with more than 700 million monthly active users must seek Meta’s permission to use the models commercially. Critics say that condition conflicts with the definition of open source maintained by the Open Source Initiative.