Meta
Company's most powerful model: Llama 4 Maverick
Released Apr 5, 2025
No eval report; model card published Apr 5, 2025
Training and internal deployment: dates not reported
The evals are probably very bad, but we can't even check, because Meta won't tell us what it did.
This is all Meta published — no details:
Critical Risks
We spend additional focus on the following critical risk areas:
1. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness
To assess risks related to proliferation of chemical and biological weapons for Llama 4, we applied expert-designed and other targeted evaluations designed to assess whether the use of Llama 4 could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. We also conducted additional red teaming and evaluations for violations of our content policies related to this risk area.
2. Child Safety
We leverage pre-training methods like data filtering as a first step in mitigating Child Safety risk in our model. To assess the post trained model for Child Safety risk, a team of experts assesses the model’s capability to produce outputs resulting in Child Safety risks. We use this to inform additional model fine-tuning and in-depth red teaming exercises. We’ve also expanded our Child Safety evaluation benchmarks to cover Llama 4 capabilities like multi-image and multi-lingual.
3. Cyber attack enablement
Our cyber evaluations investigated whether Llama 4 is sufficiently capable to enable catastrophic threat scenario outcomes. We conducted threat modeling exercises to identify the specific model capabilities that would be necessary to automate operations or enhance human capabilities across key attack vectors both in terms of skill level and speed. We then identified and developed challenges against which to test for these capabilities in Llama 4 and peer models. Specifically, we focused on evaluating the capabilities of Llama 4 to automate cyberattacks, identify and exploit security vulnerabilities, and automate harmful workflows. Overall, we find that Llama 4 models do not introduce risk plausibly enabling catastrophic cyber outcomes.
On CBRNE, Meta doesn't even share a conclusion about the model's capabilities — it just claims to have assessed them.
Meta's evals have been poorly designed or wildly underelicited in the past; presumably these evals were no better.
Llama 4 Maverick evaluation categories
Click on a category to see my analysis and the relevant part of the company's eval reporting (here, the model card).
Chem & bio
Meta says it evaluated for these capabilities, but it hasn't reported any details. Given this and its poor track record, it isn't credible that Meta did good evaluations.
Cyber
Meta says it ruled out dangerous cyber capabilities, but it hasn't reported any details. Given this and its poor track record, it isn't credible that Meta did good evaluations.
AI R&D
None
Scheming capabilities
None
Misalignment propensity
None
Llama 4 Maverick safety case categories
This section is new and experimental. Click on a category to see my analysis. Preventing misuse via API (i.e., when the user doesn't control the model weights) is relatively straightforward, especially for bio; scheming risk prevention will eventually be important and improving it from current levels is tractable; security will eventually be important and we basically know what to do but it's very costly. I'm interested in feedback, especially on whether this content is helpful to you or what questions you wish I answered here instead.
Misuse (via API) prevention
Meta seems to think this model doesn't need mitigations against misuse. The so-called model card mentions safeguards, but it doesn't claim any of them help prevent misuse. It does not mention Meta's Frontier AI Framework, much less the capability thresholds in that document.
Scheming risk prevention
Nothing; Meta doesn't seem to have thought about this category of risks.
Security
Meta seems to think this model doesn't need security controls; indeed, it published the weights. But it didn't mention its Frontier AI Framework when releasing this model. It doesn't seem to have a security plan.