Google DeepMind

Eval and safeguard planning

Gemini 2.5 Pro

Released Mar 25, 2025 as experimental; updated May 6, May 20, Jun 5, and Jun 17 (stable version)

Final eval report published Jun 17, 2025; reports on earlier model versions published Apr 29 and May 9

Training and internal deployment: dates not reported

DeepMind's evals themselves seem good, but its elicitation is somewhat unclear, and it is very unclear why DeepMind thinks the results indicate safety. Overall, DeepMind doesn't show that it has ruled out dangerous capabilities, especially in CBRN and cyber.

DeepMind says it has an "alert threshold" for each "Critical Capability Level" (CCL), intended as an early warning for dangerous capabilities. It says the model has reached the alert threshold for Cyber Uplift Level 1 but not for any other CCL. But it doesn't say what the thresholds are.[✲]

DeepMind says some external groups assessed the model's capabilities. It briefly summarizes some aspects of methodology and results, but doesn't share details or publish the groups' reports.

Note: DeepMind is working on a powerful reasoning mode: Deep Think. DeepMind says Deep Think is undergoing safety testing and is not yet widely available, and the dangerous capability evals do not account for it.

Elicitation checklist: 🟡 🟡 🟡
Accountability checklist:

Gemini 2.5 Pro evaluation categories

Click on a category to see my analysis and the relevant part of the company's eval report.

Chem & bio

DeepMind claims the model doesn't have dangerous capabilities, but it hasn't shown that: it only reported results on multiple-choice evals, it didn't say what threshold would indicate safety or compare performance to humans, and its elicitation seems inadequate.

Cyber

DeepMind reports that the model has reached an "alert threshold" but not the CCLs themselves. The evals seem good, but it's unclear what the results mean in the real world or what results would change DeepMind's mind.

AI R&D

DeepMind claims the model probably can't substantially accelerate AI R&D. That seems correct, but DeepMind doesn't say what level of performance would trigger greater scrutiny, and it may have underelicited the model.

Scheming capabilities

DeepMind claims the model doesn't have dangerous capabilities. Its stealth and situational awareness evals support that.

Misalignment propensity

None — DeepMind doesn't say it tried to evaluate misalignment propensity or check for blatant misalignment in high-stakes settings.

Gemini 2.5 Pro safety categories

This section is new and experimental. Click on a category to see my analysis. Preventing misuse via API (i.e., when the user doesn't control the model weights) is relatively straightforward, especially for bio. Scheming risk prevention will eventually be important, and improving it from current levels is tractable. Security will eventually be important, and we basically know what to do, but it's very costly. I'm interested in feedback, especially on whether this content is helpful to you or what questions you wish I answered here instead.

Misuse (via API) prevention

DeepMind claims that the model doesn't enable misuse (though it is close on cyber), so safeguards are unnecessary. But eval results seem to contradict that claim.

Under DeepMind's Framework, reaching CCLs for CBRN, cyber, or AI R&D capabilities triggers safeguards against misuse. The thresholds are fine in the abstract, but that's inadequate, since DeepMind seems to interpret the eval results poorly. When safeguards are load-bearing, DeepMind is supposed to ensure that they are robust, but it doesn't say how it will check. And the whole plan is not a commitment.

Scheming risk prevention

DeepMind claims that the model isn't capable of undermining human control, so safeguards are unnecessary.

Per the Framework, instrumental reasoning capabilities trigger control techniques and a control safety case. This is great, and DeepMind is the only company that says this. But DeepMind doesn't share details on how it might mitigate the risk or how it would determine whether the remaining risk is acceptable. And the whole plan is not a commitment.

Security

DeepMind claims that the model doesn't enable misuse, so safeguards are unnecessary. But eval results seem to contradict that claim.

Under DeepMind's Framework, reaching CCLs for CBRN, cyber, or AI R&D capabilities triggers various "recommended security levels." The thresholds are fine in the abstract, but that's inadequate, since DeepMind seems to interpret the eval results poorly. The recommended security levels seem very low: all but one would leave DeepMind's model weights vulnerable to many actors. And the whole plan is not a commitment.