Google DeepMind

Eval and safeguard planning

Gemini 2.5 Pro

Released Mar 25, 2025 as experimental; updated May 6, May 20, Jun 5, and Jun 17 (stable version)

Final eval report published Jun 17, 2025; reports on earlier model versions published Apr 29 and May 9

Training and internal deployment: dates not reported

DeepMind's evals themselves seem good, but its elicitation is somewhat unclear, and it is very unclear why DeepMind thinks the results indicate safety. Overall, DeepMind doesn't show that it has ruled out dangerous capabilities, especially in CBRN and cyber.

DeepMind says it has an "alert threshold" for each "Critical Capability Level" (CCL), intended as an early warning for dangerous capabilities. It says the model has reached the alert threshold for Cyber Uplift Level 1 but not for any other CCL. But it doesn't say what the thresholds are.[✲]

DeepMind says some external groups assessed the model's capabilities. It briefly summarizes some aspects of methodology and results, but doesn't share details or publish the groups' reports.

Note: DeepMind is working on a powerful reasoning mode: Deep Think. DeepMind says Deep Think is undergoing safety testing and is not yet widely available, and the dangerous capability evals do not account for it.

Elicitation checklist: 🟡 🟡 🟡
Accountability checklist:

Gemini 2.5 Pro evaluation categories

Click on a category to see my analysis and the relevant part of the company's eval report.

Chem & bio

DeepMind claims the model doesn't have dangerous capabilities, but it hasn't shown that: it only reported results on multiple-choice evals, it didn't say what threshold would indicate safety or compare performance to humans, and its elicitation seems inadequate.

Cyber

DeepMind reports that the model has reached an "alert threshold" but not the CCLs themselves. The evals seem good, but it's unclear what the results mean in the real world or what results would change DeepMind's mind.

AI R&D

DeepMind claims the model probably can't substantially accelerate AI R&D. That seems correct, but DeepMind doesn't say what level of performance would trigger greater scrutiny, and it may have underelicited the model.

Scheming capabilities

DeepMind claims the model doesn't have dangerous capabilities. Its stealth and situational awareness evals support that.

Misalignment propensity

None — DeepMind doesn't say it tried to evaluate misalignment propensity or check for blatant misalignment in high-stakes settings.

Gemini 2.5 Pro safety categories

This section is new and experimental. Click on a category to see my analysis. Preventing misuse via API (i.e., when the user doesn't control the model weights) is relatively straightforward, especially for bio. Scheming risk prevention will eventually be important, and improving it from current levels is tractable. Security will eventually be important, and we basically know what to do, but it's very costly. I'm interested in feedback, especially on whether this content is helpful to you or what questions you wish I answered here instead.

Misuse (via API) prevention

DeepMind claims that the model doesn't enable misuse (though it is close on cyber), so safeguards are unnecessary. But eval results seem to contradict that claim.

Under DeepMind's Framework, reaching CCLs for CBRN, cyber, or AI R&D capabilities triggers safeguards against misuse. The thresholds are fine in the abstract, but that's inadequate, since DeepMind seems to interpret the eval results poorly. When safeguards are load-bearing, DeepMind is supposed to ensure that they are robust, but it doesn't say how it will check. And the whole plan is not a commitment.

Scheming risk prevention

DeepMind claims that the model isn't capable of undermining human control, so safeguards are unnecessary.

Per the Framework, instrumental reasoning capabilities trigger control techniques and a control safety case. This is great, and DeepMind is the only company that says this. But DeepMind doesn't share details on how it might mitigate the risk or how it would determine whether the remaining risk is acceptable. And the whole plan is not a commitment.

Security

DeepMind claims that the model doesn't enable misuse, so safeguards are unnecessary. But eval results seem to contradict that claim.

Under DeepMind's Framework, reaching CCLs for CBRN, cyber, or AI R&D capabilities triggers various "recommended security levels." The thresholds are fine in the abstract, but that's inadequate, since DeepMind seems to interpret the eval results poorly. The recommended security levels seem very low: all but one would leave DeepMind's model weights vulnerable to many actors. And the whole plan is not a commitment.