OpenAI
Eval and safeguard planning
ChatGPT Agent
Released Jul 17, 2025
System card published Jul 17, 2025
Training and internal deployment: dates not reported
ChatGPT Agent performs similarly to previous systems on dangerous capability evals. It's most notable for being designated by OpenAI as having High capability in biology.
Since the model is High capability, OpenAI is supposed to implement misuse safeguards and security controls. For the first time, OpenAI's safeguards are load-bearing: OpenAI says the system is safe because of its safeguards rather than just because it lacks dangerous capabilities. The system card summarizes OpenAI's misuse safeguards and shows that they are at least moderately robust, but some claims are dubious and OpenAI doesn't publish many details. The system card has just one vacuous paragraph on security, and it doesn't explicitly say whether OpenAI thinks it is meeting its security standard for High capability models.
As usual, OpenAI does some good evals, but its interpretation of them is unclear: it doesn't explain how it thinks eval results translate to risk or what results it would find concerning.
OpenAI says the model has High capability in bio but not in cyber or AI R&D. OpenAI has done evals for scheming capabilities and misalignment propensity in the past but apparently didn't do so for ChatGPT Agent. OpenAI has never mentioned measuring "autonomy" capability, even though that seems to be the main way its misalignment safeguards are supposed to be triggered.
ChatGPT Agent evaluation categories
Click on a category to see my analysis and the relevant part of the company's eval report.
Chem & bio
OpenAI says the model might have dangerous capabilities in biology.
Cyber
OpenAI claims that the model doesn't have dangerous cyber capabilities. It doesn't really say why it believes that or what would change its mind. It does reasonable evals; it's not clear how to interpret the results.
AI R&D
OpenAI says the model isn't as helpful as a strong research assistant. The model is slightly weaker on the evals than o3 was. The model might be substantially underelicited, and it's not clear what would change OpenAI's mind.
Scheming capabilities
None — OpenAI doesn't say it tried to evaluate for scheming capabilities. Its previous models have had some such capabilities.
Misalignment propensity
None — OpenAI doesn't say it tried to evaluate misalignment propensity or check for blatant misalignment in high-stakes settings. Its previous models have had moderately concerning propensities.
ChatGPT Agent safety categories
This section is new and experimental. Click on a category to see my analysis. Preventing misuse via API (i.e., when the user doesn't control the model weights) is relatively straightforward, especially for bio; scheming risk prevention will eventually be important and improving it from current levels is tractable; security will eventually be important and we basically know what to do but it's very costly. I'm interested in feedback, especially on whether this content is helpful to you or what questions you wish I answered here instead.
Misuse (via API) prevention
OpenAI says the model might have High capability in bio (but not cyber). This means OpenAI is supposed to implement safeguards against bio misuse. OpenAI uses safety training plus a monitor, and it shows that these safeguards are moderately robust.
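To make the two-layer setup concrete, here is a minimal sketch of a safety-trained model wrapped by a separate output monitor. This is not OpenAI's implementation; every function name is a hypothetical stand-in, and the keyword check is only a placeholder for what would in practice be a trained classifier.

```python
# Hypothetical sketch of a "safety training + monitor" safeguard stack.
# None of these names correspond to OpenAI's actual system.

from dataclasses import dataclass

REFUSAL_MESSAGE = "I can't help with that."


@dataclass
class MonitorVerdict:
    flagged: bool
    reason: str = ""


def generate(prompt: str) -> str:
    """Placeholder for the safety-trained model itself (first layer of defense)."""
    return f"[model response to: {prompt}]"


def bio_monitor(prompt: str, response: str) -> MonitorVerdict:
    """Placeholder for a separate classifier that flags potential bio-misuse content.

    A real monitor would be a trained model scoring the exchange, not a keyword check.
    """
    suspicious = any(
        term in (prompt + " " + response).lower()
        for term in ("pathogen enhancement", "weaponize")
    )
    return MonitorVerdict(
        flagged=suspicious,
        reason="possible bio-misuse content" if suspicious else "",
    )


def answer(prompt: str) -> str:
    """Second layer of defense: run the monitor on the exchange and block flagged outputs."""
    response = generate(prompt)
    verdict = bio_monitor(prompt, response)
    if verdict.flagged:
        # A real deployment might also log the event, rate-limit, or escalate for review.
        return REFUSAL_MESSAGE
    return response


if __name__ == "__main__":
    print(answer("Summarize the history of vaccination."))
```

The point of the second layer is that even if safety training fails and the model produces disallowed content, a separate monitor can still block it before it reaches the user.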
Scheming risk prevention
OpenAI says the model doesn't have the relevant dangerous capabilities, so misalignment safeguards are unnecessary.
It's unclear which of OpenAI's capability thresholds trigger its misalignment standards. Misalignment standards are supposed to "sufficiently minimize the risk associated with a misaligned model circumventing human control and oversight and executing severe harms." OpenAI mentions some good paths to safety and some confused paths, so overall the standard is concerning.
Security
OpenAI says the model might have High capability in biology. This is supposed to trigger High security controls. The system card doesn't directly say whether OpenAI has implemented High security controls; it just says OpenAI has implemented "associated safeguards" or "safeguards consistent with High capability models." The system card has just one vacuous paragraph on OpenAI's security.