OpenAI
Eval and safeguard planning
ChatGPT Agent
Released Jul 17, 2025
System card published Jul 17, 2025
Training and internal deployment: dates not reported
ChatGPT Agent performs similarly to previous systems on dangerous capability evals. It's most notable for being designated by OpenAI as having High capability in biology.
Since the model is High capability, OpenAI is supposed to implement misuse safeguards and security controls. For the first time, OpenAI's safeguards are load-bearing: OpenAI says the system is safe because of its safeguards rather than just because it lacks dangerous capabilities. The system card summarizes OpenAI's misuse safeguards and shows that they are at least moderately robust, but some claims are dubious and OpenAI doesn't publish many details. The system card has just one vacuous paragraph on security, and it doesn't explicitly say whether OpenAI thinks it is meeting its security standard for High capability models.
As usual, OpenAI does some good evals, but its interpretation of them is unclear: it doesn't explain how it thinks eval results translate to risk or what results it would find concerning.
OpenAI says the model has High capability in bio but not in cyber or AI R&D. OpenAI has done evals for scheming capabilities and misalignment propensity in the past but apparently didn't do so for ChatGPT Agent. OpenAI has never mentioned measuring "autonomy" capability, even though that seems to be the main way its misalignment safeguards are supposed to be triggered.
ChatGPT Agent evaluation categories
Click on a category to see my analysis and the relevant part of the company's eval report.
Chem & bio
OpenAI says the model might have dangerous capabilities in biology.
Cyber
OpenAI claims that the model doesn't have dangerous cyber capabilities. It doesn't really say why it believes that or what would change its mind. It does reasonable evals; it's not clear how to interpret the results.
AI R&D
OpenAI says the model isn't as helpful as a strong research assistant. The model is slightly weaker on the evals than o3 was. The model might be substantially underelicited, and it's not clear what would change OpenAI's mind.
Scheming capabilities
None — OpenAI doesn't say it tried to evaluate for scheming capabilities. Its previous models have had some such capabilities.
Misalignment propensity
None — OpenAI doesn't say it tried to evaluate misalignment propensity or check for blatant misalignment in high-stakes settings. Its previous models have had moderately concerning propensities.
ChatGPT Agent safety categories
This section is new and experimental. Click on a category to see my analysis. Preventing misuse via API (i.e., when the user doesn't control the model weights) is relatively straightforward, especially for bio; scheming risk prevention will eventually be important and improving it from current levels is tractable; security will eventually be important and we basically know what to do but it's very costly. I'm interested in feedback, especially on whether this content is helpful to you or what questions you wish I answered here instead.
Misuse (via API) prevention
OpenAI says the model might have High capability in bio (but not cyber). This means OpenAI is supposed to implement safeguards against bio misuse. OpenAI uses safety training plus a monitor, and it shows that these safeguards are moderately robust.
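To make the two-layer setup concrete, here is a minimal sketch of a safety-trained model wrapped by a separate output monitor. This is not OpenAI's implementation; every function name is a hypothetical stand-in, and the keyword check is only a placeholder for what would in practice be a trained classifier.

```python
# Hypothetical sketch of a "safety training + monitor" safeguard stack.
# None of these names correspond to OpenAI's actual system.

from dataclasses import dataclass

REFUSAL_MESSAGE = "I can't help with that."


@dataclass
class MonitorVerdict:
    flagged: bool
    reason: str = ""


def generate(prompt: str) -> str:
    """Placeholder for the safety-trained model itself (first layer of defense)."""
    return f"[model response to: {prompt}]"


def bio_monitor(prompt: str, response: str) -> MonitorVerdict:
    """Placeholder for a separate classifier that flags potential bio-misuse content.

    A real monitor would be a trained model scoring the exchange, not a keyword check.
    """
    suspicious = any(
        term in (prompt + " " + response).lower()
        for term in ("pathogen enhancement", "weaponize")
    )
    return MonitorVerdict(
        flagged=suspicious,
        reason="possible bio-misuse content" if suspicious else "",
    )


def answer(prompt: str) -> str:
    """Second layer of defense: run the monitor on the exchange and block flagged outputs."""
    response = generate(prompt)
    verdict = bio_monitor(prompt, response)
    if verdict.flagged:
        # A real deployment might also log the event, rate-limit, or escalate for review.
        return REFUSAL_MESSAGE
    return response


if __name__ == "__main__":
    print(answer("Summarize the history of vaccination."))
```

The point of the second layer is that even if safety training fails and the model produces disallowed content, a separate monitor can still block it before it reaches the user.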
Scheming risk prevention
OpenAI says the model doesn't have the relevant dangerous capabilities, so misalignment safeguards are unnecessary.
It's unclear which of OpenAI's capability thresholds trigger its misalignment standards. Misalignment standards are supposed to "sufficiently minimize the risk associated with a misaligned model circumventing human control and oversight and executing severe harms." OpenAI mentions some good paths to safety and some confused paths, so overall the standard is concerning.
Security
OpenAI says the model might have High capability in biology. This is supposed to trigger High security controls. The system card doesn't directly say whether OpenAI has implemented High security controls; it just says OpenAI has implemented "associated safeguards" or "safeguards consistent with High capability models." The system card has just one vacuous paragraph on OpenAI's security.