Analyzing AI companies' claims about their models' safety

There are two ways to show that an AI model is safe: show that it doesn't have dangerous capabilities, or show that it's safe even if it has dangerous capabilities. Currently, AI companies almost exclusively claim that their models don't have dangerous capabilities,[✲] on the basis of tests called model evals.

I think AI companies' evals are often poor. Frequently an eval shows that a model has somewhat concerning capabilities, yet the company interprets it as showing that the model is safe, without explaining why or saying what results would indicate danger. While I don't believe that AIs have catastrophically dangerous capabilities yet, I'm worried that companies' evals will still be bad in the future. If companies used the best existing evals and, crucially, followed best practices for running evals, reporting and interpreting results, and accountability, the situation would be much better.[✲]

Additionally, companies' plans for when their models do have dangerous capabilities are poor. Some companies are doing reasonable work on preventing catastrophic misuse, but their plans for security and for preventing risks from misalignment are essentially no more than "trust us."

I'm Zach Stein-Perlman. On this website, I collect and assess public information on five AI companies' model evals for dangerous capabilities, plus their claims and plans in three areas of safety. Click a logo for details on a specific company, or click directly below for an introduction to evals and this site.

Get updates

Get emails with analysis of companies' new reports

Companies

Anthropic
OpenAI
Google DeepMind
Meta
xAI

This website is a low-key research preview. It's up to date as of July 13.