Commitments
The main commitments companies have made on transparency about model evals and safety cases are the AI Seoul Summit commitments. These include a commitment to:
"Provide public transparency on the implementation of [commitments including assessing capabilities and risks and comparing to thresholds], except insofar as doing so would increase risk or divulge sensitive commercial information to a degree disproportionate to the societal benefit."
Other transparency commitments include:
- White House voluntary commitments (xAI has not signed):
  - "The companies commit to publicly reporting their AI systems’ capabilities," and in particular "to publish reports for all new significant model public releases" (excluding models no more powerful than GPT-4). These reports are to cover "the safety evaluations conducted," including evals for dangerous capabilities.
- Anthropic:
  - Responsible Scaling Policy: "Public disclosures: We will publicly release key information related to the evaluation and deployment of our models (not including sensitive details). These include summaries of related Capability and Safeguards reports when we deploy a model as well as plans for current and future comprehensive capability assessments and deployment and security safeguards."
  - Voluntary Commitments: "Model Cards: With each new model release, we publish a detailed model card or addendum. These cards provide information about model capabilities and performance across various benchmarks; known limitations and potential risks; results of safety evaluations and red teaming; information on model training; and more."
- OpenAI Preparedness Framework:
  - "Public disclosures: We will release information about our Preparedness Framework results in order to facilitate public awareness of the state of frontier AI capabilities for major deployments. This published information will include the scope of testing performed, capability evaluations for each Tracked Category, our reasoning for the deployment decision, and any other context about a model’s development or capabilities that was decisive in the decision to deploy. Additionally, if the model is beyond a High threshold, we will include information about safeguards we have implemented to sufficiently minimize the associated risks."
Companies have also nominally made commitments on evals and responses, both in their safety frameworks (though these are weak or vague) and in the White House voluntary commitments.