xAI

Company information

Company's most powerful model

Grok 3

Released Feb 19, 2025

Eval report: none

Training and internal deployment: dates not reported

xAI apparently did not run dangerous-capability evals on Grok 3.

Grok 3 safety case categories

This section is new and experimental; each category below has my analysis. In brief: preventing misuse via API is relatively straightforward, especially for bio; scheming risk prevention will eventually be important, and improving it from current levels is tractable; security will eventually be important, and we basically know what to do, but it is very costly. I'm interested in feedback, especially on whether this content is helpful to you or what questions you wish I answered here instead.

Misuse (via API) prevention

xAI's draft Risk Management Framework said that xAI planned to set eval thresholds and run evals in the future. Its planned response to dangerous capabilities was to implement mitigations until the model scored below the thresholds. Unfortunately, this is an invalid approach to mitigating dangerous capabilities on its own: the crucial question is whether the mitigations are robust (refusal training that blocks plain requests but is easily jailbroken does not actually prevent misuse), and the Framework doesn't address that. xAI hasn't said anything about its current practices.
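
To make the robustness point concrete, here is a toy sketch (all names, scores, and the threshold are hypothetical, not taken from xAI's Framework) contrasting a naive below-threshold check with one that evaluates the model under adversarial elicitation:

```python
# Toy illustration: "mitigate until the eval score is below the threshold"
# is not a sufficient safety argument, because the eval must also hold up
# under adversarial elicitation (e.g., jailbreaks). Everything here is
# hypothetical and for illustration only.

THRESHOLD = 0.2  # hypothetical dangerous-capability threshold


def run_eval(model: object, elicitation: str) -> float:
    """Stand-in for a dangerous-capability eval; returns a risk score in [0, 1]."""
    # Made-up scores: plain prompts hit the model's refusal training,
    # while adversarial prompts (jailbreaks) bypass it.
    scores = {"plain": 0.1, "adversarial": 0.7}
    return scores[elicitation]


mitigated_model = object()  # placeholder for a model with mitigations applied

# The Framework's implied check: pass if the ordinary eval score is below threshold.
naive_pass = run_eval(mitigated_model, "plain") < THRESHOLD

# The check that actually matters: the score must stay below threshold even
# under strong elicitation, i.e. the mitigations must be robust.
robust_pass = run_eval(mitigated_model, "adversarial") < THRESHOLD

print(f"naive check passes: {naive_pass}; robust check passes: {robust_pass}")
# -> naive check passes: True; robust check passes: False
```

The naive check passes while the robust check fails; that gap is exactly what the Framework leaves unaddressed.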

Scheming risk prevention

xAI's draft Risk Management Framework said it planned to evaluate for "risk factors for loss of control." Unfortunately, I have no confidence in the metrics mentioned in the Framework. Additionally, the Framework does not say what xAI would do if the evaluations showed concerning results. xAI doesn't seem to be thinking about risks from subtle scheming.

Security

xAI has said nothing about its security practices.