xAI
Company information
Company's most powerful model
Grok 3
Released Feb 19, 2025
Eval report: none
Training and internal deployment: dates not reported
xAI apparently did not run dangerous-capability evals on Grok 3.
Grok 3 safety case categories
This section is new and experimental. Click on a category to see my analysis. Briefly: preventing misuse via API is relatively straightforward, especially for bio; preventing scheming risk will eventually be important, and improving on current practice is tractable; security will eventually be important, and we basically know what to do, but it is very costly. I'm interested in feedback, especially on whether this content is helpful to you or what questions you wish I answered here instead.
Misuse (via API) prevention
xAI's draft Risk Management Framework said xAI planned to set eval thresholds and do evals in the future. Its planned response to dangerous capabilities was to implement mitigations until the model scored below the thresholds. Unfortunately, this is an invalid approach: a model that passes evals only because of mitigations is safe only insofar as those mitigations are robust to circumvention, and the Framework doesn't address robustness at all. xAI hasn't said anything about its current practices.
Scheming risk prevention
xAI's draft Risk Management Framework said xAI planned to evaluate "risk factors for loss of control." Unfortunately, I have no confidence in the metrics the Framework mentions, and it does not say what xAI would do if the evaluations showed concerning results. xAI doesn't seem to be thinking about risks from subtle scheming.
Security
Nothing: xAI has not said anything about its security practices.