xAI
Eval and safeguard planning
Grok 4
Released July 9, 2025
Eval report: none
Training and internal deployment: dates not reported
xAI said "Grok 4 is the most intelligent model in the world"—moreover, "Grok 4 represents a leap in frontier intelligence"—but didn't publish anything about safety. In particular, it didn't release results for dangerous capability evals, or even say what kinds of evals it ran or whether it thinks it ruled out dangerous capabilities.
Grok 4 safety categories
This section is new and experimental. Click on a category to see my analysis. In brief: preventing misuse via API (i.e., when the user doesn't control the model weights) is relatively straightforward, especially for bio; scheming risk prevention will eventually be important, and improving it from current levels is tractable; security will eventually be important, and we basically know what to do, but it's very costly. I'm interested in feedback, especially on whether this content is helpful to you or what questions you wish I'd answered here instead.
Misuse (via API) prevention
xAI's draft Risk Management Framework said it planned to set eval thresholds and run evals in the future. Its planned response to dangerous capabilities was to implement mitigations to get the model below the thresholds. Unfortunately, this approach to mitigating dangerous capabilities is invalid on its own: the crucial question is whether the mitigations are robust (e.g., to jailbreaks), and the Framework doesn't address that. xAI hasn't said anything about its current practices.
Scheming risk prevention
xAI's draft Risk Management Framework said it planned to evaluate for "risk factors for loss of control." Unfortunately, I have no confidence in the metrics mentioned in the Framework. Additionally, the Framework does not mention what xAI should do if the evaluations show concerning results. xAI doesn't seem to be thinking about risks from subtle scheming.
Security
Nothing