xAI

Eval and safeguard planning

Grok 4

Released Jul 9, 2025

Model card published Aug 20, 2025 (and quietly changed later)

Training and internal deployment: dates not reported

xAI said "Grok 4 is the most intelligent model in the world" and that "Grok 4 represents a leap in frontier intelligence." Initially, it said nothing about safety, and its misuse safeguards were absurdly weak. A month later, it published a model card addressing safety. It said Grok 4 has dangerous chem/bio capabilities and that safeguards are necessary. It also said Grok 4 doesn't pose large misalignment risks; that happens to be true, but xAI's thinking on misalignment is profoundly confused and the metric it uses has nothing to do with misalignment risk.


Grok 4 evaluation categories

Click on a category to see my analysis and the relevant part of the company's eval report.

Chem & bio

xAI does some simple evals and says Grok 4 has dangerous capabilities.

Cyber

xAI does some simple evals and says Grok 4 lacks dangerous capabilities.

AI R&D

None

Scheming capabilities

None

Misalignment propensity

xAI does evals to measure (1) propensity to lie when pressured to do so by the system prompt or user and (2) sycophancy. These aren't really relevant to misalignment risk.

Grok 4 safety categories

This section is new and experimental. Click on a category to see my analysis. Preventing misuse via API (i.e., when the user doesn't control the model weights) is relatively straightforward, especially for bio; scheming risk prevention will eventually be important and improving it from current levels is tractable; security will eventually be important and we basically know what to do but it's very costly. I'm interested in feedback, especially on whether this content is helpful to you or what questions you wish I answered here instead.

Misuse (via API) prevention

xAI's plan is that if models have dangerous capabilities, it will ensure that the "answer rate [is] less than 1 out of 20 on restricted queries" (i.e., under 5%). It claims to be meeting this standard for chem/bio weapons development. Unfortunately, it doesn't say anything about red-teaming its mitigations, and they seem unlikely to be effective in the real world.
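
For concreteness, here is a minimal sketch of what checking that kind of threshold might look like, assuming a held-out set of restricted queries and some grader that decides whether a response actually answers the query. This is my own illustration in Python, not xAI's methodology; the is_answered grader and the example responses are hypothetical.

# Hypothetical answer-rate check against the stated "less than 1 out of 20" threshold.
# The restricted-query responses and the is_answered() grader are placeholders;
# xAI has not published its actual grading methodology.

ANSWER_RATE_THRESHOLD = 1 / 20  # i.e., under 5% of restricted queries answered

def answer_rate(responses, is_answered):
    """Fraction of responses that substantively answer the restricted query."""
    answered = sum(1 for r in responses if is_answered(r))
    return answered / len(responses)

def meets_threshold(responses, is_answered):
    return answer_rate(responses, is_answered) < ANSWER_RATE_THRESHOLD

# Toy example with a naive keyword grader; a real eval would need human or
# model grading and adversarial (jailbreak / red-team) prompts, which is the
# part xAI says nothing about.
fake_responses = ["I can't help with that."] * 19 + ["Step 1: ..."]
naive_grader = lambda r: not r.startswith("I can't")
print(answer_rate(fake_responses, naive_grader))      # 0.05
print(meets_threshold(fake_responses, naive_grader))  # False: 0.05 is not strictly below 1/20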

Scheming risk prevention

"Our risk acceptance criteria for system deployment is maintaining a dishonesty rate of less than 1 out of 2 on MASK. We plan to add additional thresholds tied to other benchmarks." Unfortunately, MASK has almost nothing to do with catastrophic misalignment risk, and upfront benchmarking is not a good approach to misalignment risk.

Security

xAI claims to have good security. This is not credible.