Related work

Evaluating AI companies' evals

Nothing systematic, but:

Best practices