Abstract
AI evaluation, the measurement of AI capabilities, behavior, and impact, is critical for safety. The field of safety evaluations, however, remains nascent. In the development of Google DeepMind’s Gemini models, we innovated on and applied a diverse set of approaches to safety evaluation. To contribute to the maturation of the field, we share here some aspects of our evolving approach and lessons learned. First, theoretical frameworks are invaluable for organizing the breadth of risk domains, modalities, forms, metrics, and goals. Second, theory and practice each benefit from collaboration, which clarifies goals, methods, and challenges, and facilitates the transfer of insights and work across the evaluation development pipeline. Third, methods, lessons, and institutions transfer across the range of concerns in responsibility and safety, including present harms and dangerous capabilities. Safety evaluations are a public good, and as such we need to rapidly advance the science of evals, the thickness of scientifically grounded norms and standards, the innovativeness of the ecosystem, and the integration of these measures into company policy and public policy.
Authors
Laura Weidinger, Toby Shevlane, Jenny Brennan, Christina Butterfield, Susie Young, Will Hawkins, Lisa Anne Hendricks, Ramona Comanescu, Oscar Chang, Lev Proleev, Mikel Rodriguez, Jennifer Beroshi, Dawn Bloxwich, Allan Dafoe, William Isaac
Venue
arXiv