Abstract
To understand the risks of a new AI system, we must understand what it can and cannot do. To this end, we introduce a programme of new "dangerous capability" evaluations and pilot them on Gemini models. These evaluations cover five topics: (1) persuasion & deception; (2) cyber-security; (3) self-proliferation; (4) self-reasoning & self-modification; and (5) biological and nuclear risk. We do not find evidence of strong dangerous capabilities in the models we evaluated, but we flag early warning signs. Our goal is to help lay the groundwork for a rigorous science of dangerous capability evaluation, in preparation for future, more capable models.
Authors
Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Heidi Howard, Alex Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Maria Abi Raad, Tom Lieberum, Sebastian Farquhar, Grégoire Delétang, Anian Ruoss, Marcus Hutter, Rohin Shah, Allan Dafoe, Toby Shevlane, Sarah Hodkinson, Sharon Lin, Albert Webson, Lewis Ho, Seliem El-Sayed, Sasha Brown
Venue
arXiv