AI Safety Summit: An update on our approach to safety and responsibility
In September, we were asked by the UK Secretary of State for the Department of Science, Innovation and Technology (DSIT) to share Google DeepMind’s approach to 7 areas of safety and responsibility for frontier AI. Our response is below. As the UK prepares to host the AI Safety Summit, we believe it’s important to have an inclusive conversation about AI safety practices, and how we can best build on them together. We hope this provides a useful snapshot of some of our priorities.
At Google DeepMind, we aim to build AI responsibly to benefit humanity. At the heart of this mission is our commitment to act as responsible pioneers in the field of AI, in service of society’s needs and expectations. We believe applying AI across all sorts of domains – including scientific disciplines, economic sectors, and to improve and develop new products and services – will unlock new levels of human progress. However, we need to develop and deploy this technology thoughtfully and responsibly — our mission is only achievable with the responsible development and deployment of AI systems.
We’ve already seen people use AI to address societal challenges, including by helping scientists better detect breast cancer, forecast floods, limit the warming effects of jet contrails, accelerate clean nuclear fusion, predict protein structures, and achieve healthcare breakthroughs. Vast potential remains to supercharge scientific research and economic productivity, tackle global challenges like climate change and co-create new approaches to perennial policy priorities like education. This is the opportunity that we must keep in mind as we work together to develop guardrails for the development and deployment of the technology.
In the past few years alone, we have seen significant advances in large, general-purpose models, often referred to as “foundation models,” as well as in applications in which those models are deployed, such as chatbots. We are currently anticipating a next generation of leading foundation models, which we refer to as “frontier” models. We hope that their capabilities will far exceed those of the foundation models we see today, along multiple dimensions. In general, we hope that they will be much more general purpose; span many modalities; and be able to engage in complex tasks that involve memory, planning, and the use of external tools (such as internet tools). These frontier models will present significant opportunities right across society; indeed, these new capabilities will help unlock many positive and potentially transformational applications of AI technology – from personal assistance to educational tutoring and scientific research support. Simultaneously, they may present evolving complexities and risks.
Responsible AI developers must therefore look ahead and anticipate possible future risks, both the potential for amplification of existing issues and novel safety risks. For instance, researchers have pointed out that it is plausible that future AI systems could conduct offensive cyber operations, deceive people through dialogue, manipulate people into carrying out harmful actions, develop weapons (e.g. biological, chemical), fine-tune and operate other high-risk AI systems, or assist people with any of these tasks. People with malicious intentions accessing such models could also misuse their capabilities. Alternatively, due to failures of alignment, these AI models might take harmful actions without anyone intending them to.
For years, we have been building the responsibility and governance infrastructure to get ahead of these risks – from conducting foresight research to developing tools and frameworks and designing organisational practices. All of our AI development is guided by a rigorous, end-to-end process to identify, assess, and manage the potential opportunities and risks. Many of the questions in your letter refer to processes that are already embedded across our entire organisation and reflected in our day-to-day work. Other questions refer to practices that we are currently building or exploring – as they are most relevant not to the class of models that have been deployed to date, but to future generations of models that we expect to see across society. We are setting up this infrastructure for the future at pace, and we expect to make progress in the months and years ahead.
We already have a meaningful baseline for collective action across the industry. Alongside other AI labs, in July of 2023 we committed publicly with the US government to a set of measures to ensure we develop AI in a way that is safe, secure, and trustworthy. We need to continue advancing the state of the art in the safety practices involved; broaden international recognition of the importance of these principles; have more AI developers make these commitments as their systems approach the “frontier;” and identify additional practices that should become standards of responsibility. The upcoming Safety Summit presents a valuable opportunity to garner international support for these priorities.
Many of the open questions are ones that we can answer only through collaborative dialogue. To help drive this discussion forward, we recently announced the formation of the Frontier Model Forum, an industry body focused on ensuring safe and responsible development of frontier AI models. We launched this forum with Anthropic, Microsoft, and OpenAI, and we expect more AI companies to join in the future. The Forum will focus on three key areas: identifying best practices for the responsible development and deployment of frontier models; advancing AI safety research to promote responsible development of frontier models, minimise risks, and enable independent, standardised evaluations of capabilities and safety; and facilitating information sharing among companies and governments. The Forum will also support efforts to develop applications that can help meet society’s greatest challenges, such as climate change mitigation and adaptation, early cancer detection and prevention, and combating cyber threats.
Advancing these best practices and evaluations is key to getting safety right, and an ecosystem of assurance and certification will need to develop around this to ensure users and the public can feel confident in the robustness of frontier models. We are supportive of efforts by governments to kickstart the development of a mature assurance ecosystem around AI.
We are also addressing these questions with other organisations that bring together industry and civil society. We need a broad range of perspectives to have a seat at the table to identify opportunities and harms, as well as how to address them. We are proud co-founders of the Partnership on AI (PAI), and are currently collaborating with a broad set of PAI partner organisations on efforts closely related to your questions. In addition, this year we partnered with civil society organisations to co-convene ten multidisciplinary roundtable discussions on the risks and opportunities presented by the deployment of AI in key sectors. Roundtable participants included stakeholders from civil society, academia, advocacy groups, government and industry, including startups, and the insights add valuable and sometimes overlooked perspectives to these debates.
By establishing the Frontier AI Taskforce, the UK is laying strong foundations for tackling these challenges and designing solutions in partnership with industry. We detail below priority areas where we are collaborating with the Taskforce, including on structured model access for selected researchers and advancing the practice of evaluations. Considered alongside the approach outlined in the White Paper on AI Regulation, we believe these efforts can inform proportionate and risk-based AI governance frameworks in the UK. We appreciate the efforts of the Taskforce to develop deep expertise as the regulatory community balances the important goal of addressing potential risks of frontier models with ensuring innovation is not unnecessarily stifled.
We believe effective AI safety and responsibility policies need to be multi-layered, in reflection of the layers of the technology stack that make up an AI system. While most of our content in this letter focuses on safeguards we apply at the level of our models, we also offer some thoughts in the context of products and applications in which those models are deployed, in cases where we think those offer more appropriate answers to your questions.
The UK Safety Summit offers a critical opportunity to focus the international conversation on frontier AI governance and gather perspectives from a broad range of stakeholders across society. Due to the global nature of AI technology, we need agreement on common principles and priorities for responsible development and deployment, to guide countries to develop interoperable governance approaches and collaborate toward broad and equitable distribution of the benefits of AI. We look forward to continuing this discussion with you and with the broader international community.
- Responsible Capabilities Scaling
- Model Evaluations and Red-Teaming
- Model Reporting and Information Sharing
- Reporting Structure for Vulnerabilities Found after Model Release and Post-Deployment Monitoring for Patterns of Misuse
- Security Controls including Securing Model Weights
- Identifiers of AI-Generated Material
- Priority Research and Investment on Societal, Safety and Security Risks
- Data Input Controls and Audit
- AI for Good
Responsible Capabilities Scaling
Google believes it is imperative to take a responsible approach to AI. To this end, Google’s AI Principles, introduced in 2018, guide product development and help us assess every AI application. Pursuant to these principles, we assess our AI applications in view of the following objectives:
1. Be socially beneficial. 2. Avoid creating or reinforcing unfair bias. 3. Be built and tested for safety. 4. Be accountable to people. 5. Incorporate privacy design principles. 6. Uphold high standards of scientific excellence. 7. Be made available for uses that accord with these principles.
In addition, we will not design or deploy AI in the following areas:
1. Technologies that cause or are likely to cause overall harm. Where there is a material risk of harm, we will proceed only where we believe that the benefits substantially outweigh the risks, and will incorporate appropriate safety constraints. 2. Weapons or other technologies whose principal purpose or implementation is to cause or directly facilitate injury to people. 3. Technologies that gather or use information for surveillance violating internationally accepted norms. 4. Technologies whose purpose contravenes widely accepted principles of international law and human rights.
To put our principles into practice, a central team is dedicated to ethical reviews of new AI and advanced technologies before launch, working with internal domain experts in machine-learning fairness, security, privacy, human rights, the social sciences, and, for cultural context, Google’s employee resource groups. We also share our learnings and responsible AI practices, including public progress reports (such as the 2022 report) on our efforts to implement our AI Principles.
Our AI Principles guide which projects Google DeepMind will and will not pursue and describe commitments that we pledge to uphold — including to develop and apply strong safety and security practices.
In addition to the AI Principles, Google DeepMind has a mature end-to-end process for identifying, assessing and managing ethics and safety risks posed by our models. This extends into a similar end-to-end process for Google applications that draw on these models. We have iterated on this process for several years and will continue to do so in response to the capability development of frontier models. Key components are as follows:
- Google has established the Responsible AI Council, with senior executive representation, to assess novel and high-risk issues in model development and deployment, and is integrating the evaluation and quality assurance of new models into existing Trust & Safety and Enterprise Risk Management protocols.
- Google DeepMind has a dedicated Responsible Development and Innovation team that works with researchers and practitioners across our entire organisation to understand if a model they propose to develop – or to update – poses ethics and safety risks, and to ensure that appropriate safety mitigations and other ethics considerations are incorporated into model and product development.
- If a model may pose risks, this team works with the teams developing the model to carry out a standardised Ethics and Safety Assessment, which is reviewed and updated at various stages of the model’s development. The assessment identifies and describes specific risks and benefits posed by the model, including analysis of how they may manifest, and any intended accompanying mitigations. Many assessments are supported by dedicated ethics and safety research (see Section 7), and external expert consultations – for example, to better understand risks that have been documented in similar third-party models, as we did around the release of the AlphaFold Database, or to better analyse more systemic potential risks that are not well suited to the types of evaluations we describe below.
- Leading models are also subject to a suite of evaluations. The exact evaluations depend on the model in question (see Section 2, below). Our evaluations for large language models currently deployed focus on content safety and fairness and inclusion, and we are also developing evaluations for dangerous capabilities (in domains such as biosecurity, persuasion and manipulation, cybersecurity, and autonomous replication and adaptation). We will continue to scope and design new evaluations, including via ethics and safety research, and via collaborations with third parties.
- Our dedicated internal governance body, the Responsibility and Safety Council (RSC), is tasked with helping to uphold our AI Principles and has been holding regular review meetings for more than five years. The RSC is made up of a rotating set of Google DeepMind senior leaders and is designed to be representative across teams and functions. Drawing on the Ethics and Safety Assessment and our suite of evaluations, the RSC makes recommendations about whether to proceed with the further development or deployment of a model, and/or about the safety and ethics stipulations under which a project should continue.
- In parallel with sessions of the Responsibility and Safety Council, teams developing models will work with partners across Google DeepMind to put in place various technical and policy mitigations. In some instances, the RSC can also act as a catalyst for developing novel mitigations, such as our recently-launched watermarking tool, SynthID (see Section 7, below).
- Multiple other teams contribute to the processes outlined above, including dedicated teams working in areas like ethics, sociotechnical, robustness, alignment, governance research, and more.
To promote appropriate transparency about this process we publish regular reports on our AI risk management approaches, covering topics such as the development and use of model evaluations, safety mitigations, and risk domain analyses.
We are continually building on this established process, and are currently preparing our risk management for future generations of frontier models, through some of the methods described below. While it is unlikely these methods will be needed for today’s models, we believe advance preparation to mitigate future potential risks is important, and that we should start building this infrastructure now. At the same time, designing these frameworks presents significant, open scientific challenges: there is a lot of work to do on the basic research into how we can best design evaluations and other tools to map and mitigate risks, as described in more detail in section 2, below. Several components we are actively exploring include:
- Comprehensive dangerous capability evaluations: A step jump in capabilities, or the emergence of new types of capabilities, could be a key driver of large-scale risks from frontier models. As such, understanding a model’s capability profile via an informative set of evaluations is key to appropriately determining mitigations. We have built and will maintain a world-class team developing such evaluations, and we have been working with external experts to make sure evaluations have adequate coverage (see Section 2). As mentioned above, our current priority is developing evaluations for biosecurity, persuasion and manipulation, cybersecurity, and autonomous replication and adaptation.
- Broadened mitigations: The mitigations required to contain risks from future frontier models may differ from those for today’s models. Google already has extensive cybersecurity protocols to draw on, and has articulated a Secure AI Framework (SAIF), and we are exploring their expansion for frontier models (see Section 6). For example, we are building upon our existing robust information security measures to harden guardrails against potential exfiltration of model weights that pose significant misuse risks. Other possible mitigations that future frontier models may require include post-deployment monitoring protocols (see Section 5) and restrictions on internal use or augmentation.
- Transparency: Frontier model safety is an issue of broad social relevance, and an important open research question. Google has been an industry leader in transparency, creating many of the documentation templates that are widely used to document AI systems today such as model cards and data sheets; and publishing our approach to responsibility regarding our leading systems (e.g. AlphaFold, PaLM-2). We are working to evolve our disclosure practices for frontier models (see Section 3), including via the newly created Frontier Model Forum described in detail above, to continue to provide appropriate external awareness of the safety of our frontier model activities.
- Frameworks for frontier model end-to-end safety: End-to-end risk management processes for frontier models need to account for novel and emergent capabilities, models possibly posing risks prior to deployment, and such risks being of potentially greater scale. To prepare for future frontier models, a framework is required that establishes processes for understanding model capability profiles and the risks they pose; for assigning appropriate mitigations based on a model’s capability profile and risk analysis (including, for example, guidelines mapping evaluation results to recommended mitigations); and for ensuring both are designed and sequenced in such a way that possible risks throughout the model’s life-cycle are adequately mitigated.
As we work to make progress on these open research problems, our cross-functional team exploring end-to-end safety frameworks for frontier models is guided by the following considerations:
- Proportionality: frontier models should undergo safety mitigations proportionate to their potential risks.
- Informed judgement: decisions about safety mitigations should be informed by model evaluations, red-teaming, expert judgement, and other relevant evidence. Where there is little evidence and high uncertainty, we are restrained and cautious in our approach to model development and deployment.
- Grounded in domain expertise: a “risk domain” is an area in which a future model may be capable of causing significant harm. Working with domain experts, we aim to design our model evaluations to be maximally informative about the likelihood of models causing harm in such domains, and establish guidelines for assignment of mitigations based on evaluation results, to ensure that our practice meets our AI Principles.
- Information security: often, strong information security is a critical mitigation for a model that can be significantly misused, to ensure models with dangerous capabilities do not irreversibly proliferate.
One approach for implementing these principles that we are exploring is to operationalize proportionality by establishing a spectrum of categories of potential risk for different models, with recommended mitigations for each category. The task of the risk assessment process would be to 1) assign and adjust the categorisation for each model, and corresponding mitigations, according to an assessment of what mitigations are required to achieve adequate safety; and 2) elicit information to reduce uncertainty about the risks a model poses where necessary. The process would function across the model’s life cycle as follows:
- Before training: frontier models are assigned an initial categorisation by comparing their projected performance to that of similar models that have already gone through the process, and allowing for some margin for error in projected risk.
- During training: the performance of the model is monitored to ensure it is not significantly exceeding its predicted performance. Models in certain categories may have mitigations applied during training.
- After training: post-training mitigations appropriate to the initial category assignment are applied. To relax mitigations, models can be submitted to an expert committee for review. The expert committee draws on risk evaluations, red-teaming, guidelines from the risk domain analysis, and other appropriate evidence to make adjustments to the categorisation, if appropriate.
The categories we are considering span from those containing low-risk models, in which our current processes already operate, to those containing models that may pose greater risk, where additional mitigations may be recommended, such as subjecting models to strict information security, tracking deployments of the model and its derivatives in a way that enables rollbacks, requiring internal review before further augmentation, and disclosures to appropriate stakeholders.
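As an illustration only, the life-cycle process described above could be modelled along the following lines. This is a minimal sketch under stated assumptions, not a description of any framework actually in use: the three category names, the 10% margin and tolerance, the single numeric "score" standing in for projected performance, the mitigation lists, and every function and class name are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class RiskCategory(IntEnum):
    """Hypothetical spectrum of risk categories (names are assumptions)."""
    LOW = 0
    ELEVATED = 1
    HIGH = 2


# Hypothetical mapping from category to recommended mitigations,
# loosely based on the examples named in the text.
MITIGATIONS = {
    RiskCategory.LOW: ["existing review process"],
    RiskCategory.ELEVATED: [
        "existing review process",
        "deployment tracking that enables rollbacks",
    ],
    RiskCategory.HIGH: [
        "existing review process",
        "deployment tracking that enables rollbacks",
        "strict information security",
        "internal review before further augmentation",
        "disclosures to appropriate stakeholders",
    ],
}


@dataclass
class ReferenceModel:
    """A model that has already been through the process."""
    score: float            # assumed single performance measure
    category: RiskCategory  # category it ended up in


def initial_category(projected_score, references, margin=0.10):
    """Before training: categorise by comparison with assessed models.

    The projected score is inflated by `margin` to allow for error.
    If the padded projection exceeds every assessed model (or there
    are none), the most cautious category is assigned.
    """
    padded = projected_score * (1 + margin)
    strongest = max((r.score for r in references), default=float("-inf"))
    if padded > strongest:
        return RiskCategory.HIGH
    comparable = [r.category for r in references if r.score <= padded]
    return max(comparable, default=RiskCategory.LOW)


def exceeds_prediction(observed_score, projected_score, tolerance=0.10):
    """During training: flag a model that significantly beats its projection."""
    return observed_score > projected_score * (1 + tolerance)


def committee_review(current, proposed, evidence_sufficient):
    """After training: adjust the category following expert review.

    Assumption: tightening is always accepted, while relaxing requires
    the committee to judge the evidence sufficient.
    """
    if proposed > current:
        return proposed
    return proposed if evidence_sufficient else current
```

In this sketch, a projected score of 65 measured against reference models at 50 (LOW), 70 (ELEVATED) and 90 (HIGH) is padded to 71.5 and so starts in the ELEVATED category, even though the unpadded projection sits below the ELEVATED reference. The design choice being illustrated is that uncertainty pushes a model towards the more cautious category, and that relaxing a categorisation requires evidence while tightening does not.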
Our exploratory work on responsible capabilities scaling is in a continuous state of evolution and iteration. It is critical that industry, government, scientists, and civil society work together to explore and develop these practices. As these potential risks are currently not well understood, policies need flexibility to evolve in light of new evidence. Ultimately, we believe that frontier developers should adhere to a shared risk management process, underscoring the importance of forging broad consensus.
Model Evaluations and Red-Teaming
Evaluation is one of the main tools we have for risk assessment. Evaluations encompass a range of practices, including assessing a model's performance on tasks, which helps us identify 'capabilities' – and also assessing impacts, including benefits and harms, post-deployment. Evaluations can serve several important functions by providing benchmark safety metrics; helping us understand capabilities and risks of models; informing responsible decisions about deployment; helping to build shared understanding about risks across the AI community; and providing common measures for assessing mitigations. There are a variety of approaches to evaluation, including benchmarking or assessing an AI model’s capabilities; user evaluations and automated testing; and impact assessments. AI researchers already use a range of evaluations to identify unwanted behaviours in AI systems, such as AI systems making misleading statements, biased decisions, or repeating content from the training dataset.
Our process for assessing our models involves subjecting them to a relevant set of evaluations at various checkpoints throughout the development life-cycle. The set of evaluations will differ by model type. Our evaluations are most comprehensive for model modalities (such as text-to-text and text-to-image) and model harms (such as child safety, and bias and discrimination) that are relatively well-codified and studied. We have also developed evaluations for other priority areas (e.g. generation of dangerous content), and are exploring many new types of evaluations, for example for dangerous capabilities (see below).
We use ‘red teaming’ to refer to a specific type of evaluation that involves stepping into the role of an adversary and executing simulated attacks against targets, as well as more open-ended exploration of a system that does not involve specific harmful goals. We have dedicated internal AI Red Teams that interrogate our leading models, and the products that draw on them. These teams draw on the rich experience that Google’s broader Red Team has accumulated over the past decade, and supplement it with AI expertise. These AI Red Teams are a key component of Google’s Secure AI Framework (SAIF) - see Part 6. As we outlined in a recent report, our AI Red Team leverages attackers' tactics, techniques and procedures (TTPs) to test a range of system defences.
As a risk mitigation tool, evaluations and red-teaming exercises are only useful to the extent that they have a clear audience and purpose. When it comes to actioning evaluations, we consider three categories of users and goals. First, the teams developing our leading models use ethics and safety evaluations for iterative improvement. Second, our dedicated internal teams use held-out evaluations and red-teaming exercises from an ‘assurance’ perspective, to support our internal governance bodies and to provide information to third parties, for example via model cards (see Section 3). Third, we work with external evaluators and red-teamers, outside Google, to widen the scope of what we can test for, and to ensure independent evaluation.
As the AI community builds and deploys increasingly powerful systems, we recognize the need shared across the industry to urgently advance the field of ethics and safety evaluations. We have been working to do so for years, and it is one of the key commitments we made to the White House earlier this year. In the future, as models develop, we will need more robust tests for established social and ethical risks such as bias and privacy in addition to areas that will be increasingly important as AI becomes more widely adopted, such as factuality, misinformation, anthropomorphism, and information hazards. We also believe that for organisations developing frontier models, the portfolio of evaluations should include evaluation for extreme risks, as we laid out in a recent publication.
These evaluations should inform appropriate risk assessments throughout the life cycle of model development and deployment. We are currently scoping new dangerous capabilities evaluations in specific domains (biosecurity, autonomous replication and adaptation, cybersecurity, and persuasion and manipulation) that may be appropriate for our future model launches, as well as how to execute them safely and responsibly.
At the same time, we recognize that the evaluation challenge is much harder than it looks. Standardised best practices and benchmarks for safety evaluations are not yet established, but are a growing area of discussion and research across industry, academia, governments, and civil society. It should also be acknowledged that evaluations themselves are an indicator as to how a model is performing, but should not be relied on as a failsafe. Establishing satisfactory evaluation methods is a very challenging problem for various reasons:
- As outlined above, there is the potential for rapid increase in capabilities of AI systems at the frontier.
- Defining concrete tests for complex constructs (such as bias) can be difficult because there are many possible ways to test for these.
- Determining thresholds of acceptable behaviour (e.g. what level of performance a model has to reach on a benchmark before it is “safe enough” or “good enough” to be released) is a normative decision that needs to be taken in a legitimate way.
- Determining thresholds for unacceptable behaviour – which types of knowledge are considered too risky for a model to output – is also a normative decision.
- A lack of established procedures and structure for evaluations in this area means evaluations can be susceptible to a form of Goodhart’s law: when a measure becomes a target, it ceases to be a good measure.
- There is currently a lack of standardisation for evaluation across current models, given differences in features such as context length and retrieval capabilities.
- There are practical and legal obstacles to undertaking some types of evaluations in real-world settings - such as in a wet lab or with customers using a live model.
More progress – both technical and institutional – is needed, requiring collective effort across industry, academia, governments, and civil society. The scope of potential evaluations is huge, and we need to ensure that time and attention are allocated efficiently. We believe that leading AI labs should help develop the evaluations ecosystem – including by developing the expertise of policymakers, and by building capacity among third parties for conducting independent evaluations. With that in mind, we are involved in various AI community efforts to standardise what evaluations are needed and to share best practices, including via the Frontier Model Forum, and are providing structured access to one of our leading models to gain diverse perspectives for testing and evaluation. We believe bringing in these perspectives can help us improve our models, complement our internal responsibility approaches, and support the further development of evaluation methodology in the industry.
Model Reporting and Information Sharing
We agree that sharing information about the capabilities, opportunities, and potential risks from AI models is critical to enable their responsible use. AI is first and foremost a field of research, and Google and Google DeepMind collectively accounted for the largest share - by some distance - of the top 100 most cited AI papers in 2022. Jumper et al. (2021), the AlphaFold paper, has been cited more than 14,000 times and is the 4th most cited paper in both AI and biology. We routinely participate in leading AI conferences, such as NeurIPS, in order to share knowledge that enables others to build responsibly. We see this shared understanding and common frameworks built on decades of research as prerequisites to making AI safe.
Our researchers helped create and advance many of the documentation templates that are increasingly used by the AI community to document AI systems today, such as model cards and data sheets. We have also published model cards as part of broader technical papers for many of our leading models, such as for PaLM and Gopher, and we have published in-depth pieces on how we approached responsibility for some of our leading systems (e.g. AlphaFold, PaLM-2). We are working to evolve our disclosure practices for upcoming frontier models, such as identifying additional information that may be useful to include in model cards, to continue to provide appropriate external awareness of the safety of our frontier model activities. It’s important to note that what type of information should be shared with what audience is an active discussion in the AI community. For instance, there is information that may be more useful and appropriate to share internally, or with product developers. We expect to see iteration in model card formats, such as distinct templates for internal vs. public uses, or more modular model cards.
We share a number of additional resources to help downstream developers develop and operate products safely and responsibly. The versatility of models makes them extremely useful to an enormously broad set of developers, but it also makes it difficult to predict exactly what kinds of unintended or unforeseen outputs they might produce in a particular use case. Given these complexities and risks, our model APIs are designed in alignment with our AI Principles. At the same time, it is important for developers to understand and test their own systems that leverage these capabilities to ensure safe and responsible deployment, and we help them to do so in a number of ways:
- We have developed a framework of Responsible AI practices based on our own research and experience, which helps others in the AI community design and operate AI-powered products safely and responsibly. Recommended practices include using a human-centred design approach; understanding the limitations of one’s dataset and models; conducting repeated testing; and more.
- We make available to customers comprehensive information and documentation for our foundation models and other AI tools offered to third-party developers under Google Cloud (via Vertex AI), including specific information on how to deploy AI applications in a safe and compliant manner, and limitations that developers should be aware of.
We also actively disseminate information about the performance of our AI systems, and associated best practices, at collaborative fora in the AI community and with policymakers. These include the Partnership on AI; the Global Partnership on AI; the Organization for Economic Cooperation and Development (OECD); and the U.S. President’s Council of Advisors on Science and Technology (PCAST). We have also collaborated with the U.S. National Institute of Standards and Technology (NIST) as it developed its AI Risk Management Framework.
Reporting Structure for Vulnerabilities Found after Model Release and Post-Deployment Monitoring for Patterns of Misuse
We see post-deployment monitoring and reporting for vulnerabilities and misuse as being closely connected, and therefore address your questions on both areas, as well as describe our robust policies, in this section.
Monitoring and Reporting. We believe in the need for a comprehensive approach to discover and understand models’ capabilities, and ultimately their impact. Measures such as red-teaming and evaluation, described in more detail above, need to be applied both before and after release. This is primarily because the impacts of AI systems are best understood in a given context for their use, which correspondingly determines the most appropriate mitigations. For instance, the usefulness of large language models is partly driven by the fact that users interact with them using natural language via a ‘prompt’. This makes them well suited to widespread use and to integration into multiple products and services, which means that vulnerabilities must be addressed both at the level of the model and at that of the product. Another factor to consider is that the most advanced AI models may demonstrate ‘emergent capabilities’, which are capabilities that are “not present in smaller models but [are] present in larger models.” These capabilities are often not explicitly intended or easily predictable by developers prior to training, and they are sometimes discovered much later, even after the model is widely available. It is therefore important to have a broad variety of processes to report and remedy vulnerabilities that are discovered in the course of a model’s use.
One key pillar is to empower external users to detect and disclose vulnerabilities. “Bug bounty” programs, which incentivize the reporting by external actors of vulnerabilities in software systems through the use of cash prizes and other rewards, can help AI developers to surface previously unknown safety and security issues. This could include cases where the AI system exhibits misaligned or biassed behaviour; the AI system assists the user to perform a highly dangerous task (e.g. assemble a bioweapon); new jailbreak prompts; or security vulnerabilities that undermine user data privacy. Google was one of the first Internet companies to launch a vulnerability rewards program, and has a long history of working with security experts to identify, report and address bugs across the full range of our services. As the program evolved, its “responsible disclosure” practices became a sector norm that most major Internet companies and many government agencies follow. The system’s adaptability has been crucial as the nature of security risks has evolved over time - something that is equally important as security practices in AI share many of the same dynamic properties. In line with the voluntary commitments made as part of the White House process (Commitment 4: Incent Third-Party Discovery and Reporting of Vulnerabilities), we have expanded the scope of our Bug Hunter Program (including Vulnerability Rewards Program) as part of our Secure AI Framework, to reward and incentivize anyone to identify and report vulnerabilities in our AI systems. Where appropriate, Google issues notifications about the vulnerabilities of its services according to the Common Vulnerabilities and Exposures system.
As part of our commitment to engage external stakeholders in red-teaming of our models, this summer we participated in the DEFCON convention’s first public generative AI red team challenge, alongside other leading AI developers. As part of this event, hackers completed a series of challenges to get a model to produce an undesirable response, such as producing prompt injections, identifying security vulnerabilities, or generating misinformation or harmful information. Coupled with our internal red-teaming work, enabling this type of external adversarial testing will be key to staying ahead of an evolving threat landscape.
We also recognise the importance of proactive post-deployment monitoring and reporting on the performance of our systems. Google has dedicated teams and processes for monitoring publicly posted content on social media platforms, news publications, blogs, trade publications, and newsletters, with the aim of collecting, detecting, and triaging signals of emerging threats to our systems. We have also been exploring various other ways to track the impact that our flagship models are having across society. For instance, in addition to having our AlphaFold protein structure prediction model undergo a robust prerelease process, we have been tracking its use by researchers to understand how it is being used in downstream scientific work. Our assessment indicates that over a million unique users have accessed the AlphaFold Database, potentially saving the research world trillions of dollars by reducing the need for slow and expensive experiments.
Many of the potential risks associated with frontier AI models are common across industry, and a collaborative approach to identifying, addressing and sharing insights and best practices is required. One useful practice is for AI developers to maintain or contribute to a public database of issues that users encounter with deployed AI systems. This helps direct attention to issues that should be fixed, and to novel ones that may arise. Google is a founding member of the Partnership on AI, which is one of the sponsors of the AI Incidents Database. The Database currently contains records of more than 1,000 reported incidents spanning multiple taxonomies of harm.
Conversations with civil society leaders, policy actors, and AI researchers around the world have also highlighted the need for additional processes by which frontier AI labs can share information related to the discovery of vulnerabilities or dangerous capabilities within frontier AI models, and their associated mitigations. In response, Frontier Model Forum (FMF) members have begun to scope a possible Responsible Disclosure Process. As an example of how this new process could work: based on work with domain experts, some FMF companies have already discovered capabilities, trends, and mitigations in an area of security concern, and are exploring ways that the associated research can serve as a relevant case study.
This year, Google publicly launched Bard, our experimental conversational AI service. We made a deliberate effort to introduce Bard in a thoughtful and incremental way. Bard is designed as an interface to a Large Language Model (LLM) that enables users to collaborate with generative AI. Similar to our other services, Bard uses a combination of machine learning and human review to enforce our generative AI policies.
While large language models (LLMs) are an exciting technology, they are not without their faults. Like all LLMs, Bard can sometimes generate responses that contain inaccurate, misleading, or otherwise offensive information presented confidently and convincingly. We give users with access to Bard a way to help us identify these types of responses. If a user sees inaccurate, offensive, unsafe, or otherwise problematic information, they can select the “thumbs down” symbol and provide additional feedback. To alert users to the possibility of these types of responses, we include a warning message to all Bard users directly below the input bar that states: “Bard may display inaccurate or offensive information that doesn’t represent Google’s views.” Additionally, Bard has a “Google it” button to make it easy for users to check its responses or explore sources on the web. When a user clicks “Google it,” Bard provides suggestions for Google Search queries. Clicking on a query opens Google Search in a new tab, where the user can check Bard’s responses or research further. Bard also highlights statements in its responses in different colours depending on whether Google Search finds content on the web that’s likely similar to or likely different from Bard’s statements.
Along with the broader AI community, we continue to identify opportunities where LLMs are useful and helpful, while also focusing on where we need to make improvements. Bard is experimental, and we are actively adding to Bard’s capabilities through ongoing research, testing, and user feedback as we continue to improve the technology. Our testing, evaluation, verification, and validation processes draw on best practices from software testing and quality engineering to make sure the AI system is working as intended and can be trusted. To further improve Bard, we use a technique called Reinforcement Learning from Human Feedback (RLHF), which improves LLMs based on human preference feedback.
We have tested and continue to test Bard rigorously, but we know that users will find unique and complex ways to stress test it further. This is an important part of refining Bard, especially in these early days, and we actively update our safety protocols and methods to prevent Bard from outputting problematic or sensitive information. While we have sought to address and reduce risks proactively, Bard—like all LLM-based experiences—will still make mistakes. That is one of the reasons why we partner with numerous stakeholders in government and the private sector to address these risks.
Security Controls including Securing Model Weights
Keeping the most advanced AI models and systems secure is a cornerstone of responsible frontier AI model and systems development. We expect that as frontier models become more powerful, and are recognized as such, we will see increased attempts to disrupt, degrade, deceive, and steal them.
This is why we are building on our industry-leading general and infrastructure security approach. Our models are developed, trained, and stored within Google’s infrastructure, supported by central security teams and by a security, safety and reliability organisation consisting of engineers and researchers with world-class expertise. We were the first to introduce zero-trust architecture and software security best practices like fuzzing at scale, and we have built global processes, controls, and systems to ensure that all development (including AI/ML) has the strongest security and privacy guarantees. Our Detection & Response team provides a follow-the-sun model for 24/7/365 monitoring of all Google products, services and infrastructure, with a dedicated team for insider threat and abuse. We also have several red teams that conduct assessments of our products, services, and infrastructure for safety, security, and privacy failures. Staff undergo training on general and machine learning-specific safety and security practices, as well as on our AI Principles and Responsible AI Practices. We have a set of internal policies and guidelines to help ensure we are aligned on best practices for safety, security, and privacy across the development of AI/ML.
Our security teams include leading experts in information security, application security, cryptography, and network security. These teams maintain our security systems, develop security review processes, build security infrastructure, and implement our security policies. They assess potential security threats using commercial and custom-developed tools, as well as conducting penetration tests and quality assurance and security reviews.
Members of the security teams review security plans for our networks and services and provide project-specific consulting services to our product and engineering teams. They monitor for suspicious activity on our networks and address information security threats as needed. The teams also perform routine security evaluations and audits, which can involve engaging outside experts to conduct regular security assessments.
General software vulnerabilities identified are tracked centrally through the same process described here. The responsibility for remediation lies with the team with ownership of the affected system or code; timelines and outcomes of remediations are tracked centrally. For vulnerabilities and risks of models that are outside of software code, for example access control weaknesses or training data membership inference risks, risk mitigation measures are proposed and designed by security & reliability engineers with machine learning expertise.
We also draw on security red-teaming and engage the broader security research community through Google’s bug bounty programs, mentioned above. Participants in the Vulnerability Rewards Program include well-known machine-learning security experts, who regularly test and disclose machine learning-related security findings to us, and are paid financial rewards based on the assessed severities of their findings. These participants carry out testing using their own resources through regular, external-facing user access to Google systems.
In parallel with implementing these security best practices, we share learnings with the broader industry. Recently, Google introduced the Secure AI Framework (SAIF), a conceptual approach to security for AI systems in the public and private sectors. It is a first step towards making sure that responsible actors safeguard the technology that supports AI advancements, so that when AI models are developed and deployed, they’re secure by default. We continuously review our security frameworks to ensure they are fit for the technology we develop. As we carry out that review with a view to frontier model development, we plan to build on SAIF to help build preparedness across the whole industry.
Identifiers of AI-Generated Material
We take a holistic and evolving approach to the question of how to safely and responsibly identify AI-generated content and trace its provenance. It is important for government and industry to work together on this issue, as it is a question for both policy and technical research.
On the technical side, we are exploring and/or implementing multiple complementary solutions, including:
- Watermarking (adding invisible information to generated content that can later be picked up by a detector);
- Metadata (affixing information to content files that denotes whether they were AI generated and by which model); and,
- Digital signatures (also known as “hashing & logging”: generating perceptual hashes of content at time of generation for purposes of later detection, similar to reverse image search).
While complementary, these approaches are different, in terms of utility and viability, according to data modality (e.g. image vs text). Given their limitations, they can only be one part of a broader, holistic response that is needed to address risks posed by AI-generated content. They will often work best when accompanied by other methods, such as product design that enables users to explore the context of a piece of content. They will also need to be coupled with policy and governance approaches, such as AI literacy programmes for the public, and content policies for parties deploying AI-generated content in their communications (such as advertisers).
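To make the “hashing & logging” idea listed above more concrete, the following is a minimal sketch of one well-known perceptual hashing scheme, the average hash. It is an illustration only and is not a description of any Google system: the function names, the 8x8 hash size, the distance threshold, and the synthetic test images are all assumptions chosen for the example. The image is represented as a plain grid of grayscale values, and the sketch assumes it is at least `hash_size` pixels in each dimension.

```python
from typing import List

# A grayscale image as rows of pixel intensities in the range 0-255.
Image = List[List[int]]


def average_hash(pixels: Image, hash_size: int = 8) -> int:
    """Return a hash_size*hash_size-bit perceptual hash of the image.

    The image is reduced to a hash_size x hash_size grid by averaging
    blocks of pixels; each grid cell contributes one bit, set when the
    cell is brighter than the mean of all cells.
    Assumes the image is at least hash_size pixels wide and tall.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for row in range(hash_size):
        top = row * height // hash_size
        bottom = (row + 1) * height // hash_size
        for col in range(hash_size):
            left = col * width // hash_size
            right = (col + 1) * width // hash_size
            block = [pixels[y][x]
                     for y in range(top, bottom)
                     for x in range(left, right)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(hash_a ^ hash_b).count("1")


def likely_same_content(hash_a: int, hash_b: int, max_distance: int = 5) -> bool:
    """Treat two hashes as a match if they differ in only a few bits.

    The threshold of 5 is an illustrative assumption, not a standard.
    """
    return hamming_distance(hash_a, hash_b) <= max_distance


# Synthetic 32x32 test images (hypothetical stand-ins for generated content).
original = [[x * 8 for x in range(32)] for _ in range(32)]            # horizontal gradient
brightened = [[min(255, v + 10) for v in row] for row in original]    # lightly edited copy
different = [[y * 8 for _ in range(32)] for y in range(32)]           # vertical gradient

logged_hash = average_hash(original)  # what a provider might log at generation time
print(likely_same_content(logged_hash, average_hash(brightened)))
print(likely_same_content(logged_hash, average_hash(different)))
```

In this sketch a provider would log the hash when content is generated and later compare the hash of a candidate file against the log. Because matching tolerates a few differing bits, light edits such as a brightness change can still match, while unrelated content should not. Real systems face harder transformations (cropping, rotation, re-encoding) than this toy example handles.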
Watermarking allows for information to be embedded directly into content, even when an image undergoes some basic modifications. It is one tool in our overall toolkit to help indicate when content is created with Google’s generative AI tools. Moving forward, we are building our generative AI models to include watermarking capabilities and other techniques from the start. Like other technical mitigations, watermarking has inherent limitations and could be manipulated or circumvented by adversarial actors. Watermarking is an important part of our approach to improve the integrity of the overall ecosystem.
In August we launched a beta version of SynthID, a tool for watermarking and identifying AI-generated images. This allows us to identify and correctly attribute AI-generated content to its model source – currently, this feature is available for Google Cloud customers using our Imagen model. While SynthID isn’t foolproof against extreme image manipulations, it provides a promising technical approach for empowering people and organisations to work with AI-generated content responsibly.
Metadata allows creators to associate additional context with original files, giving users who encounter an image more information. In the coming months, through our "About this Image" tool, Google Search users will be able to see important information such as when and where similar images may have first appeared, and where else the image has been seen online, including on news, fact-checking and social media sites. This provides users with helpful context to determine whether what they are seeing is reliable.
Later this year, users will also be able to use this tool when searching for an image or screenshot using Google Lens, and on websites in Chrome. As we begin to roll out generative image capabilities, we will take steps to ensure that our AI-generated images have metadata to provide context for users who come across them outside our platforms. Creators and publishers will be able to add similar metadata, enabling users to see a label in images in Google Search marking them as AI-generated.
We build safeguards for our other generative AI tools as well. For example, Universal Translator—an experimental AI video dubbing service that helps experts translate a speaker's voice and match their lip movements—holds enormous potential for increasing learning comprehension. We have built the service with guardrails, including human review of outputs to ensure they stay within the translation use case and faithful to the original material. The tool is accessible only to authorised partners, who commit to reviewing generated content as well before public dissemination.
Additionally, we have invested in research and tooling to detect synthetic audio outputs of certain generative AI models. For instance, when we built our AudioLM model, we trained a classifier that can detect synthetic speech generated by the model with very high accuracy (currently 98.6%). We have also released a dataset of synthetic speech in support of an international challenge to develop high-performance synthetic audio detectors, which was downloaded by more than 150 research and industry organisations.
Information hazards are both global and industry-wide concerns, and building multi-stakeholder collaborations is an important means of sharing best practice. By publicly collaborating with others and working to develop services focussed on ethics and safety that serve not just our own products, but the wider ecosystem, we can foster norms on sharing best practices. For example, we participate in the Partnership on AI (PAI)’s Synthetic Media Framework.
Our work is ongoing to develop more technical solutions to help users understand whether content is AI-generated. We take testing extremely seriously, and last year we introduced the AI Test Kitchen app as a way for people to learn about, experience, and give feedback regarding our emerging AI technologies.
Moreover, industry has an important role to play in improving general population AI literacy. Even with the best technical and policy mitigations in place, it is vital that there is widespread understanding of generative AI and its implications.
Priority Research and Investment on Societal, Safety and Security Risks
We have a number of teams who focus full-time on AI ethics, safety, and governance research, alongside various cross-functional research efforts that people across the organisation contribute to. For instance, a dedicated team works to make our AI systems robust and verifiable. Current priorities in this area include watermarking and image, text, and audio provenance; privacy, including auditing leakage from models and private fine-tuning; and robustness and failure analysis, including for safety-critical systems. We have teams conducting research projects designed to understand and mitigate potential risks from more advanced systems and to align them to human interests, including in areas such as interpretability and causality.
We also have a number of teams that pioneer research to better understand the risks and benefits posed by AI systems as they are deployed into, and interact with, wider society. These teams have published flagship papers on topics ranging from strengthening the involvement of under-represented communities in AI, to exploring how we can build human values into our AI systems. Months before chatbots became widely available for consumer use, our team published an extensive survey of the risks posed by Large Language Models, including from bias and toxicity, private data leaks, misinformation, and more. As a follow-up, they have just published a flagship report on evaluations for multimodal generative models, including mapping gaps in the field and developing a novel taxonomy for evaluations at multiple levels: capability, human interaction, and systemic impact. Other priorities for research across the team include developing more representative and inclusive AI systems, including for the inputs and processes underpinning them; supporting value alignment of AI models; understanding risks posed by AI persuasion, manipulation, and anthropomorphisation; and developing a broader suite of evaluations for dangerous capabilities, as explained above. The research of these teams is integrated with our responsibility processes, and informs the decisions taken by groups like our Responsibility and Safety Council. For example, our research into useful and safe dialogue agents directly underpinned the development of our experimental model, “Sparrow”, and research into novel types of evaluations is guiding the testing and assessment of our upcoming generation of models.
We also believe we have a responsibility to support the broader ecosystem and build capacity for research on these topics. For years our teams have collaborated with external organisations researching AI’s impacts on society. Just this summer, Google announced the launch of the “Digital Futures Project,” including a $20 million fund that will provide grants to leading think tanks and academic institutions researching and encouraging responsible AI development. These organisations are looking into questions that include AI’s impact on global security; its impact on labour and the economy; and what kinds of governance structures and cross-industry efforts can best promote responsibility and safety in AI innovation. In addition, and as mentioned above, we are currently providing structured model access to one of our leading models, with the goal of building external capacity for safety research and evaluation.
Data Input Controls and Audit
We incorporate data governance processes at every stage of the AI lifecycle. Our foundation models are trained using publicly available data sources and open-source curated datasets, alongside proprietary data and data licensed from third parties. Fine-tuning general-purpose foundation models for more specific functions requires more specialised data that represents the targeted subject or use case (with RLHF being one of the commonly used strategies to achieve higher quality data for fine-tuning).
Teams wanting to use data for research (including pre-training and fine-tuning of frontier models) submit a data ingestion request to a dedicated data team. This initiates a review of the origins, content and licence of the data, with input from research, ethics, security, commercial and legal teams. The review also ensures that data was collected in accordance with our AI Principles.
Following review, the data may be modified before being used in research (e.g. filtering to ensure compliance with applicable law). If use is approved, our data team manages the ingestion, cataloguing and storing of the data, including if and how the data can be further shared internally and managing who has access to the data. The review, assessment, and any required mitigations are documented by the relevant internal teams.
AI for Good
We welcome the opportunity to identify areas where AI can help address policy challenges and improve people’s lives. We and others are already using AI to address societal challenges, including by helping scientists better detect breast cancer, forecast floods, limit the warming effects of jet contrails, accelerate clean nuclear fusion, predict protein structures, and achieve healthcare breakthroughs.
There are many examples of AI being used for good - and we have set out detail on their impact below.
The global challenges we face today are enormously complex, and do not lend themselves to easy solutions. Business as usual is no longer an option - we need technology that can help us understand and identify new ways to meet this complexity.
Despite the evidence around us to the contrary, research suggests that breakthroughs in scientific research are stalling, with the number of ‘disruptive’ publications declining over the past 50 years. Likewise, there is evidence that research productivity has been declining globally over the past 40 years. Scientific research holds the key to progress on complex global challenges, from climate change to drug discovery, as well as to our economic well-being, and AI can help accelerate it.
Our breakthrough AI system AlphaFold accelerated progress on the long-standing challenge of protein folding and supercharged a whole new industry of computational biology. We believe similar transformational progress is possible in energy, climate and education, as well as other scientific disciplines.
Perhaps most exciting is the potential for AI to help realise human potential: alongside helping solve problems that face us as a society, AI can help empower individuals through myriad new applications, including assistive technology for education, scientific research, and creative pursuits.
1. Accelerating scientific discovery: Biology at digital speed
The challenge: Advances in technologies like next-generation sequencing, and the various CRISPR techniques, mean that practitioners have access to ever-growing amounts of biological data and the ability to run larger biological experiments. However, biology is inherently complex and emergent. This means that the costs and failure rates for bringing new drugs to market are high, and efforts to use biology to engineer socially-beneficial applications, such as biofuels or alternative proteins, have struggled to scale beyond promising experiments and prototypes, outside of a small number of examples, like monoclonal antibodies.
The opportunity: AI can supercharge scientific research and biotechnology, in multiple ways, as evidenced by breakthroughs from AlphaFold to AlphaMissense.
- Advancing our fundamental understanding of biology and disease: AI is transforming how practitioners:
  - Sequence and annotate: AI is being used to improve how we sequence and annotate biological data, from DNA through single cells, improving the accuracy and speed of these processes. For example, Google’s DeepVariant uses AI to identify and annotate genomic variants.
  - Characterise and analyse: AI can also be used to characterise and analyse biological molecules, processes and systems, for example to understand their structure, what they do, how they work, or their potential role in disease.
    - For example, we have used AI to predict the 3D structure of proteins, using approaches that can be extended to other biological molecules. We have also developed AI systems that predict how DNA sequence influences gene expression.
    - Rapid progress means that ambitions to understand biology are growing, for example towards a goal of building reliable simulations of different cells that could be used to predict how they interact and behave in different situations.
- Design and engineer biology: These advances are in turn enabling AI practitioners to better edit, engineer, and design biological molecules and constructs, with higher fidelity. For example, the availability of high-accuracy protein structure predictions, via tools like AlphaFold, has led to better in-silico protein design models. This could prove transformative for efforts to design new kinds of protein biologic drugs, or other beneficial protein applications, such as plastic-degrading enzymes.
- While we’ve seen most progress in the life sciences, applying AI to other scientific domains could yield significant benefits. At a roundtable we convened recently with the British Science Association at the Royal Society, data access, structure and compatibility emerged as key ingredients to untapping that potential. Domains like chemistry, materials science, and physics have large amounts of data, but it’s largely unstructured. Investing in making it accessible, legible and compatible, including by incentivising talent into the important but often under-recognised “service layer” of data architecture and management, could unlock similar advances in these disciplines, to those we have seen in the life sciences.
From AlphaFold to AlphaMissense
Proteins underpin the biological processes in our body and every living thing. They are the building blocks of life. Currently, there are over 200 million known proteins, with many more found every year. Each one has a unique 3D shape that determines how it works and what it does. For decades, scientists have been trying to find a method to reliably determine a protein’s structure just from its sequence of amino acids. This grand scientific challenge is known as the protein-folding problem.
Google DeepMind’s AI system AlphaFold provides state-of-the-art predictions of a protein’s 3D structure. It was recognised as a solution to the protein folding challenge by the organisers of the biennial Critical Assessment of protein Structure Prediction (CASP), a challenge for research groups to test the accuracy of their predictions against real experimental data.
We have collaborated with EMBL-EBI (the European Molecular Biology Laboratory’s European Bioinformatics Institute) to release the predicted structures of over 200 million proteins, nearly all catalogued proteins known to science, making them freely and openly available to the scientific community via the AlphaFold Protein Structure Database (AFDB). By making AlphaFold even more accessible to scientists around the world, we hope to increase humanity’s understanding of biology by orders of magnitude and herald a new era in digital biology. Our research indicates that by reducing the need for slow and expensive experiments, AlphaFold has potentially saved the research world up to 1 billion years of progress and trillions of dollars.
In September 2023, we published AlphaMissense, a system that builds on the breakthrough of AlphaFold. Uncovering the root causes of disease is one of the greatest challenges in human genetics. With millions of possible mutations and limited experimental data, it is still largely a mystery which ones could give rise to disease. This knowledge is crucial to diagnosing disease faster and developing life-saving treatments.
Our catalogue of ‘missense’ mutations allows researchers to learn more about what effect they may have. Missense variants are genetic mutations that can affect the function of human proteins. In some cases, they can lead to diseases such as cystic fibrosis, sickle-cell anaemia, or cancer.
- Until recently, only 0.1% of all possible missense variants, a common type of genetic mutation, had been clinically classified as ‘pathogenic’ or ‘benign’, limiting the diagnostic rate of rare diseases.
- Building on AlphaFold 2, AlphaMissense categorises 89% of all 71 million possible missense variants, giving us an indication of which ones may be benign, and which may be associated with diseases.
- Releasing our predictions marks a significant step in helping scientists and geneticists uncover new disease-causing genes and increases our ability to diagnose rare genetic disorders.
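To give a sense of scale, the two coverage figures above can be turned into absolute counts. The sketch below uses only the numbers quoted in this section (71 million possible variants, 0.1% clinically classified, 89% categorised by AlphaMissense); the variable names are illustrative, and the counts are rounded estimates rather than published totals.

```python
# Figures quoted in the text above; everything else is derived arithmetic.
TOTAL_MISSENSE_VARIANTS = 71_000_000   # all possible missense variants
CLINICALLY_CLASSIFIED_PER_MILLE = 1    # 0.1% expressed as parts per thousand
ALPHAMISSENSE_PERCENT = 89             # share categorised by AlphaMissense

# Integer arithmetic avoids floating-point rounding noise.
clinically_classified = TOTAL_MISSENSE_VARIANTS * CLINICALLY_CLASSIFIED_PER_MILLE // 1000
alphamissense_categorised = TOTAL_MISSENSE_VARIANTS * ALPHAMISSENSE_PERCENT // 100

# How many times more variants now carry a predicted category.
coverage_ratio = alphamissense_categorised / clinically_classified

print(f"Clinically classified:        ~{clinically_classified:,}")
print(f"Categorised by AlphaMissense: ~{alphamissense_categorised:,}")
print(f"Coverage ratio:               ~{coverage_ratio:.0f}x")
```

Note that a predicted category is an indication for researchers to investigate, not a clinical classification.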
2. Tackling previously intractable challenges: AI for Climate
The challenge: After a brief respite during Covid-19, emissions from fossil fuels rose again in 2022, reaching a record high. Even if all countries update their policies to match their recent net-zero pledges, Climate Action Tracker expects 2 degrees of warming by 2100. As it stands, we’re on track for 2.8 degrees, with the catastrophic effects this will entail.
The opportunity: AI could help to reduce the extent of warming, and prepare society for its effects, in at least three main ways, with analogous benefits for other environmental challenges:
- Improve energy efficiency: We have demonstrated how AI is enabling a new era of digital efficiency, for example by helping to optimise video compression, battery life, and the matrix multiplication algorithms that underpin computation. The biggest opportunities are to transfer these efficiency gains to other high-impact sectors, such as commercial buildings and power flow in energy grids.
- Accelerate breakthrough technologies: 75% of global emissions result from the generation and use of energy. If AI is to make a sizable dent in these emissions it will require transformative breakthroughs. For example, fusion energy is clean, safe and (almost) limitless. To make fusion happen, practitioners need to create and control plasma - a fourth fundamental state of matter. In 2022, in collaboration with the Swiss Plasma Center, we demonstrated how to use reinforcement learning to control the stability of plasma, a key potential enabling step. Beyond fusion, practitioners could use AI to help design and engineer new types of batteries, PV cells, or cleaner biology-based manufacturing.
- Forecast and respond to climate change: Traditional approaches to forecasting weather and climate are limited by the inherent complexity of the dynamics, and the growing volume of data. AI is well placed to help. For example, we have developed a 10-day weather simulator that scales well with data, as well as a system to improve 2-hour rain predictions, with colleagues at the Met Office. Our weather simulator recently helped predict the path of Hurricane Lee 10 days in advance. Practitioners can also use AI to understand and forecast other environmental effects in much more granular detail. For example, Climate TRACE is a near-real-time greenhouse gas monitoring platform, while Google Earth Engine allows planetary-scale analysis.
From cooling data centres to nuclear fusion
In 2016 we used machine learning technology to reduce the energy needed to cool Google’s data centres by 40% - a significant breakthrough building on years of effort to reduce our energy use. Dynamic environments like data centres are particularly hard to operate optimally for several reasons:
- The equipment, how we operate that equipment, and the environment interact with each other in complex, nonlinear ways. Traditional formula-based engineering and human intuition often do not capture these interactions.
- The system cannot adapt quickly to internal or external changes (like the weather). This is because we cannot come up with rules and heuristics for every operating scenario.
- Each data centre has a unique architecture and environment. A custom-tuned model for one system may not be applicable to another. Therefore, a general intelligence framework is needed to understand the data centre’s interactions.
To address this problem, we began applying machine learning to operate our data centres more efficiently. Using a system of neural networks trained on different operating scenarios and parameters within our data centres, we created a more efficient and adaptive framework to understand data centre dynamics and optimise efficiency.
We accomplished this by taking the historical data that had already been collected by thousands of sensors within the data centre - data such as temperatures, power, pump speeds, setpoints, etc. - and using it to train an ensemble of deep neural networks. Since our objective was to improve data centre energy efficiency, we trained the neural networks on the average future PUE (Power Usage Effectiveness), which is defined as the ratio of the total building energy usage to the IT energy usage. We then trained two additional ensembles of deep neural networks to predict the future temperature and pressure of the data centre over the next hour.
Our machine learning system was able to consistently achieve a 40 percent reduction in the amount of energy used for cooling, which equates to a 15 percent reduction in overall PUE overhead after accounting for electrical losses and other non-cooling inefficiencies. It also produced the lowest PUE the site had ever seen.
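The relationship between the two percentages above can be illustrated with a small worked example. This is a minimal sketch, not the production system: the energy figures are hypothetical, chosen so that cooling makes up 37.5% of the non-IT overhead, and it assumes non-cooling overhead is unchanged. Under those assumptions, a 40% cut in cooling energy yields a 15% cut in PUE overhead (PUE minus 1).

```python
def pue(total_energy_kwh: float, it_energy_kwh: float) -> float:
    """Power Usage Effectiveness: total building energy divided by IT energy."""
    if it_energy_kwh <= 0:
        raise ValueError("IT energy must be positive")
    return total_energy_kwh / it_energy_kwh


def overhead_reduction(pue_before: float, pue_after: float) -> float:
    """Fractional reduction in PUE overhead, where overhead = PUE - 1."""
    return ((pue_before - 1.0) - (pue_after - 1.0)) / (pue_before - 1.0)


# Hypothetical facility (illustrative numbers, not from the source).
it_kwh = 1000.0        # energy used by IT equipment
cooling_kwh = 45.0     # energy used for cooling
other_kwh = 75.0       # electrical losses and other non-cooling overhead

COOLING_REDUCTION = 0.40  # the 40 percent cooling saving quoted in the text

before = pue(it_kwh + cooling_kwh + other_kwh, it_kwh)
after = pue(it_kwh + (1 - COOLING_REDUCTION) * cooling_kwh + other_kwh, it_kwh)
reduction = overhead_reduction(before, after)

print(f"PUE before: {before:.3f}, after: {after:.3f}")
print(f"PUE overhead reduction: {reduction:.1%}")
```

Because IT energy appears in the denominator and is untouched by cooling changes, a saving in cooling shows up in the overhead only in proportion to cooling's share of that overhead.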
We have also applied AI to one of humanity’s greatest challenges: successfully controlling the nuclear fusion plasma in a tokamak reactor using deep learning.
To solve the global energy crisis, researchers have long sought a source of clean, limitless energy. Nuclear fusion, the reaction that powers the stars of the universe, is one contender for achieving this. By smashing and fusing hydrogen, a common element of seawater, the powerful process releases huge amounts of energy. Here on Earth, one way scientists have recreated these extreme conditions is by using a tokamak, a doughnut-shaped vacuum surrounded by magnetic coils that is used to contain a plasma of hydrogen hotter than the core of the Sun. However, the plasmas in these machines are inherently unstable, which makes sustaining the process required for nuclear fusion a complex challenge.
Using a learning architecture that combines deep RL and a simulated environment, we produced controllers that can both keep the plasma steady and be used to accurately sculpt it into different shapes. This “plasma sculpting” shows the RL system has successfully controlled the superheated matter and - importantly - allows scientists to investigate how the plasma reacts under different conditions, improving our understanding of fusion reactors - an important piece of the puzzle in making fusion a reality.
This capability of autonomously creating controllers could be used to design new kinds of tokamaks while simultaneously designing their controllers. Our work also points to a bright future for reinforcement learning in the control of complex machines. It’s especially exciting to consider fields where AI could augment human expertise, serving as a tool to discover new and creative approaches for hard real-world problems.
3. Helping realise human potential: AI for Education
The challenge: Despite steady progress, many groups - including girls, lower-income populations, refugees, non-native English speakers, and students with special educational needs and disabilities - continue to face inequitable access to education. Even when students do get access, the quality of their learning experience is often sub-par. For example, 60% of ten-year-olds in lower-income countries are unable to read a simple text aimed at younger students. In many higher-income countries, attainment in maths, science, reading, and adult literacy is also stagnating. Given the rate of progress in machine learning, there is an imperative to reconsider what we want students to learn, how we want them to learn this, and the role of educators, institutions, and others.
The opportunity: AI could help improve outcomes in education by:
- Enabling genuine personalised learning: Many applications have long promised a personalised learning experience. However, typically these applications only offer an adjusted pace of learning - learning activities, content and end-outcomes remain homogeneous. The upcoming generation of generative AI tools opens up the potential for more genuinely personalised learning, for example by enabling students to create their own personalised ideas, notes, and learning materials with tools such as Google’s experimental ‘AI-first’ notebook, NotebookLM.
- Unlocking new kinds of AI tutors: Traditionally, most AI tutors focussed on helping students to memorise narrow sets of knowledge. As a result, they have largely failed to attain the 2 sigma/standard deviation improvement in learning outcomes that Benjamin Bloom’s famous 1984 study suggested human tutoring was capable of. Large AI models hold out the potential for richer tutoring experiences, particularly if progress can be made in areas like dialogue and factuality.
- Supporting educators: Many teachers, particularly in state-funded schools, suffer from high workloads, rising student-teacher ratios, and insufficient support. Teachers are also being asked to cover a wider remit, from financial/digital/environmental literacies, to social-emotional learning and soft skills. Educators can potentially use AI in many ways, including to help generate lesson plans, particularly for time-intensive teaching activities, and to adapt content to different students’ needs.
- Preparing students for an AI-enabled society: With our partners at the Raspberry Pi Foundation, we have developed the Experience AI programme to help young people understand how AI works, while our AI Diversity Scholarships and Fellowships programme is helping to ensure that the AI community becomes more representative of wider society.
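The “2 sigma” figure from Bloom’s study, cited above in the point on AI tutors, is easier to interpret as a percentile. The sketch below assumes learning outcomes in the comparison group are normally distributed, which is a simplifying assumption made here and not a claim from the source; the function name is illustrative.

```python
from statistics import NormalDist


def percentile_after_shift(effect_size_sd: float) -> float:
    """Percentile (0-100) of the comparison group reached by an average
    student whose outcome improves by `effect_size_sd` standard deviations,
    assuming normally distributed outcomes."""
    return 100.0 * NormalDist().cdf(effect_size_sd)


# A 2 standard deviation improvement, as suggested for human tutoring.
two_sigma_percentile = percentile_after_shift(2.0)
print(f"Average tutored student outperforms ~{two_sigma_percentile:.0f}% of the comparison group")
```

Under this assumption, an average student who improves by two standard deviations would outperform roughly 98% of students in the comparison group, which is the bar that AI tutors have so far largely failed to reach.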