A new frontier for world models
Create and explore infinitely diverse worlds.
Experience the natural world from desert to sea – or witness extreme weather up close.
Generate vibrant ecosystems, from animal behaviors to intricate plant life.
Conjure imaginary worlds, fantastical scenarios and expressive animated characters.
Genie 3 is a general-purpose world model. It uses simple text descriptions to generate photorealistic environments that can be explored in real time.
Towards world simulation
World models use their deep understanding of physical environments to simulate them. Genie 3 represents a major leap in capabilities – allowing agents to predict how a world evolves, and how their actions affect it.
Genie 3 makes it possible to explore an unlimited range of realistic environments. This is a key stepping stone on the path to AGI, since such environments can be used to train AI agents capable of reasoning, problem solving and acting in the real world.
Project Genie is an experimental research prototype that lets you create and explore infinitely diverse worlds.
Allows for fluid, real-time interaction within the generated world, operating at 20-24 frames per second.
Generates interactive worlds from text, transforming envisioned landscapes into controllable realities ready to be explored.
Renders rich, photorealistic worlds at 720p resolution. This high-fidelity output provides crucial visual detail for training agents on real-world complexities.
Previously seen details are recalled when a location is revisited, and environments can handle sustained interaction without degrading.
Advancing real-time interactivity
To achieve real-time controllability, Genie 3 has to take account of the world it has already generated and the actions taken within it.
For example, if a user revisits a location after a minute, the model must refer back to information it generated a minute earlier. And to remain interactive, this has to happen multiple times per second in response to user instructions: at 20-24 frames per second, there are only around 40-50 milliseconds to produce each frame.
One of the main challenges of generating AI worlds is keeping them consistent over time. This is harder than generating a complete video in one pass, because inaccuracies tend to compound the longer the world is actively generated.
Genie 3 environments are far more dynamic and detailed than those produced by explicit 3D methods such as NeRFs and Gaussian Splatting. This is because they are generated auto-regressively: each frame is created from the world description, the frames that came before it, and the user's actions. The environments remain largely consistent for several minutes, with a visual memory of specific interactions reaching back up to a minute.
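To make the idea concrete, here is a minimal, hypothetical Python sketch of such an auto-regressive loop; the model object, predict_next_frame and the other names are placeholders for illustration, not Genie 3's actual interface:

from collections import deque

class AutoRegressiveWorld:
    def __init__(self, model, world_prompt, memory_seconds=60, fps=24):
        self.model = model
        self.context = model.encode_prompt(world_prompt)  # text conditioning (placeholder call)
        # Keep roughly one minute of (frame, action) history as the model's memory.
        self.memory = deque(maxlen=memory_seconds * fps)

    def step(self, user_action):
        # Each new frame is conditioned on the world description, the remembered
        # trajectory, and the latest user action.
        frame = self.model.predict_next_frame(
            prompt=self.context,
            history=list(self.memory),
            action=user_action,
        )
        self.memory.append((frame, user_action))
        return frame

In this framing, consistency over several minutes corresponds to the rolling history staying informative enough for the model to reproduce previously seen details when a location is revisited.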
Promptable world events make it possible to change the generated world – such as altering weather conditions or introducing new objects and characters.
This increases the range of scenarios agents can use to learn about handling unexpected situations.
Prompting Genie 3 involves two core elements: the world you want to build, and the character you're bringing to life.
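As an illustration only, the two elements and a promptable world event might be expressed like this; the structure and field names below are assumptions made for the example, not a documented interface:

# Hypothetical prompt structure; not Genie 3's actual input format.
world_prompt = {
    "world": "a sun-bleached desert canyon with a dry riverbed and distant mesas",
    "character": "an explorer in hiking gear, seen from behind in third person",
}

# A promptable world event changes the generated world mid-session,
# for example by altering the weather or introducing a new object or character.
world_event = "a sudden sandstorm rolls in from the north"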
Real-world applications
The potential uses for Genie 3 go well beyond gaming.
Genie 3’s realistic controllable realities could offer new ways for people to learn – allowing students to explore historical eras, like Ancient Rome. These simulated environments can also be used to train autonomous vehicles in realistic scenarios, in a completely safe setting.
Genie 3 can maintain consistent worlds, making it possible to explore more complex goals, longer sequences of actions, and real-world complexities. It can also help researchers evaluate agents’ performance, and explore their weaknesses.
SIMA is an agent capable of carrying out tasks in virtual environments, and we set it goals to complete within worlds generated by Genie 3. Genie 3 isn't aware of these goals; it simply simulates the future based on the agent's actions.
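In outline, this is a standard agent-environment loop in which Genie 3 plays the role of the environment. The following Python sketch is hypothetical; the world_model and agent interfaces are placeholders rather than the actual SIMA or Genie 3 APIs:

def run_episode(world_model, agent, goal, max_steps=500):
    # The world model generates the first frame from a text description.
    frame = world_model.reset(prompt="a warehouse with shelves and forklifts")
    for _ in range(max_steps):
        # The agent chooses an action toward its goal from the current frame.
        # The world model is never told the goal; it only simulates the
        # consequences of the agent's actions, frame by frame.
        action = agent.act(observation=frame, goal=goal)
        frame = world_model.step(action)
        if agent.goal_reached(frame, goal):
            break
    return frame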
Limitations
Although promptable world events allow for a wide range of environmental interventions, they're not necessarily performed by the agent itself. For now, there's a limited range of actions agents can carry out.
Accurately modeling interactions between multiple independent agents in shared environments is an ongoing research challenge.
Genie 3 is currently unable to simulate real-world locations with perfect accuracy.
Clear and legible text is often only generated when it's in the input world description.
The model can support a few minutes of continuous interaction, rather than extended hours.
Responsibility
We believe foundational technologies like Genie 3 require a deep commitment to responsibility from the very beginning. Technical innovations, particularly open-ended and real-time capabilities, introduce new challenges for safety and responsibility. To address these unique risks while aiming to maximize the benefits, we have worked closely with our Responsible Development & Innovation Team.
At Google DeepMind, we're dedicated to developing our best-in-class models in a way that amplifies human creativity while limiting unintended impacts. As we explore potential applications for Genie 3, we continue to build our understanding of the risks and their appropriate mitigations, so that we can develop this technology responsibly.
Acknowledgements
Genie 3 was made possible due to key research and engineering contributions from Phil Ball, Jakob Bauer, Frank Belletti, Bethanie Brownfield, Ariel Ephrat, Shlomi Fruchter, Agrim Gupta, Kristian Holsheimer, Aleks Holynski, Jiri Hron, Christos Kaplanis, Marjorie Limont, Matt McGill, Yanko Oliveira, Diego Rivas, Jack Parker-Holder, Frank Perbet, Guy Scully, Jeremy Shar, Stephen Spencer, Omer Tov, Ruben Villegas, Emma Wang and Jessica Yung.
We thank Andrew Audibert, Cip Baetu, Jordi Berbel, David Bridson, Jake Bruce, Gavin Buttimore, Sarah Chakera, Bilva Chandra, Kan Chen, Donghyun Cho, Yoni Choukroun, Paul Collins, Alex Cullum, Bogdan Damoc, Vibha Dasagi, Maxime Gazeau, Charles Gbadamosi, Liangke Gui, Shan Han, Woohyun Han, Ed Hirst, Tingbo Hou, Ashyana Kachra, Lucie Kerley, Siavash Khodadadeh, Kristian Kjems, Eva Knoepfel, Vika Koriakin, José Lezama, Jessica Lo, Cong Lu, Zeb Mehring, Alexandre Moufarek, Mark Murphy, Henna Nandwani, Valeria Oliveira, Joseph Ortiz, Fabio Pardo, Jane Park, Andrew Pierson, Ben Poole, Hang Qi, Helen Ran, Nilesh Ray, Tim Salimans, Manuel Sanchez, Igor Saprykin, Amy Shen, Sailesh Sidhwani, Duncan Smith, Joe Stanton, Hamish Tomlinson, Dimple Vijaykumar, Ruben Villegas, Luyu Wang, Will Whitney, Nat Wong, Rundi Wu, Keyang Xu, Minkai Xu, Nick Young, Yuan Zhong, Vadim Zubov.
Thanks to Tim Rocktäschel, Satinder Singh, Adrian Bolton, Inbar Mosseri, Aäron van den Oord, Douglas Eck, Dumitru Erhan, Raia Hadsell, Zoubin Ghahramani, Koray Kavukcuoglu and Demis Hassabis for their insightful guidance and support throughout the research process.
Feature video was produced by Matthew Carey, Anoop Chaganty, Suz Chambers, Alex Chen, Jordan Griffith, Filip Havlena, Scotch Johnson, Randeep Katari, Hyeseung Kim, Kaloyan Kolev, Samuel Lawton, Cliff Lungaretti, Heysu Oh, Andrew Rhee, Shashwath Santosh, Arden Schager, JR Schmidt, Hana Tanimura, Khyati Trehan, Dev Valladares, Zach Velasco, Christopher Walker, Ben Wiley, Isabelle Wintaro, Jocelyn Zhao.
We thank Frederic Besse, Tim Harley and the rest of the SIMA team for access to a recent version of their agent.
Finally, we extend our gratitude to Mohammad Babaeizadeh, Gabe Barth-Maron, Parker Beak, Jenny Brennan, Tim Brooks, Max Cant, Harris Chan, Jeff Clune, Kaspar Daugaard, Dumitru Erhan, Ashley Feden, Simon Green, Nik Hemmings, Michael Huber, Jony Hudson, Dirichi Ike-Njoku, Hernan Moraldo, Bonnie Li, Simon Osindero, Georg Ostrovski, Ryan Poplin, Alex Rizkowsky, Giles Ruscoe, Ana Salazar, Guy Simmons, Jeff Stanway, Metin Toksoz-Exley, Xinchen Yan, Petko Yotov, Mingda Zhang and Martin Zlocha for their insights and support.