
Google DeepMind is preparing to release an artificial intelligence model designed to simulate interactive 3D environments, providing a training ground for AI agents such as robots and autonomous systems. Called Genie 3, the AI model enables these agents to learn how to perform tasks in realistic, physics-driven virtual settings.
A step closer to artificial general intelligence
Google’s AI division describes the so-called “world model” as a “stepping stone on the path to AGI”, or artificial general intelligence, the point at which a system possesses human-level cognitive abilities. To get there, AI agents must be trained in simulations that follow the laws of physics.
But Genie 3 won’t just be useful for training robots to perform logistics tasks in controlled settings such as warehouses. It could also provide immersive simulations for people, such as experiencing a base jump from the safety of the ground, or practising a mountain rescue mission where errors won’t cost lives.
Longer runtime and spatial memory
Genie 3 can generate environments at 24 frames per second in 720p resolution for several minutes at a time. It uses an auto-regressive approach, in which each frame is generated sequentially from the ones before it. This can be problematic, as small errors accumulate and degrade visual quality over time. Nevertheless, the model can sustain a simulation for multiple minutes and remember up to a minute of its own output, so users can revisit locations in the environment and find them rendered consistently.
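To make the mechanism concrete, here is a minimal sketch of auto-regressive generation with a bounded memory window, based only on the behaviour described above. The next_frame() function is a hypothetical stand-in rather than Genie 3’s actual interface, and the placeholder frames are tiny arrays rather than real 720p output.

import numpy as np

FPS = 24                       # frames per second, per the article
MEMORY_SECONDS = 60            # the model remembers about a minute of output
WINDOW = FPS * MEMORY_SECONDS  # number of past frames the model conditions on

def next_frame(context: list, action: str) -> np.ndarray:
    # Hypothetical model call: predict the next frame from the recent
    # frames and the user's action. Real output would be 720p (1280x720).
    return np.zeros((72, 128, 3), dtype=np.uint8)

frames = []
for step in range(FPS * 120):  # simulate two minutes of interaction
    # Condition only on the most recent minute of frames: locations seen
    # within that window can be revisited consistently, while anything
    # older falls outside the model's memory. Because each frame feeds
    # the next, small errors can compound over long runs.
    context = frames[-WINDOW:]
    frames.append(next_frame(context, action="move_forward"))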
Genie 3 builds on its predecessor, Genie 2, which could only generate an environment for up to 20 seconds and did not allow for real-time interaction. The model also incorporates techniques from Google DeepMind’s video generator Veo 3, including its “deep understanding of intuitive physics.”
Tested with SIMA and controllable by prompt
Google tested Genie 3 with its general-purpose SIMA agent, giving it a set of goals to accomplish by navigating a simulated world. “Genie 3 is not aware of the agent’s goal. Instead it simulates the future based on the agent’s actions,” Google said.
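The division of labour Google describes can be sketched as a simple loop: the agent holds the goal and picks actions, while the world model only ever sees those actions and simulates their consequences. The WorldModel and Agent classes below are illustrative assumptions, not Google’s published APIs.

class WorldModel:
    """Simulates the future from the agent's actions; it never sees the goal."""
    def step(self, action: str) -> str:
        return f"observation after '{action}'"  # placeholder rendering

class Agent:
    """Holds the goal privately and chooses actions to pursue it."""
    def __init__(self, goal: str):
        self.goal = goal
    def act(self, observation: str) -> str:
        return "navigate_toward_target"         # placeholder policy

world = WorldModel()
agent = Agent(goal="reach the red marker")      # the goal lives only here
obs = world.step("spawn")
for _ in range(10):
    action = agent.act(obs)   # the agent decides based on its goal
    obs = world.step(action)  # the world model reacts to the action alone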
The new AI model needs only a text prompt to launch a simulation, and once it’s running, additional prompts can modify the environment in real time, such as adding a waterfall to a cliffside scene.
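In outline, that kind of prompt-driven control might look like the following self-contained sketch. The Simulation class and its methods are hypothetical illustrations of the behaviour the article describes, not a published Genie 3 interface.

class Simulation:
    def __init__(self, prompt: str):
        self.description = prompt      # the world starts from one text prompt
    def advance(self) -> None:
        pass                           # render the next frame (placeholder)
    def inject_event(self, prompt: str) -> None:
        # A follow-up prompt alters the world while it is running.
        self.description += f", {prompt}"

sim = Simulation("a rocky cliffside at dawn")
for frame in range(24 * 5):            # five seconds at 24 fps
    if frame == 24 * 2:                # two seconds in, modify the scene
        sim.inject_event("a waterfall pours down the cliff face")
    sim.advance()

print(sim.description)  # "a rocky cliffside at dawn, a waterfall pours down..."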
A separate AI tool called Mirage, released last month, transforms live video in real time, also using auto-regression and text prompts.
Limitations remain ahead of full release
Google has not yet released Genie 3 because it still has several crucial limitations. Most significantly, the model cannot support more than a few minutes of continuous interaction, which is not enough to be meaningfully useful to agents or people.
Furthermore, while the environments can simulate a wide range of events, agents are not yet able to respond appropriately to all of them, or to one another. Genie 3 also cannot render real-world locations with perfect geographic accuracy, or produce legible text unless it has been explicitly spelt out in the prompt.
“We believe Genie 3 is a significant moment for world models, where they will begin to have an impact on many areas of both AI research and generative media,” Google said. “To that end, we’re exploring how we can make Genie 3 available to additional testers in the future.”
Google recently made Deep Think, a multi-agent AI model from the Google DeepMind team, available via the Gemini app.