On Monday, Google’s AI research lab DeepMind showcased a live demo of Genie, a generative AI model that can create playable games from a simple prompt after learning game mechanics from hundreds of thousands of gameplay videos.
Developed by Google in collaboration with the University of British Columbia, Genie (short for Generative Interactive Environments) can create side-scrolling 2D platformer games in the vein of Super Mario Bros. and Contra from a single image prompt.
“The last few years have seen an emergence of generative AI, with models capable of generating novel and creative content via language, images, and even videos,” Google DeepMind said. “Today, we introduce a new paradigm for generative AI, generative interactive environments: Genie.”
According to Google researchers, Genie can create interactive, playable environments from a single image prompt thanks to three components: a latent action model that infers the actions taken between video frames, a video tokenizer that converts raw video frames into discrete tokens, and a dynamics model that predicts the next frame.
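To make that division of labor concrete, here is a toy Python sketch of how the three pieces could fit together once a user starts "playing." Everything in it is a hypothetical stand-in rather than Genie's actual code: the tiny random codebook, the patch size, the assumed eight-action latent space, and above all the `dynamics` rule, which replaces the large learned world model with a fixed token shift. The latent action step is likewise simplified to a brute-force search over the discrete actions.

```python
import random

PATCH = 4           # hypothetical patch size (pixels per side)
CODEBOOK_SIZE = 16  # hypothetical; a real video tokenizer uses a far larger codebook
NUM_ACTIONS = 8     # assumed small, discrete latent action space

_rng = random.Random(0)
# Hypothetical tokenizer codebook: one vector per token id.
CODEBOOK = [[_rng.gauss(0, 1) for _ in range(PATCH * PATCH)] for _ in range(CODEBOOK_SIZE)]


def _patches(frame):
    """Yield flattened PATCH x PATCH patches of a 2D frame in row-major order."""
    h, w = len(frame), len(frame[0])
    for r in range(0, h, PATCH):
        for c in range(0, w, PATCH):
            yield [frame[r + i][c + j] for i in range(PATCH) for j in range(PATCH)]


def tokenize(frame):
    """Video tokenizer stand-in: map each patch to the id of its nearest codebook vector."""
    tokens = []
    for p in _patches(frame):
        dists = [sum((x - y) ** 2 for x, y in zip(p, code)) for code in CODEBOOK]
        tokens.append(dists.index(min(dists)))
    return tokens


def detokenize(tokens, h, w):
    """Rebuild a frame by pasting each token's codebook vector back into its patch slot."""
    frame = [[0.0] * w for _ in range(h)]
    per_row = w // PATCH
    for k, t in enumerate(tokens):
        r, c = (k // per_row) * PATCH, (k % per_row) * PATCH
        for i in range(PATCH):
            for j in range(PATCH):
                frame[r + i][c + j] = CODEBOOK[t][i * PATCH + j]
    return frame


def dynamics(tokens, action):
    """Dynamics-model placeholder: a fixed toy rule in which each action shifts every token id.
    Genie's real dynamics model is learned; this only illustrates the interface."""
    return [(t + action) % CODEBOOK_SIZE for t in tokens]


def infer_action(prev_tokens, next_tokens):
    """Latent-action stand-in: pick the discrete action that best explains a frame transition."""
    errors = [
        sum(a != b for a, b in zip(dynamics(prev_tokens, act), next_tokens))
        for act in range(NUM_ACTIONS)
    ]
    return errors.index(min(errors))


def play(image, actions):
    """Prompt with one image, then step the world forward once per chosen action."""
    tokens = tokenize(image)
    frames = [tokens]
    for act in actions:
        tokens = dynamics(tokens, act)
        frames.append(tokens)
    return frames
```

For a 16x16 image, `play(image, [3, 0, 7])` returns four token grids of 16 tokens each, and running `infer_action` on each consecutive pair recovers the actions `[3, 0, 7]`. The design mirrors the split described above: tokens are the shared currency, the action model explains transitions, and the dynamics model generates them.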
“Rather than adding inductive biases, we focus on scale,” Google DeepMind researcher Tim Rocktäschel said on Twitter. “We use a dataset of over 200k hours of videos from 2D platformers and train an 11B world model… [then] in an unsupervised way, Genie learns diverse latent actions that control characters in a consistent manner.”
Genie, Rocktäschel continued, can also convert other media types into games. According to the accompanying Google DeepMind research paper, Genie can be prompted to generate a diverse range of action-controllable virtual worlds from many kinds of inputs.
“Our model can convert any image into a playable 2D world,” Rocktäschel said. “Genie can bring to life human-designed creations such as sketches, for example, beautiful artwork from Seneca and Caspian, two of the youngest ever world creators.”
While Genie is proficient at creating 2D worlds from text or images, Rocktäschel said the model can do more than build side-scrollers, and could potentially be used to teach other AI models, or “agents,” about 3D worlds.
“We also train a Genie on robotics data (RT-1) without actions and demonstrate that we can learn an action controllable simulator there too,” he said. “We think this is a promising step towards general world models for AGI.”
Artificial general intelligence (AGI) refers to an AI that can understand and apply learned knowledge across a wide range of tasks, much like a human.
Advances in AI technology, hardware, and datasets, Google DeepMind said, have led to the ability to create coherent, conversational language and “crisp and aesthetically pleasing” images.
Google DeepMind said the dataset for Genie was generated by filtering publicly available internet videos, keeping those whose titles included keywords like “speedrun” or “playthrough,” while excluding those with words like “movie” or “unboxing.”
“When selecting keywords, we manually spot checked results to check that they typically produced 2D platformer gameplay videos which are not outnumbered by other sorts of videos which happen to share similar keywords,” the researchers continued.
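The title filter described here can be sketched as a simple keyword rule. This is an illustration under stated assumptions, not the researchers' pipeline: it uses only the four keywords quoted in this article (the full keyword lists are not given), and case-insensitive substring matching on titles is an assumed detail. The manual spot check the researchers describe remains a human step outside the code.

```python
# Only the keywords quoted in the article; the real lists are assumed to be longer.
INCLUDE = ("speedrun", "playthrough")
EXCLUDE = ("movie", "unboxing")


def keep_video(title: str) -> bool:
    """Keep a video if its title mentions an include keyword and no exclude keyword.
    Case-insensitive substring matching is an assumption for illustration."""
    t = title.lower()
    return any(k in t for k in INCLUDE) and not any(k in t for k in EXCLUDE)


def filter_titles(titles):
    """Return the titles that pass the keyword filter, preserving their order."""
    return [title for title in titles if keep_video(title)]
```

For example, `filter_titles(["A speedrun", "An unboxing", "Playthrough part 2"])` keeps the first and last titles. A rule this blunt is exactly why spot checks matter: substring matching will also admit titles such as "speedrunner interview," which share a keyword without being gameplay footage.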
“With Genie, our future AI agents can be trained in a never-ending curriculum of new, generated worlds,” Google DeepMind said. “In our paper, we have a proof of concept that the latent actions learned by Genie can transfer to real human-designed environments, but this is just scratching the surface of what may be possible in the future.”
Thanks in no small part to the launch of OpenAI’s GPT-4 last year, technology companies including Google, Microsoft, and Amazon have invested heavily in generative AI. Earlier this month, Google rebranded its Bard chatbot as Gemini and launched a subscription-based version of its Gemini AI model.
Representatives for Google and Google DeepMind did not immediately respond to Decrypt’s request for comment.
Edited by Ryan Ozawa.