Google has introduced another generative artificial intelligence (AI) model, this one capable of creating an endless variety of 2D platformer video games. Genie is being touted as an action-controllable world model that was trained without supervision on video game footage. It uses predictive analysis to generate video game levels and can also control a playable character and determine its movements. Interestingly, OpenAI also introduced a world model called Sora earlier this month, which can generate hyperrealistic videos of up to one minute in length.

The announcement was made by Tim Rocktäschel, Open-Endedness Team Lead at Google DeepMind, in a series of posts on X (formerly known as Twitter). He said, “We introduce Genie, a foundation world model trained exclusively from Internet videos that can generate an endless variety of action-controllable 2D worlds given image prompts.” Genie is unique in that it generates only one specific kind of output, and it is also the only video game-generating model that has been publicly announced so far.

Google’s Genie AI model is not open to the public yet and exists only as a research model for now. This is why its user-centric functionalities are not yet known. It can generate video game levels from images, but whether it can take text prompts or even video prompts is not known. A preprint version of the paper, which highlights its technical aspects, has been posted online. The AI model was trained unsupervised on 200,000 hours of video game footage and contains 11 billion parameters. The architecture of the model has three parts: a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model.

How Google Genie Works

To simplify, the spatiotemporal video tokenizer takes video game footage and breaks it down into smaller chunks of data, known as tokens, that can be consumed by the foundation model. “Spatiotemporal” means that the data is broken down in both time and space (for example, a video is split into 2-second clips, and each frame is also split into multiple pieces).
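As a rough illustration of splitting footage in both time and space, here is a minimal Python sketch. It is not Genie's tokenizer: the function name `split_spatiotemporal`, the parameters `clip_len` and `patch`, and the toy 4x4 video are all hypothetical, and a real tokenizer would go on to turn each chunk into tokens with a learned model rather than stop at raw pixel blocks.

```python
def split_spatiotemporal(video, clip_len, patch):
    """Split a video into chunks cut along both time and space.

    `video` is a list of frames; each frame is a list of pixel rows.
    Returns a dict keyed by (clip index, patch row, patch column).
    All names here are illustrative assumptions, not Genie's API.
    """
    chunks = {}
    height, width = len(video[0]), len(video[0][0])
    for t0 in range(0, len(video), clip_len):        # cut in time
        for r0 in range(0, height, patch):           # cut in space (rows)
            for c0 in range(0, width, patch):        # cut in space (columns)
                chunk = [
                    [row[c0:c0 + patch] for row in frame[r0:r0 + patch]]
                    for frame in video[t0:t0 + clip_len]
                ]
                chunks[(t0 // clip_len, r0 // patch, c0 // patch)] = chunk
    return chunks


# Toy video: 4 frames of 4x4 pixels; each pixel value encodes its position.
video = [[[t * 100 + r * 10 + c for c in range(4)] for r in range(4)]
         for t in range(4)]

chunks = split_spatiotemporal(video, clip_len=2, patch=2)
print(len(chunks))        # 2 clips x 2 patch rows x 2 patch columns = 8
print(chunks[(0, 0, 0)])  # top-left 2x2 patch across the first two frames
```

Each key identifies where a chunk sits in time and on screen, which is the sense in which the breakdown is spatiotemporal.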

The autoregressive dynamics model comes next. Autoregressive models essentially predict what comes next based on what has happened so far, and a dynamics model is responsible for understanding how things change and move over time. So this part is where the predictive analysis begins. The final component is the latent action model. This is where the AI understands how the playable character moves and traverses the video game world.
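The interplay between these two components can be sketched in a few lines of Python. This is a toy under stated assumptions, not Genie's method: `toy_dynamics` is a hand-written stand-in for a learned dynamics model, the world is reduced to a single character position, and the three latent action IDs and their meanings are invented for illustration.

```python
def toy_dynamics(history, action):
    """Hypothetical stand-in for a learned dynamics model.

    Predicts the next character position from the past positions and a
    latent action ID. The action meanings below are assumptions.
    """
    moves = {0: -1, 1: +1, 2: 0}  # 0 = left, 1 = right, 2 = stay
    return history[-1] + moves[action]


def rollout(start, actions, step=toy_dynamics):
    """Autoregressive rollout: every prediction is appended to the
    history and becomes part of the input for the next prediction."""
    history = [start]
    for action in actions:
        history.append(step(history, action))
    return history


print(rollout(0, [1, 1, 0, 2]))  # [0, 1, 2, 1, 1]
```

The loop in `rollout` is the autoregressive part, since each new state is generated from the states before it, while the `action` argument plays the role of the latent action that steers what happens next.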

“Genie’s learned latent action space is not just diverse and consistent, but also interpretable. After a few turns, humans generally figure out a mapping to semantically meaningful actions (like going left, right, jumping etc.),” said Rocktäschel. This part is important because it highlights that the main problem this AI model solves is not just generating 2D video game levels, but also understanding how basic movements occur, and how that information can be used to navigate real-world terrains.

Highlighting this, he added, “Genie’s model is general and not constrained to 2D. We also train a Genie on robotics data (RT-1) without actions, and demonstrate that we can learn an action controllable simulator there too. We think this is a promising step towards general world models for AGI.”
