Google unveils revolutionary AI model creating video games from text, images: Here’s how it works

Google announces first-ever AI model that can create video games from text, images; here’s how it works - Times of India

Google DeepMind Introduces Genie: The AI Model that Creates Virtual Worlds

What is Google Genie and how it works

Google DeepMind researchers have announced a new artificial intelligence model that can generate virtual worlds from a text or image prompt. The model, named Genie, lets users interact with and play in the worlds it creates. The tech giant says the model was trained on gameplay and other videos found online and is currently only a research preview. The games it creates also appear to be limited to 2D platformers.

In an official blog post, Google DeepMind notes that the model can “generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches.”

The research paper ‘Genie: Generative Interactive Environments’ states that Genie is the first generative interactive environment trained in an unsupervised manner from unlabelled internet videos.

Genie has 11 billion parameters. The model is made up of a spatiotemporal video tokeniser, an autoregressive dynamics model and a simple, scalable latent action model.

Together, these components let users act in the generated environments on a frame-by-frame basis, even though the model was trained without ground-truth action labels or any other domain-specific requirements.
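To make the frame-by-frame idea concrete, here is a minimal toy sketch of how a tokeniser, a dynamics model and a small set of latent actions could fit together in an interaction loop. It is not DeepMind's implementation: every name (`tokenise`, `dynamics_step`, `play`), the constants, and the arithmetic standing in for the learned networks are assumptions made purely for illustration.

```python
from typing import List

# All constants are illustrative assumptions, not values from the article.
NUM_LATENT_ACTIONS = 8   # size of the discrete latent action set
VOCAB_SIZE = 16          # number of discrete tokens in the toy tokeniser

Frame = List[int]        # toy "frame": a short list of pixel values in [0, 255]
Tokens = List[int]


def tokenise(frame: Frame) -> Tokens:
    """Toy stand-in for the video tokeniser: bucket each pixel into a token."""
    return [pixel * VOCAB_SIZE // 256 for pixel in frame]


def detokenise(tokens: Tokens) -> Frame:
    """Map tokens back to (coarse) pixel values."""
    return [token * 256 // VOCAB_SIZE for token in tokens]


def dynamics_step(history: List[Tokens], action: int) -> Tokens:
    """Toy stand-in for the dynamics model.

    A real model would be a learned network conditioned on the token
    history and the latent action; here the action simply shifts the
    most recent frame's tokens.
    """
    last = history[-1]
    return [(token + action) % VOCAB_SIZE for token in last]


def play(prompt_frame: Frame, actions: List[int]) -> List[Frame]:
    """Generate one new frame per user action, starting from a prompt frame."""
    history = [tokenise(prompt_frame)]
    frames = [prompt_frame]
    for action in actions:
        if not 0 <= action < NUM_LATENT_ACTIONS:
            raise ValueError(f"unknown latent action: {action}")
        next_tokens = dynamics_step(history, action)
        history.append(next_tokens)
        frames.append(detokenise(next_tokens))
    return frames


if __name__ == "__main__":
    for frame in play([0, 64, 128, 192], actions=[1, 0, 2]):
        print(frame)
```

The point of the sketch is the control flow: the prompt image is tokenised once, and each user action then produces exactly one new frame that is added to the history for the next step.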

Despite being trained on video-only data, Genie can be prompted to generate a diverse set of interactive and controllable environments. Unlike the many generative AI models that produce creative content in the form of language, images and even videos, Genie can create playable environments from a single image prompt.

Google DeepMind researchers also claim that Genie can be prompted with images it has never seen, including real-world photographs and sketches. This will allow people to interact with their imagined virtual worlds. The researchers describe a model of this kind as a foundation world model.

The research paper also notes that the researchers focused on videos of 2D platformer games and robotics. However, they say the method used to train Genie is general, should work for any type of domain and can scale to ever larger internet datasets.

Genie can also learn and reproduce the controls for in-game characters exclusively from internet videos. This is significant because internet videos carry no labels indicating which action is being performed, or even which part of the image is being controlled.
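One way to picture learning controls without labels is to group the changes between consecutive frames into a small set of discrete codes and treat each code as an "action". The toy sketch below does this with nearest-neighbour quantisation against a fixed codebook. The article does not describe Genie's actual mechanism, so the technique, the hand-written `CODEBOOK`, and the reduction of each frame to a single character position are all assumptions for illustration; a real system would learn its codes from pixels.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]

# Hypothetical codebook of motion prototypes (dx, dy). In a learned model
# these would be trained; here they are hard-coded for illustration.
CODEBOOK: List[Vec] = [
    (0.0, 0.0),    # code 0: no movement
    (1.0, 0.0),    # code 1: move right
    (-1.0, 0.0),   # code 2: move left
    (0.0, 1.0),    # code 3: move up
]


def nearest_code(change: Vec, codebook: Sequence[Vec] = CODEBOOK) -> int:
    """Return the index of the codebook entry closest to the observed change."""
    def squared_distance(a: Vec, b: Vec) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)),
               key=lambda i: squared_distance(change, codebook[i]))


def infer_latent_actions(positions: Sequence[Vec]) -> List[int]:
    """Assign a discrete code to every pair of consecutive frames.

    Each "frame" is reduced to the (x, y) position of the character,
    which is a strong simplification of real video.
    """
    actions = []
    for prev, cur in zip(positions, positions[1:]):
        change = (cur[0] - prev[0], cur[1] - prev[1])
        actions.append(nearest_code(change))
    return actions


if __name__ == "__main__":
    track = [(0.0, 0.0), (0.9, 0.1), (0.9, 1.0), (0.0, 1.1), (0.0, 1.1)]
    print(infer_latent_actions(track))
```

No action labels are supplied anywhere in this sketch: the codes come only from how the observed frames differ, which mirrors the situation with unlabelled internet videos.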