Artificial Intelligence (AI) is transforming the realms of imagination and reality alike. The recent introduction of Sora by OpenAI, a text-to-video AI generator, is a testament to this.
The latest in this line of innovation is ‘Genie’, an interactive 2D video game creation model unveiled by Google DeepMind. Genie can generate playable video games from a single image prompt or text description.
This project, developed by Google DeepMind’s Open-Endedness Team, has the potential to revolutionise entertainment, game development, and even robotics. Genie, described as a ‘world model’, is trained on a large dataset of 200,000 hours of unlabelled video footage, primarily from 2D platformer games. Unlike models that depend on action-labelled data, Genie infers the actions and interactions directly from the videos themselves.
Genie comprises three core components: the Video Tokenizer, the Latent Action Model, and the Dynamics Model. The Video Tokenizer processes video data into manageable units, or ‘tokens’. The Latent Action Model analyses transitions between consecutive frames in the videos, identifying eight fundamental actions.
The Dynamics Model predicts the next frame in the video sequence from the current state of the game world and the chosen action. Repeating this step frame by frame creates the illusion of an interactive game experience.
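To make the division of labour between the three components concrete, here is a toy sketch in Python. It is not Genie's implementation: the real components are learned neural networks, whereas this stand-in uses hand-written rules. The function names, the pixel quantisation, and the idea of treating each of the eight latent actions as a one-cell shift of the frame are all assumptions made purely for illustration.

```python
from typing import List

Tokens = List[List[int]]

NUM_ACTIONS = 8  # the article's figure for the number of latent actions

# Hypothetical action "codebook": each latent action is a (dx, dy) shift.
# In Genie the meaning of each action is learned, not hand-assigned.
ACTION_SHIFTS = [(1, 0), (-1, 0), (0, 1), (0, -1),
                 (1, 1), (1, -1), (-1, 1), (-1, -1)]


def tokenize(frame: List[List[int]], levels: int = 4) -> Tokens:
    """Toy video tokenizer: quantise 0..255 pixels into `levels` discrete tokens."""
    step = 256 // levels
    return [[min(pixel // step, levels - 1) for pixel in row] for row in frame]


def shift(tokens: Tokens, dx: int, dy: int) -> Tokens:
    """Move every token by (dx, dy), wrapping around at the edges."""
    h, w = len(tokens), len(tokens[0])
    return [[tokens[(y - dy) % h][(x - dx) % w] for x in range(w)]
            for y in range(h)]


def predict_next(tokens: Tokens, action: int) -> Tokens:
    """Toy dynamics model: next frame tokens from current tokens plus an action."""
    dx, dy = ACTION_SHIFTS[action]
    return shift(tokens, dx, dy)


def infer_latent_action(prev: Tokens, nxt: Tokens) -> int:
    """Toy latent action model: pick the action that best explains prev -> nxt."""
    def mismatch(action: int) -> int:
        predicted = predict_next(prev, action)
        return sum(p != n
                   for pred_row, next_row in zip(predicted, nxt)
                   for p, n in zip(pred_row, next_row))
    return min(range(NUM_ACTIONS), key=mismatch)


def play(first_frame: List[List[int]], actions: List[int]) -> List[Tokens]:
    """Roll out a 'game': tokenise the prompt frame, then step once per action."""
    tokens = tokenize(first_frame)
    rollout = [tokens]
    for action in actions:
        tokens = predict_next(tokens, action)
        rollout.append(tokens)
    return rollout
```

Calling `play(frame, [0, 2])` on a small grid with one bright pixel moves that pixel one cell right and then one cell down, which mirrors, in a very reduced form, how a player's inputs steer the frames that get generated.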
Genie is still under development and comes with limitations, including:
- Low frame rate: Currently, Genie can only generate games at about one frame per second (1 FPS), which makes play choppy and limits the visual experience.
- Research-only access: As of now, Genie is not available for public use and remains a research project within Google DeepMind.
- Ethical considerations: As with any powerful technology, the potential misuse of Genie needs careful consideration. Google is working on the ethical aspects to ensure responsible development and implementation.
However, once Genie is released, it is expected to have a major impact on creativity across numerous domains. Its ability to generate interactive worlds from minimal input could open the door to exciting possibilities in the future of entertainment, education, and beyond.