OpenAI’s premier video generator Sora shocked the world on Thursday with worryingly realistic videos, but the model seems even better at creating video game worlds. Sora has an uncanny ability to recreate Minecraft and “simulate digital worlds,” according to a technical paper from OpenAI published late last night, first reported by TechCrunch.
“Sora is also able to simulate artificial processes–one example is video games,” says OpenAI in the paper. “Sora can simultaneously control the player in Minecraft with a basic policy while also rendering the world and its dynamics in high fidelity. These capabilities can be elicited zero-shot by prompting Sora with captions mentioning Minecraft.”
OpenAI’s advanced Sora model has the potential to disrupt the video game world because it’s fundamentally different from other AI video generators. Nvidia Senior Researcher Dr. Jim Fan describes Sora as more like a “data-driven physics engine” than an image generator. Sora performs thousands of calculations to predict how an object interacts with its environment. This creates a “world model,” according to Fan, which makes it perfect for generating video games.
OpenAI states these are just early tests, but they show great promise for AI simulators of physical and digital worlds. The company notes several limitations, including that Sora does not accurately model the physics of many basic interactions. This has resulted in some very strange videos from Sora, and these quirks surely need to be worked out before the model creates any video game.
However, Sora has already solved several problems that other video generators haven’t. Sora has shown successful “object permanence,” meaning an object can leave the frame and reappear in the same place. Sora also has much better dynamic camera motion than other video generators.
Some have speculated that OpenAI’s Sora was trained on a video game engine, specifically Unreal Engine 5 (UE5) from Epic Games. While Sora almost definitely doesn’t use a video game engine to create its mesmerizing scenes, it’s possible digital worlds were used to help train Sora’s underlying model. OpenAI has not confirmed these rumors, but UE5 was used to create games like Fortnite, Remnant 2, and Tekken 8. Certain Sora demos do look oddly similar to existing video game worlds.
The question stands: what was Sora trained on? OpenAI is facing a lawsuit for training GPT-2 and GPT-3 on New York Times articles without payment. Sora will likely disrupt video games much as OpenAI’s language models have affected journalism, so proper attribution will be a key factor moving forward.
We don’t know what GPT model was used to build Sora, and OpenAI hasn’t publicly disclosed what data it uses to train GPT-4. However, GPT-2 was largely trained on OpenAI’s WebText dataset. WebText scraped Netflix over 42,000 times according to data made public on GitHub. Crunchyroll, Hulu, and “YouTube doubler” were also listed as video sources used to train OpenAI’s model. However, Sora likely required more data than just this.
Sora could spell disaster for the world of video game developers, but it also could significantly reduce the barrier to entry. Game developers have already been hammered by layoffs in the last year. Regardless, this AI video generator will likely disrupt the gaming world, much like AI has changed every other field it’s touched.