A Team of Researchers Working on China’s Text-to-Video Model
A team of researchers is making a fresh push to develop China’s answer to OpenAI’s text-to-video model Sora, in the latest sign of the country’s generative artificial intelligence (AI) frenzy.
Open-Sora Plan Launched by Peking University and Rabbitpre
Professors from China’s Peking University and Shenzhen-based AI company Rabbitpre on March 1 jointly launched an Open-Sora plan with a page on GitHub, with a mission to “reproduce OpenAI’s video generation model”.
The Open-Sora plan aims to reproduce a “simple and scalable” version of OpenAI’s video generation model with help from the open-source community. OpenAI started a global AI frenzy in late 2022 with the launch of its generative chatbot ChatGPT.
Recent Developments in AI
According to the project’s GitHub page, the team has developed a three-part framework and showcased four demos of reconstructed videos in varying resolutions and aspect ratios, ranging from three seconds to 24 seconds.
The team’s further tasks include fine-tuning the technology to generate higher-resolution video, as well as training with more data and more graphics processing units (GPUs).
Since OpenAI released demo videos generated by Sora earlier in February, Chinese business and technology communities have expressed mixed feelings about Microsoft-backed OpenAI’s latest progress.
While some companies have shown strong interest in using the text-to-video AI model, others have expressed concerns about China’s ability to compete in this area, as the US continues to tighten trade restrictions on the export of advanced chips of US origin and related technology to China.
Tencent AI in January released an open-source video generation and editing toolbox called VideoCrafter2, which is capable of generating videos from text. It is an updated version of VideoCrafter1, released in October 2023, but is limited to videos lasting only two seconds.
Around the same time, ByteDance released the MagicVideo-V2 text-to-video model. According to the project’s GitHub page, it combines a “text-to-image model, video motion generator, reference image embedding module and frame interpolation module into an end-to-end video generation pipeline”.
ModelScope, from the Damo Vision Intelligence Lab under Alibaba Group Holding, owner of the South China Morning Post, has also launched a text-to-video generation model. It currently supports only English input, and video output is limited to two seconds.
The Team Behind Open-Sora
The Open-Sora plan was launched by “Rabbitpre AIGC Joint Lab”, which is a collaboration between Peking University Shenzhen Graduate School and Rabbitpre, founded in June 2023. The joint lab is dedicated to research in the field of AI-produced content.
The Open-Sora project lists 13 members as its initial team. They include assistant professor Yuan Li from PKU’s School of Electrical and Computer Engineering, and professor Tian Yonghong from the School of Computer Science. The list also includes Rabbitpre’s founder and CEO, Dong Shaoling, and the company’s chief technology officer, Zhou Xing.