Home Ai Tech Witness the Astonishing Results of OpenAI’s Text-to-Video Model Unveiling

Ai Tech

Witness the Astonishing Results of OpenAI’s Text-to-Video Model Unveiling

02/20/2024

236

OpenAI video gnerator still frame — Still frame from a video generated by Sora. OpenAI’s prompt was, “The camera directly faces colorful buildings in burano italy. An adorable dalmation looks through a window on a building on the ground floor. Many people are walking and cycling along the canal streets in front of the buildings.”

OpenAI

Open AI already has market-leading AI models in image and text generation with DALL-E 3 and ChatGPT, respectively. Now, the company is coming for the text-to-video generation space, too, with a brand-new model.

Also: The best AI image generators of 2024: Tested and reviewed

On Thursday, OpenAI unveiled Sora, its text-to-video model that can generate videos up to a minute long with impressive quality and detail, as seen in the demo video below:

Sora can tackle complex scenes, including multiple characters, specific types of motion, and great detail, because of the model’s deep understanding of language, prompts, and how the subjects exist in the world, according to OpenAI.

From watching different demo videos, you can see that OpenAI has managed to tackle two big issues in the video-generating space: continuity and longevity:

Prompt: “A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. she wears a black leather jacket, a long red dress, and black boots, and carries a black purse. she wears sunglasses and red lipstick. she walks confidently and casually.… pic.twitter.com/cjIdgYFaWq

— OpenAI (@OpenAI) February 15, 2024

AI-generated videos are often choppy and distorted, making it clear to the audience where every frame ends and begins. For example, Runaway AI released its most advanced text-to-video model, Gen-2, in March. As seen below, the clips don’t quite compare to those of OpenAI’s model today:

OpenAI’s model, on the other hand, can generate fluid video, making each generated clip look like it was lifted from a Hollywood-produced film.

Also: How to use ChatGPT

OpenAI says Sora is a diffusion model that’s able to produce high-quality output by using a transformer architecture similar to the GPT models, as well as past research from DALL-E and GPT models. In addition to generating video from text, Sora can generate video from a still image or fill in missing frames from videos:

Prompt: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.” pic.twitter.com/0JzpwPUGPB

— OpenAI (@OpenAI) February 15, 2024

Despite showing all of its advancements, OpenAI also addresses the model’s weaknesses, claiming it can sometimes struggle with “simulating the physics of a complex scene, and may not understand specific instances of cause and effect.” The model could also confuse the spatial details of a prompt.

The model is becoming available to red teamers first to asses the model’s risks, and to a select number of creatives, such as visual artists, designers, and filmmakers, to collect feedback on how to improve the model to meet their needs.

Also: I tried Microsoft Copilot’s new AI image-generating feature, and it solves a real problem

It seems like we are entering a new era in which companies will shift focus to researching, developing, and launching capable AI text-to-video generators. Just two weeks ago, Google Research published a research paper on Lumiere, a text-to-video diffusion model that can also create highly realistic video.

Witness the Astonishing Results of OpenAI’s Text-to-Video Model Unveiling

LEAVE A REPLY Cancel reply

Follow us on Instagram @ainewsera

POPULAR POSTS

Google is increasing the features and availability of its AI-powered search.

Google’s new AI model Gemini: What you need to know

Adding a New Credential to Your LinkedIn Profile: A Step-by-Step Guide...

POPULAR CATEGORY