The company behind ChatGPT, OpenAI, has taken its AI game a notch higher by unveiling ‘Sora’, a generative AI model that can take a short text description and turn it into a vividly realistic video clip.
OpenAI said Sora can generate 1080p movie-like scenes with multiple characters, different types of motion, and background details when given a brief or detailed description or a still image. Sora can also “extend” existing video clips — doing its best to fill in the missing details, the company claims.
With Sora, OpenAI is following giants like Google and Meta, which launched their own video-generating tools earlier.
Sora put to the test
While announcing the launch of the tool in a post on X, OpenAI’s CEO, Sam Altman, asked his followers to put Sora to the test by submitting prompts.
- “We’d like to show you what Sora can do, please reply with captions for videos you’d like to see and we’ll start making some,” he said.
This elicited responses from X users, who provided a variety of prompts from which several videos were created.
More insights
In a blog post, OpenAI said it is teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems requiring real-world interaction.
- “Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt. Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.
- “The model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions. Sora can also create multiple shots within a single generated video that accurately portrays characters and visual style,” OpenAI said.
Sora’s weaknesses
OpenAI, however, admitted that the current model of Sora has its weaknesses. According to the company, Sora may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark.
- “The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory,” it said.
Sora is not yet generally available. OpenAI said it is currently granting access to a number of visual artists, designers, and filmmakers to gather feedback on how to advance the model to be most helpful for creative professionals.