OpenAI, a San Francisco-based artificial intelligence company, has unveiled a new AI tool called Sora that can generate highly realistic videos up to 60 seconds long.
Sora is a generative AI model that creates videos from textual prompts. It interprets a user’s prompt, expands it into a more detailed set of instructions, and then uses an AI model trained on video and images to create the new video.
The model is capable of creating complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background.
Sora can also create multiple shots within a single generated video while keeping characters and visual style consistent.
The current model has weaknesses, however: it may struggle to simulate the physics of a complex scene accurately, and it may not understand specific instances of cause and effect.
The Implications of Sora’s Realistic Video Generation Capabilities
Sora is a diffusion model: it starts with a video that looks like static noise and gradually transforms it into the final result by removing the noise step by step.
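The step-by-step denoising described above can be illustrated with a toy sketch. This is not Sora's implementation. A real diffusion model uses a trained neural network to predict the noise, whereas the hypothetical function `toy_reverse_diffusion` below stands in for that network with an oracle that already knows the clean target. The function name, the step count, and the linear schedule are all assumptions chosen for illustration.

```python
import random


def toy_reverse_diffusion(target, steps=10, seed=0):
    """Turn random noise into `target` by removing noise a little at a time.

    Assumption: the "noise prediction" is an oracle that compares the sample
    with the known clean target. A real diffusion model has no such target
    and relies on a learned network instead.
    """
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one value per element of the target.
    sample = [rng.gauss(0.0, 1.0) for _ in target]
    errors = []
    for step in range(steps):
        remaining = steps - step
        # Stand-in for the learned network: what part of the sample is noise?
        predicted_noise = [s - t for s, t in zip(sample, target)]
        # Remove an equal share of the remaining noise at each step.
        sample = [s - n / remaining for s, n in zip(sample, predicted_noise)]
        # Record how far the sample still is from the clean target.
        errors.append(max(abs(s - t) for s, t in zip(sample, target)))
    return sample, errors


if __name__ == "__main__":
    clean = [0.0, 0.5, 1.0]
    result, history = toy_reverse_diffusion(clean)
    for step, err in enumerate(history, start=1):
        print(f"step {step:2d}: max distance from target = {err:.4f}")
```

Running the script prints a distance that shrinks at every step, and the final sample matches the target to within floating-point rounding. The point is only the shape of the loop: begin with noise, repeatedly estimate it, and subtract a portion.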
OpenAI says that by unifying how it represents data, it can train diffusion transformers on a wider range of visual data than was possible before, spanning different durations, resolutions, and aspect ratios.
OpenAI highlights one hard problem it says Sora solves: keeping a subject consistent even when it temporarily goes out of view, and preserving the visual style. The model operates on many frames at a time, which gives it some ability to anticipate what will happen and plan for it.
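OpenAI has described this unified representation as breaking videos and images into small patches. The sketch below shows, under simplifying assumptions, why that helps: clips of different lengths and aspect ratios become sequences of identically sized patches, differing only in how many patches they contain. The functions `to_patches` and `blank_video`, the nested-list video format, and the patch sizes are hypothetical choices for illustration, not details of Sora itself.

```python
def blank_video(frames, height, width):
    """Build a placeholder video as nested lists: [frame][row][column]."""
    return [[[0.0] * width for _ in range(height)] for _ in range(frames)]


def to_patches(video, pt, ph, pw):
    """Cut a video into flattened blocks of pt frames x ph rows x pw columns.

    Assumption: every dimension divides evenly by the patch size. A real
    system would need padding or cropping to handle leftovers.
    """
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    if frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, frames, pt):
        for y0 in range(0, height, ph):
            for x0 in range(0, width, pw):
                # Flatten one space-time block into a single list of values.
                patch = [
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ]
                patches.append(patch)
    return patches


if __name__ == "__main__":
    short_square = to_patches(blank_video(4, 8, 8), pt=2, ph=4, pw=4)
    long_wide = to_patches(blank_video(8, 8, 16), pt=2, ph=4, pw=4)
    print(len(short_square), len(short_square[0]))
    print(len(long_wide), len(long_wide[0]))
```

Both clips yield patches of the same size, so a single model can consume either one as a sequence. Only the sequence length changes with duration, resolution, and aspect ratio.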
Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.
OpenAI showed several impressive videos created using Sora, including historical footage of California during the gold rush, a stylish woman walking down a Tokyo street, golden retrievers playing in the snow, and others.
However, some generated videos show physically implausible motion. In one, a man walks on a conveyor belt in the wrong direction; in another, sand morphs into a chair and moves in counter-intuitive ways.
Unleashing the Art of Deception: OpenAI’s Sora Model and the Rise of AI-Generated Deepfakes
OpenAI is working with red teamers (domain experts in areas like misinformation, hateful content, and bias) who will adversarially test the model.
OpenAI is also building tools to help detect misleading content, such as a detection classifier that can tell when a video was generated by AI. Sora will not be released to the public while it undergoes safety testing.
OpenAI is granting access to a number of visual artists, designers, and filmmakers to gain feedback on how to advance the model to be most helpful for creative professionals.
Sora is not the first text-to-video generation AI model to enter the market. Other offerings come from Runway, Pika, and Stability AI, along with Google's Lumiere, among others.
However, AI experts and analysts said the length and quality of the Sora videos went beyond what has been seen up to now.
The quality of AI-generated images, audio, and video has rapidly increased over the past year, with companies like OpenAI, Google, Meta, and Stability AI racing to make more capable tools and find ways to sell them.
The Potential Impact of Generative AI on the Future of Entertainment Jobs
The entertainment industry is grappling with AI, and OpenAI’s new tool has the potential to displace huge swaths of labor.
The system can seemingly produce videos of complex scenes with multiple characters, an array of different types of shots, and mostly accurate details of subjects in relation to their backgrounds.
Over the next three years, nearly 204,000 positions will be adversely affected, according to a January study that surveyed 300 leaders across Hollywood.
Sound engineers, voice actors, and concept artists stood at the forefront of that displacement, according to the study. Visual effects and other post-production work were also cited as particularly vulnerable.
Final Thoughts
OpenAI’s Sora is a new AI tool that can generate highly realistic videos up to 60 seconds long. The model is capable of creating complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background.
However, the current model has weaknesses: it may struggle to simulate the physics of a complex scene accurately, and it may not understand specific instances of cause and effect.
Sora is not the first text-to-video generation AI model to enter the market, but the length and quality of the Sora videos went beyond what has been seen up to now.
The entertainment industry is grappling with AI, and OpenAI’s new tool has the potential to displace huge swaths of labor.