Sora: Will OpenAI’s First Video Model Reshape the Video Sector?

The Next Wave of Artificial Intelligence Disruption is Here

Following the ChatGPT large language model, OpenAI has introduced Sora, a new large-scale model for generating video.

Over the past year, AI has successfully conquered the fields of text and images, but the video sector has lagged behind, despite some progress being made.

Starting now, it seems that the phrase “seeing is believing” might become a thing of the past.

Sam Altman, co-founder and CEO of OpenAI, has been creating a buzz on the social platform X by showcasing the stunning videos generated by Sora. The demos caused quite a stir, with one enthusiast on Weibo exclaiming: “The quality and camera work of these generated videos just blew my mind…”

From Text to Hyper-Realistic Videos

Sora is capable of producing videos up to 60 seconds long, featuring highly detailed backgrounds, complex multi-angle shots, and emotionally engaging characters.

For example, given a prompt like “A stylish lady strolling through the Tokyo streets illuminated by warm neon lights and vibrant city signs”, Sora can generate a video showing the lady in a black leather jacket and red skirt walking on a neon-lit street. The video includes smooth transitions and multiple shots, from a wide street view to close-ups of the lady’s facial expressions, capturing the reflections of neon lights on the wet pavement.

Screenshot of a video generated by Sora, Image Source: OpenAI official website

Input prompt: “Reflections outside the train window as it passes through the outskirts of Tokyo”.

Sora can even produce a snippet resembling a Hollywood blockbuster trailer:

A 60-second single-shot video like this undoubtedly challenges people’s perceptions of AI’s video-creation capabilities. While other AI tools like Runway Gen-2 and Pika struggle to maintain coherence for even 4 seconds, OpenAI’s ability to produce a stable 60-second HD video marks a major milestone.

Some engineers humorously remark, “I might just lose my job…”

Not Without Flaws

Despite OpenAI having cutting-edge technology, they admit that the model isn’t flawless. They mention:

“Sora may struggle to accurately simulate the physical principles of complex scenes and might not understand specific instances of causality. For example, a person might take a bite of a cookie, but then the cookie might disappear. The model could also confuse spatial details of the prompt, like left and right, and struggle to accurately describe events unfolding over time, such as following a specific camera trajectory.”

OpenAI positions Sora as a research preview, declining to disclose the data used to train the model (beyond noting that it included around 10,000 hours of “high-quality” video) and withholding Sora from general availability due to the potential for misuse. OpenAI rightly points out that malicious actors could exploit models like Sora in various ways.

They state that they are collaborating with experts to explore vulnerabilities in the model and building tools to detect whether videos are generated by Sora.

The company also mentions that if the model is integrated into a public-facing product, it will ensure that generated outputs carry provenance metadata identifying their source.

OpenAI writes, “We will collaborate with policymakers, educators, and artists worldwide to understand their concerns and identify positive use cases for this new technology… Even with extensive research and testing, we cannot predict all the beneficial ways people will use our technology nor all the ways it might be misused.”

They believe that learning from real-world usage is essential in creating and releasing more advanced technology, ensuring that AI systems become safer over time.
