OpenAI Launches Sora, New Text-to-Video Model

Sora can generate videos up to one minute long while maintaining consistent visual quality and fidelity to user prompts. The model employs a diffusion approach, transforming static noise into coherent videos over multiple steps.

After venturing into text-to-image generation with DALL·E 3, OpenAI's latest release, Sora, introduces a text-to-video model aimed at advancing AI's understanding and simulation of real-world scenarios. The stated objective is to train models capable of solving problems that require interaction with the physical world.

Sora’s features

Sora generates videos of up to a minute in length while holding consistent visual quality and staying faithful to the user's prompt. It is a diffusion model: generation starts from static noise, which the model gradually removes over many steps until a coherent video emerges. The development aligns with OpenAI's broader mission of building AI systems that comprehend and replicate the complexities of the physical environment.
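The diffusion idea described above, starting from pure noise and removing it step by step, can be illustrated with a toy sketch. This is not OpenAI's implementation: the `denoise_step` function here is a hypothetical stand-in for a learned denoising network, and the "clean" target video is simply an all-zeros array for demonstration.

```python
import numpy as np

def denoise_step(frames, step, total_steps):
    # Stand-in for a learned denoiser: blend the noisy frames toward a
    # fixed "clean" target as the step count advances. A real diffusion
    # model would instead predict and subtract the noise with a network.
    target = np.zeros_like(frames)  # hypothetical clean video
    alpha = (step + 1) / total_steps
    return (1 - alpha) * frames + alpha * target

def generate_video(num_frames=8, height=4, width=4, steps=50, seed=0):
    rng = np.random.default_rng(seed)
    # Start from static noise: one noisy array per video frame.
    frames = rng.standard_normal((num_frames, height, width))
    # Repeatedly remove noise over many steps.
    for step in range(steps):
        frames = denoise_step(frames, step, steps)
    return frames

video = generate_video()
print(video.shape)  # (8, 4, 4): a tiny 8-frame "video"
```

The key structural point is the loop: the output is not produced in one shot but refined iteratively, which is what lets diffusion models trade more sampling steps for higher fidelity.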

A screengrab from a video generated by Sora.

The model tackles diverse prompts, from a stylish woman strolling through a neon-filled Tokyo street to woolly mammoths traversing a snowy meadow. These examples showcase Sora's proficiency in creating intricate scenes, incorporating specific character attributes, varied motion types, and detailed background elements.

However, Sora does grapple with certain weaknesses. Notably, it may struggle with precise physics simulation, leading to anomalies like unnatural object morphing and spontaneous appearances of entities in scenes. Additionally, the model faces challenges in comprehending cause-and-effect relationships and spatial details, occasionally confusing left and right orientations.

Sora’s safety measures

OpenAI places a strong emphasis on safety measures surrounding Sora's deployment, not least because the company is already facing multiple lawsuits over its generative AI models.

Collaboration with domain experts, known as red teamers, is integral to adversarially testing the model, particularly in areas like misinformation, hateful content, and bias. Tools are being developed to detect misleading content, including a classifier to identify videos generated by Sora. The future deployment may also include C2PA metadata, enhancing transparency.

Building on safety methods established for previous models like DALL·E 3, OpenAI integrates a text classifier to reject prompts violating usage policies, ensuring responsible and ethical use. The company plans to engage with policymakers, educators, and artists to address concerns and identify positive applications for this technology.

“Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI,” OpenAI said in an official blog post.

CDO Magazine
www.cdomagazine.tech