AI News Bureau

China’s Vidu Emerges as Sora Challenger in Text-to-Video AI Race

China's ShengShu-AI and Tsinghua University unveiled Vidu, a text-to-video AI model challenging OpenAI's Sora.

Written by: CDO Magazine Bureau

Updated 3:00 AM UTC, Sat May 4, 2024

Screengrab from video generated by Vidu.

Chinese tech firm ShengShu-AI, in collaboration with Tsinghua University, has announced the launch of “Vidu,” a text-to-video AI model, at the Zhongguancun Forum in Beijing.

The unveiling of Vidu marks a significant stride in China’s rapid AI advancement and positions the model as a rival to OpenAI’s Sora.

Vidu, which can generate a 16-second 1080p video clip with a single click, is built on the Universal Vision Transformer (U-ViT) architecture. The architecture combines two widely used generative AI approaches, diffusion models and the Transformer, reports The Global Times.

Zhu Jun, Chief Scientist at Shengshu and Deputy Dean at Tsinghua’s Institute for AI, highlighted Vidu’s key features during its debut. “Vidu is the latest achievement of self-reliant innovation, with breakthroughs in many areas,” said Zhu.

He added that Vidu is imaginative, simulates the physical world, produces videos with consistent characters, scenes, and timelines, and can comprehend “Chinese elements.”

Media reports highlight Vidu’s understanding of Chinese cultural elements, which enables it to generate imagery of culturally significant figures such as pandas and dragons.

Comparisons with Sora are inevitable: Vidu reportedly matches Sora and surpasses it in certain respects, particularly the temporal consistency of video scenes.

Vidu arrives roughly two months after Sora’s debut, showcasing China’s agility in responding to global AI advancements.

“After the release of Sora, we found that it closely aligned with our technical roadmap, which further motivated us to advance our research with determination,” Zhu said at the forum.

During a live demonstration, Vidu exhibited its capability to simulate real-world physics and generate scenes with intricate details, such as realistic light and shadow effects and delicate facial expressions.

Founded in March 2023, Beijing-based Shengshu Technology has a core team composed primarily of members from Tsinghua’s Institute for AI, alongside personnel from tech giants Alibaba Group Holding, Tencent Holdings, and ByteDance. Shengshu recently secured substantial investment from notable firms, including Qiming Venture Partners, Zhipu AI, and Baidu Ventures.